Re: linux-next: Tree for Nov 27 (file notify)

2012-11-27 Thread Eric Paris
On Tue, 2012-11-27 at 08:43 -0800, Randy Dunlap wrote:
> On 11/26/2012 10:25 PM, Stephen Rothwell wrote:
> 
> > Hi all,
> > 
> > Changes since 20121126:
> > 
> 
> 
> when CONFIG_PROC_FS is not enabled (also seen in mmotm):
> 
> fs/notify/fanotify/fanotify_user.c:436:17: error: 'fanotify_show_fdinfo' 
> undeclared here (not in a function)
> fs/notify/inotify/inotify_user.c:333:17: error: 'inotify_show_fdinfo' 
> undeclared here (not in a function

It is something coming in from Andrew's tree.  I believe they already
have a patch.

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kmsg: Honor dmesg_restrict sysctl on /dev/kmsg

2013-02-27 Thread Eric Paris
Fine Fine, I'll get off my lazy butt and look at this.

On Wed, 2013-02-27 at 10:14 -0800, Kees Cook wrote:
> On Wed, Feb 27, 2013 at 10:01 AM, Josh Boyer  wrote:
> > On Wed, Feb 27, 2013 at 09:54:27AM -0800, Kees Cook wrote:
> >> On Fri, Feb 22, 2013 at 01:18:57PM -0500, Josh Boyer wrote:
> >> > Originally, the addition of dmesg_restrict covered both the syslog
> >> > method of accessing dmesg, as well as /dev/kmsg itself.  This was done
> >> > indirectly by security_syslog calling cap_syslog before doing any LSM
> >> > checks.
> >> >
> >> > However, commit 12b3052c3ee (capabilities/syslog: open code cap_syslog
> >> > logic to fix build failure) moved the code around and pushed the checks
> >> > into the caller itself.  That seems to have inadvertently dropped the
> >> > checks for dmesg_restrict on /dev/kmsg.

Not sure this is correct.  There was no devkmsg_open() in commit
12b3052c3ee.  That was added in e11fea92e.  Looks like before that
commit the devkmsg code was even worse.  It didn't call security_syslog
or capable().  Uggh.

>   Most people haven't noticed
> >> > because util-linux dmesg(1) defaults to using the syslog method for
> >> > access in older versions.  With util-linux 2.22 and a kernel newer than
> >> > 3.5, dmesg(1) defaults to reading directly from /dev/kmsg.
> >> >
> >> > Fix this by making an explicit check in the devkmsg_open function.
> >> >
> >> > This fixes https://bugzilla.redhat.com/show_bug.cgi?id=903192
> >> >
> >> > Reported-by: Christian Kujau 
> >> > CC: sta...@vger.kernel.org
> >> > Signed-off-by: Josh Boyer 
> >> > ---
> >> >  kernel/printk.c | 3 +++
> >> >  1 file changed, 3 insertions(+)
> >> >
> >> > diff --git a/kernel/printk.c b/kernel/printk.c
> >> > index f24633a..398ef9a 100644
> >> > --- a/kernel/printk.c
> >> > +++ b/kernel/printk.c
> >> > @@ -615,6 +615,9 @@ static int devkmsg_open(struct inode *inode, struct 
> >> > file *file)
> >> > struct devkmsg_user *user;
> >> > int err;
> >> >
> >> > +   if (dmesg_restrict && !capable(CAP_SYSLOG))
> >> > +   return -EACCES;
> >> > +
> >>
> >> I think this should use check_syslog_permissions() instead, as done for
> >> /proc/kmsg and the syslog syscall.
> >>
> >>   err = check_syslog_permissions(SYSLOG_ACTION_OPTION, 
> >> SYSLOG_FROM_FILE);
> >
> > Did you mean SYSLOG_ACTION_OPEN?
> 
> Oops, yes, typo.
> 
> > I didn't code it that way because the comment in that function about the
> > capability checks already being done seem pretty off to me.  I could
> > have just misread the /proc code though.  I can resend with the change
> > you suggest if everyone thinks that's a better way.
> 
> Yeah, the comment is meaningful if you examine how /proc/kmsg works,
> which was basically as a wrapper to the syslog syscall. The issue was
> that we had to catch (and potentially block) the "open" action on
> /proc/kmsg vs blocking the the syslog read action. In this case, we've
> got another file-based interface, so we should use OPEN and FROM_FILE
> to block the open. (Though it could be argued that we only want to
> block the open if it's reading, which is exactly what Kay was trying
> to do originally it looks like.)

Right.  Now we have /proc/kmsg, /dev/kmsg, and the syscall.  /proc/kmsg
and the syscall both use do_syslog() which calls
check_syslog_permissions() and security_syslog().  /dev/kmsg only calls
security_syslog(), which we all agree needs fixed.

> > Also, the LSM hooks aren't doing any capability checks at all that I can
> > see, which may or may not be a bug in and of itself but I have no idea.
> > I was hoping Eric would speak up about that.

I wouldn't call it a bug.  But it sure is a pretty shitty design pattern
to have security_* sometimes the right thing to do and sometimes
capable() is the right thing to do.  It is pervasive in the kernel that
you have either/or, but I can't think of anywhere that functions are
expected to do BOTH.  So yeah, that needs fixed.

> Eric explicitly removed the cap check since it was cluttering things
> the way it was originally written. I do think security_syslog() should
> pass through check_syslog_permissions(), though. Then this wouldn't
> have happened. That might actually be the right way to clean this up,
> but I'd like to see Eric's thoughts first.

How about something like this?

diff --git a/kernel/printk.c b/kernel/printk.c
index 7c69b3e..ced2cac 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -626,7 +626,7 @@ static int devkmsg_open(struct inode *inode, struct file 
*file)
if ((file->f_flags & O_ACCMODE) == O_WRONLY)
return 0;
 
-   err = security_syslog(SYSLOG_ACTION_READ_ALL);
+   err = check_syslog_permissions(SYSLOG_ACTION_OPEN, SYSLOG_FROM_FILE);
if (err)
return err;
 
@@ -840,22 +840,23 @@ static int check_syslog_permissions(int type, bool 
from_file)
 * already done the capabilities checks at open time.
 */
if (from_file && type != SYSLOG_ACTION_OPEN)
-   

Re: IMA: How to manage user space signing policy with others

2013-02-28 Thread Eric Paris
On Thu, Feb 28, 2013 at 4:35 PM, Vivek Goyal  wrote:
> On Thu, Feb 28, 2013 at 02:23:39PM -0500, Mimi Zohar wrote:

I think just a second for both of you to step back and see a slightly
larger picture/problem might help.

This is a weird case where Vivek does not trust root to make the
policy decision.  If the box is configured for secure boot, it needs
to make these decisions no matter what the admin wants.  This is why
he talks about trying to merge multiple competing policies.  The
current IMA policy is controlled by whomever can first write to the
ima policy file interface.  Vivek does not want an admin to be able to
overwrite the secureboot policy.  So I get why he thinks changes may
be needed to support this use case.

The ima_tcb policy was meant to be larger than needed to determine a
trusted computing base, but it is clearly not a superset of what he is
hoping to accomplish.

So how do we take a system where the admin/software has some control
over the integrity policy (as it is today?) and the kernel/system
itself also has control (as Vivek wants it)?

It seems unsolved with what we have today

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 1/3] seccomp: add generic code for jitted seccomp filters.

2013-03-15 Thread Eric Paris
On Fri, 2013-03-15 at 11:45 -0700, Kees Cook wrote:
> On Fri, Mar 15, 2013 at 11:28 AM, Nicolas Schichan  
> wrote:

> > diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> > index 6f19cfd..af27494 100644
> > --- a/include/linux/seccomp.h
> > +++ b/include/linux/seccomp.h
> > @@ -6,6 +6,7 @@
> >  #ifdef CONFIG_SECCOMP
> >
> >  #include 
> > +#include 
> >  #include 
> >
> >  struct seccomp_filter;
> > @@ -47,6 +48,44 @@ static inline int seccomp_mode(struct seccomp *s)
> > return s->mode;
> >  }
> >
> > +/**
> > + * struct seccomp_filter - container for seccomp BPF programs
> > + *
> > + * @usage: reference count to manage the object lifetime.
> > + * get/put helpers should be used when accessing an instance
> > + * outside of a lifetime-guarded section.  In general, this
> > + * is only needed for handling filters shared across tasks.
> > + * @prev: points to a previously installed, or inherited, filter
> > + * @len: the number of instructions in the program
> > + * @insns: the BPF program instructions to evaluate
> 
> This should be updated to include the new bpf_func field.
> 
> Regardless, it'd be better to not expose this structure to userspace.

This is fine

include/uapi/linux/seccomp.h is exposed to userspace
include/linux/seccomp.h is  kernel internal

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] security: unconditionally call Yama

2012-08-31 Thread Eric Paris
On Fri, Aug 31, 2012 at 2:39 PM, Alan Cox  wrote:
> On Fri, 31 Aug 2012 14:31:26 -0700
> Kees Cook  wrote:
>
>> Unconditionally call Yama, no matter what LSM module is selected.

> Not a good plan. What happens when everyone decides to stack every module
> by ifdeffing in the kernel - mayhem. I'm all for it being possible but
> done right - espeicall as I believe a proper proposal now for stacking
> modules, and it should be done as part of that.

I think a lot of us agree it's a 'difficult' plan going forward.  Kees
wrote this patch after we (James, SELinux, AppArmor people) talked at
the Security Summit earlier today.  From my PoV we have a couple of
'classes' of LSMs.

Big with Labels: SELinux and SMACK
Big without Labels: AppArmor and TOMOYO
Small and stateless: Capabilities and Yama (and maybe integrity)

Those small and stateless LSMs can easily stack.  We don't have object
lifetime issues.  We don't have difficult technical details.  We all
here seem to want to stack 'small and stateless' with 'big boy'.
Outside one little capabilities and selinux setxattr issue I can't
think of anything even quirky.  So the fast path forward, in my mind,
is to start here with Yama.  Our choice *today* is adding these static
hooks inside the 'big boy' LSMs (which I think all of us are willing
to do) vs adding it one time in the LSM and not having to worry about
it all over the place.  Getting the big guys and the little guys to
play together is not going to lead to a mess of crazy.

Stacking the big boys is a future goal most of us are happy with
seeing done, if it turns out to be reasonable.  We know it's close to
possible.  Dave Howell's gave us an implementation about 2 years ago.
Casey Schaufler demo'd another stacking interface today.  No code from
the latter, but the only limitation he really mentioned today was that
two big guys with labels can't stack together.  I don't know how/if
Dave's implementation wanted to handle that case.

I really think this patch is good.

Acked-by: Eric Paris 

I think I even want to do the same thing with capabilities so I don't
have to make sure I'm getting those right in SELinux.  And everyone
else probably doesn't want to keep those hooks themselves either.  I
should send that patch to Linus.  I bet I can give him a large
negative diff.  He did just say two days ago that he'll take any -1000
line diff after -rc6   :)

I think Casey and Dave need to both get their larger stacking
solutions posted and we should go with the best one.  It'll let us
pull out the static code, but for now, I like the static coding
between the big boys and the little guys.  Lets fix today what's easy
and fix tomorrow what we can't fix today.

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] security: unconditionally call Yama

2012-08-31 Thread Eric Paris
On Fri, Aug 31, 2012 at 4:59 PM, Eric W. Biederman
 wrote:

> From a overal kernel maintenance and use perspective the unconditional
> enablement is a pain.
>
> We long ago established the principle that compiling additional code
> into the kernel should not change the semenatics of the kernel.
>
> So this code needs to come with a command line or sysctl on/off switch
> not an unconditional enable.

Your argument makes zero sense.  If I decide to build new code, that
new code can do something.  It happens all the time.  If you don't
like Yama, don't build Yama.  If you don't like the only thing that
Yama does (it only implements one protection), disable that protection
from sysctl.  I don't get it.

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] Second attempt at kernel secure boot support

2012-11-01 Thread Eric Paris
On Thu, Nov 1, 2012 at 5:59 AM, James Bottomley
 wrote:

> But that doesn't really help me: untrusted root is an oxymoron.

Imagine you run windows and you've never heard of Linux.  You like
that only windows kernels can boot on your box and not those mean
nasty hacked up malware kernels.  Now some attacker manages to take
over your box because you clicked on that executable for young models
in skimpy bathing suits.  That executable rewrote your bootloader to
launch a very small carefully crafted Linux environment.  This
environment does nothing but launch a perfectly valid signed Linux
kernel, which gets a Windows environment all ready to launch after
resume and goes to sleep.  Now you have to hit the power button twice
every time you turn on your computer, weird, but Windows comes up, and
secureboot is still on, so you must be safe!

In this case we have a completely 'untrusted' root inside Linux.  From
the user PoV root and Linux are both malware.  Notice the EXACT same
attack would work launching rootkit'd Linux from Linux.  So don't
pretend not to care about Windows.  It's just that launching malware
Linux seems like a reason to get your key revoked.  We don't want
signed code which can be used as an attack vector on ourselves or on
others.

That make sense?

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] Second attempt at kernel secure boot support

2012-11-01 Thread Eric Paris
On Thu, Nov 1, 2012 at 10:42 AM, James Bottomley
 wrote:
> On Thu, 2012-11-01 at 10:29 -0400, Eric Paris wrote:
>> On Thu, Nov 1, 2012 at 5:59 AM, James Bottomley
>>  wrote:
>>
>> > But that doesn't really help me: untrusted root is an oxymoron.
>>
>> Imagine you run windows and you've never heard of Linux.  You like
>> that only windows kernels can boot on your box and not those mean
>> nasty hacked up malware kernels.  Now some attacker manages to take
>> over your box because you clicked on that executable for young models
>> in skimpy bathing suits.  That executable rewrote your bootloader to
>> launch a very small carefully crafted Linux environment.  This
>> environment does nothing but launch a perfectly valid signed Linux
>> kernel, which gets a Windows environment all ready to launch after
>> resume and goes to sleep.  Now you have to hit the power button twice
>> every time you turn on your computer, weird, but Windows comes up, and
>> secureboot is still on, so you must be safe!
>
> So you're going back to the root exploit problem?  I thought that was
> debunked a few emails ago in the thread?
>
> Your attack vector isn't plausible because for the suspend attack to
> work, the box actually has to be running Linux by default ... I think
> the admin of that box might notice if it suddenly started running
> windows ...

Maybe you misread.  The owner of the box would never know a shim Linux
was loaded.  In any case, as I said, windows really is irrelevant.
It's just using Linux as an attack vectore against Windows is what
would get keys revoked.  If your key is revoke Linux can't boot on a
large amount of new hardware without BIOS twiddling.u

>> In this case we have a completely 'untrusted' root inside Linux.  From
>> the user PoV root and Linux are both malware.  Notice the EXACT same
>> attack would work launching rootkit'd Linux from Linux.  So don't
>> pretend not to care about Windows.  It's just that launching malware
>> Linux seems like a reason to get your key revoked.  We don't want
>> signed code which can be used as an attack vector on ourselves or on
>> others.
>>
>> That make sense?
>
> Not really, no.  A windows attack vector is a pointless abstraction
> because we're talking about securing Linux and your vector requires a
> Linux attack for the windows compromise ... let's try to keep on point
> to how we're using this feature to secure Linux.

I pointed out that the exact same attack exists with Linux on Linux.
To launch a malware linux kernel all you have to do is launch a shim
signed acceptable linux environment, have it set up the malware kernel
to launch after resume, and go to sleep.  Agreed, it'd be very weird
that the first time you hit the power button your machine comes on and
then quickly goes right back to sleep, but certainly we can envision
that being ignored by many desktop users...

Do you see how 'root' in the first environment is untrusted?  Now you
can pretend not to care because the 'original' root was trusted.  But
people install bad crap all the time.  There are hundreds of ways to
install bad software as root.  Go to any site distributing rpms or
debs to get that new version of mod_perl and it could install the
malware kernel and shim environment.

The point of secureboot is even if the admin did something which
allowed his kernel to be compromised, it won't persist.  Sure,
secureboot moves the attack up the stack to userspace, but at least we
can do something about the kernel.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] Second attempt at kernel secure boot support

2012-11-01 Thread Eric Paris
On Thu, Nov 1, 2012 at 10:46 AM, Alan Cox  wrote:
>> Imagine you run windows and you've never heard of Linux.
>
> To those people I think you mean "never heard of Ubuntu" ;-)

:-)

> With all the current posted RH patches I can still take over
> the box as root trivially enough and you seem to have so far abolished
> suspend to disk, kexec and a pile of other useful stuff. To actually lock
> it down you'll have to do a ton more of this.

I'm guessing those writing the patches would like to hear about these.
 Suspend to disk and kexec can probably both be fixed up to work...

> Actually from what I've seen on
> the security front there seems to a distinct view that secure boot is
> irrelevant because Windows 8 is so suspend/resume focussed that you might
> as well just trojan the box until the next reboot as its likely to be a
> couple of weeks a way.

Bit of a straw man isn't it?  Hey, don't fix A, I can do B!  I'm not
saying you're wrong, nor that maybe online attacks which don't persist
across reboot wouldn't be more likely, but they aren't attacking the
same problem.  (I haven't heard any progress on what you point out,
but at least we have some progress on some small class of boot time
persistent attacks)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] Second attempt at kernel secure boot support

2012-11-01 Thread Eric Paris
On Thu, Nov 1, 2012 at 11:06 AM, James Bottomley
 wrote:

> But surely that's fanciful ... you've already compromised windows to get
> access to the ESP.  If you've done it once, you can do it again until
> the exploit is patched.

You work under the assumption that any bad operation was done by means
of a compromised kernel.  Admins install things all the time,
sometimes, things which they shouldn't.  (This statement is OS
agnostic)

> There are likely many easier ways of ensuring
> persistence than trying to install a full linux kernel with a
> compromised resume system.

I'm sure lots of us would love to hear the ideas.  And the attack is
on the suspend side, nothing about resume has to be malicious (not
really relevant I guess)...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] Second attempt at kernel secure boot support

2012-11-01 Thread Eric Paris
On Thu, Nov 1, 2012 at 11:18 AM, James Bottomley
 wrote:

> You're completely confusing two separate goals:
>
>  1. Is it possible to use secure boot to implement a security policy
> on Linux
>  2. What do we have to do anything to prevent Linux being used to
> attack windows which may lead to a revocation of the distro
> signing key.
>
> "untrusted root" is a silly answer to 1 because it's incredibly
> difficult to remove sufficient trust from root and still have it be
> trusted enough to be effective as root.  The trust bound up in root is
> incredibly intertwined.  It would be far better to start by eliminating
> the root user altogether and building up on the capabilities in a
> granular fashion for this type of lockdown.

granular lockdown!?  sounds like SELinux.  But that certainly can't
solve the problems here...

I think your premise has a couple of problems.  First is how you chose
to word #2.  Lets reword it as:

What do we have to do to prevent Linux being used to attack Linux
which may lead to secure boot being useless.

If we accept that as #2, then we think, "What makes secure boot
useless"  or "What is the security goal as envisioned by secure boot."
 The goal of secure boot is to implement an operating system which
prevents uid==0 to ring 0 escalation.  That is the security policy
secure boot needs to not be completely useless.  And as it turns out,
that security policy is useful in other situations.

> "untrusted root", if it can even be achieved, might be a sufficient
> condition for 2 but it's way overkill for a necessary one.

But it is a condition, as specifically stated, that others have wanted
long before secure boot even came to rise.  I've talked with and
worked with a public cloud operator who wants to prevent even a
malicious root user from being able to run code in ring 0 inside their
VM.  The hope in that case was that in doing so they can indirectly
shrink the attack surface between virtual machine and hypervisor.
They hoped to limit the ways the guest could interact to only those
methods the linux kernel implemented.

They want to launch a vm running a kernel they chose and make sure
root inside the vm could not run some other kernel or run arbitrary
code in kernel space.  It's wasn't something they solved completely.
If it was, all of this secure boot work would be finished.  Which is
why we are having these discussions to understand all of the way that
you an Alan seem to have to get around the secure boot restrictions.
And look for solutions to retain functionality which meeting the
security goal of 'prevent uid=0 to ring 0 privilege escalation.

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] Second attempt at kernel secure boot support

2012-11-02 Thread Eric Paris
I know I started it, but Windows really isn't necessary to see value,
even if it is what pushed the timing.

A user installs a package as root.  Absent any flaws in the Linux
kernel (cough) that should be all it can do in a Secure Boot world.
But if you can drop a small trusted Linux system in there and use that
to boot a compromised Linux kernel, it can make itself persistent.

And like I said, I know there are cloud providers out there who want
EXACTLY this type of system.  One in which root in the guest is
untrusted and they want to keep them out of ring 0.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] Second attempt at kernel secure boot support

2012-11-03 Thread Eric Paris
On Sat, Nov 3, 2012 at 12:31 PM, Alan Cox  wrote:
>> You're guaranteed to be able
>> to do this on any Windows 8 certified hardware.
>
> Thats not my understanding of the situation.

Windows 8 certification has this as a requirement for x86 hardware.  I
belied the opposite is a requirement for arm hardware.  However it's
possible that it just doesn't specifiy at all for arm.

So yes, you're guaranteed to be able to do this on any Windows 8
certified x86 hardware.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 8/8] audit: add audit_backlog_wait_time configuration option

2013-09-18 Thread Eric Paris
On Wed, 2013-09-18 at 15:06 -0400, Richard Guy Briggs wrote:
> reaahead-collector abuses the audit logging facility to discover which files
> are accessed at boot time to make a pre-load list
> 
> Add a tuning option to audit_backlog_wait_time so that if auditd can't keep 
> up,
> or gets blocked, the callers won't be blocked.
> 
> Signed-off-by: Richard Guy Briggs 
> ---
>  include/uapi/linux/audit.h |2 ++
>  kernel/audit.c |   22 +-
>  2 files changed, 23 insertions(+), 1 deletions(-)
> 
> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index 75cef3f..493a66e 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -316,6 +316,7 @@ enum {
>  #define AUDIT_STATUS_PID 0x0004
>  #define AUDIT_STATUS_RATE_LIMIT  0x0008
>  #define AUDIT_STATUS_BACKLOG_LIMIT   0x0010
> +#define AUDIT_STATUS_BACKLOG_WAIT_TIME   0x0020
>   /* Failure-to-log actions */
>  #define AUDIT_FAIL_SILENT0
>  #define AUDIT_FAIL_PRINTK1
> @@ -367,6 +368,7 @@ struct audit_status {
>   __u32   backlog_limit;  /* waiting messages limit */
>   __u32   lost;   /* messages lost */
>   __u32   backlog;/* messages waiting in queue */
> + __u32   backlog_wait_time;/* message queue wait timeout */
>  };
>  
>  struct audit_tty_status {
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 3d17670..fc535b6 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -321,6 +321,12 @@ static int audit_set_backlog_limit(int limit)
>   return audit_do_config_change("audit_backlog_limit", 
> &audit_backlog_limit, limit);
>  }
>  
> +static int audit_set_backlog_wait_time(int timeout)
> +{
> + return audit_do_config_change("audit_backlog_wait_time",
> +   &audit_backlog_wait_time, timeout);
> +}
> +
>  static int audit_set_enabled(int state)
>  {
>   int rc;
> @@ -669,6 +675,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
> nlmsghdr *nlh)
>   s.backlog_limit = audit_backlog_limit;
>   s.lost   = atomic_read(&audit_lost);
>   s.backlog= skb_queue_len(&audit_skb_queue);
> + s.backlog_wait_time = audit_backlog_wait_time;
>   audit_send_reply(NETLINK_CB(skb).portid, seq, AUDIT_GET, 0, 0,
>&s, sizeof(s));
>   break;
> @@ -701,8 +708,21 @@ static int audit_receive_msg(struct sk_buff *skb, struct 
> nlmsghdr *nlh)
>   if (err < 0)
>   return err;
>   }
> - if (s.mask & AUDIT_STATUS_BACKLOG_LIMIT)
> + if (s.mask & AUDIT_STATUS_BACKLOG_LIMIT) {
>   err = audit_set_backlog_limit(s.backlog_limit);
> + if (err < 0)
> + return err;
> + }
> + if (s.mask & AUDIT_STATUS_BACKLOG_WAIT_TIME) {
> + if (sizeof(s) > (size_t)nlh->nlmsg_len)
> + break;

What gets returned here?  I think err has a value of 0, but it doesn't
seem to have been clearly intentional.  If they know about the
AUDIT_STATUS_BACKLOG_WAIT_TIME flag, but they didn't send a long enough
skb?  That seems like an error condition

> + if (s.backlog_wait_time < 0 ||
> + s.backlog_wait_time > 10*AUDIT_BACKLOG_WAIT_TIME)
> + return -EINVAL;
> + err = audit_set_backlog_wait_time(s.backlog_wait_time);
> + if (err < 0)
> + return err;
> + }
>   break;
>   }
>   case AUDIT_USER:


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 8/8] audit: add audit_backlog_wait_time configuration option

2013-09-18 Thread Eric Paris
On Wed, 2013-09-18 at 16:49 -0400, Richard Guy Briggs wrote:
> On Wed, Sep 18, 2013 at 04:33:25PM -0400, Eric Paris wrote:
> > On Wed, 2013-09-18 at 15:06 -0400, Richard Guy Briggs wrote:
> > > reaahead-collector abuses the audit logging facility to discover which 
> > > files
> > > are accessed at boot time to make a pre-load list
> > > 
> > > Add a tuning option to audit_backlog_wait_time so that if auditd can't 
> > > keep up,
> > > or gets blocked, the callers won't be blocked.
> 
> > > diff --git a/kernel/audit.c b/kernel/audit.c
> > > index 3d17670..fc535b6 100644
> > > --- a/kernel/audit.c
> > > +++ b/kernel/audit.c
> > > @@ -701,8 +708,21 @@ static int audit_receive_msg(struct sk_buff *skb, 
> > > struct nlmsghdr *nlh)
> > >   if (err < 0)
> > >   return err;
> > >   }
> > > - if (s.mask & AUDIT_STATUS_BACKLOG_LIMIT)
> > > + if (s.mask & AUDIT_STATUS_BACKLOG_LIMIT) {
> > >   err = audit_set_backlog_limit(s.backlog_limit);
> > > + if (err < 0)
> > > + return err;
> > > + }
> > > + if (s.mask & AUDIT_STATUS_BACKLOG_WAIT_TIME) {
> > > + if (sizeof(s) > (size_t)nlh->nlmsg_len)
> > > + break;
> > 
> > What gets returned here?  I think err has a value of 0, but it doesn't
> > seem to have been clearly intentional.  If they know about the
> > AUDIT_STATUS_BACKLOG_WAIT_TIME flag, but they didn't send a long enough
> > skb?  That seems like an error condition
> 
> The intent was that it is a NOP, since err would have a value of zero,
> but I see your point, that if that flag is present, the struct member
> should be too.  My original intent was that if the structure member
> wasn't present, it would default to zero, unintentionally setting the
> wait time to zero.  It was part of my paranoia in the absence of an API
> version indicator.  No harm done, but I agree it should return an error.
> 
> Thanks for the catch.
> 
> > > + if (s.backlog_wait_time < 0 ||
> > > + s.backlog_wait_time > 10*AUDIT_BACKLOG_WAIT_TIME)
> > > + return -EINVAL;
> 
> I assume values less than zero or larger than 10 times the current
> default of one minute are errors or unreasonable.
> 
> Any argument for more than 10 minutes?

10 Minutes already seems nuts.  If someone has a reason to go higher, we
can change this sanity check then.

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] audit: clean up AUDIT_GET/SET local variables and future-proof API

2013-09-20 Thread Eric Paris
On Thu, 2013-09-19 at 17:18 -0400, Steve Grubb wrote:
> On Wednesday, September 18, 2013 03:06:52 PM Richard Guy Briggs wrote:
> > Re-named confusing local variable names (status_set and status_get didn't
> > agree with their command type name) and reduced their scope.
> > 
> > Future-proof API changes by not depending on the exact size of the
> > audit_status struct.
> 
> I wished things like this were coordinated before being written. We had some 
> discussion of this back in July under a topic, "audit: implement generic 
> feature setting and retrieving". Maybe that API can be fixed so its not just 
> set/unset but can take a number as well.

Your right, we did talk about it.  Though it seems we talked past each
other.  What was implemented was an on off extensible interface.  The
status interface already fits for setting numbers.  And because of how
it is used has been extended is is extensible for setting numbers.

> 
> Also, because there is no way to query the kernel to see what kind of things 
> it supports

I'll agree.  Richard, can you please add a version field to the status?
Start at version 1.  Any time we add a new audit feature we'll bump it.

> , we've typically defined a new message type and put into it exactly 
> what we need. In other words, if you want something expandable, the define a 
> new message type like AUDIT_GET_EXT and AUDIT_SET_EXT and build it to be 
> expandable.

There is no reason we can't use what we have.  (As we've done it before)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/1] audit_alloc: clear TIF_SYSCALL_AUDIT if !audit_context

2013-09-20 Thread Eric Paris
On Sun, 2013-09-15 at 19:11 +0200, Oleg Nesterov wrote:
> If audit_filter_task() nacks the new thread it makes sense
> to clear TIF_SYSCALL_AUDIT which can be copied from parent
> by dup_task_struct().
> 
> A wrong TIF_SYSCALL_AUDIT is not really bad but it triggers
> the "slow" audit paths in entry.S to ensure the task can not
> miss audit_syscall_*() calls, this is pointless if the task
> has no ->audit_context.
> 
> Signed-off-by: Oleg Nesterov 
> Acked-by: Steve Grubb 

Acked-by: Eric Paris 

Richard, please pick this up into your tree.

> ---
>  kernel/auditsc.c |4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 9845cb3..95293ab 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -943,8 +943,10 @@ int audit_alloc(struct task_struct *tsk)
>   return 0; /* Return if not auditing. */
>  
>   state = audit_filter_task(tsk, &key);
> - if (state == AUDIT_DISABLED)
> + if (state == AUDIT_DISABLED) {
> + clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
>   return 0;
> + }
>  
>   if (!(context = audit_alloc_context(state))) {
>   kfree(key);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] audit: remove newline accidentally added during session id helper refactor

2013-09-20 Thread Eric Paris
On Wed, 2013-09-18 at 11:17 -0400, Richard Guy Briggs wrote:
> A newline was accidentally added during session ID helper refactorization in
> commit 4d3fb709.  This needlessly uses up buffer space, messes up syslog
> formatting and makes userspace processing less efficient.  Remove it.
> 
> Signed-off-by: Richard Guy Briggs 

Acked-by: Eric Paris 

> ---
>  kernel/audit.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 3d17670..ac16540 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -1413,7 +1413,7 @@ void audit_log_session_info(struct audit_buffer *ab)
>   u32 sessionid = audit_get_sessionid(current);
>   uid_t auid = from_kuid(&init_user_ns, audit_get_loginuid(current));
>  
> - audit_log_format(ab, " auid=%u ses=%u\n", auid, sessionid);
> + audit_log_format(ab, " auid=%u ses=%u", auid, sessionid);
>  }
>  
>  void audit_log_key(struct audit_buffer *ab, char *key)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/4] ipc: shm and msg fixes

2013-09-20 Thread Eric Paris
On Thu, 2013-09-19 at 14:22 -0700, Davidlohr Bueso wrote:
> On Sun, 2013-09-15 at 20:04 -0700, Davidlohr Bueso wrote:
> > This patchset deals with the selinux and rmid races Manfred found on
> > the ipc scaling work that has been going on. It specifically addresses
> > shared mem and msg queues. While semaphores still need updated, I want
> > to make sure these are correct first. Also, Manfred had already sent out
> > a patchset that deals with a race in sem complex operations. So any changes
> > should be on top of his.
> > 
> > Patches 1 and 2 deal with shared memory.
> > Patches 3 and 4 deal with msg queues.
> > Specific details about each race and its fix are in the corresponding
> > patches.
> > 
> > Note that Linus suggested a good alternative to patches 1 and 3: use
> > kfree_rcu() and delay the freeing of the security structure. I would
> > much prefer that approach to doing security checks with the lock held,
> > but I want to leave the patches out and ready in case we go with the
> > later solution.
> 
> Cc'ing selinux/security people - folks, could you please confirm that
> this change is ok and won't break anything related to security?

I agree with the approach to delay the freeing and it does not appear to
be a problem from an SELinux PoV, but I don't necessarily agree with the
location of the delay.  Why should every LSM be required to know the
details of that object lifetime?  It seems to me like we might want to
move this delay up to shm_destroy.  I know that's going to be more code
then using kfree_rcu(), but it seems like a much better abstraction to
hide that knowledge away from the LSM.

Possibly place the rcu_head in struct kern_ipc_perm?  Then use
call_rcu() to call the security destroy hook?  You'll have to re-write
to LSM hook to take an rcu_head, etc, but that shouldn't be a big
problem.

Doing it this way, also means you won't break the security model of
SMACK.  It looks like you'd still have the exact same race with SMACK,
except they can't be fixed with kfree_rcu().  They are just setting a
pointer to NULL.  Which could then be dereferenced later.  They don't
actually do allocation/free.

So ACK on the RCU delay, but NACK to making the hooks do the delay,
instead do it in the IPC code.

> 
> Thanks!
> 
> 
> 8<---
> 
> From: Davidlohr Bueso 
> Date: Thu, 19 Sep 2013 09:32:05 -0700
> Subject: [PATCH] selinux: rely on rcu to free ipc isec
> 
> Currently, IPC mechanisms do security and auditing related checks under
> RCU. However, since selinux can free the security structure, through
> selinux_[sem,msg_queue,shm]_free_security(), we can race if the structure
> is freed before other tasks are done with it, creating a use-after-free
> condition. Manfred illustrates this nicely, for example with shared mem:
> 
> --> do_shmat calls rcu_read_lock()
> --> do_shmat calls shm_object_check().
>  Checks that the object is still valid - but doesn't acquire any locks.
>  Then it returns.
> --> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
> --> selinux_shm_shmat calls ipc_has_perm()
> --> ipc_has_perm accesses ipc_perms->security
> 
> shm_close()
> --> shm_close acquires rw_mutex & shm_lock
> --> shm_close calls shm_destroy
> --> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
> --> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
> --> ipc_free_security calls kfree(ipc_perms->security)
> 
> There are two solutions to this: (i) perform the security checking and 
> whatever
> IPC operation(s) atomically (that is, with the kern_ipc_perm.lock held), or
> (ii) delay the freeing of the isec structure after all RCU readers are done.
> This patch takes the second approach, which, at least from a performance 
> perspective,
> is more practical than the first as it keeps the IPC critical regions smaller.
> 
> I have tested this patch with IPC testcases from LTP on both my quad-core
> laptop and on a 64 core NUMA server. In both cases selinux is enabled, and
> tests pass for both voluntary and forced preemption models.
> 
> Signed-off-by: Davidlohr Bueso 
> ---
>  security/selinux/hooks.c  | 9 -
>  security/selinux/include/objsec.h | 5 +++--
>  2 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index a5091ec..179ffe9 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -4892,8 +4892,15 @@ static int ipc_alloc_security(struct task_struct *task,
>  static void ipc_free_security(struct kern_ipc_perm *perm)
>  {
>   struct ipc_security_struct *isec = perm->security;
> +
> + /*
> +  * All IPC mechanisms do security and audit
> +  * checking under RCU: wait a grace period before
> +  * freeing isec, otherwise we can run into a
> +  * use-after-free scenario.
> +  */
> + kfree_rcu(isec, rcu);
>   perm->security = NULL;
> - kfree(isec);
>  }
>  
>  static int msg_

Re: Oddness in security/Makefile

2013-09-10 Thread Eric Paris
>From 4675ca3470e3c2e325c5be6d9a11f47ac0917537 Mon Sep 17 00:00:00 2001
From: Eric Paris 
Date: Tue, 10 Sep 2013 09:51:50 -0400
Subject: [PATCH] security: remove erroneous comment about capabilities.o link
 ordering

Back when we had half ass LSM stacking we had to link capabilities.o
after bigger LSMs so that on initialization the bigger LSM would
register first and the capabilities module would be the one stacked as
the 'seconday'.  Somewhere around 6f0f0fd496333777d53 (back in 2008) we
finally removed the last of the kinda module stacking code but this
comment in the makefile still lives today.

Reported-by: Valdis Kletnieks 
Signed-off-by: Eric Paris 
---
 security/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/security/Makefile b/security/Makefile
index c26c81e..a5918e0 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -16,7 +16,6 @@ obj-$(CONFIG_MMU) += min_addr.o
 # Object file lists
 obj-$(CONFIG_SECURITY) += security.o capability.o
 obj-$(CONFIG_SECURITYFS)   += inode.o
-# Must precede capability.o in order to stack properly.
 obj-$(CONFIG_SECURITY_SELINUX) += selinux/built-in.o
 obj-$(CONFIG_SECURITY_SMACK)   += smack/built-in.o
 obj-$(CONFIG_AUDIT)+= lsm_audit.o

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] audit: avoid soft lockup in audit_log_start()

2013-09-10 Thread Eric Paris
On Mon, 2013-09-09 at 18:32 +0400, Konstantin Khlebnikov wrote:
> Luiz Capitulino wrote:
> > I'm getting the following soft lockup:
> >
> > CPU: 6 PID: 2278 Comm: killall5 Tainted: GF3.11.0-rc7+ #1
> > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> >   0099 88011fd83de8 815324df 2800
> >   817d48f9 88011fd83e68 8152e669 88011fd83e68
> >   0008 88011fd83e78 88011fd83e18 004081dac040
> > Call Trace:
> >  [] dump_stack+0x46/0x58
> >   [] panic+0xbb/0x1c4
> >   [] watchdog_timer_fn+0x163/0x170
> >   [] __run_hrtimer+0x81/0x1c0
> >   [] ? watchdog+0x30/0x30
> >   [] hrtimer_interrupt+0x107/0x240
> >   [] local_apic_timer_interrupt+0x3b/0x60
> >   [] smp_apic_timer_interrupt+0x45/0x60
> >   [] apic_timer_interrupt+0x6a/0x70
> >  [] ? audit_log_start+0xbf/0x430
> >   [] ? audit_log_start+0x147/0x430
> >   [] ? try_to_wake_up+0x2a0/0x2a0
> >   [] audit_log_exit+0x6ae/0xc30
> >   [] ? __alloc_fd+0x42/0x100
> >   [] __audit_syscall_exit+0x257/0x2b0
> >   [] sysret_audit+0x17/0x21
> >
> > The reproducer is somewhat unusual:
> >
> >   1. Install RHEL6.5 (maybe a similar older user-space will do)
> >   2. Boot the just installed system
> >   3. In this first boot you'll meet the firstboot script, which
> >  will do some setup and (depending on your answers) it will
> >  reboot the machine
> >   4. During that first reboot the system hangs while terminating
> >  all processes:
> >
> > Sending all processes the TERM signal...
> >
> >  It's when the soft lockup above happens. And yes, I managed
> >  to get this with latest upstream kernel (HEAD fa8218def1b1)
> >
> > I'm reproducing it on a VM, but the first report was on bare-metal.
> >
> > This is what is happening:
> >
> >   1. audit_log_start() is called
> >   2. As we have SKBs waiting in audit_skb_queue and all conditions
> >  evaluate to true, we sleep in wait_for_auditd()
> >   3. Go to 2, until sleep_time gets negative and audit_log_start()
> >  just busy-waits
> >
> > Now, *why* this is happening is a mistery to me. I tried debugging
> > it, but all I could find is that at some point the kauditd thread
> > never wakes up after having called schedule(). I even tried waking
> > it up before calling wait_for_auditd(), but it didn't.
> 
> We run into the same problem in rhel6 kernel.
> 
> "readahead-collector" uses audit interface and sometimes stuck in 'stopped' 
> state.
> 
> After commit 829199197a430dade2519d54f5545c4a094393b8
> (which was backported by RH into their kernel)
> audit emiters will block forever if userspace daemon cannot handle backlog.
> That commit just breaks timeout condition, after timeout waiting loop turns
> into busy loop until deamon dies or returns back to work.
> 
> this trivial patch should fix that problem
> 
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -1215,9 +1215,10 @@ struct audit_buffer *audit_log_start(struct 
> audit_context *ctx, gfp_t gfp_mask,
> 
>  sleep_time = timeout_start + audit_backlog_wait_time 
> -
>  jiffies;
> -   if ((long)sleep_time > 0)
> +   if ((long)sleep_time > 0) {
>  wait_for_auditd(sleep_time);
> -   continue;
> +   continue;
> +   }
>  }
>  if (audit_rate_check() && printk_ratelimit())
>  printk(KERN_WARNING

I think this is the right(ish) fix, at least it gets at the real bug.
829199197a430dade2519d54f5545c4a094393b8 definitely is the problem.  The
60 second wait is NOT causing the soft lockup.  Instead after the 60
second wait, sleep_time, will go negative on subsequent trips through
the loop.  Since sleep_time is negative we will hit the continue and
loop FOREVER with no opportunity for progress.  Aka, softlockup...

I think it's a really ?interesting? (poor) coding style to leave
sleep_time going negative on the second trip through the loop to
indicate a timeout in wait_for_auditd().  Maybe the function should be
rewritten a bit.  Have a return code from wait_for_auditd, if it waited
60 seconds, call the failure function, if it succeeded in less the 60
seconds continue along...  At least something that makes the logic here
obvious.

I think Richard is also going to set audit_backlog_wait_time to 0 on
failure and will reset it to default on success.  To address the other
issue as well.  That is just dumb.

RGB should be taking care of it in RHEL as soon as he gets this worked
out.

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] security: allow Yama to be unconditionally stacked

2012-09-05 Thread Eric Paris
On Wed, Sep 5, 2012 at 11:47 AM, Serge Hallyn
 wrote:
> Quoting Kees Cook (keesc...@chromium.org):
>> Unconditionally call Yama when CONFIG_SECURITY_YAMA_STACKED is selected,
>> no matter what LSM module is primary.
>>
>> Ubuntu and Chrome OS already carry patches to do this, and Fedora
>> has voiced interest in doing this as well. Instead of having multiple
>> distributions (or LSM authors) carrying these patches, just allow Yama
>> to be called unconditionally when selected by the new CONFIG.
>
> I don't really like having both the STACKED and non-stacked paths. But
> I don't have a good alternative.
>
>> Signed-off-by: Kees Cook 
>
> Acked-by: Serge E. Hallyn 

I said basically the same thing to Kees off list.  But I don't have an
answer either.

Acked-by: Eric Paris 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 07/11] kexec: Disable in a secure boot environment

2012-09-05 Thread Eric Paris
On Wed, Sep 5, 2012 at 5:41 PM, Matthew Garrett  wrote:

> Yeah, I think renaming the cap is a given.

CAP_RING_ZERO

Needed for any activity which would give root the ability to run code in ring 0?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.6-rc4 audit_log_d_path oops.

2012-09-06 Thread Eric Paris
On Thu, Sep 6, 2012 at 11:16 AM, Dave Jones  wrote:
> On Thu, Sep 06, 2012 at 09:46:28AM -0400, Dave Jones wrote:
>  > Hit this in overnight fuzz testing..
>  >
>  > BUG: unable to handle kernel NULL pointer dereference at 0020
>  > IP: [] audit_log_d_path+0x35/0xf0
>  > PGD 12fded067 PUD 142c06067 PMD 0
>  > Oops:  [#1] SMP
>  > Modules linked in: tun fuse ipt_ULOG binfmt_misc nfnetlink nfc caif_socket 
> caif phonet can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key 
> decnet rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 lockd 
> sunrpc bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 
> ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state 
> nf_conntrack kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr 
> i2c_i801 e1000e uinput i915 video i2c_algo_bit drm_kms_helper drm i2c_core
>  > CPU 5
>  > Pid: 7007, comm: trinity-child5 Not tainted 3.6.0-rc4+ #36
>  > RIP: 0010:[]  [] 
> audit_log_d_path+0x35/0xf0
>  > RSP: 0018:880116b33ec8  EFLAGS: 00010246
>  > RAX:  RBX:  RCX: 0035
>  > RDX: 819dbf5d RSI: 819da5bb RDI: 
>  > RBP: 880116b33ee8 R08: 88001942533e R09: 
>  > R10: 0001 R11:  R12: 880116b33f38
>  > R13: 880116b33f38 R14: 819fa1eb R15: 0005
>  > FS:  7f5742581740() GS:88014860() 
> knlGS:
>  > CS:  0010 DS:  ES:  CR0: 80050033
>  > CR2: 0020 CR3: 0001046d3000 CR4: 001407e0
>  > DR0:  DR1:  DR2: 
>  > DR3:  DR6: 0ff0 DR7: 0400
>  > Process trinity-child5 (pid: 7007, threadinfo 880116b32000, task 
> 880019424d00)
>  > Stack:
>  >  880116b33ee8  880116b33f38 880019424d00
>  >  880116b33f18 81103522 01d0 880119ed6500
>  >  01d0 0109 880116b33f78 811ebb95
>  > Call Trace:
>  >  [] audit_log_link_denied+0x92/0x100
>  >  [] sys_linkat+0x195/0x1e0
>  >  [] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  >  [] system_call_fastpath+0x1a/0x1f
>  > Code: 5d e8 4c 89 65 f0 4c 89 6d f8 0f 1f 44 00 00 48 85 f6 48 89 fb 49 89 
> d5 74 11 48 89 f2 31 c0 48 c7 c6 bb a5 9d 81 e8 8b e1 ff ff <8b> 73 20 40 f6 
> c6 01 75 62 48 8b 3d 33 33 93 01 48 85 ff 74 4e
>  > RIP  [] audit_log_d_path+0x35/0xf0
>  >  RSP 
>  > CR2: 0020
>  > ---[ end trace 85b88c850143bb1c ]---
>  >
>  > That's here in kernel/audit.c
>  >
>  > 1433
>  > 1434 /* We will allow 11 spaces for ' (deleted)' to be appended */
>  > 1435 pathname = kmalloc(PATH_MAX+11, ab->gfp_mask);
>  >
>  > 'ab' is NULL.
>  >
>  > Looks like audit_log_link_denied needs to handle potential failure from
>  > audit_log_start and abort early. (oddly, it looks like every other
>  > function called there checks for !ab.)
>  >
>  > Maybe additional code should be added here to printk the audit message
>  > to dmesg so that we don't lose it entirely, but for now, minimal fix.
>  >
>  > Signed-off-by: Dave Jones 
>  >
>  > diff --git a/kernel/audit.c b/kernel/audit.c
>  > index ea3b7b6..c3e85bb 100644
>  > --- a/kernel/audit.c
>  > +++ b/kernel/audit.c
>  > @@ -1466,6 +1466,9 @@ void audit_log_link_denied(const char *operation, 
> struct path *link)
>  >
>  >  ab = audit_log_start(current->audit_context, GFP_KERNEL,
>  >   AUDIT_ANOM_LINK);
>  > +if (!ab)
>  > +return;
>  > +
>  >  audit_log_format(ab, "op=%s action=denied", operation);
>  >  audit_log_format(ab, " pid=%d comm=", current->pid);
>  >  audit_log_untrustedstring(ab, current->comm);
>
> Added Kees, as this was introduced in a51d9eaa41866ab6b4b6ecad7b621f8b66ece0dc
>
> I just realised, the funny thing about this is that the machine running that 
> test
> had selinux/audit disabled. And yet here we are, screwing around with audit 
> buffers.
>
> Should there be a test on audit_enable=0 in audit_log_link_denied() ?
>
> I'm now curious how much more of the audit code is getting run through 
> similar lack of tests

Possible there are other places.  The usual idium is to put a a static
inline audit_[blah] in the header file which just does if
(!audit_dummy_context()) return __audit_[blah]; return 0;

Thus if audit is off or compiled out nothing gets run.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Check all returns from audit_log_start

2012-09-06 Thread Eric Paris
On Thu, Sep 6, 2012 at 11:08 AM, Dave Jones  wrote:
> Following on from the previous patch that fixed an oops, these
> are all the other similar code patterns in the tree with the same
> checks added.  I never saw these causing problems, but checking
> this everywhere seems to make more sense than every subsequent
> routine that gets passed 'ab' having to check it.
>
> Later we could remove all those same checks from audit_log_format
> and friends. For now, this just prevents similar bugs being introduced
> as the one in my previous patch.
>
> Signed-off-by: Dave Jones 

Not certain because I haven't looked at what happens with the error
code, but I think this might not be right.  auditd can be explictly
told not to audit certain events, in which case it is normal and
expected that ab would come back NULL
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] security: introduce kernel_module_from_file hook

2012-09-07 Thread Eric Paris
Acked-by: Eric Paris 

On Fri, Sep 7, 2012 at 2:38 PM, Kees Cook  wrote:
> Now that kernel module origins can be reasoned about, provide a hook to
> the LSMs to make policy decisions about the module file.
>
> Signed-off-by: Kees Cook 
> Acked-by: Serge E. Hallyn 
> ---
>  include/linux/security.h |   13 +
>  kernel/module.c  |9 +
>  security/capability.c|6 ++
>  security/security.c  |5 +
>  4 files changed, 33 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 3dea6a9..368e539 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -693,6 +693,12 @@ static inline void security_free_mnt_opts(struct 
> security_mnt_opts *opts)
>   * userspace to load a kernel module with the given name.
>   * @kmod_name name of the module requested by the kernel
>   * Return 0 if successful.
> + * @kernel_module_from_file:
> + * Load a kernel module from userspace.
> + * @file contains the file structure pointing to the file containing
> + * the kernel module to load. If the module is being loaded from a blob,
> + * this argument will be NULL.
> + * Return 0 if permission is granted.
>   * @task_fix_setuid:
>   * Update the module's state after setting one or more of the user
>   * identity attributes of the current process.  The @flags parameter
> @@ -1507,6 +1513,7 @@ struct security_operations {
> int (*kernel_act_as)(struct cred *new, u32 secid);
> int (*kernel_create_files_as)(struct cred *new, struct inode *inode);
> int (*kernel_module_request)(char *kmod_name);
> +   int (*kernel_module_from_file)(struct file *file);
> int (*task_fix_setuid) (struct cred *new, const struct cred *old,
> int flags);
> int (*task_setpgid) (struct task_struct *p, pid_t pgid);
> @@ -1764,6 +1771,7 @@ void security_transfer_creds(struct cred *new, const 
> struct cred *old);
>  int security_kernel_act_as(struct cred *new, u32 secid);
>  int security_kernel_create_files_as(struct cred *new, struct inode *inode);
>  int security_kernel_module_request(char *kmod_name);
> +int security_kernel_module_from_file(struct file *file);
>  int security_task_fix_setuid(struct cred *new, const struct cred *old,
>  int flags);
>  int security_task_setpgid(struct task_struct *p, pid_t pgid);
> @@ -2277,6 +2285,11 @@ static inline int security_kernel_module_request(char 
> *kmod_name)
> return 0;
>  }
>
> +static inline int security_kernel_module_from_file(struct file *file)
> +{
> +   return 0;
> +}
> +
>  static inline int security_task_fix_setuid(struct cred *new,
>const struct cred *old,
>int flags)
> diff --git a/kernel/module.c b/kernel/module.c
> index c1e446e..0ad03c4 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -2430,6 +2431,10 @@ int copy_module_from_user(const void __user *umod, 
> unsigned long len,
> if (info->len < sizeof(*(info->hdr)))
> return -ENOEXEC;
>
> +   err = security_kernel_module_from_file(NULL);
> +   if (err)
> +   return err;
> +
> /* Suck in entire file: we'll want most of it. */
> info->hdr = vmalloc(info->len);
> if (!info->hdr)
> @@ -2468,6 +2473,10 @@ int copy_module_from_fd(int fd, struct load_info *info)
> if (err)
> goto out;
>
> +   err = security_kernel_module_from_file(file);
> +   if (err)
> +   goto out;
> +
> size = stat.size;
> info->hdr = vmalloc(size);
> if (!info->hdr) {
> diff --git a/security/capability.c b/security/capability.c
> index 61095df..8acb304 100644
> --- a/security/capability.c
> +++ b/security/capability.c
> @@ -395,6 +395,11 @@ static int cap_kernel_module_request(char *kmod_name)
> return 0;
>  }
>
> +static int cap_kernel_module_from_file(struct file *file)
> +{
> +   return 0;
> +}
> +
>  static int cap_task_setpgid(struct task_struct *p, pid_t pgid)
>  {
> return 0;
> @@ -967,6 +972,7 @@ void __init security_fixup_ops(struct security_operations 
> *ops)
> set_to_cap_if_null(ops, kernel_act_as);
> set_to_cap_if_null(ops, kernel_create_files_as);
> set_to_cap_if_null(ops, kernel_module_request);
> +   set_to_cap_if_null(ops, kernel_mo

Re: [PATCH] Smack: Add missing depends on INET in Kconfig

2012-11-30 Thread Eric Paris
Do other LSMs need this too Casey?  I remember we mentioned how select
was dangerous  :-(

On Fri, Nov 30, 2012 at 12:28 PM, Casey Schaufler
 wrote:
> Because NETLABEL depends on INET SECURITY_SMACK
> has to explicitly call out the dependency.
>
> Signed-off-by: Casey Schaufler 
> ---
>  security/smack/Kconfig |1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/security/smack/Kconfig b/security/smack/Kconfig
> index 9fb14ef..1be1088 100644
> --- a/security/smack/Kconfig
> +++ b/security/smack/Kconfig
> @@ -1,5 +1,6 @@
>  config SECURITY_SMACK
> bool "Simplified Mandatory Access Control Kernel Support"
> +   depends on INET
> depends on NET
> depends on SECURITY
> select NETLABEL
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: backing up ext4 fs, system unresponsive, thrashing like crazy even though swap is unused

2012-12-03 Thread Eric Paris
On Mon, Dec 3, 2012 at 12:43 PM, Theodore Ts'o  wrote:

> If you are seeing a large number of inodes still in the ext4 inode
> cache after using drop_caches, then I'd look to see whether you have
> something like SELinux or auditing enabled which is pinning a bunch of
> dentries or inodes

You can safely ignore this suggestion as it does make sense.  SELinux
only grabs a references to dentries during its call to
fs_ops->getxattr, which can't last a meaningful length of time (unless
the filesystem is busted).  It only grabs references to inodes during
system initialization, when you couldn't have many in core.

Audit, likewise, only grabs a reference to a dentry during execve()
and only long enough to run getxattr and does not grab any reference
directly to an inode at all.

> or whether your backup program (or some other
> program running on your system) is keeping lots of directories or
> inodes open for some reason.

Certainly could be this suggestion though..
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] SUNRPC: connect to UNIX sockets synchronously

2012-12-04 Thread Eric Paris
On Tue, Dec 4, 2012 at 6:10 AM, Stanislav Kinsbursky
 wrote:

> But there should be noted, that such implementation introduces limitation
> (Trond's quote):
> "That approach can fall afoul of the selinux restrictions on the process
> context. Processes that are allowed to write data, may not be allowed to
> create sockets or call connect(). That is the main reason for doing it
> in the rpciod context, which is a clean kernel process context."

So you tested this and Trond was wrong?  This work just fine even on
an SELinux box?  Or it does break tons and tons of people's computers?

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] ima: policy search speedup

2012-12-11 Thread Eric Paris
On Tue, Dec 11, 2012 at 12:55 PM, Linus Torvalds
 wrote:

> And your "pseudo-filesystems" argument is pretty stupid too, since WE
> ALREADY HAVE A FLAG FOR THAT!
>
> Guess where it is? Oh, it's in the place I already mentioned makes
> more sense. Look for S_PRIVATE in inode->i_flags, and IS_PRIVATE() in
> users. It's what the other security models already use to avoid
> bothering calling down to the security layers. The fact that the
> integrity layer bypasses the normal security layer in
> ima_file_check(), for example, is no excuse to then make up totally
> new flags.

IS_PRIVATE() is not used by and darn well better not be used by, all
psuedo filesystems like procfs which IMA may want to ignore.  LSMs
like to do control on them.  I thought S_PRIVATE was really only used
by the anon_inode and reiserfs's really crazy ass internal inodes.  I
could always be wrong.

I don't know if I agree with dmitry but let me explain what's going on here.

Lets say the user accesses an inode in procfs.  Without this patch one
would search the ima policy and find a rule that says 'if this inode
is on procfs we don't care.'  We can then cache that in the struct
inode like you say and move along.  If another inode on procfs is
opened we will have the same policy search and the same per inode 'i
don't give a crap' marking.  This absolutely works you are right.  But
we search the IMA policy for every inode.

With Dmitry's patch we can instead search the IMA policy one time and
mark the whole superblock as 'i don't give a crap' if IMA policy says
we don't care about that fstype.  When the second procfs inode is
opened we instead look at the per superblock 'who gives a crap' and
never search the IMA policy.  So we save all future policy searches.

I'd say this patch would only be a good idea if there is a real
performance hit which is measurable in a real work situation.  Not,
'look how much faster it is to access proc inodes' microbenchmark,
since noone is actually going to do that, but some results of a useful
benchmark you care about.  Maybe Dmitry gave those numbers and I
missed them?  Otherwise, stick with per inode like Linus wants...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] ima: policy search speedup

2012-12-11 Thread Eric Paris
On Tue, Dec 11, 2012 at 1:18 PM, Mimi Zohar  wrote:

> The appraisal policy is based on the object metadata, such as the uid,
> so the result is static and can be cached.  The measurement policy, on
> the other hand, is normally based on the subject (eg. who is
> reading/executing) the file.  Knowledge of whether the file has been
> measured is cached in the iint, but unlike the appraisal policy, not
> whether it needs to be measured.  Having the flag on a per inode basis,
> doesn't really help.

Can you try again?  Even I can't parse this.  Not sure what to tell
you to try again, maybe give us a summary at a high level again and
then why this patch is specifically necessary?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] ima: policy search speedup

2012-12-11 Thread Eric Paris
S_PRIVATE is totally unacceptable as it has a meaning across all LSMs,
not just IMA.

S_NOSEC means 'this is not setuid or setgid and we don't need to do
those checks on modify'

You are going to need to use a S_NOIMA.

Of Dmitry's 90,000 fewer policy lookups using the per sb flag, how
many of them are the same inode over and over again which would be
circumvented using S_NOIMA per inode flag?



On Tue, Dec 11, 2012 at 2:48 PM, Mimi Zohar  wrote:
> On Tue, 2012-12-11 at 11:10 -0800, Linus Torvalds wrote:
>
>> Anyway, the whole "you can do it at file granularity" isn't the bulk
>> of my argument (the "we already have the field that makes sense" is).
>> But my point is that per-inode is not only the logically more
>> straightforward place to do it, it's also the much more flexible place
>> to do it. Because it *allows* for things like that.
>
> Ok. To summarize, S_IMA indicates that there is a rule and that the iint
> was allocated.  To differentiate between 'haven't looked/don't know' and
> 'definitely not', we need another bit.  For this, you're suggesting
> using IS_PRIVATE()?  Hopefully, I misunderstood.
>
> thanks,
>
> Mimi
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] ima: policy search speedup

2012-12-11 Thread Eric Paris
On Tue, Dec 11, 2012 at 3:05 PM, Linus Torvalds
 wrote:

> The "IS_PRIVATE()" thing is more a "if you know a-priori that you
> aren't interested in pseudo-filesystems, you can already check that
> bit, because it will be set for things like /proc and shmem mappings
> and pipes etc".

I know it isn't relevant to the final solution, but this is simply
wrong.  IS_PRIVATE() means 'this inode is filesystem internal.'  It is
not used by anything except rieser and the anon_inode.  If it is used
by psuedo filesystems in general, like /proc, shmem mappings, and
pipes that is a huge bug and is absolutely wrong.

You are correct IS_PRIVATE is sufficient to know you don't need to do
any IMA stuff with that inode, but it is used in damn few places and
better remain that way

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-next: recent rebases

2012-12-11 Thread Eric Paris
On Wed, 2012-12-12 at 08:41 +1100, Stephen Rothwell wrote:
> Hi Linus,
> 
> I am not sure how much you care about this, but here is a list of the
> trees in linux-next that have been rebased in the last day (i.e. since I
> fetched the trees for next-20121211):
> 
> fsnotify: rebased to v3.6(!) to resolve a conflict.
>   git://git.infradead.org/users/eparis/notify.git#for-next

I planned to let it sit in -next until at least next thursday before I
sent a pull request.  Thought that fit within the rules...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] ima: policy search speedup

2012-12-11 Thread Eric Paris
Linus made it clear he likes per-inode.  Do a test with per-inode
S_NOIMA.  See how that compares to your 100,000 vs 10,000 lookups.
Then we can discuss with facts if per sb is worth it or not.  We still
don't actually know if 100,000 lookups vs 10,000 lookups really
matters, but at least we'll have some facts...

On Tue, Dec 11, 2012 at 5:57 PM, Kasatkin, Dmitry
 wrote:
> On Tue, Dec 11, 2012 at 10:08 PM, Eric Paris  wrote:
>> S_PRIVATE is totally unacceptable as it has a meaning across all LSMs,
>> not just IMA.
>>
>> S_NOSEC means 'this is not setuid or setgid and we don't need to do
>> those checks on modify'
>>
>> You are going to need to use a S_NOIMA.
>>
>> Of Dmitry's 90,000 fewer policy lookups using the per sb flag, how
>> many of them are the same inode over and over again which would be
>> circumvented using S_NOIMA per inode flag?
>>
>
> IIRC those were only newly instantiated inodes.
>
> For new inodes, S_NOIMA would not be set and used at that point.
> IMA must do the policy search and then would set the S_NOIMA.
>
> For subsequent calls, S_NOIMA makes sense.
>
> If we would have the SB flag like MS_NOIMA, then we could replicate sb
> flag to inode S_NOIMA flag,
> in similar way as inode_has_no_xattr() does:
>
> static inline void inode_has_no_xattr(struct inode *inode)
> {
> if (!is_sxid(inode->i_mode) && (inode->i_sb->s_flags & MS_NOSEC))
> inode->i_flags |= S_NOSEC;
> }
>
> Then later there is no need to dereference inode->i_sb...
>
> Can we reach conclusion about it?
> Will we have SB flag?
> if yes, then where, s_flags, or in some other field like s_feature_flags?
> if not, then we can stop this discussion...
>
> - Dmitry
>
>>
>>
>> On Tue, Dec 11, 2012 at 2:48 PM, Mimi Zohar  wrote:
>>> On Tue, 2012-12-11 at 11:10 -0800, Linus Torvalds wrote:
>>>
>>>> Anyway, the whole "you can do it at file granularity" isn't the bulk
>>>> of my argument (the "we already have the field that makes sense" is).
>>>> But my point is that per-inode is not only the logically more
>>>> straightforward place to do it, it's also the much more flexible place
>>>> to do it. Because it *allows* for things like that.
>>>
>>> Ok. To summarize, S_IMA indicates that there is a rule and that the iint
>>> was allocated.  To differentiate between 'haven't looked/don't know' and
>>> 'definitely not', we need another bit.  For this, you're suggesting
>>> using IS_PRIVATE()?  Hopefully, I misunderstood.
>>>
>>> thanks,
>>>
>>> Mimi
>>>
>>>
>>>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Bisected regression: iterate_fd() selinux change affects flash plugin

2012-11-12 Thread Eric Paris
OMG this +1 -1 stuff is nuts...

iterate_fd passes fd+1 instead of the fd for the file?  seriously?
e.  I see how this patch fixes it, but still, wouldn't some magic
or negative value for no entries found be better than having to
understand the crazy logics?

/me pokes Al.

On Sun, Nov 11, 2012 at 11:50 PM, Pavel Roskin  wrote:
> Quoting Pavel Roskin :
>
>> Hello, Al!
>>
>> I have noticed that Mozilla Firefox gets stuck for seconds or minutes
>> on some sites, in particular on Facebook with Linux 3.7-rc1 and newer
>> mainline kernels.  Disabling flash plugin fixes the delays.
>>
>> This is a Fedora 17 system with SELinux enabled, on x86_64
>> architecture, with all updates, with LXDE desktop.  It's not the Fedora
>> 16 system I mentioned before, it has never had LXDE login problems due
>> to replace_fd().
>>
>> Bisecting lead me to the patch that introduced iterate_fd():
>>
>> commit c3c073f808b22dfae15ef8412b6f7b998644139a
>> Author: Al Viro 
>> Date:   Tue Aug 21 22:32:06 2012 -0400
>>
>> new helper: iterate_fd()
>>
>> iterates through the opened files in given descriptor table,
>> calling a supplied function; we stop once non-zero is returned.
>> Callback gets struct file *, descriptor number and const void *
>> argument passed to iterator.  It is called with files->file_lock
>> held, so it is not allowed to block.
>>
>> tty_io, netprio_cgroup and selinux flush_unauthorized_files()
>> converted to its use.
>>
>> Signed-off-by: Al Viro 
>>
>> I have found that reverting the changes to security/selinux/hooks.c is
>> sufficient to restore the correct behavior.
>>
>> --
>> Regards,
>> Pavel Roskin
>
>
> I've made a bugzilla entry for the bug and put a preliminary patch there.
> https://bugzilla.kernel.org/show_bug.cgi?id=50401
>
>
> --
> Regards,
> Pavel Roskin
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] smack: SMACK_MAGIC to include/uapi/linux/magic.h

2012-11-06 Thread Eric Paris
It looks like smack_lsm.c includes linux/magic.h, but smackfs.c
doesn't...   Seems like they should have included linux/magic.h in the
same header they removed the definition from

On Tue, Nov 6, 2012 at 4:59 PM, Casey Schaufler  wrote:
> On 11/6/2012 12:17 AM, Jarkko Sakkinen wrote:
>> SMACK_MAGIC moved to a proper place for easy user space access
>> (i.e. libsmack).
>>
>> Signed-off-by: Jarkko Sakkinen 
>> ---
>>  include/uapi/linux/magic.h |1 +
>>  security/smack/smack.h |5 -
>>  2 files changed, 1 insertion(+), 5 deletions(-)
>
> Will security/smack/smack_lsm.c and security/smack/smackfs.c
> compile after this change?
>
>>
>> diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
>> index e15192c..12735ad 100644
>> --- a/include/uapi/linux/magic.h
>> +++ b/include/uapi/linux/magic.h
>> @@ -11,6 +11,7 @@
>>  #define DEBUGFS_MAGIC  0x64626720
>>  #define SECURITYFS_MAGIC 0x73636673
>>  #define SELINUX_MAGIC0xf97cff8c
>> +#define SMACK_MAGIC  0x43415d53  /* "SMAC" */
>>  #define RAMFS_MAGIC  0x858458f6  /* some random number */
>>  #define TMPFS_MAGIC  0x01021994
>>  #define HUGETLBFS_MAGIC  0x958458f6  /* some random number */
>> diff --git a/security/smack/smack.h b/security/smack/smack.h
>> index 99b3612..8ad3095 100644
>> --- a/security/smack/smack.h
>> +++ b/security/smack/smack.h
>> @@ -149,11 +149,6 @@ struct smack_known {
>>  #define SMACK_CIPSO_SOCKET   1
>>
>>  /*
>> - * smackfs magic number
>> - */
>> -#define SMACK_MAGIC  0x43415d53 /* "SMAC" */
>> -
>> -/*
>>   * CIPSO defaults.
>>   */
>>  #define SMACK_CIPSO_DOI_DEFAULT  3   /* Historical */
>
> --
> To unsubscribe from this list: send the line "unsubscribe 
> linux-security-module" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] smack: SMACK_MAGIC to include/uapi/linux/magic.h

2012-11-08 Thread Eric Paris
Then it only works by accident that magic.h is included by some random
path in smackfs.c.  You really should be including it in smack.h (or
each .c file individually, up to casey)

On Thu, Nov 8, 2012 at 5:43 AM, Jarkko Sakkinen  wrote:
> On Tue, Nov 6, 2012 at 11:59 PM, Casey Schaufler  
> wrote:
>>
>> On 11/6/2012 12:17 AM, Jarkko Sakkinen wrote:
>> > SMACK_MAGIC moved to a proper place for easy user space access
>> > (i.e. libsmack).
>> >
>> > Signed-off-by: Jarkko Sakkinen 
>> > ---
>> >  include/uapi/linux/magic.h |1 +
>> >  security/smack/smack.h |5 -
>> >  2 files changed, 1 insertion(+), 5 deletions(-)
>>
>> Will security/smack/smack_lsm.c and security/smack/smackfs.c
>> compile after this change?
>
> Sorry I haven't replied earlier. Anyway, I made a sanity check.
>
> I retried build from clean. Works. I also checked that vmlinux contains
> SMACK symbols. It does.
>
>>
>> >
>> > diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
>> > index e15192c..12735ad 100644
>> > --- a/include/uapi/linux/magic.h
>> > +++ b/include/uapi/linux/magic.h
>> > @@ -11,6 +11,7 @@
>> >  #define DEBUGFS_MAGIC  0x64626720
>> >  #define SECURITYFS_MAGIC 0x73636673
>> >  #define SELINUX_MAGIC0xf97cff8c
>> > +#define SMACK_MAGIC  0x43415d53  /* "SMAC" */
>> >  #define RAMFS_MAGIC  0x858458f6  /* some random number */
>> >  #define TMPFS_MAGIC  0x01021994
>> >  #define HUGETLBFS_MAGIC  0x958458f6  /* some random number */
>> > diff --git a/security/smack/smack.h b/security/smack/smack.h
>> > index 99b3612..8ad3095 100644
>> > --- a/security/smack/smack.h
>> > +++ b/security/smack/smack.h
>> > @@ -149,11 +149,6 @@ struct smack_known {
>> >  #define SMACK_CIPSO_SOCKET   1
>> >
>> >  /*
>> > - * smackfs magic number
>> > - */
>> > -#define SMACK_MAGIC  0x43415d53 /* "SMAC" */
>> > -
>> > -/*
>> >   * CIPSO defaults.
>> >   */
>> >  #define SMACK_CIPSO_DOI_DEFAULT  3   /* Historical */
>>
>
> /Jarkko
> --
> To unsubscribe from this list: send the line "unsubscribe 
> linux-security-module" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] prctl: fix validation of an address

2012-12-30 Thread Eric Paris
On Sat, 2012-12-29 at 15:00 +0400, Andrey Vagin wrote:
> The address should be bigger than dac_mmap_min_addr, because
> a process with CAP_RAWIO can map a vma bellow mmap_min_addr.

NAK

This doesn't make any sense.  dac_mmap_min_addr should ONLY be used in
security/min_addr.c and security/commoncap.c.  Period.  You should not
be allowed to circumvent LSM protections.  Maybe you are missing that
mmap_min_addr = max(dac_mmap_min_addr, CONFIG_LSM_MMAP_MIN_ADDR)  ?

But this patch is absolutely unacceptable.  Maybe you can help me
understand what problem you had and what you were hoping for?

-Eric
> 
> Cc: Andrew Morton 
> Cc: Kees Cook 
> Cc: Cyrill Gorcunov 
> Cc: Serge Hallyn 
> Cc: "Eric W. Biederman" 
> Cc: Eric Paris 
> Cc: James Morris 
> Signed-off-by: Andrey Vagin 
> ---
>  kernel/sys.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 265b376..e0e1bbd 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1868,7 +1868,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
>   if (opt == PR_SET_MM_EXE_FILE)
>   return prctl_set_mm_exe_file(mm, (unsigned int)addr);
>  
> - if (addr >= TASK_SIZE || addr < mmap_min_addr)
> + if (addr >= TASK_SIZE || addr < dac_mmap_min_addr)
>   return -EINVAL;
>  
>   error = -EINVAL;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] prctl: fix validation of an address

2012-12-31 Thread Eric Paris
On Mon, 2012-12-31 at 14:14 +0400, Andrew Vagin wrote:
> On Sun, Dec 30, 2012 at 05:03:07PM -0500, Eric Paris wrote:
> > On Sat, 2012-12-29 at 15:00 +0400, Andrey Vagin wrote:
> > > The address should be bigger than dac_mmap_min_addr, because
> > > a process with CAP_RAWIO can map a vma bellow mmap_min_addr.
> > 
> > NAK
> 
> Currently prctl(PR_SET_MM_*, addr, ) returns EINVAL for valid addresses.
> I think it's a bug. Are you agree?

Can you help me understand how prctl(PR_SET_MM_*, relates to
checkpoint/restore?  My worry here is that somehow this interface could
be used to bypass the security properties of mmap_min_addr.I have no
idea how the interface is used, so I don't know if my fears are founded.
When I hear 'restore' I think of a privileged application setting up
some unprivileged application based on untrusted data.  My fear is that
some unpriv application, that doesn't have permission to map below
mmap_min_addr, may be able to trick the privileged application, which
would have this permission, into doing it on its behalf.  Does that make
sense?  Is that a realistic scenario with how this interface is used?

> CONFIG_LSM_MMAP_MIN_ADDR could not be got from user space.
> 
> This application can use a real value of mmap_min_addr, but it is not
> provided into userspace.

Unrelated to this patch issue, but I guess either could be exposed if
there is a need.

> Currently a task can have user memory area bellow dac_mmap_min_addr,
> but prctl returns -EINVAL for such addresses.
> How can I understand the reason, if I know that the address is valid?

Talking about dac_mmap_min_addr is wrong.  The capabilities security
module uses dac_mmap_min_addr but other LSMs can (and obviously do) use
other things.  mmap_min_addr is just the shorthand to make sure you
clear all hurdles.  Breaking those hurdles up outside of the security
subsystem is wrong.

The kernel makes the decision on what is valid via security_mmap_addr().
Assuming there are no security fears of an untrusted application
tricking some priviledged application to set up these maps the answer is
just calling security_mmap_addr() instead of doing if(addr <
mmap_min_addr) return -EINVAL;

I don't know if it is a good idea to allow this interface to ever go
below mmap_min_addr, but I do know that using (or even thinking about)
dac_mmap_min_addr is wrong and you should be looking at
security_mmap_addr() if you look at anything...

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] prctl: fix validation of an address

2012-12-31 Thread Eric Paris
On Mon, 2012-12-31 at 19:13 +0400, Cyrill Gorcunov wrote:
> On Mon, Dec 31, 2012 at 09:27:14AM -0500, Eric Paris wrote:
> > On Mon, 2012-12-31 at 14:14 +0400, Andrew Vagin wrote:
> > > On Sun, Dec 30, 2012 at 05:03:07PM -0500, Eric Paris wrote:
> > > > On Sat, 2012-12-29 at 15:00 +0400, Andrey Vagin wrote:
> > > > > The address should be bigger than dac_mmap_min_addr, because
> > > > > a process with CAP_RAWIO can map a vma bellow mmap_min_addr.
> > > > 
> > > > NAK
> > > 
> > > Currently prctl(PR_SET_MM_*, addr, ) returns EINVAL for valid addresses.
> > > I think it's a bug. Are you agree?
> > 
> > Can you help me understand how prctl(PR_SET_MM_*, relates to
> > checkpoint/restore?  My worry here is that somehow this interface could
> 
> Here how we use it (from userspace code)
> 
>   ret |= sys_prctl_safe(PR_SET_MM, PR_SET_MM_START_CODE, 
> (long)args->mm.mm_start_code, 0);
>   ret |= sys_prctl_safe(PR_SET_MM, PR_SET_MM_END_CODE, 
> (long)args->mm.mm_end_code, 0);
>   ...
> 
> the values of mm.mm_start_code and such are saved in image file and obtained
> during checkpoint stage. Note the prctl_set_mm requires the caller to have
> CAP_SYS_RESOURCE privilege granted.

Is there anything which prevents an unpriv application from changing
mm.mm_start_code and mm.mm_end_code in the image, thus taking advantage
of the privileged restore code to bypass the mmap_min_addr
restrictions? 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[GIT PULL] notification tree changes for 3.8

2012-12-20 Thread Eric Paris
I believe you would get a build failure after this pull due to the
addition of procfs information for *notify fds.  The attached patch from
sfr should be applied during the merge to change the spin_lock in that
patch to the mutex in this patch.

This pull mostly is about locking changes in the fsnotify system.  By
switching the group lock from a spin_lock() to a mutex() we can now hold
the lock across things like iput().  This fixes a problem involving
unmounting a fs and having inodes be busy, first pointed out by FAT, but
reproducible with tmpfs.

This also restores signal driven I/O for inotify, which has been broken
since about 2.6.32.

$ git request-pull v3.6 git://git.infradead.org/users/eparis/notify.git
The following changes since commit a0d271cbfed1dd50278c6b06bead3d00ba0a88f9:

  Linux 3.6 (2012-09-30 16:47:46 -0700)

are available in the git repository at:

  git://git.infradead.org/users/eparis/notify.git for-next

for you to fetch changes up to 1ca39ab9d21ac93f94b9e3eb364ea9a5cf2aba06:

  inotify: automatically restart syscalls (2012-12-11 13:44:37 -0500)


Eric Paris (2):
  fsnotify: make fasync generic for both inotify and fanotify
  inotify: automatically restart syscalls

Lino Sanfilippo (12):
  inotify, fanotify: replace fsnotify_put_group() with 
fsnotify_destroy_group()
  fsnotify: introduce fsnotify_get_group()
  fsnotify: use reference counting for groups
  fsnotify: take groups mark_lock before mark lock
  fanotify: add an extra flag to mark_remove_from_mask that indicates 
wheather a mark should be destroyed
  fsnotify: use a mutex instead of a spinlock to protect a groups mark list
  fsnotify: pass group to fsnotify_destroy_mark()
  fsnotify: introduce locked versions of fsnotify_add_mark() and 
fsnotify_remove_mark()
  fsnotify: dont put marks on temporary list when clearing marks by group
  fsnotify: change locking order
  fanotify: dont merge permission events
  inotify: dont skip removal of watch descriptor if creation of ignored 
event failed

 fs/notify/dnotify/dnotify.c  |  4 ++--
 fs/notify/fanotify/fanotify.c|  6 ++
 fs/notify/fanotify/fanotify_user.c   | 37 +
 fs/notify/group.c| 47 
+++
 fs/notify/inode_mark.c   | 14 +++---
 fs/notify/inotify/inotify_fsnotify.c |  4 +++-
 fs/notify/inotify/inotify_user.c | 34 ++
 fs/notify/mark.c | 91 
++-
 fs/notify/notification.c |  1 +
 fs/notify/vfsmount_mark.c| 14 +++---
 include/linux/fsnotify_backend.h | 31 +--
 kernel/audit_tree.c  | 10 +-
 kernel/audit_watch.c |  4 ++--
 13 files changed, 178 insertions(+), 119 deletions(-)
From: Stephen Rothwell 
Date: Wed, 19 Dec 2012 11:53:20 +1100
Subject: [PATCH] fsnotify: cope with change from spinlock to mutex

Signed-off-by: Stephen Rothwell 
---
 fs/notify/fdinfo.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index 514c4b8..238a593 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -27,13 +27,13 @@ static int show_fdinfo(struct seq_file *m, struct file *f,
 	struct fsnotify_mark *mark;
 	int ret = 0;
 
-	spin_lock(&group->mark_lock);
+	mutex_lock(&group->mark_mutex);
 	list_for_each_entry(mark, &group->marks_list, g_list) {
 		ret = show(m, mark);
 		if (ret)
 			break;
 	}
-	spin_unlock(&group->mark_lock);
+	mutex_unlock(&group->mark_mutex);
 	return ret;
 }
 


Re: [GIT PULL] notification tree changes for 3.8

2012-12-20 Thread Eric Paris
On Thu, 2012-12-20 at 17:50 -0800, Linus Torvalds wrote:
> On Thu, Dec 20, 2012 at 2:38 PM, Eric Paris  wrote:
> > I believe you would get a build failure after this pull due to the
> > addition of procfs information for *notify fds.  The attached patch from
> > sfr should be applied during the merge to change the spin_lock in that
> > patch to the mutex in this patch.
> 
> I'm not going to bother pulling this.
> 
> It's coming in late in the merge window, it has been rebased days ago,
> and it changes fundamental infrastructure.
> 
> This has been a huge merge window, and this is just too late and too
> surprising. The commits are marked as being written 18 months ago, but
> then the commit date is after the merge window started, and sent the
> day before it closes.
> 
> WTF? If they've waited 18 months, there cannot be this kind of
> last-minute hurry now. Why did you wait until the last minute?

The changes have been in next for about 18 months.  I've just been a
missing maintainer.  I'm back with some time to fix that fact, but I
understand if you want to reject it on that account.  I'm available now
if something goes wrong.  Most of this work was originally done just as
cleanup, but recently people started complaining about the FAT race/NULL
ptr deref.

https://bugzilla.redhat.com/show_bug.cgi?id=768534

Which got me to finally get off my lazy ass.  When looking to send the
pull request on the first day of the window I realized just how far
behind my tree was and that there was a merge conflict for which sfr had
been carrying a patch forever.  With 3.7 less than a day old I figured
3.6 was a good place to go in order to solve the conflict.  Since I
rebased, it obviously needed more testing and some time in -next.  Which
was why I said to you, back on the 11th, that I was planning to send my
pull request today.

http://marc.info/?l=linux-next&m=135526314222685&w=2

It is a locking change, but if anything goes wrong, at least I've got
time without other work commitments over the next 2 weeks to fix them.
If you are wondering about testing, I've written the attached
notification stressor.  On a dual core box it will cause a kernel panic
is about 1 second on your kernel.  With the patch set it doesn't pop...

Your call, but it does fix a real bug that people are reporting today.

-Eric
#define _GNU_SOURCE

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

/* huerristic on how hard to load a box */
static unsigned int num_cores;
/* number of threads which just read from inotify_fd and ignore the results */
static unsigned int num_data_dumpers;
/* each thread will add a watch to a given file as fast as it can */
static unsigned int num_adder_threads;
/* each thread will remove all watches between low and high as fast as they can */
static unsigned int num_remover_threads;
/* a multiplier for adders and removers so you have X adders per file */
static unsigned int watcher_multiplier;
/* this removes watches from low_wd to low_wd + 3 just for extra removal races */
static unsigned int num_low_remover_threads;
/* create and destroy the files watches are being added to and removed from */
static unsigned int num_file_creaters;
/* how many inotify_fd's we have total (so another multiplier for adders and removers) */
static unsigned int num_inotify_instances;

static char *working_dir = "/tmp/inotify_syscall_thrash";
/* if mounting a real filesystem, where is the source?  (doesn't matter for tmpfs) */
static char *mnt_src;
/* what is the fstype to mount and unmonut? */
static char *fstype = "tmpfs";

static pthread_attr_t attr;

static pthread_mutex_t wait_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wait_var = PTHREAD_COND_INITIALIZER;
static int wait;
#define WAIT_CHILD do {\
		pthread_mutex_lock(&wait_mutex); \
		if (wait == 0) \
			pthread_cond_wait(&wait_var, &wait_mutex); \
		wait = 0; \
		pthread_mutex_unlock(&wait_mutex); \
	} while (0);
#define WAKE_PARENT do {\
		pthread_mutex_lock(&wait_mutex); \
		wait = 1; \
		pthread_cond_signal(&wait_var); \
		pthread_mutex_unlock(&wait_mutex); \
	} while (0);

struct adder_struct {
	int inotify_fd;
	int file_num;
};

struct operator_struct {
	int inotify_fd;
};

struct thread_data {
	int inotify_fd;
	pthread_t *adders;
	pthread_t *removers;
	pthread_t *lownum_removers;
	pthread_t *data_dumpers;
};

pthread_t *file_creaters;
pthread_t low_wd_reseter;
pthread_t mounter;

static int stopped = 0;

static int high_wd = 0;
static int low_wd = INT_MAX;

static int handle_error(const char *arg)
{
	perror(arg);
	exit(EXIT_FAILURE);
}

static void sigfunc(int sig_num)
{
	if (sig_num == SIGINT)
		stopped = 1;
	else
		printf("Got an unknown signal!\n");
}

/* constantly crea

Re: Bisected regression: iterate_fd() selinux change affects flash plugin

2012-11-16 Thread Eric Paris
On Mon, Nov 12, 2012 at 11:57 AM, Pavel Roskin  wrote:
> Quoting Eric Paris :
>
>> OMG this +1 -1 stuff is nuts...

Ping, Al.

int iterate_fd(struct files_struct *files, unsigned n,
[snip]
while (!res && n < fdt->max_fds) {
file = rcu_dereference_check_fdtable(files, fdt->fd[n++]);
if (file)
res = f(p, file, n);
}
spin_unlock(&files->file_lock);
return res;

So we increment n (the file descriptor number) in the dereference,
then pass that (wrong) number to f().

Every single f() (including SELinux, the cause of this bug) returns
fd+1 (so now we are up by 2).  Then all of the users of iterate fd
actually use fd-1 (which is wrong)

Why not have iterate_fd return -ENOENT on no entries and stop all of
the stupid games?  We fix the real bug (the above function should do
the n++ after the f() call, and the interface is sane to design
against...

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] audit: destroy filename correctly PING.

2013-04-23 Thread Eric Paris
I picked it up for 3.10.  Sorry, should have said something.  Thanks!

-Eric

- Original Message -
> On Mon,  1 Apr 2013 11:00:00 +0400, Dmitry Monakhov 
> wrote:
> Ping. Patch (https://lkml.org/lkml/2013/4/1/65) was not a 1'st April's joke.
> Add CC:linux-au...@redhat.com
> > filename should be destroyed via final_putname() instead of __putname()
> > Otherwise this result in following BUGON() in case of long names:
> >   kernel BUG at mm/slab.c:3006!
> >   Call Trace:
> >   kmem_cache_free+0x1c1/0x850
> >   audit_putname+0x88/0x90
> >   putname+0x73/0x80
> >   sys_symlinkat+0x120/0x150
> >   sys_symlink+0x16/0x20
> >   system_call_fastpath+0x16/0x1b
> > 
> > Signed-off-by: Dmitry Monakhov 
> > ---
> >  kernel/auditsc.c |6 +++---
> >  1 files changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > index a371f85..bfe7ca6 100644
> > --- a/kernel/auditsc.c
> > +++ b/kernel/auditsc.c
> > @@ -1010,7 +1010,7 @@ static inline void audit_free_names(struct
> > audit_context *context)
> > list_for_each_entry_safe(n, next, &context->names_list, list) {
> > list_del(&n->list);
> > if (n->name && n->name_put)
> > -   __putname(n->name);
> > +   final_putname(n->name);
> > if (n->should_free)
> > kfree(n);
> > }
> > @@ -2036,7 +2036,7 @@ void audit_putname(struct filename *name)
> > BUG_ON(!context);
> > if (!context->in_syscall) {
> >  #if AUDIT_DEBUG == 2
> > -   printk(KERN_ERR "%s:%d(:%d): __putname(%p)\n",
> > +   printk(KERN_ERR "%s:%d(:%d): final_putname(%p)\n",
> >__FILE__, __LINE__, context->serial, name);
> > if (context->name_count) {
> > struct audit_names *n;
> > @@ -2047,7 +2047,7 @@ void audit_putname(struct filename *name)
> >n->name, n->name->name ?: "(null)");
> > }
> >  #endif
> > -   __putname(name);
> > +   final_putname(name);
> > }
> >  #if AUDIT_DEBUG
> > else {
> > --
> > 1.7.1
> > 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Part1 PATCH 00/22] Add namespace support for audit

2013-06-19 Thread Eric Paris
On Wed, 2013-06-19 at 16:49 -0400, Aristeu Rozanski wrote:
> On Wed, Jun 19, 2013 at 09:53:32AM +0800, Gao feng wrote:
> > This patchset is first part of namespace support for audit.
> > in this patchset, the mainly resources of audit system have
> > been isolated. the audit filter, rules havn't been isolated
> > now. It will be implemented in Part2. We finished the isolation
> > of user audit message in this patchset.
> > 
> > I choose to assign audit to the user namespace.
> > Right now,there are six kinds of namespaces, such as
> > net, mount, ipc, pid, uts and user. the first five
> > namespaces have special usage. the audit isn't suitable to
> > belong to these five namespaces, And since the flag of system
> > call clone is in short supply, we can't provide a new flag such
> > as CLONE_NEWAUDIT to enable audit namespace separately. so the
> > user namespace may be the best choice.
> 
> I thought it was said on the last submission that to tie userns and
> audit namespace would be a bad idea?

I consider it a non-starter.  unpriv users are allowed to launch their
own user namespace.  The whole point of audit is to have only a priv
user be allowed to make changes.  If you tied audit namespace to user
namespace you grant an unpriv user the ability to modify audit.

NAK.

If there are not clone flags you will either need to only do this from
unshare and not from clone, or get more flags to clone

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Part1 PATCH 00/22] Add namespace support for audit

2013-06-20 Thread Eric Paris
On Thu, 2013-06-20 at 11:02 +0800, Gao feng wrote:
> On 06/20/2013 04:51 AM, Eric Paris wrote:
> > On Wed, 2013-06-19 at 16:49 -0400, Aristeu Rozanski wrote:
> >> On Wed, Jun 19, 2013 at 09:53:32AM +0800, Gao feng wrote:
> >>> This patchset is first part of namespace support for audit.
> >>> in this patchset, the mainly resources of audit system have
> >>> been isolated. the audit filter, rules havn't been isolated
> >>> now. It will be implemented in Part2. We finished the isolation
> >>> of user audit message in this patchset.
> >>>
> >>> I choose to assign audit to the user namespace.
> >>> Right now,there are six kinds of namespaces, such as
> >>> net, mount, ipc, pid, uts and user. the first five
> >>> namespaces have special usage. the audit isn't suitable to
> >>> belong to these five namespaces, And since the flag of system
> >>> call clone is in short supply, we can't provide a new flag such
> >>> as CLONE_NEWAUDIT to enable audit namespace separately. so the
> >>> user namespace may be the best choice.
> >>
> >> I thought it was said on the last submission that to tie userns and
> >> audit namespace would be a bad idea?
> > 
> > I consider it a non-starter.  unpriv users are allowed to launch their
> > own user namespace.  The whole point of audit is to have only a priv
> > user be allowed to make changes.  If you tied audit namespace to user
> > namespace you grant an unpriv user the ability to modify audit.
> > 
> 
> I understand your views.
> 
> But ven the unpriv user are allowed to make changes, they can do no harm.
> they can only make changes on the audit namespace they created.they can
> only communicate with the audit namespace they created.

Imagine I set up my machine to audit all user access to super secret
information.  With your patch set all an malicious user has to do is
create a new user namespace.  Now when he accesses the super secret
information it will be logged inside the user namespace he created.  So
he can just dump those logs in the trash.

I believe that each audit namespace should require priv
(CAP_AUDIT_CONTROL) in the user namespace that created the current audit
namespace.  So lets say the machine boots and we are in the init_user
and init_audit namespace.  Creating a new audit namespace should require
CAP_AUDIT_CONTROL in the init_user namespace.  If instead we spawned a
new user namespace userns1 and try to create a new audit namespace, we
should STILL check for CAP_AUDIT_CONTROL in the init_user namespace.

Assuming we've spawned auditns1 in userns1 and want to spawn auditns2 it
should require CAP_AUDIT_CONTROL in userns1.  So now you only have
permission to change your audit config (create a new audit namespace) if
you already had permission to change your current audit config.

Now how to handle coding this...

When the kernel receives an audit message on the netlink socket it can
always check the current->[whatever] to figure out which audit namespace
it came from.  Then it can be processed accordingly...

Sending messages to the userspace auditd is a little more tricky.  We
need to somehow map the audit namespace to a socket connected to auditd
in userspace.  I'd imagine we'd have to either have per auditns backlog
queues and run one kauditd per audit namespace, or we'd have to tag the
skb's with the intended namespace somehow and then find the right socket
in kauditd.  Either way it doesn't seem too onerous (although I admit, I
don't know how to code the per namespace kauditd right offhand)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Stupid VFS name lookup interface..

2013-05-29 Thread Eric Paris
On Sat, May 25, 2013 at 10:19 PM, Linus Torvalds
 wrote:
> On Sat, May 25, 2013 at 10:04 PM, James Morris  wrote:
>> On Sat, 25 May 2013, Linus Torvalds wrote:
>>
>>> But I haven't even looked at what non-selinux setups do to
>>> performance. Last time I tried Ubuntu (they still use apparmor, no?),
>>> "make modules_install ; make install" didn't work for the kernel, and
>>> if the Ubuntu people don't want to support kernel engineers, I
>>> certainly am not going to bother with them. Who uses smack?
>>
>> Tizen, perhaps a few others.
>
> Btw, it really would be good if security people started realizing that
> performance matters. It's annoying to see the security lookups cause
> 50% performance degradations on pathname lookup (and no, I'm not
> exaggerating, that's literally what it was before we fixed it - and
> no, by "we" I don't mean security people).

I take a bit of exception to this.  I do care.  Stephen Smalley, the
only other person who does SELinux kernel work, cares.  I don't speak
for other LSMs, but at least both of us who have done anything with
SELinux in the last years care.  I did the RCU work for selinux and
you, sds, and I did a bunch of work to stop wasting so much stack
space which was crapping on performance.  And I'm here again   :)

> Right now (zooming into the kernel only - ignoring the fact that make
> really spends a fair amount of time in user space) I get
>
>   9.79%  make  [k] __d_lookup_rcu
>   5.48%  make  [k] link_path_walk
>   2.94%  make  [k] avc_has_perm_noaudit
>   2.47%  make  [k] selinux_inode_permission
>   2.25%  make  [k] path_lookupat
>   1.89%  make  [k] generic_fillattr
>   1.50%  make  [k] lookup_fast
>   1.27%  make  [k] copy_user_generic_string
>   1.17%  make  [k] generic_permission
>   1.15%  make  [k] dput
>   1.12%  make  [k] inode_has_perm.constprop.58
>   1.11%  make  [k] __inode_permission
>   1.08%  make  [k] kmem_cache_alloc
>   ...

I tried something else, doing caching of the last successful security
check inside the isec.  It isn't race free, so it's not ready for
prime time.  But right now my >1% looks like:

  7.97%  make  [k] __d_lookup_rcu
  5.79%  make  [k] link_path_walk
  3.67%  make  [k] selinux_inode_permission
  2.02%  make  [k] lookup_fast
  1.90%  make  [k] system_call
  1.76%  make  [k] path_lookupat
  1.68%  make  [k] inode_has_perm.isra.45.constprop.61
  1.53%  make  [k] copy_user_enhanced_fast_string
  1.39%  make  [k] generic_permission
  1.35%  make  [k] kmem_cache_free
  1.30%  make  [k] __audit_syscall_exit
  1.13%  make  [k] kmem_cache_alloc
  1.00%  make  [k] strncpy_from_user

How do I tell what is taking time inside selinux_inode_permission?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH 1/2] selinux: merge selinux_inode_permission and inode_has_perm

2013-06-03 Thread Eric Paris
selinux_inode_permission had some heavy lifting done to make it more
performance polite.  But it still does largely the same thing as
inode_has_perm.  So move that work into inode_has_perm and call
inode_has_perm from selinux_inode_permission.

Signed-off-by: Eric Paris 
---
 security/selinux/hooks.c | 92 ++--
 1 file changed, 42 insertions(+), 50 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 5c6f2cd..cfecb52 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1494,17 +1494,41 @@ static int task_has_system(struct task_struct *tsk,
SECCLASS_SYSTEM, perms, NULL);
 }
 
+static noinline int audit_inode_permission(struct inode *inode,
+  struct common_audit_data *adp,
+  u32 perms, u32 audited, u32 denied,
+  unsigned flags)
+{
+   struct common_audit_data ad;
+   struct inode_security_struct *isec = inode->i_security;
+   int rc;
+
+   if (!adp) {
+   ad.type = LSM_AUDIT_DATA_INODE;
+   ad.u.inode = inode;
+   adp = &ad;
+   }
+
+   rc = slow_avc_audit(current_sid(), isec->sid, isec->sclass, perms,
+   audited, denied, adp, flags);
+   if (rc)
+   return rc;
+   return 0;
+}
+
 /* Check whether a task has a particular permission to an inode.
The 'adp' parameter is optional and allows other audit
data to be passed (e.g. the dentry). */
 static int inode_has_perm(const struct cred *cred,
  struct inode *inode,
- u32 perms,
+ u32 perms, u32 dontaudit,
  struct common_audit_data *adp,
  unsigned flags)
 {
struct inode_security_struct *isec;
-   u32 sid;
+   struct av_decision avd;
+   u32 sid, denied, audited;
+   int rc, rc2;
 
validate_creds(cred);
 
@@ -1514,6 +1538,14 @@ static int inode_has_perm(const struct cred *cred,
sid = cred_sid(cred);
isec = inode->i_security;
 
+   rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd);
+   audited = avc_audit_required(perms, &avd, rc, dontaudit, &denied);
+   if (likely(!audited))
+   return rc;
+
+   rc2 = audit_inode_permission(inode, adp, perms, audited, denied, flags);
+   if (rc2)
+   return rc2;
return avc_has_perm_flags(sid, isec->sid, isec->sclass, perms, adp, 
flags);
 }
 
@@ -1529,7 +1561,7 @@ static inline int dentry_has_perm(const struct cred *cred,
 
ad.type = LSM_AUDIT_DATA_DENTRY;
ad.u.dentry = dentry;
-   return inode_has_perm(cred, inode, av, &ad, 0);
+   return inode_has_perm(cred, inode, av, 0, &ad, 0);
 }
 
 /* Same as inode_has_perm, but pass explicit audit data containing
@@ -1544,7 +1576,7 @@ static inline int path_has_perm(const struct cred *cred,
 
ad.type = LSM_AUDIT_DATA_PATH;
ad.u.path = *path;
-   return inode_has_perm(cred, inode, av, &ad, 0);
+   return inode_has_perm(cred, inode, av, 0, &ad, 0);
 }
 
 /* Check whether a task can use an open file descriptor to
@@ -1580,7 +1612,7 @@ static int file_has_perm(const struct cred *cred,
/* av is zero if only checking access to the descriptor. */
rc = 0;
if (av)
-   rc = inode_has_perm(cred, inode, av, &ad, 0);
+   rc = inode_has_perm(cred, inode, av, 0, &ad, 0);
 
 out:
return rc;
@@ -2635,64 +2667,24 @@ static int selinux_inode_follow_link(struct dentry 
*dentry, struct nameidata *na
return dentry_has_perm(cred, dentry, FILE__READ);
 }
 
-static noinline int audit_inode_permission(struct inode *inode,
-  u32 perms, u32 audited, u32 denied,
-  unsigned flags)
-{
-   struct common_audit_data ad;
-   struct inode_security_struct *isec = inode->i_security;
-   int rc;
-
-   ad.type = LSM_AUDIT_DATA_INODE;
-   ad.u.inode = inode;
-
-   rc = slow_avc_audit(current_sid(), isec->sid, isec->sclass, perms,
-   audited, denied, &ad, flags);
-   if (rc)
-   return rc;
-   return 0;
-}
-
 static int selinux_inode_permission(struct inode *inode, int mask)
 {
const struct cred *cred = current_cred();
-   u32 perms;
-   bool from_access;
+   u32 perms, dontaudit = 0;
unsigned flags = mask & MAY_NOT_BLOCK;
-   struct inode_security_struct *isec;
-   u32 sid;
-   struct av_decision avd;
-   int rc, rc2;
-   u32 audited, denied;
 
-   from_access = mask & MAY_ACCESS;
+   if (mask & MAY_ACCESS)
+ 

[RFC PATCH 2/2] SELinux: cache inode checks inside struct inode

2013-06-03 Thread Eric Paris
This patch adds a cache of selinux security checks into struct inode.
It is protected by the seq counter against updates by other nodes.  This
has a measurable impact on one benchmark Linus mentioned.  The cpu
time using make to check a huge project for changes.  It is going to
have a negative impact on cases where tasks with different labels are
accessing the same object.  In these cases each one will grab the i_lock
to reset the in inode cache.  The only place I imagine this would be
common would be with shared libraries.  But as those are typically
openned and mmapped, they don't have continuous checks...

Stock Kernel:
8.23%  make  [k] __d_lookup_rcu
6.27%  make  [k] link_path_walk
3.91%  make  [k] selinux_inode_permission <
3.37%  make  [k] avc_has_perm_noaudit <
2.26%  make  [k] lookup_fast
2.12%  make  [k] system_call
1.86%  make  [k] path_lookupat
1.82%  make  [k] inode_has_perm.isra.32.constprop.61 <
1.57%  make  [k] copy_user_enhanced_fast_string
1.48%  make  [k] generic_permission
1.34%  make  [k] __audit_syscall_exit
1.31%  make  [k] kmem_cache_free
1.24%  make  [k] kmem_cache_alloc
1.20%  make  [k] generic_fillattr
1.12%  make  [k] __inode_permission
1.06%  make  [k] dput
1.04%  make  [k] strncpy_from_user
1.04%  make  [k] _raw_spin_lock
Total: 3.91 + 3.37 + 1.82 = 9.1%

My Changes:
8.54%  make  [k] __d_lookup_rcu
6.52%  make  [k] link_path_walk
3.93%  make  [k] inode_has_perm <
2.31%  make  [k] lookup_fast
2.05%  make  [k] system_call
1.79%  make  [k] path_lookupat
1.72%  make  [k] generic_permission
1.50%  make  [k] __audit_syscall_exit
1.49%  make  [k] selinux_inode_permission <
1.47%  make  [k] copy_user_enhanced_fast_string
1.28%  make  [k] __inode_permission
1.23%  make  [k] kmem_cache_alloc
1.19%  make  [k] _raw_spin_lock
1.15%  make  [k] lg_local_lock
1.10%  make  [k] strncpy_from_user
1.10%  make  [k] dput
1.08%  make  [k] kmem_cache_free
1.08%  make  [k] generic_fillattr
Total: 3.93 + 1.49 = 5.42

In inode_has_perm the big time takers are loading cred->sid and the
raw_seqcount_begin(inode->i_security_seccount).  I'm not certain how to
make either of those much faster.

In selinux_inode_permission() we spend time in getting current->cred and
in calling inode_has_perm().

Signed-off-by: Eric Paris 
---
 include/linux/fs.h  |  5 +++
 security/selinux/hooks.c| 62 ++---
 security/selinux/include/security.h |  1 +
 security/selinux/ss/services.c  |  5 +++
 4 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 43db02e..5268cf3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -535,6 +535,11 @@ struct inode {
struct address_space*i_mapping;
 
 #ifdef CONFIG_SECURITY
+   seqcount_t  i_security_seqcount;
+   u32 i_last_task_sid;
+   u32 i_last_granting;
+   u32 i_last_perms;
+   u32 i_audit_allow;
void*i_security;
 #endif
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index cfecb52..00dd6d9 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -82,6 +82,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "avc.h"
@@ -207,6 +208,7 @@ static int inode_alloc_security(struct inode *inode)
if (!isec)
return -ENOMEM;
 
+   seqcount_init(&inode->i_security_seqcount);
mutex_init(&isec->lock);
INIT_LIST_HEAD(&isec->list);
isec->inode = inode;
@@ -1516,6 +1518,44 @@ static noinline int audit_inode_permission(struct inode 
*inode,
return 0;
 }
 
+static bool inode_has_perm_cached(u32 sid, struct inode *inode, u32 perms)
+{
+   unsigned seq;
+   u32 last_task_sid;
+   u32 last_perms;
+   u32 last_granting;
+
+   seq = raw_seqcount_begin(&inode->i_security_seqcount);
+   last_task_sid = inode->i_last_task_sid;
+   last_perms = inode->i_last_perms;
+   last_granting = inode->i_last_granting;
+
+   /* something changed, bail! */
+   if (read_seqcount_retry(&inode->i_security_seqcount, seq))
+   return false;
+
+   return sid == last_task_sid && (perms & last_perms) == perms &&
+  security_get_latest_granting() == last_granting;
+}
+
+static void inode_set_perm_cache(struct inode *inode, u32 task_sid, u32 perms,
+u32 granting, u32 audit_allow)
+{
+   spin_lock(&inode->i_lock);
+   write_seqcount_begin(&inode->i_security_seqcount);
+   if (inode->i_last_task_sid == task_sid &&
+   inode->i_la

Re: [RFC PATCH 1/2] selinux: merge selinux_inode_permission and inode_has_perm

2013-06-03 Thread Eric Paris
On Mon, 2013-06-03 at 14:59 -0400, Eric Paris wrote:
> selinux_inode_permission had some heavy lifting done to make it more
> performance polite.  But it still does largely the same thing as
> inode_has_perm.  So move that work into inode_has_perm and call
> inode_has_perm from selinux_inode_permission.
> 
> Signed-off-by: Eric Paris 
> ---
>  security/selinux/hooks.c | 92 
> ++--
>  1 file changed, 42 insertions(+), 50 deletions(-)
> 
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 5c6f2cd..cfecb52 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c

> @@ -1514,6 +1538,14 @@ static int inode_has_perm(const struct cred *cred,
>   sid = cred_sid(cred);
>   isec = inode->i_security;
>  
> + rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd);
> + audited = avc_audit_required(perms, &avd, rc, dontaudit, &denied);
> + if (likely(!audited))
> + return rc;
> +
> + rc2 = audit_inode_permission(inode, adp, perms, audited, denied, flags);
> + if (rc2)
> + return rc2;
>   return avc_has_perm_flags(sid, isec->sid, isec->sclass, perms, adp, 
> flags);
>  }
>  

Should just return rc, not avc_has_perm_flags().  I fixed that in the
2/2 patch and this should work just fine.  Kills a little performance,
but still works.

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 2/2] SELinux: cache inode checks inside struct inode

2013-06-03 Thread Eric Paris
On Tue, 2013-06-04 at 06:31 +0900, Linus Torvalds wrote:
> 
> 
> On Mon, 3 Jun 2013, Eric Paris wrote:
> >  
> >  #ifdef CONFIG_SECURITY
> > + seqcount_t  i_security_seqcount;
> > + u32 i_last_task_sid;
> > + u32 i_last_granting;
> > + u32 i_last_perms;
> > + u32 i_audit_allow;
> >   void*i_security;
> >  #endif
> 
> This is much too big. I was really hoping for "another word that the 
> security layer can use" or similar.

Not sure how that can work   :-(

> Something this big would be acceptable if it would be a *generic* security 
> cache, and others could use it too, and would avoid ever actually calling 
> into any security layer at all (ie we could do the checks entirely at the 
> VFS layer). Then it would be fine. But for just the fact that SELinux is 
> too slow? No.

There is nothing about it that can't be VFS-erized.  The fields are:

readlockless way to get the data
which task was allowed
which perms were allowed
what generation of security policy allowed it
what perms should be forced to call security hook anyway

defining "perms" from a VFS PoV is hard.

doing any of this with 'stacking' is hard.  Then again, I'm only so so
on the value of stacking.  I've waffled a few times...

I can do it entirely inside selinux, but we are still going to have the
cache hit you were originally seeing as we dereference isec to get the
info


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 00/48] Add namespace support for audit

2013-06-11 Thread Eric Paris
On Tue, 2013-06-11 at 13:59 +0800, Gao feng wrote:
> On 06/11/2013 05:24 AM, Serge E. Hallyn wrote:
> > Quoting Gao feng (gaof...@cn.fujitsu.com):
> >> On 06/07/2013 06:47 AM, Serge Hallyn wrote:
> >>> Quoting Serge Hallyn (serge.hal...@ubuntu.com):
>  Quoting Gao feng (gaof...@cn.fujitsu.com):
> > On 05/07/2013 10:20 AM, Gao feng wrote:

> In my option, the audit rules(inode, tree_list, filter) , some of audit
> controller related resources(enabled,pid,portid...) and skb queue, audit
> netlink sockets,kauditd thread should be per-userns. The audit user message
> which generated by the user in container should be per-userns too.
> 
> Since netns is not implemented as a hierarchy, and the network related
> resources are not global. so network related audit message should be 
> per-userns too.
> 
> The security related audit message should be send to init user namespace
> as we discussed before. Maybe tty related audit message should be send
> to init user namespace too, I have no idea now.
> 
> The next step, I will post a new patchset which only make the audit user
> message and the basic audit resource per userns. I think this patchset
> will easy to be reviewed and accepted, And will not influence the host.
> This patchset contains the below patches:

I think this would be easier for us do from a certification and
doumentation PoV if we had an audit namespace, not tied to the user
namespace.  creating a new audit namespace should require
CAP_AUDIT_CONTROL in the user namespace which created the current audit
namespace.

Does that make sense?  I don't mind messages staying completely inside
the current namespace, but that means we can't give unpriv users (even
if they have priv in their user namespace) a new audit namespace...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: security_inode_init_security() inode field requirements

2013-03-01 Thread Eric Paris
SELinux has no maximum   :-(

Realistically there are a couple of interfaces that limit things to
4k, but labels on files on disk could be even larger than that!

255 will fit most every label, but not necessarily all of them.


I know ext4 on Fedora allocates inodes which left about 255 bytes for
selinux.selinux, but will place the xattr in another block if it
happens to be larger than 255.  This is rare, but certainly
possible

We use the inode->i_mode.

In debug/error path we use:
  inode->i_sb inode->i_no

We could use the parent dir sb instead of the new inode->i_sb.  We
don't have to print the i_no when we hit a failure, but it is just
about the only information that can help for debugging/figuring out
which file had a failure..

-Eric

On Fri, Mar 1, 2013 at 9:07 AM, Mimi Zohar  wrote:
> On Fri, 2013-03-01 at 13:11 +, Steven Whitehouse wrote:
>> Hi,
>>
>> On Fri, 2013-03-01 at 07:27 -0500, Mimi Zohar wrote:
>> > On Fri, 2013-03-01 at 10:12 +, Steven Whitehouse wrote:
>> > > Hi,
>> > >
>> > > I'm wondering whether there is a list somewhere of fields which
>> > > security_inode_init_security() requires are set in an inode when it is
>> > > called? In particular, does it matter if the inode number itself is
>> > > unset when security_inode_init_security() is called?
>> > >
>> > > The problem that I'm looking at is the use of multiple transactions
>> > > during inode creation when some combination of ACLs/LSMs/IMA are turned
>> > > on. What we have currently (in GFS2, there are other fs which follow
>> > > broadly the same solution though) is this:
>> > >
>> > > 1. Create inode in core
>> > > 2. Create inode on disk
>> > > 3. Add xattrs one at a time for ACLs/LSMs/IMA
>> > > 4. Link inode into directory
>> > >
>> > > Steps 2 through 4 require separate transactions at the moment. What I'd
>> > > like to do is to be able to get the details of the xattrs ahead of time
>> > > such that the allocation of the inode can be one and the same allocation
>> > > as that for the xattr blocks. That allows merging of the transactions
>> > > into one and a greatly simplified error path too. This would look
>> > > something like:
>> > >
>> > > 1. Create in-core inode and init required fields
>> > > 2. Get details of all xattrs for the inode
>> > > 3. Alloc on disk inode and blocks for xattrs as needed
>> > > 4. Link into directory
>> > >
>> > > In this case, steps 2 through 4 are within a single transaction. We
>> > > don't actually need to have the content of the xattrs ahead of
>> > > allocating the inode, just the length (or even just a max length, if it
>> > > is not too large). However the easiest solution would just be to move
>> > > the call to security_inode_init_security() to the point before we've
>> > > allocated the inode (and thus we don't know its number yet) but after
>> > > we've filled out all the remaining fields if that makes sense?
>> > >
>> > > So I just wanted to check whether that would break anything,
>> >
>> > Hi Steve,
>> >
>> > Included in security_inode_init_security() is the call to
>> > evm_inode_init_security() to write the 'security.evm' extended
>> > attribute.  'security.evm' is an HMAC of the security extended
>> > attributes and other file metadata, including the inode.  For an exact
>> > list of other metadata included in the HMAC calculation refer to
>> > hmac_add_misc().  (The UUID is being added to the HMAC calculation in
>> > this open window.)
>> >
>> > thanks,
>> >
>> > Mimi
>> >
>>
>> Ok... but it is using inode->i_ino in that case, as well as the
>> generation number too. So that presumably can only be done after the
>> inode has been allocated, since we need to know its location in order to
>> know its inode number. Also the generation number is assigned at
>> allocation time. One potential issue though... the inode->i_ino is an
>> unsigned long, so that on 32 bit archs, that will be (on GFS2) a
>> truncated version of the full 64 bit inode number. I'm not sure if that
>> matters or not for EVM.
>
> The reason for adding additional information to the HMAC calculation is
> to prevent a cut & paste attack, taking a valid 'security.evm' from one
> file and using it for another file.
>
>> So since that appears to rule out doing the security init bits ahead of
>> the allocation of the inode, is it possible to get a maximum value for
>> the size of the xattr which EVM will add? Likewise for the other LSMs
>> too?
>
> Although 'security.evm' can contain either an HMAC or a digital
> signature, new inodes are created with an HMAC, which is hardcoded as
> hmac(sha1).  On a running system, the existing 'security.evm' digital
> signatures are replaced with an HMAC.
>
> SELinux defines INITCONTEXTLEN as 255, which seems to be the maximum
> xattr length, but I'm not sure.  From the smack header:
>
> /*
>  * Smack labels were limited to 23 characters for a long time.
>  */
> #define SMK_LABELLEN24
> #define SMK_LONGLABEL   256
>
> thanks,
>
> Mimi
>
> --
> To unsubscribe f

Re: IMA: How to manage user space signing policy with others

2013-03-04 Thread Eric Paris
I think that is what he was suggesting.  It reuses the integrity code
but it loses the integrity flexibility.  I don't think it is a good
solution   :-(

On Mon, Mar 4, 2013 at 1:59 PM, Mimi Zohar  wrote:
> On Mon, 2013-03-04 at 10:29 -0500, Vivek Goyal wrote:
> [...]
>
>> Hi Mimi,
>>
>> If we decide to merge flags, then practically we modified the
>> ima_appraise_tcb policy. ima_appraise_tcb policy expects to cache the
>> results and we will not do that. And this conflict just grows if we
>> are forced to add more options in future.
>>
>> Also as you mentioned that in some cases flag merging is OR operation
>> and in another cases it might be AND operation. And we will most likely
>> end up hardcoding all this. I think slowly this is getting complicated
>> and as people add more complex rules things can quickly get out of hand.
>>
>> I am wondering that why are we trying to make multiple policies work
>> together. Can we try to keep it simple and say that at one point of
>> time only one policy can be effective. It could either be a built in
>> policy or user defined one. In fact that's how things are working right now.
>> User defined policy replaces built-in policy.
>>
>> For the sake of backward compatibility "ima_tcb" and "ima_appraise_tcb"
>> can co-exist together (like today). But ima_secureboot_policy will not
>> be compatible with other policies. I understand that there might be a
>> desire to use multiple policies together down the line, but I guess in
>> that case policies need to specified using "policy" interface. And
>> ima_secureboot will be odd man out here as it can not trust the root
>> to specify policy. So practically ima_secureboot will be disabled.
>>
>> We just have to provide an IMA interface so that caller can query what's
>> the effective policy currently. Say, IMA_POLICY_SECUREBOOT,
>> IMA_POLICY_TCB, or IMA_POLICY_USER. Caller of the bprm_check() or
>> bprm_post_load() can also check for current policy in force and give
>> CAP_SIGNED only if desired policy is in effect.
>>
>> This reduces our options but trying to make multiple policies co-exist
>> together is just making it complicated. We can take it up again when
>> somebody has a strong use case of using secureboot policy along with
>> other policies. In fact a user can still define a custom policy which
>> is mix of multiple policies. Just that it is not compatible with
>> "secureboot" policy because for that we can't trust "root" to define
>> policy.
>
> Let me get this straight.  You're suggesting that distros/users will
> need to make a Kconfig build decision of enabling secureboot, including
> the secureboot built-in policy, or be allowed to enable other integrity
> policies.  If RH enables secureboot, then no other integrity policy will
> be permitted.  Is that what you're saying, and if so, why would I agree
> to this?
>
> thanks,
>
> Mimi
>
> --
> To unsubscribe from this list: send the line "unsubscribe 
> linux-security-module" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [BZ905179] audit: omit check for uid and gid validity in audit rules and data

2013-05-12 Thread Eric Paris
On Thu, 2013-05-09 at 09:29 -0400, Steve Grubb wrote:
> On Tuesday, April 16, 2013 03:38:23 PM Richard Guy Briggs wrote:
> > On Tue, Apr 09, 2013 at 02:39:32AM -0700, Eric W. Biederman wrote:
> > > Andrew Morton  writes:
> > > > On Wed, 20 Mar 2013 15:18:17 -0400 Richard Guy Briggs  
> wrote:
> > > >> audit rule additions containing "-F auid!=4294967295" were failing with
> > > >> EINVAL.> 
> > > The only case where this appears to make the least little bit of sense
> > > is if the goal of the test is to test to see if an audit logloginuid
> > > has been set at all.  In which case depending on a test against
> > > 4294967295 is bogus because you are depending on an intimate internal
> > > kernel implementation detail.
> > > 
> > > How about something like my untested patch below that add an explicit
> > > operation to test if loginuid has been set?
> > 
> > Sorry for the delay in testing this, I had another urgent bug and a
> > belligerent test box...
> > 
> > I like this approach better than mine now that I understand it.  I've
> > tested the patch below without any changes.  It works as expected with
> > my previous test case.  I don't know if a Signed-off-by: is appropriate
> > for me in this case, but I'll throw in a:
> > 
> > Tested-by: Richard Guy Briggs 
> > 
> > and recommend a:
> > 
> > Reported-By: Steve Grubb 
> 
> If this is the approved patch, can it be put in stable? The audit system 
> hasn't worked as intended since January.

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/kernel/auditsc.c?id=780a7654cee8d61819512385e778e4827db4bfbc

Should be queued for 3.7 and later now.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-next: manual merge of the akpm tree with Linus' tree

2013-05-12 Thread Eric Paris
On Mon, 2013-05-13 at 12:07 +1000, Stephen Rothwell wrote:
> Hi Andrew,
> 
> Today's linux-next merge of the akpm tree got a conflict in
> kernel/auditsc.c between commit b24a30a73054 ("audit: fix event coverage
> of AUDIT_ANOM_LINK") from Linus' tree and commit "audit: fix mq_open and
> mq_unlink to add the MQ root as a hidden parent audit_names record" from
> the akpm tree.

Actually, I've already picked the patch up for 3.11.  So Andrew, you can
drop it.

> I fixed it up (see below) and can carry the fix as necessary (no action
> is required).
> 
> BTW, commit b24a30a73054 from Linus' tree has Eric Paris as Author and
> Committer, but is only Signed-off-by Kees Cook.  It is part of a long
> series that did not go anywhere near linus-next.   I do have an audit
> tree in linux-next
> (git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit.git#for-next)
> but that hasn't seen any recent activity.

I thought I sent you a note asking for audit to get pulled into -next
quite a while back.  I'll resend...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: linux-next: manual merge of the akpm tree with Linus' tree

2013-05-12 Thread Eric Paris



-Original Message-
From: Kees Cook [keesc...@chromium.org]
Received: Monday, 13 May 2013, 12:49am
To: Eric Paris [epa...@redhat.com]
CC: Stephen Rothwell [s...@canb.auug.org.au]; Andrew Morton 
[a...@linux-foundation.org]; Linus [torva...@linux-foundation.org]; Linux-Next 
[linux-n...@vger.kernel.org]; LKML [linux-kernel@vger.kernel.org]; Jeff Layton 
[jlay...@redhat.com]; Al Viro [v...@zeniv.linux.org.uk]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree


On Sun, May 12, 2013 at 7:11 PM, Eric Paris  wrote:
> On Mon, 2013-05-13 at 12:07 +1000, Stephen Rothwell wrote:
>> Hi Andrew,
>>
>> Today's linux-next merge of the akpm tree got a conflict in
>> kernel/auditsc.c between commit b24a30a73054 ("audit: fix event coverage
>> of AUDIT_ANOM_LINK") from Linus' tree and commit "audit: fix mq_open and
>> mq_unlink to add the MQ root as a hidden parent audit_names record" from
>> the akpm tree.
>
> Actually, I've already picked the patch up for 3.11.  So Andrew, you can
> drop it.
>
>> I fixed it up (see below) and can carry the fix as necessary (no action
>> is required).
>>
>> BTW, commit b24a30a73054 from Linus' tree has Eric Paris as Author and
>> Committer, but is only Signed-off-by Kees Cook.  It is part of a long
>> series that did not go anywhere near linus-next.   I do have an audit
>> tree in linux-next
>> (git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit.git#for-next)
>> but that hasn't seen any recent activity.
>
> I thought I sent you a note asking for audit to get pulled into -next
> quite a while back.  I'll resend...
>

Hrm, how did the Author get mangled?

-Kees

--
Kees Cook
Chrome OS Security


RE: linux-next: manual merge of the akpm tree with Linus' tree

2013-05-12 Thread Eric Paris

-Original Message-
From: Kees Cook [keesc...@chromium.org]
Received: Monday, 13 May 2013, 12:49am
To: Eric Paris [epa...@redhat.com]
CC: Stephen Rothwell [s...@canb.auug.org.au]; Andrew Morton 
[a...@linux-foundation.org]; Linus [torva...@linux-foundation.org]; Linux-Next 
[linux-n...@vger.kernel.org]; LKML [linux-kernel@vger.kernel.org]; Jeff Layton 
[jlay...@redhat.com]; Al Viro [v...@zeniv.linux.org.uk]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Sun, May 12, 2013 at 7:11 PM, Eric Paris  wrote:
> On Mon, 2013-05-13 at 12:07 +1000, Stephen Rothwell wrote:
>> Hi Andrew,
>>
>> Today's linux-next merge of the akpm tree got a conflict in
>> kernel/auditsc.c between commit b24a30a73054 ("audit: fix event coverage
>> of AUDIT_ANOM_LINK") from Linus' tree and commit "audit: fix mq_open and
>> mq_unlink to add the MQ root as a hidden parent audit_names record" from
>> the akpm tree.
>
> Actually, I've already picked the patch up for 3.11.  So Andrew, you can
> drop it.
>
>> I fixed it up (see below) and can carry the fix as necessary (no action
>> is required).
>>
>> BTW, commit b24a30a73054 from Linus' tree has Eric Paris as Author and
>> Committer, but is only Signed-off-by Kees Cook.  It is part of a long
>> series that did not go anywhere near linus-next.   I do have an audit
>> tree in linux-next
>> (git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit.git#for-next)
>> but that hasn't seen any recent activity.
>
> I thought I sent you a note asking for audit to get pulled into -next
> quite a while back.  I'll resend...
>

Hrm, how did the Author get mangled?


I remember it having a conflict when I tried to merge it (someone else had 
changed the same area of the header file). So I used patch -p1 and fixed up the 
reject by hand. I wonder if I screwed up and used git commit -a instead of git 
am --resolved?   That is 2 things I should have caught on final review I 
missed.  :-(

Now to wait for everything else I screwed up...


[PATCH] fork: reorder permissions when violating number of processes limits

2013-05-14 Thread Eric Paris
When a task is attempting to violate the RLIMIT_NPROC limit we have a
check to see if the task is sufficiently priviledged.  The check first
looks at CAP_SYS_ADMIN, then CAP_SYS_RESOURCE, then if the task is
uid=0.  A result is that tasks which are allowed by the uid=0 check are
first checked against the security subsystem.  This results in the
security subsystem auditting a denial for sys_admin and sys_resource and
then the task passing the uid=0 check.  This patch rearrainges the code
to first check uid=0, since if we pass that we shouldn't hit the
security system at all.  We then check sys_resource, since it is the
smallest capability which will solve the problem.  Lastly we check the
fallback everything cap_sysadmin.  We don't want to give this capability
many places since it is so powerful.  This will eliminate many of the
false positive/needless denial messages we get when a root task tries to
violate the nproc limit.  (note that kthreads count against root, so on
a sufficiently large machine we can actually get past the default limits
before any userspace tasks are launched.)

Signed-off-by: Eric Paris 
---
 kernel/fork.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 987b28a..09dbda3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1199,8 +1199,8 @@ static struct task_struct *copy_process(unsigned long 
clone_flags,
retval = -EAGAIN;
if (atomic_read(&p->real_cred->user->processes) >=
task_rlimit(p, RLIMIT_NPROC)) {
-   if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_RESOURCE) &&
-   p->real_cred->user != INIT_USER)
+   if (p->real_cred->user != INIT_USER &&
+   !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
goto bad_fork_free;
}
current->flags &= ~PF_NPROC_EXCEEDED;
-- 
1.8.2.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-next: build warning after merge of the final tree (in Linus' tree)

2013-05-15 Thread Eric Paris
On Wed, 2013-05-15 at 13:20 +1000, Stephen Rothwell wrote:
> Hi ,
> 
> After merging the final tree, today's linux-next build (i386 defconfig)
> produced this warning:
> 
> kernel/auditfilter.c: In function 'audit_data_to_entry':
> kernel/auditfilter.c:426:3: warning: this decimal constant is unsigned only 
> in ISO C90 [enabled by default]
> 
> Introduced by commit 780a7654cee8 ("audit: Make testing for a valid
> loginuid explicit") from Linus' tree.

Thank you, I'll fix it up.  What am I likely missing that I don't see it
on my builds?  I'm using gcc 4.8.  Is there a config option that enables
additional warnings?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] powerpc: Optimise 64bit syscall auditing entry path

2013-04-10 Thread Eric Paris
Patches 1 and 2 I applied for 3.10, but I'd really like to have someone who 
knows PPC ack 3 and 4.  Especially if there is a hope that it goes through my 
tree...

Link to original messages for your ease of review...

http://marc.info/?l=linux-kernel&m=135768892320439&w=2
http://marc.info/?l=linux-kernel&m=135768895320472&w=2

-Eric

- Original Message -
> 
> Add an assembly fast path for the syscall audit entry path on
> 64bit. Some distros enable auditing by default which forces us
> through the syscall auditing path even if there are no rules.
> 
> I wrote some test cases to validate the patch:
> 
> http://ozlabs.org/~anton/junkcode/audit_tests.tar.gz
> 
> And to test the performance I ran a simple null syscall
> microbenchmark on a POWER7 box:
> 
> http://ozlabs.org/~anton/junkcode/null_syscall.c
> 
> Baseline: 949.2 cycles
> Patched:  920.6 cycles
> 
> An improvement of 3%. Most of the potential gains are masked by
> the syscall audit exit path which will be fixed in a
> subsequent patch.
> 
> Signed-off-by: Anton Blanchard 
> ---
> 
> Index: b/arch/powerpc/kernel/entry_64.S
> ===
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -34,6 +34,12 @@
>  #include 
>  #include 
>  
> +/* Avoid __ASSEMBLER__'ifying  just for this.  */
> +#include 
> +#define AUDIT_ARCH_PPC   (EM_PPC)
> +#define AUDIT_ARCH_PPC64 (EM_PPC64|__AUDIT_ARCH_64BIT)
> +#define __AUDIT_ARCH_64BIT 0x8000
> +
>  /*
>   * System calls.
>   */
> @@ -244,6 +250,10 @@ syscall_error:
>   
>  /* Traced system call support */
>  syscall_dotrace:
> +#ifdef CONFIG_AUDITSYSCALL
> + andi.   r11,r10,(_TIF_SYSCALL_T_OR_A & ~_TIF_SYSCALL_AUDIT)
> + beq audit_entry
> +#endif
>   bl  .save_nvgprs
>   addir3,r1,STACK_FRAME_OVERHEAD
>   bl  .do_syscall_trace_enter
> @@ -253,6 +263,7 @@ syscall_dotrace:
>* for the call number to look up in the table (r0).
>*/
>   mr  r0,r3
> +.Laudit_entry_return:
>   ld  r3,GPR3(r1)
>   ld  r4,GPR4(r1)
>   ld  r5,GPR5(r1)
> @@ -264,6 +275,34 @@ syscall_dotrace:
>   ld  r10,TI_FLAGS(r10)
>   b   .Lsyscall_dotrace_cont
>  
> +#ifdef CONFIG_AUDITSYSCALL
> +audit_entry:
> + ld  r4,GPR0(r1)
> + ld  r5,GPR3(r1)
> + ld  r6,GPR4(r1)
> + ld  r7,GPR5(r1)
> + ld  r8,GPR6(r1)
> +
> + andi.   r11,r10,_TIF_32BIT
> + beq 1f
> +
> + lis r3,AUDIT_ARCH_PPC@h
> + ori r3,r3,AUDIT_ARCH_PPC@l
> + clrldi  r5,r5,32
> + clrldi  r6,r6,32
> + clrldi  r7,r7,32
> + clrldi  r8,r8,32
> + bl  .__audit_syscall_entry
> + ld  r0,GPR0(r1)
> + b   .Laudit_entry_return
> +
> +1:   lis r3,AUDIT_ARCH_PPC64@h
> + ori r3,r3,AUDIT_ARCH_PPC64@l
> + bl  .__audit_syscall_entry
> + ld  r0,GPR0(r1)
> + b   .Laudit_entry_return
> +#endif
> +
>  syscall_enosys:
>   li  r3,-ENOSYS
>   b   syscall_exit
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -next] audit: remove duplicated include from audit.c

2013-04-10 Thread Eric Paris
net/netlink.h includes linux/netlink.h but linux/netlink.h does not include 
net/netlink.h

since the audit code uses the nlmsg_* functions provided in net/netlink.h it 
seems we need this include.

since the audit code uses netlink_unicast provided in linux/netlink.h it seems 
we need that include.

I don't see duplication even if by chance one of them would have included the 
other...

- Original Message -
> From: Wei Yongjun 
> 
> Remove duplicated include.
> 
> Signed-off-by: Wei Yongjun 
> ---
>  kernel/audit.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 488f85f..9377913 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -58,7 +58,6 @@
>  #ifdef CONFIG_SECURITY
>  #include 
>  #endif
> -#include 
>  #include 
>  #include 
>  #include 
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kernel: audit: beautify code, for extern function, better to check its parameters by itself

2013-04-10 Thread Eric Paris
- Original Message -
> 
>   __audit_socketcall is an extern function.
>   better to check its parameters by itself.
> 
> also can return error code, when fail (find invalid parameters).
> also use macro instead of real hard code number
> also give related comments for it.
> 
> Signed-off-by: Chen Gang 
> ---
>  include/linux/audit.h |   12 
>  kernel/auditsc.c  |9 ++---
>  net/socket.c  |6 --
>  3 files changed, 18 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/audit.h b/include/linux/audit.h

> @@ -354,7 +358,7 @@ static inline int audit_bprm(struct linux_binprm *bprm)
>  {
>   return 0;
>  }
> -static inline void audit_socketcall(int nargs, unsigned long *args)
> +static inline int audit_socketcall(int nargs, unsigned long *args)
>  { }
>  static inline void audit_fd_pair(int fd1, int fd2)
>  { }

This now returns a value but you forgot to return a value.  Thus this would not 
even build...   I fixed it up myself.

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kernel: auditfilter: looping issue, memory leak if has 2 or more AUDIT_FILTERKEYs

2013-04-10 Thread Eric Paris
We only allow one filter key per rule.  So we should never be able to get into 
this situation.  See audit_data_to_entry()

-Eric

- Original Message -
> 
>   in the 'fcount' looping,
> if 'new->fields[*].type" has 2 or more AUDIT_FILTERKEYs
> need judge new->filterkey whether has value, or memory leak.
> 
> Signed-off-by: Chen Gang 
> ---
>  kernel/auditfilter.c |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> index f9fc54b..936ac79 100644
> --- a/kernel/auditfilter.c
> +++ b/kernel/auditfilter.c
> @@ -859,6 +859,8 @@ struct audit_entry *audit_dupe_rule(struct audit_krule
> *old)
>  &old->fields[i]);
>   break;
>   case AUDIT_FILTERKEY:
> + if (new->filterkey)
> + break;
>   fk = kstrdup(old->filterkey, GFP_KERNEL);
>   if (unlikely(!fk))
>   err = -ENOMEM;
> --
> 1.7.7.6
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kernel: auditfilter: looping issue, memory leak if has 2 or more AUDIT_FILTERKEYs

2013-04-10 Thread Eric Paris
- Original Message -
> 
> 
> in another function: audit_data_to_entry:
> 
>   a. has the same issue for case AUDIT_WATCH.

You are saying if there were 2 of them it will leak the old one?  No.  If you 
have 2 AUDIT_WATCH entries the first one will set entry->rule->watch and the 
second will bomb with EINVAL in audit_to_watch()
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kernel: auditfilter: looping issue, memory leak if has 2 or more AUDIT_FILTERKEYs

2013-04-10 Thread Eric Paris
- Original Message -

>   b. has an new issue for AUDIT_DIR:
>after AUDIT_DIR succeed, it will set rule->tree.
>next, the other case fail, then will call audit_free_rule.
>but audit_free_rule will not free rule->tree.

Definitely a couple of leaks here...

I'm seeing leaks on size 8, 64, and 128.

Al, what do you think?  Should I be calling audit_put_tree() in the error case 
if entry->tree != NULL?  The audit trees are some of the most complex code in 
the kernel I think.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kernel: auditfilter: looping issue, memory leak if has 2 or more AUDIT_FILTERKEYs

2013-04-10 Thread Eric Paris
- Original Message -
> 
>   also for function audit_list:
> when call audit_make_reply fails (will return NULL).
> we need free all its related variables instead of only kfree rull.
>   (such as call autit_free_rule)
> 
>   please help check, thanks.

audit_free_rule() takes a struct audit_entry, not an audit_rule.  struct 
audit_rule does not have additional things which need to be freed...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kernel: auditfilter: looping issue, memory leak if has 2 or more AUDIT_FILTERKEYs

2013-04-10 Thread Eric Paris
- Original Message -
> 
>   also for function audit_list_rules:
> when call audit_make_reply fails (will return NULL).
> we also need process data->buf, not only data itself.
> 
>   please help check, thanks.

struct audit_rule_data {
[...]
charbuf[0]; /* string fields buffer */
};

The last element in the struct is 0 length.  But the allocation in 
audit_krule_to_data() looks like:

data = kmalloc(sizeof(*data) + krule->buflen, GFP_KERNEL);

So now data->buf appears as an allocation of size krule->buflen.

We do not need to free it separately.  This is a pretty common C trick.

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 00/48] Add namespace support for audit

2013-05-08 Thread Eric Paris
What kernel are these patches against?


On Tue, 2013-05-07 at 10:20 +0800, Gao feng wrote:
> This patchset try to add namespace support for audit.
> 
> I choose to assign audit to the user namespace.
> Right now,there are six kinds of namespaces, such as
> net, mount, ipc, pid, uts and user. the first five
> namespaces have special usage. the audit isn't suitable to
> belong to these five namespaces, so the user namespace
> may be the best choice.
> 
> Through I decide to make audit related resources per user
> namespace, but audit uses netlink to communicate between kernel
> space and user space, and the netlink is a private resource
> of per net namespace. So we need the capability to allow the
> netlink sockets to communicate with each other in the same user
> namespace even they are in different net namespace. [PATCH 2/48]
> does this job, it adds a new function "compare" for per netlink
> table to compare two sockets. it means the netlink protocols can
> has its own compare fuction, For other protocols, two netlink
> sockets are different if they belong to the different net namespace.
> For audit protocol, two sockets can be the same even they in different
> net namespace,we use user namespace not net namespace to make the
> decision.
> 
> There is one point that some people may dislike,in [PATCH 1/48],
> the kernel side audit netlink socket is created only when we create
> the first netns for the userns, and this userns will hold the netns
> until we destroy this userns.
> 
> The other patches just make the audit related resources per
> user namespace.
> 
> This patchset is sent as an RFC,any comments are welcome.
> 
> Gao feng (48):
>   Audit: make audit kernel side netlink sock per userns
>   netlink: Add compare function for netlink_table
>   Audit: implement audit self-defined compare function
>   Audit: make audit_skb_queue per user namespace
>   Audit: make audit_skb_hold_queue per user namespace
>   Audit: make kauditd_task per user namespace
>   Audit: make audit_pid per user namespace
>   Audit: make audit_nlk_portid per user namesapce
>   Audit: make audit_enabled per user namespace
>   Audit: change type of audit_ever_enabled to bool
>   Audit: make audit_ever_enabled per user namespace
>   Audit: make audit_initialized per user namespace
>   Audit: only allow init user namespace to change audit_rate_limit
>   Audit: only allow init user namespace to change audit_failure
>   Audit: allow to send netlink message to auditd in uninit user
> namespace
>   Audit: user proper user namespace in audit_log_config_change
>   Audit: make kauditd_wait per user namespace
>   Audit: make audit_backlog_wait per user namespace
>   Audit: remove duplicate comments
>   Audit: introduce new audit logging interface for user namespace
>   Audit: pass proper user namespace to audit_log_common_recv_msg
>   Audit: Log audit config change in uninit user namespace
>   Audit: netfilter: Log xt table replace behavior in proper user
> namespace
>   Audit: xt_AUDIT: Log audit message in proper user namespace
>   Audit: send reply message to the auditd in proper user namespace
>   Audit: make audit_inode_hash per user namespace
>   Audit: make tree_list per user namespace
>   Audit: make audit filter list per user namespace
>   Audit: make audit_krule belongs to user namespace
>   Audit: reply audit filter list request to proper user namespace
>   Audit: pass proper user namespace to audit_filter_syscall
>   Audit: pass proper user namespace to audit_filter_inode_name
>   Audit: Log filter related audit message to proper user namespace
>   Log audit tree related message in proper user namespace
>   Audit: Log task related audit message to proper user namespace
>   Audit: Log watch related audit message to proper user namespace
>   Audit: translate audit_log_start to audit_log_start_ns
>   Audit: tty: translate audit_log_start to audit_log_start_ns
>   Audit: netlabel: translate audit_log_start to audit_log_start_ns
>   Audit: ima: translate audit_log_start to audit_log_start_ns
>   Audit: lsm: translate audit_log_start to audit_log_start_ns
>   Audit: selinux: translate audit_log_start to audit_log_start_ns
>   Audit: xfrm: translate audit_log_start to audit_log_start_ns
>   Audit: rename audit_log_start_ns to audit_log_start
>   Audit: user audit_enabled_ns to replace audit_enabled
>   Audit: rename audit_enabled_ns to audit_enabled
>   Audit: make audit_log user namespace awared
>   Audit: allow root user of un-init user namespace to set audit
> 
>  drivers/tty/tty_audit.c |   9 +-
>  include/linux/audit.h   |  44 ++--
>  include/linux/netlink.h |   1 +
>  include/linux/user_namespace.h  |  25 +++
>  include/net/xfrm.h  |   7 +-
>  kernel/audit.c  | 393 
> +---
>  kernel/audit.h  |  24 +--
>  kernel/audit_tree.c |  49 ++---
>  kernel/audit_watch.c|  23

Re: [PATCH] kernel: auditfilter: looping issue, memory leak if has 2 or more AUDIT_FILTERKEYs

2013-04-11 Thread Eric Paris
- Original Message -
> On 2013年04月11日 05:19, Eric Paris wrote:
> > - Original Message -
> > 
> >> >   b. has an new issue for AUDIT_DIR:
> >> >after AUDIT_DIR succeed, it will set rule->tree.
> >> >next, the other case fail, then will call audit_free_rule.
> >> >but audit_free_rule will not free rule->tree.
> > Definitely a couple of leaks here...
> > 
> > I'm seeing leaks on size 8, 64, and 128.
> > 
> > Al, what do you think?  Should I be calling audit_put_tree() in the error
> > case if entry->tree != NULL?  The audit trees are some of the most complex
> > code in the kernel I think.
> > 
> > 
> 
>   can we add it in audit_free_rule ?
> 
>   maybe like this:
> 
> @@ -75,6 +75,8 @@ static inline void audit_free_rule(struct audit_entry *e)
>   /* some rules don't have associated watches */
>   if (erule->watch)
>   audit_put_watch(erule->watch);
> + if (erule->tree)
> + audit_put_tree(erule->tree);
>   if (erule->fields)
>   for (i = 0; i < erule->field_count; i++) {
>   struct audit_field *f = &erule->fields[i];

Where does the tree information get freed normally?  That's the code you need 
to run down.  You don't want to start getting double frees on the non-error 
case.  I'll try to dig into it if Al doesn't.  It's easy to show the leak on 
current kernels.

while(1)
auditctl -a exit,always -w /etc -F auid=-1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -next] audit: remove duplicated include from audit.c

2013-04-12 Thread Eric Paris
Came in via the net-next tree from commit 
941912133025926307c7a65b203fa38403b1063a

I'll ping dmiller and friends.

- Original Message -
> Hi Eric,
> 
> Sorry reply on top.
> 
> From the source code from linex-next.git tree, line 55~64:
> 
> #include 
> #include ***
> #include 
> #ifdef CONFIG_SECURITY
> #include 
> #endif
> #include ***
> #include 
> #include 
> #include 
> 
> net/netlink.h is included twice, and linux/netlink.h is not included by this
> file.
> 
> -Original Message-
> From: Eric Paris [mailto:epa...@redhat.com]
> Sent: 2013年4月11日 1:26
> To: Wei Yongjun
> Cc: v...@zeniv.linux.org.uk; Yongjun Wei (RD-CN);
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH -next] audit: remove duplicated include from audit.c
> 
> net/netlink.h includes linux/netlink.h but linux/netlink.h does not include
> net/netlink.h
> 
> since the audit code uses the nlmsg_* functions provided in net/netlink.h it
> seems we need this include.
> 
> since the audit code uses netlink_unicast provided in linux/netlink.h it
> seems we need that include.
> 
> I don't see duplication even if by chance one of them would have included the
> other...
> 
> - Original Message -
> > From: Wei Yongjun 
> > 
> > Remove duplicated include.
> > 
> > Signed-off-by: Wei Yongjun 
> > ---
> >  kernel/audit.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/kernel/audit.c b/kernel/audit.c index 488f85f..9377913
> > 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -58,7 +58,6 @@
> >  #ifdef CONFIG_SECURITY
> >  #include 
> >  #endif
> > -#include 
> >  #include 
> >  #include 
> >  #include 
> > 
> > 
> 
> 
> TREND MICRO EMAIL NOTICE
> The information contained in this email and any attachments is confidential
> and may be subject to copyright or other intellectual property protection.
> If you are not the intended recipient, you are not authorized to use or
> disclose this information, and we request that you notify us by reply mail or
> telephone and delete the original message from your mail system.
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/6] audit: replace obsolete NLMSG_* with type safe nlmsg_*

2013-04-12 Thread Eric Paris
On Wed, Mar 27, 2013 at 12:49 PM, Hong Zhiguo  wrote:
> Signed-off-by: Hong Zhiguo 
> ---
>  kernel/audit.c |   10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index d596e53..4dbb047 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -58,7 +58,7 @@
>  #ifdef CONFIG_SECURITY
>  #include 
>  #endif
> -#include 
> +#include 

This is a duplicate include.  We already included net/netlink.h a
couple of lines above.  We use netlink_unicast, which comes from
linux/netlink.h.  I recognize that it comes in from net/netlink.h, but
I thought it good practice to pull it in ourselves.  Should I send a
separate patch to revert back to linux/netlink.h or a patch to just
remove the include altogether?

-Eric

>  #include 
>  #include 
>  #include 
> @@ -910,7 +910,7 @@ static void audit_receive_skb(struct sk_buff *skb)
>  {
> struct nlmsghdr *nlh;
> /*
> -* len MUST be signed for NLMSG_NEXT to be able to dec it below 0
> +* len MUST be signed for nlmsg_next to be able to dec it below 0
>  * if the nlmsg_len was not aligned
>  */
> int len;
> @@ -919,13 +919,13 @@ static void audit_receive_skb(struct sk_buff *skb)
> nlh = nlmsg_hdr(skb);
> len = skb->len;
>
> -   while (NLMSG_OK(nlh, len)) {
> +   while (nlmsg_ok(nlh, len)) {
> err = audit_receive_msg(skb, nlh);
> /* if err or if this message says it wants a response */
> if (err || (nlh->nlmsg_flags & NLM_F_ACK))
> netlink_ack(skb, nlh, err);
>
> -   nlh = NLMSG_NEXT(nlh, len);
> +   nlh = nlmsg_next(nlh, len);
> }
>  }
>
> @@ -1483,7 +1483,7 @@ void audit_log_end(struct audit_buffer *ab)
> audit_log_lost("rate limit exceeded");
> } else {
> struct nlmsghdr *nlh = nlmsg_hdr(ab->skb);
> -   nlh->nlmsg_len = ab->skb->len - NLMSG_SPACE(0);
> +   nlh->nlmsg_len = ab->skb->len - NLMSG_HDRLEN;
>
> if (audit_pid) {
> skb_queue_tail(&audit_skb_queue, ab->skb);
> --
> 1.7.10.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe 
> linux-security-module" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-next: build warning after merge of the final tree (in Linus' tree)

2013-05-17 Thread Eric Paris
On Fri, 2013-05-17 at 10:47 +0530, Viresh Kumar wrote:
> On Wed, May 15, 2013 at 7:02 PM, Eric Paris  wrote:
> > On Wed, 2013-05-15 at 13:20 +1000, Stephen Rothwell wrote:
> >> Hi ,
> >>
> >> After merging the final tree, today's linux-next build (i386 defconfig)
> >> produced this warning:
> >>
> >> kernel/auditfilter.c: In function 'audit_data_to_entry':
> >> kernel/auditfilter.c:426:3: warning: this decimal constant is unsigned 
> >> only in ISO C90 [enabled by default]
> >>
> >> Introduced by commit 780a7654cee8 ("audit: Make testing for a valid
> >> loginuid explicit") from Linus' tree.
> >
> > Thank you, I'll fix it up.  What am I likely missing that I don't see it
> > on my builds?  I'm using gcc 4.8.  Is there a config option that enables
> > additional warnings?
> 
> Is  this fixed? I couldn't find a relevant patch in linux-next/master for it.

Something should be posted by the end of the day.  Sorry for the delay.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Oops in audit_copy_inode

2012-08-02 Thread Eric Paris
I believe this was already found and fixed:

https://lkml.org/lkml/2012/7/25/259

Which was pulled by Linus in:

3134f37e931d75931bdf6d4eacd82a3fd26eca7c

-Eric

On Wed, 2012-08-01 at 18:11 +0200, Miklos Szeredi wrote:
> Hi Peter,
> 
> Thanks for the report.
> 
> Here's a patch.  I haven't tested it but I'm pretty confident that it
> fixes the bug.
> 
> Thanks,
> Miklos
> 
> 
> Subject: vfs: fix audit_inode on negative dentry
> From: Miklos Szeredi 
> 
> Peter Moody reported an oops in audit_copy_inode() and bisected it to commit
> 7157486541 (vfs: do_last(): common slow lookup).
> 
> The problem is that audit_inode() in do_last() is called with a negative 
> dentry.
> 
> Previously the non-O_CREAT case didn't call audit_inode() here, but now both
> O_CREAT and non-O_CREAT opens are handled by the same code.
> 
> I really have no idea why this audit_inode() is needed here at all but am 
> afaid
> to remove this for fear of breaking audit somehow.  So just fix this case by
> checking for a negative dentry.
> 
> Reported-by: Peter Moody 
> Signed-off-by: Miklos Szeredi 
> CC: sta...@vger.kernel.org
> ---
>  fs/namei.c |   10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> Index: linux-2.6/fs/namei.c
> ===
> --- linux-2.6.orig/fs/namei.c 2012-08-01 17:47:28.0 +0200
> +++ linux-2.6/fs/namei.c  2012-08-01 17:49:26.0 +0200
> @@ -2607,10 +2607,12 @@ static int do_last(struct nameidata *nd,
>   goto finish_open_created;
>   }
>  
> - /*
> -  * It already exists.
> -  */
> - audit_inode(pathname, path->dentry);
> + if (path->dentry->d_inode) {
> + /*
> +  * It already exists.
> +  */
> + audit_inode(pathname, path->dentry);
> + }
>  
>   /*
>* If atomic_open() acquired write access it is dropped now due to
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fix sel_netnode_insert suspicious rcu dereference.

2012-08-06 Thread Eric Paris
I thought I HAD committed it months ago!

Man I suck at this

On Mon, Aug 6, 2012 at 3:10 PM, Paul Moore  wrote:
> On Mon, Aug 6, 2012 at 12:49 PM, Dave Jones  wrote:
>> On Tue, Jun 05, 2012 at 01:12:39AM -0400, Dave Jones wrote:
>>  > I reported this a year ago (https://lkml.org/lkml/2011/4/20/308).
>>  > It's still a problem apparently ...
>>
>> And another two months pass in silence.
>>
>> This is happening to other people too.
>> https://bugzilla.redhat.com/show_bug.cgi?id=846037
>>
>> Can someone please apply this patch, or at least point out what's wrong with 
>> it ?
>
> I thought Eric had applied it a few months ago, but I just checked and
> I don't see it in Linus' tree for some reason?  I know multiple
> patches have been posted from different authors, all fixing the same
> thing ...
>
> Acked-by: Paul Moore 
>
> --
> paul moore
> www.paul-moore.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/1] audit: Use a tracepoint for getname

2012-09-20 Thread Eric Paris
cc'ing Jeff Layton who has recently done a lot of getname work and I
want to make sure he sees this.


On Wed, 19 Sep 2012 15:56:59 -0700
Arnaldo Carvalho de Melo  wrote:

> Al, Eric,
> 
>   Was this considered before? Acceptable?
> 
> - Arnaldo
> 
> ---
> 
> Instead of an explicit hook only for audit, use a tracepoint, so that
> other users that need to know about filenames can hook there just like
> audit.

> @@ -978,6 +986,9 @@ static int __init audit_init(void)
>   else
>   audit_sock->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
>  
> + if (register_trace_getname(audit_getname, NULL))
> + audit_panic("cannot register getname tracepoint");
> +
>   skb_queue_head_init(&audit_skb_queue);
>   skb_queue_head_init(&audit_skb_hold_queue);
>   audit_initialized = AUDIT_INITIALIZED;

I think we need to just use panic instead of audit_panic.  This early
at boot userspace would not have been able to tell the kernel that
audit_panic == panic nor would the box die later if userspace ask for
that functionality.  Instead the box would run but audit would be
broken, which customers who want audit_panic == panic would be most
upset about.

Otherwise, its good to me.

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Paris
On Wed, Aug 8, 2012 at 3:38 PM, Eric Dumazet  wrote:
> On Wed, 2012-08-08 at 15:26 -0400, Paul Moore wrote:

> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index ba39a52..027a331 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1524,6 +1524,10 @@ void ip_send_unicast_reply(struct net *net, struct 
> sk_buff *skb, __be32 daddr,
> sk->sk_priority = skb->priority;
> sk->sk_protocol = ip_hdr(skb)->protocol;
> sk->sk_bound_dev_if = arg->bound_dev_if;
> +#ifdef CONFIG_SECURITY
> +   if (!sk->sk_security && security_sk_alloc(sk, PF_INET, GFP_ATOMIC))
> +   goto out;
> +#endif
> sock_net_set(sk, net);
> __skb_queue_head_init(&sk->sk_write_queue);
> sk->sk_sndbuf = sysctl_wmem_default;
> @@ -1539,7 +1543,7 @@ void ip_send_unicast_reply(struct net *net, struct 
> sk_buff *skb, __be32 daddr,
> skb_set_queue_mapping(nskb, skb_get_queue_mapping(skb));
> ip_push_pending_frames(sk, &fl4);
> }
> -
> +out:
> put_cpu_var(unicast_sock);
>
> ip_rt_put(rt);

Seems wrong.  We shouldn't ever need ifdef CONFIG_SECURITY in core
code.  Ifndef CONF_SECURITY then security_sk_alloc() is a static
inline return 0;   I guess the question is "Where did the sk come
from"?  Why wasn't security_sk_alloc() called when it was allocated?
Should it have been updated at some time and that wasn't done either?
Seems wrong to be putting packets on the queue for a socket where the
security data was never allocated and was never set to its proper
state.

there must be a bigger bug here...

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Paris
On Wed, Aug 8, 2012 at 4:35 PM, Paul Moore  wrote:
> On Wednesday, August 08, 2012 10:09:38 PM Eric Dumazet wrote:

> Actually, the issue is that the shared socket doesn't have an init/alloc
> function to do the LSM allocation like we do with other sockets so Eric's
> patch does it as part of ip_send_unicast_reply().
>
> If we look at the relevant part of Eric's patch:
>
>  +#ifdef CONFIG_SECURITY
>  +   if (!sk->sk_security && security_sk_alloc(sk, PF_INET, GFP_ATOMIC))
>  +   goto out;
>  +#endif
>
> ... if we were to remove the CONFIG_SECURITY conditional we would end up
> calling security_sk_alloc() each time through in the CONFIG_SECURITY=n case as
> sk->sk_security would never be initialized to a non-NULL value.  In the
> CONFIG_SECURITY=y case it should only be called once as security_sk_alloc()
> should set sk->sk_security to a LSM blob.

Ifndef SECURITY this turns into (because security_sk_alloc is a static
inline in that case)

if (!sk->sk_security && 0)
goto out;

Which I'd hope the compiler would optimize.  So that only leaves us
caring about the case there CONFIG_SECURITY is true.  In that case if
we need code which does if !alloc'd then alloc it seems we broke the
model of everything else in the code and added a branch needlessly.

Could we add a __init function which does the security_sk_alloc() in
the same file where we declared them?

>> IMHO it seems wrong to even care about security for internal sockets.
>>
>> They are per cpu, shared for all users on the machine.
>
> The issue, from a security point of view, is that these sockets are sending
> network traffic; even if it is just resets and timewait ACKs, it is still
> network traffic and the LSMs need to be able to enforce security policy on
> this traffic.  After all, what would you say if your firewall let these same
> packets pass without any filtering?
>
> The issue I'm struggling with at present is how should we handle this traffic
> from a LSM perspective.  The label based LSMs, e.g. SELinux and Smack, use the
> LSM blob assigned to locally generated outbound traffic to identify the
> traffic and apply the security policy, so not only do we have to resolve the
> issue of ensuring the traffic is labeled correctly, we have to do it with a
> shared socket (although the patch didn't change the shared nature of the
> socket).
>
> For those who are interested, I think the reasonable labeling solution here is
> to go with SECINITSID_KERNEL/kernel_t for SELinux and likely the ambient label
> for Smack as in both the TCP reset and timewait ACK there shouldn't be any
> actual user data present.

I'm willing to accept that argument from an SELinux perspective.  I'd
also accept the argument that it is private and do something similar
to what we do with IS_PRIVATE on inodes.  Although sockets probably
don't have a good field to use...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Paris
On Wed, Aug 8, 2012 at 5:03 PM, Paul Moore  wrote:
> On Wednesday, August 08, 2012 04:51:56 PM Eric Paris wrote:

>> Could we add a __init function which does the security_sk_alloc() in
>> the same file where we declared them?
>
> Is it safe to call security_sk_alloc() from inside another __init function?  I
> think in both the case of SELinux and Smack it shouldn't be a problem, but I'm
> concerned about the more general case of calling a LSM hook potentially before
> the LSM has been initialized.
>
> If that isn't an issue we could probably do something in ip_init().

The security_initcall() functions should happen way before __init
functions.  If an LSM busts, it's the LSM initializing itself too late
not the code here being wrong...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ipv4: tcp: security_sk_alloc() needed for unicast_sock

2012-08-09 Thread Eric Paris
On Thu, Aug 9, 2012 at 11:36 AM, Eric Dumazet  wrote:
> On Thu, 2012-08-09 at 11:07 -0400, Paul Moore wrote:
>
>> Is is possible to do the call to security_sk_alloc() in the ip_init() 
>> function
>> or does the per-cpu nature of the socket make this a pain?
>>
>
> Its a pain, if we want NUMA affinity.
>
> Here, each cpu should get memory from its closest node.

I really really don't like it.  I won't say NAK, but it is the first
and only place in the kernel where I believe we allocate an object and
don't allocate the security blob until some random later point in
time.  If it is such a performance issue to have the security blob in
the same numa node, isn't adding a number of branches and putting this
function call on every output at least as bad?  Aren't we discouraged
from GFP_ATOMIC?  In __init we can use GFP_KERNEL.

This still doesn't fix these sockets entirely.  We now have the
security blob allocated, but it was never set to something useful.
Paul, are you looking into this?  This is a bandaide, not a fix

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ipv4: tcp: security_sk_alloc() needed for unicast_sock

2012-08-09 Thread Eric Paris
NAK.

I personally think commit be9f4a44e7d41cee should be reverted until it
is fixed.  Let me explain what all I believe it broke and how.

Old callchain of the creation of the 'equivalent' socket previous to
the patch in question just for reference:

inet_ctl_sock_create
  sock_create_kern
__sock_create
  pf->create (inet_create)
sk_alloc
  sk_prot_alloc
security_sk_alloc()


This WAS working properly.  All of it.  The equivalent struct sock was
being created and allocated in inet_create(), which called to
sk_alloc->sk_prot_alloc->security_sk_alloc().  We all agree that
failing to call security_sk_alloc() is the first regression
introduced.

The second regression was the labeling issue.  There was a call to
security_socket_post_create (from __sock_create) which was properly
setting the labels on both the socket and sock.  This new patch broke
that as well.  We don't expose an equivalent
security_sock_post_create() interface in the LSM currently, and until
we do, this can't be fixed.  It's why I say it should be reverted.

I have a patch I'm testing right now which takes care of the first
part the way I like (and yes, I'm doing the allocation on the correct
number node).  It basically looks like so:

+   for_each_possible_cpu(cpu) {
+   sock = &per_cpu(unicast_sock, cpu);
+   rc = security_sk_alloc(&sock->sk, PF_INET, GFP_KERNEL,
cpu_to_node(cpu));
+   if (rc)
+   return rc;
+   }

I'm going to work right now on exposing the equivalent struct sock LSM
interface so we can call that as well.  But it's going to take me a
bit.  Attached is the patch just to (hopefully untested) shut up the
panic.

-Eric

On Thu, Aug 9, 2012 at 10:50 AM, Eric Dumazet  wrote:
> From: Eric Dumazet 
>
> commit be9f4a44e7d41cee (ipv4: tcp: remove per net tcp_sock) added a
> selinux regression, reported and bisected by John Stultz
>
> selinux_ip_postroute_compat() expect to find a valid sk->sk_security
> pointer, but this field is NULL for unicast_sock
>
> Fix this by adding a new 'kernel' parameter to security_sk_alloc(),
> set to true if socket might already have a valid sk->sk_security
> pointer. ip_send_unicast_reply() uses a percpu fake socket, so the first
> call to security_sk_alloc() will populate sk->sk_security pointer,
> subsequent ones will reuse existing context.
>
> Reported-by: John Stultz 
> Bisected-by: John Stultz 
> Signed-off-by: Eric Dumazet 
> Cc: Paul Moore 
> Cc: Eric Paris 
> Cc: "Serge E. Hallyn" 
> ---
>  include/linux/security.h   |6 +++---
>  net/core/sock.c|2 +-
>  net/ipv4/ip_output.c   |4 +++-
>  security/security.c|4 ++--
>  security/selinux/hooks.c   |5 -
>  security/smack/smack_lsm.c |   10 --
>  6 files changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 4e5a73c..4d8e454 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -1601,7 +1601,7 @@ struct security_operations {
> int (*socket_sock_rcv_skb) (struct sock *sk, struct sk_buff *skb);
> int (*socket_getpeersec_stream) (struct socket *sock, char __user 
> *optval, int __user *optlen, unsigned len);
> int (*socket_getpeersec_dgram) (struct socket *sock, struct sk_buff 
> *skb, u32 *secid);
> -   int (*sk_alloc_security) (struct sock *sk, int family, gfp_t 
> priority);
> +   int (*sk_alloc_security) (struct sock *sk, int family, gfp_t 
> priority, bool kernel);
> void (*sk_free_security) (struct sock *sk);
> void (*sk_clone_security) (const struct sock *sk, struct sock *newsk);
> void (*sk_getsecid) (struct sock *sk, u32 *secid);
> @@ -2539,7 +2539,7 @@ int security_sock_rcv_skb(struct sock *sk, struct 
> sk_buff *skb);
>  int security_socket_getpeersec_stream(struct socket *sock, char __user 
> *optval,
>   int __user *optlen, unsigned len);
>  int security_socket_getpeersec_dgram(struct socket *sock, struct sk_buff 
> *skb, u32 *secid);
> -int security_sk_alloc(struct sock *sk, int family, gfp_t priority);
> +int security_sk_alloc(struct sock *sk, int family, gfp_t priority, bool 
> kernel);
>  void security_sk_free(struct sock *sk);
>  void security_sk_clone(const struct sock *sk, struct sock *newsk);
>  void security_sk_classify_flow(struct sock *sk, struct flowi *fl);
> @@ -2667,7 +2667,7 @@ static inline int 
> security_socket_getpeersec_dgram(struct socket *sock, struct s
> return -ENOPROTOOPT;
>  }
>
> -static inline int security_sk_alloc(struct sock *sk, int family, gfp_t 
&g

Re: Friendlier EPERM - Request for input

2013-01-10 Thread Eric Paris
On Fri, 2013-01-11 at 00:14 +0900, Tetsuo Handa wrote:
> The reason I think is that people turn off LSMs because they are using LSMs
> without understanding "what the current configuration is" and/or "how to 
> change
> configuration". People do not spend (or cannot afford spending) resources for
> understanding LSM's configuration.

This is not the point I am arguing.  This is not about LSMs, how hard
they are to configure, or how to 'fix' them.  It certainly isn't about
how one LSM is better, easier, or superior to another.  This is about
getting more information in userspace when operations fail.  I'll quote
an off list e-mail I received:

Friendlier/more complete error messages would eliminate an awful lot of
digging around trying to figure *what* the problem is, preparatory to
discerning *where* the problem is and *how* to fix it.

There are so many things that might go into fixing problems in an LSM.
That's what you are talking about, but it isn't relevant here.  This is
about knowing WHAT the problem is, maybe helping with where and how.
And this isn't just about LSM.  Heck, LSMs are just a small part of it.
I want extended errno interface to talk about DAC, capabilities, ACLs,
LSMs, everything!

> > The audit log is complete crap from a usability PoV.  If a web admin is
> 
> TOMOYO's audit log is very useful. TOMOYO is a security tool but is also 
> useful
> for education/training/debugging/development/profiling etc.

The TOMOYO audit log is a very poor fit for this as well.  I'm not
trying to be rude, but there is no reasonable way for applications to
use it, it is TOMOYO specific, and it doesn't cover non-LSM errors.  I
want applications like httpd to be able to put what went wrong in its
log message.  I want python to be able to get extended information and
present that up the stack.  Nothing we have today comes close.  My
proposal isn't perfect, it suffers from the same problems as errno
(except even worse because it is harder to save and restore), but at
least it will usually be helpful...

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] ima: policy search speedup

2012-12-12 Thread Eric Paris
On Wed, Dec 12, 2012 at 8:56 AM, Kasatkin, Dmitry
 wrote:

> I have done few tests.
> Ratio between lookups is different, but I do not really remember what
> exactly it was.
> Probably I did measurement with directory integrity protection...
>
> First test is done using upstream IMA.
>
> IMA-appraisal
> [   59.554072] counter1 - before (inode->i_flags & S_NOIMA): 74586
> [   59.554628] counter2 - after (inode->i_flags & S_NOIMA): 74558
> [   59.555024] counter3 - after (inode->i_sb->s_feature_flags & SF_NOIMA): 
> 52895
>
> You can see 3 counters after 55 seconds uptime.
> Counters count entries to the policy search function.
> First counter counts every entry.
> Second counter counts if control passes after inode flag S_NOIMA.
> Third counts if control passes after SB flag SF_NOIMA.
>
> As you can see, inode flag does not make a difference - just 28
> entries differences.

That is quite shocking to me that on system start up we only ever
reference the same /proc file 28 times.  Wow.  I just don't know what
to say.  Obviously the per inode flag is dead.  Now there is probably
some CPU benefit to a per-sb flag, but I'm not sure it is worth the
code to do it if you can't really measure that as something that makes
a difference in userspace.

Maybe if you can show that after your integrity protection work is
done we can revisit.  But I'd think this patch is dead until you can
show the impact on real userspace.  Someone else might disagree, but
that's my thought.

Sorry.

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] audit: clean up refcounting in audit-tree

2012-07-06 Thread Eric Paris
On Mon, 2012-06-25 at 19:46 +0200, Miklos Szeredi wrote:
> From: Miklos Szeredi 
> 
> Drop the initial reference by fsnotify_init_mark early instead of
> audit_tree_freeing_mark() at destroy time.
> 
> In the cases we destroy the mark before we drop the initial reference we need 
> to
> get rid of the get_mark that balances the put_mark in 
> audit_tree_freeing_mark().
> 
> Signed-off-by: Miklos Szeredi 

I don't love this.  The reference from fsnotify_init_mark() should live
until fsnotify_mark_destroy().  If we need to drop a reference in
audit_tree_freeing_mark() we should be taking that elsewhere

> ---
>  kernel/audit_tree.c |   12 +---
>  1 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> index 31fdc48..7b95706 100644
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -292,6 +292,7 @@ static void untag_chunk(struct node *p)
>   spin_unlock(&hash_lock);
>   spin_unlock(&entry->lock);
>   fsnotify_destroy_mark(entry);
> + fsnotify_put_mark(&new->mark);  /* drop initial reference */
>   goto out;
>  
>  Fallback:
> @@ -330,7 +331,6 @@ static int create_chunk(struct inode *inode, struct 
> audit_tree *tree)
>   spin_unlock(&hash_lock);
>   chunk->dead = 1;
>   spin_unlock(&entry->lock);
> - fsnotify_get_mark(entry);
>   fsnotify_destroy_mark(entry);
>   fsnotify_put_mark(entry);
>   return 0;
> @@ -346,6 +346,7 @@ static int create_chunk(struct inode *inode, struct 
> audit_tree *tree)
>   insert_hash(chunk);
>   spin_unlock(&hash_lock);
>   spin_unlock(&entry->lock);
> + fsnotify_put_mark(entry);   /* drop initial reference */
>   return 0;
>  }
>  
> @@ -411,7 +412,6 @@ static int tag_chunk(struct inode *inode, struct 
> audit_tree *tree)
>   spin_unlock(&chunk_entry->lock);
>   spin_unlock(&old_entry->lock);
>  
> - fsnotify_get_mark(chunk_entry);
>   fsnotify_destroy_mark(chunk_entry);
>  
>   fsnotify_put_mark(chunk_entry);
> @@ -444,6 +444,7 @@ static int tag_chunk(struct inode *inode, struct 
> audit_tree *tree)
>   spin_unlock(&chunk_entry->lock);
>   spin_unlock(&old_entry->lock);
>   fsnotify_destroy_mark(old_entry);
> + fsnotify_put_mark(chunk_entry); /* drop initial reference */
>   fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */
>   return 0;
>  }
> @@ -915,7 +916,12 @@ static void audit_tree_freeing_mark(struct fsnotify_mark 
> *entry, struct fsnotify
>   struct audit_chunk *chunk = container_of(entry, struct audit_chunk, 
> mark);
>  
>   evict_chunk(chunk);
> - fsnotify_put_mark(entry);
> +
> + /*
> +  * We are guaranteed to have at least one reference to the mark from
> +  * either the inode or the caller of fsnotify_destroy_mark().
> +  */
> + BUG_ON(atomic_read(&entry->refcnt) < 1);
>  }
>  
>  static bool audit_tree_send_event(struct fsnotify_group *group, struct inode 
> *inode,


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] audit: fix refcounting in audit-tree

2012-07-06 Thread Eric Paris
On Mon, 2012-06-25 at 19:46 +0200, Miklos Szeredi wrote:
> From: Miklos Szeredi 
> 
> Refcounting of fsnotify_mark in audit tree is broken.  E.g:
> 
>   refcount
> create_chunk
>   alloc_chunk 1
>   fsnotify_add_mark   2
> 
> untag_chunk
>   fsnotify_get_mark   3
>   fsnotify_destroy_mark
> audit_tree_freeing_mark   2
>   fsnotify_put_mark   1
>   fsnotify_put_mark   0
>   via destroy_list
> fsnotify_mark_destroy-1
> 
> This was reported by various people as triggering Oops when stopping auditd.
> 
> We could just remove the put_mark from audit_tree_freeing_mark() but that 
> would
> break freeing via inode destruction.  So this patch simply omits a put_mark
> after calling destroy_mark (or adds a get_mark before).
> 
> Next patch will clean up the remaining mess.
> 
> Signed-off-by: Miklos Szeredi 
> CC: sta...@vger.kernel.org

Agreed this is needed.  My changelog was:

audit: fix ref count problems in audit trees

Before the switch from in kernel inotify to fsnotify for audit trees the
code regularly did:
inotify_evict_watch(&chunk->watch);
put_inotify_watch(&chunk->watch);

I translated this in fsnotify_speak into:
fsnotify_destroy_mark_by_entry(chunk_entry);
fsnotify_put_mark(chunk_entry);

The problem is that the inotify_evict_watch function actually took a
reference on chunk->watch, which is what was being dropped by
put_inotify_watch().  The fsnotify code does not take such a reference
during fsnotify_destroy_mark_by_entry().  Thus we are dropping reference
counts prematurely and eventually we hit a use after free!  Whoops!

Fix these call sites to not drop the extra reference.

Reported-by: Valentin Avram 
Reported-by: Peter Moody 
Partial-patch-by: Marcelo Cerri 
Signed-off-by: Eric Paris 

Maybe you can use some of that changelog in your next post?  (If you do
one?)  The only reason you would repost is because I don't understand
why you took a ref in some places instead of just not dropping it
everywhere...

> ---
>  kernel/audit_tree.c |5 ++---
>  1 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> index d52d247..31fdc48 100644
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -250,7 +250,6 @@ static void untag_chunk(struct node *p)
>   spin_unlock(&hash_lock);
>   spin_unlock(&entry->lock);
>   fsnotify_destroy_mark(entry);
> - fsnotify_put_mark(entry);
>   goto out;
>   }
>  
> @@ -293,7 +292,6 @@ static void untag_chunk(struct node *p)
>   spin_unlock(&hash_lock);
>   spin_unlock(&entry->lock);
>   fsnotify_destroy_mark(entry);
> - fsnotify_put_mark(entry);
>   goto out;
>  
>  Fallback:
> @@ -332,6 +330,7 @@ static int create_chunk(struct inode *inode, struct 
> audit_tree *tree)
>   spin_unlock(&hash_lock);
>   chunk->dead = 1;
>   spin_unlock(&entry->lock);
> + fsnotify_get_mark(entry);
>   fsnotify_destroy_mark(entry);
>   fsnotify_put_mark(entry);
>   return 0;

Like here?  Why not just avoid the atomic op altogether?

> @@ -412,6 +411,7 @@ static int tag_chunk(struct inode *inode, struct 
> audit_tree *tree)
>   spin_unlock(&chunk_entry->lock);
>   spin_unlock(&old_entry->lock);
>  
> + fsnotify_get_mark(chunk_entry);
>   fsnotify_destroy_mark(chunk_entry);
>  
>   fsnotify_put_mark(chunk_entry);
> @@ -445,7 +445,6 @@ static int tag_chunk(struct inode *inode, struct 
> audit_tree *tree)
>   spin_unlock(&old_entry->lock);
>   fsnotify_destroy_mark(old_entry);
>   fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */
> - fsnotify_put_mark(old_entry); /* and kill it */
>   return 0;
>  }
>  


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] audit: don't free_chunk() after fsnotify_add_mark()

2012-07-06 Thread Eric Paris
On Mon, 2012-06-25 at 19:46 +0200, Miklos Szeredi wrote:
> From: Miklos Szeredi 
> 
> Don't do free_chunk() after fsnotify_add_mark().  That one does a delayed 
> unref
> via the destroy list and this results in use-after-free.
> 
> Signed-off-by: Miklos Szeredi 
> CC: sta...@vger.kernel.org

Al, can you send these along?  I am not going to see a computer again
for a month.  Hurray!

Acked-by: Eric Paris 

> ---
>  kernel/audit_tree.c |6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> index 5bf0790..d52d247 100644
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -259,7 +259,7 @@ static void untag_chunk(struct node *p)
>  
>   fsnotify_duplicate_mark(&new->mark, entry);
>   if (fsnotify_add_mark(&new->mark, new->mark.group, new->mark.i.inode, 
> NULL, 1)) {
> - free_chunk(new);
> + fsnotify_put_mark(&new->mark);
>   goto Fallback;
>   }
>  
> @@ -322,7 +322,7 @@ static int create_chunk(struct inode *inode, struct 
> audit_tree *tree)
>  
>   entry = &chunk->mark;
>   if (fsnotify_add_mark(entry, audit_tree_group, inode, NULL, 0)) {
> - free_chunk(chunk);
> + fsnotify_put_mark(entry);
>   return -ENOSPC;
>   }
>  
> @@ -396,7 +396,7 @@ static int tag_chunk(struct inode *inode, struct 
> audit_tree *tree)
>   fsnotify_duplicate_mark(chunk_entry, old_entry);
>   if (fsnotify_add_mark(chunk_entry, chunk_entry->group, 
> chunk_entry->i.inode, NULL, 1)) {
>   spin_unlock(&old_entry->lock);
> - free_chunk(chunk);
> + fsnotify_put_mark(chunk_entry);
>   fsnotify_put_mark(old_entry);
>   return -ENOSPC;
>   }


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: inotify: notify IN_DELETE_SELF when file is deleted or inode deleted ?

2012-07-06 Thread Eric Paris
It is inode oriented.  Makes it a pain to work with, but that's how it
is.  Sorry!

On Fri, Jun 29, 2012 at 10:06 AM, Pierre PEIFFER
 wrote:
> Hi,
>
> By playing with inotify (on user side) to know whether the file I'm using is
> deleted by someone else,  I have noted that I do not receive the
> IN_DELETE_SELF event, exactly because I'm using it. By checking into kernel
> area, I see that at unlink(), an IN_ATTRIB event is sent, but IN_DELETE_SELF
> is, indeed, sent only when the inode is deleted.
>
> But such IN_ATTRIB event doesn't tell to the user what has changed among
> permissions, timestamps, link count, etc...
> So it doesn't much help. Of course, I have noted that I can monitor the
> parent directory for IN_DELETE and then check which file has been deleted;
> few more stuff to do but it works, no pb.
>
> But I'm still wondering after reading in the man.:
>IN_DELETE_SELFWatched file/directory was itself deleted.
> Is this really the expected behavior ? Shouldn't the kernel trig such event
> at unlink() ? Or is inotify clearly inode oriented ?
>
> Thanks,
>
> Pierre
>
> PS: keep me in cc, I'm not subscribed to the list.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Friendlier EPERM - Request for input

2013-01-09 Thread Eric Paris
Getting an EPERM/EACCES in userspace really kinda blows.  As a user you
don't have any idea why you got it.  It could be SELinux, it could be
rwx bits on the file, it could be a missing capability, it could be an
ACL, it could be who knows what.  We'd like to start figuring out the
who knows what and hopefully find a way to expose that to userspace.  My
initial thought is a small buffer in the task struct in which the kernel
can dump some data when it is going to send back an EPERM.  I was
thinking of exposing that data via a /proc/self/task/[tid]/file which
userspace could poll after an EPERM.  The hope would be to all userspace
a better understanding of why it failed.  Wouldn't it be nice if instead
of httpd log just saying 'permission denied' it would be able to say
'permision denied because the file was 640"?

I was thinking this buffer would look something like (very pseudo codie)

struct friendly_response {
enum { NONE, CAPABILITIES, SELINUX, FILE_PERMS } type;
union {
struct capabilities {
int cap;
};
struct selinux {
__u32 source_label
__u32 target_label
__u32 operation;
__u32 tclass;
};
struct file_perms {
int uid;
int gid;
int file_uid;
int file_gid;
int file_mode;
};
};
};

The idea being that the buffer should only hold stateless data which can
be overwritten or destroyed on the next EPERM.  Reading the proc file
could then translate that into something a little more useful, but
hopefully still computer friendly enough that it can be translated and
useful to a user/developer.  So after a denial from file permissions the
proc file would return a string like:

3 500 500 0 0 0640
^  ^   ^  ^ ^  ^--- Mode
|  |   |  | |--- File Gid
|  |   |  |---File Uid
|  |   |--- User Gid
|  |---User Uid
|--- Type for failure (file perms)

Or from SELinux:

2 unconfined_u:unconfined_r:unconfined_t:s0 system_u:system_r:system_t:s0 { 
read } file

This would take patching hundreds of call sites in the kernel to fill
the buffer.  But just getting extended denial information in a couple of
the hot spots would be a huge win.  Put it in capable(), LSM hooks, the
open() syscall and path walk code.

It was pointed out that this isn't perfect, and I'm well aware of that.
Lots of userspace libraries do things like save errno, make other kernel
calls which could overwrite the buffer, then pass errno up the stack.  I
don't believe there is much that can done about that.  Other than
userspace trying to save the extended error information as well?

One suggestion was to make it a ring buffer of some configurable size
instead.  I'm not sure how a program could find the 'right' message in
the ring buffer, but it is a thought to help the problem of intervening
kernel calls overwriting the last extended error info.  I was pointed to
the win32 api for getLastError() as something to look at.  This API is
mostly like errno/strerror in libc, except without the easy ability to
save it for later.  My new proposed API would have the same problems.
No 'easy' way to save it for later.  (also getLastError() doesn't pass
any kind of extended information, so it seems to solve a different
problem and I haven't seen any OS that is doing this, so point me to
places I can learn)

Another thought was that some people might want to turn off the ability
to get this information.  I don't see a reason a sysctl couldn't be used
to disable extended errno information if the admin so chose.

But maybe those great minds on the lists can help me think of ways to
get Friendlier denials that I haven't thought of.  Please.  What are you
thoughts, concerns, issues?

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Friendlier EPERM - Request for input

2013-01-09 Thread Eric Paris
>On Wed, 2013-01-09 at 11:04 -0500, Eric Paris wrote:
> >Getting an EPERM/EACCES in userspace really kinda blows.  As a user you
> >don't have any idea why you got it.  

Stephen Smalley wrote:
> What if the denial was due to lacking sufficient permission to stat
> the file (in which case you shouldn't leak the file's attributes to the
> caller)?
>
> Also, the present inability of (unprivileged) userspace to distinguish
> DAC from MAC denials is relied upon to avoid any covert channel when
> the DAC check happens before the MAC check, as otherwise you leak
> information about the DAC attributes.  If you are going to indicate to
> all of userspace whether it was DAC or MAC, you have to make sure that
> the MAC check / LSM hook always comes first in every case.  Which then
> leads to lots of unnecessary MAC audit messages that would have been
> denied by DAC anyway.

I'd have to fall back to my original thought about who to expose this
information to:

> >Another thought was that some people might want to turn off the ability
> >to get this information.  I don't see a reason a sysctl couldn't be used
> >to disable extended errno information if the admin so chose.

On systems with a strict security policy worried about such things this
would quite reasonably need to be disabled.  But most of the reason
people turn off LSMs is because it gets in the way and they get pissed
getting an EPERM, checking rwx bits, having no idea WTF happened, and
then being REALLY pissed at the LSM when they figure out that was the
problem.  I want to put it right there, out front, here is what went
wrong for the common case.

I mean, we sorta, kinda sometimes, poorly give this information.  Isn't
DAC usually EPERM and LSMs usually EACCES.  From a DAC PoV exposing the
DAC info is completely reasonable.  You can't hide a stat(), even
setting chmod 000.  So this is an LSM issue.  Not an issue for
DAC/Capabilities/ACLs.  We can add an LSM hook on the /proc file to make
sure nothing can get the data even if the global sysctl allows such
access.

We could if it was worth it make filling in the extended dial
information an LSM hook where you can check on a per denial basis.  But
I don't see the need.

> Privileged userspace already has a way to identify when it was a MAC
> denial, via the audit logs.  But those are considered
> security-sensitive, and so exposing that same information to
> unprivileged userspace will be a problem.

The audit log is complete crap from a usability PoV.  If a web admin is
trying to figure out why his web server isn't serving out web pages
in /home he's going to look in the web server logs and see EPERM.
That's it.  Now he has to start blindly guessing which of the many
possible reasons for EPERM is his problem.  Wouldn't it be better if the
information was actually available?  Using the audit log in real time is
a bloody nightmare.  As the audit maintainer I know how to use ausearch,
but then again, as the SELinux maintainer I know how to figure out which
of the many possibilities from EPERM are the cause.  But I want to make
it easier for a normal admin to figure that out.

I know many people are worried about information leaks, so I'll right up
front say lets add the sysctl to disable the interface for those who are
concerned about the metadata information leak.  But for most of us I
want that data right when it happens, where it happens, so It can be
exposed, used, and acted upon by the admin trying to troubleshoot why
the shit just hit the fan.

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Friendlier EPERM - Request for input

2013-01-09 Thread Eric Paris
On Wed, 2013-01-09 at 12:14 -0800, Casey Schaufler wrote:
> On 1/9/2013 11:43 AM, Eric Paris wrote:

> > I know many people are worried about information leaks, so I'll right up
> > front say lets add the sysctl to disable the interface for those who are
> > concerned about the metadata information leak.  But for most of us I
> > want that data right when it happens, where it happens, so It can be
> > exposed, used, and acted upon by the admin trying to troubleshoot why
> > the shit just hit the fan.
> 
> How on earth is your webadmin supposed to match the failure
> of a system call with the content of /proc/8675309/whydiditfail
> at the time of the failure? It's not like he's going to go into
> the apache source and add code to look at what's there.
> 
> Unfortunately, what you probably need is an interface that gives
> the program access to the audit records it has generated.

The first thought was wanting to add it by default to perror/strerror in
libc, but libc guys didn't think it reasonable since it couldn't be
save/restored across calls, like errno.  Instead I hope for a libc
function, something like char *get_extended_error_info(void), which
outputs a localized string based on the information in the /proc
interface.

Programs like httpd would need to be changed to use this function when
logging denials.  Programs like python would use this interface to
populate additional information in the exception they pass up the stack.

I agree that something access to the audit record is appropriate.  It's
sorta like what I'm proposing.  But audit is not really useful here.  It
doesn't collect information about any denial other than LSM.  And the
number of DAC or capabilities denials on a normally operating box is, I
expect, quite large.  Thus we can't pump them all into audit just on the
unlikely chance something cares.  Setting a couple of ints is cheap.
Audit is really nice for people will well defined security goals and
information retention and analysis purposes, but it kinda completely
sucks butt for a normal sysadmin or developer...

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Friendlier EPERM - Request for input

2013-01-09 Thread Eric Paris
On Wed, 2013-01-09 at 12:53 -0800, Casey Schaufler wrote:

> Let me try again, I think I didn't quite get the idea across.
> 
> I'm suggesting that the string returned by get_extended_error_info()
> ought to be the audit record the system call would generate, regardless
> of whether the audit system would emit it or not.
> If the audit record doesn't have the information you need we should
> fix the audit system to provide it. Any bit of the information in
> the audit record might be relevant, and your admin or developer might
> need to see it.
> 
> I'm suggesting using the audit record because there are tools to
> examine them and it's a pity to use a different format instead of
> fixing the one that's already there.

I get the point.  My problem with using audit records is that they have
to be stored on disk, forever.  We have to store a record on disk for
EVERY denial because of rwx bits, acls, capabilities, LSM, etc.  We
don't do that today and I'm scared of disk growth explosion.  Then we
could have a kernel interface, say get_last_audit_record(), which could
query the audit system for that record number.

A thought on disk size explosion might be something like generating
these records in the kernel and just store them in the task struct until
some later point in time.  If userspace calls get_last_audit_record() we
might be able to dump the record to auditd.  If another record comes
along we have to free the last one and replace it.  Lot more of a perf
hit than setting a couple of ints and taking the hit at the time when
userspace actually wants to collect/use this information.

But are we just building up a rube goldburg machine?  I don't see a
problem storing the last audit record if it exists, but I don't like
making audit part of the normal workflow.  I'd do it if others like that
though

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Friendlier EPERM - Request for input

2013-01-09 Thread Eric Paris
On Wed, 2013-01-09 at 21:59 +0100, Jakub Jelinek wrote:
> On Wed, Jan 09, 2013 at 12:53:40PM -0800, Casey Schaufler wrote:
> > I'm suggesting that the string returned by get_extended_error_info()
> > ought to be the audit record the system call would generate, regardless
> > of whether the audit system would emit it or not.
> 
> What system call would that info be for and would it be reset on next
> syscall that succeeded, or also failed?
> 
> The thing is, various functions e.g. perform some syscall, save errno, do
> some other syscall, and if they decide that the first syscall should be what
> determines the whole function's errno, just restore errno from the saved
> value and return.  Similarly, various functions just set errno upon
> detecting some error condition in userspace.
> There is no 1:1 mapping between many libc library calls and syscalls.
> So, when would it be safe to call this new get_extended_error_info function
> and how to determine to which syscall it was relevant?

I was thinking of it to be the last kernel error.  So if the first and
that second operation caused the kernel to want to make available
extended errno information you would end up with the second.  I see this
is an informative piece of information, not normative.  Not a
replacement for errno.  I'm hoping for a best effort way to provide
extended errno information.

It would be really neat for libc to have a way to save and restore the
extended errno information, maybe even supply its own if it made the
choice in userspace, but that sounds really hard for the first pass.

I mean it would be great if we could rewrite every system call with a
cookie so userspace could reliably match things back up, but I just
don't see that as practical.  Instead we do the best we can and help
admins and developers most of the time, instead of none of the time.

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/3] Audit: break a large single execve argument into smaller records

2007-10-08 Thread Eric Paris
support single arguments that are large, not just large lists of execve 
args.
This also means we never have to get a kernel buffer larger than
MAX_EXECVE_AUDIT_LEN no matter how large the argument is.  Before this patch
we could need to allocate 32 consecutive pages to hold one argument which 
could
pretty easily oom.

a single argument larger than MAX_EXECVE_AUDIT_LEN is broken into multiple
records and have a format like   a10[0] a10[1] a10[2] etc.

Signed-off-by: Eric Paris <[EMAIL PROTECTED]>
---

example audit log (about 50k long) for the whole patch series can be
found at http://people.redhat.com/~eparis/audit/audit.log the execve in
question was something like:

program_name [about 50 arguments] [one argument which is about 17k long] [about 
1000 arguments]

 kernel/auditsc.c |   42 +
 1 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 4176db6..ffc8d4b 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -853,6 +853,48 @@ static void audit_log_execve_info(struct audit_context 
*context,
send_sig(SIGKILL, current, 0);
}
 
+   if (unlikely(len > MAX_EXECVE_AUDIT_LEN)) {
+   /* deal with single arugments > MAX_EXECVE_AUDIT_LEN */
+   int j;
+   const long tmplen = sizeof(char) * MAX_EXECVE_AUDIT_LEN;
+
+   buf = kmalloc(tmplen + 1, GFP_KERNEL);
+   if (!buf) {
+   audit_panic("out of memory for argv string\n");
+   return;
+   }
+   buf[tmplen] = '\0';
+   for (j = 0; len > 0; j++) {
+   if (len > tmplen) {
+   ret = copy_from_user(buf, p, tmplen);
+   p += tmplen;
+   len -= tmplen;
+   } else {
+   ret = copy_from_user(buf, p, len);
+   /* p is at the next arg */
+   p += len;
+   /* 27 is the max length of a%d[%d] */
+   len_sent = len + 27;
+   len  = 0;
+   }
+   if (ret) {
+   WARN_ON(1);
+   send_sig(SIGKILL, current, 0);
+   }
+   audit_log_end(*ab);
+   *ab = audit_log_start(context, GFP_KERNEL,
+ AUDIT_EXECVE);
+   if (!*ab) {
+   kfree(buf);
+   return;
+   }
+   audit_log_format(*ab, "a%d[%d]=", i, j);
+   audit_log_untrustedstring(*ab, buf);
+   audit_log_format(*ab, "\n");
+   }
+   continue;
+   }
+
buf = kmalloc(len, GFP_KERNEL);
if (!buf) {
audit_panic("out of memory for argv string\n");


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/3] Audit: break up execve argument lists into multiple records

2007-10-08 Thread Eric Paris
Break the auditing of a list of execve arguments into smaller records if 
there
are a too many.  The limit is currently around 7.5k of arguments as 
userspace
has an 8k buffer limit and will drop messages which are longer.

Signed-off-by: Eric Paris <[EMAIL PROTECTED]>
---
Basically the same patch as last time, used a #define, cleaned up a
memory leak on a malloc failure code path.  Other than that it's the
same.

7500 also is a good size because it means we never need more than a 2 page 
allocation.
I'd say this is a good thing as even if we used a full netlink message of 32k 
its just
making it harder on the kernel to have the memory it needs.  I think userspace 
will want
to get fixed eventually to handle a full 32k just in case, but keeping the 
kernel under
8k when we know we can seems like a good idea.

 kernel/auditsc.c |   39 +--
 1 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 04f3ffb..4176db6 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -78,6 +78,9 @@ extern struct list_head audit_filter_list[];
 /* Indicates that audit should log the full pathname. */
 #define AUDIT_NAME_FULL -1
 
+/* no execve audit message should be longer than this (userspace limits) */
+#define MAX_EXECVE_AUDIT_LEN 7500
+
 /* number of audit rules */
 int audit_n_rules;
 
@@ -819,11 +822,12 @@ static int audit_log_pid_context(struct audit_context 
*context, pid_t pid,
return rc;
 }
 
-static void audit_log_execve_info(struct audit_buffer *ab,
+static void audit_log_execve_info(struct audit_context *context,
+   struct audit_buffer **ab,
struct audit_aux_data_execve *axi)
 {
int i;
-   long len, ret;
+   long len, ret, len_sent = 0;
const char __user *p;
char *buf;
 
@@ -833,7 +837,11 @@ static void audit_log_execve_info(struct audit_buffer *ab,
p = (const char __user *)axi->mm->arg_start;
 
for (i = 0; i < axi->argc; i++, p += len) {
+   char tmp_buf[12];
+   /* how many digits are in i? */
+   int i_len = snprintf(tmp_buf, 12, "%d", i);
len = strnlen_user(p, MAX_ARG_STRLEN);
+
/*
 * We just created this mm, if we can't find the strings
 * we just copied into it something is _very_ wrong. Similar
@@ -862,9 +870,28 @@ static void audit_log_execve_info(struct audit_buffer *ab,
send_sig(SIGKILL, current, 0);
}
 
-   audit_log_format(ab, "a%d=", i);
-   audit_log_untrustedstring(ab, buf);
-   audit_log_format(ab, "\n");
+   /*
+* If there are a lot of args just break them into multiple
+* messages.  the last ab started will get closed by the
+* caller.
+*
+* + 3 + i_len because we know at least a = and \n will be sent
+* as well as the number of digits in i (i_len).
+*/
+   len_sent += (len + 3 + i_len);
+   if (len_sent > MAX_EXECVE_AUDIT_LEN) {
+   len_sent = len + 3 + i_len;
+   audit_log_end(*ab);
+   *ab = audit_log_start(context, GFP_KERNEL, 
AUDIT_EXECVE);
+   if (!*ab) {
+   kfree(buf);
+   return;
+   }
+   }
+
+   audit_log_format(*ab, "a%d=", i);
+   audit_log_untrustedstring(*ab, buf);
+   audit_log_format(*ab, "\n");
 
kfree(buf);
}
@@ -1010,7 +1037,7 @@ static void audit_log_exit(struct audit_context *context, 
struct task_struct *ts
 
case AUDIT_EXECVE: {
struct audit_aux_data_execve *axi = (void *)aux;
-   audit_log_execve_info(ab, axi);
+   audit_log_execve_info(context, &ab, axi);
break; }
 
case AUDIT_SOCKETCALL: {


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/3] Audit: remove the limit on execve arguments when audit is running

2007-10-08 Thread Eric Paris
Remove the limitation on argv size.  The audit system now logs arguments in
smaller chunks (currently about 8k due to userspace audit system buffer 
sizes)
so this is no longer a requirement.

Signed-off-by: Eric Paris <[EMAIL PROTECTED]>
Acked-by: Peter Zijlstra <[EMAIL PROTECTED]>

---

This patch hasn't changed since the last series, just reposted as 3/3 and 
rediffed.

 kernel/auditsc.c |   10 --
 kernel/sysctl.c  |   11 ---
 2 files changed, 0 insertions(+), 21 deletions(-)

diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index ffc8d4b..5d39727 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1917,8 +1917,6 @@ int __audit_ipc_set_perm(unsigned long qbytes, uid_t uid, 
gid_t gid, mode_t mode
return 0;
 }
 
-int audit_argv_kb = 32;
-
 int audit_bprm(struct linux_binprm *bprm)
 {
struct audit_aux_data_execve *ax;
@@ -1927,14 +1925,6 @@ int audit_bprm(struct linux_binprm *bprm)
if (likely(!audit_enabled || !context || context->dummy))
return 0;
 
-   /*
-* Even though the stack code doesn't limit the arg+env size any more,
-* the audit code requires that _all_ arguments be logged in a single
-* netlink skb. Hence cap it :-(
-*/
-   if (bprm->argv_len > (audit_argv_kb << 10))
-   return -E2BIG;
-
ax = kmalloc(sizeof(*ax), GFP_KERNEL);
if (!ax)
return -ENOMEM;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 53a456e..88e5d06 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -77,7 +77,6 @@ extern int percpu_pagelist_fraction;
 extern int compat_log;
 extern int maps_protect;
 extern int sysctl_stat_interval;
-extern int audit_argv_kb;
 
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
@@ -347,16 +346,6 @@ static ctl_table kern_table[] = {
.mode   = 0644,
.proc_handler   = &proc_dointvec,
},
-#ifdef CONFIG_AUDITSYSCALL
-   {
-   .ctl_name   = CTL_UNNUMBERED,
-   .procname   = "audit_argv_kb",
-   .data   = &audit_argv_kb,
-   .maxlen = sizeof(int),
-   .mode   = 0644,
-   .proc_handler   = &proc_dointvec,
-   },
-#endif
{
.ctl_name   = KERN_CORE_PATTERN,
.procname   = "core_pattern",


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Allow Kconfig to set default mmap_min_addr protection

2007-12-21 Thread Eric Paris

On Thu, 2007-12-20 at 00:29 +0100, Jan Engelhardt wrote:
> On Dec 19 2007 16:59, Eric Paris wrote:
> >
> >+config SECURITY_DEFAULT_MMAP_MIN_ADDR
> >+int "Low address space to protect from user allocation"
> 
> Hm, should not this be 'hex'?

I guess it could be, but the input for /proc/sys/vm/mmap_min_addr is
base 10 as well so I figured consistency was a good thing.  Do you have
strong feelings?  I guess so since you posted about it.

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Allow Kconfig to set default mmap_min_addr protection

2008-01-02 Thread Eric Paris

On Fri, 2007-12-21 at 23:59 +0100, Jan Engelhardt wrote:
> On Dec 21 2007 14:35, Greg KH wrote:
> >> >> >I guess it could be, but the input for /proc/sys/vm/mmap_min_addr is
> >> >> >base 10 as well
> >> >> 
> >> >> sysfs is autobase, i.e. echo "0xb000" >/sys/foo will Do The Right Thing.
> >> >
> >> >yes but if you cat  /proc/sys/vm/mmap_min_addr, it returns in base 10.
> >> 
> >> sysfs should probably be tuned to output it in a preferred base.
> >
> >Again, this is sysctl, not sysfs.  two very different things...
> >
> Argh... :)  Just shows that /proc is the wrong place for system variables.
> 
> Well, module_params(integer) are autobase, and that's all I needed so 
> far :-D

So in the end we are all happy with the original patch I sent?

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] VM/Security: add security hook to do_brk

2007-12-04 Thread Eric Paris
Given a specifically crafted binary do_brk() can be used to get low
pages available in userspace virtually memory and can thus be used to
circumvent the mmap_min_addr low memory protection.  Add security checks
in do_brk().

Signed-off-by: Eric Paris <[EMAIL PROTECTED]>

---

 mm/mmap.c |4 
 1 file changed, 4 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index f4cfc6a..15678aa 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1941,6 +1941,10 @@ unsigned long do_brk(unsigned long addr, unsigned long 
len)
if (is_hugepage_only_range(mm, addr, len))
return -EINVAL;
 
+   error = security_file_mmap(0, 0, 0, 0, addr, 1);
+   if (error)
+   return error;
+
flags = VM_DATA_DEFAULT_FLAGS | VM_ACCOUNT | mm->def_flags;
 
error = arch_mmap_check(addr, len, flags);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] security: allow capable check to permit mmap or low vm space

2007-11-20 Thread Eric Paris
On Sat, 2007-11-17 at 09:12 +1100, James Morris wrote:
> On Fri, 16 Nov 2007, Eric Paris wrote:
> 
> > When this protection was originally concieved it intentionally was
> > offing something even without an more 'full featured' LSM.  That was the
> > whole reason I had to drop the secondary stacking hook inside the
> > selinux code.
> > 
> > While I now understand the question, I think that this is the behavior
> > most people would want.  I'll revert the security enhancement for
> > non-LSM systems if others agree with James, but I think adding another
> > small bit of protection against kernel flaws for everyone who wants
> > security is a win.  (and remember, in kernel we still default this to
> > off so noone is going to 'accidentally' see and security checks in the
> > dummy hooks)
> 
> If it's off by default and generally useful across LSMs, why not just put 
> it in the base kernel code?

It was placed in CONFIG_SECURITY so that users can (If their given LSM
supports it) selectively allow this stuff.  Some LSMs are fine grained
enough to allow applications like dosemu and X to use low pages without
globally disabling.  I can't think of any reasonable way to move all of
this into base kernel code while still allowing overrides.

I'd love to hear suggestions on how to move all of this out of the
security code and into the base kernel while not neglecting those few
users who need this functionality.

-Eric

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   >