Re: linux-next: Tree for Nov 27 (file notify)
On Tue, 2012-11-27 at 08:43 -0800, Randy Dunlap wrote: > On 11/26/2012 10:25 PM, Stephen Rothwell wrote: > > > Hi all, > > > > Changes since 20121126: > > > > > when CONFIG_PROC_FS is not enabled (also seen in mmotm): > > fs/notify/fanotify/fanotify_user.c:436:17: error: 'fanotify_show_fdinfo' > undeclared here (not in a function) > fs/notify/inotify/inotify_user.c:333:17: error: 'inotify_show_fdinfo' > undeclared here (not in a function It is something coming in from Andrew's tree. I believe they already have a patch. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kmsg: Honor dmesg_restrict sysctl on /dev/kmsg
Fine Fine, I'll get off my lazy butt and look at this. On Wed, 2013-02-27 at 10:14 -0800, Kees Cook wrote: > On Wed, Feb 27, 2013 at 10:01 AM, Josh Boyer wrote: > > On Wed, Feb 27, 2013 at 09:54:27AM -0800, Kees Cook wrote: > >> On Fri, Feb 22, 2013 at 01:18:57PM -0500, Josh Boyer wrote: > >> > Originally, the addition of dmesg_restrict covered both the syslog > >> > method of accessing dmesg, as well as /dev/kmsg itself. This was done > >> > indirectly by security_syslog calling cap_syslog before doing any LSM > >> > checks. > >> > > >> > However, commit 12b3052c3ee (capabilities/syslog: open code cap_syslog > >> > logic to fix build failure) moved the code around and pushed the checks > >> > into the caller itself. That seems to have inadvertently dropped the > >> > checks for dmesg_restrict on /dev/kmsg. Not sure this is correct. There was no devkmsg_open() in commit 12b3052c3ee. That was added in e11fea92e. Looks like before that commit the devkmsg code was even worse. It didn't call security_syslog or capable(). Uggh. > Most people haven't noticed > >> > because util-linux dmesg(1) defaults to using the syslog method for > >> > access in older versions. With util-linux 2.22 and a kernel newer than > >> > 3.5, dmesg(1) defaults to reading directly from /dev/kmsg. > >> > > >> > Fix this by making an explicit check in the devkmsg_open function. > >> > > >> > This fixes https://bugzilla.redhat.com/show_bug.cgi?id=903192 > >> > > >> > Reported-by: Christian Kujau > >> > CC: sta...@vger.kernel.org > >> > Signed-off-by: Josh Boyer > >> > --- > >> > kernel/printk.c | 3 +++ > >> > 1 file changed, 3 insertions(+) > >> > > >> > diff --git a/kernel/printk.c b/kernel/printk.c > >> > index f24633a..398ef9a 100644 > >> > --- a/kernel/printk.c > >> > +++ b/kernel/printk.c > >> > @@ -615,6 +615,9 @@ static int devkmsg_open(struct inode *inode, struct > >> > file *file) > >> > struct devkmsg_user *user; > >> > int err; > >> > > >> > + if (dmesg_restrict && !capable(CAP_SYSLOG)) > >> > + return -EACCES; > >> > + > >> > >> I think this should use check_syslog_permissions() instead, as done for > >> /proc/kmsg and the syslog syscall. > >> > >> err = check_syslog_permissions(SYSLOG_ACTION_OPTION, > >> SYSLOG_FROM_FILE); > > > > Did you mean SYSLOG_ACTION_OPEN? > > Oops, yes, typo. > > > I didn't code it that way because the comment in that function about the > > capability checks already being done seem pretty off to me. I could > > have just misread the /proc code though. I can resend with the change > > you suggest if everyone thinks that's a better way. > > Yeah, the comment is meaningful if you examine how /proc/kmsg works, > which was basically as a wrapper to the syslog syscall. The issue was > that we had to catch (and potentially block) the "open" action on > /proc/kmsg vs blocking the the syslog read action. In this case, we've > got another file-based interface, so we should use OPEN and FROM_FILE > to block the open. (Though it could be argued that we only want to > block the open if it's reading, which is exactly what Kay was trying > to do originally it looks like.) Right. Now we have /proc/kmsg, /dev/kmsg, and the syscall. /proc/kmsg and the syscall both use do_syslog() which calls check_syslog_permissions() and security_syslog(). /dev/kmsg only calls security_syslog(), which we all agree needs fixed. > > Also, the LSM hooks aren't doing any capability checks at all that I can > > see, which may or may not be a bug in and of itself but I have no idea. > > I was hoping Eric would speak up about that. I wouldn't call it a bug. But it sure is a pretty shitty design pattern to have security_* sometimes the right thing to do and sometimes capable() is the right thing to do. It is pervasive in the kernel that you have either/or, but I can't think of anywhere that functions are expected to do BOTH. So yeah, that needs fixed. > Eric explicitly removed the cap check since it was cluttering things > the way it was originally written. I do think security_syslog() should > pass through check_syslog_permissions(), though. Then this wouldn't > have happened. That might actually be the right way to clean this up, > but I'd like to see Eric's thoughts first. How about something like this? diff --git a/kernel/printk.c b/kernel/printk.c index 7c69b3e..ced2cac 100644 --- a/kernel/printk.c +++ b/kernel/printk.c @@ -626,7 +626,7 @@ static int devkmsg_open(struct inode *inode, struct file *file) if ((file->f_flags & O_ACCMODE) == O_WRONLY) return 0; - err = security_syslog(SYSLOG_ACTION_READ_ALL); + err = check_syslog_permissions(SYSLOG_ACTION_OPEN, SYSLOG_FROM_FILE); if (err) return err; @@ -840,22 +840,23 @@ static int check_syslog_permissions(int type, bool from_file) * already done the capabilities checks at open time. */ if (from_file && type != SYSLOG_ACTION_OPEN) -
Re: IMA: How to manage user space signing policy with others
On Thu, Feb 28, 2013 at 4:35 PM, Vivek Goyal wrote: > On Thu, Feb 28, 2013 at 02:23:39PM -0500, Mimi Zohar wrote: I think just a second for both of you to step back and see a slightly larger picture/problem might help. This is a weird case where Vivek does not trust root to make the policy decision. If the box is configured for secure boot, it needs to make these decisions no matter what the admin wants. This is why he talks about trying to merge multiple competing policies. The current IMA policy is controlled by whomever can first write to the ima policy file interface. Vivek does not want an admin to be able to overwrite the secureboot policy. So I get why he thinks changes may be needed to support this use case. The ima_tcb policy was meant to be larger than needed to determine a trusted computing base, but it is clearly not a superset of what he is hoping to accomplish. So how do we take a system where the admin/software has some control over the integrity policy (as it is today?) and the kernel/system itself also has control (as Vivek wants it)? It seems unsolved with what we have today -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC 1/3] seccomp: add generic code for jitted seccomp filters.
On Fri, 2013-03-15 at 11:45 -0700, Kees Cook wrote: > On Fri, Mar 15, 2013 at 11:28 AM, Nicolas Schichan > wrote: > > diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h > > index 6f19cfd..af27494 100644 > > --- a/include/linux/seccomp.h > > +++ b/include/linux/seccomp.h > > @@ -6,6 +6,7 @@ > > #ifdef CONFIG_SECCOMP > > > > #include > > +#include > > #include > > > > struct seccomp_filter; > > @@ -47,6 +48,44 @@ static inline int seccomp_mode(struct seccomp *s) > > return s->mode; > > } > > > > +/** > > + * struct seccomp_filter - container for seccomp BPF programs > > + * > > + * @usage: reference count to manage the object lifetime. > > + * get/put helpers should be used when accessing an instance > > + * outside of a lifetime-guarded section. In general, this > > + * is only needed for handling filters shared across tasks. > > + * @prev: points to a previously installed, or inherited, filter > > + * @len: the number of instructions in the program > > + * @insns: the BPF program instructions to evaluate > > This should be updated to include the new bpf_func field. > > Regardless, it'd be better to not expose this structure to userspace. This is fine include/uapi/linux/seccomp.h is exposed to userspace include/linux/seccomp.h is kernel internal -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] security: unconditionally call Yama
On Fri, Aug 31, 2012 at 2:39 PM, Alan Cox wrote: > On Fri, 31 Aug 2012 14:31:26 -0700 > Kees Cook wrote: > >> Unconditionally call Yama, no matter what LSM module is selected. > Not a good plan. What happens when everyone decides to stack every module > by ifdeffing in the kernel - mayhem. I'm all for it being possible but > done right - espeicall as I believe a proper proposal now for stacking > modules, and it should be done as part of that. I think a lot of us agree it's a 'difficult' plan going forward. Kees wrote this patch after we (James, SELinux, AppArmor people) talked at the Security Summit earlier today. From my PoV we have a couple of 'classes' of LSMs. Big with Labels: SELinux and SMACK Big without Labels: AppArmor and TOMOYO Small and stateless: Capabilities and Yama (and maybe integrity) Those small and stateless LSMs can easily stack. We don't have object lifetime issues. We don't have difficult technical details. We all here seem to want to stack 'small and stateless' with 'big boy'. Outside one little capabilities and selinux setxattr issue I can't think of anything even quirky. So the fast path forward, in my mind, is to start here with Yama. Our choice *today* is adding these static hooks inside the 'big boy' LSMs (which I think all of us are willing to do) vs adding it one time in the LSM and not having to worry about it all over the place. Getting the big guys and the little guys to play together is not going to lead to a mess of crazy. Stacking the big boys is a future goal most of us are happy with seeing done, if it turns out to be reasonable. We know it's close to possible. Dave Howell's gave us an implementation about 2 years ago. Casey Schaufler demo'd another stacking interface today. No code from the latter, but the only limitation he really mentioned today was that two big guys with labels can't stack together. I don't know how/if Dave's implementation wanted to handle that case. I really think this patch is good. Acked-by: Eric Paris I think I even want to do the same thing with capabilities so I don't have to make sure I'm getting those right in SELinux. And everyone else probably doesn't want to keep those hooks themselves either. I should send that patch to Linus. I bet I can give him a large negative diff. He did just say two days ago that he'll take any -1000 line diff after -rc6 :) I think Casey and Dave need to both get their larger stacking solutions posted and we should go with the best one. It'll let us pull out the static code, but for now, I like the static coding between the big boys and the little guys. Lets fix today what's easy and fix tomorrow what we can't fix today. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] security: unconditionally call Yama
On Fri, Aug 31, 2012 at 4:59 PM, Eric W. Biederman wrote: > From a overal kernel maintenance and use perspective the unconditional > enablement is a pain. > > We long ago established the principle that compiling additional code > into the kernel should not change the semenatics of the kernel. > > So this code needs to come with a command line or sysctl on/off switch > not an unconditional enable. Your argument makes zero sense. If I decide to build new code, that new code can do something. It happens all the time. If you don't like Yama, don't build Yama. If you don't like the only thing that Yama does (it only implements one protection), disable that protection from sysctl. I don't get it. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Second attempt at kernel secure boot support
On Thu, Nov 1, 2012 at 5:59 AM, James Bottomley wrote: > But that doesn't really help me: untrusted root is an oxymoron. Imagine you run windows and you've never heard of Linux. You like that only windows kernels can boot on your box and not those mean nasty hacked up malware kernels. Now some attacker manages to take over your box because you clicked on that executable for young models in skimpy bathing suits. That executable rewrote your bootloader to launch a very small carefully crafted Linux environment. This environment does nothing but launch a perfectly valid signed Linux kernel, which gets a Windows environment all ready to launch after resume and goes to sleep. Now you have to hit the power button twice every time you turn on your computer, weird, but Windows comes up, and secureboot is still on, so you must be safe! In this case we have a completely 'untrusted' root inside Linux. From the user PoV root and Linux are both malware. Notice the EXACT same attack would work launching rootkit'd Linux from Linux. So don't pretend not to care about Windows. It's just that launching malware Linux seems like a reason to get your key revoked. We don't want signed code which can be used as an attack vector on ourselves or on others. That make sense? -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Second attempt at kernel secure boot support
On Thu, Nov 1, 2012 at 10:42 AM, James Bottomley wrote: > On Thu, 2012-11-01 at 10:29 -0400, Eric Paris wrote: >> On Thu, Nov 1, 2012 at 5:59 AM, James Bottomley >> wrote: >> >> > But that doesn't really help me: untrusted root is an oxymoron. >> >> Imagine you run windows and you've never heard of Linux. You like >> that only windows kernels can boot on your box and not those mean >> nasty hacked up malware kernels. Now some attacker manages to take >> over your box because you clicked on that executable for young models >> in skimpy bathing suits. That executable rewrote your bootloader to >> launch a very small carefully crafted Linux environment. This >> environment does nothing but launch a perfectly valid signed Linux >> kernel, which gets a Windows environment all ready to launch after >> resume and goes to sleep. Now you have to hit the power button twice >> every time you turn on your computer, weird, but Windows comes up, and >> secureboot is still on, so you must be safe! > > So you're going back to the root exploit problem? I thought that was > debunked a few emails ago in the thread? > > Your attack vector isn't plausible because for the suspend attack to > work, the box actually has to be running Linux by default ... I think > the admin of that box might notice if it suddenly started running > windows ... Maybe you misread. The owner of the box would never know a shim Linux was loaded. In any case, as I said, windows really is irrelevant. It's just using Linux as an attack vectore against Windows is what would get keys revoked. If your key is revoke Linux can't boot on a large amount of new hardware without BIOS twiddling.u >> In this case we have a completely 'untrusted' root inside Linux. From >> the user PoV root and Linux are both malware. Notice the EXACT same >> attack would work launching rootkit'd Linux from Linux. So don't >> pretend not to care about Windows. It's just that launching malware >> Linux seems like a reason to get your key revoked. We don't want >> signed code which can be used as an attack vector on ourselves or on >> others. >> >> That make sense? > > Not really, no. A windows attack vector is a pointless abstraction > because we're talking about securing Linux and your vector requires a > Linux attack for the windows compromise ... let's try to keep on point > to how we're using this feature to secure Linux. I pointed out that the exact same attack exists with Linux on Linux. To launch a malware linux kernel all you have to do is launch a shim signed acceptable linux environment, have it set up the malware kernel to launch after resume, and go to sleep. Agreed, it'd be very weird that the first time you hit the power button your machine comes on and then quickly goes right back to sleep, but certainly we can envision that being ignored by many desktop users... Do you see how 'root' in the first environment is untrusted? Now you can pretend not to care because the 'original' root was trusted. But people install bad crap all the time. There are hundreds of ways to install bad software as root. Go to any site distributing rpms or debs to get that new version of mod_perl and it could install the malware kernel and shim environment. The point of secureboot is even if the admin did something which allowed his kernel to be compromised, it won't persist. Sure, secureboot moves the attack up the stack to userspace, but at least we can do something about the kernel. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Second attempt at kernel secure boot support
On Thu, Nov 1, 2012 at 10:46 AM, Alan Cox wrote: >> Imagine you run windows and you've never heard of Linux. > > To those people I think you mean "never heard of Ubuntu" ;-) :-) > With all the current posted RH patches I can still take over > the box as root trivially enough and you seem to have so far abolished > suspend to disk, kexec and a pile of other useful stuff. To actually lock > it down you'll have to do a ton more of this. I'm guessing those writing the patches would like to hear about these. Suspend to disk and kexec can probably both be fixed up to work... > Actually from what I've seen on > the security front there seems to a distinct view that secure boot is > irrelevant because Windows 8 is so suspend/resume focussed that you might > as well just trojan the box until the next reboot as its likely to be a > couple of weeks a way. Bit of a straw man isn't it? Hey, don't fix A, I can do B! I'm not saying you're wrong, nor that maybe online attacks which don't persist across reboot wouldn't be more likely, but they aren't attacking the same problem. (I haven't heard any progress on what you point out, but at least we have some progress on some small class of boot time persistent attacks) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Second attempt at kernel secure boot support
On Thu, Nov 1, 2012 at 11:06 AM, James Bottomley wrote: > But surely that's fanciful ... you've already compromised windows to get > access to the ESP. If you've done it once, you can do it again until > the exploit is patched. You work under the assumption that any bad operation was done by means of a compromised kernel. Admins install things all the time, sometimes, things which they shouldn't. (This statement is OS agnostic) > There are likely many easier ways of ensuring > persistence than trying to install a full linux kernel with a > compromised resume system. I'm sure lots of us would love to hear the ideas. And the attack is on the suspend side, nothing about resume has to be malicious (not really relevant I guess)... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Second attempt at kernel secure boot support
On Thu, Nov 1, 2012 at 11:18 AM, James Bottomley wrote: > You're completely confusing two separate goals: > > 1. Is it possible to use secure boot to implement a security policy > on Linux > 2. What do we have to do anything to prevent Linux being used to > attack windows which may lead to a revocation of the distro > signing key. > > "untrusted root" is a silly answer to 1 because it's incredibly > difficult to remove sufficient trust from root and still have it be > trusted enough to be effective as root. The trust bound up in root is > incredibly intertwined. It would be far better to start by eliminating > the root user altogether and building up on the capabilities in a > granular fashion for this type of lockdown. granular lockdown!? sounds like SELinux. But that certainly can't solve the problems here... I think your premise has a couple of problems. First is how you chose to word #2. Lets reword it as: What do we have to do to prevent Linux being used to attack Linux which may lead to secure boot being useless. If we accept that as #2, then we think, "What makes secure boot useless" or "What is the security goal as envisioned by secure boot." The goal of secure boot is to implement an operating system which prevents uid==0 to ring 0 escalation. That is the security policy secure boot needs to not be completely useless. And as it turns out, that security policy is useful in other situations. > "untrusted root", if it can even be achieved, might be a sufficient > condition for 2 but it's way overkill for a necessary one. But it is a condition, as specifically stated, that others have wanted long before secure boot even came to rise. I've talked with and worked with a public cloud operator who wants to prevent even a malicious root user from being able to run code in ring 0 inside their VM. The hope in that case was that in doing so they can indirectly shrink the attack surface between virtual machine and hypervisor. They hoped to limit the ways the guest could interact to only those methods the linux kernel implemented. They want to launch a vm running a kernel they chose and make sure root inside the vm could not run some other kernel or run arbitrary code in kernel space. It's wasn't something they solved completely. If it was, all of this secure boot work would be finished. Which is why we are having these discussions to understand all of the way that you an Alan seem to have to get around the secure boot restrictions. And look for solutions to retain functionality which meeting the security goal of 'prevent uid=0 to ring 0 privilege escalation. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Second attempt at kernel secure boot support
I know I started it, but Windows really isn't necessary to see value, even if it is what pushed the timing. A user installs a package as root. Absent any flaws in the Linux kernel (cough) that should be all it can do in a Secure Boot world. But if you can drop a small trusted Linux system in there and use that to boot a compromised Linux kernel, it can make itself persistent. And like I said, I know there are cloud providers out there who want EXACTLY this type of system. One in which root in the guest is untrusted and they want to keep them out of ring 0. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Second attempt at kernel secure boot support
On Sat, Nov 3, 2012 at 12:31 PM, Alan Cox wrote: >> You're guaranteed to be able >> to do this on any Windows 8 certified hardware. > > Thats not my understanding of the situation. Windows 8 certification has this as a requirement for x86 hardware. I belied the opposite is a requirement for arm hardware. However it's possible that it just doesn't specifiy at all for arm. So yes, you're guaranteed to be able to do this on any Windows 8 certified x86 hardware. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 8/8] audit: add audit_backlog_wait_time configuration option
On Wed, 2013-09-18 at 15:06 -0400, Richard Guy Briggs wrote: > reaahead-collector abuses the audit logging facility to discover which files > are accessed at boot time to make a pre-load list > > Add a tuning option to audit_backlog_wait_time so that if auditd can't keep > up, > or gets blocked, the callers won't be blocked. > > Signed-off-by: Richard Guy Briggs > --- > include/uapi/linux/audit.h |2 ++ > kernel/audit.c | 22 +- > 2 files changed, 23 insertions(+), 1 deletions(-) > > diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h > index 75cef3f..493a66e 100644 > --- a/include/uapi/linux/audit.h > +++ b/include/uapi/linux/audit.h > @@ -316,6 +316,7 @@ enum { > #define AUDIT_STATUS_PID 0x0004 > #define AUDIT_STATUS_RATE_LIMIT 0x0008 > #define AUDIT_STATUS_BACKLOG_LIMIT 0x0010 > +#define AUDIT_STATUS_BACKLOG_WAIT_TIME 0x0020 > /* Failure-to-log actions */ > #define AUDIT_FAIL_SILENT0 > #define AUDIT_FAIL_PRINTK1 > @@ -367,6 +368,7 @@ struct audit_status { > __u32 backlog_limit; /* waiting messages limit */ > __u32 lost; /* messages lost */ > __u32 backlog;/* messages waiting in queue */ > + __u32 backlog_wait_time;/* message queue wait timeout */ > }; > > struct audit_tty_status { > diff --git a/kernel/audit.c b/kernel/audit.c > index 3d17670..fc535b6 100644 > --- a/kernel/audit.c > +++ b/kernel/audit.c > @@ -321,6 +321,12 @@ static int audit_set_backlog_limit(int limit) > return audit_do_config_change("audit_backlog_limit", > &audit_backlog_limit, limit); > } > > +static int audit_set_backlog_wait_time(int timeout) > +{ > + return audit_do_config_change("audit_backlog_wait_time", > + &audit_backlog_wait_time, timeout); > +} > + > static int audit_set_enabled(int state) > { > int rc; > @@ -669,6 +675,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct > nlmsghdr *nlh) > s.backlog_limit = audit_backlog_limit; > s.lost = atomic_read(&audit_lost); > s.backlog= skb_queue_len(&audit_skb_queue); > + s.backlog_wait_time = audit_backlog_wait_time; > audit_send_reply(NETLINK_CB(skb).portid, seq, AUDIT_GET, 0, 0, >&s, sizeof(s)); > break; > @@ -701,8 +708,21 @@ static int audit_receive_msg(struct sk_buff *skb, struct > nlmsghdr *nlh) > if (err < 0) > return err; > } > - if (s.mask & AUDIT_STATUS_BACKLOG_LIMIT) > + if (s.mask & AUDIT_STATUS_BACKLOG_LIMIT) { > err = audit_set_backlog_limit(s.backlog_limit); > + if (err < 0) > + return err; > + } > + if (s.mask & AUDIT_STATUS_BACKLOG_WAIT_TIME) { > + if (sizeof(s) > (size_t)nlh->nlmsg_len) > + break; What gets returned here? I think err has a value of 0, but it doesn't seem to have been clearly intentional. If they know about the AUDIT_STATUS_BACKLOG_WAIT_TIME flag, but they didn't send a long enough skb? That seems like an error condition > + if (s.backlog_wait_time < 0 || > + s.backlog_wait_time > 10*AUDIT_BACKLOG_WAIT_TIME) > + return -EINVAL; > + err = audit_set_backlog_wait_time(s.backlog_wait_time); > + if (err < 0) > + return err; > + } > break; > } > case AUDIT_USER: -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 8/8] audit: add audit_backlog_wait_time configuration option
On Wed, 2013-09-18 at 16:49 -0400, Richard Guy Briggs wrote: > On Wed, Sep 18, 2013 at 04:33:25PM -0400, Eric Paris wrote: > > On Wed, 2013-09-18 at 15:06 -0400, Richard Guy Briggs wrote: > > > reaahead-collector abuses the audit logging facility to discover which > > > files > > > are accessed at boot time to make a pre-load list > > > > > > Add a tuning option to audit_backlog_wait_time so that if auditd can't > > > keep up, > > > or gets blocked, the callers won't be blocked. > > > > diff --git a/kernel/audit.c b/kernel/audit.c > > > index 3d17670..fc535b6 100644 > > > --- a/kernel/audit.c > > > +++ b/kernel/audit.c > > > @@ -701,8 +708,21 @@ static int audit_receive_msg(struct sk_buff *skb, > > > struct nlmsghdr *nlh) > > > if (err < 0) > > > return err; > > > } > > > - if (s.mask & AUDIT_STATUS_BACKLOG_LIMIT) > > > + if (s.mask & AUDIT_STATUS_BACKLOG_LIMIT) { > > > err = audit_set_backlog_limit(s.backlog_limit); > > > + if (err < 0) > > > + return err; > > > + } > > > + if (s.mask & AUDIT_STATUS_BACKLOG_WAIT_TIME) { > > > + if (sizeof(s) > (size_t)nlh->nlmsg_len) > > > + break; > > > > What gets returned here? I think err has a value of 0, but it doesn't > > seem to have been clearly intentional. If they know about the > > AUDIT_STATUS_BACKLOG_WAIT_TIME flag, but they didn't send a long enough > > skb? That seems like an error condition > > The intent was that it is a NOP, since err would have a value of zero, > but I see your point, that if that flag is present, the struct member > should be too. My original intent was that if the structure member > wasn't present, it would default to zero, unintentionally setting the > wait time to zero. It was part of my paranoia in the absence of an API > version indicator. No harm done, but I agree it should return an error. > > Thanks for the catch. > > > > + if (s.backlog_wait_time < 0 || > > > + s.backlog_wait_time > 10*AUDIT_BACKLOG_WAIT_TIME) > > > + return -EINVAL; > > I assume values less than zero or larger than 10 times the current > default of one minute are errors or unreasonable. > > Any argument for more than 10 minutes? 10 Minutes already seems nuts. If someone has a reason to go higher, we can change this sanity check then. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 7/8] audit: clean up AUDIT_GET/SET local variables and future-proof API
On Thu, 2013-09-19 at 17:18 -0400, Steve Grubb wrote: > On Wednesday, September 18, 2013 03:06:52 PM Richard Guy Briggs wrote: > > Re-named confusing local variable names (status_set and status_get didn't > > agree with their command type name) and reduced their scope. > > > > Future-proof API changes by not depending on the exact size of the > > audit_status struct. > > I wished things like this were coordinated before being written. We had some > discussion of this back in July under a topic, "audit: implement generic > feature setting and retrieving". Maybe that API can be fixed so its not just > set/unset but can take a number as well. Your right, we did talk about it. Though it seems we talked past each other. What was implemented was an on off extensible interface. The status interface already fits for setting numbers. And because of how it is used has been extended is is extensible for setting numbers. > > Also, because there is no way to query the kernel to see what kind of things > it supports I'll agree. Richard, can you please add a version field to the status? Start at version 1. Any time we add a new audit feature we'll bump it. > , we've typically defined a new message type and put into it exactly > what we need. In other words, if you want something expandable, the define a > new message type like AUDIT_GET_EXT and AUDIT_SET_EXT and build it to be > expandable. There is no reason we can't use what we have. (As we've done it before) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 1/1] audit_alloc: clear TIF_SYSCALL_AUDIT if !audit_context
On Sun, 2013-09-15 at 19:11 +0200, Oleg Nesterov wrote: > If audit_filter_task() nacks the new thread it makes sense > to clear TIF_SYSCALL_AUDIT which can be copied from parent > by dup_task_struct(). > > A wrong TIF_SYSCALL_AUDIT is not really bad but it triggers > the "slow" audit paths in entry.S to ensure the task can not > miss audit_syscall_*() calls, this is pointless if the task > has no ->audit_context. > > Signed-off-by: Oleg Nesterov > Acked-by: Steve Grubb Acked-by: Eric Paris Richard, please pick this up into your tree. > --- > kernel/auditsc.c |4 +++- > 1 files changed, 3 insertions(+), 1 deletions(-) > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > index 9845cb3..95293ab 100644 > --- a/kernel/auditsc.c > +++ b/kernel/auditsc.c > @@ -943,8 +943,10 @@ int audit_alloc(struct task_struct *tsk) > return 0; /* Return if not auditing. */ > > state = audit_filter_task(tsk, &key); > - if (state == AUDIT_DISABLED) > + if (state == AUDIT_DISABLED) { > + clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT); > return 0; > + } > > if (!(context = audit_alloc_context(state))) { > kfree(key); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] audit: remove newline accidentally added during session id helper refactor
On Wed, 2013-09-18 at 11:17 -0400, Richard Guy Briggs wrote: > A newline was accidentally added during session ID helper refactorization in > commit 4d3fb709. This needlessly uses up buffer space, messes up syslog > formatting and makes userspace processing less efficient. Remove it. > > Signed-off-by: Richard Guy Briggs Acked-by: Eric Paris > --- > kernel/audit.c |2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/kernel/audit.c b/kernel/audit.c > index 3d17670..ac16540 100644 > --- a/kernel/audit.c > +++ b/kernel/audit.c > @@ -1413,7 +1413,7 @@ void audit_log_session_info(struct audit_buffer *ab) > u32 sessionid = audit_get_sessionid(current); > uid_t auid = from_kuid(&init_user_ns, audit_get_loginuid(current)); > > - audit_log_format(ab, " auid=%u ses=%u\n", auid, sessionid); > + audit_log_format(ab, " auid=%u ses=%u", auid, sessionid); > } > > void audit_log_key(struct audit_buffer *ab, char *key) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4] ipc: shm and msg fixes
On Thu, 2013-09-19 at 14:22 -0700, Davidlohr Bueso wrote: > On Sun, 2013-09-15 at 20:04 -0700, Davidlohr Bueso wrote: > > This patchset deals with the selinux and rmid races Manfred found on > > the ipc scaling work that has been going on. It specifically addresses > > shared mem and msg queues. While semaphores still need updated, I want > > to make sure these are correct first. Also, Manfred had already sent out > > a patchset that deals with a race in sem complex operations. So any changes > > should be on top of his. > > > > Patches 1 and 2 deal with shared memory. > > Patches 3 and 4 deal with msg queues. > > Specific details about each race and its fix are in the corresponding > > patches. > > > > Note that Linus suggested a good alternative to patches 1 and 3: use > > kfree_rcu() and delay the freeing of the security structure. I would > > much prefer that approach to doing security checks with the lock held, > > but I want to leave the patches out and ready in case we go with the > > later solution. > > Cc'ing selinux/security people - folks, could you please confirm that > this change is ok and won't break anything related to security? I agree with the approach to delay the freeing and it does not appear to be a problem from an SELinux PoV, but I don't necessarily agree with the location of the delay. Why should every LSM be required to know the details of that object lifetime? It seems to me like we might want to move this delay up to shm_destroy. I know that's going to be more code then using kfree_rcu(), but it seems like a much better abstraction to hide that knowledge away from the LSM. Possibly place the rcu_head in struct kern_ipc_perm? Then use call_rcu() to call the security destroy hook? You'll have to re-write to LSM hook to take an rcu_head, etc, but that shouldn't be a big problem. Doing it this way, also means you won't break the security model of SMACK. It looks like you'd still have the exact same race with SMACK, except they can't be fixed with kfree_rcu(). They are just setting a pointer to NULL. Which could then be dereferenced later. They don't actually do allocation/free. So ACK on the RCU delay, but NACK to making the hooks do the delay, instead do it in the IPC code. > > Thanks! > > > 8<--- > > From: Davidlohr Bueso > Date: Thu, 19 Sep 2013 09:32:05 -0700 > Subject: [PATCH] selinux: rely on rcu to free ipc isec > > Currently, IPC mechanisms do security and auditing related checks under > RCU. However, since selinux can free the security structure, through > selinux_[sem,msg_queue,shm]_free_security(), we can race if the structure > is freed before other tasks are done with it, creating a use-after-free > condition. Manfred illustrates this nicely, for example with shared mem: > > --> do_shmat calls rcu_read_lock() > --> do_shmat calls shm_object_check(). > Checks that the object is still valid - but doesn't acquire any locks. > Then it returns. > --> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat) > --> selinux_shm_shmat calls ipc_has_perm() > --> ipc_has_perm accesses ipc_perms->security > > shm_close() > --> shm_close acquires rw_mutex & shm_lock > --> shm_close calls shm_destroy > --> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security) > --> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm) > --> ipc_free_security calls kfree(ipc_perms->security) > > There are two solutions to this: (i) perform the security checking and > whatever > IPC operation(s) atomically (that is, with the kern_ipc_perm.lock held), or > (ii) delay the freeing of the isec structure after all RCU readers are done. > This patch takes the second approach, which, at least from a performance > perspective, > is more practical than the first as it keeps the IPC critical regions smaller. > > I have tested this patch with IPC testcases from LTP on both my quad-core > laptop and on a 64 core NUMA server. In both cases selinux is enabled, and > tests pass for both voluntary and forced preemption models. > > Signed-off-by: Davidlohr Bueso > --- > security/selinux/hooks.c | 9 - > security/selinux/include/objsec.h | 5 +++-- > 2 files changed, 11 insertions(+), 3 deletions(-) > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c > index a5091ec..179ffe9 100644 > --- a/security/selinux/hooks.c > +++ b/security/selinux/hooks.c > @@ -4892,8 +4892,15 @@ static int ipc_alloc_security(struct task_struct *task, > static void ipc_free_security(struct kern_ipc_perm *perm) > { > struct ipc_security_struct *isec = perm->security; > + > + /* > + * All IPC mechanisms do security and audit > + * checking under RCU: wait a grace period before > + * freeing isec, otherwise we can run into a > + * use-after-free scenario. > + */ > + kfree_rcu(isec, rcu); > perm->security = NULL; > - kfree(isec); > } > > static int msg_
Re: Oddness in security/Makefile
>From 4675ca3470e3c2e325c5be6d9a11f47ac0917537 Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Tue, 10 Sep 2013 09:51:50 -0400 Subject: [PATCH] security: remove erroneous comment about capabilities.o link ordering Back when we had half ass LSM stacking we had to link capabilities.o after bigger LSMs so that on initialization the bigger LSM would register first and the capabilities module would be the one stacked as the 'seconday'. Somewhere around 6f0f0fd496333777d53 (back in 2008) we finally removed the last of the kinda module stacking code but this comment in the makefile still lives today. Reported-by: Valdis Kletnieks Signed-off-by: Eric Paris --- security/Makefile | 1 - 1 file changed, 1 deletion(-) diff --git a/security/Makefile b/security/Makefile index c26c81e..a5918e0 100644 --- a/security/Makefile +++ b/security/Makefile @@ -16,7 +16,6 @@ obj-$(CONFIG_MMU) += min_addr.o # Object file lists obj-$(CONFIG_SECURITY) += security.o capability.o obj-$(CONFIG_SECURITYFS) += inode.o -# Must precede capability.o in order to stack properly. obj-$(CONFIG_SECURITY_SELINUX) += selinux/built-in.o obj-$(CONFIG_SECURITY_SMACK) += smack/built-in.o obj-$(CONFIG_AUDIT)+= lsm_audit.o -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] audit: avoid soft lockup in audit_log_start()
On Mon, 2013-09-09 at 18:32 +0400, Konstantin Khlebnikov wrote: > Luiz Capitulino wrote: > > I'm getting the following soft lockup: > > > > CPU: 6 PID: 2278 Comm: killall5 Tainted: GF3.11.0-rc7+ #1 > > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > > 0099 88011fd83de8 815324df 2800 > > 817d48f9 88011fd83e68 8152e669 88011fd83e68 > > 0008 88011fd83e78 88011fd83e18 004081dac040 > > Call Trace: > > [] dump_stack+0x46/0x58 > > [] panic+0xbb/0x1c4 > > [] watchdog_timer_fn+0x163/0x170 > > [] __run_hrtimer+0x81/0x1c0 > > [] ? watchdog+0x30/0x30 > > [] hrtimer_interrupt+0x107/0x240 > > [] local_apic_timer_interrupt+0x3b/0x60 > > [] smp_apic_timer_interrupt+0x45/0x60 > > [] apic_timer_interrupt+0x6a/0x70 > > [] ? audit_log_start+0xbf/0x430 > > [] ? audit_log_start+0x147/0x430 > > [] ? try_to_wake_up+0x2a0/0x2a0 > > [] audit_log_exit+0x6ae/0xc30 > > [] ? __alloc_fd+0x42/0x100 > > [] __audit_syscall_exit+0x257/0x2b0 > > [] sysret_audit+0x17/0x21 > > > > The reproducer is somewhat unusual: > > > > 1. Install RHEL6.5 (maybe a similar older user-space will do) > > 2. Boot the just installed system > > 3. In this first boot you'll meet the firstboot script, which > > will do some setup and (depending on your answers) it will > > reboot the machine > > 4. During that first reboot the system hangs while terminating > > all processes: > > > > Sending all processes the TERM signal... > > > > It's when the soft lockup above happens. And yes, I managed > > to get this with latest upstream kernel (HEAD fa8218def1b1) > > > > I'm reproducing it on a VM, but the first report was on bare-metal. > > > > This is what is happening: > > > > 1. audit_log_start() is called > > 2. As we have SKBs waiting in audit_skb_queue and all conditions > > evaluate to true, we sleep in wait_for_auditd() > > 3. Go to 2, until sleep_time gets negative and audit_log_start() > > just busy-waits > > > > Now, *why* this is happening is a mistery to me. I tried debugging > > it, but all I could find is that at some point the kauditd thread > > never wakes up after having called schedule(). I even tried waking > > it up before calling wait_for_auditd(), but it didn't. > > We run into the same problem in rhel6 kernel. > > "readahead-collector" uses audit interface and sometimes stuck in 'stopped' > state. > > After commit 829199197a430dade2519d54f5545c4a094393b8 > (which was backported by RH into their kernel) > audit emiters will block forever if userspace daemon cannot handle backlog. > That commit just breaks timeout condition, after timeout waiting loop turns > into busy loop until deamon dies or returns back to work. > > this trivial patch should fix that problem > > --- a/kernel/audit.c > +++ b/kernel/audit.c > @@ -1215,9 +1215,10 @@ struct audit_buffer *audit_log_start(struct > audit_context *ctx, gfp_t gfp_mask, > > sleep_time = timeout_start + audit_backlog_wait_time > - > jiffies; > - if ((long)sleep_time > 0) > + if ((long)sleep_time > 0) { > wait_for_auditd(sleep_time); > - continue; > + continue; > + } > } > if (audit_rate_check() && printk_ratelimit()) > printk(KERN_WARNING I think this is the right(ish) fix, at least it gets at the real bug. 829199197a430dade2519d54f5545c4a094393b8 definitely is the problem. The 60 second wait is NOT causing the soft lockup. Instead after the 60 second wait, sleep_time, will go negative on subsequent trips through the loop. Since sleep_time is negative we will hit the continue and loop FOREVER with no opportunity for progress. Aka, softlockup... I think it's a really ?interesting? (poor) coding style to leave sleep_time going negative on the second trip through the loop to indicate a timeout in wait_for_auditd(). Maybe the function should be rewritten a bit. Have a return code from wait_for_auditd, if it waited 60 seconds, call the failure function, if it succeeded in less the 60 seconds continue along... At least something that makes the logic here obvious. I think Richard is also going to set audit_backlog_wait_time to 0 on failure and will reset it to default on success. To address the other issue as well. That is just dumb. RGB should be taking care of it in RHEL as soon as he gets this worked out. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] security: allow Yama to be unconditionally stacked
On Wed, Sep 5, 2012 at 11:47 AM, Serge Hallyn wrote: > Quoting Kees Cook (keesc...@chromium.org): >> Unconditionally call Yama when CONFIG_SECURITY_YAMA_STACKED is selected, >> no matter what LSM module is primary. >> >> Ubuntu and Chrome OS already carry patches to do this, and Fedora >> has voiced interest in doing this as well. Instead of having multiple >> distributions (or LSM authors) carrying these patches, just allow Yama >> to be called unconditionally when selected by the new CONFIG. > > I don't really like having both the STACKED and non-stacked paths. But > I don't have a good alternative. > >> Signed-off-by: Kees Cook > > Acked-by: Serge E. Hallyn I said basically the same thing to Kees off list. But I don't have an answer either. Acked-by: Eric Paris -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 07/11] kexec: Disable in a secure boot environment
On Wed, Sep 5, 2012 at 5:41 PM, Matthew Garrett wrote: > Yeah, I think renaming the cap is a given. CAP_RING_ZERO Needed for any activity which would give root the ability to run code in ring 0? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 3.6-rc4 audit_log_d_path oops.
On Thu, Sep 6, 2012 at 11:16 AM, Dave Jones wrote: > On Thu, Sep 06, 2012 at 09:46:28AM -0400, Dave Jones wrote: > > Hit this in overnight fuzz testing.. > > > > BUG: unable to handle kernel NULL pointer dereference at 0020 > > IP: [] audit_log_d_path+0x35/0xf0 > > PGD 12fded067 PUD 142c06067 PMD 0 > > Oops: [#1] SMP > > Modules linked in: tun fuse ipt_ULOG binfmt_misc nfnetlink nfc caif_socket > caif phonet can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key > decnet rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 lockd > sunrpc bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 > ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state > nf_conntrack kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr > i2c_i801 e1000e uinput i915 video i2c_algo_bit drm_kms_helper drm i2c_core > > CPU 5 > > Pid: 7007, comm: trinity-child5 Not tainted 3.6.0-rc4+ #36 > > RIP: 0010:[] [] > audit_log_d_path+0x35/0xf0 > > RSP: 0018:880116b33ec8 EFLAGS: 00010246 > > RAX: RBX: RCX: 0035 > > RDX: 819dbf5d RSI: 819da5bb RDI: > > RBP: 880116b33ee8 R08: 88001942533e R09: > > R10: 0001 R11: R12: 880116b33f38 > > R13: 880116b33f38 R14: 819fa1eb R15: 0005 > > FS: 7f5742581740() GS:88014860() > knlGS: > > CS: 0010 DS: ES: CR0: 80050033 > > CR2: 0020 CR3: 0001046d3000 CR4: 001407e0 > > DR0: DR1: DR2: > > DR3: DR6: 0ff0 DR7: 0400 > > Process trinity-child5 (pid: 7007, threadinfo 880116b32000, task > 880019424d00) > > Stack: > > 880116b33ee8 880116b33f38 880019424d00 > > 880116b33f18 81103522 01d0 880119ed6500 > > 01d0 0109 880116b33f78 811ebb95 > > Call Trace: > > [] audit_log_link_denied+0x92/0x100 > > [] sys_linkat+0x195/0x1e0 > > [] ? trace_hardirqs_on_thunk+0x3a/0x3f > > [] system_call_fastpath+0x1a/0x1f > > Code: 5d e8 4c 89 65 f0 4c 89 6d f8 0f 1f 44 00 00 48 85 f6 48 89 fb 49 89 > d5 74 11 48 89 f2 31 c0 48 c7 c6 bb a5 9d 81 e8 8b e1 ff ff <8b> 73 20 40 f6 > c6 01 75 62 48 8b 3d 33 33 93 01 48 85 ff 74 4e > > RIP [] audit_log_d_path+0x35/0xf0 > > RSP > > CR2: 0020 > > ---[ end trace 85b88c850143bb1c ]--- > > > > That's here in kernel/audit.c > > > > 1433 > > 1434 /* We will allow 11 spaces for ' (deleted)' to be appended */ > > 1435 pathname = kmalloc(PATH_MAX+11, ab->gfp_mask); > > > > 'ab' is NULL. > > > > Looks like audit_log_link_denied needs to handle potential failure from > > audit_log_start and abort early. (oddly, it looks like every other > > function called there checks for !ab.) > > > > Maybe additional code should be added here to printk the audit message > > to dmesg so that we don't lose it entirely, but for now, minimal fix. > > > > Signed-off-by: Dave Jones > > > > diff --git a/kernel/audit.c b/kernel/audit.c > > index ea3b7b6..c3e85bb 100644 > > --- a/kernel/audit.c > > +++ b/kernel/audit.c > > @@ -1466,6 +1466,9 @@ void audit_log_link_denied(const char *operation, > struct path *link) > > > > ab = audit_log_start(current->audit_context, GFP_KERNEL, > > AUDIT_ANOM_LINK); > > +if (!ab) > > +return; > > + > > audit_log_format(ab, "op=%s action=denied", operation); > > audit_log_format(ab, " pid=%d comm=", current->pid); > > audit_log_untrustedstring(ab, current->comm); > > Added Kees, as this was introduced in a51d9eaa41866ab6b4b6ecad7b621f8b66ece0dc > > I just realised, the funny thing about this is that the machine running that > test > had selinux/audit disabled. And yet here we are, screwing around with audit > buffers. > > Should there be a test on audit_enable=0 in audit_log_link_denied() ? > > I'm now curious how much more of the audit code is getting run through > similar lack of tests Possible there are other places. The usual idium is to put a a static inline audit_[blah] in the header file which just does if (!audit_dummy_context()) return __audit_[blah]; return 0; Thus if audit is off or compiled out nothing gets run. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Check all returns from audit_log_start
On Thu, Sep 6, 2012 at 11:08 AM, Dave Jones wrote: > Following on from the previous patch that fixed an oops, these > are all the other similar code patterns in the tree with the same > checks added. I never saw these causing problems, but checking > this everywhere seems to make more sense than every subsequent > routine that gets passed 'ab' having to check it. > > Later we could remove all those same checks from audit_log_format > and friends. For now, this just prevents similar bugs being introduced > as the one in my previous patch. > > Signed-off-by: Dave Jones Not certain because I haven't looked at what happens with the error code, but I think this might not be right. auditd can be explictly told not to audit certain events, in which case it is normal and expected that ab would come back NULL -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] security: introduce kernel_module_from_file hook
Acked-by: Eric Paris On Fri, Sep 7, 2012 at 2:38 PM, Kees Cook wrote: > Now that kernel module origins can be reasoned about, provide a hook to > the LSMs to make policy decisions about the module file. > > Signed-off-by: Kees Cook > Acked-by: Serge E. Hallyn > --- > include/linux/security.h | 13 + > kernel/module.c |9 + > security/capability.c|6 ++ > security/security.c |5 + > 4 files changed, 33 insertions(+), 0 deletions(-) > > diff --git a/include/linux/security.h b/include/linux/security.h > index 3dea6a9..368e539 100644 > --- a/include/linux/security.h > +++ b/include/linux/security.h > @@ -693,6 +693,12 @@ static inline void security_free_mnt_opts(struct > security_mnt_opts *opts) > * userspace to load a kernel module with the given name. > * @kmod_name name of the module requested by the kernel > * Return 0 if successful. > + * @kernel_module_from_file: > + * Load a kernel module from userspace. > + * @file contains the file structure pointing to the file containing > + * the kernel module to load. If the module is being loaded from a blob, > + * this argument will be NULL. > + * Return 0 if permission is granted. > * @task_fix_setuid: > * Update the module's state after setting one or more of the user > * identity attributes of the current process. The @flags parameter > @@ -1507,6 +1513,7 @@ struct security_operations { > int (*kernel_act_as)(struct cred *new, u32 secid); > int (*kernel_create_files_as)(struct cred *new, struct inode *inode); > int (*kernel_module_request)(char *kmod_name); > + int (*kernel_module_from_file)(struct file *file); > int (*task_fix_setuid) (struct cred *new, const struct cred *old, > int flags); > int (*task_setpgid) (struct task_struct *p, pid_t pgid); > @@ -1764,6 +1771,7 @@ void security_transfer_creds(struct cred *new, const > struct cred *old); > int security_kernel_act_as(struct cred *new, u32 secid); > int security_kernel_create_files_as(struct cred *new, struct inode *inode); > int security_kernel_module_request(char *kmod_name); > +int security_kernel_module_from_file(struct file *file); > int security_task_fix_setuid(struct cred *new, const struct cred *old, > int flags); > int security_task_setpgid(struct task_struct *p, pid_t pgid); > @@ -2277,6 +2285,11 @@ static inline int security_kernel_module_request(char > *kmod_name) > return 0; > } > > +static inline int security_kernel_module_from_file(struct file *file) > +{ > + return 0; > +} > + > static inline int security_task_fix_setuid(struct cred *new, >const struct cred *old, >int flags) > diff --git a/kernel/module.c b/kernel/module.c > index c1e446e..0ad03c4 100644 > --- a/kernel/module.c > +++ b/kernel/module.c > @@ -29,6 +29,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -2430,6 +2431,10 @@ int copy_module_from_user(const void __user *umod, > unsigned long len, > if (info->len < sizeof(*(info->hdr))) > return -ENOEXEC; > > + err = security_kernel_module_from_file(NULL); > + if (err) > + return err; > + > /* Suck in entire file: we'll want most of it. */ > info->hdr = vmalloc(info->len); > if (!info->hdr) > @@ -2468,6 +2473,10 @@ int copy_module_from_fd(int fd, struct load_info *info) > if (err) > goto out; > > + err = security_kernel_module_from_file(file); > + if (err) > + goto out; > + > size = stat.size; > info->hdr = vmalloc(size); > if (!info->hdr) { > diff --git a/security/capability.c b/security/capability.c > index 61095df..8acb304 100644 > --- a/security/capability.c > +++ b/security/capability.c > @@ -395,6 +395,11 @@ static int cap_kernel_module_request(char *kmod_name) > return 0; > } > > +static int cap_kernel_module_from_file(struct file *file) > +{ > + return 0; > +} > + > static int cap_task_setpgid(struct task_struct *p, pid_t pgid) > { > return 0; > @@ -967,6 +972,7 @@ void __init security_fixup_ops(struct security_operations > *ops) > set_to_cap_if_null(ops, kernel_act_as); > set_to_cap_if_null(ops, kernel_create_files_as); > set_to_cap_if_null(ops, kernel_module_request); > + set_to_cap_if_null(ops, kernel_mo
Re: [PATCH] Smack: Add missing depends on INET in Kconfig
Do other LSMs need this too Casey? I remember we mentioned how select was dangerous :-( On Fri, Nov 30, 2012 at 12:28 PM, Casey Schaufler wrote: > Because NETLABEL depends on INET SECURITY_SMACK > has to explicitly call out the dependency. > > Signed-off-by: Casey Schaufler > --- > security/smack/Kconfig |1 + > 1 file changed, 1 insertion(+) > > diff --git a/security/smack/Kconfig b/security/smack/Kconfig > index 9fb14ef..1be1088 100644 > --- a/security/smack/Kconfig > +++ b/security/smack/Kconfig > @@ -1,5 +1,6 @@ > config SECURITY_SMACK > bool "Simplified Mandatory Access Control Kernel Support" > + depends on INET > depends on NET > depends on SECURITY > select NETLABEL > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: backing up ext4 fs, system unresponsive, thrashing like crazy even though swap is unused
On Mon, Dec 3, 2012 at 12:43 PM, Theodore Ts'o wrote: > If you are seeing a large number of inodes still in the ext4 inode > cache after using drop_caches, then I'd look to see whether you have > something like SELinux or auditing enabled which is pinning a bunch of > dentries or inodes You can safely ignore this suggestion as it does make sense. SELinux only grabs a references to dentries during its call to fs_ops->getxattr, which can't last a meaningful length of time (unless the filesystem is busted). It only grabs references to inodes during system initialization, when you couldn't have many in core. Audit, likewise, only grabs a reference to a dentry during execve() and only long enough to run getxattr and does not grab any reference directly to an inode at all. > or whether your backup program (or some other > program running on your system) is keeping lots of directories or > inodes open for some reason. Certainly could be this suggestion though.. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] SUNRPC: connect to UNIX sockets synchronously
On Tue, Dec 4, 2012 at 6:10 AM, Stanislav Kinsbursky wrote: > But there should be noted, that such implementation introduces limitation > (Trond's quote): > "That approach can fall afoul of the selinux restrictions on the process > context. Processes that are allowed to write data, may not be allowed to > create sockets or call connect(). That is the main reason for doing it > in the rpciod context, which is a clean kernel process context." So you tested this and Trond was wrong? This work just fine even on an SELinux box? Or it does break tons and tons of people's computers? -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] ima: policy search speedup
On Tue, Dec 11, 2012 at 12:55 PM, Linus Torvalds wrote: > And your "pseudo-filesystems" argument is pretty stupid too, since WE > ALREADY HAVE A FLAG FOR THAT! > > Guess where it is? Oh, it's in the place I already mentioned makes > more sense. Look for S_PRIVATE in inode->i_flags, and IS_PRIVATE() in > users. It's what the other security models already use to avoid > bothering calling down to the security layers. The fact that the > integrity layer bypasses the normal security layer in > ima_file_check(), for example, is no excuse to then make up totally > new flags. IS_PRIVATE() is not used by and darn well better not be used by, all psuedo filesystems like procfs which IMA may want to ignore. LSMs like to do control on them. I thought S_PRIVATE was really only used by the anon_inode and reiserfs's really crazy ass internal inodes. I could always be wrong. I don't know if I agree with dmitry but let me explain what's going on here. Lets say the user accesses an inode in procfs. Without this patch one would search the ima policy and find a rule that says 'if this inode is on procfs we don't care.' We can then cache that in the struct inode like you say and move along. If another inode on procfs is opened we will have the same policy search and the same per inode 'i don't give a crap' marking. This absolutely works you are right. But we search the IMA policy for every inode. With Dmitry's patch we can instead search the IMA policy one time and mark the whole superblock as 'i don't give a crap' if IMA policy says we don't care about that fstype. When the second procfs inode is opened we instead look at the per superblock 'who gives a crap' and never search the IMA policy. So we save all future policy searches. I'd say this patch would only be a good idea if there is a real performance hit which is measurable in a real work situation. Not, 'look how much faster it is to access proc inodes' microbenchmark, since noone is actually going to do that, but some results of a useful benchmark you care about. Maybe Dmitry gave those numbers and I missed them? Otherwise, stick with per inode like Linus wants... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] ima: policy search speedup
On Tue, Dec 11, 2012 at 1:18 PM, Mimi Zohar wrote: > The appraisal policy is based on the object metadata, such as the uid, > so the result is static and can be cached. The measurement policy, on > the other hand, is normally based on the subject (eg. who is > reading/executing) the file. Knowledge of whether the file has been > measured is cached in the iint, but unlike the appraisal policy, not > whether it needs to be measured. Having the flag on a per inode basis, > doesn't really help. Can you try again? Even I can't parse this. Not sure what to tell you to try again, maybe give us a summary at a high level again and then why this patch is specifically necessary? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] ima: policy search speedup
S_PRIVATE is totally unacceptable as it has a meaning across all LSMs, not just IMA. S_NOSEC means 'this is not setuid or setgid and we don't need to do those checks on modify' You are going to need to use a S_NOIMA. Of Dmitry's 90,000 fewer policy lookups using the per sb flag, how many of them are the same inode over and over again which would be circumvented using S_NOIMA per inode flag? On Tue, Dec 11, 2012 at 2:48 PM, Mimi Zohar wrote: > On Tue, 2012-12-11 at 11:10 -0800, Linus Torvalds wrote: > >> Anyway, the whole "you can do it at file granularity" isn't the bulk >> of my argument (the "we already have the field that makes sense" is). >> But my point is that per-inode is not only the logically more >> straightforward place to do it, it's also the much more flexible place >> to do it. Because it *allows* for things like that. > > Ok. To summarize, S_IMA indicates that there is a rule and that the iint > was allocated. To differentiate between 'haven't looked/don't know' and > 'definitely not', we need another bit. For this, you're suggesting > using IS_PRIVATE()? Hopefully, I misunderstood. > > thanks, > > Mimi > > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] ima: policy search speedup
On Tue, Dec 11, 2012 at 3:05 PM, Linus Torvalds wrote: > The "IS_PRIVATE()" thing is more a "if you know a-priori that you > aren't interested in pseudo-filesystems, you can already check that > bit, because it will be set for things like /proc and shmem mappings > and pipes etc". I know it isn't relevant to the final solution, but this is simply wrong. IS_PRIVATE() means 'this inode is filesystem internal.' It is not used by anything except rieser and the anon_inode. If it is used by psuedo filesystems in general, like /proc, shmem mappings, and pipes that is a huge bug and is absolutely wrong. You are correct IS_PRIVATE is sufficient to know you don't need to do any IMA stuff with that inode, but it is used in damn few places and better remain that way -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: recent rebases
On Wed, 2012-12-12 at 08:41 +1100, Stephen Rothwell wrote: > Hi Linus, > > I am not sure how much you care about this, but here is a list of the > trees in linux-next that have been rebased in the last day (i.e. since I > fetched the trees for next-20121211): > > fsnotify: rebased to v3.6(!) to resolve a conflict. > git://git.infradead.org/users/eparis/notify.git#for-next I planned to let it sit in -next until at least next thursday before I sent a pull request. Thought that fit within the rules... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] ima: policy search speedup
Linus made it clear he likes per-inode. Do a test with per-inode S_NOIMA. See how that compares to your 100,000 vs 10,000 lookups. Then we can discuss with facts if per sb is worth it or not. We still don't actually know if 100,000 lookups vs 10,000 lookups really matters, but at least we'll have some facts... On Tue, Dec 11, 2012 at 5:57 PM, Kasatkin, Dmitry wrote: > On Tue, Dec 11, 2012 at 10:08 PM, Eric Paris wrote: >> S_PRIVATE is totally unacceptable as it has a meaning across all LSMs, >> not just IMA. >> >> S_NOSEC means 'this is not setuid or setgid and we don't need to do >> those checks on modify' >> >> You are going to need to use a S_NOIMA. >> >> Of Dmitry's 90,000 fewer policy lookups using the per sb flag, how >> many of them are the same inode over and over again which would be >> circumvented using S_NOIMA per inode flag? >> > > IIRC those were only newly instantiated inodes. > > For new inodes, S_NOIMA would not be set and used at that point. > IMA must do the policy search and then would set the S_NOIMA. > > For subsequent calls, S_NOIMA makes sense. > > If we would have the SB flag like MS_NOIMA, then we could replicate sb > flag to inode S_NOIMA flag, > in similar way as inode_has_no_xattr() does: > > static inline void inode_has_no_xattr(struct inode *inode) > { > if (!is_sxid(inode->i_mode) && (inode->i_sb->s_flags & MS_NOSEC)) > inode->i_flags |= S_NOSEC; > } > > Then later there is no need to dereference inode->i_sb... > > Can we reach conclusion about it? > Will we have SB flag? > if yes, then where, s_flags, or in some other field like s_feature_flags? > if not, then we can stop this discussion... > > - Dmitry > >> >> >> On Tue, Dec 11, 2012 at 2:48 PM, Mimi Zohar wrote: >>> On Tue, 2012-12-11 at 11:10 -0800, Linus Torvalds wrote: >>> >>>> Anyway, the whole "you can do it at file granularity" isn't the bulk >>>> of my argument (the "we already have the field that makes sense" is). >>>> But my point is that per-inode is not only the logically more >>>> straightforward place to do it, it's also the much more flexible place >>>> to do it. Because it *allows* for things like that. >>> >>> Ok. To summarize, S_IMA indicates that there is a rule and that the iint >>> was allocated. To differentiate between 'haven't looked/don't know' and >>> 'definitely not', we need another bit. For this, you're suggesting >>> using IS_PRIVATE()? Hopefully, I misunderstood. >>> >>> thanks, >>> >>> Mimi >>> >>> >>> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bisected regression: iterate_fd() selinux change affects flash plugin
OMG this +1 -1 stuff is nuts... iterate_fd passes fd+1 instead of the fd for the file? seriously? e. I see how this patch fixes it, but still, wouldn't some magic or negative value for no entries found be better than having to understand the crazy logics? /me pokes Al. On Sun, Nov 11, 2012 at 11:50 PM, Pavel Roskin wrote: > Quoting Pavel Roskin : > >> Hello, Al! >> >> I have noticed that Mozilla Firefox gets stuck for seconds or minutes >> on some sites, in particular on Facebook with Linux 3.7-rc1 and newer >> mainline kernels. Disabling flash plugin fixes the delays. >> >> This is a Fedora 17 system with SELinux enabled, on x86_64 >> architecture, with all updates, with LXDE desktop. It's not the Fedora >> 16 system I mentioned before, it has never had LXDE login problems due >> to replace_fd(). >> >> Bisecting lead me to the patch that introduced iterate_fd(): >> >> commit c3c073f808b22dfae15ef8412b6f7b998644139a >> Author: Al Viro >> Date: Tue Aug 21 22:32:06 2012 -0400 >> >> new helper: iterate_fd() >> >> iterates through the opened files in given descriptor table, >> calling a supplied function; we stop once non-zero is returned. >> Callback gets struct file *, descriptor number and const void * >> argument passed to iterator. It is called with files->file_lock >> held, so it is not allowed to block. >> >> tty_io, netprio_cgroup and selinux flush_unauthorized_files() >> converted to its use. >> >> Signed-off-by: Al Viro >> >> I have found that reverting the changes to security/selinux/hooks.c is >> sufficient to restore the correct behavior. >> >> -- >> Regards, >> Pavel Roskin > > > I've made a bugzilla entry for the bug and put a preliminary patch there. > https://bugzilla.kernel.org/show_bug.cgi?id=50401 > > > -- > Regards, > Pavel Roskin > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] smack: SMACK_MAGIC to include/uapi/linux/magic.h
It looks like smack_lsm.c includes linux/magic.h, but smackfs.c doesn't... Seems like they should have included linux/magic.h in the same header they removed the definition from On Tue, Nov 6, 2012 at 4:59 PM, Casey Schaufler wrote: > On 11/6/2012 12:17 AM, Jarkko Sakkinen wrote: >> SMACK_MAGIC moved to a proper place for easy user space access >> (i.e. libsmack). >> >> Signed-off-by: Jarkko Sakkinen >> --- >> include/uapi/linux/magic.h |1 + >> security/smack/smack.h |5 - >> 2 files changed, 1 insertion(+), 5 deletions(-) > > Will security/smack/smack_lsm.c and security/smack/smackfs.c > compile after this change? > >> >> diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h >> index e15192c..12735ad 100644 >> --- a/include/uapi/linux/magic.h >> +++ b/include/uapi/linux/magic.h >> @@ -11,6 +11,7 @@ >> #define DEBUGFS_MAGIC 0x64626720 >> #define SECURITYFS_MAGIC 0x73636673 >> #define SELINUX_MAGIC0xf97cff8c >> +#define SMACK_MAGIC 0x43415d53 /* "SMAC" */ >> #define RAMFS_MAGIC 0x858458f6 /* some random number */ >> #define TMPFS_MAGIC 0x01021994 >> #define HUGETLBFS_MAGIC 0x958458f6 /* some random number */ >> diff --git a/security/smack/smack.h b/security/smack/smack.h >> index 99b3612..8ad3095 100644 >> --- a/security/smack/smack.h >> +++ b/security/smack/smack.h >> @@ -149,11 +149,6 @@ struct smack_known { >> #define SMACK_CIPSO_SOCKET 1 >> >> /* >> - * smackfs magic number >> - */ >> -#define SMACK_MAGIC 0x43415d53 /* "SMAC" */ >> - >> -/* >> * CIPSO defaults. >> */ >> #define SMACK_CIPSO_DOI_DEFAULT 3 /* Historical */ > > -- > To unsubscribe from this list: send the line "unsubscribe > linux-security-module" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] smack: SMACK_MAGIC to include/uapi/linux/magic.h
Then it only works by accident that magic.h is included by some random path in smackfs.c. You really should be including it in smack.h (or each .c file individually, up to casey) On Thu, Nov 8, 2012 at 5:43 AM, Jarkko Sakkinen wrote: > On Tue, Nov 6, 2012 at 11:59 PM, Casey Schaufler > wrote: >> >> On 11/6/2012 12:17 AM, Jarkko Sakkinen wrote: >> > SMACK_MAGIC moved to a proper place for easy user space access >> > (i.e. libsmack). >> > >> > Signed-off-by: Jarkko Sakkinen >> > --- >> > include/uapi/linux/magic.h |1 + >> > security/smack/smack.h |5 - >> > 2 files changed, 1 insertion(+), 5 deletions(-) >> >> Will security/smack/smack_lsm.c and security/smack/smackfs.c >> compile after this change? > > Sorry I haven't replied earlier. Anyway, I made a sanity check. > > I retried build from clean. Works. I also checked that vmlinux contains > SMACK symbols. It does. > >> >> > >> > diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h >> > index e15192c..12735ad 100644 >> > --- a/include/uapi/linux/magic.h >> > +++ b/include/uapi/linux/magic.h >> > @@ -11,6 +11,7 @@ >> > #define DEBUGFS_MAGIC 0x64626720 >> > #define SECURITYFS_MAGIC 0x73636673 >> > #define SELINUX_MAGIC0xf97cff8c >> > +#define SMACK_MAGIC 0x43415d53 /* "SMAC" */ >> > #define RAMFS_MAGIC 0x858458f6 /* some random number */ >> > #define TMPFS_MAGIC 0x01021994 >> > #define HUGETLBFS_MAGIC 0x958458f6 /* some random number */ >> > diff --git a/security/smack/smack.h b/security/smack/smack.h >> > index 99b3612..8ad3095 100644 >> > --- a/security/smack/smack.h >> > +++ b/security/smack/smack.h >> > @@ -149,11 +149,6 @@ struct smack_known { >> > #define SMACK_CIPSO_SOCKET 1 >> > >> > /* >> > - * smackfs magic number >> > - */ >> > -#define SMACK_MAGIC 0x43415d53 /* "SMAC" */ >> > - >> > -/* >> > * CIPSO defaults. >> > */ >> > #define SMACK_CIPSO_DOI_DEFAULT 3 /* Historical */ >> > > /Jarkko > -- > To unsubscribe from this list: send the line "unsubscribe > linux-security-module" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] prctl: fix validation of an address
On Sat, 2012-12-29 at 15:00 +0400, Andrey Vagin wrote: > The address should be bigger than dac_mmap_min_addr, because > a process with CAP_RAWIO can map a vma bellow mmap_min_addr. NAK This doesn't make any sense. dac_mmap_min_addr should ONLY be used in security/min_addr.c and security/commoncap.c. Period. You should not be allowed to circumvent LSM protections. Maybe you are missing that mmap_min_addr = max(dac_mmap_min_addr, CONFIG_LSM_MMAP_MIN_ADDR) ? But this patch is absolutely unacceptable. Maybe you can help me understand what problem you had and what you were hoping for? -Eric > > Cc: Andrew Morton > Cc: Kees Cook > Cc: Cyrill Gorcunov > Cc: Serge Hallyn > Cc: "Eric W. Biederman" > Cc: Eric Paris > Cc: James Morris > Signed-off-by: Andrey Vagin > --- > kernel/sys.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/sys.c b/kernel/sys.c > index 265b376..e0e1bbd 100644 > --- a/kernel/sys.c > +++ b/kernel/sys.c > @@ -1868,7 +1868,7 @@ static int prctl_set_mm(int opt, unsigned long addr, > if (opt == PR_SET_MM_EXE_FILE) > return prctl_set_mm_exe_file(mm, (unsigned int)addr); > > - if (addr >= TASK_SIZE || addr < mmap_min_addr) > + if (addr >= TASK_SIZE || addr < dac_mmap_min_addr) > return -EINVAL; > > error = -EINVAL; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] prctl: fix validation of an address
On Mon, 2012-12-31 at 14:14 +0400, Andrew Vagin wrote: > On Sun, Dec 30, 2012 at 05:03:07PM -0500, Eric Paris wrote: > > On Sat, 2012-12-29 at 15:00 +0400, Andrey Vagin wrote: > > > The address should be bigger than dac_mmap_min_addr, because > > > a process with CAP_RAWIO can map a vma bellow mmap_min_addr. > > > > NAK > > Currently prctl(PR_SET_MM_*, addr, ) returns EINVAL for valid addresses. > I think it's a bug. Are you agree? Can you help me understand how prctl(PR_SET_MM_*, relates to checkpoint/restore? My worry here is that somehow this interface could be used to bypass the security properties of mmap_min_addr.I have no idea how the interface is used, so I don't know if my fears are founded. When I hear 'restore' I think of a privileged application setting up some unprivileged application based on untrusted data. My fear is that some unpriv application, that doesn't have permission to map below mmap_min_addr, may be able to trick the privileged application, which would have this permission, into doing it on its behalf. Does that make sense? Is that a realistic scenario with how this interface is used? > CONFIG_LSM_MMAP_MIN_ADDR could not be got from user space. > > This application can use a real value of mmap_min_addr, but it is not > provided into userspace. Unrelated to this patch issue, but I guess either could be exposed if there is a need. > Currently a task can have user memory area bellow dac_mmap_min_addr, > but prctl returns -EINVAL for such addresses. > How can I understand the reason, if I know that the address is valid? Talking about dac_mmap_min_addr is wrong. The capabilities security module uses dac_mmap_min_addr but other LSMs can (and obviously do) use other things. mmap_min_addr is just the shorthand to make sure you clear all hurdles. Breaking those hurdles up outside of the security subsystem is wrong. The kernel makes the decision on what is valid via security_mmap_addr(). Assuming there are no security fears of an untrusted application tricking some priviledged application to set up these maps the answer is just calling security_mmap_addr() instead of doing if(addr < mmap_min_addr) return -EINVAL; I don't know if it is a good idea to allow this interface to ever go below mmap_min_addr, but I do know that using (or even thinking about) dac_mmap_min_addr is wrong and you should be looking at security_mmap_addr() if you look at anything... -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] prctl: fix validation of an address
On Mon, 2012-12-31 at 19:13 +0400, Cyrill Gorcunov wrote: > On Mon, Dec 31, 2012 at 09:27:14AM -0500, Eric Paris wrote: > > On Mon, 2012-12-31 at 14:14 +0400, Andrew Vagin wrote: > > > On Sun, Dec 30, 2012 at 05:03:07PM -0500, Eric Paris wrote: > > > > On Sat, 2012-12-29 at 15:00 +0400, Andrey Vagin wrote: > > > > > The address should be bigger than dac_mmap_min_addr, because > > > > > a process with CAP_RAWIO can map a vma bellow mmap_min_addr. > > > > > > > > NAK > > > > > > Currently prctl(PR_SET_MM_*, addr, ) returns EINVAL for valid addresses. > > > I think it's a bug. Are you agree? > > > > Can you help me understand how prctl(PR_SET_MM_*, relates to > > checkpoint/restore? My worry here is that somehow this interface could > > Here how we use it (from userspace code) > > ret |= sys_prctl_safe(PR_SET_MM, PR_SET_MM_START_CODE, > (long)args->mm.mm_start_code, 0); > ret |= sys_prctl_safe(PR_SET_MM, PR_SET_MM_END_CODE, > (long)args->mm.mm_end_code, 0); > ... > > the values of mm.mm_start_code and such are saved in image file and obtained > during checkpoint stage. Note the prctl_set_mm requires the caller to have > CAP_SYS_RESOURCE privilege granted. Is there anything which prevents an unpriv application from changing mm.mm_start_code and mm.mm_end_code in the image, thus taking advantage of the privileged restore code to bypass the mmap_min_addr restrictions? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[GIT PULL] notification tree changes for 3.8
I believe you would get a build failure after this pull due to the addition of procfs information for *notify fds. The attached patch from sfr should be applied during the merge to change the spin_lock in that patch to the mutex in this patch. This pull mostly is about locking changes in the fsnotify system. By switching the group lock from a spin_lock() to a mutex() we can now hold the lock across things like iput(). This fixes a problem involving unmounting a fs and having inodes be busy, first pointed out by FAT, but reproducible with tmpfs. This also restores signal driven I/O for inotify, which has been broken since about 2.6.32. $ git request-pull v3.6 git://git.infradead.org/users/eparis/notify.git The following changes since commit a0d271cbfed1dd50278c6b06bead3d00ba0a88f9: Linux 3.6 (2012-09-30 16:47:46 -0700) are available in the git repository at: git://git.infradead.org/users/eparis/notify.git for-next for you to fetch changes up to 1ca39ab9d21ac93f94b9e3eb364ea9a5cf2aba06: inotify: automatically restart syscalls (2012-12-11 13:44:37 -0500) Eric Paris (2): fsnotify: make fasync generic for both inotify and fanotify inotify: automatically restart syscalls Lino Sanfilippo (12): inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group() fsnotify: introduce fsnotify_get_group() fsnotify: use reference counting for groups fsnotify: take groups mark_lock before mark lock fanotify: add an extra flag to mark_remove_from_mask that indicates wheather a mark should be destroyed fsnotify: use a mutex instead of a spinlock to protect a groups mark list fsnotify: pass group to fsnotify_destroy_mark() fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark() fsnotify: dont put marks on temporary list when clearing marks by group fsnotify: change locking order fanotify: dont merge permission events inotify: dont skip removal of watch descriptor if creation of ignored event failed fs/notify/dnotify/dnotify.c | 4 ++-- fs/notify/fanotify/fanotify.c| 6 ++ fs/notify/fanotify/fanotify_user.c | 37 + fs/notify/group.c| 47 +++ fs/notify/inode_mark.c | 14 +++--- fs/notify/inotify/inotify_fsnotify.c | 4 +++- fs/notify/inotify/inotify_user.c | 34 ++ fs/notify/mark.c | 91 ++- fs/notify/notification.c | 1 + fs/notify/vfsmount_mark.c| 14 +++--- include/linux/fsnotify_backend.h | 31 +-- kernel/audit_tree.c | 10 +- kernel/audit_watch.c | 4 ++-- 13 files changed, 178 insertions(+), 119 deletions(-) From: Stephen Rothwell Date: Wed, 19 Dec 2012 11:53:20 +1100 Subject: [PATCH] fsnotify: cope with change from spinlock to mutex Signed-off-by: Stephen Rothwell --- fs/notify/fdinfo.c |Â Â Â 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c index 514c4b8..238a593 100644 --- a/fs/notify/fdinfo.c +++ b/fs/notify/fdinfo.c @@ -27,13 +27,13 @@ static int show_fdinfo(struct seq_file *m, struct file *f, struct fsnotify_mark *mark; int ret = 0; - spin_lock(&group->mark_lock); + mutex_lock(&group->mark_mutex); list_for_each_entry(mark, &group->marks_list, g_list) { ret = show(m, mark); if (ret) break; } - spin_unlock(&group->mark_lock); + mutex_unlock(&group->mark_mutex); return ret; }
Re: [GIT PULL] notification tree changes for 3.8
On Thu, 2012-12-20 at 17:50 -0800, Linus Torvalds wrote: > On Thu, Dec 20, 2012 at 2:38 PM, Eric Paris wrote: > > I believe you would get a build failure after this pull due to the > > addition of procfs information for *notify fds. The attached patch from > > sfr should be applied during the merge to change the spin_lock in that > > patch to the mutex in this patch. > > I'm not going to bother pulling this. > > It's coming in late in the merge window, it has been rebased days ago, > and it changes fundamental infrastructure. > > This has been a huge merge window, and this is just too late and too > surprising. The commits are marked as being written 18 months ago, but > then the commit date is after the merge window started, and sent the > day before it closes. > > WTF? If they've waited 18 months, there cannot be this kind of > last-minute hurry now. Why did you wait until the last minute? The changes have been in next for about 18 months. I've just been a missing maintainer. I'm back with some time to fix that fact, but I understand if you want to reject it on that account. I'm available now if something goes wrong. Most of this work was originally done just as cleanup, but recently people started complaining about the FAT race/NULL ptr deref. https://bugzilla.redhat.com/show_bug.cgi?id=768534 Which got me to finally get off my lazy ass. When looking to send the pull request on the first day of the window I realized just how far behind my tree was and that there was a merge conflict for which sfr had been carrying a patch forever. With 3.7 less than a day old I figured 3.6 was a good place to go in order to solve the conflict. Since I rebased, it obviously needed more testing and some time in -next. Which was why I said to you, back on the 11th, that I was planning to send my pull request today. http://marc.info/?l=linux-next&m=135526314222685&w=2 It is a locking change, but if anything goes wrong, at least I've got time without other work commitments over the next 2 weeks to fix them. If you are wondering about testing, I've written the attached notification stressor. On a dual core box it will cause a kernel panic is about 1 second on your kernel. With the patch set it doesn't pop... Your call, but it does fix a real bug that people are reporting today. -Eric #define _GNU_SOURCE #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* huerristic on how hard to load a box */ static unsigned int num_cores; /* number of threads which just read from inotify_fd and ignore the results */ static unsigned int num_data_dumpers; /* each thread will add a watch to a given file as fast as it can */ static unsigned int num_adder_threads; /* each thread will remove all watches between low and high as fast as they can */ static unsigned int num_remover_threads; /* a multiplier for adders and removers so you have X adders per file */ static unsigned int watcher_multiplier; /* this removes watches from low_wd to low_wd + 3 just for extra removal races */ static unsigned int num_low_remover_threads; /* create and destroy the files watches are being added to and removed from */ static unsigned int num_file_creaters; /* how many inotify_fd's we have total (so another multiplier for adders and removers) */ static unsigned int num_inotify_instances; static char *working_dir = "/tmp/inotify_syscall_thrash"; /* if mounting a real filesystem, where is the source? (doesn't matter for tmpfs) */ static char *mnt_src; /* what is the fstype to mount and unmonut? */ static char *fstype = "tmpfs"; static pthread_attr_t attr; static pthread_mutex_t wait_mutex = PTHREAD_MUTEX_INITIALIZER; static pthread_cond_t wait_var = PTHREAD_COND_INITIALIZER; static int wait; #define WAIT_CHILD do {\ pthread_mutex_lock(&wait_mutex); \ if (wait == 0) \ pthread_cond_wait(&wait_var, &wait_mutex); \ wait = 0; \ pthread_mutex_unlock(&wait_mutex); \ } while (0); #define WAKE_PARENT do {\ pthread_mutex_lock(&wait_mutex); \ wait = 1; \ pthread_cond_signal(&wait_var); \ pthread_mutex_unlock(&wait_mutex); \ } while (0); struct adder_struct { int inotify_fd; int file_num; }; struct operator_struct { int inotify_fd; }; struct thread_data { int inotify_fd; pthread_t *adders; pthread_t *removers; pthread_t *lownum_removers; pthread_t *data_dumpers; }; pthread_t *file_creaters; pthread_t low_wd_reseter; pthread_t mounter; static int stopped = 0; static int high_wd = 0; static int low_wd = INT_MAX; static int handle_error(const char *arg) { perror(arg); exit(EXIT_FAILURE); } static void sigfunc(int sig_num) { if (sig_num == SIGINT) stopped = 1; else printf("Got an unknown signal!\n"); } /* constantly crea
Re: Bisected regression: iterate_fd() selinux change affects flash plugin
On Mon, Nov 12, 2012 at 11:57 AM, Pavel Roskin wrote: > Quoting Eric Paris : > >> OMG this +1 -1 stuff is nuts... Ping, Al. int iterate_fd(struct files_struct *files, unsigned n, [snip] while (!res && n < fdt->max_fds) { file = rcu_dereference_check_fdtable(files, fdt->fd[n++]); if (file) res = f(p, file, n); } spin_unlock(&files->file_lock); return res; So we increment n (the file descriptor number) in the dereference, then pass that (wrong) number to f(). Every single f() (including SELinux, the cause of this bug) returns fd+1 (so now we are up by 2). Then all of the users of iterate fd actually use fd-1 (which is wrong) Why not have iterate_fd return -ENOENT on no entries and stop all of the stupid games? We fix the real bug (the above function should do the n++ after the f() call, and the interface is sane to design against... -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] audit: destroy filename correctly PING.
I picked it up for 3.10. Sorry, should have said something. Thanks! -Eric - Original Message - > On Mon, 1 Apr 2013 11:00:00 +0400, Dmitry Monakhov > wrote: > Ping. Patch (https://lkml.org/lkml/2013/4/1/65) was not a 1'st April's joke. > Add CC:linux-au...@redhat.com > > filename should be destroyed via final_putname() instead of __putname() > > Otherwise this result in following BUGON() in case of long names: > > kernel BUG at mm/slab.c:3006! > > Call Trace: > > kmem_cache_free+0x1c1/0x850 > > audit_putname+0x88/0x90 > > putname+0x73/0x80 > > sys_symlinkat+0x120/0x150 > > sys_symlink+0x16/0x20 > > system_call_fastpath+0x16/0x1b > > > > Signed-off-by: Dmitry Monakhov > > --- > > kernel/auditsc.c |6 +++--- > > 1 files changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > > index a371f85..bfe7ca6 100644 > > --- a/kernel/auditsc.c > > +++ b/kernel/auditsc.c > > @@ -1010,7 +1010,7 @@ static inline void audit_free_names(struct > > audit_context *context) > > list_for_each_entry_safe(n, next, &context->names_list, list) { > > list_del(&n->list); > > if (n->name && n->name_put) > > - __putname(n->name); > > + final_putname(n->name); > > if (n->should_free) > > kfree(n); > > } > > @@ -2036,7 +2036,7 @@ void audit_putname(struct filename *name) > > BUG_ON(!context); > > if (!context->in_syscall) { > > #if AUDIT_DEBUG == 2 > > - printk(KERN_ERR "%s:%d(:%d): __putname(%p)\n", > > + printk(KERN_ERR "%s:%d(:%d): final_putname(%p)\n", > >__FILE__, __LINE__, context->serial, name); > > if (context->name_count) { > > struct audit_names *n; > > @@ -2047,7 +2047,7 @@ void audit_putname(struct filename *name) > >n->name, n->name->name ?: "(null)"); > > } > > #endif > > - __putname(name); > > + final_putname(name); > > } > > #if AUDIT_DEBUG > > else { > > -- > > 1.7.1 > > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Part1 PATCH 00/22] Add namespace support for audit
On Wed, 2013-06-19 at 16:49 -0400, Aristeu Rozanski wrote: > On Wed, Jun 19, 2013 at 09:53:32AM +0800, Gao feng wrote: > > This patchset is first part of namespace support for audit. > > in this patchset, the mainly resources of audit system have > > been isolated. the audit filter, rules havn't been isolated > > now. It will be implemented in Part2. We finished the isolation > > of user audit message in this patchset. > > > > I choose to assign audit to the user namespace. > > Right now,there are six kinds of namespaces, such as > > net, mount, ipc, pid, uts and user. the first five > > namespaces have special usage. the audit isn't suitable to > > belong to these five namespaces, And since the flag of system > > call clone is in short supply, we can't provide a new flag such > > as CLONE_NEWAUDIT to enable audit namespace separately. so the > > user namespace may be the best choice. > > I thought it was said on the last submission that to tie userns and > audit namespace would be a bad idea? I consider it a non-starter. unpriv users are allowed to launch their own user namespace. The whole point of audit is to have only a priv user be allowed to make changes. If you tied audit namespace to user namespace you grant an unpriv user the ability to modify audit. NAK. If there are not clone flags you will either need to only do this from unshare and not from clone, or get more flags to clone -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Part1 PATCH 00/22] Add namespace support for audit
On Thu, 2013-06-20 at 11:02 +0800, Gao feng wrote: > On 06/20/2013 04:51 AM, Eric Paris wrote: > > On Wed, 2013-06-19 at 16:49 -0400, Aristeu Rozanski wrote: > >> On Wed, Jun 19, 2013 at 09:53:32AM +0800, Gao feng wrote: > >>> This patchset is first part of namespace support for audit. > >>> in this patchset, the mainly resources of audit system have > >>> been isolated. the audit filter, rules havn't been isolated > >>> now. It will be implemented in Part2. We finished the isolation > >>> of user audit message in this patchset. > >>> > >>> I choose to assign audit to the user namespace. > >>> Right now,there are six kinds of namespaces, such as > >>> net, mount, ipc, pid, uts and user. the first five > >>> namespaces have special usage. the audit isn't suitable to > >>> belong to these five namespaces, And since the flag of system > >>> call clone is in short supply, we can't provide a new flag such > >>> as CLONE_NEWAUDIT to enable audit namespace separately. so the > >>> user namespace may be the best choice. > >> > >> I thought it was said on the last submission that to tie userns and > >> audit namespace would be a bad idea? > > > > I consider it a non-starter. unpriv users are allowed to launch their > > own user namespace. The whole point of audit is to have only a priv > > user be allowed to make changes. If you tied audit namespace to user > > namespace you grant an unpriv user the ability to modify audit. > > > > I understand your views. > > But ven the unpriv user are allowed to make changes, they can do no harm. > they can only make changes on the audit namespace they created.they can > only communicate with the audit namespace they created. Imagine I set up my machine to audit all user access to super secret information. With your patch set all an malicious user has to do is create a new user namespace. Now when he accesses the super secret information it will be logged inside the user namespace he created. So he can just dump those logs in the trash. I believe that each audit namespace should require priv (CAP_AUDIT_CONTROL) in the user namespace that created the current audit namespace. So lets say the machine boots and we are in the init_user and init_audit namespace. Creating a new audit namespace should require CAP_AUDIT_CONTROL in the init_user namespace. If instead we spawned a new user namespace userns1 and try to create a new audit namespace, we should STILL check for CAP_AUDIT_CONTROL in the init_user namespace. Assuming we've spawned auditns1 in userns1 and want to spawn auditns2 it should require CAP_AUDIT_CONTROL in userns1. So now you only have permission to change your audit config (create a new audit namespace) if you already had permission to change your current audit config. Now how to handle coding this... When the kernel receives an audit message on the netlink socket it can always check the current->[whatever] to figure out which audit namespace it came from. Then it can be processed accordingly... Sending messages to the userspace auditd is a little more tricky. We need to somehow map the audit namespace to a socket connected to auditd in userspace. I'd imagine we'd have to either have per auditns backlog queues and run one kauditd per audit namespace, or we'd have to tag the skb's with the intended namespace somehow and then find the right socket in kauditd. Either way it doesn't seem too onerous (although I admit, I don't know how to code the per namespace kauditd right offhand) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Stupid VFS name lookup interface..
On Sat, May 25, 2013 at 10:19 PM, Linus Torvalds wrote: > On Sat, May 25, 2013 at 10:04 PM, James Morris wrote: >> On Sat, 25 May 2013, Linus Torvalds wrote: >> >>> But I haven't even looked at what non-selinux setups do to >>> performance. Last time I tried Ubuntu (they still use apparmor, no?), >>> "make modules_install ; make install" didn't work for the kernel, and >>> if the Ubuntu people don't want to support kernel engineers, I >>> certainly am not going to bother with them. Who uses smack? >> >> Tizen, perhaps a few others. > > Btw, it really would be good if security people started realizing that > performance matters. It's annoying to see the security lookups cause > 50% performance degradations on pathname lookup (and no, I'm not > exaggerating, that's literally what it was before we fixed it - and > no, by "we" I don't mean security people). I take a bit of exception to this. I do care. Stephen Smalley, the only other person who does SELinux kernel work, cares. I don't speak for other LSMs, but at least both of us who have done anything with SELinux in the last years care. I did the RCU work for selinux and you, sds, and I did a bunch of work to stop wasting so much stack space which was crapping on performance. And I'm here again :) > Right now (zooming into the kernel only - ignoring the fact that make > really spends a fair amount of time in user space) I get > > 9.79% make [k] __d_lookup_rcu > 5.48% make [k] link_path_walk > 2.94% make [k] avc_has_perm_noaudit > 2.47% make [k] selinux_inode_permission > 2.25% make [k] path_lookupat > 1.89% make [k] generic_fillattr > 1.50% make [k] lookup_fast > 1.27% make [k] copy_user_generic_string > 1.17% make [k] generic_permission > 1.15% make [k] dput > 1.12% make [k] inode_has_perm.constprop.58 > 1.11% make [k] __inode_permission > 1.08% make [k] kmem_cache_alloc > ... I tried something else, doing caching of the last successful security check inside the isec. It isn't race free, so it's not ready for prime time. But right now my >1% looks like: 7.97% make [k] __d_lookup_rcu 5.79% make [k] link_path_walk 3.67% make [k] selinux_inode_permission 2.02% make [k] lookup_fast 1.90% make [k] system_call 1.76% make [k] path_lookupat 1.68% make [k] inode_has_perm.isra.45.constprop.61 1.53% make [k] copy_user_enhanced_fast_string 1.39% make [k] generic_permission 1.35% make [k] kmem_cache_free 1.30% make [k] __audit_syscall_exit 1.13% make [k] kmem_cache_alloc 1.00% make [k] strncpy_from_user How do I tell what is taking time inside selinux_inode_permission? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH 1/2] selinux: merge selinux_inode_permission and inode_has_perm
selinux_inode_permission had some heavy lifting done to make it more performance polite. But it still does largely the same thing as inode_has_perm. So move that work into inode_has_perm and call inode_has_perm from selinux_inode_permission. Signed-off-by: Eric Paris --- security/selinux/hooks.c | 92 ++-- 1 file changed, 42 insertions(+), 50 deletions(-) diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 5c6f2cd..cfecb52 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -1494,17 +1494,41 @@ static int task_has_system(struct task_struct *tsk, SECCLASS_SYSTEM, perms, NULL); } +static noinline int audit_inode_permission(struct inode *inode, + struct common_audit_data *adp, + u32 perms, u32 audited, u32 denied, + unsigned flags) +{ + struct common_audit_data ad; + struct inode_security_struct *isec = inode->i_security; + int rc; + + if (!adp) { + ad.type = LSM_AUDIT_DATA_INODE; + ad.u.inode = inode; + adp = &ad; + } + + rc = slow_avc_audit(current_sid(), isec->sid, isec->sclass, perms, + audited, denied, adp, flags); + if (rc) + return rc; + return 0; +} + /* Check whether a task has a particular permission to an inode. The 'adp' parameter is optional and allows other audit data to be passed (e.g. the dentry). */ static int inode_has_perm(const struct cred *cred, struct inode *inode, - u32 perms, + u32 perms, u32 dontaudit, struct common_audit_data *adp, unsigned flags) { struct inode_security_struct *isec; - u32 sid; + struct av_decision avd; + u32 sid, denied, audited; + int rc, rc2; validate_creds(cred); @@ -1514,6 +1538,14 @@ static int inode_has_perm(const struct cred *cred, sid = cred_sid(cred); isec = inode->i_security; + rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd); + audited = avc_audit_required(perms, &avd, rc, dontaudit, &denied); + if (likely(!audited)) + return rc; + + rc2 = audit_inode_permission(inode, adp, perms, audited, denied, flags); + if (rc2) + return rc2; return avc_has_perm_flags(sid, isec->sid, isec->sclass, perms, adp, flags); } @@ -1529,7 +1561,7 @@ static inline int dentry_has_perm(const struct cred *cred, ad.type = LSM_AUDIT_DATA_DENTRY; ad.u.dentry = dentry; - return inode_has_perm(cred, inode, av, &ad, 0); + return inode_has_perm(cred, inode, av, 0, &ad, 0); } /* Same as inode_has_perm, but pass explicit audit data containing @@ -1544,7 +1576,7 @@ static inline int path_has_perm(const struct cred *cred, ad.type = LSM_AUDIT_DATA_PATH; ad.u.path = *path; - return inode_has_perm(cred, inode, av, &ad, 0); + return inode_has_perm(cred, inode, av, 0, &ad, 0); } /* Check whether a task can use an open file descriptor to @@ -1580,7 +1612,7 @@ static int file_has_perm(const struct cred *cred, /* av is zero if only checking access to the descriptor. */ rc = 0; if (av) - rc = inode_has_perm(cred, inode, av, &ad, 0); + rc = inode_has_perm(cred, inode, av, 0, &ad, 0); out: return rc; @@ -2635,64 +2667,24 @@ static int selinux_inode_follow_link(struct dentry *dentry, struct nameidata *na return dentry_has_perm(cred, dentry, FILE__READ); } -static noinline int audit_inode_permission(struct inode *inode, - u32 perms, u32 audited, u32 denied, - unsigned flags) -{ - struct common_audit_data ad; - struct inode_security_struct *isec = inode->i_security; - int rc; - - ad.type = LSM_AUDIT_DATA_INODE; - ad.u.inode = inode; - - rc = slow_avc_audit(current_sid(), isec->sid, isec->sclass, perms, - audited, denied, &ad, flags); - if (rc) - return rc; - return 0; -} - static int selinux_inode_permission(struct inode *inode, int mask) { const struct cred *cred = current_cred(); - u32 perms; - bool from_access; + u32 perms, dontaudit = 0; unsigned flags = mask & MAY_NOT_BLOCK; - struct inode_security_struct *isec; - u32 sid; - struct av_decision avd; - int rc, rc2; - u32 audited, denied; - from_access = mask & MAY_ACCESS; + if (mask & MAY_ACCESS) +
[RFC PATCH 2/2] SELinux: cache inode checks inside struct inode
This patch adds a cache of selinux security checks into struct inode. It is protected by the seq counter against updates by other nodes. This has a measurable impact on one benchmark Linus mentioned. The cpu time using make to check a huge project for changes. It is going to have a negative impact on cases where tasks with different labels are accessing the same object. In these cases each one will grab the i_lock to reset the in inode cache. The only place I imagine this would be common would be with shared libraries. But as those are typically openned and mmapped, they don't have continuous checks... Stock Kernel: 8.23% make [k] __d_lookup_rcu 6.27% make [k] link_path_walk 3.91% make [k] selinux_inode_permission < 3.37% make [k] avc_has_perm_noaudit < 2.26% make [k] lookup_fast 2.12% make [k] system_call 1.86% make [k] path_lookupat 1.82% make [k] inode_has_perm.isra.32.constprop.61 < 1.57% make [k] copy_user_enhanced_fast_string 1.48% make [k] generic_permission 1.34% make [k] __audit_syscall_exit 1.31% make [k] kmem_cache_free 1.24% make [k] kmem_cache_alloc 1.20% make [k] generic_fillattr 1.12% make [k] __inode_permission 1.06% make [k] dput 1.04% make [k] strncpy_from_user 1.04% make [k] _raw_spin_lock Total: 3.91 + 3.37 + 1.82 = 9.1% My Changes: 8.54% make [k] __d_lookup_rcu 6.52% make [k] link_path_walk 3.93% make [k] inode_has_perm < 2.31% make [k] lookup_fast 2.05% make [k] system_call 1.79% make [k] path_lookupat 1.72% make [k] generic_permission 1.50% make [k] __audit_syscall_exit 1.49% make [k] selinux_inode_permission < 1.47% make [k] copy_user_enhanced_fast_string 1.28% make [k] __inode_permission 1.23% make [k] kmem_cache_alloc 1.19% make [k] _raw_spin_lock 1.15% make [k] lg_local_lock 1.10% make [k] strncpy_from_user 1.10% make [k] dput 1.08% make [k] kmem_cache_free 1.08% make [k] generic_fillattr Total: 3.93 + 1.49 = 5.42 In inode_has_perm the big time takers are loading cred->sid and the raw_seqcount_begin(inode->i_security_seccount). I'm not certain how to make either of those much faster. In selinux_inode_permission() we spend time in getting current->cred and in calling inode_has_perm(). Signed-off-by: Eric Paris --- include/linux/fs.h | 5 +++ security/selinux/hooks.c| 62 ++--- security/selinux/include/security.h | 1 + security/selinux/ss/services.c | 5 +++ 4 files changed, 69 insertions(+), 4 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index 43db02e..5268cf3 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -535,6 +535,11 @@ struct inode { struct address_space*i_mapping; #ifdef CONFIG_SECURITY + seqcount_t i_security_seqcount; + u32 i_last_task_sid; + u32 i_last_granting; + u32 i_last_perms; + u32 i_audit_allow; void*i_security; #endif diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index cfecb52..00dd6d9 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -82,6 +82,7 @@ #include #include #include +#include #include #include "avc.h" @@ -207,6 +208,7 @@ static int inode_alloc_security(struct inode *inode) if (!isec) return -ENOMEM; + seqcount_init(&inode->i_security_seqcount); mutex_init(&isec->lock); INIT_LIST_HEAD(&isec->list); isec->inode = inode; @@ -1516,6 +1518,44 @@ static noinline int audit_inode_permission(struct inode *inode, return 0; } +static bool inode_has_perm_cached(u32 sid, struct inode *inode, u32 perms) +{ + unsigned seq; + u32 last_task_sid; + u32 last_perms; + u32 last_granting; + + seq = raw_seqcount_begin(&inode->i_security_seqcount); + last_task_sid = inode->i_last_task_sid; + last_perms = inode->i_last_perms; + last_granting = inode->i_last_granting; + + /* something changed, bail! */ + if (read_seqcount_retry(&inode->i_security_seqcount, seq)) + return false; + + return sid == last_task_sid && (perms & last_perms) == perms && + security_get_latest_granting() == last_granting; +} + +static void inode_set_perm_cache(struct inode *inode, u32 task_sid, u32 perms, +u32 granting, u32 audit_allow) +{ + spin_lock(&inode->i_lock); + write_seqcount_begin(&inode->i_security_seqcount); + if (inode->i_last_task_sid == task_sid && + inode->i_la
Re: [RFC PATCH 1/2] selinux: merge selinux_inode_permission and inode_has_perm
On Mon, 2013-06-03 at 14:59 -0400, Eric Paris wrote: > selinux_inode_permission had some heavy lifting done to make it more > performance polite. But it still does largely the same thing as > inode_has_perm. So move that work into inode_has_perm and call > inode_has_perm from selinux_inode_permission. > > Signed-off-by: Eric Paris > --- > security/selinux/hooks.c | 92 > ++-- > 1 file changed, 42 insertions(+), 50 deletions(-) > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c > index 5c6f2cd..cfecb52 100644 > --- a/security/selinux/hooks.c > +++ b/security/selinux/hooks.c > @@ -1514,6 +1538,14 @@ static int inode_has_perm(const struct cred *cred, > sid = cred_sid(cred); > isec = inode->i_security; > > + rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd); > + audited = avc_audit_required(perms, &avd, rc, dontaudit, &denied); > + if (likely(!audited)) > + return rc; > + > + rc2 = audit_inode_permission(inode, adp, perms, audited, denied, flags); > + if (rc2) > + return rc2; > return avc_has_perm_flags(sid, isec->sid, isec->sclass, perms, adp, > flags); > } > Should just return rc, not avc_has_perm_flags(). I fixed that in the 2/2 patch and this should work just fine. Kills a little performance, but still works. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 2/2] SELinux: cache inode checks inside struct inode
On Tue, 2013-06-04 at 06:31 +0900, Linus Torvalds wrote: > > > On Mon, 3 Jun 2013, Eric Paris wrote: > > > > #ifdef CONFIG_SECURITY > > + seqcount_t i_security_seqcount; > > + u32 i_last_task_sid; > > + u32 i_last_granting; > > + u32 i_last_perms; > > + u32 i_audit_allow; > > void*i_security; > > #endif > > This is much too big. I was really hoping for "another word that the > security layer can use" or similar. Not sure how that can work :-( > Something this big would be acceptable if it would be a *generic* security > cache, and others could use it too, and would avoid ever actually calling > into any security layer at all (ie we could do the checks entirely at the > VFS layer). Then it would be fine. But for just the fact that SELinux is > too slow? No. There is nothing about it that can't be VFS-erized. The fields are: readlockless way to get the data which task was allowed which perms were allowed what generation of security policy allowed it what perms should be forced to call security hook anyway defining "perms" from a VFS PoV is hard. doing any of this with 'stacking' is hard. Then again, I'm only so so on the value of stacking. I've waffled a few times... I can do it entirely inside selinux, but we are still going to have the cache hit you were originally seeing as we dereference isec to get the info -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC 00/48] Add namespace support for audit
On Tue, 2013-06-11 at 13:59 +0800, Gao feng wrote: > On 06/11/2013 05:24 AM, Serge E. Hallyn wrote: > > Quoting Gao feng (gaof...@cn.fujitsu.com): > >> On 06/07/2013 06:47 AM, Serge Hallyn wrote: > >>> Quoting Serge Hallyn (serge.hal...@ubuntu.com): > Quoting Gao feng (gaof...@cn.fujitsu.com): > > On 05/07/2013 10:20 AM, Gao feng wrote: > In my option, the audit rules(inode, tree_list, filter) , some of audit > controller related resources(enabled,pid,portid...) and skb queue, audit > netlink sockets,kauditd thread should be per-userns. The audit user message > which generated by the user in container should be per-userns too. > > Since netns is not implemented as a hierarchy, and the network related > resources are not global. so network related audit message should be > per-userns too. > > The security related audit message should be send to init user namespace > as we discussed before. Maybe tty related audit message should be send > to init user namespace too, I have no idea now. > > The next step, I will post a new patchset which only make the audit user > message and the basic audit resource per userns. I think this patchset > will easy to be reviewed and accepted, And will not influence the host. > This patchset contains the below patches: I think this would be easier for us do from a certification and doumentation PoV if we had an audit namespace, not tied to the user namespace. creating a new audit namespace should require CAP_AUDIT_CONTROL in the user namespace which created the current audit namespace. Does that make sense? I don't mind messages staying completely inside the current namespace, but that means we can't give unpriv users (even if they have priv in their user namespace) a new audit namespace... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: security_inode_init_security() inode field requirements
SELinux has no maximum :-( Realistically there are a couple of interfaces that limit things to 4k, but labels on files on disk could be even larger than that! 255 will fit most every label, but not necessarily all of them. I know ext4 on Fedora allocates inodes which left about 255 bytes for selinux.selinux, but will place the xattr in another block if it happens to be larger than 255. This is rare, but certainly possible We use the inode->i_mode. In debug/error path we use: inode->i_sb inode->i_no We could use the parent dir sb instead of the new inode->i_sb. We don't have to print the i_no when we hit a failure, but it is just about the only information that can help for debugging/figuring out which file had a failure.. -Eric On Fri, Mar 1, 2013 at 9:07 AM, Mimi Zohar wrote: > On Fri, 2013-03-01 at 13:11 +, Steven Whitehouse wrote: >> Hi, >> >> On Fri, 2013-03-01 at 07:27 -0500, Mimi Zohar wrote: >> > On Fri, 2013-03-01 at 10:12 +, Steven Whitehouse wrote: >> > > Hi, >> > > >> > > I'm wondering whether there is a list somewhere of fields which >> > > security_inode_init_security() requires are set in an inode when it is >> > > called? In particular, does it matter if the inode number itself is >> > > unset when security_inode_init_security() is called? >> > > >> > > The problem that I'm looking at is the use of multiple transactions >> > > during inode creation when some combination of ACLs/LSMs/IMA are turned >> > > on. What we have currently (in GFS2, there are other fs which follow >> > > broadly the same solution though) is this: >> > > >> > > 1. Create inode in core >> > > 2. Create inode on disk >> > > 3. Add xattrs one at a time for ACLs/LSMs/IMA >> > > 4. Link inode into directory >> > > >> > > Steps 2 through 4 require separate transactions at the moment. What I'd >> > > like to do is to be able to get the details of the xattrs ahead of time >> > > such that the allocation of the inode can be one and the same allocation >> > > as that for the xattr blocks. That allows merging of the transactions >> > > into one and a greatly simplified error path too. This would look >> > > something like: >> > > >> > > 1. Create in-core inode and init required fields >> > > 2. Get details of all xattrs for the inode >> > > 3. Alloc on disk inode and blocks for xattrs as needed >> > > 4. Link into directory >> > > >> > > In this case, steps 2 through 4 are within a single transaction. We >> > > don't actually need to have the content of the xattrs ahead of >> > > allocating the inode, just the length (or even just a max length, if it >> > > is not too large). However the easiest solution would just be to move >> > > the call to security_inode_init_security() to the point before we've >> > > allocated the inode (and thus we don't know its number yet) but after >> > > we've filled out all the remaining fields if that makes sense? >> > > >> > > So I just wanted to check whether that would break anything, >> > >> > Hi Steve, >> > >> > Included in security_inode_init_security() is the call to >> > evm_inode_init_security() to write the 'security.evm' extended >> > attribute. 'security.evm' is an HMAC of the security extended >> > attributes and other file metadata, including the inode. For an exact >> > list of other metadata included in the HMAC calculation refer to >> > hmac_add_misc(). (The UUID is being added to the HMAC calculation in >> > this open window.) >> > >> > thanks, >> > >> > Mimi >> > >> >> Ok... but it is using inode->i_ino in that case, as well as the >> generation number too. So that presumably can only be done after the >> inode has been allocated, since we need to know its location in order to >> know its inode number. Also the generation number is assigned at >> allocation time. One potential issue though... the inode->i_ino is an >> unsigned long, so that on 32 bit archs, that will be (on GFS2) a >> truncated version of the full 64 bit inode number. I'm not sure if that >> matters or not for EVM. > > The reason for adding additional information to the HMAC calculation is > to prevent a cut & paste attack, taking a valid 'security.evm' from one > file and using it for another file. > >> So since that appears to rule out doing the security init bits ahead of >> the allocation of the inode, is it possible to get a maximum value for >> the size of the xattr which EVM will add? Likewise for the other LSMs >> too? > > Although 'security.evm' can contain either an HMAC or a digital > signature, new inodes are created with an HMAC, which is hardcoded as > hmac(sha1). On a running system, the existing 'security.evm' digital > signatures are replaced with an HMAC. > > SELinux defines INITCONTEXTLEN as 255, which seems to be the maximum > xattr length, but I'm not sure. From the smack header: > > /* > * Smack labels were limited to 23 characters for a long time. > */ > #define SMK_LABELLEN24 > #define SMK_LONGLABEL 256 > > thanks, > > Mimi > > -- > To unsubscribe f
Re: IMA: How to manage user space signing policy with others
I think that is what he was suggesting. It reuses the integrity code but it loses the integrity flexibility. I don't think it is a good solution :-( On Mon, Mar 4, 2013 at 1:59 PM, Mimi Zohar wrote: > On Mon, 2013-03-04 at 10:29 -0500, Vivek Goyal wrote: > [...] > >> Hi Mimi, >> >> If we decide to merge flags, then practically we modified the >> ima_appraise_tcb policy. ima_appraise_tcb policy expects to cache the >> results and we will not do that. And this conflict just grows if we >> are forced to add more options in future. >> >> Also as you mentioned that in some cases flag merging is OR operation >> and in another cases it might be AND operation. And we will most likely >> end up hardcoding all this. I think slowly this is getting complicated >> and as people add more complex rules things can quickly get out of hand. >> >> I am wondering that why are we trying to make multiple policies work >> together. Can we try to keep it simple and say that at one point of >> time only one policy can be effective. It could either be a built in >> policy or user defined one. In fact that's how things are working right now. >> User defined policy replaces built-in policy. >> >> For the sake of backward compatibility "ima_tcb" and "ima_appraise_tcb" >> can co-exist together (like today). But ima_secureboot_policy will not >> be compatible with other policies. I understand that there might be a >> desire to use multiple policies together down the line, but I guess in >> that case policies need to specified using "policy" interface. And >> ima_secureboot will be odd man out here as it can not trust the root >> to specify policy. So practically ima_secureboot will be disabled. >> >> We just have to provide an IMA interface so that caller can query what's >> the effective policy currently. Say, IMA_POLICY_SECUREBOOT, >> IMA_POLICY_TCB, or IMA_POLICY_USER. Caller of the bprm_check() or >> bprm_post_load() can also check for current policy in force and give >> CAP_SIGNED only if desired policy is in effect. >> >> This reduces our options but trying to make multiple policies co-exist >> together is just making it complicated. We can take it up again when >> somebody has a strong use case of using secureboot policy along with >> other policies. In fact a user can still define a custom policy which >> is mix of multiple policies. Just that it is not compatible with >> "secureboot" policy because for that we can't trust "root" to define >> policy. > > Let me get this straight. You're suggesting that distros/users will > need to make a Kconfig build decision of enabling secureboot, including > the secureboot built-in policy, or be allowed to enable other integrity > policies. If RH enables secureboot, then no other integrity policy will > be permitted. Is that what you're saying, and if so, why would I agree > to this? > > thanks, > > Mimi > > -- > To unsubscribe from this list: send the line "unsubscribe > linux-security-module" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [BZ905179] audit: omit check for uid and gid validity in audit rules and data
On Thu, 2013-05-09 at 09:29 -0400, Steve Grubb wrote: > On Tuesday, April 16, 2013 03:38:23 PM Richard Guy Briggs wrote: > > On Tue, Apr 09, 2013 at 02:39:32AM -0700, Eric W. Biederman wrote: > > > Andrew Morton writes: > > > > On Wed, 20 Mar 2013 15:18:17 -0400 Richard Guy Briggs > wrote: > > > >> audit rule additions containing "-F auid!=4294967295" were failing with > > > >> EINVAL.> > > > The only case where this appears to make the least little bit of sense > > > is if the goal of the test is to test to see if an audit logloginuid > > > has been set at all. In which case depending on a test against > > > 4294967295 is bogus because you are depending on an intimate internal > > > kernel implementation detail. > > > > > > How about something like my untested patch below that add an explicit > > > operation to test if loginuid has been set? > > > > Sorry for the delay in testing this, I had another urgent bug and a > > belligerent test box... > > > > I like this approach better than mine now that I understand it. I've > > tested the patch below without any changes. It works as expected with > > my previous test case. I don't know if a Signed-off-by: is appropriate > > for me in this case, but I'll throw in a: > > > > Tested-by: Richard Guy Briggs > > > > and recommend a: > > > > Reported-By: Steve Grubb > > If this is the approved patch, can it be put in stable? The audit system > hasn't worked as intended since January. https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/kernel/auditsc.c?id=780a7654cee8d61819512385e778e4827db4bfbc Should be queued for 3.7 and later now. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: manual merge of the akpm tree with Linus' tree
On Mon, 2013-05-13 at 12:07 +1000, Stephen Rothwell wrote: > Hi Andrew, > > Today's linux-next merge of the akpm tree got a conflict in > kernel/auditsc.c between commit b24a30a73054 ("audit: fix event coverage > of AUDIT_ANOM_LINK") from Linus' tree and commit "audit: fix mq_open and > mq_unlink to add the MQ root as a hidden parent audit_names record" from > the akpm tree. Actually, I've already picked the patch up for 3.11. So Andrew, you can drop it. > I fixed it up (see below) and can carry the fix as necessary (no action > is required). > > BTW, commit b24a30a73054 from Linus' tree has Eric Paris as Author and > Committer, but is only Signed-off-by Kees Cook. It is part of a long > series that did not go anywhere near linus-next. I do have an audit > tree in linux-next > (git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit.git#for-next) > but that hasn't seen any recent activity. I thought I sent you a note asking for audit to get pulled into -next quite a while back. I'll resend... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: linux-next: manual merge of the akpm tree with Linus' tree
-Original Message- From: Kees Cook [keesc...@chromium.org] Received: Monday, 13 May 2013, 12:49am To: Eric Paris [epa...@redhat.com] CC: Stephen Rothwell [s...@canb.auug.org.au]; Andrew Morton [a...@linux-foundation.org]; Linus [torva...@linux-foundation.org]; Linux-Next [linux-n...@vger.kernel.org]; LKML [linux-kernel@vger.kernel.org]; Jeff Layton [jlay...@redhat.com]; Al Viro [v...@zeniv.linux.org.uk] Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree On Sun, May 12, 2013 at 7:11 PM, Eric Paris wrote: > On Mon, 2013-05-13 at 12:07 +1000, Stephen Rothwell wrote: >> Hi Andrew, >> >> Today's linux-next merge of the akpm tree got a conflict in >> kernel/auditsc.c between commit b24a30a73054 ("audit: fix event coverage >> of AUDIT_ANOM_LINK") from Linus' tree and commit "audit: fix mq_open and >> mq_unlink to add the MQ root as a hidden parent audit_names record" from >> the akpm tree. > > Actually, I've already picked the patch up for 3.11. So Andrew, you can > drop it. > >> I fixed it up (see below) and can carry the fix as necessary (no action >> is required). >> >> BTW, commit b24a30a73054 from Linus' tree has Eric Paris as Author and >> Committer, but is only Signed-off-by Kees Cook. It is part of a long >> series that did not go anywhere near linus-next. I do have an audit >> tree in linux-next >> (git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit.git#for-next) >> but that hasn't seen any recent activity. > > I thought I sent you a note asking for audit to get pulled into -next > quite a while back. I'll resend... > Hrm, how did the Author get mangled? -Kees -- Kees Cook Chrome OS Security
RE: linux-next: manual merge of the akpm tree with Linus' tree
-Original Message- From: Kees Cook [keesc...@chromium.org] Received: Monday, 13 May 2013, 12:49am To: Eric Paris [epa...@redhat.com] CC: Stephen Rothwell [s...@canb.auug.org.au]; Andrew Morton [a...@linux-foundation.org]; Linus [torva...@linux-foundation.org]; Linux-Next [linux-n...@vger.kernel.org]; LKML [linux-kernel@vger.kernel.org]; Jeff Layton [jlay...@redhat.com]; Al Viro [v...@zeniv.linux.org.uk] Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree On Sun, May 12, 2013 at 7:11 PM, Eric Paris wrote: > On Mon, 2013-05-13 at 12:07 +1000, Stephen Rothwell wrote: >> Hi Andrew, >> >> Today's linux-next merge of the akpm tree got a conflict in >> kernel/auditsc.c between commit b24a30a73054 ("audit: fix event coverage >> of AUDIT_ANOM_LINK") from Linus' tree and commit "audit: fix mq_open and >> mq_unlink to add the MQ root as a hidden parent audit_names record" from >> the akpm tree. > > Actually, I've already picked the patch up for 3.11. So Andrew, you can > drop it. > >> I fixed it up (see below) and can carry the fix as necessary (no action >> is required). >> >> BTW, commit b24a30a73054 from Linus' tree has Eric Paris as Author and >> Committer, but is only Signed-off-by Kees Cook. It is part of a long >> series that did not go anywhere near linus-next. I do have an audit >> tree in linux-next >> (git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit.git#for-next) >> but that hasn't seen any recent activity. > > I thought I sent you a note asking for audit to get pulled into -next > quite a while back. I'll resend... > Hrm, how did the Author get mangled? I remember it having a conflict when I tried to merge it (someone else had changed the same area of the header file). So I used patch -p1 and fixed up the reject by hand. I wonder if I screwed up and used git commit -a instead of git am --resolved? That is 2 things I should have caught on final review I missed. :-( Now to wait for everything else I screwed up...
[PATCH] fork: reorder permissions when violating number of processes limits
When a task is attempting to violate the RLIMIT_NPROC limit we have a check to see if the task is sufficiently priviledged. The check first looks at CAP_SYS_ADMIN, then CAP_SYS_RESOURCE, then if the task is uid=0. A result is that tasks which are allowed by the uid=0 check are first checked against the security subsystem. This results in the security subsystem auditting a denial for sys_admin and sys_resource and then the task passing the uid=0 check. This patch rearrainges the code to first check uid=0, since if we pass that we shouldn't hit the security system at all. We then check sys_resource, since it is the smallest capability which will solve the problem. Lastly we check the fallback everything cap_sysadmin. We don't want to give this capability many places since it is so powerful. This will eliminate many of the false positive/needless denial messages we get when a root task tries to violate the nproc limit. (note that kthreads count against root, so on a sufficiently large machine we can actually get past the default limits before any userspace tasks are launched.) Signed-off-by: Eric Paris --- kernel/fork.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index 987b28a..09dbda3 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1199,8 +1199,8 @@ static struct task_struct *copy_process(unsigned long clone_flags, retval = -EAGAIN; if (atomic_read(&p->real_cred->user->processes) >= task_rlimit(p, RLIMIT_NPROC)) { - if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_RESOURCE) && - p->real_cred->user != INIT_USER) + if (p->real_cred->user != INIT_USER && + !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) goto bad_fork_free; } current->flags &= ~PF_NPROC_EXCEEDED; -- 1.8.2.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: build warning after merge of the final tree (in Linus' tree)
On Wed, 2013-05-15 at 13:20 +1000, Stephen Rothwell wrote: > Hi , > > After merging the final tree, today's linux-next build (i386 defconfig) > produced this warning: > > kernel/auditfilter.c: In function 'audit_data_to_entry': > kernel/auditfilter.c:426:3: warning: this decimal constant is unsigned only > in ISO C90 [enabled by default] > > Introduced by commit 780a7654cee8 ("audit: Make testing for a valid > loginuid explicit") from Linus' tree. Thank you, I'll fix it up. What am I likely missing that I don't see it on my builds? I'm using gcc 4.8. Is there a config option that enables additional warnings? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/4] powerpc: Optimise 64bit syscall auditing entry path
Patches 1 and 2 I applied for 3.10, but I'd really like to have someone who knows PPC ack 3 and 4. Especially if there is a hope that it goes through my tree... Link to original messages for your ease of review... http://marc.info/?l=linux-kernel&m=135768892320439&w=2 http://marc.info/?l=linux-kernel&m=135768895320472&w=2 -Eric - Original Message - > > Add an assembly fast path for the syscall audit entry path on > 64bit. Some distros enable auditing by default which forces us > through the syscall auditing path even if there are no rules. > > I wrote some test cases to validate the patch: > > http://ozlabs.org/~anton/junkcode/audit_tests.tar.gz > > And to test the performance I ran a simple null syscall > microbenchmark on a POWER7 box: > > http://ozlabs.org/~anton/junkcode/null_syscall.c > > Baseline: 949.2 cycles > Patched: 920.6 cycles > > An improvement of 3%. Most of the potential gains are masked by > the syscall audit exit path which will be fixed in a > subsequent patch. > > Signed-off-by: Anton Blanchard > --- > > Index: b/arch/powerpc/kernel/entry_64.S > === > --- a/arch/powerpc/kernel/entry_64.S > +++ b/arch/powerpc/kernel/entry_64.S > @@ -34,6 +34,12 @@ > #include > #include > > +/* Avoid __ASSEMBLER__'ifying just for this. */ > +#include > +#define AUDIT_ARCH_PPC (EM_PPC) > +#define AUDIT_ARCH_PPC64 (EM_PPC64|__AUDIT_ARCH_64BIT) > +#define __AUDIT_ARCH_64BIT 0x8000 > + > /* > * System calls. > */ > @@ -244,6 +250,10 @@ syscall_error: > > /* Traced system call support */ > syscall_dotrace: > +#ifdef CONFIG_AUDITSYSCALL > + andi. r11,r10,(_TIF_SYSCALL_T_OR_A & ~_TIF_SYSCALL_AUDIT) > + beq audit_entry > +#endif > bl .save_nvgprs > addir3,r1,STACK_FRAME_OVERHEAD > bl .do_syscall_trace_enter > @@ -253,6 +263,7 @@ syscall_dotrace: >* for the call number to look up in the table (r0). >*/ > mr r0,r3 > +.Laudit_entry_return: > ld r3,GPR3(r1) > ld r4,GPR4(r1) > ld r5,GPR5(r1) > @@ -264,6 +275,34 @@ syscall_dotrace: > ld r10,TI_FLAGS(r10) > b .Lsyscall_dotrace_cont > > +#ifdef CONFIG_AUDITSYSCALL > +audit_entry: > + ld r4,GPR0(r1) > + ld r5,GPR3(r1) > + ld r6,GPR4(r1) > + ld r7,GPR5(r1) > + ld r8,GPR6(r1) > + > + andi. r11,r10,_TIF_32BIT > + beq 1f > + > + lis r3,AUDIT_ARCH_PPC@h > + ori r3,r3,AUDIT_ARCH_PPC@l > + clrldi r5,r5,32 > + clrldi r6,r6,32 > + clrldi r7,r7,32 > + clrldi r8,r8,32 > + bl .__audit_syscall_entry > + ld r0,GPR0(r1) > + b .Laudit_entry_return > + > +1: lis r3,AUDIT_ARCH_PPC64@h > + ori r3,r3,AUDIT_ARCH_PPC64@l > + bl .__audit_syscall_entry > + ld r0,GPR0(r1) > + b .Laudit_entry_return > +#endif > + > syscall_enosys: > li r3,-ENOSYS > b syscall_exit > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -next] audit: remove duplicated include from audit.c
net/netlink.h includes linux/netlink.h but linux/netlink.h does not include net/netlink.h since the audit code uses the nlmsg_* functions provided in net/netlink.h it seems we need this include. since the audit code uses netlink_unicast provided in linux/netlink.h it seems we need that include. I don't see duplication even if by chance one of them would have included the other... - Original Message - > From: Wei Yongjun > > Remove duplicated include. > > Signed-off-by: Wei Yongjun > --- > kernel/audit.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/kernel/audit.c b/kernel/audit.c > index 488f85f..9377913 100644 > --- a/kernel/audit.c > +++ b/kernel/audit.c > @@ -58,7 +58,6 @@ > #ifdef CONFIG_SECURITY > #include > #endif > -#include > #include > #include > #include > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kernel: audit: beautify code, for extern function, better to check its parameters by itself
- Original Message - > > __audit_socketcall is an extern function. > better to check its parameters by itself. > > also can return error code, when fail (find invalid parameters). > also use macro instead of real hard code number > also give related comments for it. > > Signed-off-by: Chen Gang > --- > include/linux/audit.h | 12 > kernel/auditsc.c |9 ++--- > net/socket.c |6 -- > 3 files changed, 18 insertions(+), 9 deletions(-) > > diff --git a/include/linux/audit.h b/include/linux/audit.h > @@ -354,7 +358,7 @@ static inline int audit_bprm(struct linux_binprm *bprm) > { > return 0; > } > -static inline void audit_socketcall(int nargs, unsigned long *args) > +static inline int audit_socketcall(int nargs, unsigned long *args) > { } > static inline void audit_fd_pair(int fd1, int fd2) > { } This now returns a value but you forgot to return a value. Thus this would not even build... I fixed it up myself. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kernel: auditfilter: looping issue, memory leak if has 2 or more AUDIT_FILTERKEYs
We only allow one filter key per rule. So we should never be able to get into this situation. See audit_data_to_entry() -Eric - Original Message - > > in the 'fcount' looping, > if 'new->fields[*].type" has 2 or more AUDIT_FILTERKEYs > need judge new->filterkey whether has value, or memory leak. > > Signed-off-by: Chen Gang > --- > kernel/auditfilter.c |2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c > index f9fc54b..936ac79 100644 > --- a/kernel/auditfilter.c > +++ b/kernel/auditfilter.c > @@ -859,6 +859,8 @@ struct audit_entry *audit_dupe_rule(struct audit_krule > *old) > &old->fields[i]); > break; > case AUDIT_FILTERKEY: > + if (new->filterkey) > + break; > fk = kstrdup(old->filterkey, GFP_KERNEL); > if (unlikely(!fk)) > err = -ENOMEM; > -- > 1.7.7.6 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kernel: auditfilter: looping issue, memory leak if has 2 or more AUDIT_FILTERKEYs
- Original Message - > > > in another function: audit_data_to_entry: > > a. has the same issue for case AUDIT_WATCH. You are saying if there were 2 of them it will leak the old one? No. If you have 2 AUDIT_WATCH entries the first one will set entry->rule->watch and the second will bomb with EINVAL in audit_to_watch() -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kernel: auditfilter: looping issue, memory leak if has 2 or more AUDIT_FILTERKEYs
- Original Message - > b. has an new issue for AUDIT_DIR: >after AUDIT_DIR succeed, it will set rule->tree. >next, the other case fail, then will call audit_free_rule. >but audit_free_rule will not free rule->tree. Definitely a couple of leaks here... I'm seeing leaks on size 8, 64, and 128. Al, what do you think? Should I be calling audit_put_tree() in the error case if entry->tree != NULL? The audit trees are some of the most complex code in the kernel I think. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kernel: auditfilter: looping issue, memory leak if has 2 or more AUDIT_FILTERKEYs
- Original Message - > > also for function audit_list: > when call audit_make_reply fails (will return NULL). > we need free all its related variables instead of only kfree rull. > (such as call autit_free_rule) > > please help check, thanks. audit_free_rule() takes a struct audit_entry, not an audit_rule. struct audit_rule does not have additional things which need to be freed... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kernel: auditfilter: looping issue, memory leak if has 2 or more AUDIT_FILTERKEYs
- Original Message - > > also for function audit_list_rules: > when call audit_make_reply fails (will return NULL). > we also need process data->buf, not only data itself. > > please help check, thanks. struct audit_rule_data { [...] charbuf[0]; /* string fields buffer */ }; The last element in the struct is 0 length. But the allocation in audit_krule_to_data() looks like: data = kmalloc(sizeof(*data) + krule->buflen, GFP_KERNEL); So now data->buf appears as an allocation of size krule->buflen. We do not need to free it separately. This is a pretty common C trick. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC 00/48] Add namespace support for audit
What kernel are these patches against? On Tue, 2013-05-07 at 10:20 +0800, Gao feng wrote: > This patchset try to add namespace support for audit. > > I choose to assign audit to the user namespace. > Right now,there are six kinds of namespaces, such as > net, mount, ipc, pid, uts and user. the first five > namespaces have special usage. the audit isn't suitable to > belong to these five namespaces, so the user namespace > may be the best choice. > > Through I decide to make audit related resources per user > namespace, but audit uses netlink to communicate between kernel > space and user space, and the netlink is a private resource > of per net namespace. So we need the capability to allow the > netlink sockets to communicate with each other in the same user > namespace even they are in different net namespace. [PATCH 2/48] > does this job, it adds a new function "compare" for per netlink > table to compare two sockets. it means the netlink protocols can > has its own compare fuction, For other protocols, two netlink > sockets are different if they belong to the different net namespace. > For audit protocol, two sockets can be the same even they in different > net namespace,we use user namespace not net namespace to make the > decision. > > There is one point that some people may dislike,in [PATCH 1/48], > the kernel side audit netlink socket is created only when we create > the first netns for the userns, and this userns will hold the netns > until we destroy this userns. > > The other patches just make the audit related resources per > user namespace. > > This patchset is sent as an RFC,any comments are welcome. > > Gao feng (48): > Audit: make audit kernel side netlink sock per userns > netlink: Add compare function for netlink_table > Audit: implement audit self-defined compare function > Audit: make audit_skb_queue per user namespace > Audit: make audit_skb_hold_queue per user namespace > Audit: make kauditd_task per user namespace > Audit: make audit_pid per user namespace > Audit: make audit_nlk_portid per user namesapce > Audit: make audit_enabled per user namespace > Audit: change type of audit_ever_enabled to bool > Audit: make audit_ever_enabled per user namespace > Audit: make audit_initialized per user namespace > Audit: only allow init user namespace to change audit_rate_limit > Audit: only allow init user namespace to change audit_failure > Audit: allow to send netlink message to auditd in uninit user > namespace > Audit: user proper user namespace in audit_log_config_change > Audit: make kauditd_wait per user namespace > Audit: make audit_backlog_wait per user namespace > Audit: remove duplicate comments > Audit: introduce new audit logging interface for user namespace > Audit: pass proper user namespace to audit_log_common_recv_msg > Audit: Log audit config change in uninit user namespace > Audit: netfilter: Log xt table replace behavior in proper user > namespace > Audit: xt_AUDIT: Log audit message in proper user namespace > Audit: send reply message to the auditd in proper user namespace > Audit: make audit_inode_hash per user namespace > Audit: make tree_list per user namespace > Audit: make audit filter list per user namespace > Audit: make audit_krule belongs to user namespace > Audit: reply audit filter list request to proper user namespace > Audit: pass proper user namespace to audit_filter_syscall > Audit: pass proper user namespace to audit_filter_inode_name > Audit: Log filter related audit message to proper user namespace > Log audit tree related message in proper user namespace > Audit: Log task related audit message to proper user namespace > Audit: Log watch related audit message to proper user namespace > Audit: translate audit_log_start to audit_log_start_ns > Audit: tty: translate audit_log_start to audit_log_start_ns > Audit: netlabel: translate audit_log_start to audit_log_start_ns > Audit: ima: translate audit_log_start to audit_log_start_ns > Audit: lsm: translate audit_log_start to audit_log_start_ns > Audit: selinux: translate audit_log_start to audit_log_start_ns > Audit: xfrm: translate audit_log_start to audit_log_start_ns > Audit: rename audit_log_start_ns to audit_log_start > Audit: user audit_enabled_ns to replace audit_enabled > Audit: rename audit_enabled_ns to audit_enabled > Audit: make audit_log user namespace awared > Audit: allow root user of un-init user namespace to set audit > > drivers/tty/tty_audit.c | 9 +- > include/linux/audit.h | 44 ++-- > include/linux/netlink.h | 1 + > include/linux/user_namespace.h | 25 +++ > include/net/xfrm.h | 7 +- > kernel/audit.c | 393 > +--- > kernel/audit.h | 24 +-- > kernel/audit_tree.c | 49 ++--- > kernel/audit_watch.c| 23
Re: [PATCH] kernel: auditfilter: looping issue, memory leak if has 2 or more AUDIT_FILTERKEYs
- Original Message - > On 2013年04月11日 05:19, Eric Paris wrote: > > - Original Message - > > > >> > b. has an new issue for AUDIT_DIR: > >> >after AUDIT_DIR succeed, it will set rule->tree. > >> >next, the other case fail, then will call audit_free_rule. > >> >but audit_free_rule will not free rule->tree. > > Definitely a couple of leaks here... > > > > I'm seeing leaks on size 8, 64, and 128. > > > > Al, what do you think? Should I be calling audit_put_tree() in the error > > case if entry->tree != NULL? The audit trees are some of the most complex > > code in the kernel I think. > > > > > > can we add it in audit_free_rule ? > > maybe like this: > > @@ -75,6 +75,8 @@ static inline void audit_free_rule(struct audit_entry *e) > /* some rules don't have associated watches */ > if (erule->watch) > audit_put_watch(erule->watch); > + if (erule->tree) > + audit_put_tree(erule->tree); > if (erule->fields) > for (i = 0; i < erule->field_count; i++) { > struct audit_field *f = &erule->fields[i]; Where does the tree information get freed normally? That's the code you need to run down. You don't want to start getting double frees on the non-error case. I'll try to dig into it if Al doesn't. It's easy to show the leak on current kernels. while(1) auditctl -a exit,always -w /etc -F auid=-1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -next] audit: remove duplicated include from audit.c
Came in via the net-next tree from commit 941912133025926307c7a65b203fa38403b1063a I'll ping dmiller and friends. - Original Message - > Hi Eric, > > Sorry reply on top. > > From the source code from linex-next.git tree, line 55~64: > > #include > #include *** > #include > #ifdef CONFIG_SECURITY > #include > #endif > #include *** > #include > #include > #include > > net/netlink.h is included twice, and linux/netlink.h is not included by this > file. > > -Original Message- > From: Eric Paris [mailto:epa...@redhat.com] > Sent: 2013年4月11日 1:26 > To: Wei Yongjun > Cc: v...@zeniv.linux.org.uk; Yongjun Wei (RD-CN); > linux-kernel@vger.kernel.org > Subject: Re: [PATCH -next] audit: remove duplicated include from audit.c > > net/netlink.h includes linux/netlink.h but linux/netlink.h does not include > net/netlink.h > > since the audit code uses the nlmsg_* functions provided in net/netlink.h it > seems we need this include. > > since the audit code uses netlink_unicast provided in linux/netlink.h it > seems we need that include. > > I don't see duplication even if by chance one of them would have included the > other... > > - Original Message - > > From: Wei Yongjun > > > > Remove duplicated include. > > > > Signed-off-by: Wei Yongjun > > --- > > kernel/audit.c | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/kernel/audit.c b/kernel/audit.c index 488f85f..9377913 > > 100644 > > --- a/kernel/audit.c > > +++ b/kernel/audit.c > > @@ -58,7 +58,6 @@ > > #ifdef CONFIG_SECURITY > > #include > > #endif > > -#include > > #include > > #include > > #include > > > > > > > TREND MICRO EMAIL NOTICE > The information contained in this email and any attachments is confidential > and may be subject to copyright or other intellectual property protection. > If you are not the intended recipient, you are not authorized to use or > disclose this information, and we request that you notify us by reply mail or > telephone and delete the original message from your mail system. > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/6] audit: replace obsolete NLMSG_* with type safe nlmsg_*
On Wed, Mar 27, 2013 at 12:49 PM, Hong Zhiguo wrote: > Signed-off-by: Hong Zhiguo > --- > kernel/audit.c | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/kernel/audit.c b/kernel/audit.c > index d596e53..4dbb047 100644 > --- a/kernel/audit.c > +++ b/kernel/audit.c > @@ -58,7 +58,7 @@ > #ifdef CONFIG_SECURITY > #include > #endif > -#include > +#include This is a duplicate include. We already included net/netlink.h a couple of lines above. We use netlink_unicast, which comes from linux/netlink.h. I recognize that it comes in from net/netlink.h, but I thought it good practice to pull it in ourselves. Should I send a separate patch to revert back to linux/netlink.h or a patch to just remove the include altogether? -Eric > #include > #include > #include > @@ -910,7 +910,7 @@ static void audit_receive_skb(struct sk_buff *skb) > { > struct nlmsghdr *nlh; > /* > -* len MUST be signed for NLMSG_NEXT to be able to dec it below 0 > +* len MUST be signed for nlmsg_next to be able to dec it below 0 > * if the nlmsg_len was not aligned > */ > int len; > @@ -919,13 +919,13 @@ static void audit_receive_skb(struct sk_buff *skb) > nlh = nlmsg_hdr(skb); > len = skb->len; > > - while (NLMSG_OK(nlh, len)) { > + while (nlmsg_ok(nlh, len)) { > err = audit_receive_msg(skb, nlh); > /* if err or if this message says it wants a response */ > if (err || (nlh->nlmsg_flags & NLM_F_ACK)) > netlink_ack(skb, nlh, err); > > - nlh = NLMSG_NEXT(nlh, len); > + nlh = nlmsg_next(nlh, len); > } > } > > @@ -1483,7 +1483,7 @@ void audit_log_end(struct audit_buffer *ab) > audit_log_lost("rate limit exceeded"); > } else { > struct nlmsghdr *nlh = nlmsg_hdr(ab->skb); > - nlh->nlmsg_len = ab->skb->len - NLMSG_SPACE(0); > + nlh->nlmsg_len = ab->skb->len - NLMSG_HDRLEN; > > if (audit_pid) { > skb_queue_tail(&audit_skb_queue, ab->skb); > -- > 1.7.10.4 > > -- > To unsubscribe from this list: send the line "unsubscribe > linux-security-module" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: build warning after merge of the final tree (in Linus' tree)
On Fri, 2013-05-17 at 10:47 +0530, Viresh Kumar wrote: > On Wed, May 15, 2013 at 7:02 PM, Eric Paris wrote: > > On Wed, 2013-05-15 at 13:20 +1000, Stephen Rothwell wrote: > >> Hi , > >> > >> After merging the final tree, today's linux-next build (i386 defconfig) > >> produced this warning: > >> > >> kernel/auditfilter.c: In function 'audit_data_to_entry': > >> kernel/auditfilter.c:426:3: warning: this decimal constant is unsigned > >> only in ISO C90 [enabled by default] > >> > >> Introduced by commit 780a7654cee8 ("audit: Make testing for a valid > >> loginuid explicit") from Linus' tree. > > > > Thank you, I'll fix it up. What am I likely missing that I don't see it > > on my builds? I'm using gcc 4.8. Is there a config option that enables > > additional warnings? > > Is this fixed? I couldn't find a relevant patch in linux-next/master for it. Something should be posted by the end of the day. Sorry for the delay. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in audit_copy_inode
I believe this was already found and fixed: https://lkml.org/lkml/2012/7/25/259 Which was pulled by Linus in: 3134f37e931d75931bdf6d4eacd82a3fd26eca7c -Eric On Wed, 2012-08-01 at 18:11 +0200, Miklos Szeredi wrote: > Hi Peter, > > Thanks for the report. > > Here's a patch. I haven't tested it but I'm pretty confident that it > fixes the bug. > > Thanks, > Miklos > > > Subject: vfs: fix audit_inode on negative dentry > From: Miklos Szeredi > > Peter Moody reported an oops in audit_copy_inode() and bisected it to commit > 7157486541 (vfs: do_last(): common slow lookup). > > The problem is that audit_inode() in do_last() is called with a negative > dentry. > > Previously the non-O_CREAT case didn't call audit_inode() here, but now both > O_CREAT and non-O_CREAT opens are handled by the same code. > > I really have no idea why this audit_inode() is needed here at all but am > afaid > to remove this for fear of breaking audit somehow. So just fix this case by > checking for a negative dentry. > > Reported-by: Peter Moody > Signed-off-by: Miklos Szeredi > CC: sta...@vger.kernel.org > --- > fs/namei.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > Index: linux-2.6/fs/namei.c > === > --- linux-2.6.orig/fs/namei.c 2012-08-01 17:47:28.0 +0200 > +++ linux-2.6/fs/namei.c 2012-08-01 17:49:26.0 +0200 > @@ -2607,10 +2607,12 @@ static int do_last(struct nameidata *nd, > goto finish_open_created; > } > > - /* > - * It already exists. > - */ > - audit_inode(pathname, path->dentry); > + if (path->dentry->d_inode) { > + /* > + * It already exists. > + */ > + audit_inode(pathname, path->dentry); > + } > > /* >* If atomic_open() acquired write access it is dropped now due to > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fix sel_netnode_insert suspicious rcu dereference.
I thought I HAD committed it months ago! Man I suck at this On Mon, Aug 6, 2012 at 3:10 PM, Paul Moore wrote: > On Mon, Aug 6, 2012 at 12:49 PM, Dave Jones wrote: >> On Tue, Jun 05, 2012 at 01:12:39AM -0400, Dave Jones wrote: >> > I reported this a year ago (https://lkml.org/lkml/2011/4/20/308). >> > It's still a problem apparently ... >> >> And another two months pass in silence. >> >> This is happening to other people too. >> https://bugzilla.redhat.com/show_bug.cgi?id=846037 >> >> Can someone please apply this patch, or at least point out what's wrong with >> it ? > > I thought Eric had applied it a few months ago, but I just checked and > I don't see it in Linus' tree for some reason? I know multiple > patches have been posted from different authors, all fixing the same > thing ... > > Acked-by: Paul Moore > > -- > paul moore > www.paul-moore.com > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] audit: Use a tracepoint for getname
cc'ing Jeff Layton who has recently done a lot of getname work and I want to make sure he sees this. On Wed, 19 Sep 2012 15:56:59 -0700 Arnaldo Carvalho de Melo wrote: > Al, Eric, > > Was this considered before? Acceptable? > > - Arnaldo > > --- > > Instead of an explicit hook only for audit, use a tracepoint, so that > other users that need to know about filenames can hook there just like > audit. > @@ -978,6 +986,9 @@ static int __init audit_init(void) > else > audit_sock->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT; > > + if (register_trace_getname(audit_getname, NULL)) > + audit_panic("cannot register getname tracepoint"); > + > skb_queue_head_init(&audit_skb_queue); > skb_queue_head_init(&audit_skb_hold_queue); > audit_initialized = AUDIT_INITIALIZED; I think we need to just use panic instead of audit_panic. This early at boot userspace would not have been able to tell the kernel that audit_panic == panic nor would the box die later if userspace ask for that functionality. Instead the box would run but audit would be broken, which customers who want audit_panic == panic would be most upset about. Otherwise, its good to me. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NULL pointer dereference in selinux_ip_postroute_compat
On Wed, Aug 8, 2012 at 3:38 PM, Eric Dumazet wrote: > On Wed, 2012-08-08 at 15:26 -0400, Paul Moore wrote: > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > index ba39a52..027a331 100644 > --- a/net/ipv4/ip_output.c > +++ b/net/ipv4/ip_output.c > @@ -1524,6 +1524,10 @@ void ip_send_unicast_reply(struct net *net, struct > sk_buff *skb, __be32 daddr, > sk->sk_priority = skb->priority; > sk->sk_protocol = ip_hdr(skb)->protocol; > sk->sk_bound_dev_if = arg->bound_dev_if; > +#ifdef CONFIG_SECURITY > + if (!sk->sk_security && security_sk_alloc(sk, PF_INET, GFP_ATOMIC)) > + goto out; > +#endif > sock_net_set(sk, net); > __skb_queue_head_init(&sk->sk_write_queue); > sk->sk_sndbuf = sysctl_wmem_default; > @@ -1539,7 +1543,7 @@ void ip_send_unicast_reply(struct net *net, struct > sk_buff *skb, __be32 daddr, > skb_set_queue_mapping(nskb, skb_get_queue_mapping(skb)); > ip_push_pending_frames(sk, &fl4); > } > - > +out: > put_cpu_var(unicast_sock); > > ip_rt_put(rt); Seems wrong. We shouldn't ever need ifdef CONFIG_SECURITY in core code. Ifndef CONF_SECURITY then security_sk_alloc() is a static inline return 0; I guess the question is "Where did the sk come from"? Why wasn't security_sk_alloc() called when it was allocated? Should it have been updated at some time and that wasn't done either? Seems wrong to be putting packets on the queue for a socket where the security data was never allocated and was never set to its proper state. there must be a bigger bug here... -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NULL pointer dereference in selinux_ip_postroute_compat
On Wed, Aug 8, 2012 at 4:35 PM, Paul Moore wrote: > On Wednesday, August 08, 2012 10:09:38 PM Eric Dumazet wrote: > Actually, the issue is that the shared socket doesn't have an init/alloc > function to do the LSM allocation like we do with other sockets so Eric's > patch does it as part of ip_send_unicast_reply(). > > If we look at the relevant part of Eric's patch: > > +#ifdef CONFIG_SECURITY > + if (!sk->sk_security && security_sk_alloc(sk, PF_INET, GFP_ATOMIC)) > + goto out; > +#endif > > ... if we were to remove the CONFIG_SECURITY conditional we would end up > calling security_sk_alloc() each time through in the CONFIG_SECURITY=n case as > sk->sk_security would never be initialized to a non-NULL value. In the > CONFIG_SECURITY=y case it should only be called once as security_sk_alloc() > should set sk->sk_security to a LSM blob. Ifndef SECURITY this turns into (because security_sk_alloc is a static inline in that case) if (!sk->sk_security && 0) goto out; Which I'd hope the compiler would optimize. So that only leaves us caring about the case there CONFIG_SECURITY is true. In that case if we need code which does if !alloc'd then alloc it seems we broke the model of everything else in the code and added a branch needlessly. Could we add a __init function which does the security_sk_alloc() in the same file where we declared them? >> IMHO it seems wrong to even care about security for internal sockets. >> >> They are per cpu, shared for all users on the machine. > > The issue, from a security point of view, is that these sockets are sending > network traffic; even if it is just resets and timewait ACKs, it is still > network traffic and the LSMs need to be able to enforce security policy on > this traffic. After all, what would you say if your firewall let these same > packets pass without any filtering? > > The issue I'm struggling with at present is how should we handle this traffic > from a LSM perspective. The label based LSMs, e.g. SELinux and Smack, use the > LSM blob assigned to locally generated outbound traffic to identify the > traffic and apply the security policy, so not only do we have to resolve the > issue of ensuring the traffic is labeled correctly, we have to do it with a > shared socket (although the patch didn't change the shared nature of the > socket). > > For those who are interested, I think the reasonable labeling solution here is > to go with SECINITSID_KERNEL/kernel_t for SELinux and likely the ambient label > for Smack as in both the TCP reset and timewait ACK there shouldn't be any > actual user data present. I'm willing to accept that argument from an SELinux perspective. I'd also accept the argument that it is private and do something similar to what we do with IS_PRIVATE on inodes. Although sockets probably don't have a good field to use... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: NULL pointer dereference in selinux_ip_postroute_compat
On Wed, Aug 8, 2012 at 5:03 PM, Paul Moore wrote: > On Wednesday, August 08, 2012 04:51:56 PM Eric Paris wrote: >> Could we add a __init function which does the security_sk_alloc() in >> the same file where we declared them? > > Is it safe to call security_sk_alloc() from inside another __init function? I > think in both the case of SELinux and Smack it shouldn't be a problem, but I'm > concerned about the more general case of calling a LSM hook potentially before > the LSM has been initialized. > > If that isn't an issue we could probably do something in ip_init(). The security_initcall() functions should happen way before __init functions. If an LSM busts, it's the LSM initializing itself too late not the code here being wrong... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ipv4: tcp: security_sk_alloc() needed for unicast_sock
On Thu, Aug 9, 2012 at 11:36 AM, Eric Dumazet wrote: > On Thu, 2012-08-09 at 11:07 -0400, Paul Moore wrote: > >> Is is possible to do the call to security_sk_alloc() in the ip_init() >> function >> or does the per-cpu nature of the socket make this a pain? >> > > Its a pain, if we want NUMA affinity. > > Here, each cpu should get memory from its closest node. I really really don't like it. I won't say NAK, but it is the first and only place in the kernel where I believe we allocate an object and don't allocate the security blob until some random later point in time. If it is such a performance issue to have the security blob in the same numa node, isn't adding a number of branches and putting this function call on every output at least as bad? Aren't we discouraged from GFP_ATOMIC? In __init we can use GFP_KERNEL. This still doesn't fix these sockets entirely. We now have the security blob allocated, but it was never set to something useful. Paul, are you looking into this? This is a bandaide, not a fix -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ipv4: tcp: security_sk_alloc() needed for unicast_sock
NAK. I personally think commit be9f4a44e7d41cee should be reverted until it is fixed. Let me explain what all I believe it broke and how. Old callchain of the creation of the 'equivalent' socket previous to the patch in question just for reference: inet_ctl_sock_create sock_create_kern __sock_create pf->create (inet_create) sk_alloc sk_prot_alloc security_sk_alloc() This WAS working properly. All of it. The equivalent struct sock was being created and allocated in inet_create(), which called to sk_alloc->sk_prot_alloc->security_sk_alloc(). We all agree that failing to call security_sk_alloc() is the first regression introduced. The second regression was the labeling issue. There was a call to security_socket_post_create (from __sock_create) which was properly setting the labels on both the socket and sock. This new patch broke that as well. We don't expose an equivalent security_sock_post_create() interface in the LSM currently, and until we do, this can't be fixed. It's why I say it should be reverted. I have a patch I'm testing right now which takes care of the first part the way I like (and yes, I'm doing the allocation on the correct number node). It basically looks like so: + for_each_possible_cpu(cpu) { + sock = &per_cpu(unicast_sock, cpu); + rc = security_sk_alloc(&sock->sk, PF_INET, GFP_KERNEL, cpu_to_node(cpu)); + if (rc) + return rc; + } I'm going to work right now on exposing the equivalent struct sock LSM interface so we can call that as well. But it's going to take me a bit. Attached is the patch just to (hopefully untested) shut up the panic. -Eric On Thu, Aug 9, 2012 at 10:50 AM, Eric Dumazet wrote: > From: Eric Dumazet > > commit be9f4a44e7d41cee (ipv4: tcp: remove per net tcp_sock) added a > selinux regression, reported and bisected by John Stultz > > selinux_ip_postroute_compat() expect to find a valid sk->sk_security > pointer, but this field is NULL for unicast_sock > > Fix this by adding a new 'kernel' parameter to security_sk_alloc(), > set to true if socket might already have a valid sk->sk_security > pointer. ip_send_unicast_reply() uses a percpu fake socket, so the first > call to security_sk_alloc() will populate sk->sk_security pointer, > subsequent ones will reuse existing context. > > Reported-by: John Stultz > Bisected-by: John Stultz > Signed-off-by: Eric Dumazet > Cc: Paul Moore > Cc: Eric Paris > Cc: "Serge E. Hallyn" > --- > include/linux/security.h |6 +++--- > net/core/sock.c|2 +- > net/ipv4/ip_output.c |4 +++- > security/security.c|4 ++-- > security/selinux/hooks.c |5 - > security/smack/smack_lsm.c | 10 -- > 6 files changed, 21 insertions(+), 10 deletions(-) > > diff --git a/include/linux/security.h b/include/linux/security.h > index 4e5a73c..4d8e454 100644 > --- a/include/linux/security.h > +++ b/include/linux/security.h > @@ -1601,7 +1601,7 @@ struct security_operations { > int (*socket_sock_rcv_skb) (struct sock *sk, struct sk_buff *skb); > int (*socket_getpeersec_stream) (struct socket *sock, char __user > *optval, int __user *optlen, unsigned len); > int (*socket_getpeersec_dgram) (struct socket *sock, struct sk_buff > *skb, u32 *secid); > - int (*sk_alloc_security) (struct sock *sk, int family, gfp_t > priority); > + int (*sk_alloc_security) (struct sock *sk, int family, gfp_t > priority, bool kernel); > void (*sk_free_security) (struct sock *sk); > void (*sk_clone_security) (const struct sock *sk, struct sock *newsk); > void (*sk_getsecid) (struct sock *sk, u32 *secid); > @@ -2539,7 +2539,7 @@ int security_sock_rcv_skb(struct sock *sk, struct > sk_buff *skb); > int security_socket_getpeersec_stream(struct socket *sock, char __user > *optval, > int __user *optlen, unsigned len); > int security_socket_getpeersec_dgram(struct socket *sock, struct sk_buff > *skb, u32 *secid); > -int security_sk_alloc(struct sock *sk, int family, gfp_t priority); > +int security_sk_alloc(struct sock *sk, int family, gfp_t priority, bool > kernel); > void security_sk_free(struct sock *sk); > void security_sk_clone(const struct sock *sk, struct sock *newsk); > void security_sk_classify_flow(struct sock *sk, struct flowi *fl); > @@ -2667,7 +2667,7 @@ static inline int > security_socket_getpeersec_dgram(struct socket *sock, struct s > return -ENOPROTOOPT; > } > > -static inline int security_sk_alloc(struct sock *sk, int family, gfp_t &g
Re: Friendlier EPERM - Request for input
On Fri, 2013-01-11 at 00:14 +0900, Tetsuo Handa wrote: > The reason I think is that people turn off LSMs because they are using LSMs > without understanding "what the current configuration is" and/or "how to > change > configuration". People do not spend (or cannot afford spending) resources for > understanding LSM's configuration. This is not the point I am arguing. This is not about LSMs, how hard they are to configure, or how to 'fix' them. It certainly isn't about how one LSM is better, easier, or superior to another. This is about getting more information in userspace when operations fail. I'll quote an off list e-mail I received: Friendlier/more complete error messages would eliminate an awful lot of digging around trying to figure *what* the problem is, preparatory to discerning *where* the problem is and *how* to fix it. There are so many things that might go into fixing problems in an LSM. That's what you are talking about, but it isn't relevant here. This is about knowing WHAT the problem is, maybe helping with where and how. And this isn't just about LSM. Heck, LSMs are just a small part of it. I want extended errno interface to talk about DAC, capabilities, ACLs, LSMs, everything! > > The audit log is complete crap from a usability PoV. If a web admin is > > TOMOYO's audit log is very useful. TOMOYO is a security tool but is also > useful > for education/training/debugging/development/profiling etc. The TOMOYO audit log is a very poor fit for this as well. I'm not trying to be rude, but there is no reasonable way for applications to use it, it is TOMOYO specific, and it doesn't cover non-LSM errors. I want applications like httpd to be able to put what went wrong in its log message. I want python to be able to get extended information and present that up the stack. Nothing we have today comes close. My proposal isn't perfect, it suffers from the same problems as errno (except even worse because it is harder to save and restore), but at least it will usually be helpful... -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] ima: policy search speedup
On Wed, Dec 12, 2012 at 8:56 AM, Kasatkin, Dmitry wrote: > I have done few tests. > Ratio between lookups is different, but I do not really remember what > exactly it was. > Probably I did measurement with directory integrity protection... > > First test is done using upstream IMA. > > IMA-appraisal > [ 59.554072] counter1 - before (inode->i_flags & S_NOIMA): 74586 > [ 59.554628] counter2 - after (inode->i_flags & S_NOIMA): 74558 > [ 59.555024] counter3 - after (inode->i_sb->s_feature_flags & SF_NOIMA): > 52895 > > You can see 3 counters after 55 seconds uptime. > Counters count entries to the policy search function. > First counter counts every entry. > Second counter counts if control passes after inode flag S_NOIMA. > Third counts if control passes after SB flag SF_NOIMA. > > As you can see, inode flag does not make a difference - just 28 > entries differences. That is quite shocking to me that on system start up we only ever reference the same /proc file 28 times. Wow. I just don't know what to say. Obviously the per inode flag is dead. Now there is probably some CPU benefit to a per-sb flag, but I'm not sure it is worth the code to do it if you can't really measure that as something that makes a difference in userspace. Maybe if you can show that after your integrity protection work is done we can revisit. But I'd think this patch is dead until you can show the impact on real userspace. Someone else might disagree, but that's my thought. Sorry. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/3] audit: clean up refcounting in audit-tree
On Mon, 2012-06-25 at 19:46 +0200, Miklos Szeredi wrote: > From: Miklos Szeredi > > Drop the initial reference by fsnotify_init_mark early instead of > audit_tree_freeing_mark() at destroy time. > > In the cases we destroy the mark before we drop the initial reference we need > to > get rid of the get_mark that balances the put_mark in > audit_tree_freeing_mark(). > > Signed-off-by: Miklos Szeredi I don't love this. The reference from fsnotify_init_mark() should live until fsnotify_mark_destroy(). If we need to drop a reference in audit_tree_freeing_mark() we should be taking that elsewhere > --- > kernel/audit_tree.c | 12 +--- > 1 files changed, 9 insertions(+), 3 deletions(-) > > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c > index 31fdc48..7b95706 100644 > --- a/kernel/audit_tree.c > +++ b/kernel/audit_tree.c > @@ -292,6 +292,7 @@ static void untag_chunk(struct node *p) > spin_unlock(&hash_lock); > spin_unlock(&entry->lock); > fsnotify_destroy_mark(entry); > + fsnotify_put_mark(&new->mark); /* drop initial reference */ > goto out; > > Fallback: > @@ -330,7 +331,6 @@ static int create_chunk(struct inode *inode, struct > audit_tree *tree) > spin_unlock(&hash_lock); > chunk->dead = 1; > spin_unlock(&entry->lock); > - fsnotify_get_mark(entry); > fsnotify_destroy_mark(entry); > fsnotify_put_mark(entry); > return 0; > @@ -346,6 +346,7 @@ static int create_chunk(struct inode *inode, struct > audit_tree *tree) > insert_hash(chunk); > spin_unlock(&hash_lock); > spin_unlock(&entry->lock); > + fsnotify_put_mark(entry); /* drop initial reference */ > return 0; > } > > @@ -411,7 +412,6 @@ static int tag_chunk(struct inode *inode, struct > audit_tree *tree) > spin_unlock(&chunk_entry->lock); > spin_unlock(&old_entry->lock); > > - fsnotify_get_mark(chunk_entry); > fsnotify_destroy_mark(chunk_entry); > > fsnotify_put_mark(chunk_entry); > @@ -444,6 +444,7 @@ static int tag_chunk(struct inode *inode, struct > audit_tree *tree) > spin_unlock(&chunk_entry->lock); > spin_unlock(&old_entry->lock); > fsnotify_destroy_mark(old_entry); > + fsnotify_put_mark(chunk_entry); /* drop initial reference */ > fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */ > return 0; > } > @@ -915,7 +916,12 @@ static void audit_tree_freeing_mark(struct fsnotify_mark > *entry, struct fsnotify > struct audit_chunk *chunk = container_of(entry, struct audit_chunk, > mark); > > evict_chunk(chunk); > - fsnotify_put_mark(entry); > + > + /* > + * We are guaranteed to have at least one reference to the mark from > + * either the inode or the caller of fsnotify_destroy_mark(). > + */ > + BUG_ON(atomic_read(&entry->refcnt) < 1); > } > > static bool audit_tree_send_event(struct fsnotify_group *group, struct inode > *inode, -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/3] audit: fix refcounting in audit-tree
On Mon, 2012-06-25 at 19:46 +0200, Miklos Szeredi wrote: > From: Miklos Szeredi > > Refcounting of fsnotify_mark in audit tree is broken. E.g: > > refcount > create_chunk > alloc_chunk 1 > fsnotify_add_mark 2 > > untag_chunk > fsnotify_get_mark 3 > fsnotify_destroy_mark > audit_tree_freeing_mark 2 > fsnotify_put_mark 1 > fsnotify_put_mark 0 > via destroy_list > fsnotify_mark_destroy-1 > > This was reported by various people as triggering Oops when stopping auditd. > > We could just remove the put_mark from audit_tree_freeing_mark() but that > would > break freeing via inode destruction. So this patch simply omits a put_mark > after calling destroy_mark (or adds a get_mark before). > > Next patch will clean up the remaining mess. > > Signed-off-by: Miklos Szeredi > CC: sta...@vger.kernel.org Agreed this is needed. My changelog was: audit: fix ref count problems in audit trees Before the switch from in kernel inotify to fsnotify for audit trees the code regularly did: inotify_evict_watch(&chunk->watch); put_inotify_watch(&chunk->watch); I translated this in fsnotify_speak into: fsnotify_destroy_mark_by_entry(chunk_entry); fsnotify_put_mark(chunk_entry); The problem is that the inotify_evict_watch function actually took a reference on chunk->watch, which is what was being dropped by put_inotify_watch(). The fsnotify code does not take such a reference during fsnotify_destroy_mark_by_entry(). Thus we are dropping reference counts prematurely and eventually we hit a use after free! Whoops! Fix these call sites to not drop the extra reference. Reported-by: Valentin Avram Reported-by: Peter Moody Partial-patch-by: Marcelo Cerri Signed-off-by: Eric Paris Maybe you can use some of that changelog in your next post? (If you do one?) The only reason you would repost is because I don't understand why you took a ref in some places instead of just not dropping it everywhere... > --- > kernel/audit_tree.c |5 ++--- > 1 files changed, 2 insertions(+), 3 deletions(-) > > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c > index d52d247..31fdc48 100644 > --- a/kernel/audit_tree.c > +++ b/kernel/audit_tree.c > @@ -250,7 +250,6 @@ static void untag_chunk(struct node *p) > spin_unlock(&hash_lock); > spin_unlock(&entry->lock); > fsnotify_destroy_mark(entry); > - fsnotify_put_mark(entry); > goto out; > } > > @@ -293,7 +292,6 @@ static void untag_chunk(struct node *p) > spin_unlock(&hash_lock); > spin_unlock(&entry->lock); > fsnotify_destroy_mark(entry); > - fsnotify_put_mark(entry); > goto out; > > Fallback: > @@ -332,6 +330,7 @@ static int create_chunk(struct inode *inode, struct > audit_tree *tree) > spin_unlock(&hash_lock); > chunk->dead = 1; > spin_unlock(&entry->lock); > + fsnotify_get_mark(entry); > fsnotify_destroy_mark(entry); > fsnotify_put_mark(entry); > return 0; Like here? Why not just avoid the atomic op altogether? > @@ -412,6 +411,7 @@ static int tag_chunk(struct inode *inode, struct > audit_tree *tree) > spin_unlock(&chunk_entry->lock); > spin_unlock(&old_entry->lock); > > + fsnotify_get_mark(chunk_entry); > fsnotify_destroy_mark(chunk_entry); > > fsnotify_put_mark(chunk_entry); > @@ -445,7 +445,6 @@ static int tag_chunk(struct inode *inode, struct > audit_tree *tree) > spin_unlock(&old_entry->lock); > fsnotify_destroy_mark(old_entry); > fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */ > - fsnotify_put_mark(old_entry); /* and kill it */ > return 0; > } > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] audit: don't free_chunk() after fsnotify_add_mark()
On Mon, 2012-06-25 at 19:46 +0200, Miklos Szeredi wrote: > From: Miklos Szeredi > > Don't do free_chunk() after fsnotify_add_mark(). That one does a delayed > unref > via the destroy list and this results in use-after-free. > > Signed-off-by: Miklos Szeredi > CC: sta...@vger.kernel.org Al, can you send these along? I am not going to see a computer again for a month. Hurray! Acked-by: Eric Paris > --- > kernel/audit_tree.c |6 +++--- > 1 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c > index 5bf0790..d52d247 100644 > --- a/kernel/audit_tree.c > +++ b/kernel/audit_tree.c > @@ -259,7 +259,7 @@ static void untag_chunk(struct node *p) > > fsnotify_duplicate_mark(&new->mark, entry); > if (fsnotify_add_mark(&new->mark, new->mark.group, new->mark.i.inode, > NULL, 1)) { > - free_chunk(new); > + fsnotify_put_mark(&new->mark); > goto Fallback; > } > > @@ -322,7 +322,7 @@ static int create_chunk(struct inode *inode, struct > audit_tree *tree) > > entry = &chunk->mark; > if (fsnotify_add_mark(entry, audit_tree_group, inode, NULL, 0)) { > - free_chunk(chunk); > + fsnotify_put_mark(entry); > return -ENOSPC; > } > > @@ -396,7 +396,7 @@ static int tag_chunk(struct inode *inode, struct > audit_tree *tree) > fsnotify_duplicate_mark(chunk_entry, old_entry); > if (fsnotify_add_mark(chunk_entry, chunk_entry->group, > chunk_entry->i.inode, NULL, 1)) { > spin_unlock(&old_entry->lock); > - free_chunk(chunk); > + fsnotify_put_mark(chunk_entry); > fsnotify_put_mark(old_entry); > return -ENOSPC; > } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: inotify: notify IN_DELETE_SELF when file is deleted or inode deleted ?
It is inode oriented. Makes it a pain to work with, but that's how it is. Sorry! On Fri, Jun 29, 2012 at 10:06 AM, Pierre PEIFFER wrote: > Hi, > > By playing with inotify (on user side) to know whether the file I'm using is > deleted by someone else, I have noted that I do not receive the > IN_DELETE_SELF event, exactly because I'm using it. By checking into kernel > area, I see that at unlink(), an IN_ATTRIB event is sent, but IN_DELETE_SELF > is, indeed, sent only when the inode is deleted. > > But such IN_ATTRIB event doesn't tell to the user what has changed among > permissions, timestamps, link count, etc... > So it doesn't much help. Of course, I have noted that I can monitor the > parent directory for IN_DELETE and then check which file has been deleted; > few more stuff to do but it works, no pb. > > But I'm still wondering after reading in the man.: >IN_DELETE_SELFWatched file/directory was itself deleted. > Is this really the expected behavior ? Shouldn't the kernel trig such event > at unlink() ? Or is inotify clearly inode oriented ? > > Thanks, > > Pierre > > PS: keep me in cc, I'm not subscribed to the list. > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Friendlier EPERM - Request for input
Getting an EPERM/EACCES in userspace really kinda blows. As a user you don't have any idea why you got it. It could be SELinux, it could be rwx bits on the file, it could be a missing capability, it could be an ACL, it could be who knows what. We'd like to start figuring out the who knows what and hopefully find a way to expose that to userspace. My initial thought is a small buffer in the task struct in which the kernel can dump some data when it is going to send back an EPERM. I was thinking of exposing that data via a /proc/self/task/[tid]/file which userspace could poll after an EPERM. The hope would be to all userspace a better understanding of why it failed. Wouldn't it be nice if instead of httpd log just saying 'permission denied' it would be able to say 'permision denied because the file was 640"? I was thinking this buffer would look something like (very pseudo codie) struct friendly_response { enum { NONE, CAPABILITIES, SELINUX, FILE_PERMS } type; union { struct capabilities { int cap; }; struct selinux { __u32 source_label __u32 target_label __u32 operation; __u32 tclass; }; struct file_perms { int uid; int gid; int file_uid; int file_gid; int file_mode; }; }; }; The idea being that the buffer should only hold stateless data which can be overwritten or destroyed on the next EPERM. Reading the proc file could then translate that into something a little more useful, but hopefully still computer friendly enough that it can be translated and useful to a user/developer. So after a denial from file permissions the proc file would return a string like: 3 500 500 0 0 0640 ^ ^ ^ ^ ^ ^--- Mode | | | | |--- File Gid | | | |---File Uid | | |--- User Gid | |---User Uid |--- Type for failure (file perms) Or from SELinux: 2 unconfined_u:unconfined_r:unconfined_t:s0 system_u:system_r:system_t:s0 { read } file This would take patching hundreds of call sites in the kernel to fill the buffer. But just getting extended denial information in a couple of the hot spots would be a huge win. Put it in capable(), LSM hooks, the open() syscall and path walk code. It was pointed out that this isn't perfect, and I'm well aware of that. Lots of userspace libraries do things like save errno, make other kernel calls which could overwrite the buffer, then pass errno up the stack. I don't believe there is much that can done about that. Other than userspace trying to save the extended error information as well? One suggestion was to make it a ring buffer of some configurable size instead. I'm not sure how a program could find the 'right' message in the ring buffer, but it is a thought to help the problem of intervening kernel calls overwriting the last extended error info. I was pointed to the win32 api for getLastError() as something to look at. This API is mostly like errno/strerror in libc, except without the easy ability to save it for later. My new proposed API would have the same problems. No 'easy' way to save it for later. (also getLastError() doesn't pass any kind of extended information, so it seems to solve a different problem and I haven't seen any OS that is doing this, so point me to places I can learn) Another thought was that some people might want to turn off the ability to get this information. I don't see a reason a sysctl couldn't be used to disable extended errno information if the admin so chose. But maybe those great minds on the lists can help me think of ways to get Friendlier denials that I haven't thought of. Please. What are you thoughts, concerns, issues? -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Friendlier EPERM - Request for input
>On Wed, 2013-01-09 at 11:04 -0500, Eric Paris wrote: > >Getting an EPERM/EACCES in userspace really kinda blows. As a user you > >don't have any idea why you got it. Stephen Smalley wrote: > What if the denial was due to lacking sufficient permission to stat > the file (in which case you shouldn't leak the file's attributes to the > caller)? > > Also, the present inability of (unprivileged) userspace to distinguish > DAC from MAC denials is relied upon to avoid any covert channel when > the DAC check happens before the MAC check, as otherwise you leak > information about the DAC attributes. If you are going to indicate to > all of userspace whether it was DAC or MAC, you have to make sure that > the MAC check / LSM hook always comes first in every case. Which then > leads to lots of unnecessary MAC audit messages that would have been > denied by DAC anyway. I'd have to fall back to my original thought about who to expose this information to: > >Another thought was that some people might want to turn off the ability > >to get this information. I don't see a reason a sysctl couldn't be used > >to disable extended errno information if the admin so chose. On systems with a strict security policy worried about such things this would quite reasonably need to be disabled. But most of the reason people turn off LSMs is because it gets in the way and they get pissed getting an EPERM, checking rwx bits, having no idea WTF happened, and then being REALLY pissed at the LSM when they figure out that was the problem. I want to put it right there, out front, here is what went wrong for the common case. I mean, we sorta, kinda sometimes, poorly give this information. Isn't DAC usually EPERM and LSMs usually EACCES. From a DAC PoV exposing the DAC info is completely reasonable. You can't hide a stat(), even setting chmod 000. So this is an LSM issue. Not an issue for DAC/Capabilities/ACLs. We can add an LSM hook on the /proc file to make sure nothing can get the data even if the global sysctl allows such access. We could if it was worth it make filling in the extended dial information an LSM hook where you can check on a per denial basis. But I don't see the need. > Privileged userspace already has a way to identify when it was a MAC > denial, via the audit logs. But those are considered > security-sensitive, and so exposing that same information to > unprivileged userspace will be a problem. The audit log is complete crap from a usability PoV. If a web admin is trying to figure out why his web server isn't serving out web pages in /home he's going to look in the web server logs and see EPERM. That's it. Now he has to start blindly guessing which of the many possible reasons for EPERM is his problem. Wouldn't it be better if the information was actually available? Using the audit log in real time is a bloody nightmare. As the audit maintainer I know how to use ausearch, but then again, as the SELinux maintainer I know how to figure out which of the many possibilities from EPERM are the cause. But I want to make it easier for a normal admin to figure that out. I know many people are worried about information leaks, so I'll right up front say lets add the sysctl to disable the interface for those who are concerned about the metadata information leak. But for most of us I want that data right when it happens, where it happens, so It can be exposed, used, and acted upon by the admin trying to troubleshoot why the shit just hit the fan. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Friendlier EPERM - Request for input
On Wed, 2013-01-09 at 12:14 -0800, Casey Schaufler wrote: > On 1/9/2013 11:43 AM, Eric Paris wrote: > > I know many people are worried about information leaks, so I'll right up > > front say lets add the sysctl to disable the interface for those who are > > concerned about the metadata information leak. But for most of us I > > want that data right when it happens, where it happens, so It can be > > exposed, used, and acted upon by the admin trying to troubleshoot why > > the shit just hit the fan. > > How on earth is your webadmin supposed to match the failure > of a system call with the content of /proc/8675309/whydiditfail > at the time of the failure? It's not like he's going to go into > the apache source and add code to look at what's there. > > Unfortunately, what you probably need is an interface that gives > the program access to the audit records it has generated. The first thought was wanting to add it by default to perror/strerror in libc, but libc guys didn't think it reasonable since it couldn't be save/restored across calls, like errno. Instead I hope for a libc function, something like char *get_extended_error_info(void), which outputs a localized string based on the information in the /proc interface. Programs like httpd would need to be changed to use this function when logging denials. Programs like python would use this interface to populate additional information in the exception they pass up the stack. I agree that something access to the audit record is appropriate. It's sorta like what I'm proposing. But audit is not really useful here. It doesn't collect information about any denial other than LSM. And the number of DAC or capabilities denials on a normally operating box is, I expect, quite large. Thus we can't pump them all into audit just on the unlikely chance something cares. Setting a couple of ints is cheap. Audit is really nice for people will well defined security goals and information retention and analysis purposes, but it kinda completely sucks butt for a normal sysadmin or developer... -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Friendlier EPERM - Request for input
On Wed, 2013-01-09 at 12:53 -0800, Casey Schaufler wrote: > Let me try again, I think I didn't quite get the idea across. > > I'm suggesting that the string returned by get_extended_error_info() > ought to be the audit record the system call would generate, regardless > of whether the audit system would emit it or not. > If the audit record doesn't have the information you need we should > fix the audit system to provide it. Any bit of the information in > the audit record might be relevant, and your admin or developer might > need to see it. > > I'm suggesting using the audit record because there are tools to > examine them and it's a pity to use a different format instead of > fixing the one that's already there. I get the point. My problem with using audit records is that they have to be stored on disk, forever. We have to store a record on disk for EVERY denial because of rwx bits, acls, capabilities, LSM, etc. We don't do that today and I'm scared of disk growth explosion. Then we could have a kernel interface, say get_last_audit_record(), which could query the audit system for that record number. A thought on disk size explosion might be something like generating these records in the kernel and just store them in the task struct until some later point in time. If userspace calls get_last_audit_record() we might be able to dump the record to auditd. If another record comes along we have to free the last one and replace it. Lot more of a perf hit than setting a couple of ints and taking the hit at the time when userspace actually wants to collect/use this information. But are we just building up a rube goldburg machine? I don't see a problem storing the last audit record if it exists, but I don't like making audit part of the normal workflow. I'd do it if others like that though -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Friendlier EPERM - Request for input
On Wed, 2013-01-09 at 21:59 +0100, Jakub Jelinek wrote: > On Wed, Jan 09, 2013 at 12:53:40PM -0800, Casey Schaufler wrote: > > I'm suggesting that the string returned by get_extended_error_info() > > ought to be the audit record the system call would generate, regardless > > of whether the audit system would emit it or not. > > What system call would that info be for and would it be reset on next > syscall that succeeded, or also failed? > > The thing is, various functions e.g. perform some syscall, save errno, do > some other syscall, and if they decide that the first syscall should be what > determines the whole function's errno, just restore errno from the saved > value and return. Similarly, various functions just set errno upon > detecting some error condition in userspace. > There is no 1:1 mapping between many libc library calls and syscalls. > So, when would it be safe to call this new get_extended_error_info function > and how to determine to which syscall it was relevant? I was thinking of it to be the last kernel error. So if the first and that second operation caused the kernel to want to make available extended errno information you would end up with the second. I see this is an informative piece of information, not normative. Not a replacement for errno. I'm hoping for a best effort way to provide extended errno information. It would be really neat for libc to have a way to save and restore the extended errno information, maybe even supply its own if it made the choice in userspace, but that sounds really hard for the first pass. I mean it would be great if we could rewrite every system call with a cookie so userspace could reliably match things back up, but I just don't see that as practical. Instead we do the best we can and help admins and developers most of the time, instead of none of the time. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/3] Audit: break a large single execve argument into smaller records
support single arguments that are large, not just large lists of execve args. This also means we never have to get a kernel buffer larger than MAX_EXECVE_AUDIT_LEN no matter how large the argument is. Before this patch we could need to allocate 32 consecutive pages to hold one argument which could pretty easily oom. a single argument larger than MAX_EXECVE_AUDIT_LEN is broken into multiple records and have a format like a10[0] a10[1] a10[2] etc. Signed-off-by: Eric Paris <[EMAIL PROTECTED]> --- example audit log (about 50k long) for the whole patch series can be found at http://people.redhat.com/~eparis/audit/audit.log the execve in question was something like: program_name [about 50 arguments] [one argument which is about 17k long] [about 1000 arguments] kernel/auditsc.c | 42 + 1 files changed, 42 insertions(+), 0 deletions(-) diff --git a/kernel/auditsc.c b/kernel/auditsc.c index 4176db6..ffc8d4b 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -853,6 +853,48 @@ static void audit_log_execve_info(struct audit_context *context, send_sig(SIGKILL, current, 0); } + if (unlikely(len > MAX_EXECVE_AUDIT_LEN)) { + /* deal with single arugments > MAX_EXECVE_AUDIT_LEN */ + int j; + const long tmplen = sizeof(char) * MAX_EXECVE_AUDIT_LEN; + + buf = kmalloc(tmplen + 1, GFP_KERNEL); + if (!buf) { + audit_panic("out of memory for argv string\n"); + return; + } + buf[tmplen] = '\0'; + for (j = 0; len > 0; j++) { + if (len > tmplen) { + ret = copy_from_user(buf, p, tmplen); + p += tmplen; + len -= tmplen; + } else { + ret = copy_from_user(buf, p, len); + /* p is at the next arg */ + p += len; + /* 27 is the max length of a%d[%d] */ + len_sent = len + 27; + len = 0; + } + if (ret) { + WARN_ON(1); + send_sig(SIGKILL, current, 0); + } + audit_log_end(*ab); + *ab = audit_log_start(context, GFP_KERNEL, + AUDIT_EXECVE); + if (!*ab) { + kfree(buf); + return; + } + audit_log_format(*ab, "a%d[%d]=", i, j); + audit_log_untrustedstring(*ab, buf); + audit_log_format(*ab, "\n"); + } + continue; + } + buf = kmalloc(len, GFP_KERNEL); if (!buf) { audit_panic("out of memory for argv string\n"); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/3] Audit: break up execve argument lists into multiple records
Break the auditing of a list of execve arguments into smaller records if there are a too many. The limit is currently around 7.5k of arguments as userspace has an 8k buffer limit and will drop messages which are longer. Signed-off-by: Eric Paris <[EMAIL PROTECTED]> --- Basically the same patch as last time, used a #define, cleaned up a memory leak on a malloc failure code path. Other than that it's the same. 7500 also is a good size because it means we never need more than a 2 page allocation. I'd say this is a good thing as even if we used a full netlink message of 32k its just making it harder on the kernel to have the memory it needs. I think userspace will want to get fixed eventually to handle a full 32k just in case, but keeping the kernel under 8k when we know we can seems like a good idea. kernel/auditsc.c | 39 +-- 1 files changed, 33 insertions(+), 6 deletions(-) diff --git a/kernel/auditsc.c b/kernel/auditsc.c index 04f3ffb..4176db6 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -78,6 +78,9 @@ extern struct list_head audit_filter_list[]; /* Indicates that audit should log the full pathname. */ #define AUDIT_NAME_FULL -1 +/* no execve audit message should be longer than this (userspace limits) */ +#define MAX_EXECVE_AUDIT_LEN 7500 + /* number of audit rules */ int audit_n_rules; @@ -819,11 +822,12 @@ static int audit_log_pid_context(struct audit_context *context, pid_t pid, return rc; } -static void audit_log_execve_info(struct audit_buffer *ab, +static void audit_log_execve_info(struct audit_context *context, + struct audit_buffer **ab, struct audit_aux_data_execve *axi) { int i; - long len, ret; + long len, ret, len_sent = 0; const char __user *p; char *buf; @@ -833,7 +837,11 @@ static void audit_log_execve_info(struct audit_buffer *ab, p = (const char __user *)axi->mm->arg_start; for (i = 0; i < axi->argc; i++, p += len) { + char tmp_buf[12]; + /* how many digits are in i? */ + int i_len = snprintf(tmp_buf, 12, "%d", i); len = strnlen_user(p, MAX_ARG_STRLEN); + /* * We just created this mm, if we can't find the strings * we just copied into it something is _very_ wrong. Similar @@ -862,9 +870,28 @@ static void audit_log_execve_info(struct audit_buffer *ab, send_sig(SIGKILL, current, 0); } - audit_log_format(ab, "a%d=", i); - audit_log_untrustedstring(ab, buf); - audit_log_format(ab, "\n"); + /* +* If there are a lot of args just break them into multiple +* messages. the last ab started will get closed by the +* caller. +* +* + 3 + i_len because we know at least a = and \n will be sent +* as well as the number of digits in i (i_len). +*/ + len_sent += (len + 3 + i_len); + if (len_sent > MAX_EXECVE_AUDIT_LEN) { + len_sent = len + 3 + i_len; + audit_log_end(*ab); + *ab = audit_log_start(context, GFP_KERNEL, AUDIT_EXECVE); + if (!*ab) { + kfree(buf); + return; + } + } + + audit_log_format(*ab, "a%d=", i); + audit_log_untrustedstring(*ab, buf); + audit_log_format(*ab, "\n"); kfree(buf); } @@ -1010,7 +1037,7 @@ static void audit_log_exit(struct audit_context *context, struct task_struct *ts case AUDIT_EXECVE: { struct audit_aux_data_execve *axi = (void *)aux; - audit_log_execve_info(ab, axi); + audit_log_execve_info(context, &ab, axi); break; } case AUDIT_SOCKETCALL: { - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/3] Audit: remove the limit on execve arguments when audit is running
Remove the limitation on argv size. The audit system now logs arguments in smaller chunks (currently about 8k due to userspace audit system buffer sizes) so this is no longer a requirement. Signed-off-by: Eric Paris <[EMAIL PROTECTED]> Acked-by: Peter Zijlstra <[EMAIL PROTECTED]> --- This patch hasn't changed since the last series, just reposted as 3/3 and rediffed. kernel/auditsc.c | 10 -- kernel/sysctl.c | 11 --- 2 files changed, 0 insertions(+), 21 deletions(-) diff --git a/kernel/auditsc.c b/kernel/auditsc.c index ffc8d4b..5d39727 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -1917,8 +1917,6 @@ int __audit_ipc_set_perm(unsigned long qbytes, uid_t uid, gid_t gid, mode_t mode return 0; } -int audit_argv_kb = 32; - int audit_bprm(struct linux_binprm *bprm) { struct audit_aux_data_execve *ax; @@ -1927,14 +1925,6 @@ int audit_bprm(struct linux_binprm *bprm) if (likely(!audit_enabled || !context || context->dummy)) return 0; - /* -* Even though the stack code doesn't limit the arg+env size any more, -* the audit code requires that _all_ arguments be logged in a single -* netlink skb. Hence cap it :-( -*/ - if (bprm->argv_len > (audit_argv_kb << 10)) - return -E2BIG; - ax = kmalloc(sizeof(*ax), GFP_KERNEL); if (!ax) return -ENOMEM; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 53a456e..88e5d06 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -77,7 +77,6 @@ extern int percpu_pagelist_fraction; extern int compat_log; extern int maps_protect; extern int sysctl_stat_interval; -extern int audit_argv_kb; /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */ static int maxolduid = 65535; @@ -347,16 +346,6 @@ static ctl_table kern_table[] = { .mode = 0644, .proc_handler = &proc_dointvec, }, -#ifdef CONFIG_AUDITSYSCALL - { - .ctl_name = CTL_UNNUMBERED, - .procname = "audit_argv_kb", - .data = &audit_argv_kb, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = &proc_dointvec, - }, -#endif { .ctl_name = KERN_CORE_PATTERN, .procname = "core_pattern", - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Allow Kconfig to set default mmap_min_addr protection
On Thu, 2007-12-20 at 00:29 +0100, Jan Engelhardt wrote: > On Dec 19 2007 16:59, Eric Paris wrote: > > > >+config SECURITY_DEFAULT_MMAP_MIN_ADDR > >+int "Low address space to protect from user allocation" > > Hm, should not this be 'hex'? I guess it could be, but the input for /proc/sys/vm/mmap_min_addr is base 10 as well so I figured consistency was a good thing. Do you have strong feelings? I guess so since you posted about it. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Allow Kconfig to set default mmap_min_addr protection
On Fri, 2007-12-21 at 23:59 +0100, Jan Engelhardt wrote: > On Dec 21 2007 14:35, Greg KH wrote: > >> >> >I guess it could be, but the input for /proc/sys/vm/mmap_min_addr is > >> >> >base 10 as well > >> >> > >> >> sysfs is autobase, i.e. echo "0xb000" >/sys/foo will Do The Right Thing. > >> > > >> >yes but if you cat /proc/sys/vm/mmap_min_addr, it returns in base 10. > >> > >> sysfs should probably be tuned to output it in a preferred base. > > > >Again, this is sysctl, not sysfs. two very different things... > > > Argh... :) Just shows that /proc is the wrong place for system variables. > > Well, module_params(integer) are autobase, and that's all I needed so > far :-D So in the end we are all happy with the original patch I sent? -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] VM/Security: add security hook to do_brk
Given a specifically crafted binary do_brk() can be used to get low pages available in userspace virtually memory and can thus be used to circumvent the mmap_min_addr low memory protection. Add security checks in do_brk(). Signed-off-by: Eric Paris <[EMAIL PROTECTED]> --- mm/mmap.c |4 1 file changed, 4 insertions(+) diff --git a/mm/mmap.c b/mm/mmap.c index f4cfc6a..15678aa 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1941,6 +1941,10 @@ unsigned long do_brk(unsigned long addr, unsigned long len) if (is_hugepage_only_range(mm, addr, len)) return -EINVAL; + error = security_file_mmap(0, 0, 0, 0, addr, 1); + if (error) + return error; + flags = VM_DATA_DEFAULT_FLAGS | VM_ACCOUNT | mm->def_flags; error = arch_mmap_check(addr, len, flags); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/3] security: allow capable check to permit mmap or low vm space
On Sat, 2007-11-17 at 09:12 +1100, James Morris wrote: > On Fri, 16 Nov 2007, Eric Paris wrote: > > > When this protection was originally concieved it intentionally was > > offing something even without an more 'full featured' LSM. That was the > > whole reason I had to drop the secondary stacking hook inside the > > selinux code. > > > > While I now understand the question, I think that this is the behavior > > most people would want. I'll revert the security enhancement for > > non-LSM systems if others agree with James, but I think adding another > > small bit of protection against kernel flaws for everyone who wants > > security is a win. (and remember, in kernel we still default this to > > off so noone is going to 'accidentally' see and security checks in the > > dummy hooks) > > If it's off by default and generally useful across LSMs, why not just put > it in the base kernel code? It was placed in CONFIG_SECURITY so that users can (If their given LSM supports it) selectively allow this stuff. Some LSMs are fine grained enough to allow applications like dosemu and X to use low pages without globally disabling. I can't think of any reasonable way to move all of this into base kernel code while still allowing overrides. I'd love to hear suggestions on how to move all of this out of the security code and into the base kernel while not neglecting those few users who need this functionality. -Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/