Re: fsl_udc_core: BUG: scheduling while atomic
I would say it is a general problem with CONFIG_PREEMPT_VOLUNTARY, not only with Freescale...

On Thursday, 12.05.2011, at 11:30 -0400, Matthew L. Creech wrote:
> On Thu, May 12, 2011 at 4:37 AM, sergej.stepa...@ids.de wrote:
> > Hi Matthew,
> >
> > you can get such an oops with SPI as well. For this kind of problem it helps to compile your kernel with a different preemption model:
> > - preempt
> > - standard
> > - !!! but not voluntary preemption !!!
>
> Thanks Sergej, indeed I'm currently using CONFIG_PREEMPT_VOLUNTARY on this board. I'll change it to fix this problem for now.
>
> Do you happen to know whether the Freescale folks intend to fix this? If not, it seems like at least some sort of warning is in order.
>
> --
> Matthew L. Creech

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [linuxppc-release] [PATCH 1/2] powerpc, e5500: add networking to defconfig
Subject: Re: [linuxppc-release] [PATCH 1/2] powerpc, e5500: add networking to defconfig

> On Thu, 12 May 2011 10:31:08 -0500 Scott Wood scottw...@freescale.com wrote:
> > On Thu, 12 May 2011 01:11:03 -0500 Li Yang-R58472 r58...@freescale.com wrote:
> > > diff --git a/arch/powerpc/configs/e55xx_smp_defconfig b/arch/powerpc/configs/e55xx_smp_defconfig
> > > index 9fa1613..f4c5780 100644
> > > --- a/arch/powerpc/configs/e55xx_smp_defconfig
> > > +++ b/arch/powerpc/configs/e55xx_smp_defconfig
> > > @@ -6,10 +6,10 @@ CONFIG_NR_CPUS=2
> > >  CONFIG_EXPERIMENTAL=y
> > >  CONFIG_SYSVIPC=y
> > >  CONFIG_BSD_PROCESS_ACCT=y
> > > +CONFIG_SPARSE_IRQ=y
> >
> > Hi Scott,
> >
> > I remember from previous testing that this option has a negative effect on performance. Do we really need it to be enabled?
>
> I didn't change this setting, it just moved due to running it through savedefconfig.
>
> What was the performance impact?

It adds CPU cycles to the interrupt handling path. It will cause a performance drop for benchmarks with a large number of interrupts, such as IP forwarding.

- Leo
[PATCH] Fix for Pegasos keyboard and mouse
[See http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-October/086424.html and followups. Part of the commit message is directly copied from that.]

Commit 540c6c392f01887dcc96bef0a41e63e6c1334f01 tries to find i8042 IRQs in the device-tree but doesn't fall back to the old hardcoded 1 and 12 in all failure cases.

Specifically, the case where the device-tree contains nothing matching pnpPNP,303 or pnpPNP,f03 doesn't seem to be handled well. It sort of falls through to the old code, but leaves the IRQs set to 0.

Signed-off-by: Gabriel Paubert paub...@iram.es
---
This fix has only been tested on Pegasos, but to my knowledge it only affects a Pegasos-specific path (all other firmwares should be able to find the keyboard through the pnp identifiers).

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 21f30cb..6c7abbf 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -602,6 +602,10 @@ int check_legacy_ioport(unsigned long base_port)
 		 * name instead */
 		if (!np)
 			np = of_find_node_by_name(NULL, "8042");
+		if (np) {
+			of_i8042_kbd_irq = 1;
+			of_i8042_aux_irq = 12;
+		}
 		break;
 	case FDC_BASE: /* FDC1 */
 		np = of_find_node_by_type(NULL, "fdc");
Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
* Peter Zijlstra pet...@infradead.org wrote:

> On Fri, 2011-05-13 at 14:10 +0200, Ingo Molnar wrote:
> > 	err = event_vfs_getname(result);
>
> I really think we should not do this. Events like we have them should be inactive, totally passive entities, only observe but not affect execution (other than the bare minimal time delay introduced by observance).

Well, this patchset already demonstrates that we can use a single event callback for a rather useful purpose.

Either it makes sense to do, in which case we should share facilities as much as possible, or it makes no sense, in which case we should not merge it at all.

> If you want another entity that is more active, please invent a new name for it and create a new subsystem for them, now you could have these active entities also have an (automatic) passive event side, but that's some detail.

Why should we have two callbacks next to each other:

	event_vfs_getname(result);
	result = check_event_vfs_getname(result);

if one could do it all?

Thanks,

	Ingo
Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
On Fri, 2011-05-13 at 14:26 +0200, Ingo Molnar wrote:
> * Peter Zijlstra pet...@infradead.org wrote:
>
> > On Fri, 2011-05-13 at 14:10 +0200, Ingo Molnar wrote:
> > > 	err = event_vfs_getname(result);
> >
> > I really think we should not do this. Events like we have them should be inactive, totally passive entities, only observe but not affect execution (other than the bare minimal time delay introduced by observance).
>
> Well, this patchset already demonstrates that we can use a single event callback for a rather useful purpose.

Can and should are two distinct things.

> Either it makes sense to do, in which case we should share facilities as much as possible, or it makes no sense, in which case we should not merge it at all.

And I'm arguing we should _not_. Observing is radically different from Affecting, at the very least the two things should have different permission schemes. We should not confuse these two matters.

> > If you want another entity that is more active, please invent a new name for it and create a new subsystem for them, now you could have these active entities also have an (automatic) passive event side, but that's some detail.
>
> Why should we have two callbacks next to each other:
>
> 	event_vfs_getname(result);
> 	result = check_event_vfs_getname(result);
>
> if one could do it all?

Did you actually read the bit where I said that check_event_* (although I still think that name sucks) could imply a matching event_*?
Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
On Fri, 2011-05-13 at 14:39 +0200, Peter Zijlstra wrote:
> 	event_vfs_getname(result);
> 	result = check_event_vfs_getname(result);

Another fundamental difference is how to treat the callback chains for these two.

Observers won't have a return value and are assumed to never fail, therefore we can always call every entry on the callback list.

Active things otoh do have a return value, and thus we need to have semantics that define what to do with that during callback iteration, when to continue and when to break. Thus for active elements it's impossible to guarantee all entries will indeed be called.
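The difference between the two kinds of chain can be made concrete with a small user-space sketch. Nothing here is kernel API: the callback types, the sample callbacks and the "stop at the first nonzero return" rule are all assumptions chosen for illustration; other iteration rules are possible.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback types, for illustration only. */
typedef void (*observer_fn)(const char *name);
typedef int (*active_fn)(const char *name);

static int observed;	/* how many observer callbacks have run */

/* Sample callbacks (hypothetical). */
static void count_event(const char *name) { (void)name; observed++; }
static int allow(const char *name) { (void)name; return 0; }
static int deny(const char *name) { (void)name; return -EACCES; }

/* Observers cannot fail, so every entry on the list is always called. */
static void call_observers(const observer_fn *list, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		list[i](name);
}

/*
 * Active callbacks return a value, so the iteration needs a rule.
 * This variant breaks at the first nonzero result, so later entries
 * may never run. *called reports how many entries were visited.
 */
static int call_active(const active_fn *list, size_t n, const char *name,
		       size_t *called)
{
	*called = 0;
	for (size_t i = 0; i < n; i++) {
		int err = list[i](name);

		(*called)++;
		if (err)
			return err;
	}
	return 0;
}
```

With a chain of allow, deny, allow, the third entry is never reached, which is exactly the guarantee that is lost for active elements.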
Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
* Peter Zijlstra pet...@infradead.org wrote:

> > Why should we have two callbacks next to each other:
> >
> > 	event_vfs_getname(result);
> > 	result = check_event_vfs_getname(result);
> >
> > if one could do it all?
>
> Did you actually read the bit where I said that check_event_* (although I still think that name sucks) could imply a matching event_*?

No, did not notice that - and yes that solves this particular problem.

So given that by your own admission it makes sense to share the facilities at the low level, i also argue that it makes sense to share as high up as possible.

Are you perhaps arguing for a -observe flag that would make 100% sure that the default behavior for events is observe-only? That would make sense indeed.

Otherwise both cases really want to use all the same facilities for event discovery, setup, control and potential extraction of events.

Thanks,

	Ingo
Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
* Peter Zijlstra pet...@infradead.org wrote:

> On Fri, 2011-05-13 at 14:39 +0200, Peter Zijlstra wrote:
> > 	event_vfs_getname(result);
> > 	result = check_event_vfs_getname(result);
>
> Another fundamental difference is how to treat the callback chains for these two.
>
> Observers won't have a return value and are assumed to never fail, therefore we can always call every entry on the callback list.
>
> Active things otoh do have a return value, and thus we need to have semantics that define what to do with that during callback iteration, when to continue and when to break. Thus for active elements it's impossible to guarantee all entries will indeed be called.

I think the sanest semantics is to run all active callbacks as well.

For example if this is used for three stacked security policies - as if 3 LSM modules were stacked at once. We'd call all three, and we'd determine that at least one failed - and we'd return a failure.

Even if the first one failed already we'd still want to trigger *all* the failures, because security policies like to know when they have triggered a failure (regardless of other active policies) and want to see that failure event (if they are logging such events).

So to me this looks pretty similar to observer callbacks as well, it's the natural extension to an observer callback chain.

Observer callbacks are simply constant functions (to the caller), those which never return failure and which never modify any of the parameters.

It's as if you argued that there should be separate syscalls/facilities for handling readonly files versus handling read/write files.

Thanks,

	Ingo
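The "run every policy, then report failure" semantics can be sketched as follows. This is a user-space illustration only: the policy type and the three sample policies are hypothetical, and keeping the first nonzero return value is one possible rule for picking the error that gets reported.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical policy callback type: 0 means allow, negative errno means deny. */
typedef int (*policy_fn)(const char *name);

/* Three sample stacked policies (hypothetical). */
static int policy_ok(const char *name) { (void)name; return 0; }
static int policy_einval(const char *name) { (void)name; return -EINVAL; }
static int policy_eacces(const char *name) { (void)name; return -EACCES; }

/*
 * Call every policy, even after one has failed, so that each policy
 * gets to see (and possibly log) its own failure. The first nonzero
 * value is what the caller receives; *failures counts all denials.
 */
static int run_all_policies(const policy_fn *list, size_t n, const char *name,
			    size_t *failures)
{
	int ret = 0;

	*failures = 0;
	for (size_t i = 0; i < n; i++) {
		int err = list[i](name);

		if (err) {
			(*failures)++;
			if (!ret)
				ret = err;
		}
	}
	return ret;
}
```

In the common case where no policy rejects, this does the same work as a loop that breaks early; the two only differ once something fails.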
Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
On Fri, 2011-05-13 at 14:54 +0200, Ingo Molnar wrote:
> I think the sanest semantics is to run all active callbacks as well.
>
> For example if this is used for three stacked security policies - as if 3 LSM modules were stacked at once. We'd call all three, and we'd determine that at least one failed - and we'd return a failure.

But that only works for boolean functions where you can return the multi-bit-or of the result. What if you need to return the specific error code?

Also, there's bound to be other cases where people will want to employ this, look at all the various notifier chain muck we've got, it already deals with much of this -- simply because users need it.

Then there's the whole indirection argument, if you don't need indirection, it's often better to not use it, I myself much prefer code to look like:

	foo1(bar);
	foo2(bar);
	foo3(bar);

Than:

	foo_notifier(bar);

Simply because it's much clearer who all are involved without me having to grep around to see who registers for foo_notifier and wth they do with it. It also makes it much harder to sneak in another user, whereas it's nearly impossible to find new notifier users.

It's also much faster, no extra memory accesses, no indirect function calls, no other muck.
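To make the indirection argument concrete, here is a minimal user-space sketch of the two styles. The foo1/foo2/foo3 and foo_notifier names come from the message above; the registration function, the fixed-size table and the bodies of the callbacks are assumptions, and this is not the kernel's notifier chain implementation.

```c
#include <assert.h>
#include <stddef.h>

static int total;	/* lets a caller see which functions ran */

static void foo1(int bar) { total += bar; }
static void foo2(int bar) { total += 10 * bar; }
static void foo3(int bar) { total += 100 * bar; }

/* Direct style: every participant is visible at the call site. */
static void run_direct(int bar)
{
	foo1(bar);
	foo2(bar);
	foo3(bar);
}

/* Notifier style: participants are registered elsewhere, at run time. */
typedef void (*foo_fn)(int bar);

#define FOO_MAX 8	/* arbitrary capacity for this sketch */
static foo_fn foo_chain[FOO_MAX];
static size_t foo_count;

static int foo_register(foo_fn fn)
{
	if (foo_count == FOO_MAX)
		return -1;
	foo_chain[foo_count++] = fn;
	return 0;
}

static void foo_notifier(int bar)
{
	/* One extra memory load and one indirect call per entry. */
	for (size_t i = 0; i < foo_count; i++)
		foo_chain[i](bar);
}
```

Reading run_direct() tells you exactly who is called; reading foo_notifier() tells you nothing until you find every foo_register() caller.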
Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
* Peter Zijlstra pet...@infradead.org wrote:

> On Fri, 2011-05-13 at 14:54 +0200, Ingo Molnar wrote:
> > I think the sanest semantics is to run all active callbacks as well.
> >
> > For example if this is used for three stacked security policies - as if 3 LSM modules were stacked at once. We'd call all three, and we'd determine that at least one failed - and we'd return a failure.
>
> But that only works for boolean functions where you can return the multi-bit-or of the result. What if you need to return the specific error code?

Do you mean that one filter returns -EINVAL while the other -EACCES?

Seems like a non-problem to me, we'd return the first nonzero value.

> Also, there's bound to be other cases where people will want to employ this, look at all the various notifier chain muck we've got, it already deals with much of this -- simply because users need it.

Do you mean it would be easy to abuse it? What kind of abuse are you most worried about?

> Then there's the whole indirection argument, if you don't need indirection, it's often better to not use it, I myself much prefer code to look like:
>
> 	foo1(bar);
> 	foo2(bar);
> 	foo3(bar);
>
> Than:
>
> 	foo_notifier(bar);
>
> Simply because it's much clearer who all are involved without me having to grep around to see who registers for foo_notifier and wth they do with it. It also makes it much harder to sneak in another user, whereas it's nearly impossible to find new notifier users.
>
> It's also much faster, no extra memory accesses, no indirect function calls, no other muck.

But i suspect this question has been settled, given the fact that even pure observer events need and already process a chain of events? Am i missing something about your argument?

Thanks,

	Ingo
Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
On Fri, 2011-05-13 at 14:49 +0200, Ingo Molnar wrote:
> So given that by your own admission it makes sense to share the facilities at the low level, i also argue that it makes sense to share as high up as possible.

I'm not saying any such thing, I'm saying that it might make sense to observe active objects and auto-create these observation points. That doesn't make them similar or make them share anything.
Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
Cut the microblaze list since it's bouncy.

On Fri, 2011-05-13 at 15:18 +0200, Ingo Molnar wrote:
> * Peter Zijlstra pet...@infradead.org wrote:
>
> > On Fri, 2011-05-13 at 14:54 +0200, Ingo Molnar wrote:
> > > I think the sanest semantics is to run all active callbacks as well.
> > >
> > > For example if this is used for three stacked security policies - as if 3 LSM modules were stacked at once. We'd call all three, and we'd determine that at least one failed - and we'd return a failure.
> >
> > But that only works for boolean functions where you can return the multi-bit-or of the result. What if you need to return the specific error code?
>
> Do you mean that one filter returns -EINVAL while the other -EACCES?
>
> Seems like a non-problem to me, we'd return the first nonzero value.

Assuming the first is -EINVAL, what then is the value in computing the -EACCES? Sounds like a massive waste of time to me.

> > Also, there's bound to be other cases where people will want to employ this, look at all the various notifier chain muck we've got, it already deals with much of this -- simply because users need it.
>
> Do you mean it would be easy to abuse it? What kind of abuse are you most worried about?

I'm not worried about abuse, I'm saying that going by the existing notifier pattern, always visiting all entries on the callback list is undesired.

> > Then there's the whole indirection argument, if you don't need indirection, it's often better to not use it, I myself much prefer code to look like:
> >
> > 	foo1(bar);
> > 	foo2(bar);
> > 	foo3(bar);
> >
> > Than:
> >
> > 	foo_notifier(bar);
> >
> > Simply because it's much clearer who all are involved without me having to grep around to see who registers for foo_notifier and wth they do with it. It also makes it much harder to sneak in another user, whereas it's nearly impossible to find new notifier users.
> >
> > It's also much faster, no extra memory accesses, no indirect function calls, no other muck.
>
> But i suspect this question has been settled, given the fact that even pure observer events need and already process a chain of events? Am i missing something about your argument?

I'm saying that there are reasons to not use notifiers, passive or active.

Mostly the whole notifier/indirection muck comes up once you want modules to make use of the thing, because then you need dynamic management of the callback list.

(Then again, I'm fairly glad we don't have explicit callbacks in kernel/cpu.c for all the cpu-hotplug callbacks :-)

Anyway, I oppose the existing events gaining an active role.
Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
* Peter Zijlstra pet...@infradead.org wrote:

> Cut the microblaze list since it's bouncy.
>
> On Fri, 2011-05-13 at 15:18 +0200, Ingo Molnar wrote:
> > * Peter Zijlstra pet...@infradead.org wrote:
> >
> > > On Fri, 2011-05-13 at 14:54 +0200, Ingo Molnar wrote:
> > > > I think the sanest semantics is to run all active callbacks as well.
> > > >
> > > > For example if this is used for three stacked security policies - as if 3 LSM modules were stacked at once. We'd call all three, and we'd determine that at least one failed - and we'd return a failure.
> > >
> > > But that only works for boolean functions where you can return the multi-bit-or of the result. What if you need to return the specific error code?
> >
> > Do you mean that one filter returns -EINVAL while the other -EACCES?
> >
> > Seems like a non-problem to me, we'd return the first nonzero value.
>
> Assuming the first is -EINVAL, what then is the value in computing the -EACCES? Sounds like a massive waste of time to me.

No, because the common case is no rejection - this is a security mechanism. So in the normal case we would execute all 3 anyway, just to determine that all return 0.

Are you really worried about the abnormal case of one of them returning an error and us calculating all 3 return values?

> > > Also, there's bound to be other cases where people will want to employ this, look at all the various notifier chain muck we've got, it already deals with much of this -- simply because users need it.
> >
> > Do you mean it would be easy to abuse it? What kind of abuse are you most worried about?
>
> I'm not worried about abuse, I'm saying that going by the existing notifier pattern, always visiting all entries on the callback list is undesired.

That is because many notifier chains are used in an 'event consuming' manner - they are responding to things like hardware events and are called in an interrupt-handler-like fashion most of the time.

> > > Then there's the whole indirection argument, if you don't need indirection, it's often better to not use it, I myself much prefer code to look like:
> > >
> > > 	foo1(bar);
> > > 	foo2(bar);
> > > 	foo3(bar);
> > >
> > > Than:
> > >
> > > 	foo_notifier(bar);
> > >
> > > Simply because it's much clearer who all are involved without me having to grep around to see who registers for foo_notifier and wth they do with it. It also makes it much harder to sneak in another user, whereas it's nearly impossible to find new notifier users.
> > >
> > > It's also much faster, no extra memory accesses, no indirect function calls, no other muck.
> >
> > But i suspect this question has been settled, given the fact that even pure observer events need and already process a chain of events? Am i missing something about your argument?
>
> I'm saying that there are reasons to not use notifiers, passive or active.
>
> Mostly the whole notifier/indirection muck comes up once you want modules to make use of the thing, because then you need dynamic management of the callback list.

But your argument assumes that we'd have a chain of functions to call, like regular notifiers.

While the natural model here would be to have a list of registered event structs for that point, with different filters but basically the same callback mechanism (a call into the filter engine in essence).

Also note that the common case would be no event registered - and we'd automatically optimize that case via the existing jump labels optimization.

> (Then again, I'm fairly glad we don't have explicit callbacks in kernel/cpu.c for all the cpu-hotplug callbacks :-)
>
> Anyway, I oppose the existing events gaining an active role.

Why if 'being active' is optional and useful?

Thanks,

	Ingo
Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
* Peter Zijlstra pet...@infradead.org wrote:

> On Fri, 2011-05-13 at 14:49 +0200, Ingo Molnar wrote:
> > So given that by your own admission it makes sense to share the facilities at the low level, i also argue that it makes sense to share as high up as possible.
>
> I'm not saying any such thing, I'm saying that it might make sense to observe active objects and auto-create these observation points. That doesn't make them similar or make them share anything.

Well, they would share the lowest level call site:

	result = check_event_vfs_getname(result);

You call it 'auto-generated call site', i call it a shared (single line) call site. The same thing as far as the lowest level goes.

Now (the way i understood it) you'd want to stop the sharing right after that. I argue that it should go all the way up.

Note: i fully agree that there should be events where filters can have no effect whatsoever. For example if this was written as:

	check_event_vfs_getname(result);

Then it would have no effect. This is decided by the subsystem developers, obviously. So whether an event is 'active' or 'passive' can be enforced at the subsystem level as well.

As far as the event facilities go, 'no effect observation' is a special-case of 'active observation' - just like read-only files are a special case of read-write files.

Thanks,

	Ingo
Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
[dropping microblaze and roland]

On Fri, 2011-05-13 at 14:10 +0200, Ingo Molnar wrote:
> * James Morris jmor...@namei.org wrote:
>
> It is a simple and sensible security feature, agreed? It allows most code to run well and link to countless libraries - but no access to other files is allowed.

It's simple enough and sounds reasonable, but you can read all the discussion about AppArmor to see why many people don't really think it's the best. Still, I'll agree it's a lot better than nothing.

> But if i had a VFS event at the fs/namei.c::getname() level, i would have access to a central point where the VFS string becomes stable to the kernel and can be checked (and denied if necessary).
>
> A sidenote, and not surprisingly, the audit subsystem already has an event callback there:
>
> 	audit_getname(result);
>
> Unfortunately this audit callback cannot be used for my purposes, because the event is single-purpose for auditd and because it allows no feedback (no deny/accept discretion for the security policy).
>
> But if had this simple event there:
>
> 	err = event_vfs_getname(result);

Wow it sounds so easy. Now let's keep extending your train of thought until we can actually provide the security provided by SELinux. What do we end up with? We end up with an event hook right next to every LSM hook. You know, the LSM hooks were placed where they are for a reason. Because those were the locations inside the kernel where you actually have information about the task doing an operation and the objects (files, sockets, directories, other tasks, etc) they are doing an operation on.

Honestly all you are talking about is remaking the LSM with 2 sets of hooks instead of 1. Why? It seems much easier that if you want the language of the filter engine you would just make a new LSM that uses the filter engine for its policy language rather than the language created by SELinux or SMACK or name your LSM implementation.

> - unprivileged: application-definable, allowing the embedding of security policy in *apps* as well, not just the system
>
> - flexible: can be added/removed runtime unprivileged, and cheaply so
>
> - transparent: does not impact executing code that meets the policy
>
> - nestable: it is inherited by child tasks and is fundamentally stackable, multiple policies will have the combined effect and they are transparent to each other. So if a child task within a sandbox adds *more* checks then those add to the already existing set of checks. We only narrow permissions, never extend them.
>
> - generic: allowing observation and (safe) control of security relevant parameters not just at the system call boundary but at other relevant places of kernel execution as well: which points/callbacks could also be used for other types of event extraction such as perf. It could even be shared with audit ...

I'm not arguing that any of these things are bad things. What you describe is a new LSM that uses a discretionary access control model but with the granularity and flexibility that has traditionally only existed in the mandatory access control security modules previously implemented in the kernel.

I won't argue that's a bad idea, there's no reason in my mind that a process shouldn't be allowed to control its own access decisions in a more flexible way than rwx bits. Then again, I certainly don't see a reason that this syscall hardening patch should be held up while a whole new concept in computer security is contemplated...

-Eric
Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
[dropping microblaze and roland]

On Fri, 2011-05-13 at 15:18 +0200, Ingo Molnar wrote:
> * Peter Zijlstra pet...@infradead.org wrote:
>
> > On Fri, 2011-05-13 at 14:54 +0200, Ingo Molnar wrote:
> > > I think the sanest semantics is to run all active callbacks as well.
> > >
> > > For example if this is used for three stacked security policies - as if 3 LSM modules were stacked at once. We'd call all three, and we'd determine that at least one failed - and we'd return a failure.
> >
> > But that only works for boolean functions where you can return the multi-bit-or of the result. What if you need to return the specific error code?
>
> Do you mean that one filter returns -EINVAL while the other -EACCES?
>
> Seems like a non-problem to me, we'd return the first nonzero value.

Sounds so easy! Why haven't LSMs stacked already? Because what happens if one of these hooks did something stateful? Let's say on open, hook #1 returns EPERM and hook #2 allocates memory. The open is going to fail and hook #2 is never going to get the close() which should have freed the allocation. If you can be completely stateless it's easier, but there's a reason that stacking security modules is hard. Serge has tried in the past and both dhowells and casey schaufler are working on it right now. Stacking is never as easy as it sounds :)

-Eric
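The stateful-hook problem can be reproduced in a few lines of user-space C. This is only a sketch of the scenario described above: the hook names, the single global allocation and the release function are hypothetical, and real LSM hooks look nothing like this.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* State owned by hook #2 (hypothetical). */
static void *hook2_state;
static int live_allocations;

/* Hook #1 denies the open. */
static int hook1_open(void) { return -EPERM; }

/* Hook #2 allows the open but allocates per-open state. */
static int hook2_open(void)
{
	hook2_state = malloc(16);
	if (!hook2_state)
		return -ENOMEM;
	live_allocations++;
	return 0;
}

/* Normally reached from close(), which a failed open never triggers. */
static void hook2_release(void)
{
	free(hook2_state);
	hook2_state = NULL;
	live_allocations--;
}

/* "Call every hook" semantics: hook #2 has allocated by the time we fail. */
static int open_call_all(void)
{
	int err1 = hook1_open();
	int err2 = hook2_open();

	return err1 ? err1 : err2;
}

/* Same semantics, plus an explicit unwind when the open is denied. */
static int open_with_unwind(void)
{
	int err1 = hook1_open();
	int err2 = hook2_open();

	if (err1 && !err2)
		hook2_release();	/* no close() will follow a failed open */
	return err1 ? err1 : err2;
}
```

open_call_all() returns -EPERM and leaves one allocation behind; open_with_unwind() returns the same error without the leak, at the cost of every stacked hook needing a matching undo operation.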
Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
On Fri, 2011-05-13 at 11:10 -0400, Eric Paris wrote:
> Then again, I certainly don't see a reason that this syscall hardening patch should be held up while a whole new concept in computer security is contemplated...

Which makes me wonder why this syscall hardening stuff is done outside of LSM? Why isn't it part of the LSM so that say SELinux can have a syscall bitmask per security context?

Making it part of the LSM also avoids having to add this prctl().
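A per-context syscall bitmask of the kind mentioned above needs very little machinery. The sketch below is an illustration under stated assumptions: the struct, the helper names and the limit of 512 syscall numbers are all made up, and it is not how SELinux or seccomp store anything.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_NR_SYSCALLS 512	/* assumed upper bound, not a real constant */
#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_WORDS \
	((SKETCH_NR_SYSCALLS + SKETCH_WORD_BITS - 1) / SKETCH_WORD_BITS)

/* One bit per syscall number; a zeroed mask denies everything (opt-in). */
struct syscall_mask {
	unsigned long bits[SKETCH_WORDS];
};

/* Mark syscall nr as allowed. Returns false for out-of-range numbers. */
static bool mask_allow(struct syscall_mask *m, unsigned int nr)
{
	if (nr >= SKETCH_NR_SYSCALLS)
		return false;
	m->bits[nr / SKETCH_WORD_BITS] |= 1UL << (nr % SKETCH_WORD_BITS);
	return true;
}

/* Test whether syscall nr is allowed; unknown numbers are denied. */
static bool mask_allowed(const struct syscall_mask *m, unsigned int nr)
{
	if (nr >= SKETCH_NR_SYSCALLS)
		return false;
	return (m->bits[nr / SKETCH_WORD_BITS] >> (nr % SKETCH_WORD_BITS)) & 1UL;
}
```

Because the default is deny, a syscall added by a newer kernel stays blocked until a policy explicitly opts in to it.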
Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
On Fri, 2011-05-13 at 16:57 +0200, Ingo Molnar wrote:
> this is a security mechanism

Who says? And why would you want to unify two separate concepts only to then limit it to security? That just doesn't make sense.

Either you provide a full-on replacement for notifier-chain-like things or you don't; only extending trace events in this fashion for security is like way weird.

Plus see the arguments Eric made about stacking stuff, not only security schemes will have those problems.
RE: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system callfiltering
...
> If you can be completely stateless it's easier, but there's a reason that stacking security modules is hard. Serge has tried in the past and both dhowells and casey schaufler are working on it right now. Stacking is never as easy as it sounds :)

For a bad example of trying to allow alternate security models look at NetBSD's kauth code :-)

NetBSD also had issues where some 'system call trace' code was being used to (try to) apply security - unfortunately it worked by looking at the user-space buffers on system call entry - and a multithreaded program can easily arrange to update them after the initial check! For trace/event type activities this wouldn't really matter, for security policy it does.

(I've not looked directly at these event points in Linux.)

	David
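The check-then-use race described above goes away if the name is copied into a private buffer first and only the copy is checked and used. The sketch below shows that ordering in plain user-space C; the function names, the single allowed prefix and the buffer handling are assumptions for illustration, and in the kernel the copy would be made from user memory rather than from an ordinary array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy: only names under /home/sandbox/ are allowed. */
static int policy_check(const char *name)
{
	return strncmp(name, "/home/sandbox/", 14) ? -EACCES : 0;
}

/*
 * Copy the (possibly still changing) shared name into a private
 * buffer in a single pass, then check the private copy. The caller
 * must use 'stable', never 'shared', after this returns 0.
 */
static int copy_then_check(const char *shared, char *stable, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		stable[i] = shared[i];
		if (stable[i] == '\0')
			break;
	}
	if (i == len)
		return -ENAMETOOLONG;	/* no terminator within len bytes */

	return policy_check(stable);
}
```

Once the copy exists, later writes to the shared buffer by another thread cannot change what was checked or what gets opened.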
Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
* James Morris jmor...@namei.org wrote:

> On Thu, 12 May 2011, Ingo Molnar wrote:
> > Funnily enough, back then you wrote this:
> >
> > > I'm concerned that we're seeing yet another security scheme being designed on the fly, without a well-formed threat model, and without taking into account lessons learned from the seemingly endless parade of similar, failed schemes.
> >
> > so when and how did your opinion of this scheme turn from it being an endless parade of failed schemes to it being a well-defined and readily understandable feature? :-)
>
> When it was defined in a way which limited its purpose to reducing the attack surface of the syscall interface.

Let me outline a simple example of a new filter expression based security feature that could be implemented outside the narrow system call boundary you find acceptable, and please tell what is bad about it.

Say i'm a user-space sandbox developer who wants to enforce that sandboxed code should only be allowed to open files in /home/sandbox/, /lib/ and /usr/lib/.

It is a simple and sensible security feature, agreed? It allows most code to run well and link to countless libraries - but no access to other files is allowed.

I would also like my sandbox app to be able to install this policy without having to be root. I do not want the sandbox app to have permission to create labels on /lib and /usr/lib and what not.

Firstly, using the filter code i deny the various link creation syscalls so that sandboxed code cannot escape for example by creating a symlink to outside the permitted VFS namespace. (Note: we opt-in to syscalls, that way new syscalls added by new kernels are denied by default. The current symlink creation syscalls are not opted in to.)

But the next step, actually checking filenames, poses a big hurdle: i cannot implement the filename checking at the sys_open() syscall level in a secure way: because the pathname is passed to sys_open() by pointer, and if i check it at the generic sys_open() syscall level, another thread in the sandbox might modify the underlying filename *after* i've checked it.

But if i had a VFS event at the fs/namei.c::getname() level, i would have access to a central point where the VFS string becomes stable to the kernel and can be checked (and denied if necessary).

A sidenote, and not surprisingly, the audit subsystem already has an event callback there:

	audit_getname(result);

Unfortunately this audit callback cannot be used for my purposes, because the event is single-purpose for auditd and because it allows no feedback (no deny/accept discretion for the security policy).

But if i had this simple event there:

	err = event_vfs_getname(result);

I could implement this new filename based sandboxing policy, using a filter like this installed on the vfs::getname event and inherited by all sandboxed tasks (which cannot uninstall the filter, obviously):

	if (strstr(name, ".."))
		return -EACCES;

	if (strncmp(name, "/home/sandbox/", 14) &&
	    strncmp(name, "/lib/", 5) &&
	    strncmp(name, "/usr/lib/", 9))
		return -EACCES;

	#
	# Note1: Obviously the filter engine would be extended to allow such
	#        simple string match functions.
	#
	# Note2: ".." is disallowed so that sandboxed code cannot escape the
	#        restrictions using "/..".
	#

This kind of flexible and dynamic sandboxing would allow a wide range of file ops within the sandbox, while still isolating it from files not included in the specified VFS namespace.

( Note that there are tons of other examples as well, for useful security features that are best done using events outside the syscall boundary. )

The security event filters code tied to seccomp and syscalls at the moment is useful, but limited in its future potential.
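For readers who want to try the policy outside the kernel, the filename filter described above can be written as an ordinary C function. This is a sketch only: sandbox_check_name() is a made-up name, the filter engine discussed in this thread has no such string functions, and the function assumes it is handed a name that is already stable. It denies a name containing ".." or one that starts with none of the three allowed prefixes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Returns 0 if the name is inside the sandbox namespace, -EACCES otherwise. */
static int sandbox_check_name(const char *name)
{
	/* Allowed prefixes, taken from the example in the message. */
	static const char *const allowed[] = {
		"/home/sandbox/",
		"/lib/",
		"/usr/lib/",
	};

	/* ".." anywhere in the name could be used to climb out. */
	if (strstr(name, ".."))
		return -EACCES;

	for (size_t i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++) {
		if (!strncmp(name, allowed[i], strlen(allowed[i])))
			return 0;
	}

	return -EACCES;
}
```

Rejecting every name that contains ".." is deliberately blunt: it also refuses harmless names such as "/lib/a..b", which keeps the check simple at the cost of a few false denials.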
So i argue that it should go slightly further and should become:
- unprivileged: application-definable, allowing the embedding of security policy in *apps* as well, not just the system
- flexible: can be added/removed at runtime unprivileged, and cheaply so
- transparent: does not impact executing code that meets the policy
- nestable: it is inherited by child tasks and is fundamentally stackable, multiple policies will have the combined effect and they are transparent to each other. So if a child task within a sandbox adds *more* checks then those add to the already existing set of checks. We only narrow permissions, never extend them.
- generic: allowing observation and (safe) control of security relevant parameters not just at the system call boundary but at other relevant places of kernel execution as well: which points/callbacks could also be used for other types of event extraction such
Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
On Fri, 2011-05-13 at 17:23 +0200, Peter Zijlstra wrote:
On Fri, 2011-05-13 at 11:10 -0400, Eric Paris wrote:
Then again, I certainly don't see a reason that this syscall hardening patch should be held up while a whole new concept in computer security is contemplated...
Which makes me wonder why this syscall hardening stuff is done outside of LSM? Why isn't it part of the LSM so that say SELinux can have a syscall bitmask per security context?
I could do that, but I like Will's approach better. From the PoV of meeting security goals of information flow, data confidentiality, integrity, least priv, etc., limiting on the syscall boundary doesn't make a lot of sense. You just don't know enough there to enforce these things. These are the types of goals that SELinux and other LSMs have previously tried to enforce. From the PoV of making the kernel more resistant to attacks and making a process more resistant to misbehavior I think that the syscall boundary is appropriate. Although I could do it in SELinux I don't really want to do it there.
In case people are interested or confused let me give my definition of two words I've used a bit in these conversations: discretionary and mandatory.
Any time I talk about a 'discretionary' security decision it is a security decision that a process imposes upon itself. Aka the choice to use seccomp is discretionary. The choice to mark our own file u-wx is discretionary. This isn't the best definition but it's one that works well in this discussion.
Mandatory security is one enforced by a global policy. It's what SELinux is all about. SELinux doesn't give a hoot what a process wants to do, it enforces a global policy from the top down. You take over a process, well, too bad, you still have no choice but to follow the mandatory policy. The LSM does NOT enforce a mandatory access control model, it's just how it's been used in the past.
Ingo appears to me (please correct me if I'm wrong) to really be a fan of exposing the flexibility of the LSM to a discretionary access control model. That doesn't seem like a bad idea. And maybe using the filter engine to define the language to do this isn't a bad idea either. But I think that's a 'down the road' project, not something to hold up a better seccomp.
Making it part of the LSM also avoids having to add this prctl().
Well, it would mean exposing some new language construct to every LSM (instead of a single prctl construct) and it would mean anyone wanting to use the interface would have to rely on the LSM implementing those hooks the way they need it. Honestly Chrome can already get all of the benefits of this patch (given a perfectly coded kernel) and a whole lot more using SELinux, but (surprise surprise) not everyone uses SELinux. I think it's a good idea to expose a simple interface which will be widely enough adopted that many userspace applications can rely on it for hardening. The existence of the LSM and the fact that there exist multiple security modules that may or may not be enabled really leads application developers to be unable to rely on LSM for security. If Linux had a single security model which everyone could rely on we wouldn't really have as big of an issue but that's not possible. So I'm advocating for this series which will provide a single useful change which applications can rely upon across distros and platforms to enhance the properties and abilities of the Linux kernel.
-Eric
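The per-context syscall bitmask that Peter asks about, together with the opt-in and narrowing-only behaviour discussed earlier in the thread, could look roughly like the following userspace C sketch. Everything here is hypothetical and for illustration only: struct syscall_mask, MAX_SYSCALLS and the mask_*() helpers are made-up names, not the seccomp patch, not an SELinux interface and not any existing kernel API. The mask starts empty (deny by default), syscalls are opted in one by one, and stacking a second policy intersects the two masks so that permissions can only shrink.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYSCALLS  512 /* assumed upper bound, for illustration only */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS    ((MAX_SYSCALLS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical allow-list, one bit per syscall number. */
struct syscall_mask {
    unsigned long bits[MASK_WORDS];
};

/* Start with nothing allowed: every syscall must be opted in explicitly. */
static void mask_init_deny_all(struct syscall_mask *m)
{
    memset(m, 0, sizeof(*m));
}

/* Opt a single syscall number in. */
static void mask_allow(struct syscall_mask *m, unsigned int nr)
{
    assert(nr < MAX_SYSCALLS);
    m->bits[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

/* Numbers outside the table (e.g. syscalls added later) are denied. */
static bool mask_permits(const struct syscall_mask *m, unsigned int nr)
{
    if (nr >= MAX_SYSCALLS)
        return false;
    return ((m->bits[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL) != 0;
}

/*
 * Stack an extra policy on top of an inherited one. The result is the
 * intersection, so a stacked policy can remove permissions but never
 * add any that the inherited mask did not already grant.
 */
static void mask_restrict(struct syscall_mask *inherited,
                          const struct syscall_mask *extra)
{
    size_t i;

    for (i = 0; i < MASK_WORDS; i++)
        inherited->bits[i] &= extra->bits[i];
}
```

For example, if an inherited mask allows syscall numbers 0 and 1 and a stacked policy allows 1 and 2, only number 1 remains permitted after mask_restrict(). A per-task mask installed by the process itself would be the discretionary use; attaching the same structure to a security context under a global policy would be the mandatory one.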
Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
On Fri, May 13, 2011 at 10:55 AM, Eric Paris epa...@redhat.com wrote:
On Fri, 2011-05-13 at 17:23 +0200, Peter Zijlstra wrote:
On Fri, 2011-05-13 at 11:10 -0400, Eric Paris wrote:
Then again, I certainly don't see a reason that this syscall hardening patch should be held up while a whole new concept in computer security is contemplated...
Which makes me wonder why this syscall hardening stuff is done outside of LSM? Why isn't it part of the LSM so that say SELinux can have a syscall bitmask per security context?
I could do that, but I like Will's approach better. From the PoV of meeting security goals of information flow, data confidentiality, integrity, least priv, etc., limiting on the syscall boundary doesn't make a lot of sense. You just don't know enough there to enforce these things. These are the types of goals that SELinux and other LSMs have previously tried to enforce. From the PoV of making the kernel more resistant to attacks and making a process more resistant to misbehavior I think that the syscall boundary is appropriate. Although I could do it in SELinux I don't really want to do it there.
There's also the problem that there are no hooks per-system call for LSMs, only logical hooks that sometimes mirror system call names and are called after user data has been parsed. If system call enter hooks, like seccomp's, were added for LSMs, it would allow the LSM bitmask approach, but it still wouldn't satisfy the issues you raise below (and which I wholeheartedly agree with).
In case people are interested or confused let me give my definition of two words I've used a bit in these conversations: discretionary and mandatory.
Any time I talk about a 'discretionary' security decision it is a security decision that a process imposes upon itself. Aka the choice to use seccomp is discretionary. The choice to mark our own file u-wx is discretionary. This isn't the best definition but it's one that works well in this discussion.
Mandatory security is one enforced by a global policy. It's what SELinux is all about. SELinux doesn't give a hoot what a process wants to do, it enforces a global policy from the top down. You take over a process, well, too bad, you still have no choice but to follow the mandatory policy. The LSM does NOT enforce a mandatory access control model, it's just how it's been used in the past.
Ingo appears to me (please correct me if I'm wrong) to really be a fan of exposing the flexibility of the LSM to a discretionary access control model. That doesn't seem like a bad idea. And maybe using the filter engine to define the language to do this isn't a bad idea either. But I think that's a 'down the road' project, not something to hold up a better seccomp.
Making it part of the LSM also avoids having to add this prctl().
Well, it would mean exposing some new language construct to every LSM (instead of a single prctl construct) and it would mean anyone wanting to use the interface would have to rely on the LSM implementing those hooks the way they need it. Honestly Chrome can already get all of the benefits of this patch (given a perfectly coded kernel) and a whole lot more using SELinux, but (surprise surprise) not everyone uses SELinux. I think it's a good idea to expose a simple interface which will be widely enough adopted that many userspace applications can rely on it for hardening. The existence of the LSM and the fact that there exist multiple security modules that may or may not be enabled really leads application developers to be unable to rely on LSM for security. If Linux had a single security model which everyone could rely on we wouldn't really have as big of an issue but that's not possible. So I'm advocating for this series which will provide a single useful change which applications can rely upon across distros and platforms to enhance the properties and abilities of the Linux kernel.
-Eric