Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Tue, Apr 25, 2017 at 7:14 PM, PaX Team wrote:
> On 25 Apr 2017 at 9:39, Kees Cook wrote:
>
>> On Tue, Apr 25, 2017 at 4:26 AM, PaX Team wrote:
>> > INT_MAX threads would be needed when the leaking path is locked so
>> > that it can only be exercised once and you'll need to get normal
>> > (balanced) paths preempted just after the increment. if the leaking
>> > path is lockless (can be exercised in parallel without bounds) then
>> > 2 threads are enough where the one triggering the signed overflow
>> > would have to be preempted while the other one does INT_MAX increments
>> > and trigger the UAF. this is where the other mechanisms i talked about
>> > in the past become relevant: preemption or interrupts can be disabled
>> > or negative refcount values can be detected and acted upon (your blind
>> > copy-pasting effort passed upon this latter opportunity by not
>> > specializing the 'jo' into 'js' for the refcount case).
>>
>> Well, it's not "blind" -- I'm trying to bring the code as-is to
>> upstream for discussion/examination with as little functional
>> differences as possible so it's easier to compare apples to apples.
>
> you copied code from a version which is at least 2 major kernel revisions
> behind (so much for those apples)

Hmm, this was from your 4.9 port. Linus hasn't quite released 4.11 yet,
so that's actually "at most 2 major kernel revisions behind". :)

Regardless, I'd be happy to refresh the port. Will you share a URL to
your latest rebase against upstream?

> you chose the one version which had a
> bug that you didn't spot nor fix properly, you didn't realize the opportunity
> that a special refcount type represents, you claimed refcount underflows
> aren't exploitable but copied code that would detect signed underflow, you
> didn't understand the limits and edge cases i explained above... need i go

As I said, I was trying to minimize changes to your implementation, which
included the bug and the other issues.

The point of this was to share it with others, and work collaboratively
on it. I think this clearly succeeded with benefits to both upstream and
PaX: Jann spotted the fix for the bug causing weird crashes I saw when
doing initial testing, you pointed out the benefit of using js over jo,
I've reorganized the RMWcc macros for more easily adding trailing
instructions, Peter is thinking about ways around the protection, etc.

> on? doesn't leave one with great confidence in your ability to understand
> and maintain this code...

Well, that's your opinion. I think the patch and its discussion helped
several people, including myself, understand this code. Since many people
will share its maintenance, I think this is the right way to handle
upstreaming these kinds of things. I don't claim to be omniscient, just
persistent.

Getting this protection into upstream means every Linux user will benefit
from what you created, which I think is awesome; this whole class of
refcount flaws goes away. Thank you for writing it, sharing it, and
discussing it!

-Kees

-- 
Kees Cook
Pixel Security
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On 25 Apr 2017 at 9:39, Kees Cook wrote:

> On Tue, Apr 25, 2017 at 4:26 AM, PaX Team wrote:
> > INT_MAX threads would be needed when the leaking path is locked so
> > that it can only be exercised once and you'll need to get normal
> > (balanced) paths preempted just after the increment. if the leaking
> > path is lockless (can be exercised in parallel without bounds) then
> > 2 threads are enough where the one triggering the signed overflow
> > would have to be preempted while the other one does INT_MAX increments
> > and trigger the UAF. this is where the other mechanisms i talked about
> > in the past become relevant: preemption or interrupts can be disabled
> > or negative refcount values can be detected and acted upon (your blind
> > copy-pasting effort passed upon this latter opportunity by not
> > specializing the 'jo' into 'js' for the refcount case).
>
> Well, it's not "blind" -- I'm trying to bring the code as-is to
> upstream for discussion/examination with as little functional
> differences as possible so it's easier to compare apples to apples.

you copied code from a version which is at least 2 major kernel revisions
behind (so much for those apples), you chose the one version which had a
bug that you didn't spot nor fix properly, you didn't realize the opportunity
that a special refcount type represents, you claimed refcount underflows
aren't exploitable but copied code that would detect signed underflow, you
didn't understand the limits and edge cases i explained above... need i go
on? doesn't leave one with great confidence in your ability to understand
and maintain this code...
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Tue, Apr 25, 2017 at 4:26 AM, PaX Team wrote:
> INT_MAX threads would be needed when the leaking path is locked so
> that it can only be exercised once and you'll need to get normal
> (balanced) paths preempted just after the increment. if the leaking
> path is lockless (can be exercised in parallel without bounds) then
> 2 threads are enough where the one triggering the signed overflow
> would have to be preempted while the other one does INT_MAX increments
> and trigger the UAF. this is where the other mechanisms i talked about
> in the past become relevant: preemption or interrupts can be disabled
> or negative refcount values can be detected and acted upon (your blind
> copy-pasting effort passed upon this latter opportunity by not
> specializing the 'jo' into 'js' for the refcount case).

Well, it's not "blind" -- I'm trying to bring the code as-is to
upstream for discussion/examination with as little functional
differences as possible so it's easier to compare apples to apples.
(Which already resulted in more eyes looking at the code to find a
bug -- thanks Jann!)

But yes, jo -> js hugely increases the coverage. I'll make that change
for v2.

Thanks!

-Kees

-- 
Kees Cook
Pixel Security
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Tue, Apr 25, 2017 at 4:26 AM, PaX Team wrote:
> On 25 Apr 2017 at 0:01, Peter Zijlstra wrote:
>> How is the below not useful fodder for an exploit? It might be a less
>> common bug, and perhaps a bit more fiddly to make work, but afaict its
>> still a full use-after-free and therefore useful.
>>
>> ---
>>
>> Thread-A                            Thread-B
>>
>> if (dec_and_test(&obj->ref)) { // true, ref==0
>>
>>                                     inc(&obj->ref) // ref: 0->1
>>
>>     kfree(obj);
>> }
>
> ... and tell me why an attacker would let Thread-B do that increment
> (that you're trying to detect) *before* the underlying memory gets
> reused and thus the 0 changed to something else? hint: he'll do everything
> in his power to prevent that, either by winning the race or if there's
> no race (no refcount users outside his control), he'll win every time.
> IOW, checking for 0 is pointless and you kinda proved it yourself now.

Right, having a deterministic protection (checking for overflow) is best
since it stops all exploits using that path. Hoping that an attacker is
unlucky and hits a notification after they've already landed their
corruption is not a very useful defense. It certainly has a non-zero
value, but stopping overflow 100% is better. Especially when we can do
it with no meaningful change in performance, which lets us actually do
the atomic_t -> refcount_t conversion everywhere.

-Kees

-- 
Kees Cook
Pixel Security
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On 24 Apr 2017 at 13:33, Kees Cook wrote:

> On Mon, Apr 24, 2017 at 4:00 AM, PaX Team wrote:
> > On 24 Apr 2017 at 10:32, Peter Zijlstra wrote:
> >
> >> On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> >> > This patch ports the x86-specific atomic overflow handling from PaX's
> >> > PAX_REFCOUNT to the upstream refcount_t API. This is an updated version
> >> > from PaX that eliminates the saturation race condition by resetting the
> >> > atomic counter back to the INT_MAX saturation value on both overflow and
> >> > underflow. To win a race, a system would have to have INT_MAX threads
> >> > simultaneously overflow before the saturation handler runs.
> >
> > note that the above is wrong (and even contradicting itself and the code).
>
> True, this changelog could be more accurate (it resets to INT_MAX on
> overflow and INT_MIN on underflow). I think I'm right in saying that a
> system would need INT_MAX threads running a refcount_inc() (and a
> refcount_dec_and_test() at exactly the right moment) before the reset
> handler got scheduled, though, yes?

there's no uniform answer to this as there're several conditions that can
affect the effectiveness of the refcount protection. e.g., how many
independent leaking paths can the attacker exercise (typically one), are
these paths under some kind of locks (would already prevent unbounded
leaks/increments should the overflow detecting thread be preempted), are
negative refcounts allowed and checked for or only signed overflow, etc.

INT_MAX threads would be needed when the leaking path is locked so that
it can only be exercised once and you'll need to get normal (balanced)
paths preempted just after the increment. if the leaking path is lockless
(can be exercised in parallel without bounds) then 2 threads are enough
where the one triggering the signed overflow would have to be preempted
while the other one does INT_MAX increments and trigger the UAF.

this is where the other mechanisms i talked about in the past become
relevant: preemption or interrupts can be disabled or negative refcount
values can be detected and acted upon (your blind copy-pasting effort
passed upon this latter opportunity by not specializing the 'jo' into
'js' for the refcount case).
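[Editorial note: the 'jo' vs 'js' distinction above can be modeled in plain, portable C. This is an illustrative sketch, not kernel code; the function names are ours. After an x86 `incl`, the overflow flag (checked by `jo`) is set only on the single INT_MAX -> INT_MIN transition, while the sign flag (checked by `js`) is set whenever the result is negative, i.e. also for every later increment while a counter sits anywhere in the overflowed range.]

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Two's-complement wrap of "incl", modeled via unsigned arithmetic. */
static int wrapping_inc(int v)
{
	return (int)((unsigned int)v + 1u);
}

/* "jo": overflow flag fires only on the wrap from INT_MAX itself. */
static bool jo_fires_after_inc(int before)
{
	return before == INT_MAX;
}

/* "js": sign flag fires whenever the incremented value is negative,
 * so it also catches a counter already stuck in the negative range. */
static bool js_fires_after_inc(int before)
{
	return wrapping_inc(before) < 0;
}
```

Under this model, a second thread incrementing an already-overflowed (negative) counter keeps tripping the `js` check, while the `jo` check would have fired exactly once and then stayed silent.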
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On 25 Apr 2017 at 12:23, Peter Zijlstra wrote:

> So what avoids this:

simple, you noted it yourself in your previous mail:

> Well, your setup (panic_on_warn et al) would have it panic the box. That
> will effectively stop the exploit by virtue of stopping everything.

with that in mind the actual code looks like this:

>  CPU0                              CPU1
>
>  lock inc %[val]; # 0x7fffffff
>  jo 2f
> 1: ...
>
>                                    lock dec %[val]; # 0x80000000
>                                    jo 2f
>                                    1: ...
>
> 2: mov $0x7fffffff, %[val]    panic()
>    jmp 1b
>
>                                    2: mov $0x80000000, %[val]    panic()
>                                       jmp 1b

... and we never get this far.
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On 25 Apr 2017 at 0:01, Peter Zijlstra wrote:

> On Mon, Apr 24, 2017 at 01:40:56PM -0700, Kees Cook wrote:
> > I think we're way off in the weeds here. The "cannot inc from 0" check
> > is about general sanity checks on refcounts.
>
> I disagree, although sanity check are good too.

exactly, an attacker doesn't care how a premature free occurs due to
reaching a 0 refcount, afterwards it's memory corruption time for both
old and new references regardless.

> > However, what the refcount hardening protection is trying to do is
> > protect again the exploitable condition: overflow.
>
> Sure..

underflow is also exploitable, it's just much harder to defend against
(there're no known practical solutions).

> > Inc-from-0 isn't an exploitable condition since in theory
> > the memory suddenly becomes correctly managed again.
>
> It does not. It just got free'ed. Nothing will stop the free from
> happening (or already having happened).

now hold this thought...

> How is the below not useful fodder for an exploit? It might be a less
> common bug, and perhaps a bit more fiddly to make work, but afaict its
> still a full use-after-free and therefore useful.
>
> ---
>
> Thread-A                            Thread-B
>
> if (dec_and_test(&obj->ref)) { // true, ref==0
>
>                                     inc(&obj->ref) // ref: 0->1
>
>     kfree(obj);
> }

... and tell me why an attacker would let Thread-B do that increment
(that you're trying to detect) *before* the underlying memory gets
reused and thus the 0 changed to something else? hint: he'll do everything
in his power to prevent that, either by winning the race or if there's
no race (no refcount users outside his control), he'll win every time.
IOW, checking for 0 is pointless and you kinda proved it yourself now.
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> +static __always_inline void refcount_inc(refcount_t *r)
> +{
> +	asm volatile(LOCK_PREFIX "incl %0\n\t"
> +		     REFCOUNT_CHECK_OVERFLOW(4)
> +		     : [counter] "+m" (r->refs.counter)
> +		     : : "cc", "cx");
> +}
> +
> +static __always_inline void refcount_dec(refcount_t *r)
> +{
> +	asm volatile(LOCK_PREFIX "decl %0\n\t"
> +		     REFCOUNT_CHECK_UNDERFLOW(4)
> +		     : [counter] "+m" (r->refs.counter)
> +		     : : "cc", "cx");
> +}

> +dotraplinkage void do_refcount_error(struct pt_regs *regs, long error_code)
> +{
> +	const char *str = NULL;
> +
> +	BUG_ON(!(regs->flags & X86_EFLAGS_OF));
> +
> +#define range_check(size, direction, type, value)			\
> +	if ((unsigned long)__##size##_##direction##_start <= regs->ip && \
> +	    regs->ip < (unsigned long)__##size##_##direction##_end) {	\
> +		*(type *)regs->cx = value;				\
> +		str = #size " " #direction;				\
> +	}
> +
> +	range_check(refcount, overflow,  int, INT_MAX)
> +	range_check(refcount, underflow, int, INT_MIN)
> +
> +#undef range_check
> +
> +	BUG_ON(!str);
> +	do_error_trap(regs, error_code, (char *)str, X86_REFCOUNT_VECTOR,
> +		      SIGILL);
> +}
> +#endif

So what avoids this:

    CPU0                              CPU1

    lock inc %[val]; # 0x7fffffff
    jo 2f
1:  ...

                                      lock dec %[val]; # 0x80000000
                                      jo 2f
                                      1: ...

2:  mov $0x7fffffff, %[val]
    jmp 1b

                                      2: mov $0x80000000, %[val]
                                         jmp 1b

//
                                      lock inc %val; # 0x80000000
                                      lock inc %val; # 0x80000001
                                      lock inc %val; # 0x80000002
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Mon, Apr 24, 2017 at 03:37:32PM -0700, Kees Cook wrote:
> On Mon, Apr 24, 2017 at 3:01 PM, Peter Zijlstra wrote:
> > It does not. It just got free'ed. Nothing will stop the free from
> > happening (or already having happened).
>
> Well, yes, but that's kind of my point. Detecting inc-from-0 is "too
> late" to offer a protection. It offers notification of a bug, rather
> than stopping an exploit from happening.

Well, your setup (panic_on_warn et al) would have it panic the box. That
will effectively stop the exploit by virtue of stopping everything.

And warn/bug/panic etc.. are I think a better option than silently
letting it happen.
Re: [kernel-hardening] Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Mon, 2017-04-24 at 15:37 -0700, Kees Cook wrote:
> On Mon, Apr 24, 2017 at 3:01 PM, Peter Zijlstra wrote:
> > On Mon, Apr 24, 2017 at 01:40:56PM -0700, Kees Cook wrote:
> > > I think we're way off in the weeds here. The "cannot inc from 0"
> > > check is about general sanity checks on refcounts.
> >
> > I disagree, although sanity check are good too.
> >
> > > It should never happen, and if it does, there's a bug.
> >
> > The very same is true of the overflow thing.
> >
> > > However, what the refcount hardening protection is trying to do
> > > is protect again the exploitable condition: overflow.
> >
> > Sure..
> >
> > > Inc-from-0 isn't an exploitable condition since in theory
> > > the memory suddenly becomes correctly managed again.
> >
> > It does not. It just got free'ed. Nothing will stop the free from
> > happening (or already having happened).
>
> Well, yes, but that's kind of my point. Detecting inc-from-0 is "too
> late" to offer a protection. It offers notification of a bug, rather
> than stopping an exploit from happening.

inc-from-0 could allow the attacker to gain access to an object which
gets allocated to a new user afterwards. Certainly much less useful as
an exploit, but still a potential privilege escalation.
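[Editorial note: the "cannot inc from 0" check under debate can be expressed as a compare-and-swap loop, in the spirit of the generic refcount_inc_not_zero(). This is a hedged, portable-C sketch -- not the x86 fast path from the patch -- and whether such a check helps once the object has already been freed is exactly what the thread is arguing about.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Refuse to resurrect an object whose refcount already reached zero:
 * an increment that observes 0 fails instead of producing 0 -> 1. */
static bool refcount_inc_not_zero(atomic_int *r)
{
	int old = atomic_load(r);

	do {
		if (old == 0)
			return false;	/* already dead: do not revive */
	} while (!atomic_compare_exchange_weak(r, &old, old + 1));
	/* on CAS failure, 'old' is reloaded and the loop retries */

	return true;
}
```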
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Mon, Apr 24, 2017 at 3:01 PM, Peter Zijlstrawrote: > On Mon, Apr 24, 2017 at 01:40:56PM -0700, Kees Cook wrote: >> I think we're way off in the weeds here. The "cannot inc from 0" check >> is about general sanity checks on refcounts. > > I disagree, although sanity check are good too. > >> It should never happen, and if it does, there's a bug. > > The very same is true of the overflow thing. > >> However, what the refcount hardening protection is trying to do is >> protect again the exploitable condition: overflow. > > Sure.. > >> Inc-from-0 isn't an exploitable condition since in theory >> the memory suddenly becomes correctly managed again. > > It does not. It just got free'ed. Nothing will stop the free from > happening (or already having happened). Well, yes, but that's kind of my point. Detecting inc-from-0 is "too late" to offer a protection. It offers notification of a bug, rather than stopping an exploit from happening. >> We're just discussing different things. > > No, both are bugs, neither overflow not inc-from-zero _should_ happen. > The whole point is that code is buggy. If it weren't for that, we'd not > be doing this. > > How is the below not useful fodder for an exploit? It might be a less > common bug, and perhaps a bit more fiddly to make work, but afaict its > still a full use-after-free and therefore useful. > > --- > > Thread-AThread-B > > if(dec_and_test(>ref)) { // true, ref==0 > > inc(>ref) // ref: 0->1 > > kfree(obj); > } > > > > ~~~/ Thread-C /~~~ > > obj = kmalloc(); // is obj from Thread-A > > obj->ref = 1; > obj->func = > > > obj->func(); > > -- OR -- > > put(obj); > if (dec_and_test(>ref)) > kfree(obj); // which was > of Thread-C Right, both are bugs, but the exploitable condition is "refcount hit zero and got freed while there were still things using it". 
Having too many put()s that cause the refcount to reach zero early can't be detected, but having too many get()s that wrap around, ultimately to zero, can be (the overflow condition). With overflow protection, the refcount can never (realistically) hit zero since the refcount will get saturated at INT_MAX, so no exploit is possible -- no memory is ever freed (it just leaks the allocation). The inc-from-zero case performing a notification is certainly nice, but the exploit already happened. I want the overflow protection to work everywhere the kernel uses refcounts, which means dealing with performance concerns from mm, net, block, etc. Since this is a 1 instruction change (which branch prediction should make almost no cost at all), this will easily address any performance concerns from the other subsystems. I'd rather have the overflow protection everywhere than only in some areas, and I want to break the deadlock of this catch-22 of subsystems not being able to say what benchmarks are important to them but refusing to switch to refcount_t due to performance concerns. -Kees -- Kees Cook Pixel Security
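The saturation behaviour Kees describes can be sketched as a tiny userspace model; this is only an illustration of the semantics (the function name is made up, and the real patch detects the wrap via the CPU's overflow flag and a `jo` trap into an exception handler, not an inline check):

```c
#include <limits.h>

/* Model of saturation: once the counter reaches INT_MAX it is pinned
 * there, so no number of further get()s can ever bring it back around
 * to zero.  The object may leak, but it can never be freed while
 * references still exist.  The != INT_MAX test also keeps this plain-C
 * sketch away from signed-overflow undefined behaviour. */
static int refcount_inc_saturate(int *refs)
{
	if (*refs != INT_MAX)	/* saturated counters stay pinned */
		(*refs)++;
	return *refs;
}
```

Under this model the exploit path (wrap to zero, free, reuse) simply cannot be reached: the worst outcome of a refcount leak is an unfreeable allocation.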
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Mon, Apr 24, 2017 at 01:40:56PM -0700, Kees Cook wrote: > I think we're way off in the weeds here. The "cannot inc from 0" check > is about general sanity checks on refcounts. I disagree, although sanity checks are good too. > It should never happen, and if it does, there's a bug. The very same is true of the overflow thing. > However, what the refcount hardening protection is trying to do is > protect against the exploitable condition: overflow. Sure.. > Inc-from-0 isn't an exploitable condition since in theory > the memory suddenly becomes correctly managed again. It does not. It just got free'ed. Nothing will stop the free from happening (or already having happened). > We're just discussing different things. No, both are bugs; neither overflow nor inc-from-zero _should_ happen. The whole point is that code is buggy. If it weren't for that, we'd not be doing this. How is the below not useful fodder for an exploit? It might be a less common bug, and perhaps a bit more fiddly to make work, but afaict its still a full use-after-free and therefore useful.

---

	Thread-A			Thread-B

	if (dec_and_test(&obj->ref)) { // true, ref==0

					inc(&obj->ref) // ref: 0->1

		kfree(obj);
	}

	~~~/ Thread-C /~~~

	obj = kmalloc(); // <...> is obj from Thread-A

	obj->ref = 1;
	obj->func = <...>;

					obj->func();

	-- OR --

					put(obj);
					if (dec_and_test(&obj->ref))
						kfree(obj); // which was <...> of Thread-C
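The check being argued over closes exactly the window Thread-B exploits in the scenario above: an inc() that lands on a count of zero is landing on an object that is already (being) freed. A single-threaded sketch of the idea, with made-up names (the kernel expresses this with atomics as refcount_inc_not_zero()):

```c
#include <stdbool.h>

/* A get() that refuses to resurrect a dead object: if the count has
 * already hit zero, the increment is rejected and the caller is told
 * the object is gone, instead of silently re-animating freed memory. */
struct obj {
	int ref;
};

static bool get_obj(struct obj *o)
{
	if (o->ref == 0)	/* already dead: do not resurrect */
		return false;
	o->ref++;
	return true;
}
```

Thread-B's bare inc() in the diagram has no such failure path, which is what makes the race a full use-after-free.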
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Mon, Apr 24, 2017 at 8:15 AM, PaX Team wrote: > On 24 Apr 2017 at 15:33, Peter Zijlstra wrote: >> On Mon, Apr 24, 2017 at 03:08:20PM +0200, PaX Team wrote: >> > On 24 Apr 2017 at 13:15, Peter Zijlstra wrote: >> > that was exactly my point: all this applies to you as well. so let me ask >> > the 3rd time: what is your "argument for correctness" for a 0 refcount >> > value check? how does it prevent exploitation? >> >> I think I've explained that before; per reference count rules 0 means >> freed (or about to be freed when we talk RCU). > > you only said the same thing, what 0 means. you (still) didn't explain how > checking for it prevents exploitation. > >> The whole pattern: >> >> if (dec_and_test(&obj->ref)) >> kfree(obj); >> >> expresses this etc.. Other reference counts also do this. No references >> means its getting freed. >> >> Can you agree with this? > > sure, so far so good. > >> If so; any attempt to increase the reference count while its (being) >> freed() is a use-after-free. > > why would ever be there such an attempt? a freed object with intact memory > content is as useful for an attacker as a live one, that is, not at all. I think we're way off in the weeds here. The "cannot inc from 0" check is about general sanity checks on refcounts. It should never happen, and if it does, there's a bug. However, what the refcount hardening protection is trying to do is protect against the exploitable condition: overflow. Inc-from-0 isn't an exploitable condition since in theory the memory suddenly becomes correctly managed again. We're just discussing different things. The point is to have very fast refcount_t that protects against overflow so the mm, net, and block subsystems aren't worried about making the atomic_t -> refcount_t changes there. -Kees -- Kees Cook Pixel Security
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Mon, Apr 24, 2017 at 4:00 AM, PaX Team wrote: > On 24 Apr 2017 at 10:32, Peter Zijlstra wrote: > >> On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote: >> > This patch ports the x86-specific atomic overflow handling from PaX's >> > PAX_REFCOUNT to the upstream refcount_t API. This is an updated version >> > from PaX that eliminates the saturation race condition by resetting the >> > atomic counter back to the INT_MAX saturation value on both overflow and >> > underflow. To win a race, a system would have to have INT_MAX threads >> > simultaneously overflow before the saturation handler runs. > > note that the above is wrong (and even contradicting itself and the code). True, this changelog could be more accurate (it resets to INT_MAX on overflow and INT_MIN on underflow). I think I'm right in saying that a system would need INT_MAX threads running a refcount_inc() (and a refcount_dec_and_test() at exactly the right moment) before the reset handler got scheduled, though, yes? I'll attempt to clarify this. -Kees -- Kees Cook Pixel Security
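The two reset paths Kees clarifies (INT_MAX after an overflowing increment, INT_MIN after an underflowing decrement) can be modelled in plain C; this is an illustration only, with a made-up function name, the wrap emulated with unsigned arithmetic (signed overflow is undefined in C), and the fixup done inline where the real patch uses an exception handler triggered by the CPU's overflow flag:

```c
#include <limits.h>

/* Model: apply a delta, and if the addition wrapped, pin the result
 * back to the saturation value on the side the wrap came from --
 * INT_MAX for a wrapped increment, INT_MIN for a wrapped decrement. */
static int refcount_add_fixup(int old, int delta)
{
	int new = (int)((unsigned int)old + (unsigned int)delta);

	if (delta > 0 && new < old)		/* inc wrapped past INT_MAX */
		new = INT_MAX;
	else if (delta < 0 && new > old)	/* dec wrapped past INT_MIN */
		new = INT_MIN;
	return new;
}
```

The race being debated lives in the gap between the wrapping operation and this fixup: in the model they are one call, but in the patch other CPUs can observe the wrapped value before the handler runs.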
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Mon, Apr 24, 2017 at 3:48 AM, Peter Zijlstra wrote: > On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
>> diff --git a/drivers/misc/lkdtm_bugs.c b/drivers/misc/lkdtm_bugs.c
>> index e3f4cd8876b5..1bdafb29b802 100644
>> --- a/drivers/misc/lkdtm_bugs.c
>> +++ b/drivers/misc/lkdtm_bugs.c
>> @@ -135,9 +135,15 @@ void lkdtm_HUNG_TASK(void)
>>  	schedule();
>>  }
>>
>> +#ifdef CONFIG_FAST_REFCOUNT
>> +#define REFCOUNT_MAX INT_MAX
>> +#else
>> +#define REFCOUNT_MAX UINT_MAX
>> +#endif
>
> That doesn't seem like a sensible place for this.
I'll drop the LKDTM changes from this particular patch. As for the define, I think it's only interesting to LKDTM since it's the only part interested in refcount_t internals. (i.e. nothing else would (or should) use this information.) -Kees -- Kees Cook Pixel Security
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Mon, Apr 24, 2017 at 3:45 AM, Peter Zijlstra wrote: > On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
>> +static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r)
>> +{
>> +	const int a = 1;
>> +	const int u = 0;
>> +	int c, old;
>> +
>> +	c = atomic_read(&(r->refs));
>> +	for (;;) {
>> +		if (unlikely(c == (u)))
>> +			break;
>> +		old = atomic_cmpxchg(&(r->refs), c, c + (a));
>
> Please use atomic_try_cmpxchg(), that generates saner code.
Ah-ha, thanks. I actually copied this directly out of the existing atomic_t function, so we should probably update it there too. -Kees
>
>> +		if (likely(old == c))
>> +			break;
>> +		c = old;
>> +	}
>> +	return c != u;
>> +}
-- Kees Cook Pixel Security
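For illustration, the same inc-not-zero loop in the try-cmpxchg style Peter suggests, sketched with C11 atomics as a stand-in for the kernel's atomic_try_cmpxchg() (function name made up; the kernel version differs in annotations and memory-ordering details):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* On failure, compare_exchange writes the freshly observed counter
 * value back into 'c', so the explicit old == c test and the manual
 * reload from the quoted code disappear -- that is what generates the
 * saner code: the comparison result from the cmpxchg itself drives
 * the loop. */
static bool refcount_inc_not_zero_sketch(atomic_int *refs)
{
	int c = atomic_load(refs);

	do {
		if (c == 0)	/* refuse to resurrect a dead object */
			return false;
	} while (!atomic_compare_exchange_weak(refs, &c, c + 1));

	return true;
}
```

The weak form may fail spuriously, which the loop already tolerates, so it is the cheaper choice here.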
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Mon, Apr 24, 2017 at 1:32 AM, Peter Zijlstra wrote: > On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote: >> This patch ports the x86-specific atomic overflow handling from PaX's >> PAX_REFCOUNT to the upstream refcount_t API. This is an updated version >> from PaX that eliminates the saturation race condition by resetting the >> atomic counter back to the INT_MAX saturation value on both overflow and >> underflow. To win a race, a system would have to have INT_MAX threads >> simultaneously overflow before the saturation handler runs. > > And is this impossible? Highly unlikely I'll grant you, but absolutely > impossible? I'll adjust the language. "Highly unlikely" is still better than "trivially doable with a single thread". :) > Also, you forgot nr_cpus in your bound. Afaict the worst case here is > O(nr_tasks + 3*nr_cpus). > > Because PaX does it, is not a correctness argument. And this really > wants one. Sure, I didn't mean to imply anything other than a demonstration of what PaX is doing (and that it's better than not having it). If we can improve it, that's great. -Kees -- Kees Cook Pixel Security
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On 24 Apr 2017 at 15:33, Peter Zijlstra wrote: > On Mon, Apr 24, 2017 at 03:08:20PM +0200, PaX Team wrote: > > On 24 Apr 2017 at 13:15, Peter Zijlstra wrote: > > > > > On Mon, Apr 24, 2017 at 01:00:18PM +0200, PaX Team wrote: > > > > On 24 Apr 2017 at 10:32, Peter Zijlstra wrote: > > > > > > > > Also, you forgot nr_cpus in your bound. Afaict the worst case here is > > > > > O(nr_tasks + 3*nr_cpus). > > > > > > > > what does nr_cpus have to do with winning the race? > > > > > > The CPUs could each run nested softirq/hardirq/nmi context poking at the > > > refcount, irrespective of the (preempted) task context. > > > > that's fine but are you also assuming that the code executed in each of > > those contexts leaks the same refcount? otherwise whatever they do to the > > refcount is no more relevant than a non-leaking preemptible path that runs > > to completion in a bounded amount of time (i.e., you get temporary bumps > > and thus need to win yet another set of races to get their effects at once). > > For worst case analysis we have to assume it does, unless we can proof > it doesn't. And that proof is very very hard, and would need to be > redone every time the kernel changes. for worst case analysis you need to show the existence of an amd64 system that can spawn 2G tasks. then you'll have to show the feasibility of making all of them get preempted (without a later reschedule) inside a 2 insn window. > > that was exactly my point: all this applies to you as well. so let me ask > > the 3rd time: what is your "argument for correctness" for a 0 refcount > > value check? how does it prevent exploitation? > > What 0 count check are you talking about, the one that triggers when we > want to increment 0 ? are there any other 0 checks in there? > I think I've explained that before; per reference count rules 0 means > freed (or about to be freed when we talk RCU). you only said the same thing, what 0 means. 
you (still) didn't explain how checking for it prevents exploitation. > The whole pattern: > > if (dec_and_test(&obj->ref)) > kfree(obj); > > expresses this etc.. Other reference counts also do this. No references > means its getting freed. > > Can you agree with this? sure, so far so good. > If so; any attempt to increase the reference count while its (being) > freed() is a use-after-free. why would ever be there such an attempt? a freed object with intact memory content is as useful for an attacker as a live one, that is, not at all.
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Mon, Apr 24, 2017 at 03:08:20PM +0200, PaX Team wrote: > On 24 Apr 2017 at 13:15, Peter Zijlstra wrote: > > > On Mon, Apr 24, 2017 at 01:00:18PM +0200, PaX Team wrote: > > > On 24 Apr 2017 at 10:32, Peter Zijlstra wrote: > > > > > > Also, you forgot nr_cpus in your bound. Afaict the worst case here is > > > > O(nr_tasks + 3*nr_cpus). > > > > > > what does nr_cpus have to do with winning the race? > > > > The CPUs could each run nested softirq/hardirq/nmi context poking at the > > refcount, irrespective of the (preempted) task context. > > that's fine but are you also assuming that the code executed in each of > those contexts leaks the same refcount? otherwise whatever they do to the > refcount is no more relevant than a non-leaking preemptible path that runs > to completion in a bounded amount of time (i.e., you get temporary bumps > and thus need to win yet another set of races to get their effects at once). For worst case analysis we have to assume it does, unless we can proof it doesn't. And that proof is very very hard, and would need to be redone every time the kernel changes. > that was exactly my point: all this applies to you as well. so let me ask > the 3rd time: what is your "argument for correctness" for a 0 refcount > value check? how does it prevent exploitation? What 0 count check are you talking about, the one that triggers when we want to increment 0 ? I think I've explained that before; per reference count rules 0 means freed (or about to be freed when we talk RCU). The whole pattern: if (dec_and_test(&obj->ref)) kfree(obj); expresses this etc.. Other reference counts also do this. No references means its getting freed. Can you agree with this? If so; any attempt to increase the reference count while its (being) freed() is a use-after-free. Therefore we disallow 0 increment. Yes, this is an annoyance when you consider usage-counts, where 0 means something else. But then, we were talking about reference counts, not something else.
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On 24 Apr 2017 at 13:15, Peter Zijlstra wrote: > On Mon, Apr 24, 2017 at 01:00:18PM +0200, PaX Team wrote: > > On 24 Apr 2017 at 10:32, Peter Zijlstra wrote: > > > > Also, you forgot nr_cpus in your bound. Afaict the worst case here is > > > O(nr_tasks + 3*nr_cpus). > > > > what does nr_cpus have to do with winning the race? > > The CPUs could each run nested softirq/hardirq/nmi context poking at the > refcount, irrespective of the (preempted) task context. that's fine but are you also assuming that the code executed in each of those contexts leaks the same refcount? otherwise whatever they do to the refcount is no more relevant than a non-leaking preemptible path that runs to completion in a bounded amount of time (i.e., you get temporary bumps and thus need to win yet another set of races to get their effects at once). > > > Because PaX does it, is not a correctness argument. And this really > > > wants one. > > > > heh, do you want to tell me about how checking for a 0 refcount prevents > > exploiting a bug? > > Not the point. All I said was that saying somebody else does it (anybody > else, doesn't matter it was you) isn't an argument for correctness. that was exactly my point: all this applies to you as well. so let me ask the 3rd time: what is your "argument for correctness" for a 0 refcount value check? how does it prevent exploitation?
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Mon, Apr 24, 2017 at 01:00:18PM +0200, PaX Team wrote: > On 24 Apr 2017 at 10:32, Peter Zijlstra wrote: > > Also, you forgot nr_cpus in your bound. Afaict the worst case here is > > O(nr_tasks + 3*nr_cpus). > > what does nr_cpus have to do with winning the race? The CPUs could each run nested softirq/hardirq/nmi context poking at the refcount, irrespective of the (preempted) task context. > > Because PaX does it, is not a correctness argument. And this really > > wants one. > > heh, do you want to tell me about how checking for a 0 refcount prevents > exploiting a bug? Not the point. All I said was that saying somebody else does it (anybody else, doesn't matter it was you) isn't an argument for correctness.
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On 24 Apr 2017 at 10:32, Peter Zijlstra wrote:
> On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> > This patch ports the x86-specific atomic overflow handling from PaX's
> > PAX_REFCOUNT to the upstream refcount_t API. This is an updated version
> > from PaX that eliminates the saturation race condition by resetting the
> > atomic counter back to the INT_MAX saturation value on both overflow and
> > underflow. To win a race, a system would have to have INT_MAX threads
> > simultaneously overflow before the saturation handler runs.

note that the above is wrong (and even contradicting itself and the code).

> And is this impossible? Highly unlikely I'll grant you, but absolutely
> impossible?

here's my analysis from a while ago:
http://www.openwall.com/lists/kernel-hardening/2017/01/05/19

> Also, you forgot nr_cpus in your bound. Afaict the worst case here is
> O(nr_tasks + 3*nr_cpus).

what does nr_cpus have to do with winning the race?

> Because PaX does it, is not a correctness argument. And this really
> wants one.

heh, do you want to tell me about how checking for a 0 refcount prevents
exploiting a bug?
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> diff --git a/drivers/misc/lkdtm_bugs.c b/drivers/misc/lkdtm_bugs.c
> index e3f4cd8876b5..1bdafb29b802 100644
> --- a/drivers/misc/lkdtm_bugs.c
> +++ b/drivers/misc/lkdtm_bugs.c
> @@ -135,9 +135,15 @@ void lkdtm_HUNG_TASK(void)
>  	schedule();
>  }
>
> +#ifdef CONFIG_FAST_REFCOUNT
> +#define REFCOUNT_MAX INT_MAX
> +#else
> +#define REFCOUNT_MAX UINT_MAX
> +#endif

That doesn't seem like a sensible place for this.
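[Editor's note: Peter's objection is to placement, not content. The limit describes refcount behavior, so it belongs next to the refcount implementation, with lkdtm merely including it. A minimal userspace sketch of that idea follows; the header framing and the standalone use of the config symbol are hypothetical, not the actual fix that was merged.]

```c
/* Hypothetical "refcount limit" header: the saturation limit lives with
 * the implementation that defines the behavior, so test modules like
 * lkdtm can include it instead of redefining it locally. */
#include <assert.h>
#include <limits.h>

#ifdef CONFIG_FAST_REFCOUNT
# define REFCOUNT_MAX INT_MAX	/* saturating variant pins at INT_MAX */
#else
# define REFCOUNT_MAX UINT_MAX	/* plain atomic_t wraps at UINT_MAX */
#endif
```

With the macro centralized, a change to the saturation strategy updates every consumer at once instead of silently diverging in the test module.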
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> +static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r)
> +{
> +	const int a = 1;
> +	const int u = 0;
> +	int c, old;
> +
> +	c = atomic_read(&(r->refs));
> +	for (;;) {
> +		if (unlikely(c == (u)))
> +			break;
> +		old = atomic_cmpxchg(&(r->refs), c, c + (a));

Please use atomic_try_cmpxchg(), that generates saner code.

> +		if (likely(old == c))
> +			break;
> +		c = old;
> +	}
> +	return c != u;
> +}
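[Editor's note: the point of atomic_try_cmpxchg() is that, like C11's atomic_compare_exchange_weak(), it writes the observed value back into the expected-value variable on failure, so the explicit old/c bookkeeping above disappears. A userspace sketch of the resulting loop, using C11 atomics rather than the kernel API; names are illustrative:]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for refcount_t. */
struct refcount { _Atomic int refs; };

/* Increment unless the counter is zero; returns false if it was zero.
 * On cmpxchg failure, atomic_compare_exchange_weak() reloads the current
 * value into 'c', so the loop body only has to re-check the zero case. */
static bool refcount_inc_not_zero_sketch(struct refcount *r)
{
	int c = atomic_load(&r->refs);

	do {
		if (c == 0)	/* dead object: do not resurrect it */
			return false;
	} while (!atomic_compare_exchange_weak(&r->refs, &c, c + 1));

	return true;
}
```

The kernel version compiles to a tighter loop for the same reason: the failure path of try_cmpxchg already leaves the fresh value in a register, so no separate compare-and-copy is emitted.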
Re: [kernel-hardening] Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Mon, Apr 24, 2017 at 10:53:30AM +0200, Jann Horn wrote:
> On Mon, Apr 24, 2017 at 10:32 AM, Peter Zijlstra wrote:
> > On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> >> This patch ports the x86-specific atomic overflow handling from PaX's
> >> PAX_REFCOUNT to the upstream refcount_t API. This is an updated version
> >> from PaX that eliminates the saturation race condition by resetting the
> >> atomic counter back to the INT_MAX saturation value on both overflow and
> >> underflow. To win a race, a system would have to have INT_MAX threads
> >> simultaneously overflow before the saturation handler runs.
> >
> > And is this impossible? Highly unlikely I'll grant you, but absolutely
> > impossible?
> >
> > Also, you forgot nr_cpus in your bound. Afaict the worst case here is
> > O(nr_tasks + 3*nr_cpus).
> >
> > Because PaX does it, is not a correctness argument. And this really
> > wants one.
>
> From include/linux/threads.h:
>
> /*
>  * A maximum of 4 million PIDs should be enough for a while.
>  * [NOTE: PID/TIDs are limited to 2^29 ~= 500+ million, see futex.h.]
>  */
> #define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \
> 	(sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))
>
> AFAICS that means you can only have up to 2^22 running tasks at
> a time, since every running task requires a PID in the init pid namespace.

That's but an artificial limit and bound to be increased sooner rather
than later, given the trend in building ever larger machines.

If we go with the futex limit on the full tid space, we end up with 2^30
(the comment here about 2^29 is wrong, see FUTEX_TID_MASK). We then still
have to add a multiple of nr_cpus. Granted, that will still be slightly
short of 2^31, but not to any amount I'm really comfortable with, we're
_really_ close.

Point remains though; the changelog (and arguably the code itself) should
very much include such bits.
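[Editor's note: the quantities being argued over, spelled out. The constants are taken from the thread: PID_MAX_LIMIT = 4M (2^22), FUTEX_TID_MASK = 0x3fffffff so the full tid space is 2^30, and the saturation point is INT_MAX = 2^31 - 1. This only does the arithmetic; it takes no position on how close "close" is:]

```c
#include <assert.h>
#include <stdint.h>

/* Bounds discussed in the thread, as 64-bit constants to avoid any
 * overflow while doing the arithmetic itself. */
static const int64_t pid_limit = INT64_C(1) << 22;		/* PID_MAX_LIMIT: 4M */
static const int64_t tid_space = INT64_C(0x3fffffff) + 1;	/* FUTEX_TID_MASK + 1: 2^30 */
static const int64_t sat_point = (INT64_C(1) << 31) - 1;	/* INT_MAX */

/* Increments still missing after exhausting the full tid space, i.e.
 * what the nr_cpus * nesting-depth term would have to contribute for
 * the race bound to reach the saturation point. */
static int64_t remaining_to_saturate(void)
{
	return sat_point - tid_space;
}
```

Note the gap between the 2^22 PID limit Jann cites and the 2^30 tid space Peter substitutes is a factor of 256, which is why the choice of bound matters to the argument.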
Re: [kernel-hardening] Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Mon, Apr 24, 2017 at 10:32 AM, Peter Zijlstra wrote:
> On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
>> This patch ports the x86-specific atomic overflow handling from PaX's
>> PAX_REFCOUNT to the upstream refcount_t API. This is an updated version
>> from PaX that eliminates the saturation race condition by resetting the
>> atomic counter back to the INT_MAX saturation value on both overflow and
>> underflow. To win a race, a system would have to have INT_MAX threads
>> simultaneously overflow before the saturation handler runs.
>
> And is this impossible? Highly unlikely I'll grant you, but absolutely
> impossible?
>
> Also, you forgot nr_cpus in your bound. Afaict the worst case here is
> O(nr_tasks + 3*nr_cpus).
>
> Because PaX does it, is not a correctness argument. And this really
> wants one.

From include/linux/threads.h:

/*
 * A maximum of 4 million PIDs should be enough for a while.
 * [NOTE: PID/TIDs are limited to 2^29 ~= 500+ million, see futex.h.]
 */
#define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \
	(sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))

AFAICS that means you can only have up to 2^22 running tasks at
a time, since every running task requires a PID in the init pid namespace.
Re: [PATCH] x86/refcount: Implement fast refcount_t handling
On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> This patch ports the x86-specific atomic overflow handling from PaX's
> PAX_REFCOUNT to the upstream refcount_t API. This is an updated version
> from PaX that eliminates the saturation race condition by resetting the
> atomic counter back to the INT_MAX saturation value on both overflow and
> underflow. To win a race, a system would have to have INT_MAX threads
> simultaneously overflow before the saturation handler runs.

And is this impossible? Highly unlikely I'll grant you, but absolutely
impossible?

Also, you forgot nr_cpus in your bound. Afaict the worst case here is
O(nr_tasks + 3*nr_cpus).

Because PaX does it, is not a correctness argument. And this really
wants one.
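[Editor's note: a userspace sketch of the saturation scheme the changelog describes, to make the debated race window concrete. The increment is a plain atomic add; a sign check afterwards (the 'js' variant mentioned elsewhere in the thread) routes any negative result to a handler that pins the counter back at INT_MAX. The race under discussion is the window between the add and the store. C11 illustration only, not the kernel implementation:]

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Increment with saturate-on-overflow. C11 atomic arithmetic wraps in
 * two's complement, so incrementing past INT_MAX lands in negative
 * territory without undefined behavior, where the sign check catches it. */
static void refcount_inc_saturate(_Atomic int *refs)
{
	atomic_fetch_add(refs, 1);		/* the fast path: one atomic add */

	if (atomic_load(refs) < 0)		/* wrapped past INT_MAX, or underflowed */
		atomic_store(refs, INT_MAX);	/* saturation handler: pin at INT_MAX */
}
```

Peter's objection maps directly onto this sketch: between the fetch_add and the store, other tasks (and, per his point, nested softirq/hardirq/nmi contexts on every CPU) can keep incrementing, so the bound on how far the counter can travel before the reset lands is what the whole thread is arguing about.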