Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-25 Thread Kees Cook
On Tue, Apr 25, 2017 at 7:14 PM, PaX Team  wrote:
> On 25 Apr 2017 at 9:39, Kees Cook wrote:
>
>> On Tue, Apr 25, 2017 at 4:26 AM, PaX Team  wrote:
>> > INT_MAX threads would be needed when the leaking path is locked so
>> > that it can only be exercised once and you'll need to get normal
>> > (balanced) paths preempted just after the increment. if the leaking
>> > path is lockless (can be exercised in parallel without bounds) then
>> > 2 threads are enough where the one triggering the signed overflow
>> > would have to be preempted while the other one does INT_MAX increments
>> > and trigger the UAF. this is where the other mechanisms i talked about
>> > in the past become relevant: preemption or interrupts can be disabled
>> > or negative refcount values can be detected and acted upon (your blind
>> > copy-pasting effort passed upon this latter opportunity by not
>> > specializing the 'jo' into 'js' for the refcount case).
>>
>> Well, it's not "blind" -- I'm trying to bring the code as-is to
>> upstream for discussion/examination with as few functional
>> differences as possible so it's easier to compare apples to apples.
>
> you copied code from a version which is at least 2 major kernel revisions
> behind (so much for those apples)

Hmm, this was from your 4.9 port. Linus hasn't quite released 4.11
yet, so that's actually "at most 2 major kernel revisions behind". :)
Regardless, I'd be happy to refresh the port. Will you share a URL to
your latest rebase against upstream?

> you chose the one version which had a
> bug that you didn't spot nor fix properly, you didn't realize the opportunity
> that a special refcount type represents, you claimed refcount underflows
> aren't exploitable but copied code that would detect signed underflow, you
> didn't understand the limits and edge cases i explained above... need i go

As I said, I was trying to minimize changes to your implementation,
which included the bug and the other issues. The point of this was to
share it with others, and work collaboratively on it. I think this
clearly succeeded with benefits to both upstream and PaX: Jann spotted
the fix for the bug causing weird crashes I saw when doing initial
testing, you pointed out the benefit of using js over jo, I've
reorganized the RMWcc macros for more easily adding trailing
instructions, Peter is thinking about ways around the protection, etc.

> on? doesn't leave one with great confidence in your ability to understand
> and maintain this code...

Well, that's your opinion. I think the patch and its discussion helped
several people, including myself, understand this code. Since many
people will share its maintenance, I think this is the right way to
handle upstreaming these kinds of things. I don't claim to be
omniscient, just persistent. Getting this protection into upstream
means every Linux user will benefit from what you created, which I
think is awesome; this whole class of refcount flaws goes away. Thank
you for writing it, sharing it, and discussing it!

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-25 Thread PaX Team
On 25 Apr 2017 at 9:39, Kees Cook wrote:

> On Tue, Apr 25, 2017 at 4:26 AM, PaX Team  wrote:
> > INT_MAX threads would be needed when the leaking path is locked so
> > that it can only be exercised once and you'll need to get normal
> > (balanced) paths preempted just after the increment. if the leaking
> > path is lockless (can be exercised in parallel without bounds) then
> > 2 threads are enough where the one triggering the signed overflow
> > would have to be preempted while the other one does INT_MAX increments
> > and trigger the UAF. this is where the other mechanisms i talked about
> > in the past become relevant: preemption or interrupts can be disabled
> > or negative refcount values can be detected and acted upon (your blind
> > copy-pasting effort passed upon this latter opportunity by not
> > specializing the 'jo' into 'js' for the refcount case).
> 
> Well, it's not "blind" -- I'm trying to bring the code as-is to
> upstream for discussion/examination with as few functional
> differences as possible so it's easier to compare apples to apples.

you copied code from a version which is at least 2 major kernel revisions
behind (so much for those apples), you chose the one version which had a
bug that you didn't spot nor fix properly, you didn't realize the opportunity
that a special refcount type represents, you claimed refcount underflows
aren't exploitable but copied code that would detect signed underflow, you
didn't understand the limits and edge cases i explained above... need i go
on? doesn't leave one with great confidence in your ability to understand
and maintain this code...



Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-25 Thread Kees Cook
On Tue, Apr 25, 2017 at 4:26 AM, PaX Team  wrote:
> INT_MAX threads would be needed when the leaking path is locked so
> that it can only be exercised once and you'll need to get normal
> (balanced) paths preempted just after the increment. if the leaking
> path is lockless (can be exercised in parallel without bounds) then
> 2 threads are enough where the one triggering the signed overflow
> would have to be preempted while the other one does INT_MAX increments
> and trigger the UAF. this is where the other mechanisms i talked about
> in the past become relevant: preemption or interrupts can be disabled
> or negative refcount values can be detected and acted upon (your blind
> copy-pasting effort passed upon this latter opportunity by not
> specializing the 'jo' into 'js' for the refcount case).

Well, it's not "blind" -- I'm trying to bring the code as-is to
upstream for discussion/examination with as few functional
differences as possible so it's easier to compare apples to apples.
(Which already resulted in more eyes looking at the code to find a bug
-- thanks Jann!) But yes, jo -> js hugely increases the coverage. I'll
make that change for v2.
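
For anyone following along, here is a minimal C sketch (illustrative
only, not the patch's actual asm) of what the two variants catch after a
"lock incl" on a 32-bit counter:

/*
 * 'jo' keys off the overflow flag: for incl that is set only on the
 * single INT_MAX -> INT_MIN wrap. 'js' keys off the sign flag: any
 * negative result, so every increment of an already-overflowed (or
 * saturated) counter is caught as well.
 */
static bool caught_by_jo(int new_val)
{
	return new_val == INT_MIN;	/* exactly the wrap */
}

static bool caught_by_js(int new_val)
{
	return new_val < 0;		/* the whole negative range */
}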

Thanks!

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-25 Thread Kees Cook
On Tue, Apr 25, 2017 at 4:26 AM, PaX Team  wrote:
> On 25 Apr 2017 at 0:01, Peter Zijlstra wrote:
>> How is the below not useful fodder for an exploit? It might be a less
>> common bug, and perhaps a bit more fiddly to make work, but afaict its
>> still a full use-after-free and therefore useful.
>>
>> ---
>>
>> Thread-A                                Thread-B
>>
>> if (dec_and_test(&obj->ref)) { // true, ref==0
>>
>>                                         inc(&obj->ref) // ref: 0->1
>>
>> kfree(obj);
>> }
>
> ... and tell me why an attacker would let Thread-B do that increment
> (that you're trying to detect) *before* the underlying memory gets
> reused and thus the 0 changed to something else? hint: he'll do everything
> in his power to prevent that, either by winning the race or if there's
> no race (no refcount users outside his control), he'll win every time.
> IOW, checking for 0 is pointless and you kinda proved it yourself now.

Right, having a deterministic protection (checking for overflow) is
best since it stops all exploits using that path. Hoping that an
attacker is unlucky and hits a notification after they've already
landed their corruption is not a very useful defense. It certainly has
a non-zero value, but stopping overflow 100% is better. Especially
when we can do it with no meaningful change in performance which lets
us actually do the atomic_t -> refcount_t conversion everywhere.
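
To put the idea in code, a rough C sketch of the behavior being described
(the actual patch does this with a single trailing branch in inline asm
plus a trap handler, not with a read-back like this):

static inline void refcount_inc_saturating(atomic_t *r)
{
	/* If the increment produced a negative value, we either wrapped past
	 * INT_MAX or were already in the saturated range: pin the counter at
	 * INT_MAX so it can never walk around to zero and cause a free. */
	if (unlikely(atomic_inc_return(r) < 0))
		atomic_set(r, INT_MAX);
}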

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-25 Thread PaX Team
On 24 Apr 2017 at 13:33, Kees Cook wrote:

> On Mon, Apr 24, 2017 at 4:00 AM, PaX Team  wrote:
> > On 24 Apr 2017 at 10:32, Peter Zijlstra wrote:
> >
> >> On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> >> > This patch ports the x86-specific atomic overflow handling from PaX's
> >> > PAX_REFCOUNT to the upstream refcount_t API. This is an updated version
> >> > from PaX that eliminates the saturation race condition by resetting the
> >> > atomic counter back to the INT_MAX saturation value on both overflow and
> >> > underflow. To win a race, a system would have to have INT_MAX threads
> >> > simultaneously overflow before the saturation handler runs.
> >
> > note that the above is wrong (and even contradicting itself and the code).
> 
> True, this changelog could be more accurate (it resets to INT_MAX on
> overflow and INT_MIN on underflow). I think I'm right in saying that a
> system would need INT_MAX threads running a refcount_inc() (and a
> refcount_dec_and_test() at exactly the right moment) before the reset
> handler got scheduled, though, yes?

there's no uniform answer to this as there're several conditions
that can affect the effectiveness of the refcount protection.

e.g., how many independent leaking paths can the attacker exercise
(typically one), are these paths under some kind of locks (would
already prevent unbounded leaks/increments should the overflow
detecting thread be preempted), are negative refcounts allowed and
checked for or only signed overflow, etc.

INT_MAX threads would be needed when the leaking path is locked so
that it can only be exercised once and you'll need to get normal
(balanced) paths preempted just after the increment. if the leaking
path is lockless (can be exercised in parallel without bounds) then
2 threads are enough where the one triggering the signed overflow
would have to be preempted while the other one does INT_MAX increments
and trigger the UAF. this is where the other mechanisms i talked about
in the past become relevant: preemption or interrupts can be disabled
or negative refcount values can be detected and acted upon (your blind
copy-pasting effort passed upon this latter opportunity by not
specializing the 'jo' into 'js' for the refcount case).
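
To make "leaking path" concrete, here is a purely hypothetical example
(none of these names exist in the kernel): a get() that is not balanced
by a put() on one exit path, so the attacker can drive the count upward
without bound simply by hitting the error branch:

static int leaky_ioctl(struct some_obj *obj, unsigned long arg)
{
	some_obj_get(obj);		/* refcount_inc() under the hood */

	if (!arg)
		return -EINVAL;		/* BUG: missing some_obj_put() -> one reference leaked */

	do_something(obj, arg);
	some_obj_put(obj);		/* refcount_dec_and_test(), frees on 0 */
	return 0;
}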



Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-25 Thread PaX Team
On 25 Apr 2017 at 12:23, Peter Zijlstra wrote:

> So what avoids this:

simple, you noted it yourself in your previous mail:

> Well, your setup (panic_on_warn et al) would have it panic the box. That
> will effectively stop the exploit by virtue of stopping everything.

with that in mind the actual code looks like this:

>   CPU0                                CPU1
>
>
>   lock inc %[val]; # 0x7fffffff
>   jo  2f
> 1:   ...
>
>                                       lock dec %[val]; # 0x80000000
>                                       jo  2f
>                                     1:   ...
>
>
>
>
> 2:   mov $0x7fffffff, %[val]

panic()

>   jmp 1b
>
>                                     2:   mov $0x80000000, %[val]

panic()

>                                       jmp 1b
>

... and we never get this far.



Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-25 Thread PaX Team
On 25 Apr 2017 at 0:01, Peter Zijlstra wrote:

> On Mon, Apr 24, 2017 at 01:40:56PM -0700, Kees Cook wrote:
> > I think we're way off in the weeds here. The "cannot inc from 0" check
> > is about general sanity checks on refcounts.
> 
> I disagree, although sanity checks are good too.

exactly, an attacker doesn't care how a premature free occurs due
to reaching a 0 refcount, afterwards it's memory corruption time for
both old and new references regardless.

> > However, what the refcount hardening protection is trying to do is
> > protect against the exploitable condition: overflow.
> 
> Sure..

underflow is also exploitable, it's just much harder to defend against
(there're no known practical solutions).

> > Inc-from-0 isn't an exploitable condition since in theory
> > the memory suddenly becomes correctly managed again.
> 
> It does not. It just got free'ed. Nothing will stop the free from
> happening (or already having happened).

now hold this thought...

> How is the below not useful fodder for an exploit? It might be a less
> common bug, and perhaps a bit more fiddly to make work, but afaict its
> still a full use-after-free and therefore useful.
> 
> ---
> 
> Thread-A                                 Thread-B
>
> if (dec_and_test(&obj->ref)) { // true, ref==0
>
>                                          inc(&obj->ref) // ref: 0->1
>
> kfree(obj);
> }

... and tell me why an attacker would let Thread-B do that increment
(that you're trying to detect) *before* the underlying memory gets
reused and thus the 0 changed to something else? hint: he'll do everything
in his power to prevent that, either by winning the race or if there's
no race (no refcount users outside his control), he'll win every time.
IOW, checking for 0 is pointless and you kinda proved it yourself now.



Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-25 Thread Peter Zijlstra
On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> +static __always_inline void refcount_inc(refcount_t *r)
> +{
> + asm volatile(LOCK_PREFIX "incl %0\n\t"
> + REFCOUNT_CHECK_OVERFLOW(4)
> + : [counter] "+m" (r->refs.counter)
> + : : "cc", "cx");
> +}
> +
> +static __always_inline void refcount_dec(refcount_t *r)
> +{
> + asm volatile(LOCK_PREFIX "decl %0\n\t"
> + REFCOUNT_CHECK_UNDERFLOW(4)
> + : [counter] "+m" (r->refs.counter)
> + : : "cc", "cx");
> +}

> +dotraplinkage void do_refcount_error(struct pt_regs *regs, long error_code)
> +{
> + const char *str = NULL;
> +
> + BUG_ON(!(regs->flags & X86_EFLAGS_OF));
> +
> +#define range_check(size, direction, type, value) \
> + if ((unsigned long)__##size##_##direction##_start <= regs->ip && \
> + regs->ip < (unsigned long)__##size##_##direction##_end) { \
> + *(type *)regs->cx = value; \
> + str = #size " " #direction; \
> + }
> +
> + range_check(refcount,   overflow,  int, INT_MAX)
> + range_check(refcount,   underflow, int, INT_MIN)
> +
> +#undef range_check
> +
> + BUG_ON(!str);
> + do_error_trap(regs, error_code, (char *)str, X86_REFCOUNT_VECTOR,
> +   SIGILL);
> +}
> +#endif


So what avoids this:

CPU0                                    CPU1


lock inc %[val]; # 0x7fffffff
jo  2f
1:  ...

                                        lock dec %[val]; # 0x80000000
                                        jo  2f
                                        1:  ...




2:  mov $0x7fffffff, %[val]
    jmp 1b

                                        2:  mov $0x80000000, %[val]
                                            jmp 1b




//


lock inc %val; # 0x80000000


lock inc %val; 0x
lock inc %val; 0x





Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-25 Thread Peter Zijlstra
On Mon, Apr 24, 2017 at 03:37:32PM -0700, Kees Cook wrote:
> On Mon, Apr 24, 2017 at 3:01 PM, Peter Zijlstra  wrote:

> > It does not. It just got free'ed. Nothing will stop the free from
> > happening (or already having happened).
> 
> Well, yes, but that's kind of my point. Detecting inc-from-0 is "too
> late" to offer a protection. It offers notification of a bug, rather
> than stopping an exploit from happening.

Well, your setup (panic_on_warn et al) would have it panic the box. That
will effectively stop the exploit by virtue of stopping everything.

And warn/bug/panic etc. are, I think, a better option than silently
letting it happen.


Re: [kernel-hardening] Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-24 Thread Rik van Riel
On Mon, 2017-04-24 at 15:37 -0700, Kees Cook wrote:
> On Mon, Apr 24, 2017 at 3:01 PM, Peter Zijlstra wrote:
> > On Mon, Apr 24, 2017 at 01:40:56PM -0700, Kees Cook wrote:
> > > I think we're way off in the weeds here. The "cannot inc from 0"
> > > check
> > > is about general sanity checks on refcounts.
> > 
> > I disagree, although sanity checks are good too.
> > 
> > > It should never happen, and if it does, there's a bug.
> > 
> > The very same is true of the overflow thing.
> > 
> > > However, what the refcount hardening protection is trying to do
> > > is
> > > protect against the exploitable condition: overflow.
> > 
> > Sure..
> > 
> > > Inc-from-0 isn't an exploitable condition since in theory
> > > the memory suddenly becomes correctly managed again.
> > 
> > It does not. It just got free'ed. Nothing will stop the free from
> > happening (or already having happened).
> 
> Well, yes, but that's kind of my point. Detecting inc-from-0 is "too
> late" to offer a protection. It offers notification of a bug, rather
> than stopping an exploit from happening.

inc-from-0 could allow the attacker to gain access to
an object which gets allocated to a new user afterwards.

Certainly much less useful as an exploit, but still a
potential privilege escalation.


Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-24 Thread Kees Cook
On Mon, Apr 24, 2017 at 3:01 PM, Peter Zijlstra  wrote:
> On Mon, Apr 24, 2017 at 01:40:56PM -0700, Kees Cook wrote:
>> I think we're way off in the weeds here. The "cannot inc from 0" check
>> is about general sanity checks on refcounts.
>
> I disagree, although sanity checks are good too.
>
>> It should never happen, and if it does, there's a bug.
>
> The very same is true of the overflow thing.
>
>> However, what the refcount hardening protection is trying to do is
>> protect against the exploitable condition: overflow.
>
> Sure..
>
>> Inc-from-0 isn't an exploitable condition since in theory
>> the memory suddenly becomes correctly managed again.
>
> It does not. It just got free'ed. Nothing will stop the free from
> happening (or already having happened).

Well, yes, but that's kind of my point. Detecting inc-from-0 is "too
late" to offer a protection. It offers notification of a bug, rather
than stopping an exploit from happening.

>> We're just discussing different things.
>
> No, both are bugs, neither overflow nor inc-from-zero _should_ happen.
> The whole point is that code is buggy. If it weren't for that, we'd not
> be doing this.
>
> How is the below not useful fodder for an exploit? It might be a less
> common bug, and perhaps a bit more fiddly to make work, but afaict its
> still a full use-after-free and therefore useful.
>
> ---
>
> Thread-A                                 Thread-B
>
> if (dec_and_test(&obj->ref)) { // true, ref==0
>
>                                          inc(&obj->ref) // ref: 0->1
>
> kfree(obj);
> }
>
>
>
> ~~~/ Thread-C /~~~
>
> obj = kmalloc(); // is obj from Thread-A
>
> obj->ref = 1;
> obj->func = 
>
>
> obj->func();
>
> -- OR --
>
> put(obj);
>   if (dec_and_test(&obj->ref))
>     kfree(obj); // which was of Thread-C

Right, both are bugs, but the exploitable condition is "refcount hit
zero and got freed while there were still things using it". Having too
many put()s that cause the refcount to reach zero early can't be
detected, but having too many get()s that wrap around, ultimately to
zero, can be (the overflow condition). With overflow protection, the
refcount can never (realistically) hit zero since the refcount will
get saturated at INT_MAX, so no exploit is possible -- no memory is
ever freed (it just leaks the allocation). The inc-from-zero case
performing a notification is certainly nice, but the exploit already
happened.

I want the overflow protection to work everywhere the kernel uses
refcounts, which means dealing with performance concerns from mm, net,
block, etc. Since this is a 1 instruction change (which branch
prediction should make almost no cost at all), this will easily
address any performance concerns from the other subsystems.

I'd rather have the overflow protection everywhere than only in some
areas, and I want to break the deadlock of this catch-22 of subsystems
not being able to say what benchmarks are important to them but
refusing to switch to refcount_t due to performance concerns.

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-24 Thread Peter Zijlstra
On Mon, Apr 24, 2017 at 01:40:56PM -0700, Kees Cook wrote:
> I think we're way off in the weeds here. The "cannot inc from 0" check
> is about general sanity checks on refcounts.

I disagree, although sanity checks are good too.

> It should never happen, and if it does, there's a bug.

The very same is true of the overflow thing.

> However, what the refcount hardening protection is trying to do is
> protect against the exploitable condition: overflow.

Sure..

> Inc-from-0 isn't an exploitable condition since in theory
> the memory suddenly becomes correctly managed again.

It does not. It just got free'ed. Nothing will stop the free from
happening (or already having happened).

> We're just discussing different things.

No, both are bugs, neither overflow nor inc-from-zero _should_ happen.
The whole point is that code is buggy. If it weren't for that, we'd not
be doing this.

How is the below not useful fodder for an exploit? It might be a less
common bug, and perhaps a bit more fiddly to make work, but afaict its
still a full use-after-free and therefore useful.

---

Thread-A                                 Thread-B

if (dec_and_test(&obj->ref)) { // true, ref==0

                                         inc(&obj->ref) // ref: 0->1

kfree(obj);
}



~~~/ Thread-C /~~~

obj = kmalloc(); // is obj from Thread-A

obj->ref = 1;
obj->func = 


obj->func();

-- OR --

put(obj);
  if (dec_and_test(&obj->ref))
    kfree(obj); // which was of Thread-C


Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-24 Thread Kees Cook
On Mon, Apr 24, 2017 at 8:15 AM, PaX Team  wrote:
> On 24 Apr 2017 at 15:33, Peter Zijlstra wrote:
>> On Mon, Apr 24, 2017 at 03:08:20PM +0200, PaX Team wrote:
>> > On 24 Apr 2017 at 13:15, Peter Zijlstra wrote:
>> > that was exactly my point: all this applies to you as well. so let me ask
>> > the 3rd time: what is your "argument for correctness" for a 0 refcount
>> > value check? how does it prevent exploitation?
>>
>> I think I've explained that before; per reference count rules 0 means
>> freed (or about to be freed when we talk RCU).
>
> you only said the same thing, what 0 means. you (still) didn't explain how
> checking for it prevents exploitation.
>
>> The whole pattern:
>>
>>  if (dec_and_test(&obj->ref))
>>   kfree(obj);
>>
>> expresses this etc.. Other reference counts also do this. No references
>> means its getting freed.
>>
>> Can you agree with this?
>
> sure, so far so good.
>
>> If so; any attempt to increase the reference count while its (being)
>> freed() is a use-after-free.
>
> why would there ever be such an attempt? a freed object with intact memory
> content is as useful for an attacker as a live one, that is, not at all.

I think we're way off in the weeds here. The "cannot inc from 0" check
is about general sanity checks on refcounts. It should never happen,
and if it does, there's a bug. However, what the refcount hardening
protection is trying to do is protect against the exploitable condition:
overflow. Inc-from-0 isn't an exploitable condition since in theory
the memory suddenly becomes correctly managed again. We're just
discussing different things.

The point is to have very fast refcount_t that protects against
overflow so the mm, net, and block subsystems aren't worried about
making the atomic_t -> refcount_t changes there.

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-24 Thread Kees Cook
On Mon, Apr 24, 2017 at 4:00 AM, PaX Team  wrote:
> On 24 Apr 2017 at 10:32, Peter Zijlstra wrote:
>
>> On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
>> > This patch ports the x86-specific atomic overflow handling from PaX's
>> > PAX_REFCOUNT to the upstream refcount_t API. This is an updated version
>> > from PaX that eliminates the saturation race condition by resetting the
>> > atomic counter back to the INT_MAX saturation value on both overflow and
>> > underflow. To win a race, a system would have to have INT_MAX threads
>> > simultaneously overflow before the saturation handler runs.
>
> note that the above is wrong (and even contradicting itself and the code).

True, this changelog could be more accurate (it resets to INT_MAX on
overflow and INT_MIN on underflow). I think I'm right in saying that a
system would need INT_MAX threads running a refcount_inc() (and a
refcount_dec_and_test() at exactly the right moment) before the reset
handler got scheduled, though, yes?

I'll attempt to clarify this.

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-24 Thread Kees Cook
On Mon, Apr 24, 2017 at 3:48 AM, Peter Zijlstra  wrote:
> On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
>> diff --git a/drivers/misc/lkdtm_bugs.c b/drivers/misc/lkdtm_bugs.c
>> index e3f4cd8876b5..1bdafb29b802 100644
>> --- a/drivers/misc/lkdtm_bugs.c
>> +++ b/drivers/misc/lkdtm_bugs.c
>> @@ -135,9 +135,15 @@ void lkdtm_HUNG_TASK(void)
>>   schedule();
>>  }
>>
>> +#ifdef CONFIG_FAST_REFCOUNT
>> +#define REFCOUNT_MAX INT_MAX
>> +#else
>> +#define REFCOUNT_MAX UINT_MAX
>> +#endif
>
> That doesn't seem like a sensible place for this.

I'll drop the LKDTM changes from this particular patch. As for the
define, I think it's only interesting to LKDTM since it's the only
part interested in refcount_t internals. (i.e. nothing else would (or
should) use this information.)
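
As a rough sketch of why LKDTM wants the constant at all -- an
approximation of the test's shape, not the literal lkdtm_bugs.c code:

void lkdtm_REFCOUNT_OVERFLOW(void)
{
	/* Start just below the saturation point so a couple of increments
	 * cross it and the overflow handler can be exercised on demand. */
	refcount_t over = { .refs = ATOMIC_INIT(REFCOUNT_MAX - 1) };

	pr_info("attempting good refcount increment\n");
	refcount_inc(&over);

	pr_info("attempting bad refcount increment past REFCOUNT_MAX\n");
	refcount_inc(&over);	/* should trip the overflow/saturation check */
}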

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-24 Thread Kees Cook
On Mon, Apr 24, 2017 at 3:45 AM, Peter Zijlstra  wrote:
> On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
>> +static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r)
>> +{
>> + const int a = 1;
>> + const int u = 0;
>> + int c, old;
>> +
>> + c = atomic_read(&(r->refs));
>> + for (;;) {
>> + if (unlikely(c == (u)))
>> + break;
>> + old = atomic_cmpxchg(&(r->refs), c, c + (a));
>
> Please use atomic_try_cmpxchg(), that generates saner code.

Ah-ha, thanks. I actually copied this directly out of the existing
atomic_t function, so we should probably update it there too.

-Kees

>
>> + if (likely(old == c))
>> + break;
>> + c = old;
>> + }
>> + return c != u;
>> +}
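
For reference, the same loop reshaped around atomic_try_cmpxchg() might
look roughly like this (a sketch, assuming try_cmpxchg() semantics of
returning true on success and updating the expected value in place on
failure):

static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r)
{
	int c = atomic_read(&r->refs);

	do {
		if (unlikely(c == 0))
			return false;	/* refuse to resurrect a freed object */
	} while (!atomic_try_cmpxchg(&r->refs, &c, c + 1));

	return true;
}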



-- 
Kees Cook
Pixel Security


Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-24 Thread Kees Cook
On Mon, Apr 24, 2017 at 1:32 AM, Peter Zijlstra  wrote:
> On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
>> This patch ports the x86-specific atomic overflow handling from PaX's
>> PAX_REFCOUNT to the upstream refcount_t API. This is an updated version
>> from PaX that eliminates the saturation race condition by resetting the
>> atomic counter back to the INT_MAX saturation value on both overflow and
>> underflow. To win a race, a system would have to have INT_MAX threads
>> simultaneously overflow before the saturation handler runs.
>
> And is this impossible? Highly unlikely I'll grant you, but absolutely
> impossible?

I'll adjust the language. "Highly unlikely" is still better than
"trivially doable with a single thread". :)

> Also, you forgot nr_cpus in your bound. Afaict the worst case here is
> O(nr_tasks + 3*nr_cpus).
>
> Because PaX does it, is not a correctness argument. And this really
> wants one.

Sure, I didn't mean to imply anything other than a demonstration of
what PaX is doing (and that it's better than not having it). If we can
improve it, that's great.

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-24 Thread PaX Team
On 24 Apr 2017 at 15:33, Peter Zijlstra wrote:

> On Mon, Apr 24, 2017 at 03:08:20PM +0200, PaX Team wrote:
> > On 24 Apr 2017 at 13:15, Peter Zijlstra wrote:
> > 
> > > On Mon, Apr 24, 2017 at 01:00:18PM +0200, PaX Team wrote:
> > > > On 24 Apr 2017 at 10:32, Peter Zijlstra wrote:
> > > 
> > > > > Also, you forgot nr_cpus in your bound. Afaict the worst case here is
> > > > > O(nr_tasks + 3*nr_cpus).
> > > > 
> > > > what does nr_cpus have to do with winning the race?
> > > 
> > > The CPUs could each run nested softirq/hardirq/nmi context poking at the
> > > refcount, irrespective of the (preempted) task context.
> > 
> > that's fine but are you also assuming that the code executed in each of
> > those contexts leaks the same refcount? otherwise whatever they do to the
> > refcount is no more relevant than a non-leaking preemptible path that runs
> > to completion in a bounded amount of time (i.e., you get temporary bumps
> > and thus need to win yet another set of races to get their effects at once).
> 
> For worst case analysis we have to assume it does, unless we can prove
> it doesn't. And that proof is very very hard, and would need to be
> redone every time the kernel changes.

for worst case analysis you need to show the existence of an amd64 system that
can spawn 2G tasks. then you'll have to show the feasibility of making all of
them get preempted (without a later reschedule) inside a 2 insn window.

> > that was exactly my point: all this applies to you as well. so let me ask
> > the 3rd time: what is your "argument for correctness" for a 0 refcount
> > value check? how does it prevent exploitation?
> 
> What 0 count check are you talking about, the one that triggers when we
> want to increment 0 ?

are there any other 0 checks in there?

> I think I've explained that before; per reference count rules 0 means
> freed (or about to be freed when we talk RCU).

you only said the same thing, what 0 means. you (still) didn't explain how
checking for it prevents exploitation.

> The whole pattern:
> 
>  if (dec_and_test(&obj->ref))
>   kfree(obj);
> 
> expresses this etc.. Other reference counts also do this. No references
> means its getting freed.
> 
> Can you agree with this?

sure, so far so good.

> If so; any attempt to increase the reference count while its (being)
> freed() is a use-after-free.

why would there ever be such an attempt? a freed object with intact memory
content is as useful for an attacker as a live one, that is, not at all.




Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-24 Thread Peter Zijlstra
On Mon, Apr 24, 2017 at 03:08:20PM +0200, PaX Team wrote:
> On 24 Apr 2017 at 13:15, Peter Zijlstra wrote:
> 
> > On Mon, Apr 24, 2017 at 01:00:18PM +0200, PaX Team wrote:
> > > On 24 Apr 2017 at 10:32, Peter Zijlstra wrote:
> > 
> > > > Also, you forgot nr_cpus in your bound. Afaict the worst case here is
> > > > O(nr_tasks + 3*nr_cpus).
> > > 
> > > what does nr_cpus have to do with winning the race?
> > 
> > The CPUs could each run nested softirq/hardirq/nmi context poking at the
> > refcount, irrespective of the (preempted) task context.
> 
> that's fine but are you also assuming that the code executed in each of
> those contexts leaks the same refcount? otherwise whatever they do to the
> refcount is no more relevant than a non-leaking preemptible path that runs
> to completion in a bounded amount of time (i.e., you get temporary bumps
> and thus need to win yet another set of races to get their effects at once).

For worst case analysis we have to assume it does, unless we can prove
it doesn't. And that proof is very, very hard, and would need to be
redone every time the kernel changes.

> that was exactly my point: all this applies to you as well. so let me ask
> the 3rd time: what is your "argument for correctness" for a 0 refcount
> value check? how does it prevent exploitation?

What 0 count check are you talking about, the one that triggers when we
want to increment 0 ?

I think I've explained that before; per reference count rules 0 means
freed (or about to be freed when we talk RCU).

The whole pattern:

	if (dec_and_test(&obj->ref))
		kfree(obj);

expresses this etc.. Other reference counts also do this. No references
means it's getting freed.

Can you agree with this?


If so, any attempt to increase the reference count while it's (being)
freed() is a use-after-free.

Therefore we disallow 0 increment.


Yes, this is an annoyance when you consider usage-counts, where 0 means
something else. But then, we were talking about reference counts, not
something else.
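
To make the rule stated just above ("we disallow 0 increment") concrete, here
is a minimal sketch of an increment that treats a zero count as an error
instead of silently resurrecting the object. The function name and the
WARN_ON policy are illustrative assumptions, not the upstream refcount_t code
itself (needs <linux/atomic.h> and <linux/bug.h>):

	/* Take a reference, but refuse (and warn) when the count is already 0:
	 * a zero count means the object is (being) freed, so bumping it would
	 * only hide a use-after-free. */
	static inline bool ref_inc_checked(atomic_t *count)
	{
		int c, old;

		c = atomic_read(count);
		for (;;) {
			if (WARN_ON(c == 0))
				return false;
			old = atomic_cmpxchg(count, c, c + 1);
			if (old == c)
				return true;	/* took the reference */
			c = old;		/* raced with another update, retry */
		}
	}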


Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-24 Thread PaX Team
On 24 Apr 2017 at 13:15, Peter Zijlstra wrote:

> On Mon, Apr 24, 2017 at 01:00:18PM +0200, PaX Team wrote:
> > On 24 Apr 2017 at 10:32, Peter Zijlstra wrote:
> 
> > > Also, you forgot nr_cpus in your bound. Afaict the worst case here is
> > > O(nr_tasks + 3*nr_cpus).
> > 
> > what does nr_cpus have to do with winning the race?
> 
> The CPUs could each run nested softirq/hardirq/nmi context poking at the
> refcount, irrespective of the (preempted) task context.

that's fine but are you also assuming that the code executed in each of
those contexts leaks the same refcount? otherwise whatever they do to the
refcount is no more relevant than a non-leaking preemptible path that runs
to completion in a bounded amount of time (i.e., you get temporary bumps
and thus need to win yet another set of races to get their effects at once).

> > > Because PaX does it, is not a correctness argument. And this really
> > > wants one.
> > 
> > heh, do you want to tell me about how checking for a 0 refcount prevents
> > exploiting a bug?
> 
> Not the point. All I said was that saying somebody else does it (anybody
> else, doesn't matter it was you) isn't an argument for correctness.

that was exactly my point: all this applies to you as well. so let me ask
the 3rd time: what is your "argument for correctness" for a 0 refcount
value check? how does it prevent exploitation?



Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-24 Thread Peter Zijlstra
On Mon, Apr 24, 2017 at 01:00:18PM +0200, PaX Team wrote:
> On 24 Apr 2017 at 10:32, Peter Zijlstra wrote:

> > Also, you forgot nr_cpus in your bound. Afaict the worst case here is
> > O(nr_tasks + 3*nr_cpus).
> 
> what does nr_cpus have to do with winning the race?

The CPUs could each run nested softirq/hardirq/nmi context poking at the
refcount, irrespective of the (preempted) task context.

> > Because PaX does it, is not a correctness argument. And this really
> > wants one.
> 
> heh, do you want to tell me about how checking for a 0 refcount prevents
> exploiting a bug?

Not the point. All I said was that saying somebody else does it (anybody
else, doesn't matter it was you) isn't an argument for correctness.


Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-24 Thread PaX Team
On 24 Apr 2017 at 10:32, Peter Zijlstra wrote:

> On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> > This patch ports the x86-specific atomic overflow handling from PaX's
> > PAX_REFCOUNT to the upstream refcount_t API. This is an updated version
> > from PaX that eliminates the saturation race condition by resetting the
> > atomic counter back to the INT_MAX saturation value on both overflow and
> > underflow. To win a race, a system would have to have INT_MAX threads
> > simultaneously overflow before the saturation handler runs.

note that the above is wrong (and even contradicts itself and the code).

> And is this impossible? Highly unlikely I'll grant you, but absolutely
> impossible?

here's my analysis from a while ago:
http://www.openwall.com/lists/kernel-hardening/2017/01/05/19

> Also, you forgot nr_cpus in your bound. Afaict the worst case here is
> O(nr_tasks + 3*nr_cpus).

what does nr_cpus have to do with winning the race?

> Because PaX does it, is not a correctness argument. And this really
> wants one.

heh, do you want to tell me about how checking for a 0 refcount prevents
exploiting a bug?



Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-24 Thread Peter Zijlstra
On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> diff --git a/drivers/misc/lkdtm_bugs.c b/drivers/misc/lkdtm_bugs.c
> index e3f4cd8876b5..1bdafb29b802 100644
> --- a/drivers/misc/lkdtm_bugs.c
> +++ b/drivers/misc/lkdtm_bugs.c
> @@ -135,9 +135,15 @@ void lkdtm_HUNG_TASK(void)
>   schedule();
>  }
>  
> +#ifdef CONFIG_FAST_REFCOUNT
> +#define REFCOUNT_MAX INT_MAX
> +#else
> +#define REFCOUNT_MAX UINT_MAX
> +#endif

That doesn't seem like a sensible place for this.
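
One possible alternative, sketched here only as an illustration (the actual
placement chosen later may differ): have the refcount header export the
saturation limit itself, so test code such as lkdtm merely consumes it:

	/* include/linux/refcount.h -- hypothetical placement */
	#ifdef CONFIG_FAST_REFCOUNT
	# define REFCOUNT_MAX	INT_MAX		/* x86 fast path saturates at INT_MAX */
	#else
	# define REFCOUNT_MAX	UINT_MAX	/* generic refcount_t uses the full unsigned range */
	#endif

drivers/misc/lkdtm_bugs.c would then drop its private #ifdef and use
REFCOUNT_MAX directly.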


Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-24 Thread Peter Zijlstra
On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> +static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r)
> +{
> + const int a = 1;
> + const int u = 0;
> + int c, old;
> +
> + c = atomic_read(&(r->refs));
> + for (;;) {
> + if (unlikely(c == (u)))
> + break;
> + old = atomic_cmpxchg(&(r->refs), c, c + (a));

Please use atomic_try_cmpxchg(), that generates saner code.

> + if (likely(old == c))
> + break;
> + c = old;
> + }
> + return c != u;
> +}


Re: [kernel-hardening] Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-24 Thread Peter Zijlstra
On Mon, Apr 24, 2017 at 10:53:30AM +0200, Jann Horn wrote:
> On Mon, Apr 24, 2017 at 10:32 AM, Peter Zijlstra  wrote:
> > On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> >> This patch ports the x86-specific atomic overflow handling from PaX's
> >> PAX_REFCOUNT to the upstream refcount_t API. This is an updated version
> >> from PaX that eliminates the saturation race condition by resetting the
> >> atomic counter back to the INT_MAX saturation value on both overflow and
> >> underflow. To win a race, a system would have to have INT_MAX threads
> >> simultaneously overflow before the saturation handler runs.
> >
> > And is this impossible? Highly unlikely I'll grant you, but absolutely
> > impossible?
> >
> > Also, you forgot nr_cpus in your bound. Afaict the worst case here is
> > O(nr_tasks + 3*nr_cpus).
> >
> > Because PaX does it, is not a correctness argument. And this really
> > wants one.
> 
> From include/linux/threads.h:
> 
> /*
>  * A maximum of 4 million PIDs should be enough for a while.
>  * [NOTE: PID/TIDs are limited to 2^29 ~= 500+ million, see futex.h.]
>  */
> #define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \
> 	(sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))
> 
> AFAICS that means you can only have up to 2^22 running tasks at
> a time, since every running task requires a PID in the init pid namespace.

That's but an artificial limit and bound to be increased sooner rather
than later, given the trend in building ever larger machines.

If we go with the futex limit on the full tid space, we end up with 2^30
(the comment here about 29 is wrong, see FUTEX_TID_MASK).

We then still have to add a multiple of nr_cpus. Granted, that will
still be slightly short of 3^31, but not to any amount I'm really
comfortable with, we're _really_ close.

Point remains though; Changelog (and arguably the code itself) should
very much include such bits.


Re: [kernel-hardening] Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-24 Thread Jann Horn
On Mon, Apr 24, 2017 at 10:32 AM, Peter Zijlstra  wrote:
> On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
>> This patch ports the x86-specific atomic overflow handling from PaX's
>> PAX_REFCOUNT to the upstream refcount_t API. This is an updated version
>> from PaX that eliminates the saturation race condition by resetting the
>> atomic counter back to the INT_MAX saturation value on both overflow and
>> underflow. To win a race, a system would have to have INT_MAX threads
>> simultaneously overflow before the saturation handler runs.
>
> And is this impossible? Highly unlikely I'll grant you, but absolutely
> impossible?
>
> Also, you forgot nr_cpus in your bound. Afaict the worst case here is
> O(nr_tasks + 3*nr_cpus).
>
> Because PaX does it, is not a correctness argument. And this really
> wants one.

From include/linux/threads.h:

/*
 * A maximum of 4 million PIDs should be enough for a while.
 * [NOTE: PID/TIDs are limited to 2^29 ~= 500+ million, see futex.h.]
 */
#define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \
	(sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))

AFAICS that means you can only have up to 2^22 running tasks at
a time, since every running task requires a PID in the init pid namespace.
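
For scale (my arithmetic, not part of the original mail): PID_MAX_LIMIT on
64-bit is 4 * 1024 * 1024 = 2^22, roughly 4.2 million tasks, while the
saturation threshold under discussion is INT_MAX = 2^31 - 1, roughly 2.1
billion. The available task count therefore falls short of the number of
simultaneously in-flight increments needed by a factor of about 2^9 = 512,
even before any nr_cpus terms enter the picture.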


Re: [PATCH] x86/refcount: Implement fast refcount_t handling

2017-04-24 Thread Peter Zijlstra
On Fri, Apr 21, 2017 at 03:09:39PM -0700, Kees Cook wrote:
> This patch ports the x86-specific atomic overflow handling from PaX's
> PAX_REFCOUNT to the upstream refcount_t API. This is an updated version
> from PaX that eliminates the saturation race condition by resetting the
> atomic counter back to the INT_MAX saturation value on both overflow and
> underflow. To win a race, a system would have to have INT_MAX threads
> simultaneously overflow before the saturation handler runs.

And is this impossible? Highly unlikely I'll grant you, but absolutely
impossible?

Also, you forgot nr_cpus in your bound. Afaict the worst case here is
O(nr_tasks + 3*nr_cpus).

Because PaX does it, is not a correctness argument. And this really
wants one.

