Re: [PATCH RFC v2 0/6] Break heap spraying needed for exploiting use-after-free

2020-10-05 Thread Daniel Micay
It will reuse the memory for other things when the whole slab is freed
though. Not really realistic to change that without it being backed by
virtual memory along with higher-level management of regions to avoid
intense fragmentation and metadata waste. It would depend a lot on
having much finer-grained slab caches, otherwise it's not going to be
much of an alternative to a quarantine feature. Even then, a
quarantine feature is still useful, but is less suitable for a
mainstream feature due to performance cost. Even a small quarantine
has a fairly high performance cost.
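
To make the cost concrete, here is a minimal single-threaded userspace
sketch of a FIFO quarantine; the ring size is illustrative and a kernel
slab quarantine would look quite different, but it shows why even a small
quarantine costs memory and an extra step on every free:

#include <stdlib.h>

#define QUARANTINE_SLOTS 4096 /* illustrative; held memory = slots * avg size */

static void *quarantine[QUARANTINE_SLOTS];
static size_t q_head;

/* Park the freed pointer and release the oldest quarantined allocation
 * in its place, so recently freed memory can't immediately be reused by
 * the next allocation. */
void quarantine_free(void *p)
{
    void *evicted = quarantine[q_head];
    quarantine[q_head] = p;
    q_head = (q_head + 1) % QUARANTINE_SLOTS;
    free(evicted); /* free(NULL) is a no-op while the ring fills up */
}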


Re: [PATCH 1/2] x86/pkeys: copy pkey state at fork()

2018-10-29 Thread Daniel Micay
Yeah, a no-op pkey_alloc flag tied to this patch to provide a way to
detect if pkey state is preserved on fork, since kernels without the
patch would report EINVAL. Something like
PKEY_ASSERT_FORK_INHERIT_STATE would make sense. Otherwise, it's
going to be quite painful to adopt this in userspace software without kernel
version checks. Most software can't accept causing a crash / abort
after forking in environments not applying all the stable kernel
patches, or in fresh OS installs that aren't fully updated + rebooted yet.
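
A sketch of how userspace could use such a flag, assuming the proposed
(not merged) PKEY_ASSERT_FORK_INHERIT_STATE name and an illustrative
value; the EINVAL fallback is the detection mechanism described above:

#define _GNU_SOURCE
#include <sys/mman.h>

/* Hypothetical flag from this proposal; kernels without the fork fix
 * would reject unknown pkey_alloc() flags with EINVAL. */
#ifndef PKEY_ASSERT_FORK_INHERIT_STATE
#define PKEY_ASSERT_FORK_INHERIT_STATE 0x1 /* illustrative value */
#endif

/* Returns 1 if pkey state survives fork, 0 if the feature must be
 * disabled in child processes. */
static int pkey_fork_safe(void)
{
    int pkey = pkey_alloc(PKEY_ASSERT_FORK_INHERIT_STATE, 0);

    if (pkey < 0)
        return 0; /* EINVAL: kernel predates the fix */
    pkey_free(pkey);
    return 1;
}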


Re: [PATCH 1/2] x86/pkeys: copy pkey state at fork()

2018-10-26 Thread Daniel Micay
On Fri, 26 Oct 2018 at 18:12, Andy Lutomirski  wrote:
>
>
>
> > On Oct 26, 2018, at 2:39 PM, Daniel Micay  wrote:
> >
> > I ended up working around this with a pthread_atfork handler disabling
> > my usage of the feature in the child process for the time being. I
> > don't have an easy way to detect if the bug is present within a
> > library so
>
> Can you not just make sure that the fix is backported to all relevant kernels?

There are too many different distribution kernels and I won't be in
control of where the software is used.

> I suppose we could add a new flag for pkey_get() or something.

That would work, since I can apply the workaround (disabling the
feature in child processes) if I get EINVAL. The flag wouldn't need to
do anything, just existing and being tied to this patch so I have a
way to detect that I can safely use MPK after fork.

> > I'm going to need a kernel version check with a table of
> > kernel releases fixing the problem for each stable branch.
>
> That won’t work right on distro kernels. Please don’t go there.

If it's backported to an earlier version, it will just mean missing a
chance to use it. I'm not going to assume that it behaves a certain
way based on having an old kernel, but rather I just won't use it in a
forked child on an older kernel version if I don't have a way to
detect the problem. It's for a few different uses in library code so
I can't have a runtime test trying to detect it with clone(...). I'd
definitely prefer a proper way to detect that I can use it after fork. I really
don't want to have a hack like that which is why I'm bringing it up.


Re: [PATCH 1/2] x86/pkeys: copy pkey state at fork()

2018-10-26 Thread Daniel Micay
I ended up working around this with a pthread_atfork handler disabling
my usage of the feature in the child process for the time being. I
don't have an easy way to detect if the bug is present within a
library so I'm going to need a kernel version check with a table of
kernel releases fixing the problem for each stable branch.

It would be helpful if there was a new cpuinfo flag to check if the
MPK state is preserved on fork in addition to the existing ospke flag.
The problem will fade away over time but in my experience there are a
lot of people using distributions with kernels not incorporating all
of the stable fixes. I expect other people will run into the problem
once hardware with MPK is more widely available and other people try
to use it for various things like moving GC or assorted security
features. Someone will end up running software adopting it on an older
kernel with the problem.

The clobbering issue I found with MAP_FIXED_NOREPLACE isn't quite
as annoying because it was easy to make a runtime test usable in a library
to see if the feature works properly.
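
For comparison, a sketch of that kind of library runtime test, assuming
the usual MAP_FIXED_NOREPLACE semantics where a working kernel fails with
EEXIST instead of clobbering or silently ignoring the flag:

#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000
#endif

/* Returns 1 if MAP_FIXED_NOREPLACE refuses to clobber an existing
 * mapping, 0 if the kernel ignores or mishandles the flag. */
static int map_fixed_noreplace_works(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *base = mmap(NULL, page, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return 0;

    /* Try to map over the page we just mapped. */
    void *p = mmap(base, page, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
    int works = (p == MAP_FAILED && errno == EEXIST);

    if (p != MAP_FAILED && p != base)
        munmap(p, page); /* old kernel treated the address as a plain hint */
    munmap(base, page);
    return works;
}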


Re: [regression v4.17-rc0] Re: FORTIFY_SOURCE breaks ARM compilation in -next -- was Re: ARM compile failure in Re: linux-next: Tree for Apr 4

2018-04-20 Thread Daniel Micay
Well, that's not related, it's just this:

#ifdef __GNUC__
#if (__GNUC__ == 3 && __GNUC_MINOR__ < 3)
#error Your compiler is too buggy; it is known to miscompile kernels.
#error Known good compilers: 3.3, 4.x
#endif
#if GCC_VERSION >= 40800 && GCC_VERSION < 40803
#error Your compiler is too buggy; it is known to miscompile kernels
#error and result in filesystem corruption and oopses.
#endif
#endif


Re: [regression v4.17-rc0] Re: FORTIFY_SOURCE breaks ARM compilation in -next -- was Re: ARM compile failure in Re: linux-next: Tree for Apr 4

2018-04-20 Thread Daniel Micay
On 20 April 2018 at 15:15, Pavel Machek  wrote:
> Hi!
>
>> >> Hi! Sorry I lost this email in my inbox. It seems this is specific to
>> >> a particular subset of arm architectures? (My local builds of arm all
>> >> succeed, for example. Can you send your failing config?) I'll take a
>> >> closer look on Monday if Daniel doesn't beat me to it.
>> >
>> > Daniel, Kees: any news?
>> >
>> > I'm aware you did not specify which Monday :-).
>>
>> Hi! Sorry, I got distracted. So the .config you sent me builds fine
>> with my cross compiler. I suspect this is something specific to ELDK's
>> compiler. I can try some other compiler versions. What version of gcc
>> is failing?
>
> I have:
>
> pavel@duo:/data/l/linux-n900$ arm-linux-gnueabi-gcc --version
> arm-linux-gnueabi-gcc (GCC) 4.7.2
> Copyright (C) 2012 Free Software Foundation, Inc.
> pavel@duo:/data/l/linux-n900$ arm-linux-gnueabi-ld --version
> GNU ld (GNU Binutils) 2.23.1.20121113
> Copyright 2012 Free Software Foundation, Inc.
>
> Let me try with eldk-5.6:
>
> pavel@duo:/data/l/linux-n900$ arm-linux-gnueabi-gcc --version
> arm-linux-gnueabi-gcc (GCC) 4.8.2
> pavel@duo:/data/l/linux-n900$ arm-linux-gnueabi-ld --version
> GNU ld (GNU Binutils) 2.24
>
> make  lib/string.o
>
> Seems to succeed, so I guess full build will succeed, too...

It doesn't imply that a full build would work.


Re: [PATCH] drivers: net: wireless: ath: ath9: dfs: remove VLA usage

2018-03-13 Thread Daniel Micay
> I don't think the difference between C and C++ const pointer
> conversions

I mean I don't think the difference between them was intended.


Re: [PATCH] drivers: net: wireless: ath: ath9: dfs: remove VLA usage

2018-03-13 Thread Daniel Micay
No, it's undefined behavior to write to a const variable. The `static`
and `const` on the variable both change the code generation in the
real world as permitted / encouraged by the standard. It's placed in
read-only memory. Trying to write to it will break. It's not
"implementation defined" to write to it, it's "undefined behavior",
i.e. it's considered incorrect. There's a clear distinction between
those in the standard.

You're confusing having a real `const` for a variable with having it
applied to a pointer. It's well-defined to cast away const from a
pointer and write to what it points at if it's not actually const. If
it is const, that's broken.

There's nothing implementation defined about either case.

The C standard could have considered `static const` variables to work
as constant expressions just like the C++ standard. They borrowed it
from there but made it less useful than const in what became the C++
standard. They also used stricter rules for the permitted implicit
conversions of const pointers which made those much less usable, i.e.
converting `int **` to `const int *const *` wasn't permitted like C++.
I don't think the difference between C and C++ const pointer
conversions was intended; it's a quirk of them being standardized on
different timelines and ending up with different versions of the same
thing. On the other hand, they might have left out having ever so
slightly more useful constant expressions on purpose since people can
use #define.
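
A small sketch of the distinction being drawn here; the last comment
shows the C vs. C++ pointer conversion difference:

static const int ro = 5; /* placed in read-only memory in practice */
static int rw = 5;

void demo(void)
{
    const int *p = &rw;
    *(int *)p = 6; /* well-defined: the pointed-to object isn't const */

    const int *q = &ro;
    /* *(int *)q = 6; */ /* undefined behavior: the object is const and
                            typically lives in .rodata, so this faults */
    (void)q;

    int **pp = 0;
    /* const int *const *cpp = pp; */ /* valid implicit conversion in C++,
                                         a constraint violation in C */
    (void)pp;
}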


Re: [PATCH] drivers: net: wireless: ath: ath9: dfs: remove VLA usage

2018-03-10 Thread Daniel Micay
> Just wondering. Is this actually a VLA? FFT_NUM_SAMPLES was static const so
> not really going to show a lot of variation. This array will always have the
> same size on the stack.

The issue is that unlike in C++, a `static const` can't be used in a
constant expression in C. It's unclear why C is defined that way but
it's how it is and there isn't currently a GCC extension making more
things into constant expressions like C++.
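
A hedged sketch of the usual workaround (the name and value here are
illustrative, not the actual ath9k code):

/* A `static const` is not a constant expression in C, so an array sized
 * by it is technically a VLA. An enum constant (or a #define) is a true
 * constant expression: */
enum { FFT_NUM_SAMPLES = 8 }; /* illustrative value */

void process(void)
{
    int samples[FFT_NUM_SAMPLES]; /* fixed-size array, no -Wvla warning */
    (void)samples;
}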


Re: VLA removal (was Re: [RFC 2/2] lustre: use VLA_SAFE)

2018-03-07 Thread Daniel Micay
On 7 March 2018 at 13:09, Linus Torvalds  wrote:
> On Wed, Mar 7, 2018 at 9:37 AM, Kees Cook  wrote:
>>
>> Building with -Wvla, I see 209 unique locations reported in 60 directories:
>> http://paste.ubuntu.com/p/srQxwPQS9s/
>
> Ok, that's not so bad. Maybe Greg could even add it to one of those
> things he encourages new people to do?
>
> Because at least *some* of them are pretty trivial. For example,
> looking at the core code, I was surprised to see something in
> lib/btree.c

Some are probably just the issue of technically having a VLA that's
not really a VLA:

static const int size = 5;

void foo(void) {
  int x[size];
}

% gcc -c -Wvla foo.c
foo.c: In function ‘foo’:
foo.c:4:3: warning: ISO C90 forbids variable length array ‘x’ [-Wvla]
   int x[size];
   ^~~

I don't really understand why the C standard didn't make `static
const` declarations usable as constant expressions like C++. They made
the pointer conversions more painful too.

It would be nice to get rid of those cases to use -Werror=vla though.


Re: [RFC PATCH] Randomization of address chosen by mmap.

2018-03-05 Thread Daniel Micay
On 5 March 2018 at 08:09, Ilya Smith  wrote:
>
>> On 4 Mar 2018, at 23:56, Matthew Wilcox  wrote:
>> Thinking about this more ...
>>
>> - When you call munmap, if you pass in the same (addr, length) that were
>>   used for mmap, then it should unmap the guard pages as well (that
>>   wasn't part of the patch, so it would have to be added)
>> - If 'addr' is higher than the mapped address, and length at least
>>   reaches the end of the mapping, then I would expect the guard pages to
>>   "move down" and be after the end of the newly-shortened mapping.
>> - If 'addr' is higher than the mapped address, and the length doesn't
>>   reach the end of the old mapping, we split the old mapping into two.
>>   I would expect the guard pages to apply to both mappings, insofar as
>>   they'll fit.  For an example, suppose we have a five-page mapping with
>>   two guard pages (MGG), and then we unmap the fourth page.  Now we
>>   have a three-page mapping with one guard page followed immediately
>>   by a one-page mapping with two guard pages (MMMGMGG).
>
> I’m analysing that approach and see many more problems:
> - each time you call mmap like this, you still increase the count of vmas
> as my patch did
> - now the vma_merge feature shouldn’t work at all, until MAP_FIXED is set
> or PROT_GUARD(0)
> - the entropy you provide is like 16 bits, which is really not so hard to
> brute-force
> - in your patch you don’t use vm_guard during address searching; I see many
> sources of bugs here
> - if you unmap/remap one page inside a region, the vma_guard field will
> show head or tail pages for the vma, not both; the kernel doesn’t know how
> to handle it
> - user mode now chooses entropy with the PROT_GUARD macro; where does it
> get it? User mode shouldn’t be responsible for entropy at all

I didn't suggest this as the way of implementing fine-grained
randomization but rather a small starting point for hardening address
space layout further. I don't think it should be tied to a mmap flag
but rather something like a personality flag or a global sysctl. It
doesn't need to be random at all to be valuable, and it's just a first
step. It doesn't mean there can't be switches between random pivots
like OpenBSD mmap, etc. I'm not so sure that randomly switching around
is going to result in isolating things very well though.

The VMA count issue is at least something very predictable with a
performance cost only for kernel operations.

> I can’t understand what direction this conversation is going in. I was
> talking about a weak implementation in the Linux kernel but got many
> comments that ASLR should be implemented in user mode, which is really
> weird to me.

That's not what I said. I was saying that splitting things into
regions based on the type of allocation works really well and allows
for high entropy bases, but that the kernel can't really do that right
now. It could split up code that starts as PROT_EXEC into a region,
but that's generally not how libraries are mapped in, so it won't know
until mprotect, which is obviously too late. Unless it had some kind
of type key passed from userspace, it can't really do that.

> I think it is possible to add GUARD pages into my implementation, but
> initially the problem was about the entropy of address choosing. I would
> like to resolve it step by step.

Starting with fairly aggressive fragmentation of the address space is
going to be a really hard sell. The costs of a very spread out address
space in terms of TLB misses, etc. are unclear. Starting with enforced
gaps (1 page) and randomization for those wouldn't rule out having
finer-grained randomization, like randomly switching between different
regions. This needs to be cheap enough that people want to enable it,
and the goals need to be clearly spelled out. The goal needs to be
clearer than "more randomization == good" and then accepting a high
performance cost for that.

I'm not dictating how things should be done, I don't have any say
about that. I'm just trying to discuss it.


Re: [RFC PATCH] Randomization of address chosen by mmap.

2018-03-03 Thread Daniel Micay
On 3 March 2018 at 08:58, Ilya Smith  wrote:
> Hello Daniel, thanks for sharing your experience!
>
>> On 1 Mar 2018, at 00:02, Daniel Micay  wrote:
>>
>> I don't think it makes sense for the kernel to attempt mitigations to
>> hide libraries. The best way to do that is in userspace, by having the
>> linker reserve a large PROT_NONE region for mapping libraries (both at
>> initialization and for dlopen) including a random gap to act as a
>> separate ASLR base.
> Why is this the best, and what is the limit of this large region?
> Let’s think out of the box.
> What you say here means you made a separate memory region for libraries
> without changing the kernel. But the basic idea is that you have a separate
> region for libraries only. Probably you also want to have separate regions
> for any thread stack, for mmapped files, shared memory, etc. This allows
> protecting memory regions of different types from each other. It is
> impossible to implement this without keeping the whole memory map. This map
> should be secure from any leak attack to prevent ASLR bypass. The only way
> to do it is to implement it in the kernel and provide different syscalls
> like uselib or allocstack, etc. This one is really hard in the current
> kernel implementation.

There's the option of reserving PROT_NONE regions and managing memory
within them using a similar best-fit allocation scheme to get separate
random bases. The kernel could offer something like that but it's
already possible to do it for libc mmap usage within libc as we did
for libraries.
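
A rough sketch of that reservation scheme, using a simple bump allocator
rather than best-fit, with illustrative sizes (a real linker would track
and reuse holes):

#define _GNU_SOURCE
#include <stdint.h>
#include <sys/mman.h>
#include <sys/random.h>

#define LIB_REGION_SIZE (1UL << 36) /* 64 GiB of address space, illustrative */

static char *lib_cur, *lib_end;

/* Reserve a large PROT_NONE region and pick a random offset inside it to
 * act as a separate ASLR base for library mappings. */
static int lib_region_init(void)
{
    void *r = mmap(NULL, LIB_REGION_SIZE, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    uint64_t gap;

    if (r == MAP_FAILED || getrandom(&gap, sizeof(gap), 0) != sizeof(gap))
        return -1;
    gap = (gap % (LIB_REGION_SIZE / 2)) & ~((uint64_t)4095);
    lib_cur = (char *)r + gap;
    lib_end = (char *)r + LIB_REGION_SIZE;
    return 0;
}

/* The linker / dlopen would then carve mappings out of the reservation
 * with MAP_FIXED at lib_cur instead of letting the kernel pick, bumping
 * lib_cur past each library (plus any per-library random gap). */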

The kernel's help is needed to cover non-libc users of mmap, i.e. not
the linker, malloc, etc. It's not possible for libc to assume that
everything goes through the libc mmap/mremap/munmap wrappers and it
would be a mess so I'm not saying the kernel doesn't have a part to
play. I'm only saying it makes sense to look at the whole picture and
if something can be done better in libc or the linker, to do it there
instead. There isn't an API for dividing stuff up into regions, so it
has to be done in userspace right now and I think it works a lot
better when it's an option.

>
> My approach was to hide memory regions from attacker and from each other.
>
>> If an attacker has library addresses, it's hard to
>> see much point in hiding the other libraries from them.
>
> In some cases the attacker has only one leak for the whole attack. And we
> should do our best to make even this leak useless.
>
>> It does make
>> sense to keep them from knowing the location of any executable code if
>> they leak non-library addresses. An isolated library region + gap is a
>> feature we implemented in CopperheadOS and it works well, although we
>> haven't ported it to Android 7.x or 8.x.
> This one is interesting to know and I would like to try to attack it, but
> it's out of the scope of the current conversation.

I don't think it's out-of-scope. There are different approaches to
this kind of finer-grained randomization and they can be done
together.

>> I don't think the kernel can
>> bring much / anything to the table for it. It's inherently the
>> responsibility of libc to randomize the lower bits for secondary
>> stacks too.
>
> I think every bit of a secondary stack should be randomized to provide the
> attacker with as little information as we can.

The issue is that the kernel is only providing a mapping so it can add
a random gap or randomize it in other ways but it's ultimately up to
libc and other userspace code to do randomization without those
mappings.

A malloc implementation is similarly going to request fairly large
mappings from the kernel to manage a bunch of stuff within them
itself. The kernel can't protect against stuff like heap spray attacks
very well all by itself. It definitely has a part to play in that but
is a small piece of it (unless the malloc impl actually manages
virtual memory regions itself, which is already done by
performance-oriented allocators for very different reasons).

>> Fine-grained randomized mmap isn't going to be used if it causes
>> unpredictable levels of fragmentation or has a high / unpredictable
>> performance cost.
>
> Let's pretend any chosen address is purely random and always satisfies the
> request. At some point we fail to mmap a new chunk of size N. What does
> this mean? It means that all chunks of size N are occupied and we can't
> even find a place between them. Now let's count the already allocated
> memory. Let's pretend at least one page lies on each of these occupied
> chunks. So the count of these pages is TASK_SIZE / N. The total bytes
> already allocated is PAGE_SIZE * TASK_SIZE / N. Now we can calculate:
> TASK_SIZE is 2^48 bytes, PAGE_SIZE is 4096. If N is 1MB,
> allocated memory minimum 1125899906842624,

Re: [RFC PATCH] Randomization of address chosen by mmap.

2018-02-28 Thread Daniel Micay
The option to add at least one guard page would be useful whether or
not it's tied to randomization. It's not feasible to do that in
userspace for mmap as a whole, only for specific users of mmap like
malloc, and it adds significant overhead vs. a kernel implementation. It
could
optionally let you choose a minimum and maximum guard region size with
it picking random sizes if they're not equal. It's important for it to
be an enforced gap rather than something that can be filled in by
another allocation. It will obviously help a lot more when it's being
used with a hardened allocator designed to take advantage of this
rather than glibc malloc or jemalloc.

I don't think it makes sense for the kernel to attempt mitigations to
hide libraries. The best way to do that is in userspace, by having the
linker reserve a large PROT_NONE region for mapping libraries (both at
initialization and for dlopen) including a random gap to act as a
separate ASLR base. If an attacker has library addresses, it's hard to
see much point in hiding the other libraries from them. It does make
sense to keep them from knowing the location of any executable code if
they leak non-library addresses. An isolated library region + gap is a
feature we implemented in CopperheadOS and it works well, although we
haven't ported it to Android 7.x or 8.x. I don't think the kernel can
bring much / anything to the table for it. It's inherently the
responsibility of libc to randomize the lower bits for secondary
stacks too.

Fine-grained randomized mmap isn't going to be used if it causes
unpredictable levels of fragmentation or has a high / unpredictable
performance cost. I don't think it makes sense to approach it
aggressively in a way that people can't use. The OpenBSD randomized
mmap is a fairly conservative implementation to avoid causing
excessive fragmentation. I think they do a bit more than adding random
gaps by switching between different 'pivots' but that isn't very high
benefit. The main benefit is having random bits of unmapped space all
over the heap when combined with their hardened allocator, which
heavily uses small mmap mappings and has a fair bit of malloc-level
randomization (it's a bitmap / hash table based slab allocator using
4k regions with a page span cache; we use a port of it to Android
with added hardening features, but we're missing the fine-grained
mmap randomization it's meant to have underneath what it does itself).

The default vm.max_map_count = 65530 is also a major problem for doing
fine-grained mmap randomization of any kind and there's the 32-bit
reference count overflow issue on high memory machines with
max_map_count * pid_max which isn't resolved yet.


Re: [RFC] Warn the user when they could overflow mapcount

2018-02-08 Thread Daniel Micay
I think there are likely legitimate programs mapping something a bunch of times.

Falling back to a global object -> count mapping (an rbtree / radix
trie or whatever) with a lock once it hits saturation wouldn't risk
breaking something. It would permanently leave the inline count
saturated and just use the address of the inline counter as the key
for the map to find the 64-bit counter. Once it gets to 0 in the map,
it can delete it from the map and do the standard freeing process,
avoiding leaks. It would really just make it a 64-bit reference count
heavily size optimized for the common case. It would work elsewhere
too, not just this case.
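
A sketch of that shape of counter, with the global map's helpers assumed
rather than shown; a real version also has to fold the inline count into
the map exactly once at the saturation handoff, which is elided here:

#include <limits.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define SATURATED UINT_MAX /* inline value is pinned here forever */

static pthread_mutex_t overflow_lock = PTHREAD_MUTEX_INITIALIZER;

/* Assumed helpers: an rbtree / radix trie keyed by the address of the
 * inline counter, holding 64-bit counts. */
uint64_t *overflow_find_or_create(void *key);
void overflow_delete(void *key);

void ref_inc(atomic_uint *count)
{
    unsigned int old = atomic_load(count);

    while (old < SATURATED - 1) {
        if (atomic_compare_exchange_weak(count, &old, old + 1))
            return; /* common case: cheap inline increment */
    }
    /* Saturated: leave the inline count at SATURATED and track the real
     * count in the global map under the lock. */
    atomic_store(count, SATURATED);
    pthread_mutex_lock(&overflow_lock);
    (*overflow_find_or_create(count))++;
    pthread_mutex_unlock(&overflow_lock);
}

/* ref_dec() would mirror this, deleting the map entry and doing the
 * standard freeing process once the 64-bit count reaches zero. */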


Re: [RFC] Warn the user when they could overflow mapcount

2018-02-08 Thread Daniel Micay
I guess it could saturate and then switch to tracking the count via an
object pointer -> count mapping with a global lock? Whatever the
solution is should probably be a generic one since it's a recurring
issue.


Re: [RFC] Warn the user when they could overflow mapcount

2018-02-08 Thread Daniel Micay
I don't think the kernel can get away with the current approach.
Object sizes and counts on 64-bit should be 64-bit unless there's a
verifiable reason they can get away with 32-bit. Having it leak
memory isn't okay, just much less bad than vulnerabilities exploitable
beyond just denial of service.

Every 32-bit reference count should probably have a short comment
explaining why it can't overflow on 64-bit... if that can't be written
or it's too complicated to demonstrate, it probably needs to be
64-bit. It's one of many pervasive forms of integer overflows in the
kernel... :(


Re: [RFC] Warn the user when they could overflow mapcount

2018-02-08 Thread Daniel Micay
>> That seems pretty bad.  So here's a patch which adds documentation to the
>> two sysctls that a sysadmin could use to shoot themselves in the foot,
>> and adds a warning if they change either of them to a dangerous value.
>
> I have negative feelings about this patch, mostly because AFAICS:
>
>  - It documents an issue instead of fixing it.
>  - It likely only addresses a small part of the actual problem.

The standard max_map_count / pid_max are very low and there are many
situations where either or both need to be raised.

VM fragmentation in long-lived processes is a major issue. There are
allocators like jemalloc designed to minimize VM fragmentation by
never unmapping memory but they're relying on not having anything else
using mmap regularly so they can have all their ranges merged
together, unless they decide to do something like making a 1TB
PROT_NONE mapping up front to slowly consume. If you Google this
sysctl name, you'll find lots of people running into the limit. If
you're using a debugging / hardened allocator designed to use a lot of
guard pages, the standard max_map_count is close to unusable...

I think the same thing applies to pid_max. There are too many
reasonable reasons to increase it. Process-per-request is quite
reasonable if you care about robustness / security and want to sandbox
each request handler. Look at Chrome / Chromium: it's currently
process-per-site-instance, but they're moving to having more processes
with site isolation to isolate iframes into their own processes to
work towards enforcing the boundaries between sites at a process
level. It's way worse for fine-grained server-side sandboxing. Using a
lot of processes like this does counter VM fragmentation especially if
long-lived processes doing a lot of work are mostly avoided... but if
your allocators like using guard pages you're still going to hit the
limit.

I do think the default value in the documentation should be fixed but
if there's a clear problem with raising these it really needs to be
fixed. Google either of the sysctl names and look at all the people
running into issues and needing to raise them. It's only going to
become more common to raise these with people trying to use lots of
fine-grained sandboxing. Process-per-request is back in style.


Re: [kernel-hardening] Re: [PATCHv2 5/7] printk: allow kmsg to be encrypted using public key encryption

2018-01-16 Thread Daniel Micay
> Do you have any backing from makers of such devices? I'd like to hear
> from Google's Android team or whoever that would turn this feature on.

(I'm not a Google employee, but I work on Android security and
contribute some of that to AOSP.)

Android restricts access to dmesg with SELinux, so it's not really an
issue there. They previously used dmesg_restrict but switched to
SELinux for fine-grained control without using the capability. In
general, the access control features / toggles proposed on
kernel-hardening don't do much for Android because there's nothing
unconfined and there's no system administrator reconfiguring the
system and disabling SELinux to make it work. The toggles like
dmesg_restrict tend to be too coarse without a good way to make
exceptions.

Unprivileged processes including apps can't access dmesg, debugfs,
sysfs (other than a tiny set of exceptions), procfs outside of
/proc/net and /proc/PID (but it uses hidepid=2) and a few whitelisted
files, etc. That's mostly done with SELinux along with using it for
ioctl command whitelisting to have per-device per-domain whitelists of
commands instead of using their global seccomp-bpf policy for it. The
hidepid=2 usage is an example of a feature where SELinux doesn't quite
fit the task as an alternative since apps are all in the untrusted_app
/ isolated_app domains but each have a unique uid/gid. It works
because hidepid has a gid option to set fine-grained exceptions,
rather than only being able to bypass it with a capability.
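
For instance, the Android-style /proc setup described above looks roughly
like this (3009 is Android's AID_READPROC group; treat the exact gid and
flags as assumptions here):

#include <sys/mount.h>

/* Remount /proc with hidepid=2 so processes only see their own
 * /proc/PID entries, while members of gid 3009 keep full visibility. */
int harden_proc(void)
{
    return mount("proc", "/proc", "proc",
                 MS_REMOUNT | MS_NOSUID | MS_NODEV | MS_NOEXEC,
                 "hidepid=2,gid=3009");
}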

They do have some out-of-tree access control features and finding ways
to upstream those might be useful:

There's the added perf_event_paranoid=3 level, which is set by default.
It's toggled off when a developer uses profiling tools via Android
Debug Bridge (USB) after enabling ADB support in the OS and
whitelisting their key. That was submitted upstream again for Android
but it didn't end up landing. An LSM hook for perf events might help
but Android uses a fully static SELinux policy and would need a way to
toggle it off / on.

They also have group-based control over sockets used to implement the
INTERNET permission but technically they could double the number of
app domains and use SELinux for that if they absolutely had to avoid
an out-of-tree patch. They might not design it the same way today
since it predates using SELinux.


Re: [kernel-hardening] Re: [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules

2017-11-30 Thread Daniel Micay
It was suggested that the feature would only be adopted in niches like
Android and I pointed out that it's not really relevant to Android.

It's a waste of time to try convincing me that it's useful elsewhere. I
never said or implied that it wasn't.

On Thu, 2017-11-30 at 09:50 +0100, Djalal Harouni wrote:
> On Thu, Nov 30, 2017 at 7:51 AM, Daniel Micay <danielmi...@gmail.com>
> wrote:
> [...]
> > Lots of potential module attack surface also gets eliminated by
> > default
> > via their SELinux whitelists for /dev, /sys, /proc, debugfs, ioctl
> > commands, etc. The global seccomp whitelist might be relevant in
> > some
> > cases too.
> 
> In embedded systems we can't maintain an SELinux policy; distro
> manpower can hardly manage it. We have abstracted seccomp etc., but
> the kernel inherited the difficult multiplexed things, plus all the
> other paths that trigger this.

It's cheaper to use an existing system like Android Things where device
makers only need to make their apps and perhaps some userspace hardware
drivers for cases not covered by mainline kernel drivers. I don't think
it makes sense for every device vendor to manage an OS and I seriously
doubt that's how the ecosystem is going to end up as it matures.

> > Android devices like to build everything into the kernel too, so
> > even if
> > they weren't using a module this feature wouldn't usually help them.
> > It
> > would need to work like this existing sysctl:
> > 
> > net.ipv4.tcp_available_congestion_control = cubic reno lp
> > 
> > i.e. whitelists for functionality offered by the modules, not just
> > whether they can be loaded.
> 
> Yes, but it is hard to maintain a whitelist policy, the code is hardly
> maintained... if you include everything you should have an LSM policy
> or something like that, and compiling kernels is expert thing.

I'm not talking about whitelist vs. blacklist, compiling kernels or
anything like that.

> Otherwise IMHO the kernel should provide default secure behaviour on
> how to load or add new functionality to the running one. From a user
> perspective, a switch "yes/no" that a privileged entity will
> *understand* and assume is what should be there, and the switch or
> flag as discussed here is local to processes, the sysctl will be
> removed. IMO it should come from userspace point of view, cause as an
> example the sysctl:
> 
> net.ipv4.tcp_available_congestion_control = cubic reno lp
> 
> Is kernel thing, too technical, userspace developers, admins or
> privileged entity will not understand what cubic or reno mean.

Congestion control algorithms are being used as an example in terms of
the mechanism being used to control which are available to unprivileged
users. The obscurity of congestion control algorithms is irrelevant.

>  Doing
> the same per functionality directly like this seems to much of a
> burden compared to the use case. The kernel maybe can do this to
> advance the art of the networking stack and for advanced cases, but in
> IMHO a sane default behaviour + an abstracted process/sandbox flag
> "yes/no" for most others, userspace developers and humans is what
> should be provided and we need the kernel to help here.

There are cases where unprivileged module auto-loading is relied upon
like network protocols. Having configuration for which protocols can be
used by unprivileged users is superior to limiting only which ones can
be auto-loaded. That's why I brought up the existing congestion control
knob. It works well in terms of having a whitelist of the sane, widely
used cases, with anything obscure requiring configuration to expose.
They happen to be implemented as modules too.

Killing off unprivileged module loading other than a few cases like that
makes sense, and then those can provide similar control with similarly
sane defaults.
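
To make the request-level control concrete, here's a minimal userspace
sketch. The algorithm name is only an example; which names are
permitted for unprivileged users depends on the companion knob that
actually enforces the whitelist, net.ipv4.tcp_allowed_congestion_control:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        /* Fails with EPERM for an unprivileged process if "lp" is
         * loaded but not in the allowed list, and ENOENT if it isn't
         * available at all. */
        if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, "lp",
                       strlen("lp")) < 0)
                perror("setsockopt(TCP_CONGESTION)");
        close(fd);
        return 0;
}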


Re: [kernel-hardening] Re: [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules

2017-11-29 Thread Daniel Micay
> And once you disable it by default, and it becomes purely opt-in, that
> means that nothing will change for most cases. Some embedded people
> that do their own thing (ie Android) might change, but normal
> distributions probably won't.
> 
> Yes, Android may be 99% of the users, and yes, the embedded world in
> general needs to be secure, but I'd still like this to be something
> that helps _everybody_.

Android devices won't get much benefit since they ship a tiny set of
modules chosen for the device. The kernels already get very stripped
down to the bare minimum vs. enabling every feature and driver available
and shipping it all by default on a traditional distribution.

Lots of potential module attack surface also gets eliminated by default
via their SELinux whitelists for /dev, /sys, /proc, debugfs, ioctl
commands, etc. The global seccomp whitelist might be relevant in some
cases too.

Android devices like to build everything into the kernel too, so even if
they weren't using a module this feature wouldn't usually help them. It
would need to work like this existing sysctl:

net.ipv4.tcp_available_congestion_control = cubic reno lp

i.e. whitelists for functionality offered by the modules, not just
whether they can be loaded.


Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-06 Thread Daniel Micay
On Mon, 2017-11-06 at 16:14 -0600, Serge E. Hallyn wrote:
> Quoting Daniel Micay (danielmi...@gmail.com):
> > Substantial added attack surface will never go away as a problem.
> > There
> > aren't a finite number of vulnerabilities to be found.
> 
> There's varying levels of usefulness and quality.  There is code which
> I
> want to be able to use in a container, and code which I can't ever see
> a
> reason for using there.  The latter, especially if it's also in a
> staging driver, would be nice to have a toggle to disable.
> 
> You're not advocating dropping the added attack surface, only adding a
> way of dealing with an 0day after the fact.  Privilege raising 0days
> can
> exist anywhere, not just in code which only root in a user namespace
> can
> exercise.  So from that point of view, ksplice seems a more complete
> solution.  Why not just actually fix the bad code block when we know
> about it?

That's not what I'm advocating. I only care about it for proactive
attack surface reduction downstream. I have no interest in using it to
block access to known vulnerabilities.

> Finally, it has been well argued that you can gain many new caps from
> having only a few others.  Given that, how could you ever be sure
> that,
> if an 0day is found which allows root in a user ns to abuse
> CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them
> would suffice?

I didn't suggest using it that way...

>  It seems to me that the existing control in
> /proc/sys/kernel/unprivileged_userns_clone might be the better duct
> tape
> in that case.

There's no such thing as unprivileged_userns_clone in mainline.

The advantage of this over unprivileged_userns_clone in Debian and maybe
some other distributions is not giving up unprivileged app containers /
sandboxes implemented via user namespaces.  For example, Chromium's user
namespace sandbox likely only needs to have CAP_SYS_CHROOT. Chromium
will be dropping their setuid sandbox, forcing usage of user namespaces
to avoid losing the sandbox, which will greatly increase local kernel
attack surface on the host by exposing netfilter management, etc. to
unprivileged users.

The proposed approach isn't necessarily the best way to implement this
kind of mitigation but I think it's filling a real need.
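
As a minimal sketch of the attack surface increase being described
(not code from the thread): any unprivileged process can do the
following and end up holding a full capability set over the new
namespaces, reaching kernel code that used to be root-only:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
        /* A new user namespace plus a network namespace grants the
         * caller CAP_NET_ADMIN over the namespaced network stack,
         * exposing netfilter management and similar interfaces. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWNET) != 0) {
                perror("unshare");
                return 1;
        }
        puts("root-equivalent within the new namespaces");
        return 0;
}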


Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-06 Thread Daniel Micay
Substantial added attack surface will never go away as a problem. There
aren't a finite number of vulnerabilities to be found.


Re: [kernel-hardening] [PATCH v2 0/5] stackprotector: ascii armor the stack canary

2017-09-19 Thread Daniel Micay
> Brad trolls us all lightly with this trivia question:
> 
> https://twitter.com/grsecurity/status/905246423591084033

I'll respond to your proposed scenario rather than guessing at what is
being suggested there and if it's actually the same thing as what you've
brought up.

They've stated many times that stack canaries are near useless and
enabling PaX UDEREF on i386 actually disabled stack canaries and
presumably still does after the last public patches:

+   select HAVE_CC_STACKPROTECTOR   if X86_64 || !PAX_MEMORY_UDEREF

Making the kernel more vulnerable to remote stack overflow exploits like
this month's overhyped Bluetooth bug to improve security against local
exploits seems like much more of a compromise. Not that i386 has any
real relevance anymore, but I think that's some interesting context...

It's not like upstream has a monopoly on needing to make hard choices
for security features; I don't see this as one of those hard choices,
like whether sacrificing availability for SIZE_OVERFLOW makes sense.

> "For #trivia can you describe one scenario where this change actually
> helps exploitation using similar C string funcs?"
> 
> I suppose the expected answer is:
> 
> The change helps exploitation when the overwriting string ends just
> before the canary.  Its NUL overwriting the NUL byte in the canary
> would
> go undetected.  Before this change, there would be a 255/256 chance of
> detection.
>
> I hope this was considered.  The change might still be a good
> tradeoff,
> or it might not, depending on which scenarios are more likely (leak of
> canary value or the string required in an exploit ending at that exact
> byte location), and we probably lack such statistics.

It was considered that guaranteeing some forms of string overflows are
contained within the frame can have a negative impact but I don't think
it's significant.

There would need to be something worth overwriting between the source of
the overflow and the canary. Since SSP reorders local variables within
the stack frame based on risk of overflow, the targets would be locals
that Clang/GCC considers to have a higher risk of overflow. An array can
only have other arrays between it and the canary and there's ordering by
size (small vs. large at least) within the array section. If there was a
zero byte in any of that other data between the source of the overflow
and the canary, this wouldn't make a difference anyway.

Since the canary is only checked on return, it's also only relevant if
control flow still makes it to the function epilogue and it still
matters after the attacker has accomplished their goal.

The chance of there being a zero as a leading byte is already 1/256, but
there could also be a NUL elsewhere in the canary compromising a subset
of the entropy for this kind of scenario too so it's a tiny bit higher
than a 1/256 chance already. Not much resistance to brute force if it
really does ever end up mattering, unless canaries with NUL in leading
positions were actually *rejected*...
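
For reference, the change being discussed boils down to masking off one
byte of the random value, along these lines (a sketch of the approach,
not the exact kernel code):

/* Force a zero byte into the canary. On little-endian this is the
 * first byte in memory order, so a str* overflow can't rewrite the
 * canary and keep going past it undetected. */
unsigned long canary = get_random_long() & ~0xffUL;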

There's a more substantial compromise between zero and non-zero
poison-on-free, but that's a production-use question. This was part of
a set of changes where CopperheadOS switched from non-zero fill to zero
fill on free for the kernel and userspace and added a leading zero byte
on 64-bit for heap and stack canaries in the production builds, while
keeping the old behavior for some forms of debug builds. Non-zero fill
on free ended up causing too much harm by turning benign bugs into bugs
that might have been exploitable beyond DoS: in some cases tied to
non-NUL-terminated strings, the non-zero poisoning got rid of zeroes
that were otherwise all over the heap containing them, and in others it
broke code wrongly depending on uninitialized data being zero in
practice. I think there's more of an argument to be had about
poison-on-free as a mitigation. This leading zero byte? Not so much,
beyond perhaps wanting the option to turn it off for bug finding... but
there are better tools for that now.

> I am not proposing to revert the change.  I had actually contemplated
> speaking up when this was discussed, but did not for lack of a better
> suggestion.  We could put/require a NUL in the middle of the canary,
> but
> with the full canary being only 64-bit at most that would also make
> some
> attacks easier.
> 
> So this is JFYI.  No action needed on it, I think.

It might make sense to put it in the middle, but a scenario where it
matters seems unlikely and, as you mention, it would make it weaker in
some other cases. I think it's already the best choice right now.

I'd choose XOR canaries over a leading zero byte if it could only be one
or the other, but I'm not really convinced that there's any significant
loss compared to the normal canaries.


Re: nios2 crash due to 'init/main.c: extract early boot entropy from the passed cmdline'

2017-09-11 Thread Daniel Micay
On Mon, 2017-09-11 at 10:35 -0700, Guenter Roeck wrote:
> On Mon, Sep 11, 2017 at 09:36:00AM -0700, Kees Cook wrote:
> > On Sat, Sep 9, 2017 at 8:58 PM, Guenter Roeck 
> > wrote:
> > > Hi,
> > > 
> > > I noticed that nios2 images crash in mainline. Bisect points to
> > > commit
> > > 33d72f3822d7 ("init/main.c: extract early boot entropy from the
> > > passed
> > > cmdline").  Bisect log is attached.
> > > 
> > > As far as I can see, the problem is seen because
> > > add_device_randomness()
> > > calls random_get_entropy(). However, the underlying timer function
> > > used by the nios2 architecture (nios2_timer_read) is not yet
> > > initialized,
> > > causing a NULL pointer access and crash. A sample crash log is at
> > > http://kerneltests.org/builders/qemu-nios2-master/builds/175/steps/qemubuildcommand/logs/stdio
> > 
> > Oh, yikes. Do you have a full call trace? (Does this come through
> > get_cycles() or via the It seems like we could either initialize the
> > timer earlier or allow it to fall back when not initialized...
> > 
> 
> nios2 doesn't give me a traceback. I followed it by adding debug
> messages.
> The code path is through get_cycles().
> 
> On nios2:
> 
> static u64 nios2_timer_read(struct clocksource *cs)
> {
>   struct nios2_clocksource *nios2_cs = to_nios2_clksource(cs);
>   unsigned long flags;
>   u32 count;
> 
>   local_irq_save(flags);
>   count = read_timersnapshot(&nios2_cs->timer);   // <- not
> initialized
>   local_irq_restore(flags);
> 
>   /* Counter is counting down */
>   return ~count;
> }
> 
> cycles_t get_cycles(void)
> {
> return nios2_timer_read(&nios2_cs.cs);
> }
> EXPORT_SYMBOL(get_cycles);
> 
> Guenter

Maybe it should WARN and return 0 for now if that's NULL?
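
Something along these lines, in other words (a sketch of the suggested
guard with a made-up flag name, not the actual fix):

static bool nios2_timer_initialized;

cycles_t get_cycles(void)
{
        if (!nios2_timer_initialized) {
                WARN_ON_ONCE(1);
                return 0;
        }
        return nios2_timer_read(&nios2_cs.cs);
}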


Re: [PATCHv3 2/2] extract early boot entropy from the passed cmdline

2017-08-17 Thread Daniel Micay
> I did say 'external attacker' but it could be made clearer.

Er, s/say/mean to imply/

I do think it will have some local value after Android 8 which should
start shipping in a few days though.

I'll look into having the kernel stash some entropy in pstore soon since
that seems like it could be a great improvement. I'm not sure how often
/ where it should hook into for regularly refreshing it though. Doing it
only on powering down isn't ideal.


Re: [kernel-hardening] [PATCHv2 2/2] extract early boot entropy from the passed cmdline

2017-08-16 Thread Daniel Micay
On Wed, 2017-08-16 at 21:58 -0700, Kees Cook wrote:
> On Wed, Aug 16, 2017 at 9:56 PM, Nick Kralevich <n...@google.com>
> wrote:
> > On Wed, Aug 16, 2017 at 3:46 PM, Laura Abbott <labb...@redhat.com>
> > wrote:
> > > From: Daniel Micay <danielmi...@gmail.com>
> > > 
> > > Existing Android bootloaders usually pass data useful as early
> > > entropy
> > > on the kernel command-line. It may also be the case on other
> > > embedded
> > > systems. Sample command-line from a Google Pixel running
> > > CopperheadOS:
> > > 
> > 
> > Why is it better to put this into the kernel, rather than just rely
> > on
> > the existing userspace functionality which does exactly the same
> > thing? This is what Android already does today:
> > https://android-review.googlesource.com/198113
> 
> That's too late for setting up the kernel stack canary, among other
> things. The kernel will also be generating some early secrets for slab
> cache canaries, etc. That all needs to happen well before init is
> started.
> 
> -Kees
> 

It's also unfortunately the kernel's global stack canary for the entire
boot since unlike x86 there aren't per-task canaries. GCC / Clang access
it via a segment register on x86 vs. a global on other architectures.
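
Concretely, on the architectures without a segment register for this,
the compiler-emitted checks compare against a single global, so one
value covers every task for the whole boot. A simplified model of what
GCC / Clang generate (not real kernel code):

extern unsigned long __stack_chk_guard; /* one value, whole boot */
extern void __stack_chk_fail(void);

void example(void)
{
        unsigned long canary = __stack_chk_guard;  /* prologue */
        /* ... function body with stack buffers ... */
        if (canary != __stack_chk_guard)           /* epilogue */
                __stack_chk_fail();
}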

In theory it could be task-local elsewhere but doing it efficiently
would imply reserving a register to store the random value. I think that
may actually end up helping performance more than it hurts by not
needing to read the global stack canary value from cache repeatedly. If
stack canaries were augmented into something more (XOR in the retaddr
and offer the option of more coverage than STRONG) it would be more
important.


Re: [PATCHv3 2/2] extract early boot entropy from the passed cmdline

2017-08-16 Thread Daniel Micay
On Wed, 2017-08-16 at 23:31 -0400, Theodore Ts'o wrote:
> On Wed, Aug 16, 2017 at 04:14:58PM -0700, Laura Abbott wrote:
> > From: Daniel Micay <danielmi...@gmail.com>
> > 
> > Existing Android bootloaders usually pass data useful as early
> > entropy
> > on the kernel command-line. It may also be the case on other
> > embedded
> > systems.
> 
> May I suggest a slight adjustment to the beginning commit description?
> 
>Feed the boot command-line as to the /dev/random entropy pool
> 
>Existing Android bootloaders usually pass data which may not be
>known by an external attacker on the kernel command-line.  It may
>also be the case on other embedded systems.  Sample command-line
>from a Google Pixel running CopperheadOS
> 
> The idea here is to if anything, err on the side of under-promising
> the amount of security we can guarantee that this technique will
> provide.  For example, how hard is it really for an attacker who has
> an APK installed locally to get the device serial number?  Or the OS
> version?  And how much variability is there in the bootloader stages
> in milliseconds?

The serial number is currently accessible to local apps up until Android
7.x so it doesn't have value if the adversary has local access. Access
to it without the READ_PHONE_STATE permission is being removed for apps
targeting Android 8.0 and will presumably be restricted for all apps at
some point in the future:

https://android-developers.googleblog.com/2017/04/changes-to-device-identifiers-in.html

Some bootloader stages vary a bit in time each boot. There's not much
variance or measurement precision so there's only a small amount of
entropy from this. The ones that consistently vary in timing do so
independently from each other so that helps a bit. Also worth noting
that before Android 8.0+, local apps can access the boot times since
it's written to a system property. After Android 8.0+, all that stuff is
inaccessible to them (no permission to get them) since there's a
whitelisting model for system property access.

> I think we should definitely do this.  So this is more of a request to
> be very careful what we promise in the commit description, not an
> objection to the change itself.

I did say 'external attacker' but it could be made clearer. It's
primarily aimed at getting a tiny bit of extra entropy for the kernel
stack canary and other probabilistic exploit mitigations set up in early
boot. On non-x86 archs, i.e. 99.9% of Android devices, the kernel stack
canary remains the same after it's set up in that early boot code.
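
The mechanism under discussion amounts to something like this very
early in start_kernel() (a sketch of the patch's effect, not a verbatim
quote of it):

/* Mix the raw command line into the entropy pool before the stack
 * canary and other early secrets are generated. */
add_device_randomness(command_line, strlen(command_line));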

Android devices almost all have a hardware RNG and Android init blocks
until a fair bit of data is read from it along with restoring entropy
that's regularly saved while running, but unfortunately that's not
available at this point in the boot process.

The kernel could save / restore entropy using pstore (which at least
Nexus / Pixel devices have - not sure about others). I don't know how
early that could feasibly be done. Ideally it would do that combined
with early usage of the hwrng.


Re: drivers/tty/serial/8250/8250_fintek.c:364: warning: 'probe_data' is used uninitialized in this function

2017-08-09 Thread Daniel Micay
On Wed, 2017-08-09 at 17:32 +0200, Arnd Bergmann wrote:
> On Wed, Aug 9, 2017 at 5:07 PM, kbuild test robot wrote:
> > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > head:   bfa738cf3dfae2111626650f86135f93c5ff0a22
> > commit: 6974f0c4555e285ab217cee58b6e874f776ff409
> > include/linux/string.h: add the option of fortified string.h
> > functions
> > date:   4 weeks ago
> > config: x86_64-randconfig-v0-08092220 (attached as .config)
> > compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
> > reproduce:
> > git checkout 6974f0c4555e285ab217cee58b6e874f776ff409
> > # save the attached .config to linux build tree
> > make ARCH=x86_64
> > 
> > All warnings (new ones prefixed by >>):
> > 
> >In file included from include/linux/bitmap.h:8,
> > from include/linux/cpumask.h:11,
> > from arch/x86/include/asm/cpumask.h:4,
> > from arch/x86/include/asm/msr.h:10,
> > from arch/x86/include/asm/processor.h:20,
> > from arch/x86/include/asm/cpufeature.h:4,
> > from arch/x86/include/asm/thread_info.h:52,
> > from include/linux/thread_info.h:37,
> > from arch/x86/include/asm/preempt.h:6,
> > from include/linux/preempt.h:80,
> > from include/linux/spinlock.h:50,
> > from include/linux/seqlock.h:35,
> > from include/linux/time.h:5,
> > from include/linux/stat.h:18,
> > from include/linux/module.h:10,
> > from drivers/tty/serial/8250/8250_fintek.c:11:
> >include/linux/string.h: In function 'strcpy':
> >include/linux/string.h:209: warning: '__f' is static but
> > declared in inline function 'strcpy' which is not static
> >include/linux/string.h:211: warning: '__f' is static but
> > declared in inline function 'strcpy' which is not static
> 
> 
> This clearly comes from __trace_if() when CONFIG_PROFILE_ALL_BRANCHES
> is enabled. I did not see the warning with gcc-7.1.1, and I guess this
> only
> happens on older compilers like the gcc-4.4 that was used here.
> 
> What is the reason for __FORTIFY_INLINE to be "extern __always_inline"
> rather than "static __always_inline"? If they cannot just be 'static',
> maybe
> this can be changed to depend on the compiler version.
> 
>Arnd

It's important to get the semantics of using extern. It means that if
you take the address of one of these functions, it resolves to the
address of the direct symbol instead of generating a useless thunk for
that object file. It might also be required for Clang compatibility, I
don't remember.
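
A reduced model of the declaration style in question (an assumed shape
based on how __FORTIFY_INLINE behaves, not copied from string.h):

/* With extern + gnu_inline, no out-of-line copy is emitted per
 * object file and &strcpy resolves to the ordinary external symbol;
 * a static inline could instead emit a useless local copy. */
extern inline __attribute__((gnu_inline, always_inline))
char *strcpy(char *p, const char *q)
{
        /* ... compile-time and runtime size checks ... */
        return __builtin_strcpy(p, q);
}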

It could have a compiler version dependency or maybe one specifically
tied to old compiler && CONFIG_PROFILE_ALL_BRANCHES / other options that
conflict with it like that.


Re: [PATCH] infiniband: avoid overflow warning

2017-07-31 Thread Daniel Micay
On Mon, 2017-07-31 at 14:18 -0700, Kees Cook wrote:
> On Mon, Jul 31, 2017 at 2:10 PM, Arnd Bergmann  wrote:
> > On Mon, Jul 31, 2017 at 10:58 PM, Kees Cook 
> > wrote:
> > > On Mon, Jul 31, 2017 at 12:30 AM, Arnd Bergmann 
> > > wrote:
> > > > On Mon, Jul 31, 2017 at 9:08 AM, Moni Shoua 
> > > > wrote:
> > > > > > break;
> > > > > > default:
> > > > > > return -EINVAL;
> > > > > 
> > > > > what happens if you replace 16 with sizeof(struct in6_addr)?
> > > > 
> > > > Same thing: the problem is that gcc already knows the size of
> > > > the structure we
> > > > pass in here, and it is in fact shorter.
> > > 
> > > So gcc is ignoring both the cast (to 16 byte struct in6_addr) and
> > > the
> > > caller's actual 128 byte struct sockaddr_storage, and looking only
> > > at
> > > struct sockaddr? That seems really weird.
> > 
> > Using a sockaddr_storage on the stack would address the warning, but
> > the question was about just changing the hardcoded 16 to a sizeof()
> > operation, and that has no effect.
> 
> Right, I didn't mean that; I was curious why the fortify macro
> resulted in an error at all. The callers are casting from struct
> sockaddr_storage (large enough) to struct sockaddr (not large enough),
> and then the inline is casting back to sockaddr_in6 (large enough). I
> would have expected fortify to check either sockaddr_storage or
> sockaddr_in6, but not sockaddr.
> 
> -Kees
> 

I don't think that's quite what's happening. It will report unknown if
it can't find allocation sites or other guarantees of size. There are no
alloc_size markers yet so it *mostly* does stack / global checks. It
won't infer sizes based on pointer types. It's not a heuristic.
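
A sketch of what the checks actually key off (simplified from the
fortify pattern, not the exact kernel code):

void *checked_memcpy(void *p, const void *q, size_t size)
{
        /* (size_t)-1 means the compiler couldn't see the allocation,
         * e.g. heap memory without alloc_size markers, so no check
         * fires - sizes are never inferred from the pointer type. */
        size_t p_size = __builtin_object_size(p, 0);

        if (p_size != (size_t)-1 && size > p_size)
                fortify_panic(__func__);
        return __builtin_memcpy(p, q, size);
}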

However, fortify / -fsanitize=object-size can indirectly uncover other
forms of undefined behavior. Code may rely on doing something undefined
that GCC actively assumes can't happen for optimization. The
compile-time errors can have false positives due to dead code, but if
the code is actually called with a size that it rejects as invalid,
there is almost certainly a bug.


Re: [PATCH] infiniband: avoid overflow warning

2017-07-31 Thread Daniel Micay
On Mon, 2017-07-31 at 21:19 +0200, Arnd Bergmann wrote:
> On Mon, Jul 31, 2017 at 6:17 PM, Bart Van Assche wrote:
> > On Mon, 2017-07-31 at 18:04 +0200, Arnd Bergmann wrote:
> > > On Mon, Jul 31, 2017 at 5:32 PM, Bart Van Assche wrote:
> > > > So inetaddr_event() assigns AF_INET so .sin_family and gcc warns
> > > > about code
> > > > that is only executed if .sin_family == AF_INET6? Since this
> > > > warning is the
> > > > result of incorrect interprocedural analysis by gcc, shouldn't
> > > > this be
> > > > reported as a bug to the gcc authors?
> > > 
> > > I think the interprocedural analysis here is just a little worse
> > > than it could
> > > be, but it's not actually correct.  It's not gcc that prints the
> > > warning (if
> > > it did, then I'd agree it would be a gcc bug) but the warning is
> > > triggered
> > > intentionally by the fortified version of memcpy in
> > > include/linux/string.h.
> > > 
> > > The problem as I understand it is that gcc cannot guarantee that
> > > it
> > > tracks the value of addr->sa_family at  least as far as the size
> > > of the
> > > stack object, and it has no strict reason to do so, so the inlined
> > > rdma_ip2gid() will still contain both cases.
> > 
> > Hello Arnd,
> > 
> > Had you already considered to uninline the rdma_ip2gid() function?
> 
> Not really, that would prevent the normal optimization from happening,
> so that would be worse than uninlining addr_event() as I tried.
> 
> It would of course get rid of the warning, so if you think that's a
> better
> solution, I won't complain.
> 
>Arnd

You could also use __memcpy but using a struct assignment seems cleaner.
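
For illustration, the struct assignment alternative looks like this
(variable names assumed):

/* A plain assignment copies exactly sizeof(struct in6_addr) and
 * gives the fortified memcpy nothing to object to, unlike a memcpy
 * with a hardcoded 16: */
dst->sin6_addr = src->sin6_addr;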

The compile-time fortify errors aren't perfect since they rely on GCC
being able to optimize them out and there can be dead code that has
intentional overflows, etc. Their purpose is just to move many runtime
errors (which don't have these false positives) to compile-time since
it's a lot less painful to deal with a few false positives like this
than errors slipping through to runtime (although it's less important
since it's going to be using WARN for the time being).

If the compile-time errors were removed, all of the real overflows
would still be detected at runtime. One option would be having something
like #define __NO_FORTIFY but *only* disabling the compile-time part of
the checks to work around false positives. *shrug*


Re: [PATCH] fortify: Use WARN instead of BUG for now

2017-07-27 Thread Daniel Micay
I think the 'else' added in the proposed patch makes it too complicated
for GCC to optimize out the __attribute__((error)) checks before they're
considered to be errors. It's not needed so it's probably best to just
avoid doing something like that. The runtime checks can't get false
positives from overly complex code but the compile-time ones depend on
GCC being able to reliably optimize them out.

This might be easier for GCC:

if (__builtin_constant_p(size) && condition_a) {
compiletimeerror();
}

if (__builtin_constant_p(size) && condition_b) {
compiletimeerror();
}

than the current:

if (__builtin_constant_p(size)) {
if (condition_a) {
compiletimeerror();
}

if (condition_b) {
compiletimeerror();
}
}

but it hasn't had a false positive like that with the current code.
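
Both shapes depend on the same mechanism: a call marked with the error
attribute only becomes a build failure if it survives optimization, so
the guarding conditions have to fold away at compile time. Roughly (the
kernel's actual helper names differ):

extern void compiletimeerror(void)
        __attribute__((error("detected buffer overflow")));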

Removing __noreturn makes the inline code more complex from GCC's
perspective too, but hopefully it's neither reducing coverage (i.e. not
making it less able to resolve __builtin_object_size - intuitively it
shouldn't impact it much, but you never know) nor making GCC unable to
deal with the compile-time checks.


Re: [PATCH] fortify: Use WARN instead of BUG for now

2017-07-26 Thread Daniel Micay
> Maybe we could do two phases? One to s/BUG/WARN/ and the second to
> improve the message?

s/fortify_panic/fortify_overflow/ + use WARN + remove __noreturn makes
sense as one commit. Still think the *option* of __noreturn + BUG should
be kept there even just for measuring the size overhead. !COMPILE_TEST
&& EXPERT if it needs to be for now. If you're fully removing __noreturn
then the entry in tools/objtool/check.c for __noreturn functions also
won't make sense (either way it needs to use the new name).

I think improving error messages should be done a bit differently though
and it'll be easier to not tie these things together.


Re: [PATCH] fortify: Use WARN instead of BUG for now

2017-07-26 Thread Daniel Micay
It should just be renamed from fortify_panic -> fortify_error, including
in arch/x86/boot/compressed/misc.c. It can use WARN instead of BUG, with
a 'default n', !COMPILE_TEST option to switch back to BUG. Otherwise it
needs to be patched downstream when that's wanted.

I don't think splitting it is the right approach to improving the
runtime error handling. That only makes sense for the compile-time
errors due to the limitations of __attribute__((error)). Can we think
about that before changing it? Just make it use WARN for now.

The best debugging experience would be passing along the sizes and
having the fortify_error function convert that into nice error messages.
For memcpy(p, q, n), n can be larger than both the detected sizes of p
and q, not just either one. The error should just be saying the function
name and printing the copy size and maximum sizes of p and q. That's
going to increase the code size too but I think splitting it will be
worse and it goes in the wrong direction in terms of complexity. It's
going to make future extensions / optimization harder if it's split.
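
To sketch the shape I mean (fortify_error is the proposed name; the
signature and message format here are illustrative, not existing code):

static void fortify_error(const char *name, size_t size,
			  size_t p_size, size_t q_size)
{
	WARN(1, "detected buffer overflow in %s: copy size %zu, "
	     "max dest size %zu, max src size %zu",
	     name, size, p_size, q_size);
}

Callers would pass __func__ plus the three sizes instead of today's
fortify_panic(__func__).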


Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

2017-07-25 Thread Daniel Micay
It was known that there were going to be bugs to work through, many of
them relatively benign like the leaks of data near string constants
(probably other string constants) in rodata. It makes sense to have it
default to WARN with BUG / noreturn as a non-default configuration
option for it, I guess with !COMPILE_TEST like UBSAN_SANITIZE_ALL. I
don't think there's any sane way to bound the length of either reads /
writes. It needs to either WARN + continue on into doing the overflow or
use BUG. Trying to correct it might make things worse and would make
this more complicated / bug-prone. It already has enough subtle edge
cases to deal with.

I think 'benign' is a better term than 'false positive' because there
hasn't been a non-bug found yet. They're mostly not security vulns but
they're undefined behavior.
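
In other words, the checks would end up looking something like this
(the config name here is hypothetical):

if (p_size < size) {
	if (IS_ENABLED(CONFIG_FORTIFY_SOURCE_BUG))
		BUG();
	WARN(1, "detected write overflow");
	/* fall through: the overflowing write still happens */
}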


Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

2017-07-19 Thread Daniel Micay
> So the fortify_string code has decided that only a single-byte (or
> empty) memcpy is ok.
> 
> And that, in turn, seems to be because we're copying from
> optprobe_template_entry, which is declared as
> 
> extern __visible kprobe_opcode_t optprobe_template_entry;
> 
> so the fortify code decides it's a single character.
> 
> Does just changing all those things to be declared as arrays fix
> things?

Yeah, that fixes it because GCC will consider the size of 'char foo[]'
unknown (i.e. (size_t)-1 from __builtin_object_size).

GCC doesn't know this essentially constant value at compile-time so it
wasn't a compile-time error:

#define TMPL_END_IDX \
	((long)&optprobe_template_end - (long)&optprobe_template_entry)

-fsanitize=object-size works the same way for pointer dereferences, so
replacing these declarations might fix some issues for
CONFIG_UBSAN_SANITIZE_ALL. I guess that's way too noisy at the moment,
thus the !COMPILE_TEST.
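
A standalone example of the difference (plain C, not kernel code):

#include <stdio.h>

extern char sym_scalar;   /* single char: known object size of 1 */
extern char sym_array[];  /* unknown-length array: no known size */

void show(void)
{
	/* prints 1: the whole object is assumed to be one byte */
	printf("%zu\n", __builtin_object_size(&sym_scalar, 0));
	/* prints 18446744073709551615, i.e. (size_t)-1: unknown,
	 * so the fortified functions skip the check entirely */
	printf("%zu\n", __builtin_object_size(sym_array, 0));
}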


Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c

2017-07-19 Thread Daniel Micay
> [8.134886]  arch_prepare_optimized_kprobe+0xd5/0x171
> [8.134886]  arch_prepare_optimized_kprobe+0xd5/0x171

Probably this:

/* Copy arch-dep-instance from template */
memcpy(buf, &optprobe_template_entry, TMPL_END_IDX);

Not a real bug, just technically undefined because these are u32:

typedef u32 kprobe_opcode_t;

extern __visible kprobe_opcode_t optprobe_template_entry;
extern __visible kprobe_opcode_t optprobe_template_val;
extern __visible kprobe_opcode_t optprobe_template_call;
extern __visible kprobe_opcode_t optprobe_template_end;
extern __visible kprobe_opcode_t optprobe_template_sub_sp;
extern __visible kprobe_opcode_t optprobe_template_add_sp;
extern __visible kprobe_opcode_t optprobe_template_restore_begin;
extern __visible kprobe_opcode_t optprobe_template_restore_orig_insn;
extern __visible kprobe_opcode_t optprobe_template_restore_end;

Could be switched to unknown size arrays like optprobe_template_entry[]
but it might be best to just mark the kprobe code with #define
__NO_FORTIFY.
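
The array alternative would look something like this (a sketch of the
declaration change, not a tested patch):

extern __visible kprobe_opcode_t optprobe_template_entry[];
extern __visible kprobe_opcode_t optprobe_template_end[];

/* no '&' needed anymore, and the object size becomes unknown, so the
 * fortified memcpy no longer flags it: */
memcpy(buf, optprobe_template_entry, TMPL_END_IDX);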


Re: [PATCH] replace incorrect strscpy use in FORTIFY_SOURCE

2017-07-14 Thread Daniel Micay
On Fri, 2017-07-14 at 16:51 -0700, Kees Cook wrote:
> On Fri, Jul 14, 2017 at 2:28 PM, Daniel Micay <danielmi...@gmail.com>
> wrote:
> > Using strscpy was wrong because FORTIFY_SOURCE is passing the
> > maximum
> > possible size of the outermost object, but strscpy defines the count
> > parameter as the exact buffer size, so this could copy past the end
> > of
> > the source. This would still be wrong with the planned usage of
> > __builtin_object_size(p, 1) for intra-object overflow checks since
> > it's
> > the maximum possible size of the specified object with no guarantee
> > of
> > it being that large.
> > 
> > Reuse of the fortified functions like this currently makes the
> > runtime
> > error reporting less precise but that can be improved later on.
> > 
> > Signed-off-by: Daniel Micay <danielmi...@gmail.com>
> 
> Thanks for fixing this! Linus, do you want to take this directly or
> have it go via -mm where fortify landed originally?
> 
> Acked-by: Kees Cook <keesc...@chromium.org>
> 
> As far as testing goes, was the NFS tree not in -next, or was a test
> not running against -next? I'm curious why it took until the NFS tree
> landed in Linus's tree for this to get noticed. Fortify was in -next
> for a while...
> 
> -Kees

Not sure but one issue is that v1 wasn't broken and that's what I most
heavily tested myself. I then switched to testing with intra-object size
checks on top of it, and there would have needed to be a case like this
to break with those stricter checks:

char a[2];
char b[3];
char *p = cond ? a : b;
strcpy(p, c);

__builtin_object_size(p, 0) / __builtin_object_size(p, 1) will both
return 3 there, which is wrong with that incorrect strscpy usage.

I wouldn't have found it via NFS since I've been testing with the string
(but not memory) functions switched to the stricter type 1.

I started on some test cases for FORTIFY_SOURCE but I was testing to
make sure it catches all the bugs it's supposed to catch, it's probably
a good idea to write some more generic string edge case tests too.

One of the somewhat subtle cases (which is handled properly already):

struct foo {
	char a[2];
	char b;
};

struct foo x;
x.a[0] = '1';
x.a[1] = '2';
x.b = '\0';

strlen(x.a); // BUG with stricter intra-object overflow checks on
strnlen(x.a, 3); // BUG with stricter intra-object overflow checks on
strnlen(x.a, 2); // no overflow

Anyway, it'd be really good if other people looked closely at these. I
wasn't sure what to do with test cases that I've made though. It seems
ones that are *supposed* to BUG should go in lkdtm, and should the other
tests just be together with those? Some should also pass w/o the intra
object overflow checks, but BUG with them enabled.
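
For example, a BUG-expected case would presumably end up looking
something like this in lkdtm (hypothetical test, not existing code):

void lkdtm_FORTIFY_INTRA_OBJECT_STRLEN(void)
{
	struct foo {
		char a[2];
		char b;
	} x = { { '1', '2' }, '\0' };

	/* x.a holds no NUL within its 2 bytes: with the intra-object
	 * (type 1) checks this strlen traps, while with type 0 it
	 * reads on into x.b and returns 2 */
	pr_info("strlen(x.a) = %zu\n", strlen(x.a));
}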


Re: [GIT PULL] Please pull NFS client changes for Linux 4.13

2017-07-14 Thread Daniel Micay
> I find "hardening" code that adds bugs to be particularly bad and
> ugly, the same way that I absolutely *hate* debugging code that turns
> out to make debugging impossible (we had that with the "better" stack
> tracing code that caused kernel panics to kill the machine entirely
> rather than show the backtrace, and I'm still bitter about it a decade
> after the fact).

Agree, it's very important for this code to be correct and the string
functions have some subtleties so it needs scrutiny. I messed up strcpy
between v1 and v2 trying to add a proper read overflow check. My fault
for not looking more closely at strscpy before adopting it based on my
misinterpretation of the API.

This is primarily a bug finding feature right now and it has gotten a
few bugs fixed that actually matter (most were unimportant memcpy reads
past the end of string constants, but not all).
like this strscpy misuse itself, but there will need to be some more
fixes for minor read overflows, etc. elsewhere in the tree before it'll
actually make sense as a hardening feature because it can turn a benign
read overflow into a DoS via BUG(). I think it will be fine for 4.13,
but I definitely wouldn't propose 'default y' for a while, even if there
was no performance cost (and there is).

Fix for this issue is here in case anyone just looks only at this thread
(realized I should have passed send-email a reply id):

http://marc.info/?l=linux-fsdevel&m=150006772418003&w=2


[PATCH] replace incorrect strscpy use in FORTIFY_SOURCE

2017-07-14 Thread Daniel Micay
Using strscpy was wrong because FORTIFY_SOURCE is passing the maximum
possible size of the outermost object, but strscpy defines the count
parameter as the exact buffer size, so this could copy past the end of
the source. This would still be wrong with the planned usage of
__builtin_object_size(p, 1) for intra-object overflow checks since it's
the maximum possible size of the specified object with no guarantee of
it being that large.

Reuse of the fortified functions like this currently makes the runtime
error reporting less precise but that can be improved later on.

Signed-off-by: Daniel Micay <danielmi...@gmail.com>
---
 include/linux/string.h | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/include/linux/string.h b/include/linux/string.h
index 96f5a5fd0377..049866760e8b 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -202,17 +202,6 @@ void __read_overflow2(void) __compiletime_error("detected read beyond size of ob
 void __write_overflow(void) __compiletime_error("detected write beyond size of object passed as 1st parameter");
 
 #if !defined(__NO_FORTIFY) && defined(__OPTIMIZE__) && defined(CONFIG_FORTIFY_SOURCE)
-__FORTIFY_INLINE char *strcpy(char *p, const char *q)
-{
-	size_t p_size = __builtin_object_size(p, 0);
-	size_t q_size = __builtin_object_size(q, 0);
-	if (p_size == (size_t)-1 && q_size == (size_t)-1)
-		return __builtin_strcpy(p, q);
-	if (strscpy(p, q, p_size < q_size ? p_size : q_size) < 0)
-		fortify_panic(__func__);
-	return p;
-}
-
 __FORTIFY_INLINE char *strncpy(char *p, const char *q, __kernel_size_t size)
 {
 	size_t p_size = __builtin_object_size(p, 0);
@@ -391,6 +380,18 @@ __FORTIFY_INLINE void *kmemdup(const void *p, size_t size, gfp_t gfp)
 	fortify_panic(__func__);
 	return __real_kmemdup(p, size, gfp);
 }
+
+/* defined after fortified strlen and memcpy to reuse them */
+__FORTIFY_INLINE char *strcpy(char *p, const char *q)
+{
+	size_t p_size = __builtin_object_size(p, 0);
+	size_t q_size = __builtin_object_size(q, 0);
+	if (p_size == (size_t)-1 && q_size == (size_t)-1)
+		return __builtin_strcpy(p, q);
+	memcpy(p, q, strlen(q) + 1);
+	return p;
+}
+
 #endif
 
 #endif /* _LINUX_STRING_H_ */
-- 
2.13.3



Re: [GIT PULL] Please pull NFS client changes for Linux 4.13

2017-07-14 Thread Daniel Micay
> The reason q_size isn't used is because it doesn't yet prevent read
> overflow. The commit message mentions that among the current
> limitations
> along with __builtin_object_size(ptr, 1).

Er rather: in strlcat, q_size is unused after the fast path because
strnlen obtains the constant again itself.



Re: [GIT PULL] Please pull NFS client changes for Linux 4.13

2017-07-14 Thread Daniel Micay
On Fri, 2017-07-14 at 13:50 -0700, Linus Torvalds wrote:
> On Fri, Jul 14, 2017 at 1:38 PM, Daniel Micay <danielmi...@gmail.com>
> wrote:
> > 
> > If strscpy treats the count parameter as a *guarantee* of the dest
> > size
> > rather than a limit,
> 
> No, it's a *limit*.
> 
> And by a *limit*, I mean that we know that we can access both source
> and destination within that limit.

FORTIFY_SOURCE needs to be able to pass a limit without implying that
there's a minimum. That's the distinction I was trying to make. It's
wrong to use anything where it's interpreted as a minimum here. Using
__builtin_object_size(ptr, 0) vs. __builtin_object_size(ptr, 1) doesn't
avoid the problem. __builtin_object_size(ptr, 1) returns a maximum among
the possible buffer sizes just like 0. It's just stricter, i.e. catches
intra-object overflow, which isn't desirable for the first take since it
will cause compatibility issues. There's code using memcpy,
copy_to_user, etc. to read / write multiple fields with a pointer to the
first one passed as the source / destination.

> > My initial patch used strlcpy there, because I wasn't aware of
> > strscpy
> > before it was suggested:
> 
> Since I'm looking at this, I note that the "strlcpy()" code is
> complete garbage too, and has that same
> 
>  p_size == (size_t)-1 && q_size == (size_t)-1
> 
> check which is wrong.  Of course, in strlcpy, q_size is never actually
> *used*, so the whole check seems bogus.

That check is only an optimization. __builtin_object_size always returns
a constant, and it's (size_t)-1 when no limit could be found.

The reason q_size isn't used is because it doesn't yet prevent read
overflow. The commit message mentions that among the current limitations
along with __builtin_object_size(ptr, 1).

> But no, strlcpy() is complete garbage, and should never be used. It is
> truly a shit interface, and anybody who uses it is by definition
> buggy.
> 
> Why? Because the return value of "strlcpy()" is defined to be ignoring
> the limit, so you FUNDAMENTALLY must not use that thing on untrusted
> source strings.
> 
> But since the whole *point* of people using it is for untrusted
> sources, it by definition is garbage.
> 
> Ergo: don't use strlcpy(). It's unbelievable crap. It's wrong. There's
> a reason we defined "strscpy()" as the way to do safe copies
> (strncpy(), of course, is broken for both lack of NUL termination
> _and_ for excessive NUL termination when a NUL did exist).

Sure, it doesn't prevent read overflow (but it's no worse than strcpy,
which is the point), which is why I said this:

"The fortified string functions should place a limit on reads from the
source. For strcat/strcpy, these could just be a variant of strlcat /
strlcpy using the size limit as a bound on both the source and
destination, with the return value only reporting whether truncation
occurred rather than providing the source length. It would be an easier
API for developers to use too and not that it really matters but it
would be more efficient for the case where truncation is intended."

That's why strscpy was suggested and I switched to that + updated the
commit message to only mention strcat, but it's wrong to use it here
because __builtin_object_size(p, 0) / __builtin_object_size(p, 1) are
only a guaranteed maximum length with no minimum guarantee.



Re: [GIT PULL] Please pull NFS client changes for Linux 4.13

2017-07-14 Thread Daniel Micay
> My initial patch used strlcpy there, because I wasn't aware of strscpy
> before it was suggested:
> 
> http://www.openwall.com/lists/kernel-hardening/2017/05/04/11
> 
> I was wrong to move it to strscpy. It could be switched back to
> strlcpy
> again unless the kernel considers the count parameter to be a
> guarantee
> that could be leveraged in the future. Using the fortified strlen +
> memcpy would provide the improvement that strscpy was meant to provide
> there over strlcpy.

Similarly, the FORTIFY_SOURCE strcat uses strlcat with the assumption
that the count parameter is a limit, not a guarantee for an optimization.
There's only a C implementation and it currently doesn't leverage that,
but if it's meant to be a guarantee then the strcat needs to be changed
too.

I'll make a fix moving both away from the existing functions.


Re: [GIT PULL] Please pull NFS client changes for Linux 4.13

2017-07-14 Thread Daniel Micay
On Fri, 2017-07-14 at 12:58 -0700, Linus Torvalds wrote:
> On Fri, Jul 14, 2017 at 12:43 PM, Andrey Ryabinin
>  wrote:
> > 
> > > yet when I look at the generated code for __ip_map_lookup, I see
> > > 
> > >    movl    $32, %edx       #,
> > >    movq    %r13, %rsi      # class,
> > >    leaq    48(%rax), %rdi  #, tmp126
> > >    call    strscpy         #
> > > 
> > > what's the bug here? Look at that third argument - %rdx. It is
> > > initialized to 32.
> > 
> > It's not a compiler bug, it's a bug in our strcpy().
> > Whoever wrote this strcpy() into strscpy() code apparently didn't
> > read carefully
> > enough gcc manual about __builtin_object_size().
> > 
> > Summary from https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking
> > .html :
> > 
> > __builtin_object_size(ptr, type) returns a constant number
> > of bytes from 'ptr' to the end of the object 'ptr'
> > pointer points to. "type" is an integer constant from 0 to
> > 3. If the least significant bit is clear, objects
> > are whole variables, if it is set, a closest surrounding
> > subobject is considered the object a pointer points to.
> > The second bit determines if maximum or minimum of remaining
> > bytes is computed.
> > 
> > We have type = 0 in strcpy(), so the least significant bit is clear.
> > So the 'ptr' is considered as a pointer to the whole
> > variable i.e. pointer to struct ip_map ip;
> > And the number of bytes from 'ip.m_class' to the end of the ip
> > object is exactly 32.
> > 
> > I suppose that changing the type to 1 should fix this bug.
> 
> Oh, that absolutely needs to be done.
> 
> Because that "strcpy() -> strscpy()" conversion really depends on that
> size being the right size (well, in this case minimal safe size) for
> the actual accesses, exactly because "strscpy()" is perfectly willing
> to write *past* the end of the destination string within that given
> size limit (ie it reads and writes in the same 8-byte chunks).
> 
> So if you have a small target string that is contained in a big
> object, then the "hardened" strcpy() code can actually end up
> overwriting things past the end of the strring, even if the string
> itself were to have fit in the buffer.
> 
> I note that every single use in string.h is buggy, and it worries me
> that __compiletime_object_size() does this too. The only user of that
> seems to be check_copy_size(), and now I'm a bit worried what that bug
> may have hidden.
> 
> I find "hardening" code that adds bugs to be particularly bad and
> ugly, the same way that I absolutely *hate* debugging code that turns
> out to make debugging impossible (we had that with the "better" stack
> tracing code that caused kernel panics to kill the machine entirely
> rather than show the backtrace, and I'm still bitter about it a decade
> after the fact).
> 
> There is something actively *evil* about it. Daniel, Kees, please jump
> on this.
> 
> Andrey, thanks for noticing this thing,
> 
>   Linus

The issue is the usage of strscpy then, not the __builtin_object_size
type parameter. The type is set to 0 rather than 1 to be more lenient by
not detecting intra-object overflow, which is going to come later.

If strscpy treats the count parameter as a *guarantee* of the dest size
rather than a limit, it's wrong to use it there, whether or not the type
parameter for __builtin_object_size is 0 or 1 since it can still return
a larger size. It's a limit with no guaranteed minimum.

My initial patch used strlcpy there, because I wasn't aware of strscpy
before it was suggested:

http://www.openwall.com/lists/kernel-hardening/2017/05/04/11

I was wrong to move it to strscpy. It could be switched back to strlcpy
again unless the kernel considers the count parameter to be a guarantee
that could be leveraged in the future. Using the fortified strlen +
memcpy would provide the improvement that strscpy was meant to provide
there over strlcpy.


Re: [kernel-hardening] Re: [PATCH v3] mm: Add SLUB free list pointer obfuscation

2017-07-06 Thread Daniel Micay
On Thu, 2017-07-06 at 10:55 -0500, Christoph Lameter wrote:
> On Thu, 6 Jul 2017, Kees Cook wrote:
> 
> > On Thu, Jul 6, 2017 at 6:43 AM, Christoph Lameter 
> > wrote:
> > > On Wed, 5 Jul 2017, Kees Cook wrote:
> > > 
> > > > @@ -3536,6 +3565,9 @@ static int kmem_cache_open(struct kmem_cache *s, unsigned long flags)
> > > >  {
> > > >  	s->flags = kmem_cache_flags(s->size, flags, s->name, s->ctor);
> > > >  	s->reserved = 0;
> > > > +#ifdef CONFIG_SLAB_FREELIST_HARDENED
> > > > +	s->random = get_random_long();
> > > > +#endif
> > > > 
> > > >  	if (need_reserve_slab_rcu && (s->flags & SLAB_TYPESAFE_BY_RCU))
> > > >  		s->reserved = sizeof(struct rcu_head);
> > > 
> > > So if an attacker knows the internal structure of data then he can
> > > simply
> > > dereference page->kmem_cache->random to decode the freepointer.
> > 
> > That requires a series of arbitrary reads. This is protecting
> > against
> > attacks that use an adjacent slab object write overflow to write the
> > freelist pointer. This internal structure is very reliable, and has
> > been the basis of freelist attacks against the kernel for a decade.
> 
> These reads are not arbitrary. You can usually calculate the page
> struct
> address easily from the address and then do a couple of loads to get
> there.

You're describing an arbitrary read vulnerability: an attacker able to
read the value of an address of their choosing. Requiring a powerful
additional primitive rather than only a small fixed size overflow or a
weak use-after-free vulnerability to use a common exploit vector is
useful.

A deterministic mitigation would be better, but I don't think an extra
slab allocator for hardened kernels would be welcomed. Since there isn't
a separate allocator for that niche, SLAB or SLUB are used. The ideal
would be bitmaps in `struct page` but that implies another allocator,
using single pages for the smallest size classes and potentially needing
to bloat `struct page` even with that.

There's definitely a limit to the hardening that can be done for SLUB,
but unless forking it into a different allocator is welcome that's what
will be suggested. Similarly, the slab freelist randomization feature is
a much weaker mitigation than it could be without these constraints
placed on it. This is much lower complexity than that and higher value
though...

> Ok so you get rid of the old attacks because we did not have that
> hardening in effect when they designed their approaches?
> 
> > It is a probabilistic defense, but then so is the stack protector.
> > This is a similar defense; while not perfect it makes the class of
> > attack much more difficult to mount.
> 
> Na I am not convinced of the "much more difficult". Maybe they will
> just
> have to upgrade their approaches to fetch the proper values to decode.

To fetch the values they would need an arbitrary read vulnerability or
the ability to dump them via uninitialized slab allocations as an extra
requirement.

An attacker can similarly bypass the stack canary by reading them from
stack frames via a stack buffer read overflow or uninitialized variable
usage leaking stack data. On non-x86, at least with SMP, the stack
canary is just a global variable that remains the same after
initialization too. That doesn't make it useless, although the kernel
doesn't have many linear overflows on the stack which is the real issue
with it as a mitigation. Despite that, most people are using kernels
with stack canaries, and that has a significant performance cost unlike
these kinds of changes.
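
For reference, the mechanism under discussion stores freelist pointers
roughly like this (a sketch of the idea, not the exact patch):

static inline void *freelist_ptr(const struct kmem_cache *s, void *ptr)
{
#ifdef CONFIG_SLAB_FREELIST_HARDENED
	/* a slab overflow can't forge a usable freelist pointer
	 * without knowing the per-cache s->random secret */
	return (void *)((unsigned long)ptr ^ s->random);
#else
	return ptr;
#endif
}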


Re: [PATCH v2] kref: Avoid null pointer dereference after WARN

2017-06-27 Thread Daniel Micay
On Tue, 2017-06-27 at 12:34 -0700, Kees Cook wrote:
> On Tue, Jun 27, 2017 at 12:26 PM, Jason A. Donenfeld 
> wrote:
> > On Tue, Jun 27, 2017 at 9:22 PM, Andi Kleen 
> > wrote:
> > > Who would actually set mman_min_addr incorrectly?
> > 
> > Historically there have been quite a few bypasses of mmap_min_addr,
> > actually. This is well-trodden ground.
> 
> Targeting things in /proc/sys via confused privileged helpers is
> extremely common. See Chrome OS pwn2own exploits (targetting modprobe
> sysctl), and plenty of others. Modern attack methodology is rarely a
> single-bug attack, but rather a chain of bugs, which may include
> producing or exploiting weak userspace configurations to soften the
> kernel.
> 
> Regardless, it's a fair point that checking this unconditionally is
> wasteful. Strangely this doesn't help:
> 
> -   BUG_ON(release == NULL);
> +   if (!__builtin_constant_p(release))
> +   BUG_ON(release == NULL);
> 
> When nearly all callers pass a function directly:
> 
> ...
> drivers/block/rbd.c:    kref_put(&spec->kref, rbd_spec_free);
> drivers/char/hw_random/core.c:  kref_put(&rng->ref, cleanup_rng);
> drivers/char/ipmi/ipmi_msghandler.c:    kref_put(&user->intf->refcount, intf_free);
> ...
> 
> Hmmm
> 
> -Kees

It doesn't mean the address is constant if there's a fixed function
being passed to it. It's not known at compile-time and if the code can
be relocated it's not known at link-time.

I don't personally care about checks like this but I split it out with
some others just because it was there already.

Clang has a nullability attribute which is similar to nonnull but it
doesn't cause UB when violated, so if GCC picked that up it could be
added all over the place as an annotation on parameters to trigger
warnings. There's a sanitizer for it, so it can be made to trap with
-fsanitize=nullability -fsanitize-trap=nullability.
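
For example, the annotation would look roughly like this on kref_put
(a sketch, assuming a clang build; this isn't current kernel code):

void kref_put(struct kref *kref,
	      void (* _Nonnull release)(struct kref *kref));

With -fsanitize=nullability -fsanitize-trap=nullability, passing a NULL
release would then trap at the call site instead of being silent UB.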


Re: [PATCH v2] binfmt_elf: Use ELF_ET_DYN_BASE only for PIE

2017-06-27 Thread Daniel Micay
On Tue, 2017-06-27 at 16:49 +0200, Michal Hocko wrote:
> On Wed 21-06-17 10:32:01, Kees Cook wrote:
> > The ELF_ET_DYN_BASE position was originally intended to keep loaders
> > away from ET_EXEC binaries. (For example, running "/lib/ld-
> > linux.so.2
> > /bin/cat" might cause the subsequent load of /bin/cat into where the
> > loader had been loaded.) With the advent of PIE (ET_DYN binaries
> > with
> > an INTERP Program Header), ELF_ET_DYN_BASE continued to be used
> > since
> > the kernel was only looking at ET_DYN. However, since
> > ELF_ET_DYN_BASE
> > is traditionally set at the top 1/3rd of the TASK_SIZE, a
> > substantial
> > portion of the address space is unused.
> > 
> > For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs
> > are loaded below the mmap region. This means they can be made to
> > collide
> > (CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with
> > pathological
> > stack regions. Lowering ELF_ET_DYN_BASE solves both by moving
> > programs
> > above the mmap region in all cases, and will now additionally avoid
> > programs falling back to the mmap region by enforcing MAP_FIXED for
> > program loads (i.e. if it would have collided with the stack, now it
> > will fail to load instead of falling back to the mmap region).
> 
> I do not understand this part. MAP_FIXED will simply unmap whatever
> was under the requested range, how it could help failing anything? So
> what would happen if something was mapped in that region, or is this
> impossible? Moreover MAP_FIXED close to stack will inhibit the stack
> gap
> protection.

I don't think there's a reason to use MAP_FIXED. PaX likely ignores the
address hint with RANDMMAP in that code, which would explain it there.


Re: [kernel-hardening] non-x86 per-task stack canaries

2017-06-26 Thread Daniel Micay
On Mon, 2017-06-26 at 14:04 -0700, Kees Cook wrote:
> Hi,
> 
> The stack protector functionality on x86_64 uses %gs:0x28 (%gs is the
> percpu area) for __stack_chk_guard, and all other architectures use a
> global variable instead. This means we never change the stack canary
> on non-x86 architectures which allows for a leak in one task to expose
> the canary in another task.
> 
> I'm curious what thoughts people may have about how to get this
> correctly implemented. Teaching the compiler about per-cpu data sounds
> exciting. :)
> 
> -Kees

arm64 has many integer registers so I don't think reserving one would
hurt performance, especially in the kernel where hot numeric loops
barely exist. It would reduce the cost of SSP by getting rid of the
memory read for the canary value. On the other hand, using per-cpu data
would likely be higher cost than the global. x86 has segment registers
but most archs probably need to do something more painful.

It's safe as long as it's a callee-saved register. It should be enforced
that there's no assembly spilling it and calling into C code without the
random canary. There's very little assembly using registers like x28 so
it wouldn't be that bad. It's possible there's one where nothing needs
to be changed, there only needs to be a check to make sure it stays that
way.

It would be a step towards making SSP cheap enough to expand it into a
feature like the StackGuard XOR canaries.

Samsung has a return address XOR feature based on reserving a register,
and while RAP's probabilistic return address mitigation isn't open
source, it was stated that it reserves a register on x86_64, where
registers aren't as plentiful as on arm64.
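
As a very rough sketch of the register reservation (hypothetical: it
assumes the whole kernel is built with -ffixed-x28 so the compiler never
allocates x28, and that the canary loads get redirected to the register,
which needs compiler support):

register unsigned long __stack_chk_guard_reg asm("x28");

static inline void switch_stack_canary(struct task_struct *next)
{
	/* set during context switch, before returning into C code */
	__stack_chk_guard_reg = next->stack_canary;
}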


Re: [kernel-hardening] [PATCH] [RFC] binfmt_elf: Use ELF_ET_DYN_BASE only for PIE

2017-06-21 Thread Daniel Micay
On Wed, 2017-06-21 at 10:28 -0700, Kees Cook wrote:
> On Wed, Jun 21, 2017 at 10:27 AM, Daniel Micay <danielmi...@gmail.com>
> wrote:
> > On Wed, 2017-06-21 at 10:23 -0700, Kees Cook wrote:
> > > On Wed, Jun 21, 2017 at 5:07 AM, Rik van Riel <r...@redhat.com>
> > > wrote:
> > > > On Tue, 2017-06-20 at 22:58 -0700, Kees Cook wrote:
> > > > > +/* This is the base location for PIE (ET_DYN with INTERP)
> > > > > loads.
> > > > > */
> > > > > +#define ELF_ET_DYN_BASE  0x400000UL
> > > > 
> > > > This value is good for 32 bit binaries, but for 64
> > > > bit binaries you probably want to put ELF_ET_DYN_BASE
> > > > at 4GB or higher.
> > > > 
> > > > The latter is necessary because Android uses the
> > > > lower 4GB of address space for its JVM runtime,
> > > > with 32 bit pointers inside that part of the otherwise
> > > > 64 bit address space.
> > > > 
> > > > In other words:
> > > > 
> > > > #define ELF_ET_DYN_BASE (mmap_is_ia32() ? 0x400000UL :
> > > > 0x100000000UL)
> > > 
> > > Ah, interesting. Okay, that should be fine. I'll adjust it.
> > > 
> > > > > +++ b/fs/binfmt_elf.c
> > > > > 
> > > > > +  * Therefore, programs are loaded offset from
> > > > > +  * ELF_ET_DYN_BASE and loaders are loaded into the
> > > > > +  * independently randomized mmap region (0 load_bias
> > > > > +  * without MAP_FIXED).
> > > > > +  */
> > > > > + if (elf_interpreter) {
> > > > > +         load_bias = ELF_ET_DYN_BASE;
> > > > > +         if (current->flags & PF_RANDOMIZE)
> > > > > +                 load_bias += arch_mmap_rnd();
> > > > > +         elf_flags |= MAP_FIXED;
> > > > > + } else
> > > > > +         load_bias = 0;
> > > > > +
> > > > > + load_bias -= vaddr;
> > > > 
> > > > I like this, and the big comment telling people how it
> > > > works :)
> > > 
> > > Thanks! It looks like your patch for commenting load_bias never
> > > got
> > > picked up, so I've added some more comments for that and some
> > > other
> > > things too. (Mostly for all the stuff I have to read a few times
> > > when
> > > I look at this code.)
> > > 
> > > -Kees
> > > 
> > 
> > The stack rlimit calculation fix for space potentially lost to ASLR
> > is
> > probably still needed too, right?
> 
> Yes. Was that picked up by akpm already?
> 
> -Kees

I think it was dropped when the ET_DYN changes were dropped.
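
(An annotated restatement of the hunk quoted above - same logic, with
illustrative comments; not new code:)

        if (elf_interpreter) {
                /* PIE with an interpreter: fixed base above the mmap
                 * region (ELF_ET_DYN_BASE), optionally randomized */
                load_bias = ELF_ET_DYN_BASE;
                if (current->flags & PF_RANDOMIZE)
                        load_bias += arch_mmap_rnd();
                elf_flags |= MAP_FIXED;
        } else
                /* static PIE: no bias, mmap picks the placement */
                load_bias = 0;

        /* vaddr is the first PT_LOAD p_vaddr: turn the base into a
         * bias relative to the addresses in the ELF headers */
        load_bias -= vaddr;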


Re: [kernel-hardening] [PATCH] [RFC] binfmt_elf: Use ELF_ET_DYN_BASE only for PIE

2017-06-21 Thread Daniel Micay
On Wed, 2017-06-21 at 10:23 -0700, Kees Cook wrote:
> On Wed, Jun 21, 2017 at 5:07 AM, Rik van Riel  wrote:
> > On Tue, 2017-06-20 at 22:58 -0700, Kees Cook wrote:
> > > +/* This is the base location for PIE (ET_DYN with INTERP) loads.
> > > */
> > > +#define ELF_ET_DYN_BASE  0x400000UL
> > 
> > This value is good for 32 bit binaries, but for 64
> > bit binaries you probably want to put ELF_ET_DYN_BASE
> > at 4GB or higher.
> > 
> > The latter is necessary because Android uses the
> > lower 4GB of address space for its JVM runtime,
> > with 32 bit pointers inside that part of the otherwise
> > 64 bit address space.
> > 
> > In other words:
> > 
> > #define ELF_ET_DYN_BASE (mmap_is_ia32() ? 0x400000UL :
> > 0x100000000UL)
> 
> Ah, interesting. Okay, that should be fine. I'll adjust it.
> 
> > > +++ b/fs/binfmt_elf.c
> > > 
> > > +  * Therefore, programs are loaded offset from
> > > +  * ELF_ET_DYN_BASE and loaders are loaded into the
> > > +  * independently randomized mmap region (0 load_bias
> > > +  * without MAP_FIXED).
> > > +  */
> > > + if (elf_interpreter) {
> > > +         load_bias = ELF_ET_DYN_BASE;
> > > +         if (current->flags & PF_RANDOMIZE)
> > > +                 load_bias += arch_mmap_rnd();
> > > +         elf_flags |= MAP_FIXED;
> > > + } else
> > > +         load_bias = 0;
> > > +
> > > + load_bias -= vaddr;
> > 
> > I like this, and the big comment telling people how it
> > works :)
> 
> Thanks! It looks like your patch for commenting load_bias never got
> picked up, so I've added some more comments for that and some other
> things too. (Mostly for all the stuff I have to read a few times when
> I look at this code.)
> 
> -Kees
> 

The stack rlimit calculation fix for space potentially lost to ASLR is
probably still needed too, right?


Re: [kernel-hardening] [PATCH 23/23] mm: Allow slab_nomerge to be set at build time

2017-06-19 Thread Daniel Micay
On Mon, 2017-06-19 at 16:36 -0700, Kees Cook wrote:
> Some hardened environments want to build kernels with slab_nomerge
> already set (so that they do not depend on remembering to set the
> kernel
> command line option). This is desired to reduce the risk of kernel
> heap
> overflows being able to overwrite objects from merged caches,
> increasing
> the difficulty of these attacks. By keeping caches unmerged, these
> kinds
> of exploits can usually only damage objects in the same cache (though
> the
> risk to metadata exploitation is unchanged).

It also further fragments the ability to influence slab cache layout:
primitives for things like filling up slabs to set up an exploit might
no longer be able to reach the target slabs. It doesn't need to be
mentioned in the changelog, but it's something to think about too. In
theory, disabling merging can make it *easier* to get the right layout
if some annoyance has now been split away, since it can make caches more
deterministic. It's definitely a lot more good than bad for security,
but allocator changes have a subtle impact on exploitation.
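
(As a concrete illustration of merging, a sketch with hypothetical cache
names - not code from this patch:)

#include <linux/module.h>
#include <linux/slab.h>

/* Two caches with compatible object size, alignment and flags. Without
 * slab_nomerge the allocator typically aliases them to one underlying
 * slab, so both object types share pages and an overflow out of a
 * widget can reach a gadget's function pointer. With slab_nomerge they
 * stay in separate slabs. */
struct widget { char buf[128]; };
struct gadget { void (*fn)(void); char pad[120]; };

static struct kmem_cache *widget_cache, *gadget_cache;

static int __init merge_demo_init(void)
{
        widget_cache = kmem_cache_create("widget", sizeof(struct widget),
                                         0, 0, NULL);
        gadget_cache = kmem_cache_create("gadget", sizeof(struct gadget),
                                         0, 0, NULL);
        return (widget_cache && gadget_cache) ? 0 : -ENOMEM;
}
module_init(merge_demo_init);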


[PATCH v5] add the option of fortified string.h functions

2017-06-18 Thread Daniel Micay
This adds support for compiling with a rough equivalent to the glibc
_FORTIFY_SOURCE=1 feature, providing compile-time and runtime buffer
overflow checks for string.h functions when the compiler determines the
size of the source or destination buffer at compile-time. Unlike glibc,
it covers buffer reads in addition to writes.

GNU C __builtin_*_chk intrinsics are avoided because they would force a
much more complex implementation. They aren't designed to detect read
overflows and offer no real benefit when using an implementation based
on inline checks. Inline checks don't add up to much code size and allow
full use of the regular string intrinsics while avoiding the need for a
bunch of _chk functions and per-arch assembly to avoid wrapper overhead.

This detects various overflows at compile-time in various drivers and
some non-x86 core kernel code. There will likely be issues caught in
regular use at runtime too.
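
The shape of one inline check, roughly (a simplified standalone sketch;
the helper name __fortify_overflow is illustrative, and the real
functions in the include/linux/string.h hunk differ in detail):

#include <stddef.h>

#define __FORTIFY_INLINE extern __inline__ \
        __attribute__((always_inline, gnu_inline))

/* the error attribute turns a provably-bad constant size into a
 * compile-time failure (with optimization enabled) */
void __fortify_overflow(void)
        __attribute__((error("detected buffer overflow")));

/* stand-in for the kernel's fortify_panic() */
void fortify_panic(const char *name)
{
        (void)name;
        __builtin_trap();
}

__FORTIFY_INLINE void *memcpy(void *p, const void *q, size_t size)
{
        size_t p_size = __builtin_object_size(p, 0);
        size_t q_size = __builtin_object_size(q, 0);

        if (__builtin_constant_p(size) && (p_size < size || q_size < size))
                __fortify_overflow();           /* caught at compile time */
        if (p_size < size || q_size < size)
                fortify_panic(__func__);        /* caught at runtime */
        return __builtin_memcpy(p, q, size);
}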

Future improvements left out of initial implementation for simplicity,
as it's all quite optional and can be done incrementally:

* Some of the fortified string functions (strncpy, strcat) don't yet
  place a limit on reads from the source based on __builtin_object_size of
  the source buffer.

* Extending coverage to more string functions like strlcat.

* It should be possible to optionally use __builtin_object_size(x, 1) for
  some functions (C strings) to detect intra-object overflows (like
  glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative
  approach to avoid likely compatibility issues (see the sketch after
  this list).

* The compile-time checks should be made available via a separate config
  option which can be enabled by default (or always enabled) once enough
  time has passed to get the issues it catches fixed.
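
(A standalone illustration of the mode 0 vs mode 1 distinction; build
with optimization, e.g. -O2, so the builtin can resolve:)

#include <stdio.h>

struct s { char a[8]; char b[8]; };

int main(void)
{
        struct s obj;
        char *p = obj.a;

        /* mode 0: bytes left in the whole enclosing object (16);
         * mode 1: bytes left in the referenced member only (8).
         * Mode 1 is what catches an intra-object overflow from a into
         * b, at the cost of breaking code that deliberately writes
         * across members. */
        printf("%zu %zu\n", __builtin_object_size(p, 0),
               __builtin_object_size(p, 1));
        return 0;
}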

Signed-off-by: Daniel Micay <danielmi...@gmail.com>
---
Changes since v4:
- avoid overly aggressive strnlen check for non-null-terminated strings

 arch/arm64/include/asm/string.h  |   5 +
 arch/x86/boot/compressed/misc.c  |   5 +
 arch/x86/include/asm/string_32.h |   9 ++
 arch/x86/include/asm/string_64.h |   7 ++
 include/linux/string.h   | 200 +++
 lib/string.c |   6 ++
 security/Kconfig |   6 ++
 7 files changed, 238 insertions(+)

diff --git a/arch/arm64/include/asm/string.h b/arch/arm64/include/asm/string.h
index 2eb714c4639f..d0aa42907569 100644
--- a/arch/arm64/include/asm/string.h
+++ b/arch/arm64/include/asm/string.h
@@ -63,6 +63,11 @@ extern int memcmp(const void *, const void *, size_t);
 #define memcpy(dst, src, len) __memcpy(dst, src, len)
 #define memmove(dst, src, len) __memmove(dst, src, len)
 #define memset(s, c, n) __memset(s, c, n)
+
+#ifndef __NO_FORTIFY
+#define __NO_FORTIFY /* FORTIFY_SOURCE uses __builtin_memcpy, etc. */
+#endif
+
 #endif
 
 #endif
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index b3c5a5f030ce..43691238a21d 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -409,3 +409,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
debug_putstr("done.\nBooting the kernel.\n");
return output;
 }
+
+void fortify_panic(const char *name)
+{
+   error("detected buffer overflow");
+}
diff --git a/arch/x86/include/asm/string_32.h b/arch/x86/include/asm/string_32.h
index 3d3e8353ee5c..e9ee84873de5 100644
--- a/arch/x86/include/asm/string_32.h
+++ b/arch/x86/include/asm/string_32.h
@@ -142,7 +142,9 @@ static __always_inline void *__constant_memcpy(void *to, const void *from,
 }
 
 #define __HAVE_ARCH_MEMCPY
+extern void *memcpy(void *, const void *, size_t);
 
+#ifndef CONFIG_FORTIFY_SOURCE
 #ifdef CONFIG_X86_USE_3DNOW
 
#include <asm/mmx.h>
@@ -195,11 +197,15 @@ static inline void *__memcpy3d(void *to, const void *from, size_t len)
 #endif
 
 #endif
+#endif /* !CONFIG_FORTIFY_SOURCE */
 
 #define __HAVE_ARCH_MEMMOVE
 void *memmove(void *dest, const void *src, size_t n);
 
+extern int memcmp(const void *, const void *, size_t);
+#ifndef CONFIG_FORTIFY_SOURCE
 #define memcmp __builtin_memcmp
+#endif
 
 #define __HAVE_ARCH_MEMCHR
 extern void *memchr(const void *cs, int c, size_t count);
@@ -321,6 +327,8 @@ void *__constant_c_and_count_memset(void *s, unsigned long pattern,
 : __memset_generic((s), (c), (count)))
 
 #define __HAVE_ARCH_MEMSET
+extern void *memset(void *, int, size_t);
+#ifndef CONFIG_FORTIFY_SOURCE
 #if (__GNUC__ >= 4)
 #define memset(s, c, count) __builtin_memset(s, c, count)
 #else
@@ -330,6 +338,7 @@ void *__constant_c_and_count_memset(void *s, unsigned long pattern,
 (count))   \
 : __memset((s), (c), (count)))
 #endif
+#endif /* !CONFIG_FORTIFY_SOURCE */
 
 /*
  * find the first occurrence of byte 'c', or 1 past the area if none
diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 733bae07fb29..309fe644569f 100644
--- a/arch/x86/
