Re: [PATCH RFC v2 0/6] Break heap spraying needed for exploiting use-after-free
It will reuse the memory for other things when the whole slab is freed though. Not really realistic to change that without it being backed by virtual memory along with higher-level management of regions to avoid intense fragmentation and metadata waste. It would depend a lot on having much finer-grained slab caches, otherwise it's not going to be much of an alternative to a quarantine feature. Even then, a quarantine feature is still useful, but is less suitable for a mainstream feature due to performance cost. Even a small quarantine has a fairly high performance cost.
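For context, a minimal sketch of the kind of quarantine being discussed: a FIFO that delays handing freed allocations back, so a dangling pointer can't immediately alias a fresh allocation. The names and slot count here are made up for illustration; real implementations (e.g. KASAN's quarantine) are per-CPU, bounded in bytes, and integrated with the slab layer.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative FIFO quarantine: free() is deferred until the slot is
 * recycled, so recently freed memory can't be reused right away.
 * Slot count and eviction policy are arbitrary choices. */
#define QUARANTINE_SLOTS 8

static void *quarantine[QUARANTINE_SLOTS];
static unsigned q_head;
static unsigned q_released; /* how many allocations were actually freed */

static void quarantine_free(void *p)
{
    void *victim = quarantine[q_head];

    quarantine[q_head] = p;
    q_head = (q_head + 1) % QUARANTINE_SLOTS;

    if (victim) {
        free(victim); /* only now can this memory be reused */
        q_released++;
    }
}
```

Even this trivial version shows where the cost comes from: every freed chunk stays resident for QUARANTINE_SLOTS further frees, which is the memory and performance overhead mentioned above.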
Re: [PATCH 1/2] x86/pkeys: copy pkey state at fork()
Yeah, a no-op pkey_alloc flag tied to this patch to provide a way to detect if pkey state is preserved on fork, since kernels without the patch would report EINVAL. Something like PKEY_ASSERT_FORK_INHERIT_STATE would make sense. Otherwise, it's going to be quite painful to adopt this in userspace software without kernel version checks. Most software can't accept causing a crash / abort after forking in environments not applying all the stable kernel patches, or in fresh OS installs that aren't fully updated + rebooted yet.
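A sketch of how userspace could probe for such a flag. The flag name is the one proposed in this thread; its value below is invented purely for illustration and is not a real ABI constant.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical: the proposed no-op flag from this thread. The value is
 * made up for illustration only. */
#ifndef PKEY_ASSERT_FORK_INHERIT_STATE
#define PKEY_ASSERT_FORK_INHERIT_STATE 0x1u
#endif

/* Returns 1 if the kernel accepts the flag (pkey state is copied at
 * fork), 0 otherwise (unpatched kernel, or pkeys unsupported at all). */
static int pkey_state_copied_at_fork(void)
{
#if defined(SYS_pkey_alloc) && defined(SYS_pkey_free)
    long key = syscall(SYS_pkey_alloc, PKEY_ASSERT_FORK_INHERIT_STATE, 0);

    if (key < 0)
        return 0; /* EINVAL/ENOSYS: fall back to the atfork workaround */
    syscall(SYS_pkey_free, (int)key);
    return 1;
#else
    return 0;
#endif
}
```

On any kernel that doesn't know the flag, pkey_alloc reports EINVAL and the library can disable its MPK usage in forked children without a kernel version table.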
Re: [PATCH 1/2] x86/pkeys: copy pkey state at fork()
On Fri, 26 Oct 2018 at 18:12, Andy Lutomirski wrote:
>> On Oct 26, 2018, at 2:39 PM, Daniel Micay wrote:
>>
>> I ended up working around this with a pthread_atfork handler disabling
>> my usage of the feature in the child process for the time being. I
>> don't have an easy way to detect if the bug is present within a
>> library so
>
> Can you not just make sure that the fix is backported to all relevant
> kernels?

There are too many different distribution kernels and I won't be in control of where the software is used.

> I suppose we could add a new flag for pkey_get() or something.

That would work, since I can apply the workaround (disabling the feature in child processes) if I get EINVAL. The flag wouldn't need to do anything, just existing and being tied to this patch so I have a way to detect that I can safely use MPK after fork.

>> I'm going to need a kernel version check with a table of
>> kernel releases fixing the problem for each stable branch.
>
> That won’t work right on distro kernels. Please don’t go there.

If it's backported to an earlier version, it will just mean missing a chance to use it. I'm not going to assume that it behaves a certain way based on having an old kernel, but rather I just won't use it in a forked child on an older kernel version if I don't have a way to detect the problem. It's for a few different uses in library code so I can't have a runtime test trying to detect it with clone(...). I'd definitely prefer a proper way to detect that I can use it after fork. I really don't want to have a hack like that which is why I'm bringing it up.
Re: [PATCH 1/2] x86/pkeys: copy pkey state at fork()
I ended up working around this with a pthread_atfork handler disabling my usage of the feature in the child process for the time being. I don't have an easy way to detect if the bug is present within a library so I'm going to need a kernel version check with a table of kernel releases fixing the problem for each stable branch. It would be helpful if there was a new cpuinfo flag to check if the MPK state is preserved on fork in addition to the existing ospke flag. The problem will fade away over time but in my experience there are a lot of people using distributions with kernels not incorporating all of the stable fixes. I expect other people will run into the problem once hardware with MPK is more widely available and other people try to use it for various things like moving GC or assorted security features. Someone will end up running software adopting it on an older kernel with the problem. The clobbering issue I found with MAP_FIXED_NOREPLACE isn't quite as annoying because it was easy to make a runtime test usable in a library to see if the feature works properly.
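The pthread_atfork workaround described above can be sketched as follows; the function names are hypothetical, and the flag stands in for whatever state gates the library's MPK usage.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sketch of the workaround: a child handler that turns the MPK-based
 * feature off in forked children, since on affected kernels the child's
 * pkey permission state can't be trusted. */
static volatile int mpk_feature_enabled = 1;

static void mpk_disable_in_child(void)
{
    mpk_feature_enabled = 0;
}

/* Call once, e.g. from a library constructor. Returns 0 on success. */
static int mpk_install_fork_workaround(void)
{
    return pthread_atfork(NULL, NULL, mpk_disable_in_child);
}
```

The downside is exactly what the message says: the feature is lost in all children even on fixed kernels, because there's no way to detect the fix from inside a library.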
Re: [regression v4.17-rc0] Re: FORTIFY_SOURCE breaks ARM compilation in -next -- was Re: ARM compile failure in Re: linux-next: Tree for Apr 4
Well, that's not related, it's just this:

#ifdef __GNUC__
#if (__GNUC__ == 3 && __GNUC_MINOR__ < 3)
#error Your compiler is too buggy; it is known to miscompile kernels.
#error Known good compilers: 3.3, 4.x
#endif
#if GCC_VERSION >= 40800 && GCC_VERSION < 40803
#error Your compiler is too buggy; it is known to miscompile kernels
#error and result in filesystem corruption and oopses.
#endif
#endif
Re: [regression v4.17-rc0] Re: FORTIFY_SOURCE breaks ARM compilation in -next -- was Re: ARM compile failure in Re: linux-next: Tree for Apr 4
On 20 April 2018 at 15:15, Pavel Machek wrote:
> Hi!
>
>> >> Hi! Sorry I lost this email in my inbox. It seems this is specific to
>> >> a particular subset of arm architectures? (My local builds of arm all
>> >> succeed, for example. Can you send your failing config?) I'll take a
>> >> closer look on Monday if Daniel doesn't beat me to it.
>> >
>> > Daniel, Kees: any news?
>> >
>> > I'm aware you did not specify which Monday :-).
>>
>> Hi! Sorry, I got distracted. So the .config you sent me builds fine
>> with my cross compiler. I suspect this is something specific to ELDK's
>> compiler. I can try some other compiler versions. What version of gcc
>> is failing?
>
> I have:
>
> pavel@duo:/data/l/linux-n900$ arm-linux-gnueabi-gcc --version
> arm-linux-gnueabi-gcc (GCC) 4.7.2
> Copyright (C) 2012 Free Software Foundation, Inc.
> pavel@duo:/data/l/linux-n900$ arm-linux-gnueabi-ld --version
> GNU ld (GNU Binutils) 2.23.1.20121113
> Copyright 2012 Free Software Foundation, Inc.
>
> Let me try with eldk-5.6:
>
> pavel@duo:/data/l/linux-n900$ arm-linux-gnueabi-gcc --version
> arm-linux-gnueabi-gcc (GCC) 4.8.2
> pavel@duo:/data/l/linux-n900$ arm-linux-gnueabi-ld --version
> GNU ld (GNU Binutils) 2.24
>
> make lib/string.o
>
> Seems to succeed, so I guess full build will succeed, too...

It doesn't imply that a full build would work.
Re: [PATCH] drivers: net: wireless: ath: ath9: dfs: remove VLA usage
> I don't think the difference between C and C++ const pointer
> conversions

I mean I don't think the difference between them was intended.
Re: [PATCH] drivers: net: wireless: ath: ath9: dfs: remove VLA usage
No, it's undefined behavior to write to a const variable. The `static` and `const` on the variable both change the code generation in the real world as permitted / encouraged by the standard. It's placed in read-only memory. Trying to write to it will break. It's not "implementation defined" to write to it, it's "undefined behavior", i.e. it's considered incorrect. There's a clear distinction between those in the standard.

You're confusing having a real `const` for a variable with having it applied to a pointer. It's well-defined to cast away const from a pointer and write to what it points at if it's not actually const. If it is const, that's broken. There's nothing implementation defined about either case.

The C standard could have considered `static const` variables to work as constant expressions just like the C++ standard. They borrowed it from there but made it less useful than const in what became the C++ standard. They also used stricter rules for the permitted implicit conversions of const pointers, which made those much less usable, i.e. converting `int **` to `const int *const *` wasn't permitted like in C++. I don't think the difference between C and C++ const pointer conversions was intended; it's a quirk of them being standardized on different timelines and ending up with different versions of the same thing. On the other hand, they might have left out having ever so slightly more useful constant expressions on purpose since people can use #define.
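A minimal example of the distinction (the variable names are just for illustration):

```c
#include <assert.h>

static const int actually_const = 1; /* typically in .rodata: writing is UB */
static int not_const = 1;

/* Casting away const is always allowed; whether the *write* is defined
 * depends entirely on whether the pointed-to object is really const. */
static void write_through(const int *p, int value)
{
    *(int *)p = value;
}
```

write_through(&not_const, 2) is well-defined; write_through(&actually_const, 2) is undefined behavior, and on typical toolchains it faults because the object lives in a read-only segment.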
Re: [PATCH] drivers: net: wireless: ath: ath9: dfs: remove VLA usage
> Just wondering. Is this actually a VLA. FFT_NUM_SAMPLES was static const so > not really going to show a lot of variation. This array will always have the > same size on the stack. The issue is that unlike in C++, a `static const` can't be used in a constant expression in C. It's unclear why C is defined that way but it's how it is and there isn't currently a GCC extension making more things into constant expressions like C++.
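As a sketch (the value 8 below is arbitrary, not necessarily the driver's real FFT_NUM_SAMPLES): the usual fixes are an enum constant or a #define, both of which are integer constant expressions in C, unlike a `static const`.

```c
#include <assert.h>

/* Not a constant expression in C (it would be in C++): */
static const int fft_num_samples_const = 8;

/* Either of these avoids the technical VLA: */
enum { FFT_NUM_SAMPLES = 8 };
/* #define FFT_NUM_SAMPLES 8 would work too */

static int sum_indices(void)
{
    int samples[FFT_NUM_SAMPLES] = { 0 }; /* real fixed-size array, no -Wvla */
    int total = 0;

    /* Using fft_num_samples_const as the bound instead would make
     * `samples` a VLA as far as C is concerned. */
    (void)fft_num_samples_const;

    for (int i = 0; i < FFT_NUM_SAMPLES; i++)
        total += samples[i] + i;
    return total;
}
```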
Re: VLA removal (was Re: [RFC 2/2] lustre: use VLA_SAFE)
On 7 March 2018 at 13:09, Linus Torvalds wrote:
> On Wed, Mar 7, 2018 at 9:37 AM, Kees Cook wrote:
>>
>> Building with -Wvla, I see 209 unique locations reported in 60 directories:
>> http://paste.ubuntu.com/p/srQxwPQS9s/
>
> Ok, that's not so bad. Maybe Greg could even add it to one of those
> things he encourages new people to do?
>
> Because at least *some* of them are pretty trivial. For example,
> looking at the core code, I was surprised to see something in
> lib/btree.c

Some are probably just the issue of technically having a VLA that's not really a VLA:

static const int size = 5;

void foo(void) {
  int x[size];
}

% gcc -c -Wvla foo.c
foo.c: In function ‘foo’:
foo.c:4:3: warning: ISO C90 forbids variable length array ‘x’ [-Wvla]
   int x[size];
   ^~~

I don't really understand why the C standard didn't make `static const` declarations usable as constant expressions like C++. They made the pointer conversions more painful too. It would be nice to get rid of those cases to use -Werror=vla though.
Re: [RFC PATCH] Randomization of address chosen by mmap.
On 5 March 2018 at 08:09, Ilya Smith wrote:
>
>> On 4 Mar 2018, at 23:56, Matthew Wilcox wrote:
>> Thinking about this more ...
>>
>> - When you call munmap, if you pass in the same (addr, length) that were
>>   used for mmap, then it should unmap the guard pages as well (that
>>   wasn't part of the patch, so it would have to be added)
>> - If 'addr' is higher than the mapped address, and length at least
>>   reaches the end of the mapping, then I would expect the guard pages to
>>   "move down" and be after the end of the newly-shortened mapping.
>> - If 'addr' is higher than the mapped address, and the length doesn't
>>   reach the end of the old mapping, we split the old mapping into two.
>>   I would expect the guard pages to apply to both mappings, insofar as
>>   they'll fit. For an example, suppose we have a five-page mapping with
>>   two guard pages (MGG), and then we unmap the fourth page. Now we
>>   have a three-page mapping with one guard page followed immediately
>>   by a one-page mapping with two guard pages (MMMGMGG).
>
> I’m analysing that approach and see much more problems:
> - each time you call mmap like this, you still increase count of vmas as
>   my patch did
> - now feature vma_merge shouldn’t work at all, until MAP_FIXED is set or
>   PROT_GUARD(0)
> - the entropy you provide is like 16 bit, that is really not so hard to
>   brute
> - in your patch you don’t use vm_guard at address searching, I see many
>   roots of bugs here
> - if you unmap/remap one page inside region, field vma_guard will show
>   head or tail pages for vma, not both; kernel don’t know how to handle it
> - user mode now choose entropy with PROT_GUARD macro, where did he gets
>   it? User mode shouldn’t be responsible for entropy at all

I didn't suggest this as the way of implementing fine-grained randomization but rather a small starting point for hardening address space layout further. I don't think it should be tied to a mmap flag but rather something like a personality flag or a global sysctl.

It doesn't need to be random at all to be valuable, and it's just a first step. It doesn't mean there can't be switches between random pivots like OpenBSD mmap, etc. I'm not so sure that randomly switching around is going to result in isolating things very well though. The VMA count issue is at least something very predictable with a performance cost only for kernel operations.

> I can’t understand what direction this conversation is going to. I was
> talking about weak implementation in Linux kernel but got many comments
> about ASLR should be implemented in user mode what is really weird to me.

That's not what I said. I was saying that splitting things into regions based on the type of allocation works really well and allows for high entropy bases, but that the kernel can't really do that right now. It could split up code that starts as PROT_EXEC into a region but that's generally not how libraries are mapped in so it won't know until mprotect which is obviously too late. Unless it had some kind of type key passed from userspace, it can't really do that.

> I think it is possible to add GUARD pages into my implementations, but
> initially problem was about entropy of address choosing. I would like to
> resolve it step by step.

Starting with fairly aggressive fragmentation of the address space is going to be a really hard sell. The costs of a very spread out address space in terms of TLB misses, etc. are unclear. Starting with enforced gaps (1 page) and randomization for those wouldn't rule out having finer-grained randomization, like randomly switching between different regions. This needs to be cheap enough that people want to enable it, and the goals need to be clearly spelled out. The goal needs to be clearer than "more randomization == good" and then accepting a high performance cost for that. I'm not dictating how things should be done, I don't have any say about that. I'm just trying to discuss it.
Re: [RFC PATCH] Randomization of address chosen by mmap.
On 3 March 2018 at 08:58, Ilya Smith wrote: > Hello Daniel, thanks for sharing you experience! > >> On 1 Mar 2018, at 00:02, Daniel Micay wrote: >> >> I don't think it makes sense for the kernel to attempt mitigations to >> hide libraries. The best way to do that is in userspace, by having the >> linker reserve a large PROT_NONE region for mapping libraries (both at >> initialization and for dlopen) including a random gap to act as a >> separate ASLR base. > Why this is the best and what is the limit of this large region? > Let’s think out of the box. > What you say here means you made a separate memory region for libraries > without > changing kernel. But the basic idea - you have a separate region for libraries > only. Probably you also want to have separate regions for any thread stack, > for > mmaped files, shared memory, etc. This allows to protect memory regions of > different types from each other. It is impossible to implement this without > keeping the whole memory map. This map should be secure from any leak attack > to > prevent ASLR bypass. The only one way to do it is to implement it in the > kernel > and provide different syscalls like uselib or allocstack, etc. This one is > really hard in current kernel implementation. There's the option of reserving PROT_NONE regions and managing memory within them using a similar best-fit allocation scheme to get separate random bases. The kernel could offer something like that but it's already possible to do it for libc mmap usage within libc as we did for libraries. The kernel's help is needed to cover non-libc users of mmap, i.e. not the linker, malloc, etc. It's not possible for libc to assume that everything goes through the libc mmap/mremap/munmap wrappers and it would be a mess so I'm not saying the kernel doesn't have a part to play. I'm only saying it makes sense to look at the whole picture and if something can be done better in libc or the linker, to do it there instead. 
There isn't an API for dividing stuff up into regions, so it has to be done in userspace right now, and I think it works a lot better when it's an option.

> My approach was to hide memory regions from attacker and from each other.
>
>> If an attacker has library addresses, it's hard to
>> see much point in hiding the other libraries from them.
>
> In some cases attacker has only one leak for whole attack. And we should do the best
> to make even this leak useless.
>
>> It does make
>> sense to keep them from knowing the location of any executable code if
>> they leak non-library addresses. An isolated library region + gap is a
>> feature we implemented in CopperheadOS and it works well, although we
>> haven't ported it to Android 7.x or 8.x.
>
> This one interesting to know and I would like to try to attack it, but it's out of the
> scope of current conversation.

I don't think it's out-of-scope. There are different approaches to this kind of finer-grained randomization and they can be done together.

>> I don't think the kernel can
>> bring much / anything to the table for it. It's inherently the
>> responsibility of libc to randomize the lower bits for secondary
>> stacks too.
>
> I think any bit of secondary stack should be randomized to provide attacker as
> less information as we can.

The issue is that the kernel is only providing a mapping, so it can add a random gap or randomize it in other ways, but it's ultimately up to libc and other userspace code to do randomization within those mappings. A malloc implementation is similarly going to request fairly large mappings from the kernel and manage a bunch of stuff within them itself. The kernel can't protect against stuff like heap spray attacks very well all by itself. It definitely has a part to play, but it's a small piece of the picture (unless the malloc implementation actually manages virtual memory regions itself, which is already done by performance-oriented allocators for very different reasons).
>> Fine-grained randomized mmap isn't going to be used if it causes
>> unpredictable levels of fragmentation or has a high / unpredictable
>> performance cost.
>
> Lets pretend any chosen address is pure random and always satisfies request. At
> some time we failed to mmap new chunk with size N. What does this means? This
> means that all chunks with size of N are occupied and we even can't find place
> between them. Now lets count already allocated memory. Let's pretend on all of
> these occupied chunks lies one page minimum. So count of these pages is
> TASK_SIZE / N. Total bytes already allocated is PAGE_SIZE * TASK_SIZE / N. Now
> we can calculate. TASK_SIZE is 2^48 bytes. PAGE_SIZE 4096. If N is 1MB,
> allocated memory minimum 1125899906842624,
Re: [RFC PATCH] Randomization of address chosen by mmap.
The option to add at least one guard page would be useful whether or not it's tied to randomization. It's not feasible to do that in userspace for mmap as a whole, only for specific users of mmap like malloc, and it adds significant overhead vs. a kernel implementation. It could optionally let you choose a minimum and maximum guard region size, with it picking random sizes if they're not equal. It's important for it to be an enforced gap rather than something that can be filled in by another allocation. It will obviously help a lot more when it's being used with a hardened allocator designed to take advantage of this rather than glibc malloc or jemalloc.

I don't think it makes sense for the kernel to attempt mitigations to hide libraries. The best way to do that is in userspace, by having the linker reserve a large PROT_NONE region for mapping libraries (both at initialization and for dlopen) including a random gap to act as a separate ASLR base. If an attacker has library addresses, it's hard to see much point in hiding the other libraries from them. It does make sense to keep them from knowing the location of any executable code if they leak non-library addresses. An isolated library region + gap is a feature we implemented in CopperheadOS and it works well, although we haven't ported it to Android 7.x or 8.x. I don't think the kernel can bring much / anything to the table for it. It's inherently the responsibility of libc to randomize the lower bits for secondary stacks too.

Fine-grained randomized mmap isn't going to be used if it causes unpredictable levels of fragmentation or has a high / unpredictable performance cost. I don't think it makes sense to approach it aggressively in a way that people can't use. The OpenBSD randomized mmap is a fairly conservative implementation to avoid causing excessive fragmentation. I think they do a bit more than adding random gaps by switching between different 'pivots', but that isn't very high benefit.
The main benefit is having random bits of unmapped space all over the heap when combined with their hardened allocator, which heavily uses small mmap mappings and has a fair bit of malloc-level randomization (it's a bitmap / hash table based slab allocator using 4k regions with a page span cache; we use a port of it to Android with added hardening features, but we're missing the fine-grained mmap randomization it's meant to have underneath what it does itself). The default vm.max_map_count = 65530 is also a major problem for doing fine-grained mmap randomization of any kind, and there's the 32-bit reference count overflow issue on high memory machines with max_map_count * pid_max, which isn't resolved yet.
Re: [RFC] Warn the user when they could overflow mapcount
I think there are likely legitimate programs mapping something a bunch of times. Falling back to a global object -> count mapping (an rbtree / radix trie or whatever) with a lock once it hits saturation wouldn't risk breaking anything. It would permanently leave the inline count saturated and just use the address of the inline counter as the key for the map to find the 64-bit counter. Once it gets to 0 in the map, it can delete it from the map and do the standard freeing process, avoiding leaks. It would really just make it a 64-bit reference count heavily size-optimized for the common case. It would work elsewhere too, not just this case.
Re: [RFC] Warn the user when they could overflow mapcount
I guess it could saturate and then switch to tracking the count via an object pointer -> count mapping with a global lock? Whatever the solution is should probably be a generic one since it's a recurring issue.
Re: [RFC] Warn the user when they could overflow mapcount
I don't think the kernel can get away with the current approach. Object sizes and counts on 64-bit should be 64-bit unless there's a verifiable reason they can get away with 32-bit. Having it leak memory isn't okay, just much less bad than vulnerabilities exploitable beyond just denial of service. Every 32-bit reference count should probably have a short comment explaining why it can't overflow on 64-bit... if that can't be written or it's too complicated to demonstrate, it probably needs to be 64-bit. It's one of many pervasive forms of integer overflows in the kernel... :(
Re: [RFC] Warn the user when they could overflow mapcount
>> That seems pretty bad. So here's a patch which adds documentation to the
>> two sysctls that a sysadmin could use to shoot themselves in the foot,
>> and adds a warning if they change either of them to a dangerous value.
>
> I have negative feelings about this patch, mostly because AFAICS:
>
> - It documents an issue instead of fixing it.
> - It likely only addresses a small part of the actual problem.

The standard max_map_count / pid_max are very low and there are many situations where either or both need to be raised.

VM fragmentation in long-lived processes is a major issue. There are allocators like jemalloc designed to minimize VM fragmentation by never unmapping memory, but they're relying on not having anything else using mmap regularly so they can have all their ranges merged together, unless they decide to do something like making a 1TB PROT_NONE mapping up front to slowly consume. If you Google this sysctl name, you'll find lots of people running into the limit. If you're using a debugging / hardened allocator designed to use a lot of guard pages, the standard max_map_count is close to unusable...

I think the same thing applies to pid_max. There are too many reasonable reasons to increase it. Process-per-request is quite reasonable if you care about robustness / security and want to sandbox each request handler. Look at Chrome / Chromium: it's currently process-per-site-instance, but they're moving to having more processes with site isolation to isolate iframes into their own processes to work towards enforcing the boundaries between sites at a process level. It's way worse for fine-grained server-side sandboxing. Using a lot of processes like this does counter VM fragmentation, especially if long-lived processes doing a lot of work are mostly avoided... but if your allocators like using guard pages, you're still going to hit the limit.
I do think the default value in the documentation should be fixed, but if there's a clear problem with raising these, the problem itself really needs to be fixed. Google either of the sysctl names and look at all the people running into issues and needing to raise them. It's only going to become more common to raise these with people trying to use lots of fine-grained sandboxing. Process-per-request is back in style.
Re: [kernel-hardening] Re: [PATCHv2 5/7] printk: allow kmsg to be encrypted using public key encryption
> Do you have any backing from makers of such devices? I'd like to hear
> from Google's Android team or whoever that would turn this feature on.

(I'm not a Google employee, but I work on Android security and contribute some of that to AOSP.)

Android restricts access to dmesg with SELinux, so it's not really an issue there. They previously used dmesg_restrict but switched to SELinux for fine-grained control without using the capability.

In general, the access control features / toggles proposed on kernel-hardening don't do much for Android because there's nothing unconfined and there's no system administrator reconfiguring the system and disabling SELinux to make it work. The toggles like dmesg_restrict tend to be too coarse without a good way to make exceptions. Unprivileged processes including apps can't access dmesg, debugfs, sysfs (other than a tiny set of exceptions), procfs outside of /proc/net and /proc/PID (but it uses hidepid=2) and a few whitelisted files, etc. That's mostly done with SELinux, along with using it for ioctl command whitelisting to have per-device, per-domain whitelists of commands instead of using their global seccomp-bpf policy for it.

The hidepid=2 usage is an example of a feature where SELinux doesn't quite fit the task as an alternative, since apps are all in the untrusted_app / isolated_app domains but each have a unique uid/gid. It works because hidepid has a gid option to set fine-grained exceptions, rather than only being able to bypass it with a capability.

They do have some out-of-tree access control features and finding ways to upstream those might be useful: there's the added perf_event_paranoid=3 level which is set by default. It's toggled off when a developer uses profiling tools via Android Debug Bridge (USB) after enabling ADB support in the OS and whitelisting their key. That was submitted upstream again for Android but it didn't end up landing.
An LSM hook for perf events might help but Android uses a fully static SELinux policy and would need a way to toggle it off / on. They also have group-based control over sockets used to implement the INTERNET permission but technically they could double the number of app domains and use SELinux for that if they absolutely had to avoid an out-of-tree patch. They might not design it the same way today since it predates using SELinux.
Re: [kernel-hardening] Re: [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules
It was suggested that the feature would only be adopted in niches like Android, and I pointed out that it's not really relevant to Android. It's a waste of time to try convincing me that it's useful elsewhere; I never said or implied that it wasn't.

On Thu, 2017-11-30 at 09:50 +0100, Djalal Harouni wrote:
> On Thu, Nov 30, 2017 at 7:51 AM, Daniel Micay <danielmi...@gmail.com> wrote:
> [...]
> > Lots of potential module attack surface also gets eliminated by default
> > via their SELinux whitelists for /dev, /sys, /proc, debugfs, ioctl
> > commands, etc. The global seccomp whitelist might be relevant in some
> > cases too.
>
> In embedded systems we can't maintain a SELinux policy, distro man
> power hardly manage. We have abstracted seccomp etc, but the kernel
> inherited the difficult multiplex things, plus all other paths that
> trigger this.

It's cheaper to use an existing system like Android Things where device makers only need to make their apps and perhaps some userspace hardware drivers for cases not covered by mainline kernel drivers. I don't think it makes sense for every device vendor to manage an OS, and I seriously doubt that's how the ecosystem is going to end up as it matures.

> > Android devices like to build everything into the kernel too, so even if
> > they weren't using a module this feature wouldn't usually help them. It
> > would need to work like this existing sysctl:
> >
> > net.ipv4.tcp_available_congestion_control = cubic reno lp
> >
> > i.e. whitelists for functionality offered by the modules, not just
> > whether they can be loaded.
>
> Yes, but it is hard to maintain a whitelist policy, the code is hardly
> maintained... if you include everything you should have an LSM policy
> or something like that, and compiling kernels is expert thing.

I'm not talking about whitelist vs. blacklist, compiling kernels or anything like that.
> Otherwise IMHO the kernel should provide default secure behaviour on
> how to load or add new functionality to the running one. From a user
> perspective, a switch "yes/no" that a privileged entity will
> *understand* and assume is what should be there, and the switch or
> flag as discussed here is local to processes, the sysctl will be
> removed. IMO it should come from userspace point of view, cause as an
> example the sysctl:
>
> net.ipv4.tcp_available_congestion_control = cubic reno lp
>
> Is kernel thing, too technical, userspace developers, admins or
> privileged entity will not understand what cubic or reno mean.

Congestion control algorithms are being used as an example in terms of the mechanism being used to control which are available to unprivileged users. The obscurity of congestion control algorithms is irrelevant.

> Doing
> the same per functionality directly like this seems to much of a
> burden compared to the use case. The kernel maybe can do this to
> advance the art of the networking stack and for advanced cases, but in
> IMHO a sane default behaviour + an abstracted process/sandbox flag
> "yes/no" for most others, userspace developers and humans is what
> should be provided and we need the kernel to help here.

There are cases where unprivileged module auto-loading is relied upon, like network protocols. Having configuration for which protocols can be used by unprivileged users is superior to limiting only which ones can be auto-loaded. That's why I brought up the existing congestion control knob. It works well in terms of having a whitelist of the sane, widely used cases, with exposing anything obscure requiring configuration. They happen to be implemented as modules too. Killing off unprivileged module loading other than a few cases like that makes sense, and then those can provide similar control with similarly sane defaults.
Re: [kernel-hardening] Re: [PATCH v5 next 5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules
> And once you disable it by default, and it becomes purely opt-in, that
> means that nothing will change for most cases. Some embedded people
> that do their own thing (ie Android) might change, but normal
> distributions probably won't.
>
> Yes, Android may be 99% of the users, and yes, the embedded world in
> general needs to be secure, but I'd still like this to be something
> that helps _everybody_.

Android devices won't get much benefit since they ship a tiny set of modules chosen for the device. The kernels already get very stripped down to the bare minimum vs. enabling every feature and driver available and shipping it all by default as on a traditional distribution. Lots of potential module attack surface also gets eliminated by default via their SELinux whitelists for /dev, /sys, /proc, debugfs, ioctl commands, etc. The global seccomp whitelist might be relevant in some cases too.

Android devices like to build everything into the kernel too, so even if they weren't using a module this feature wouldn't usually help them. It would need to work like this existing sysctl:

net.ipv4.tcp_available_congestion_control = cubic reno lp

i.e. whitelists for functionality offered by the modules, not just whether they can be loaded.
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
On Mon, 2017-11-06 at 16:14 -0600, Serge E. Hallyn wrote:
> Quoting Daniel Micay (danielmi...@gmail.com):
> > Substantial added attack surface will never go away as a problem. There
> > aren't a finite number of vulnerabilities to be found.
>
> There's varying levels of usefulness and quality. There is code which I
> want to be able to use in a container, and code which I can't ever see a
> reason for using there. The latter, especially if it's also in a
> staging driver, would be nice to have a toggle to disable.
>
> You're not advocating dropping the added attack surface, only adding a
> way of dealing with an 0day after the fact. Privilege raising 0days can
> exist anywhere, not just in code which only root in a user namespace can
> exercise. So from that point of view, ksplice seems a more complete
> solution. Why not just actually fix the bad code block when we know
> about it?

That's not what I'm advocating. I only care about it for proactive attack surface reduction downstream. I have no interest in using it to block access to known vulnerabilities.

> Finally, it has been well argued that you can gain many new caps from
> having only a few others. Given that, how could you ever be sure that,
> if an 0day is found which allows root in a user ns to abuse
> CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them
> would suffice?

I didn't suggest using it that way...

> It seems to me that the existing control in
> /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
> in that case.

There's no such thing as unprivileged_userns_clone in mainline. The advantage of this over unprivileged_userns_clone in Debian and maybe some other distributions is not giving up unprivileged app containers / sandboxes implemented via user namespaces. For example, Chromium's user namespace sandbox likely only needs to have CAP_SYS_CHROOT.
Chromium will be dropping their setuid sandbox, forcing usage of user namespaces to avoid losing the sandbox, which will greatly increase local kernel attack surface on the host by exposing netfilter management, etc. to unprivileged users. The proposed approach isn't necessarily the best way to implement this kind of mitigation, but I think it's filling a real need.
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
Substantial added attack surface will never go away as a problem. There aren't a finite number of vulnerabilities to be found.
Re: [kernel-hardening] [PATCH v2 0/5] stackprotector: ascii armor the stack canary
> Brad trolls us all lightly with this trivia question:
>
> https://twitter.com/grsecurity/status/905246423591084033

I'll respond to your proposed scenario rather than guessing at what is being suggested there and whether it's actually the same thing as what you've brought up. They've stated many times that stack canaries are near useless, and enabling PaX UDEREF on i386 actually disabled stack canaries and presumably still does after the last public patches:

+ select HAVE_CC_STACKPROTECTOR if X86_64 || !PAX_MEMORY_UDEREF

Making the kernel more vulnerable to remote stack overflow exploits like this month's overhyped Bluetooth bug to improve security against local exploits seems like much more of a compromise. Not that i386 has any real relevance anymore, but I think that's some interesting context... It's not like upstream has a monopoly on needing to make hard choices for security features. I don't see this as one of those hard choices, like whether sacrificing availability for SIZE_OVERFLOW makes sense.

> "For #trivia can you describe one scenario where this change actually
> helps exploitation using similar C string funcs?"
>
> I suppose the expected answer is:
>
> The change helps exploitation when the overwriting string ends just
> before the canary. Its NUL overwriting the NUL byte in the canary would
> go undetected. Before this change, there would be a 255/256 chance of
> detection.
>
> I hope this was considered. The change might still be a good tradeoff,
> or it might not, depending on which scenarios are more likely (leak of
> canary value or the string required in an exploit ending at that exact
> byte location), and we probably lack such statistics.

It was considered that guaranteeing some forms of string overflows are contained within the frame can have a negative impact, but I don't think it's significant. There would need to be something worth overwriting between the source of the overflow and the canary.

Since SSP reorders local variables within the stack frame based on risk of overflow, the targets would be locals that Clang/GCC considers to have a higher risk of overflow. An array can only have other arrays between it and the canary, and there's ordering by size (small vs. large at least) within the array section. If there was a zero byte in any of that other data between the source of the overflow and the canary, this wouldn't make a difference anyway. Since the canary is only checked on return, it's also only relevant if control flow still makes it to the function epilogue and it still matters after the attacker has accomplished their goal.

The chance of there being a zero as a leading byte is already 1/256, but there could also be a NUL elsewhere in the canary compromising a subset of the entropy for this kind of scenario, so the odds are already a tiny bit higher than 1/256. Not much resistance to brute force if it really does ever end up mattering, unless canaries with NUL in leading positions were actually *rejected*...

There's a more substantial compromise between zero and non-zero poison on free for production use. This was part of a set of changes where CopperheadOS switched from non-zero fill to zero fill on free for the kernel and userspace and added a leading zero byte on 64-bit for heap and stack canaries in the production builds, while keeping around the old way of doing things for some forms of debug builds. Non-zero fill on free ended up causing too much harm by turning benign bugs into bugs that might have been exploitable even beyond DoS: in some cases tied to non-NUL-terminated strings, non-zero poisoning got rid of zeroes that were otherwise all over the heap containing them, or it broke code wrongly depending on uninitialized data being zero in practice. I think there's more of an argument to be had about poison-on-free as a mitigation. This leading zero byte?

Not so much, beyond perhaps wanting the option to turn it off for bug finding... but there are better tools for that now.

> I am not proposing to revert the change. I had actually contemplated
> speaking up when this was discussed, but did not for lack of a better
> suggestion. We could put/require a NUL in the middle of the canary, but
> with the full canary being only 64-bit at most that would also make some
> attacks easier.
>
> So this is JFYI. No action needed on it, I think.

It might make sense to put it in the middle, but a scenario where it matters seems unlikely and, as you mention, it would make it weaker in some other cases. I think it's already the best choice right now. I'd choose XOR canaries over a leading zero byte if it could only be one or the other, but I'm not really convinced that there's any significant loss compared to the normal canaries.
Re: nios2 crash due to 'init/main.c: extract early boot entropy from the passed cmdline'
On Mon, 2017-09-11 at 10:35 -0700, Guenter Roeck wrote:
> On Mon, Sep 11, 2017 at 09:36:00AM -0700, Kees Cook wrote:
> > On Sat, Sep 9, 2017 at 8:58 PM, Guenter Roeck wrote:
> > > Hi,
> > >
> > > I noticed that nios2 images crash in mainline. Bisect points to commit
> > > 33d72f3822d7 ("init/main.c: extract early boot entropy from the passed
> > > cmdline"). Bisect log is attached.
> > >
> > > As far as I can see, the problem is seen because add_device_randomness()
> > > calls random_get_entropy(). However, the underlying timer function
> > > used by the nios2 architecture (nios2_timer_read) is not yet initialized,
> > > causing a NULL pointer access and crash. A sample crash log is at
> > > http://kerneltests.org/builders/qemu-nios2-master/builds/175/steps/qemubuildcommand/logs/stdio
> >
> > Oh, yikes. Do you have a full call trace? (Does this come through
> > get_cycles() or via the It seems like we could either initialize the
> > timer earlier or allow it to fall back when not initialized...
>
> nios2 doesn't give me a traceback. I followed it by adding debug messages.
> The code path is through get_cycles().
>
> On nios2:
>
> static u64 nios2_timer_read(struct clocksource *cs)
> {
>     struct nios2_clocksource *nios2_cs = to_nios2_clksource(cs);
>     unsigned long flags;
>     u32 count;
>
>     local_irq_save(flags);
>     count = read_timersnapshot(&nios2_cs->timer); // <- not initialized
>     local_irq_restore(flags);
>
>     /* Counter is counting down */
>     return ~count;
> }
>
> cycles_t get_cycles(void)
> {
>     return nios2_timer_read(&nios2_cs.cs);
> }
> EXPORT_SYMBOL(get_cycles);
>
> Guenter

Maybe it should WARN and return 0 for now if that's NULL?
Re: [PATCHv3 2/2] extract early boot entropy from the passed cmdline
> I did say 'external attacker' but it could be made clearer.

Er, s/say/mean to imply/

I do think it will have some local value after Android 8, which should start shipping in a few days, though. I'll look into having the kernel stash some entropy in pstore soon, since that seems like it could be a great improvement. I'm not sure how often / where it should hook in for regularly refreshing it, though. Doing it only on powering down isn't ideal.
Re: [kernel-hardening] [PATCHv2 2/2] extract early boot entropy from the passed cmdline
On Wed, 2017-08-16 at 21:58 -0700, Kees Cook wrote:
> On Wed, Aug 16, 2017 at 9:56 PM, Nick Kralevich <n...@google.com> wrote:
> > On Wed, Aug 16, 2017 at 3:46 PM, Laura Abbott <labb...@redhat.com> wrote:
> > > From: Daniel Micay <danielmi...@gmail.com>
> > >
> > > Existing Android bootloaders usually pass data useful as early entropy
> > > on the kernel command-line. It may also be the case on other embedded
> > > systems. Sample command-line from a Google Pixel running CopperheadOS:
> >
> > Why is it better to put this into the kernel, rather than just rely on
> > the existing userspace functionality which does exactly the same
> > thing? This is what Android already does today:
> > https://android-review.googlesource.com/198113
>
> That's too late for setting up the kernel stack canary, among other
> things. The kernel will also be generating some early secrets for slab
> cache canaries, etc. That all needs to happen well before init is
> started.
>
> -Kees

It's also unfortunately the kernel's global stack canary for the entire boot, since unlike x86 there aren't per-task canaries. GCC / Clang access it via a segment register on x86 vs. a global on other architectures. In theory it could be task-local elsewhere, but doing it efficiently would imply reserving a register to store the random value. I think that may actually end up helping performance more than it hurts by not needing to read the global stack canary value from cache repeatedly. If stack canaries were augmented into something more (XOR in the retaddr and offer the option of more coverage than STRONG), it would be more important.
Re: [PATCHv3 2/2] extract early boot entropy from the passed cmdline
On Wed, 2017-08-16 at 23:31 -0400, Theodore Ts'o wrote:
> On Wed, Aug 16, 2017 at 04:14:58PM -0700, Laura Abbott wrote:
> > From: Daniel Micay <danielmi...@gmail.com>
> >
> > Existing Android bootloaders usually pass data useful as early entropy
> > on the kernel command-line. It may also be the case on other embedded
> > systems.
>
> May I suggest a slight adjustment to the beginning commit description?
>
>     Feed the boot command-line as to the /dev/random entropy pool
>
>     Existing Android bootloaders usually pass data which may not be
>     known by an external attacker on the kernel command-line. It may
>     also be the case on other embedded systems. Sample command-line
>     from a Google Pixel running CopperheadOS
>
> The idea here is to if anything, err on the side of under-promising
> the amount of security we can guarantee that this technique will
> provide. For example, how hard is it really for an attacker who has
> an APK installed locally to get the device serial number? Or the OS
> version? And how much variability is there in the bootloader stages
> in milliseconds?

The serial number is currently accessible to local apps up until Android 7.x, so it doesn't have value if the adversary has local access. Access to it without the READ_PHONE_STATE permission is being removed for apps targeting Android 8.0 and will presumably be restricted for all apps at some point in the future:

https://android-developers.googleblog.com/2017/04/changes-to-device-identifiers-in.html

Some bootloader stages vary a bit in time each boot. There's not much variance or measurement precision, so there's only a small amount of entropy from this. The ones that consistently vary in timing do so independently from each other, so that helps a bit. Also worth noting that before Android 8.0, local apps can access the boot times since they're written to a system property. On Android 8.0+, all that stuff is inaccessible to them (no permission to get them) since there's a whitelisting model for system property access.

> I think we should definitely do this. So this is more of a request to
> be very careful what we promise in the commit description, not an
> objection to the change itself.

I did say 'external attacker' but it could be made clearer. It's primarily aimed at getting a tiny bit of extra entropy for the kernel stack canary and other probabilistic exploit mitigations set up in early boot. On non-x86 archs, i.e. 99.9% of Android devices, the kernel stack canary remains the same after it's set up in that early boot code.

Android devices almost all have a hardware RNG, and Android init blocks until a fair bit of data is read from it, along with restoring entropy that's regularly saved while running, but unfortunately that's not available at this point in the boot process. The kernel could save / restore entropy using pstore (which at least Nexus / Pixel devices have - not sure about others). I don't know how early that could feasibly be done. Ideally it would do that combined with early usage of the hwrng.
Re: drivers/tty/serial/8250/8250_fintek.c:364: warning: 'probe_data' is used uninitialized in this function
On Wed, 2017-08-09 at 17:32 +0200, Arnd Bergmann wrote:
> On Wed, Aug 9, 2017 at 5:07 PM, kbuild test robot wrote:
> > tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > head: bfa738cf3dfae2111626650f86135f93c5ff0a22
> > commit: 6974f0c4555e285ab217cee58b6e874f776ff409 include/linux/string.h:
> > add the option of fortified string.h functions
> > date: 4 weeks ago
> > config: x86_64-randconfig-v0-08092220 (attached as .config)
> > compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
> > reproduce:
> >     git checkout 6974f0c4555e285ab217cee58b6e874f776ff409
> >     # save the attached .config to linux build tree
> >     make ARCH=x86_64
> >
> > All warnings (new ones prefixed by >>):
> >
> >     In file included from include/linux/bitmap.h:8,
> >         from include/linux/cpumask.h:11,
> >         from arch/x86/include/asm/cpumask.h:4,
> >         from arch/x86/include/asm/msr.h:10,
> >         from arch/x86/include/asm/processor.h:20,
> >         from arch/x86/include/asm/cpufeature.h:4,
> >         from arch/x86/include/asm/thread_info.h:52,
> >         from include/linux/thread_info.h:37,
> >         from arch/x86/include/asm/preempt.h:6,
> >         from include/linux/preempt.h:80,
> >         from include/linux/spinlock.h:50,
> >         from include/linux/seqlock.h:35,
> >         from include/linux/time.h:5,
> >         from include/linux/stat.h:18,
> >         from include/linux/module.h:10,
> >         from drivers/tty/serial/8250/8250_fintek.c:11:
> >     include/linux/string.h: In function 'strcpy':
> >     include/linux/string.h:209: warning: '__f' is static but
> >         declared in inline function 'strcpy' which is not static
> >     include/linux/string.h:211: warning: '__f' is static but
> >         declared in inline function 'strcpy' which is not static
>
> This clearly comes from __trace_if() when CONFIG_PROFILE_ALL_BRANCHES
> is enabled. I did not see the warning with gcc-7.1.1, and I guess this
> only happens on older compilers like the gcc-4.4 that was used here.
>
> What is the reason for __FORTIFY_INLINE to be "extern __always_inline"
> rather than "static __always_inline"? If they cannot just be 'static',
> maybe this can be changed to depend on the compiler version.
>
>     Arnd

It's important to get the semantics of using extern. It means if you do something like taking the function's address, it resolves to the address of the direct symbol instead of generating a useless thunk for that object file. It might also be required for Clang compatibility, I don't remember. It could have a compiler version dependency, or maybe one specifically tied to old compiler && CONFIG_PROFILE_ALL_BRANCHES / other options that conflict with it like that.
Re: [PATCH] infiniband: avoid overflow warning
On Mon, 2017-07-31 at 14:18 -0700, Kees Cook wrote: > On Mon, Jul 31, 2017 at 2:10 PM, Arnd Bergmann wrote: > > On Mon, Jul 31, 2017 at 10:58 PM, Kees Cook > > wrote: > > > On Mon, Jul 31, 2017 at 12:30 AM, Arnd Bergmann > > > wrote: > > > > On Mon, Jul 31, 2017 at 9:08 AM, Moni Shoua > > > > wrote: > > > > > > break; > > > > > > default: > > > > > > return -EINVAL; > > > > > > > > > > what happens if you replace 16 with sizeof(struct in6_addr)? > > > > > > > > Same thing: the problem is that gcc already knows the size of > > > > the structure we > > > > pass in here, and it is in fact shorter. > > > > > > So gcc is ignoring both the cast (to 16 byte struct in6_addr) and > > > the > > > caller's actual 128 byte struct sockaddr_storage, and looking only > > > at > > > struct sockaddr? That seems really weird. > > > > Using a sockaddr_storage on the stack would address the warning, but > > the question was about just changing the hardcoded 16 to a sizeof() > > operation, and that has no effect. > > Right, I didn't mean that; I was curious why the fortify macro > resulted in an error at all. The callers are casting from struct > sockaddr_storage (large enough) to struct sockaddr (not large enough), > and then the inline is casting back to sockaddr_in6 (large enough). I > would have expected fortify to check either sockaddr_storage or > sockaddr_in6, but not sockaddr. > > -Kees

I don't think that's quite what's happening. It will report unknown if it can't find allocation sites or other guarantees of size. There are no alloc_size markers yet so it *mostly* does stack / global checks. It won't infer sizes based on pointer types. It's not a heuristic. However, fortify / -fsanitize=object-size can indirectly uncover other forms of undefined behavior. Code may rely on doing something undefined that GCC actively assumes can't happen for optimization.
The compile-time errors can have false positives due to dead code, but if the code is actually called with a size that it rejects as invalid, there is almost certainly a bug.
Re: [PATCH] infiniband: avoid overflow warning
On Mon, 2017-07-31 at 21:19 +0200, Arnd Bergmann wrote: > On Mon, Jul 31, 2017 at 6:17 PM, Bart Van Assche wrote: > > On Mon, 2017-07-31 at 18:04 +0200, Arnd Bergmann wrote: > > > On Mon, Jul 31, 2017 at 5:32 PM, Bart Van Assche > > > wrote: > > > > So inetaddr_event() assigns AF_INET to .sin_family and gcc warns > > > > about code > > > > that is only executed if .sin_family == AF_INET6? Since this > > > > warning is the > > > > result of incorrect interprocedural analysis by gcc, shouldn't > > > > this be > > > > reported as a bug to the gcc authors? > > > > > > I think the interprocedural analysis here is just a little worse > > > than it could > > > be, but it's not actually incorrect. It's not gcc that prints the > > > warning (if > > > it did, then I'd agree it would be a gcc bug) but the warning is > > > triggered > > > intentionally by the fortified version of memcpy in > > > include/linux/string.h. > > > > > > The problem as I understand it is that gcc cannot guarantee that > > > it > > > tracks the value of addr->sa_family at least as far as the size > > > of the > > > stack object, and it has no strict reason to do so, so the inlined > > > rdma_ip2gid() will still contain both cases. > > > > Hello Arnd, > > > > Had you already considered to uninline the rdma_ip2gid() function? > > Not really, that would prevent the normal optimization from happening, > so that would be worse than uninlining addr_event() as I tried. > > It would of course get rid of the warning, so if you think that's a > better > solution, I won't complain. > >Arnd

You could also use __memcpy but using a struct assignment seems cleaner. The compile-time fortify errors aren't perfect since they rely on GCC being able to optimize them out and there can be dead code that has intentional overflows, etc.
Their purpose is just to move many runtime errors (which don't have these false positives) to compile-time since it's a lot less painful to deal with a few false positives like this than errors slipping through to runtime (although it's less important since it's going to be using WARN for the time being). If the compile-time errors were removed, all of the real overflows would still be detected at runtime. One option would be having something like #define __NO_FORTIFY but *only* disabling the compile-time part of the checks to work around false positives. *shrug*
Re: [PATCH] fortify: Use WARN instead of BUG for now
I think the 'else' added in the proposed patch makes it too complicated for GCC to optimize out the __attribute__((error)) checks before they're considered to be errors. It's not needed so it's probably best to just avoid doing something like that. The runtime checks can't get false positives from overly complex code but the compile-time ones depend on GCC being able to reliably optimize them out. This might be easier for GCC:

    if (__builtin_constant_p(size) && condition_a)
        compiletimeerror();
    if (__builtin_constant_p(size) && condition_b)
        compiletimeerror();

than the current:

    if (__builtin_constant_p(size)) {
        if (condition_a)
            compiletimeerror();
        if (condition_b)
            compiletimeerror();
    }

but it hasn't had a false positive like that with the current code. Removing __noreturn makes the inline code more complex from GCC's perspective too, but hopefully it's neither reducing coverage (i.e. not making it less able to resolve __builtin_object_size - intuitively it shouldn't impact it much but you never know) nor making GCC unable to deal with the compile-time checks.
Re: [PATCH] fortify: Use WARN instead of BUG for now
> Maybe we could do two phases? One to s/BUG/WARN/ and the second to > improve the message? s/fortify_panic/fortify_overflow/ + use WARN + remove __noreturn makes sense as one commit. Still think the *option* of __noreturn + BUG should be kept there even just for measuring the size overhead. !COMPILE_TEST && EXPERT if it needs to be for now. If you're fully removing __noreturn then the entry in tools/objtool/check.c for __noreturn functions also won't make sense (either way it needs to use the new name). I think improving error messages should be done a bit differently though and it'll be easier to not tie these things together.
Re: [PATCH] fortify: Use WARN instead of BUG for now
It should just be renamed from fortify_panic -> fortify_error, including in arch/x86/boot/compressed/misc.c. It can use WARN instead of BUG, with a 'default n', !COMPILE_TEST option to switch back to BUG. Otherwise it needs to be patched downstream when that's wanted. I don't think splitting it is the right approach to improving the runtime error handling. That only makes sense for the compile-time errors due to the limitations of __attribute__((error)). Can we think about that before changing it? Just make it use WARN for now. The best debugging experience would be passing along the sizes and having the fortify_error function convert that into nice error messages. For memcpy(p, q, n), n can be larger than both the detected sizes of p and q, not just either one. The error should just be saying the function name and printing the copy size and maximum sizes of p and q. That's going to increase the code size too but I think splitting it will be worse and it goes in the wrong direction in terms of complexity. It's going to make future extensions / optimization harder if it's split.
Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c
It was known that there are going to be bugs to work through, many of them relatively benign like the leaks of data near string constants (probably other string constants) in rodata. It makes sense to have it default to WARN with BUG / noreturn as a non-default configuration option for it, I guess with !COMPILE_TEST like UBSAN_SANITIZE_ALL. I don't think there's any sane way to bound the length of either reads / writes. It needs to either WARN + continue on into doing the overflow or use BUG. Trying to correct it might make things worse and would make this more complicated / bug-prone. It already has enough subtle edge cases to deal with. I think 'benign' is a better term than 'false positive' because there hasn't been a non-bug found yet. They're mostly not security vulns but they're undefined behavior.
Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c
> So the fortify_string code has decided that only a single-byte (or > empty) memcpy is ok. > > And that, in turn, seems to be because we're copying from > optprobe_template_entry, which is declared as > > extern __visible kprobe_opcode_t optprobe_template_entry; > > so the fortify code decides it's a single character. > > Does just changing all those things to be declared as arrays fix > things?

Yeah, that fixes it because GCC will consider the size of 'char foo[]' unknown (i.e. (size_t)-1 from __builtin_object_size). GCC doesn't know this essentially constant value at compile-time so it wasn't a compile-time error:

    #define TMPL_END_IDX \
        ((long)&optprobe_template_end - (long)&optprobe_template_entry)

-fsanitize=object-size works the same way for pointer dereferences so replacing these declarations might fix some issues for CONFIG_UBSAN_SANITIZE_ALL. I guess that's way too noisy at the moment thus the !COMPILE_TEST.
Re: [lkp-robot] [include/linux/string.h] 6974f0c455: kernel_BUG_at_lib/string.c
> [8.134886] arch_prepare_optimized_kprobe+0xd5/0x171 > [8.134886] arch_prepare_optimized_kprobe+0xd5/0x171

Probably this:

    /* Copy arch-dep-instance from template */
    memcpy(buf, &optprobe_template_entry, TMPL_END_IDX);

Not a real bug, just technically undefined because these are u32:

    typedef u32 kprobe_opcode_t;

    extern __visible kprobe_opcode_t optprobe_template_entry;
    extern __visible kprobe_opcode_t optprobe_template_val;
    extern __visible kprobe_opcode_t optprobe_template_call;
    extern __visible kprobe_opcode_t optprobe_template_end;
    extern __visible kprobe_opcode_t optprobe_template_sub_sp;
    extern __visible kprobe_opcode_t optprobe_template_add_sp;
    extern __visible kprobe_opcode_t optprobe_template_restore_begin;
    extern __visible kprobe_opcode_t optprobe_template_restore_orig_insn;
    extern __visible kprobe_opcode_t optprobe_template_restore_end;

Could be switched to unknown size arrays like optprobe_template_entry[] but it might be best to just mark the kprobe code with #define __NO_FORTIFY.
Re: [PATCH] replace incorrect strscpy use in FORTIFY_SOURCE
On Fri, 2017-07-14 at 16:51 -0700, Kees Cook wrote: > On Fri, Jul 14, 2017 at 2:28 PM, Daniel Micay <danielmi...@gmail.com> > wrote: > > Using strscpy was wrong because FORTIFY_SOURCE is passing the > > maximum > > possible size of the outermost object, but strscpy defines the count > > parameter as the exact buffer size, so this could copy past the end > > of > > the source. This would still be wrong with the planned usage of > > __builtin_object_size(p, 1) for intra-object overflow checks since > > it's > > the maximum possible size of the specified object with no guarantee > > of > > it being that large. > > > > Reuse of the fortified functions like this currently makes the > > runtime > > error reporting less precise but that can be improved later on. > > > > Signed-off-by: Daniel Micay <danielmi...@gmail.com> > > Thanks for fixing this! Linus, do you want to take this directly or > have it go via -mm where fortify landed originally? > > Acked-by: Kees Cook <keesc...@chromium.org> > > As far as testing goes, was the NFS tree not in -next, or was a test > not running against -next? I'm curious why it took until the NFS tree > landed in Linus's tree for this to get noticed. Fortify was in -next > for a while... > > -Kees

Not sure but one issue is that v1 wasn't broken and that's what I most heavily tested myself. I then switched to testing with intra-object size checks on top of it, and there would have needed to be a case like this to break with those stricter checks:

    char a[2];
    char b[3];
    char *p = cond ? a : b;
    strcpy(p, c);

__builtin_object_size(p, 0) / __builtin_object_size(p, 1) will both return 3 there, which is wrong with that incorrect strscpy usage. I wouldn't have found it via NFS since I've been testing with the string (but not memory) functions switched to the stricter type 1.
I started on some test cases for FORTIFY_SOURCE, but I was testing to make sure it catches all the bugs it's supposed to catch; it's probably a good idea to write some more generic string edge case tests too. One of the somewhat subtle cases (which is handled properly already):

    struct foo {
        char a[2];
        char b;
    };

    struct foo x;
    x.a[0] = '1';
    x.a[1] = '2';
    x.b = '\0';

    strlen(x.a);     // BUG with stricter intra-object overflow checks on
    strnlen(x.a, 3); // BUG with stricter intra-object overflow checks on
    strnlen(x.a, 2); // no overflow

Anyway, it'd be really good if other people looked closely at these. I wasn't sure what to do with test cases that I've made though. It seems ones that are *supposed* to BUG should go in lkdtm, and should the other tests just be together with those? Some should also pass w/o the intra object overflow checks, but BUG with them enabled.
Re: [GIT PULL] Please pull NFS client changes for Linux 4.13
> I find "hardening" code that adds bugs to be particularly bad and > ugly, the same way that I absolutely *hate* debugging code that turns > out to make debugging impossible (we had that with the "better" stack > tracing code that caused kernel panics to kill the machine entirely > rather than show the backtrace, and I'm still bitter about it a decade > after the fact). Agree, it's very important for this code to be correct and the string functions have some subtleties so it needs scrutiny. I messed up strcpy between v1 and v2 trying to add a proper read overflow check. My fault for not looking more closely at strscpy before adopting it based on my misinterpretation of the API. This is primarily a bug finding feature right now and it has gotten a few fixed that actually matter (most were unimportant memcpy read past end of string constant but not all). I don't think it has another bug like this strscpy misuse itself, but there will need to be some more fixes for minor read overflows, etc. elsewhere in the tree before it'll actually make sense as a hardening feature because it can turn a benign read overflow into a DoS via BUG(). I think it will be fine for 4.13, but I definitely wouldn't propose 'default y' for a while, even if there was no performance cost (and there is). Fix for this issue is here in case anyone just looks only at this thread (realized I should have passed send-email a reply id): http://marc.info/?l=linux-fsdevel=150006772418003=2
[PATCH] replace incorrect strscpy use in FORTIFY_SOURCE
Using strscpy was wrong because FORTIFY_SOURCE is passing the maximum possible size of the outermost object, but strscpy defines the count parameter as the exact buffer size, so this could copy past the end of the source. This would still be wrong with the planned usage of __builtin_object_size(p, 1) for intra-object overflow checks since it's the maximum possible size of the specified object with no guarantee of it being that large.

Reuse of the fortified functions like this currently makes the runtime error reporting less precise but that can be improved later on.

Signed-off-by: Daniel Micay <danielmi...@gmail.com>
---
 include/linux/string.h | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/include/linux/string.h b/include/linux/string.h
index 96f5a5fd0377..049866760e8b 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -202,17 +202,6 @@ void __read_overflow2(void) __compiletime_error("detected read beyond size of ob
 void __write_overflow(void) __compiletime_error("detected write beyond size of object passed as 1st parameter");
 
 #if !defined(__NO_FORTIFY) && defined(__OPTIMIZE__) && defined(CONFIG_FORTIFY_SOURCE)
-__FORTIFY_INLINE char *strcpy(char *p, const char *q)
-{
-	size_t p_size = __builtin_object_size(p, 0);
-	size_t q_size = __builtin_object_size(q, 0);
-	if (p_size == (size_t)-1 && q_size == (size_t)-1)
-		return __builtin_strcpy(p, q);
-	if (strscpy(p, q, p_size < q_size ? p_size : q_size) < 0)
-		fortify_panic(__func__);
-	return p;
-}
-
 __FORTIFY_INLINE char *strncpy(char *p, const char *q, __kernel_size_t size)
 {
 	size_t p_size = __builtin_object_size(p, 0);
@@ -391,6 +380,18 @@ __FORTIFY_INLINE void *kmemdup(const void *p, size_t size, gfp_t gfp)
 		fortify_panic(__func__);
 	return __real_kmemdup(p, size, gfp);
 }
+
+/* defined after fortified strlen and memcpy to reuse them */
+__FORTIFY_INLINE char *strcpy(char *p, const char *q)
+{
+	size_t p_size = __builtin_object_size(p, 0);
+	size_t q_size = __builtin_object_size(q, 0);
+	if (p_size == (size_t)-1 && q_size == (size_t)-1)
+		return __builtin_strcpy(p, q);
+	memcpy(p, q, strlen(q) + 1);
+	return p;
+}
+
 #endif
 
 #endif /* _LINUX_STRING_H_ */
-- 
2.13.3
Re: [GIT PULL] Please pull NFS client changes for Linux 4.13
> The reason q_size isn't used is because it doesn't yet prevent read
> overflow. The commit message mentions that among the current
> limitations along with __builtin_object_size(ptr, 1).

Er rather: in strlcat, q_size is unused after the fast path because strnlen obtains the constant again itself.
Re: [GIT PULL] Please pull NFS client changes for Linux 4.13
On Fri, 2017-07-14 at 13:50 -0700, Linus Torvalds wrote: > On Fri, Jul 14, 2017 at 1:38 PM, Daniel Micay <danielmi...@gmail.com> > wrote: > > > > If strscpy treats the count parameter as a *guarantee* of the dest > > size > > rather than a limit, > > No, it's a *limit*. > > And by a *limit*, I mean that we know that we can access both source > and destination within that limit. FORTIFY_SOURCE needs to be able to pass a limit without implying that there's a minimum. That's the distinction I was trying to make. It's wrong to use anything where it's interpreted as a minimum here. Using __builtin_object_size(ptr, 0) vs. __builtin_object_size(ptr, 1) doesn't avoid the problem. __builtin_object_size(ptr, 1) returns a maximum among the possible buffer sizes just like 0. It's just stricter, i.e. catches intra-object overflow, which isn't desirable for the first take since it will cause compatibility issues. There's code using memcpy, copy_to_user, etc. to read / write multiple fields with a pointer to the first one passed as the source / destination. > > My initial patch used strlcpy there, because I wasn't aware of > > strscpy > > before it was suggested: > > Since I'm looking at this, I note that the "strlcpy()" code is > complete garbage too, and has that same > > p_size == (size_t)-1 && q_size == (size_t)-1 > > check which is wrong. Of course, in strlcpy, q_size is never actually > *used*, so the whole check seems bogus. That check is only an optimization. __builtin_object_size always returns a constant, and it's (size_t)-1 when no limit could be found. The reason q_size isn't used is because it doesn't yet prevent read overflow. The commit message mentions that among the current limitations along with __builtin_object_size(ptr, 1). > But no, strlcpy() is complete garbage, and should never be used. It is > truly a shit interface, and anybody who uses it is by definition > buggy. > > Why? 
Because the return value of "strlcpy()" is defined to be ignoring > the limit, so you FUNDAMENTALLY must not use that thing on untrusted > source strings. > > But since the whole *point* of people using it is for untrusted > sources, it by definition is garbage. > > Ergo: don't use strlcpy(). It's unbelievable crap. It's wrong. There's > a reason we defined "strscpy()" as the way to do safe copies > (strncpy(), of course, is broken for both lack of NUL termination > _and_ for excessive NUL termination when a NUL did exist). Sure, it doesn't prevent read overflow (but it's not worse than strcpy, which is the purpose) which is why I said this: "The fortified string functions should place a limit on reads from the source. For strcat/strcpy, these could just be a variant of strlcat / strlcpy using the size limit as a bound on both the source and destination, with the return value only reporting whether truncation occurred rather than providing the source length. It would be an easier API for developers to use too and not that it really matters but it would be more efficient for the case where truncation is intended." That's why strscpy was suggested and I switched to that + updated the commit message to only mention strcat, but it's wrong to use it here because __builtin_object_size(p, 0) / __builtin_object_size(p, 1) are only a guaranteed maximum length with no minimum guarantee.
Re: [GIT PULL] Please pull NFS client changes for Linux 4.13
> My initial patch used strlcpy there, because I wasn't aware of strscpy
> before it was suggested:
>
> http://www.openwall.com/lists/kernel-hardening/2017/05/04/11
>
> I was wrong to move it to strscpy. It could be switched back to
> strlcpy again unless the kernel considers the count parameter to be a
> guarantee that could be leveraged in the future. Using the fortified
> strlen + memcpy would provide the improvement that strscpy was meant
> to provide there over strlcpy.

Similarly, the FORTIFY_SOURCE strcat uses strlcat with the assumption that the count parameter is a limit, not a guarantee for an optimization. There's only a C implementation and it currently doesn't treat it that way, but if it's meant to be a guarantee then the strcat needs to be changed too. I'll make a fix moving both away from the existing functions.
Re: [GIT PULL] Please pull NFS client changes for Linux 4.13
On Fri, 2017-07-14 at 12:58 -0700, Linus Torvalds wrote:
> On Fri, Jul 14, 2017 at 12:43 PM, Andrey Ryabinin wrote:
> > > yet when I look at the generated code for __ip_map_lookup, I see
> > >
> > >	movl	$32, %edx	#,
> > >	movq	%r13, %rsi	# class,
> > >	leaq	48(%rax), %rdi	#, tmp126
> > >	call	strscpy	#
> > >
> > > what's the bug here? Look at that third argument - %rdx. It is
> > > initialized to 32.
> >
> > It's not a compiler bug, it's a bug in our strcpy().
> > Whoever wrote this strcpy() into strscpy() code apparently didn't
> > read carefully enough gcc manual about __builtin_object_size().
> >
> > Summary from https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html :
> >
> > __builtin_object_size(ptr, type) returns a constant number
> > of bytes from 'ptr' to the end of the object 'ptr'
> > pointer points to. "type" is an integer constant from 0 to
> > 3. If the least significant bit is clear, objects
> > are whole variables, if it is set, a closest surrounding
> > subobject is considered the object a pointer points to.
> > The second bit determines if maximum or minimum of remaining
> > bytes is computed.
> >
> > We have type = 0 in strcpy(), so the least significant bit is clear.
> > So the 'ptr' is considered as a pointer to the whole
> > variable i.e. pointer to struct ip_map ip;
> > And the number of bytes from 'ip.m_class' to the end of the ip
> > object is exactly 32.
> >
> > I suppose that changing the type to 1 should fix this bug.
>
> Oh, that absolutely needs to be done.
>
> Because that "strcpy() -> strscpy()" conversion really depends on that
> size being the right size (well, in this case minimal safe size) for
> the actual accesses, exactly because "strscpy()" is perfectly willing
> to write *past* the end of the destination string within that given
> size limit (ie it reads and writes in the same 8-byte chunks).
>
> So if you have a small target string that is contained in a big
> object, then the "hardened" strcpy() code can actually end up
> overwriting things past the end of the string, even if the string
> itself were to have fit in the buffer.
>
> I note that every single use in string.h is buggy, and it worries me
> that __compiletime_object_size() does this too. The only user of that
> seems to be check_copy_size(), and now I'm a bit worried what that bug
> may have hidden.
>
> I find "hardening" code that adds bugs to be particularly bad and
> ugly, the same way that I absolutely *hate* debugging code that turns
> out to make debugging impossible (we had that with the "better" stack
> tracing code that caused kernel panics to kill the machine entirely
> rather than show the backtrace, and I'm still bitter about it a decade
> after the fact).
>
> There is something actively *evil* about it. Daniel, Kees, please jump
> on this.
>
> Andrey, thanks for noticing this thing,
>
> Linus

The issue is the usage of strscpy then, not the __builtin_object_size type parameter. The type is set to 0 rather than 1 to be more lenient by not detecting intra-object overflow, which is going to come later. If strscpy treats the count parameter as a *guarantee* of the dest size rather than a limit, it's wrong to use it there, whether or not the type parameter for __builtin_object_size is 0 or 1, since it can still return a larger size. It's a limit with no guaranteed minimum.

My initial patch used strlcpy there, because I wasn't aware of strscpy before it was suggested:

http://www.openwall.com/lists/kernel-hardening/2017/05/04/11

I was wrong to move it to strscpy. It could be switched back to strlcpy again unless the kernel considers the count parameter to be a guarantee that could be leveraged in the future. Using the fortified strlen + memcpy would provide the improvement that strscpy was meant to provide there over strlcpy.
Re: [kernel-hardening] Re: [PATCH v3] mm: Add SLUB free list pointer obfuscation
On Thu, 2017-07-06 at 10:55 -0500, Christoph Lameter wrote:
> On Thu, 6 Jul 2017, Kees Cook wrote:
> > On Thu, Jul 6, 2017 at 6:43 AM, Christoph Lameter wrote:
> > > On Wed, 5 Jul 2017, Kees Cook wrote:
> > > >
> > > > @@ -3536,6 +3565,9 @@ static int kmem_cache_open(struct kmem_cache *s, unsigned long flags)
> > > > {
> > > > 	s->flags = kmem_cache_flags(s->size, flags, s->name, s->ctor);
> > > > 	s->reserved = 0;
> > > > +#ifdef CONFIG_SLAB_FREELIST_HARDENED
> > > > +	s->random = get_random_long();
> > > > +#endif
> > > >
> > > > 	if (need_reserve_slab_rcu && (s->flags & SLAB_TYPESAFE_BY_RCU))
> > > > 		s->reserved = sizeof(struct rcu_head);
> > >
> > > So if an attacker knows the internal structure of data then he can
> > > simply dereference page->kmem_cache->random to decode the freepointer.
> >
> > That requires a series of arbitrary reads. This is protecting against
> > attacks that use an adjacent slab object write overflow to write the
> > freelist pointer. This internal structure is very reliable, and has
> > been the basis of freelist attacks against the kernel for a decade.
>
> These reads are not arbitrary. You can usually calculate the page struct
> address easily from the address and then do a couple of loads to get
> there.

You're describing an arbitrary read vulnerability: an attacker able to read the value of an address of their choosing. Requiring a powerful additional primitive rather than only a small fixed size overflow or a weak use-after-free vulnerability to use a common exploit vector is useful.

A deterministic mitigation would be better, but I don't think an extra slab allocator for hardened kernels would be welcomed. Since there isn't a separate allocator for that niche, SLAB or SLUB are used. The ideal would be bitmaps in `struct page` but that implies another allocator, using single pages for the smallest size classes and potentially needing to bloat `struct page` even with that.

There's definitely a limit to the hardening that can be done for SLUB, but unless forking it into a different allocator is welcome, that's what will be suggested. Similarly, the slab freelist randomization feature is a much weaker mitigation than it could be without these constraints placed on it. This is much lower complexity than that and higher value though...

> Ok so you get rid of the old attacks because we did not have that
> hardening in effect when they designed their approaches?
>
> > It is a probabilistic defense, but then so is the stack protector.
> > This is a similar defense; while not perfect it makes the class of
> > attack much more difficult to mount.
>
> Na I am not convinced of the "much more difficult". Maybe they will
> just have to upgrade their approaches to fetch the proper values to
> decode.

To fetch the values they would need an arbitrary read vulnerability or the ability to dump them via uninitialized slab allocations as an extra requirement. An attacker can similarly bypass the stack canary by reading it from stack frames via a stack buffer read overflow or uninitialized variable usage leaking stack data. On non-x86, at least with SMP, the stack canary is just a global variable that remains the same after initialization too. That doesn't make it useless, although the kernel doesn't have many linear overflows on the stack, which is the real issue with it as a mitigation. Despite that, most people are using kernels with stack canaries, and that has a significant performance cost unlike these kinds of changes.
Re: [PATCH v2] kref: Avoid null pointer dereference after WARN
On Tue, 2017-06-27 at 12:34 -0700, Kees Cook wrote:
> On Tue, Jun 27, 2017 at 12:26 PM, Jason A. Donenfeld wrote:
> > On Tue, Jun 27, 2017 at 9:22 PM, Andi Kleen wrote:
> > > Who would actually set mmap_min_addr incorrectly?
> >
> > Historically there have been quite a few bypasses of mmap_min_addr,
> > actually. This is well-trodden ground.
>
> Targeting things in /proc/sys via confused privileged helpers is
> extremely common. See Chrome OS pwn2own exploits (targeting modprobe
> sysctl), and plenty of others. Modern attack methodology is rarely a
> single-bug attack, but rather a chain of bugs, which may include
> producing or exploiting weak userspace configurations to soften the
> kernel.
>
> Regardless, it's a fair point that checking this unconditionally is
> wasteful. Strangely this doesn't help:
>
> -	BUG_ON(release == NULL);
> +	if (!__builtin_constant_p(release))
> +		BUG_ON(release == NULL);
>
> When nearly all callers pass a function directly:
>
> ...
> drivers/block/rbd.c:	kref_put(&spec->kref, rbd_spec_free);
> drivers/char/hw_random/core.c:	kref_put(&rng->ref, cleanup_rng);
> drivers/char/ipmi/ipmi_msghandler.c:	kref_put(&intf->refcount, intf_free);
> ...
>
> Hmmm
>
> -Kees

It doesn't mean the address is constant if there's a fixed function being passed to it. It's not known at compile-time, and if the code can be relocated it's not known at link-time either. I don't personally care about checks like this but I split it out with some others just because it was there already.

Clang has a nullability attribute which is similar to nonnull but it doesn't cause UB when violated, so if GCC picked that up it could be added all over the place as an annotation on parameters to trigger warnings. There's a sanitizer for it, so it can be made to trap with -fsanitize=nullability -fsanitize-trap=nullability.
Re: [PATCH v2] binfmt_elf: Use ELF_ET_DYN_BASE only for PIE
On Tue, 2017-06-27 at 16:49 +0200, Michal Hocko wrote: > On Wed 21-06-17 10:32:01, Kees Cook wrote: > > The ELF_ET_DYN_BASE position was originally intended to keep loaders > > away from ET_EXEC binaries. (For example, running "/lib/ld- > > linux.so.2 > > /bin/cat" might cause the subsequent load of /bin/cat into where the > > loader had been loaded.) With the advent of PIE (ET_DYN binaries > > with > > an INTERP Program Header), ELF_ET_DYN_BASE continued to be used > > since > > the kernel was only looking at ET_DYN. However, since > > ELF_ET_DYN_BASE > > is traditionally set at the top 1/3rd of the TASK_SIZE, a > > substantial > > portion of the address space is unused. > > > > For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs > > are loaded below the mmap region. This means they can be made to > > collide > > (CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with > > pathological > > stack regions. Lowering ELF_ET_DYN_BASE solves both by moving > > programs > > above the mmap region in all cases, and will now additionally avoid > > programs falling back to the mmap region by enforcing MAP_FIXED for > > program loads (i.e. if it would have collided with the stack, now it > > will fail to load instead of falling back to the mmap region). > > I do not understand this part. MAP_FIXED will simply unmap whatever > was under the requested range, how it could help failing anything? So > what would happen if something was mapped in that region, or is this > impossible? Moreover MAP_FIXED close to stack will inhibit the stack > gap > protection. I don't think there's a reason to use MAP_FIXED. PaX likely ignores the address hint with RANDMMAP in that code, which would explain it there.
Re: [kernel-hardening] non-x86 per-task stack canaries
On Mon, 2017-06-26 at 14:04 -0700, Kees Cook wrote: > Hi, > > The stack protector functionality on x86_64 uses %gs:0x28 (%gs is the > percpu area) for __stack_chk_guard, and all other architectures use a > global variable instead. This means we never change the stack canary > on non-x86 architectures which allows for a leak in one task to expose > the canary in another task. > > I'm curious what thoughts people may have about how to get this > correctly implemented. Teaching the compiler about per-cpu data sounds > exciting. :) > > -Kees arm64 has many integer registers so I don't think reserving one would hurt performance, especially in the kernel where hot numeric loops barely exist. It would reduce the cost of SSP by getting rid of the memory read for the canary value. On the other hand, using per-cpu data would likely be higher cost than the global. x86 has segment registers but most archs probably need to do something more painful. It's safe as long as it's a callee-saved register. It should be enforced that there's no assembly spilling it and calling into C code without the random canary. There's very little assembly using registers like x28 so it wouldn't be that bad. It's possible there's one where nothing needs to be changed; there only needs to be a check to make sure it stays that way. It would be a step towards making SSP cheap enough to expand it into a feature like the StackGuard XOR canaries. Samsung has a return address XOR feature based on reserving a register and while RAP's probabilistic return address mitigation isn't open-source, it was stated that it reserves a register on x86_64 where they aren't as plentiful as arm64.
Re: [kernel-hardening] [PATCH] [RFC] binfmt_elf: Use ELF_ET_DYN_BASE only for PIE
On Wed, 2017-06-21 at 10:28 -0700, Kees Cook wrote: > On Wed, Jun 21, 2017 at 10:27 AM, Daniel Micay <danielmi...@gmail.com> > wrote: > > On Wed, 2017-06-21 at 10:23 -0700, Kees Cook wrote: > > > On Wed, Jun 21, 2017 at 5:07 AM, Rik van Riel <r...@redhat.com> > > > wrote: > > > > On Tue, 2017-06-20 at 22:58 -0700, Kees Cook wrote: > > > > > +/* This is the base location for PIE (ET_DYN with INTERP) > > > > > loads. > > > > > */ > > > > > +#define ELF_ET_DYN_BASE 0x40UL > > > > > > > > This value is good for 32 bit binaries, but for 64 > > > > bit binaries you probably want to put ELF_ET_DYN_BASE > > > > at 4GB or higher. > > > > > > > > The latter is necessary because Android uses the > > > > lower 4GB of address space for its JVM runtime, > > > > with 32 bit pointers inside that part of the otherwise > > > > 64 bit address space. > > > > > > > > In other words: > > > > > > > > #define ELF_ET_DYN_BASE (mmap_is_ia32() ? 0x40UL : > > > > 0x1UL) > > > > > > Ah, interesting. Okay, that should be fine. I'll adjust it. > > > > > > > > +++ b/fs/binfmt_elf.c > > > > > > > > > > + * Therefore, programs are loaded offset > > > > > from > > > > > + * ELF_ET_DYN_BASE and loaders are > > > > > loaded > > > > > into the > > > > > + * independently randomized mmap region > > > > > (0 > > > > > load_bias > > > > > + * without MAP_FIXED). > > > > > + */ > > > > > + if (elf_interpreter) { > > > > > + load_bias = ELF_ET_DYN_BASE; > > > > > + if (current->flags & > > > > > PF_RANDOMIZE) > > > > > + load_bias += > > > > > arch_mmap_rnd(); > > > > > + elf_flags |= MAP_FIXED; > > > > > + } else > > > > > + load_bias = 0; > > > > > + > > > > > + load_bias -= vaddr; > > > > > > > > I like this, and the big comment telling people how it > > > > works :) > > > > > > Thanks! It looks like your patch for commenting load_bias never > > > got > > > picked up, so I've added some more comments for that and some > > > other > > > things too. (Mostly for all the stuff I have to read a few times > > > when > > > I look at this code.) > > > > > > -Kees > > > > The stack rlimit calculation fix for space potentially lost to ASLR > > is > > probably still needed too, right? > > Yes. Was that picked up by akpm already? > > -Kees I think it was dropped when the ET_DYN changes were dropped.
Re: [kernel-hardening] [PATCH] [RFC] binfmt_elf: Use ELF_ET_DYN_BASE only for PIE
On Wed, 2017-06-21 at 10:23 -0700, Kees Cook wrote: > On Wed, Jun 21, 2017 at 5:07 AM, Rik van Rielwrote: > > On Tue, 2017-06-20 at 22:58 -0700, Kees Cook wrote: > > > +/* This is the base location for PIE (ET_DYN with INTERP) loads. > > > */ > > > +#define ELF_ET_DYN_BASE 0x40UL > > > > This value is good for 32 bit binaries, but for 64 > > bit binaries you probably want to put ELF_ET_DYN_BASE > > at 4GB or higher. > > > > The latter is necessary because Android uses the > > lower 4GB of address space for its JVM runtime, > > with 32 bit pointers inside that part of the otherwise > > 64 bit address space. > > > > In other words: > > > > #define ELF_ET_DYN_BASE (mmap_is_ia32() ? 0x40UL : > > 0x1UL) > > Ah, interesting. Okay, that should be fine. I'll adjust it. > > > > +++ b/fs/binfmt_elf.c > > > > > > + * Therefore, programs are loaded offset > > > from > > > + * ELF_ET_DYN_BASE and loaders are loaded > > > into the > > > + * independently randomized mmap region (0 > > > load_bias > > > + * without MAP_FIXED). > > > + */ > > > + if (elf_interpreter) { > > > + load_bias = ELF_ET_DYN_BASE; > > > + if (current->flags & PF_RANDOMIZE) > > > + load_bias += > > > arch_mmap_rnd(); > > > + elf_flags |= MAP_FIXED; > > > + } else > > > + load_bias = 0; > > > + > > > + load_bias -= vaddr; > > > > I like this, and the big comment telling people how it > > works :) > > Thanks! It looks like your patch for commenting load_bias never got > picked up, so I've added some more comments for that and some other > things too. (Mostly for all the stuff I have to read a few times when > I look at this code.) > > -Kees > The stack rlimit calculation fix for space potentially lost to ASLR is probably still needed too, right?
Re: [kernel-hardening] [PATCH 23/23] mm: Allow slab_nomerge to be set at build time
On Mon, 2017-06-19 at 16:36 -0700, Kees Cook wrote: > Some hardened environments want to build kernels with slab_nomerge > already set (so that they do not depend on remembering to set the > kernel > command line option). This is desired to reduce the risk of kernel > heap > overflows being able to overwrite objects from merged caches, > increasing > the difficulty of these attacks. By keeping caches unmerged, these > kinds > of exploits can usually only damage objects in the same cache (though > the > risk to metadata exploitation is unchanged). It also further fragments the ability to influence slab cache layout, i.e. primitives to do things like filling up slabs to set things up for an exploit might not be able to deal with the target slabs anymore. It doesn't need to be mentioned but it's something to think about too. In theory, disabling merging can make it *easier* to get the right layout too if there was some annoyance that's now split away. It's definitely a lot more good than bad for security though, but allocator changes have subtle impact on exploitation. This can make caches more deterministic.
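For reference, the runtime knob that the build-time option makes unnecessary is the `slab_nomerge` kernel command line parameter; an illustrative GRUB fragment (everything besides `slab_nomerge` is just an example of an existing configuration):

```shell
# /etc/default/grub — disable slab cache merging at boot on kernels
# built without the new option; regenerate the grub config afterwards.
GRUB_CMDLINE_LINUX_DEFAULT="quiet slab_nomerge"
```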
[PATCH v5] add the option of fortified string.h functions
This adds support for compiling with a rough equivalent to the glibc _FORTIFY_SOURCE=1 feature, providing compile-time and runtime buffer overflow checks for string.h functions when the compiler determines the size of the source or destination buffer at compile-time. Unlike glibc, it covers buffer reads in addition to writes. GNU C __builtin_*_chk intrinsics are avoided because they would force a much more complex implementation. They aren't designed to detect read overflows and offer no real benefit when using an implementation based on inline checks. Inline checks don't add up to much code size and allow full use of the regular string intrinsics while avoiding the need for a bunch of _chk functions and per-arch assembly to avoid wrapper overhead. This detects various overflows at compile-time in various drivers and some non-x86 core kernel code. There will likely be issues caught in regular use at runtime too.

Future improvements left out of the initial implementation for simplicity, as it's all quite optional and can be done incrementally:

* Some of the fortified string functions (strncpy, strcat) don't yet place a limit on reads from the source based on __builtin_object_size of the source buffer.
* Extending coverage to more string functions like strlcat.
* It should be possible to optionally use __builtin_object_size(x, 1) for some functions (C strings) to detect intra-object overflows (like glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative approach to avoid likely compatibility issues.
* The compile-time checks should be made available via a separate config option which can be enabled by default (or always enabled) once enough time has passed to get the issues it catches fixed.
Signed-off-by: Daniel Micay <danielmi...@gmail.com>
---
Changes since v4:
- avoid overly aggressive strnlen check for non-null-terminated strings

 arch/arm64/include/asm/string.h  |   5 +
 arch/x86/boot/compressed/misc.c  |   5 +
 arch/x86/include/asm/string_32.h |   9 ++
 arch/x86/include/asm/string_64.h |   7 ++
 include/linux/string.h           | 200 +++
 lib/string.c                     |   6 ++
 security/Kconfig                 |   6 ++
 7 files changed, 238 insertions(+)

diff --git a/arch/arm64/include/asm/string.h b/arch/arm64/include/asm/string.h
index 2eb714c4639f..d0aa42907569 100644
--- a/arch/arm64/include/asm/string.h
+++ b/arch/arm64/include/asm/string.h
@@ -63,6 +63,11 @@ extern int memcmp(const void *, const void *, size_t);
 #define memcpy(dst, src, len) __memcpy(dst, src, len)
 #define memmove(dst, src, len) __memmove(dst, src, len)
 #define memset(s, c, n) __memset(s, c, n)
+
+#ifndef __NO_FORTIFY
+#define __NO_FORTIFY /* FORTIFY_SOURCE uses __builtin_memcpy, etc. */
+#endif
+
 #endif

 #endif

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index b3c5a5f030ce..43691238a21d 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -409,3 +409,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 	debug_putstr("done.\nBooting the kernel.\n");
 	return output;
 }
+
+void fortify_panic(const char *name)
+{
+	error("detected buffer overflow");
+}

diff --git a/arch/x86/include/asm/string_32.h b/arch/x86/include/asm/string_32.h
index 3d3e8353ee5c..e9ee84873de5 100644
--- a/arch/x86/include/asm/string_32.h
+++ b/arch/x86/include/asm/string_32.h
@@ -142,7 +142,9 @@ static __always_inline void *__constant_memcpy(void *to, const void *from,
 }

 #define __HAVE_ARCH_MEMCPY
+extern void *memcpy(void *, const void *, size_t);

+#ifndef CONFIG_FORTIFY_SOURCE
 #ifdef CONFIG_X86_USE_3DNOW

 #include <asm/mmx.h>
@@ -195,11 +197,15 @@ static inline void *__memcpy3d(void *to, const void *from, size_t len)

 #endif
 #endif
+#endif /* !CONFIG_FORTIFY_SOURCE */

 #define __HAVE_ARCH_MEMMOVE
 void *memmove(void *dest, const void *src, size_t n);
+extern int memcmp(const void *, const void *, size_t);

+#ifndef CONFIG_FORTIFY_SOURCE
 #define memcmp __builtin_memcmp
+#endif

 #define __HAVE_ARCH_MEMCHR
 extern void *memchr(const void *cs, int c, size_t count);
@@ -321,6 +327,8 @@ void *__constant_c_and_count_memset(void *s, unsigned long pattern,
 	: __memset_generic((s), (c), (count)))

 #define __HAVE_ARCH_MEMSET
+extern void *memset(void *, int, size_t);
+#ifndef CONFIG_FORTIFY_SOURCE
 #if (__GNUC__ >= 4)
 #define memset(s, c, count) __builtin_memset(s, c, count)
 #else
@@ -330,6 +338,7 @@ void *__constant_c_and_count_memset(void *s, unsigned long pattern,
 			 (count))				\
 	: __memset((s), (c), (count)))
 #endif
+#endif /* !CONFIG_FORTIFY_SOURCE */

 /*
  * find the first occurrence of byte 'c', or 1 past the area if none
diff --git a/arch/x86/include/asm/string_64.h
index 733bae07fb29..309fe644569f 100644
--- a/arch/x86/