Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
On Thu, 3 Dec 2020, Florian Weimer wrote:

> My knowledge of probability theory is quite limited, so I have to rely
> on simulations.  But I think you would see a 40 GiB gap somewhere for a
> 47-bit address space with 32K allocations, most of the time.  Which is
> not too bad.

This is very close to a Poisson process (if the number of small
allocations being distributed independently in the address space is
large), so the probability that any given gap is at least x times the
mean gap is about exp(-x).

-- 
Joseph S. Myers
jos...@codesourcery.com
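A quick way to sanity-check the exp(-x) gap law numerically (an illustrative C sketch, not part of the patch or the thread): drop n points uniformly in [0,1), sort them, and measure what fraction of the gaps is at least x times the mean gap (about 1/n).

```c
/*
 * Monte Carlo check of the exp(-x) gap law: for n uniform points the
 * fraction of gaps that are at least x times the mean gap should be
 * close to exp(-x).  Illustrative sketch only.
 */
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;

	return (x > y) - (x < y);
}

/* Fraction of the n-1 interior gaps that are >= x times the mean gap. */
double gap_tail_fraction(int n, double x, unsigned int seed)
{
	double *p = malloc(n * sizeof(*p));
	long hits = 0;

	srand(seed);
	for (int i = 0; i < n; i++)
		p[i] = (double)rand() / ((double)RAND_MAX + 1);
	qsort(p, n, sizeof(*p), cmp_double);

	for (int i = 1; i < n; i++)
		if (p[i] - p[i - 1] >= x / n)	/* mean gap is about 1/n */
			hits++;
	free(p);
	return (double)hits / (n - 1);
}
```

With n = 32768 (the allocation count discussed above), the measured fraction lands within a couple of percent of exp(-1) ≈ 0.368 for x = 1 and exp(-2) ≈ 0.135 for x = 2.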
Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
On Thu, Dec 03, 2020 at 09:42:54AM -0800, Andy Lutomirski wrote:
> I suspect that something much more clever could be done: dividing the
> heap into a few independently randomized sections and randomizing heap
> pages within the sections might do much better.  There should certainly
> be a lot of room for something between what we have now and a fully
> randomized scheme.
>
> It might also be worth looking at what other OSes do.

How about dividing the address space up into 1GB sections (or, rather,
PUD_SIZE sections), allocating from each one until it's 50% full, then
choosing another one?  Sufficiently large allocations would ignore this
division and just look for any space.

I'm thinking something like the slab allocator (so the 1GB chunk would
go back into the allocatable list when >50% of it was empty).  That
might strike a happy medium between full randomisation and efficient
use of page tables / leaving large chunks of address space free for
large mmaps.
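The section idea above can be sketched as a toy model (names, layout and the "assume a fresh section is empty" simplification are all mine, for illustration; this is not the patch's algorithm and ignores the slab-style free list):

```c
/*
 * Toy model of the PUD_SIZE-section idea: bump-allocate inside a
 * randomly chosen 1 GiB section, and move to a fresh random section
 * once the current one is 50% full.  Illustrative sketch only.
 */
#include <stdint.h>
#include <stdlib.h>

#define SECTION_SHIFT	30				/* 1 GiB sections */
#define SECTION_SIZE	(1ULL << SECTION_SHIFT)
#define NSECTIONS	(1ULL << (47 - SECTION_SHIFT))	/* 47-bit space */

struct toy_alloc {
	uint64_t section;	/* index of the current section */
	uint64_t used;		/* bytes handed out in that section */
};

/* Pick a new random section (assumed empty in this toy model). */
static void toy_pick_section(struct toy_alloc *a)
{
	a->section = (uint64_t)rand() % NSECTIONS;
	a->used = 0;
}

/* Hand out a toy "address" for a len-byte allocation. */
uint64_t toy_mmap(struct toy_alloc *a, uint64_t len)
{
	uint64_t addr;

	if (a->used + len > SECTION_SIZE / 2)	/* half full: move on */
		toy_pick_section(a);
	addr = a->section * SECTION_SIZE + a->used;
	a->used += len;
	return addr;
}
```

Within one section, placements stay sequential (so page tables and VMAs stay dense); only the choice of section is random, which is exactly the "happy medium" trade-off described above.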
Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
> On Dec 3, 2020, at 9:29 AM, Florian Weimer wrote:
>
> * Andy Lutomirski:
>
>> If you want a 4GB allocation to succeed, you can only divide the
>> address space into 32k fragments.  Or, a little more precisely, if you
>> want a randomly selected 4GB region to be empty, any other allocation
>> has a 1/32k chance of being in the way.  (Rough numbers — I’m ignoring
>> effects of the beginning and end of the address space, and I’m
>> ignoring the size of a potential conflicting allocation.)
>
> I think the probability distribution is way more advantageous than that
> because it is unlikely that 32K allocations are all exactly spaced 4 GB
> apart.  (And with 32K allocations, you are close to the VMA limit anyway.)

I’m assuming the naive algorithm of choosing an address and trying it.
Actually looking for a big enough gap would be more reliable.

I suspect that something much more clever could be done: dividing the
heap into a few independently randomized sections and randomizing heap
pages within the sections might do much better.  There should certainly
be a lot of room for something between what we have now and a fully
randomized scheme.

It might also be worth looking at what other OSes do.
Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
* Andy Lutomirski:

> If you want a 4GB allocation to succeed, you can only divide the
> address space into 32k fragments.  Or, a little more precisely, if you
> want a randomly selected 4GB region to be empty, any other allocation
> has a 1/32k chance of being in the way.  (Rough numbers — I’m ignoring
> effects of the beginning and end of the address space, and I’m
> ignoring the size of a potential conflicting allocation.)

I think the probability distribution is way more advantageous than that
because it is unlikely that 32K allocations are all exactly spaced 4 GB
apart.  (And with 32K allocations, you are close to the VMA limit anyway.)

My knowledge of probability theory is quite limited, so I have to rely
on simulations.  But I think you would see a 40 GiB gap somewhere for a
47-bit address space with 32K allocations, most of the time.  Which is
not too bad.

But even with a 47 bit address space and just 500 threads (each with at
least a stack and local heap, randomized independently), simulations
suggest that the largest gap is often just 850 GB.  At that point, you
can't expect to map your NVDIMM (or whatever) in a single mapping
anymore, and you have to code around that.  Not randomizing large
allocations and sacrificing one bit of randomness for small allocations
would avoid this issue, though.

(I still expect page walking performance to suffer drastically, with or
without this tweak.  I assume page walking uses the CPU cache hierarchy
today, and with full randomization, accessing the page entry at each
level after a TLB miss would result in a data cache miss.  But then, I'm
firmly a software person.)

Thanks,
Florian

-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
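The kind of simulation described above is small enough to sketch (my illustrative C version, not Florian's actual simulation; it treats allocations as points, which only makes the gaps look larger than they really would be):

```c
/*
 * Scatter n allocation start addresses uniformly in a 47-bit address
 * space and return the largest remaining gap in bytes.  Illustrative
 * sketch of the largest-gap simulation discussed above.
 */
#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

uint64_t largest_gap(int n, unsigned int seed)
{
	const uint64_t space = 1ULL << 47;
	uint64_t *addr = malloc(n * sizeof(*addr));
	uint64_t best = 0, prev = 0;

	srand(seed);
	for (int i = 0; i < n; i++)		/* ~47 random bits each */
		addr[i] = (((uint64_t)rand() << 31) ^ (uint64_t)rand()) % space;
	qsort(addr, n, sizeof(*addr), cmp_u64);

	for (int i = 0; i < n; i++) {
		if (addr[i] - prev > best)
			best = addr[i] - prev;
		prev = addr[i];
	}
	if (space - prev > best)
		best = space - prev;
	free(addr);
	return best;
}
```

With n = 1000 (500 threads, a stack and a heap each), the mean gap is about 140 GB and the largest gap clusters around mean times ln(n), i.e. on the order of 1 TB, which is the same ballpark as the ~850 GB figure above.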
Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
> On Dec 3, 2020, at 4:06 AM, Topi Miettinen wrote:
>
> On 3.12.2020 11.47, Florian Weimer wrote:
>> * Topi Miettinen:
>>> +3 Additionally enable full randomization of memory mappings created
>>> +with mmap(NULL, ...). With 2, the base of the VMA used for such
>>> +mappings is random, but the mappings are created in predictable
>>> +places within the VMA and in sequential order. With 3, new VMAs
>>> +are created to fully randomize the mappings.
>>> +
>>> +Also mremap(..., MREMAP_MAYMOVE) will move the mappings even if
>>> +not necessary and the location of stack and vdso are also
>>> +randomized.
>>> +
>>> +On 32 bit systems this may cause problems due to increased VM
>>> +fragmentation if the address space gets crowded.
>>
>> Isn't this a bit of an understatement?  I think you'll have to restrict
>> this randomization to a subregion of the entire address space, otherwise
>> the reduction in maximum mapping size due to fragmentation will be a
>> problem on 64-bit architectures as well (which generally do not support
>> the full 64 bits for user-space addresses).
>
> Restricting randomization would reduce the address space layout
> randomization and make this less useful.  There's 48 or 56 bits, which
> translate to 128TB and 64PB of VM for user applications.  Is it really
> possible to build today (or in the near future) a system which would
> contain so much RAM that such fragmentation could realistically
> happen?  Perhaps also in a special case where lots of 1GB huge pages
> are necessary?  Maybe in those cases you shouldn't use
> randomize_va_space=3.  Or perhaps there could be randomize_va_space=3
> which does something, and randomize_va_space=4 for those who want
> maximum randomization.

If you want a 4GB allocation to succeed, you can only divide the
address space into 32k fragments.  Or, a little more precisely, if you
want a randomly selected 4GB region to be empty, any other allocation
has a 1/32k chance of being in the way.  (Rough numbers — I’m ignoring
effects of the beginning and end of the address space, and I’m
ignoring the size of a potential conflicting allocation.)

This sounds good, except that a program could easily make a whole
bunch of tiny allocations that get merged in current kernels but
wouldn’t with your scheme.  So maybe this is okay, but it’s not likely
to be a good default.

>>> +On all systems, it will reduce performance and increase memory
>>> +usage due to less efficient use of page tables and inability to
>>> +merge adjacent VMAs with compatible attributes. In the worst case,
>>> +additional page table entries of up to 4 pages are created for
>>> +each mapping, so with small mappings there's considerable penalty.
>>
>> The number 4 is architecture-specific, right?
>
> Yes, I only know x86_64.  Actually it could have 5 level page tables.
> I'll fix this in next version.
>
> -Topi
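The merging point is easy to demonstrate on a current Linux kernel (an illustrative sketch of mine, Linux-specific; whether the two mappings actually coalesce into one VMA in /proc/self/maps is kernel policy, not something this code asserts):

```c
/*
 * Create two adjacent anonymous mappings with identical protections.
 * With sequential mmap placement these are candidates for VMA merging;
 * with fully randomized placement each tiny mapping would land far
 * away and keep its own VMA and page-table pages.  Sketch only.
 */
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

int make_adjacent_mappings(void **out)
{
	long page = sysconf(_SC_PAGESIZE);
	char *a, *b;

	a = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (a == MAP_FAILED)
		return -1;
	/* Punch out the second page, then map it again right back:
	 * same protections, same flags, directly adjacent. */
	if (munmap(a + page, page))
		return -1;
	b = mmap(a + page, page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (b == MAP_FAILED)
		return -1;
	a[0] = 1;	/* both pages are usable */
	b[0] = 2;
	*out = a;
	return 0;
}
```

Inspecting /proc/self/maps after calling this typically shows a single merged region on current kernels; under full randomization the second mmap(NULL, ...) would not land adjacent in the first place.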
Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
On 3.12.2020 11.47, Florian Weimer wrote:
> * Topi Miettinen:
>> +3 Additionally enable full randomization of memory mappings created
>> +with mmap(NULL, ...). With 2, the base of the VMA used for such
>> +mappings is random, but the mappings are created in predictable
>> +places within the VMA and in sequential order. With 3, new VMAs
>> +are created to fully randomize the mappings.
>> +
>> +Also mremap(..., MREMAP_MAYMOVE) will move the mappings even if
>> +not necessary and the location of stack and vdso are also
>> +randomized.
>> +
>> +On 32 bit systems this may cause problems due to increased VM
>> +fragmentation if the address space gets crowded.
>
> Isn't this a bit of an understatement?  I think you'll have to restrict
> this randomization to a subregion of the entire address space, otherwise
> the reduction in maximum mapping size due to fragmentation will be a
> problem on 64-bit architectures as well (which generally do not support
> the full 64 bits for user-space addresses).

Restricting randomization would reduce the address space layout
randomization and make this less useful.  There's 48 or 56 bits, which
translate to 128TB and 64PB of VM for user applications.  Is it really
possible to build today (or in the near future) a system which would
contain so much RAM that such fragmentation could realistically happen?
Perhaps also in a special case where lots of 1GB huge pages are
necessary?  Maybe in those cases you shouldn't use randomize_va_space=3.
Or perhaps there could be randomize_va_space=3 which does something, and
randomize_va_space=4 for those who want maximum randomization.

> +On all systems, it will reduce performance and increase memory
> +usage due to less efficient use of page tables and inability to
> +merge adjacent VMAs with compatible attributes. In the worst case,
> +additional page table entries of up to 4 pages are created for
> +each mapping, so with small mappings there's considerable penalty.
>
> The number 4 is architecture-specific, right?

Yes, I only know x86_64.  Actually it could have 5 level page tables.
I'll fix this in next version.

-Topi
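Where the per-mapping page-table overhead comes from can be made concrete (an illustrative sketch using the standard x86_64 constants; the helper names are mine, not kernel code): with 4 KiB pages an address is decoded 9 bits per level, and a randomized mapping that shares no index prefix with any existing mapping needs a fresh table page at every level below the root.

```c
/*
 * x86_64 page-table index arithmetic: 9 bits per level above a 12-bit
 * page offset.  Level 0 is the PTE table index; level 3 is the root
 * (PGD) index with 4-level paging, level 4 with 5-level paging.
 */
#include <stdint.h>

#define PT_LEVEL_BITS	9
#define PAGE_SHIFT	12

/* Index of addr into the page table at the given level. */
unsigned int pt_index(uint64_t addr, int level)
{
	return (addr >> (PAGE_SHIFT + level * PT_LEVEL_BITS)) &
	       ((1u << PT_LEVEL_BITS) - 1);
}

/*
 * How many fresh table pages mapping address b costs when only a's
 * tables exist (the root table is always shared): if the indices first
 * differ at level L, new tables are needed for levels L-1 down to 0,
 * i.e. L pages.
 */
int new_table_pages(uint64_t a, uint64_t b, int levels)
{
	for (int lvl = levels - 1; lvl > 0; lvl--)
		if (pt_index(a, lvl) != pt_index(b, lvl))
			return lvl;
	return 0;
}
```

Two fully unrelated addresses under 4-level paging diverge at the root, costing 3 fresh table pages (PUD, PMD, PTE); under 5-level paging the same worst case costs 4, which is the architecture dependence Florian's question is pointing at.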
Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
* Topi Miettinen:

> +3 Additionally enable full randomization of memory mappings created
> +with mmap(NULL, ...). With 2, the base of the VMA used for such
> +mappings is random, but the mappings are created in predictable
> +places within the VMA and in sequential order. With 3, new VMAs
> +are created to fully randomize the mappings.
> +
> +Also mremap(..., MREMAP_MAYMOVE) will move the mappings even if
> +not necessary and the location of stack and vdso are also
> +randomized.
> +
> +On 32 bit systems this may cause problems due to increased VM
> +fragmentation if the address space gets crowded.

Isn't this a bit of an understatement?  I think you'll have to restrict
this randomization to a subregion of the entire address space, otherwise
the reduction in maximum mapping size due to fragmentation will be a
problem on 64-bit architectures as well (which generally do not support
the full 64 bits for user-space addresses).

> +On all systems, it will reduce performance and increase memory
> +usage due to less efficient use of page tables and inability to
> +merge adjacent VMAs with compatible attributes. In the worst case,
> +additional page table entries of up to 4 pages are created for
> +each mapping, so with small mappings there's considerable penalty.

The number 4 is architecture-specific, right?

Thanks,
Florian

-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
On 30.11.2020 19.57, Andy Lutomirski wrote:
> On Sun, Nov 29, 2020 at 1:20 PM Topi Miettinen wrote:
>>
>> Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
>> enables full randomization of memory mappings created with mmap(NULL,
>> ...). With 2, the base of the VMA used for such mappings is random,
>> but the mappings are created in predictable places within the VMA and
>> in sequential order. With 3, new VMAs are created to fully randomize
>> the mappings.
>>
>> Also mremap(..., MREMAP_MAYMOVE) will move the mappings even if not
>> necessary and the location of stack and vdso are also randomized.
>>
>> The method is to randomize the new address without considering
>> VMAs. If the address fails checks because of overlap with the stack
>> area (or in case of mremap(), overlap with the old mapping), the
>> operation is retried a few times before falling back to old method.
>>
>> On 32 bit systems this may cause problems due to increased VM
>> fragmentation if the address space gets crowded.
>>
>> On all systems, it will reduce performance and increase memory usage
>> due to less efficient use of page tables and inability to merge
>> adjacent VMAs with compatible attributes. In the worst case,
>> additional page table entries of up to 4 pages are created for each
>> mapping, so with small mappings there's considerable penalty.
>>
>> In this example with sysctl.kernel.randomize_va_space = 2, dynamic
>> loader, libc, anonymous memory reserved with mmap() and locale-archive
>> are located close to each other:
>>
>> $ cat /proc/self/maps (only first line for each object shown for brevity)
>> 5acea452d000-5acea452f000 r--p fe:0c 1868624   /usr/bin/cat
>> 74f438f9-74f4394f2000 r--p fe:0c 2473999       /usr/lib/locale/locale-archive
>> 74f4394f2000-74f4395f2000 rw-p 00:00 0
>> 74f4395f2000-74f439617000 r--p fe:0c 2402332   /usr/lib/x86_64-linux-gnu/libc-2.31.so
>> 74f4397b3000-74f4397b9000 rw-p 00:00 0
>> 74f4397e5000-74f4397e6000 r--p fe:0c 2400754   /usr/lib/x86_64-linux-gnu/ld-2.31.so
>> 74f439811000-74f439812000 rw-p 00:00 0
>> 7fffdca0d000-7fffdca2e000 rw-p 00:00 0         [stack]
>> 7fffdcb49000-7fffdcb4d000 r--p 00:00 0         [vvar]
>> 7fffdcb4d000-7fffdcb4f000 r-xp 00:00 0         [vdso]
>>
>> With sysctl.kernel.randomize_va_space = 3, they are located at
>> unrelated addresses and the order is random:
>>
>> $ echo 3 > /proc/sys/kernel/randomize_va_space
>> $ cat /proc/self/maps (only first line for each object shown for brevity)
>> 385052-385062 rw-p 00:00 0
>> 28cfb4c8000-28cfb4cc000 r--p 00:00 0           [vvar]
>> 28cfb4cc000-28cfb4ce000 r-xp 00:00 0           [vdso]
>> 9e74c385000-9e74c387000 rw-p 00:00 0
>> a42e0233000-a42e0234000 r--p fe:0c 2400754     /usr/lib/x86_64-linux-gnu/ld-2.31.so
>> a42e025f000-a42e026 rw-p 00:00 0
>> bea40427000-bea4044c000 r--p fe:0c 2402332     /usr/lib/x86_64-linux-gnu/libc-2.31.so
>> bea405e8000-bea405ec000 rw-p 00:00 0
>> f6d446fa000-f6d44c5c000 r--p fe:0c 2473999     /usr/lib/locale/locale-archive
>> fcfbf684000-fcfbf6a5000 rw-p 00:00 0           [stack]
>> 619aba62d000-619aba62f000 r--p fe:0c 1868624   /usr/bin/cat
>>
>> CC: Andrew Morton
>> CC: Jann Horn
>> CC: Kees Cook
>> CC: Matthew Wilcox
>> CC: Mike Rapoport
>> CC: Linux API
>> Signed-off-by: Topi Miettinen
>> ---
>> v2: also randomize mremap(..., MREMAP_MAYMOVE)
>> v3: avoid stack area and retry in case of bad random address (Jann
>> Horn), improve description in kernel.rst (Matthew Wilcox)
>> v4:
>> - use /proc/$pid/maps in the example (Mike Rapaport)
>> - CCs (Andrew Morton)
>> - only check randomize_va_space == 3
>> v5: randomize also vdso and stack
>> ---
>>  Documentation/admin-guide/hw-vuln/spectre.rst |  6 ++--
>>  Documentation/admin-guide/sysctl/kernel.rst   | 20 +
>>  arch/x86/entry/vdso/vma.c                     | 26 +++-
>>  include/linux/mm.h                            |  8 +
>>  init/Kconfig                                  |  2 +-
>>  mm/mmap.c                                     | 30 +--
>>  mm/mremap.c                                   | 27 +
>>  mm/util.c                                     |  6
>>  8 files changed, 111 insertions(+), 14 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
>> index e05e581af5cf..9ea250522077 100644
>> --- a/Documentation/admin-guide/hw-vuln/spectre.rst
>> +++ b/Documentation/admin-guide/hw-vuln/spectre.rst
>> @@ -254,7 +254,7 @@ Spectre variant 2
>>      left by the previous process will also be cleared.
>>
>>      User programs should use address space randomization to make attacks
>> -    more difficult (Set /proc/sys/kernel/randomize_va_space =
Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
On Sun, Nov 29, 2020 at 1:20 PM Topi Miettinen wrote:
>
> Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
> enables full randomization of memory mappings created with mmap(NULL,
> ...). With 2, the base of the VMA used for such mappings is random,
> but the mappings are created in predictable places within the VMA and
> in sequential order. With 3, new VMAs are created to fully randomize
> the mappings.
>
> Also mremap(..., MREMAP_MAYMOVE) will move the mappings even if not
> necessary and the location of stack and vdso are also randomized.
>
> The method is to randomize the new address without considering
> VMAs. If the address fails checks because of overlap with the stack
> area (or in case of mremap(), overlap with the old mapping), the
> operation is retried a few times before falling back to old method.
>
> On 32 bit systems this may cause problems due to increased VM
> fragmentation if the address space gets crowded.
>
> On all systems, it will reduce performance and increase memory usage
> due to less efficient use of page tables and inability to merge
> adjacent VMAs with compatible attributes. In the worst case,
> additional page table entries of up to 4 pages are created for each
> mapping, so with small mappings there's considerable penalty.
>
> In this example with sysctl.kernel.randomize_va_space = 2, dynamic
> loader, libc, anonymous memory reserved with mmap() and locale-archive
> are located close to each other:
>
> $ cat /proc/self/maps (only first line for each object shown for brevity)
> 5acea452d000-5acea452f000 r--p fe:0c 1868624   /usr/bin/cat
> 74f438f9-74f4394f2000 r--p fe:0c 2473999       /usr/lib/locale/locale-archive
> 74f4394f2000-74f4395f2000 rw-p 00:00 0
> 74f4395f2000-74f439617000 r--p fe:0c 2402332   /usr/lib/x86_64-linux-gnu/libc-2.31.so
> 74f4397b3000-74f4397b9000 rw-p 00:00 0
> 74f4397e5000-74f4397e6000 r--p fe:0c 2400754   /usr/lib/x86_64-linux-gnu/ld-2.31.so
> 74f439811000-74f439812000 rw-p 00:00 0
> 7fffdca0d000-7fffdca2e000 rw-p 00:00 0         [stack]
> 7fffdcb49000-7fffdcb4d000 r--p 00:00 0         [vvar]
> 7fffdcb4d000-7fffdcb4f000 r-xp 00:00 0         [vdso]
>
> With sysctl.kernel.randomize_va_space = 3, they are located at
> unrelated addresses and the order is random:
>
> $ echo 3 > /proc/sys/kernel/randomize_va_space
> $ cat /proc/self/maps (only first line for each object shown for brevity)
> 385052-385062 rw-p 00:00 0
> 28cfb4c8000-28cfb4cc000 r--p 00:00 0           [vvar]
> 28cfb4cc000-28cfb4ce000 r-xp 00:00 0           [vdso]
> 9e74c385000-9e74c387000 rw-p 00:00 0
> a42e0233000-a42e0234000 r--p fe:0c 2400754     /usr/lib/x86_64-linux-gnu/ld-2.31.so
> a42e025f000-a42e026 rw-p 00:00 0
> bea40427000-bea4044c000 r--p fe:0c 2402332     /usr/lib/x86_64-linux-gnu/libc-2.31.so
> bea405e8000-bea405ec000 rw-p 00:00 0
> f6d446fa000-f6d44c5c000 r--p fe:0c 2473999     /usr/lib/locale/locale-archive
> fcfbf684000-fcfbf6a5000 rw-p 00:00 0           [stack]
> 619aba62d000-619aba62f000 r--p fe:0c 1868624   /usr/bin/cat
>
> CC: Andrew Morton
> CC: Jann Horn
> CC: Kees Cook
> CC: Matthew Wilcox
> CC: Mike Rapoport
> CC: Linux API
> Signed-off-by: Topi Miettinen
> ---
> v2: also randomize mremap(..., MREMAP_MAYMOVE)
> v3: avoid stack area and retry in case of bad random address (Jann
> Horn), improve description in kernel.rst (Matthew Wilcox)
> v4:
> - use /proc/$pid/maps in the example (Mike Rapaport)
> - CCs (Andrew Morton)
> - only check randomize_va_space == 3
> v5: randomize also vdso and stack
> ---
>  Documentation/admin-guide/hw-vuln/spectre.rst |  6 ++--
>  Documentation/admin-guide/sysctl/kernel.rst   | 20 +
>  arch/x86/entry/vdso/vma.c                     | 26 +++-
>  include/linux/mm.h                            |  8 +
>  init/Kconfig                                  |  2 +-
>  mm/mmap.c                                     | 30 +--
>  mm/mremap.c                                   | 27 +
>  mm/util.c                                     |  6
>  8 files changed, 111 insertions(+), 14 deletions(-)
>
> diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
> index e05e581af5cf..9ea250522077 100644
> --- a/Documentation/admin-guide/hw-vuln/spectre.rst
> +++ b/Documentation/admin-guide/hw-vuln/spectre.rst
> @@ -254,7 +254,7 @@ Spectre variant 2
>      left by the previous process will also be cleared.
>
>      User programs should use address space
Re: [PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
Hi Topi,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on aae5ab854e38151e69f261dbf0e3b7e396403178]

url:    https://github.com/0day-ci/linux/commits/Topi-Miettinen/mm-Optional-full-ASLR-for-mmap-mremap-vdso-and-stack/20201130-051703
base:   aae5ab854e38151e69f261dbf0e3b7e396403178
config: x86_64-randconfig-a002-20201130 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project f502b14d40e751fe00afc493ef0d08f196524886)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/c06384c5cecf700db214c69a4565c41a4c4fad82
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Topi-Miettinen/mm-Optional-full-ASLR-for-mmap-mremap-vdso-and-stack/20201130-051703
        git checkout c06384c5cecf700db214c69a4565c41a4c4fad82
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All warnings (new ones prefixed by >>):

   arch/x86/entry/vdso/vma.c:38:19: warning: no previous prototype for function 'arch_get_vdso_data' [-Wmissing-prototypes]
   struct vdso_data *arch_get_vdso_data(void *vvar_page)
                     ^
   arch/x86/entry/vdso/vma.c:38:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   struct vdso_data *arch_get_vdso_data(void *vvar_page)
   ^
   static
>> arch/x86/entry/vdso/vma.c:382:9: warning: cast to 'void *' from smaller integer type 'int' [-Wint-to-void-pointer-cast]
           if (!IS_ERR_VALUE(ret))
                ^
   include/linux/err.h:22:49: note: expanded from macro 'IS_ERR_VALUE'
   #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)
                                                   ^
   include/linux/compiler.h:48:41: note: expanded from macro 'unlikely'
   # define unlikely(x) (__branch_check__(x, 0, __builtin_constant_p(x)))
                                          ~^~
   include/linux/compiler.h:33:34: note: expanded from macro '__branch_check__'
                           __r = __builtin_expect(!!(x), expect); \
                                                    ^
>> arch/x86/entry/vdso/vma.c:382:9: warning: cast to 'void *' from smaller integer type 'int' [-Wint-to-void-pointer-cast]
           if (!IS_ERR_VALUE(ret))
                ^
   include/linux/err.h:22:49: note: expanded from macro 'IS_ERR_VALUE'
   #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)
                                                   ^
   include/linux/compiler.h:48:68: note: expanded from macro 'unlikely'
   # define unlikely(x) (__branch_check__(x, 0, __builtin_constant_p(x)))
                                                                    ^~~
   include/linux/compiler.h:35:19: note: expanded from macro '__branch_check__'
                                           expect, is_constant); \
                                           ^~~

   3 warnings generated.

vim +382 arch/x86/entry/vdso/vma.c

   364	
   365	static int map_vdso_randomized(const struct vdso_image *image)
   366	{
   367		unsigned long addr;
   368	
   369		if (randomize_va_space == 3) {
   370			/*
   371			 * Randomize vdso address.
   372			 */
   373			int i = MAX_RANDOM_VDSO_RETRIES;
   374	
   375			do {
   376				int ret;
   377	
   378				/* Try a few times to find a free area */
   379				addr = arch_mmap_rnd();
   380	
   381				ret = map_vdso(image, addr);
 > 382				if (!IS_ERR_VALUE(ret))
   383					return ret;
   384			} while (--i >= 0);
   385	
   386			/* Give up and try the less random way */
   387		}
   388		addr = vdso_addr(current->mm->start_stack, image->size - image->sym_vvar_start);
   389	
   390		return map_vdso(image, addr);
   391	}
   392	#endif

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org

.config.gz
Description: application/gzip
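The warning at line 382 comes from declaring `ret` as `int`: IS_ERR_VALUE() casts its argument through `(void *)`, and a 32-bit `int` both triggers the clang diagnostic and would truncate a real address if the value were ever stored that way. A simplified userspace re-creation (the macro mirrors include/linux/err.h; `fake_map_vdso` is a made-up stand-in for map_vdso()):

```c
/*
 * Re-creation of the IS_ERR_VALUE() pattern outside the kernel.
 * Keeping the return value in a long (as map_vdso() effectively
 * returns an unsigned-long-sized value) avoids both the
 * -Wint-to-void-pointer-cast warning and the truncation.
 */
#define MAX_ERRNO	4095
#define IS_ERR_VALUE(x) \
	((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)

/* Stand-in for map_vdso(): an address on success, -errno on failure. */
long fake_map_vdso(int fail)
{
	return fail ? -12 /* -ENOMEM */ : 0x7f0000000000L;
}

int vdso_ret_is_error(int fail)
{
	long ret = fake_map_vdso(fail);	/* long, not int: no truncation */

	return IS_ERR_VALUE(ret);
}
```

With `int ret` the cast in the macro goes int -> void * (smaller integer type, hence the warning); with `long ret` the cast is the same width as a pointer on x86_64 and the error check works for the full address range.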
[PATCH v5] mm: Optional full ASLR for mmap(), mremap(), vdso and stack
Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
enables full randomization of memory mappings created with mmap(NULL,
...). With 2, the base of the VMA used for such mappings is random,
but the mappings are created in predictable places within the VMA and
in sequential order. With 3, new VMAs are created to fully randomize
the mappings.

Also mremap(..., MREMAP_MAYMOVE) will move the mappings even if not
necessary and the location of stack and vdso are also randomized.

The method is to randomize the new address without considering
VMAs. If the address fails checks because of overlap with the stack
area (or in case of mremap(), overlap with the old mapping), the
operation is retried a few times before falling back to old method.

On 32 bit systems this may cause problems due to increased VM
fragmentation if the address space gets crowded.

On all systems, it will reduce performance and increase memory usage
due to less efficient use of page tables and inability to merge
adjacent VMAs with compatible attributes. In the worst case,
additional page table entries of up to 4 pages are created for each
mapping, so with small mappings there's considerable penalty.

In this example with sysctl.kernel.randomize_va_space = 2, dynamic
loader, libc, anonymous memory reserved with mmap() and locale-archive
are located close to each other:

$ cat /proc/self/maps (only first line for each object shown for brevity)
5acea452d000-5acea452f000 r--p fe:0c 1868624   /usr/bin/cat
74f438f9-74f4394f2000 r--p fe:0c 2473999       /usr/lib/locale/locale-archive
74f4394f2000-74f4395f2000 rw-p 00:00 0
74f4395f2000-74f439617000 r--p fe:0c 2402332   /usr/lib/x86_64-linux-gnu/libc-2.31.so
74f4397b3000-74f4397b9000 rw-p 00:00 0
74f4397e5000-74f4397e6000 r--p fe:0c 2400754   /usr/lib/x86_64-linux-gnu/ld-2.31.so
74f439811000-74f439812000 rw-p 00:00 0
7fffdca0d000-7fffdca2e000 rw-p 00:00 0         [stack]
7fffdcb49000-7fffdcb4d000 r--p 00:00 0         [vvar]
7fffdcb4d000-7fffdcb4f000 r-xp 00:00 0         [vdso]

With sysctl.kernel.randomize_va_space = 3, they are located at
unrelated addresses and the order is random:

$ echo 3 > /proc/sys/kernel/randomize_va_space
$ cat /proc/self/maps (only first line for each object shown for brevity)
385052-385062 rw-p 00:00 0
28cfb4c8000-28cfb4cc000 r--p 00:00 0           [vvar]
28cfb4cc000-28cfb4ce000 r-xp 00:00 0           [vdso]
9e74c385000-9e74c387000 rw-p 00:00 0
a42e0233000-a42e0234000 r--p fe:0c 2400754     /usr/lib/x86_64-linux-gnu/ld-2.31.so
a42e025f000-a42e026 rw-p 00:00 0
bea40427000-bea4044c000 r--p fe:0c 2402332     /usr/lib/x86_64-linux-gnu/libc-2.31.so
bea405e8000-bea405ec000 rw-p 00:00 0
f6d446fa000-f6d44c5c000 r--p fe:0c 2473999     /usr/lib/locale/locale-archive
fcfbf684000-fcfbf6a5000 rw-p 00:00 0           [stack]
619aba62d000-619aba62f000 r--p fe:0c 1868624   /usr/bin/cat

CC: Andrew Morton
CC: Jann Horn
CC: Kees Cook
CC: Matthew Wilcox
CC: Mike Rapoport
CC: Linux API
Signed-off-by: Topi Miettinen
---
v2: also randomize mremap(..., MREMAP_MAYMOVE)
v3: avoid stack area and retry in case of bad random address (Jann
Horn), improve description in kernel.rst (Matthew Wilcox)
v4:
- use /proc/$pid/maps in the example (Mike Rapaport)
- CCs (Andrew Morton)
- only check randomize_va_space == 3
v5: randomize also vdso and stack
---
 Documentation/admin-guide/hw-vuln/spectre.rst |  6 ++--
 Documentation/admin-guide/sysctl/kernel.rst   | 20 +
 arch/x86/entry/vdso/vma.c                     | 26 +++-
 include/linux/mm.h                            |  8 +
 init/Kconfig                                  |  2 +-
 mm/mmap.c                                     | 30 +--
 mm/mremap.c                                   | 27 +
 mm/util.c                                     |  6
 8 files changed, 111 insertions(+), 14 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index e05e581af5cf..9ea250522077 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -254,7 +254,7 @@ Spectre variant 2
     left by the previous process will also be cleared.

     User programs should use address space randomization to make attacks
-    more difficult (Set /proc/sys/kernel/randomize_va_space = 1 or 2).
+    more difficult (Set /proc/sys/kernel/randomize_va_space = 1, 2 or 3).

 3. A virtualized guest attacking the host
 ^

@@ -499,8