Re: Crash while mapping memory in pagetable_init() (Was: Re: .config)
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> Subject: i386: map enough initial memory to create lowmem mappings
>
> head.S creates the very initial pagetable for the kernel. This just
> maps enough space for the kernel itself, and an allocation bitmap.
> The amount of mapped memory is rounded up to 4Mbytes, and so this
> typically ends up mapping 8Mbytes of memory.
>
> When booting, pagetable_init() needs to create mappings for all
> lowmem, and the pagetables for these mappings are allocated from the
> free pages around the kernel in low memory. If the number of
> pagetable pages + kernel size exceeds head.S's initial mapping, it
> will end up faulting on an unmapped page. This will only happen with
> specific combinations of kernel size and memory size.
>
> This patch makes sure that head.S also maps enough space to fit the
> kernel pagetables as well as the kernel itself. It ends up using an
> additional two pages of unreclaimable memory.

Yup, fixes it up here, nice catch.

Acked-by: Chris Wright <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Crash while mapping memory in pagetable_init() (Was: Re: .config)
Jeremy Fitzhardinge wrote:
> H. Peter Anvin wrote:
>> Even with PSE?
>
> Perhaps not.
>
>> However, the main reason I wanted it done that way is to avoid cargo
>> cult programming; this makes it much clearer where the numbers
>> actually come from.
>
> Well, how about this then?

I like.

Acked-by: H. Peter Anvin <[EMAIL PROTECTED]>

-hpa
Re: Crash while mapping memory in pagetable_init() (Was: Re: .config)
H. Peter Anvin wrote:

#ifdef CONFIG_X86_PAE
PAGE_TABLE_SIZE = (2048+4)*4096
#else
PAGE_TABLE_SIZE = (1024+1)*4096
#endif
BOOTMEM_SIZE = 128*1024
/* ACPI and SMP trampoline allocate bootmem pages before paging_init */
#ifdef CONFIG_SMP
SMP_BOOTMEM_EARLY = 1
#else
SMP_BOOTMEM_EARLY = 0
#endif
#ifdef CONFIG_ACPI
ACPI_BOOTMEM_EARLY = 1
#else
ACPI_BOOTMEM_EARLY = 0
#endif
/* the *_EARLY values count pages; scale them to bytes like the other terms */
INIT_MAP_BEYOND_END = BOOTMEM_SIZE + PAGE_TABLE_SIZE + (SMP_BOOTMEM_EARLY + ACPI_BOOTMEM_EARLY)*4096

Zach
Re: Crash while mapping memory in pagetable_init() (Was: Re: .config)
H. Peter Anvin wrote:
> Even with PSE?

Perhaps not.

> However, the main reason I wanted it done that way is to avoid cargo
> cult programming; this makes it much clearer where the numbers
> actually come from.

Well, how about this then?

Subject: i386: map enough initial memory to create lowmem mappings

head.S creates the very initial pagetable for the kernel. This just
maps enough space for the kernel itself, and an allocation bitmap.
The amount of mapped memory is rounded up to 4Mbytes, and so this
typically ends up mapping 8Mbytes of memory.

When booting, pagetable_init() needs to create mappings for all
lowmem, and the pagetables for these mappings are allocated from the
free pages around the kernel in low memory. If the number of
pagetable pages + kernel size exceeds head.S's initial mapping, it
will end up faulting on an unmapped page. This will only happen with
specific combinations of kernel size and memory size.

This patch makes sure that head.S also maps enough space to fit the
kernel pagetables as well as the kernel itself. It ends up using an
additional two pages of unreclaimable memory.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Zachary Amsden <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: "Eric W. Biederman" <[EMAIL PROTECTED]>
Cc: "H. Peter Anvin" <[EMAIL PROTECTED]>
Cc: Linus Torvalds <[EMAIL PROTECTED]>

---
 arch/i386/kernel/asm-offsets.c |    5 +++++
 arch/i386/kernel/head.S        |   25 ++++++++++++++++++++-----
 2 files changed, 25 insertions(+), 5 deletions(-)

===
--- a/arch/i386/kernel/asm-offsets.c
+++ b/arch/i386/kernel/asm-offsets.c
@@ -96,6 +96,11 @@ void foo(void)
 		 sizeof(struct tss_struct));
 
 	DEFINE(PAGE_SIZE_asm, PAGE_SIZE);
+	DEFINE(PAGE_SHIFT_asm, PAGE_SHIFT);
+	DEFINE(PTRS_PER_PTE, PTRS_PER_PTE);
+	DEFINE(PTRS_PER_PMD, PTRS_PER_PMD);
+	DEFINE(PTRS_PER_PGD, PTRS_PER_PGD);
+
 	DEFINE(VDSO_PRELINK_asm, VDSO_PRELINK);
 
 	OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx);
===
--- a/arch/i386/kernel/head.S
+++ b/arch/i386/kernel/head.S
@@ -34,17 +34,32 @@
 
 /*
  * This is how much memory *in addition to the memory covered up to
- * and including _end* we need mapped initially.  We need one bit for
- * each possible page, but only in low memory, which means
- * 2^32/4096/8 = 128K worst case (4G/4G split.)
+ * and including _end* we need mapped initially.
+ * We need:
+ *  - one bit for each possible page, but only in low memory, which means
+ *     2^32/4096/8 = 128K worst case (4G/4G split.)
+ *  - enough space to map all low memory, which means
+ *     (2^32/4096) / 1024 pages (worst case, non PAE)
+ *     (2^32/4096) / 512 + 4 pages (worst case for PAE)
+ *  - a few pages for allocator use before the kernel pagetable has
+ *     been set up
  *
  * Modulo rounding, each megabyte assigned here requires a kilobyte of
  * memory, which is currently unreclaimed.
  *
  * This should be a multiple of a page.
  */
-#define INIT_MAP_BEYOND_END	(128*1024)
-
+LOW_PAGES = 1<<(32-PAGE_SHIFT_asm)
+
+#if PTRS_PER_PMD > 1
+PAGE_TABLE_SIZE = (LOW_PAGES / PTRS_PER_PMD) + PTRS_PER_PGD
+#else
+PAGE_TABLE_SIZE = (LOW_PAGES / PTRS_PER_PGD)
+#endif
+BOOTBITMAP_SIZE = LOW_PAGES / 8
+ALLOCATOR_SLOP = 4
+
+INIT_MAP_BEYOND_END = BOOTBITMAP_SIZE + (PAGE_TABLE_SIZE + ALLOCATOR_SLOP)*PAGE_SIZE_asm
 
 /*
  * 32-bit kernel entrypoint; only used by the boot CPU.  On entry,
Re: Crash while mapping memory in pagetable_init() (Was: Re: .config)
Jeremy Fitzhardinge wrote:
> H. Peter Anvin wrote:
>> I suggest, for clarity and to minimize bloat:
>
> I think it would save a page, at most. But OK. (Also, if you're
> running !PAE, these pages will actually become part of the init_mm
> pagetable, so there's no memory wastage at all.)

Even with PSE?

However, the main reason I wanted it done that way is to avoid cargo
cult programming; this makes it much clearer where the numbers
actually come from.

>> #ifdef CONFIG_X86_PAE
>> # define PAGE_TABLE_SIZE ((2048+4)*4096)
>> #else
>> # define PAGE_TABLE_SIZE ((1024+1)*4096)
>
> 1024 should be enough; the pgd is still swapper_pg_dir, and there are
> no pmds.

Check.

-hpa
Re: Crash while mapping memory in pagetable_init() (Was: Re: .config)
H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
>> H. Peter Anvin wrote:
>>> Really (pae ? 2M : 1M), in other words, plus the 128K for bootmem.
>>> Note that this is creating page tables for, not erasing. To map 2M,
>>> we will only use 2K of additional memory (meaning there is 50% chance
>>> we end up using an additional 4K page.)
>>>
>>> So the solution is simply to change INIT_MAP_BEYOND_END in head.S
>>> appropriately.
>>
>> Like this? Should we bother adding some slop pages for allocations
>> which happen before paging_init()?
>
> Yes, although it really should be sensitive to CONFIG_X86_PAE.
>
>>  /*
>>   * This is how much memory *in addition to the memory covered up to
>> - * and including _end* we need mapped initially.  We need one bit for
>> - * each possible page, but only in low memory, which means
>> - * 2^32/4096/8 = 128K worst case (4G/4G split.)
>> + * and including _end* we need mapped initially.
>> + * We need:
>> + *  - one bit for each possible page, but only in low memory, which means
>> + *     2^32/4096/8 = 128K worst case (4G/4G split.)
>> + *  - enough space to map all low memory, which means
>> + *     (2^32/4096) / 512 + 4 pages (worst case for PAE)
>>   *
>>   * Modulo rounding, each megabyte assigned here requires a kilobyte of
>>   * memory, which is currently unreclaimed.
>>   *
>>   * This should be a multiple of a page.
>>   */
>> -#define INIT_MAP_BEYOND_END (128*1024)
>> +#define INIT_MAP_BEYOND_END (128*1024 + (2048 + 4)*4096)
>
> I suggest, for clarity and to minimize bloat:
>
> #ifdef CONFIG_X86_PAE
> # define PAGE_TABLE_SIZE ((2048+4)*4096)
> #else
> # define PAGE_TABLE_SIZE ((1024+1)*4096)
> #endif
> #define BOOTMEM_SIZE (128*1024)
> #define INIT_MAP_BEYOND_END (BOOTMEM_SIZE+PAGE_TABLE_SIZE)

Actually, better yet; there is no reason for these to be macros:

#ifdef CONFIG_X86_PAE
PAGE_TABLE_SIZE = (2048+4)*4096
#else
PAGE_TABLE_SIZE = (1024+1)*4096
#endif
BOOTMEM_SIZE = 128*1024
INIT_MAP_BEYOND_END = BOOTMEM_SIZE + PAGE_TABLE_SIZE

-hpa
Re: Crash while mapping memory in pagetable_init() (Was: Re: .config)
H. Peter Anvin wrote:
> I suggest, for clarity and to minimize bloat:

I think it would save a page, at most. But OK. (Also, if you're
running !PAE, these pages will actually become part of the init_mm
pagetable, so there's no memory wastage at all.)

> #ifdef CONFIG_X86_PAE
> # define PAGE_TABLE_SIZE ((2048+4)*4096)
> #else
> # define PAGE_TABLE_SIZE ((1024+1)*4096)

1024 should be enough; the pgd is still swapper_pg_dir, and there are
no pmds.

J
Re: Crash while mapping memory in pagetable_init() (Was: Re: .config)
Jeremy Fitzhardinge wrote:
> H. Peter Anvin wrote:
>> Really (pae ? 2M : 1M), in other words, plus the 128K for bootmem.
>> Note that this is creating page tables for, not erasing. To map 2M,
>> we will only use 2K of additional memory (meaning there is 50% chance
>> we end up using an additional 4K page.)
>>
>> So the solution is simply to change INIT_MAP_BEYOND_END in head.S
>> appropriately.
>
> Like this? Should we bother adding some slop pages for allocations
> which happen before paging_init()?

Yes, although it really should be sensitive to CONFIG_X86_PAE.

>  /*
>   * This is how much memory *in addition to the memory covered up to
> - * and including _end* we need mapped initially.  We need one bit for
> - * each possible page, but only in low memory, which means
> - * 2^32/4096/8 = 128K worst case (4G/4G split.)
> + * and including _end* we need mapped initially.
> + * We need:
> + *  - one bit for each possible page, but only in low memory, which means
> + *     2^32/4096/8 = 128K worst case (4G/4G split.)
> + *  - enough space to map all low memory, which means
> + *     (2^32/4096) / 512 + 4 pages (worst case for PAE)
>   *
>   * Modulo rounding, each megabyte assigned here requires a kilobyte of
>   * memory, which is currently unreclaimed.
>   *
>   * This should be a multiple of a page.
>   */
> -#define INIT_MAP_BEYOND_END (128*1024)
> +#define INIT_MAP_BEYOND_END (128*1024 + (2048 + 4)*4096)

I suggest, for clarity and to minimize bloat:

#ifdef CONFIG_X86_PAE
# define PAGE_TABLE_SIZE ((2048+4)*4096)
#else
# define PAGE_TABLE_SIZE ((1024+1)*4096)
#endif
#define BOOTMEM_SIZE (128*1024)
#define INIT_MAP_BEYOND_END (BOOTMEM_SIZE+PAGE_TABLE_SIZE)

-hpa
Re: Crash while mapping memory in pagetable_init() (Was: Re: .config)
H. Peter Anvin wrote:
> Really (pae ? 2M : 1M), in other words, plus the 128K for bootmem.
> Note that this is creating page tables for, not erasing. To map 2M,
> we will only use 2K of additional memory (meaning there is 50% chance
> we end up using an additional 4K page.)
>
> So the solution is simply to change INIT_MAP_BEYOND_END in head.S
> appropriately.

Like this? Should we bother adding some slop pages for allocations
which happen before paging_init()?

Subject: i386: map enough initial memory to create lowmem mappings

head.S creates the very initial pagetable for the kernel. This just
maps enough space for the kernel itself, and an allocation bitmap.
The amount of mapped memory is rounded up to 4Mbytes, and so this
typically ends up mapping 8Mbytes of memory.

When booting, pagetable_init() needs to create mappings for all
lowmem, and the pagetables for these mappings are allocated from the
free pages around the kernel in low memory. If the number of
pagetable pages + kernel size exceeds head.S's initial mapping, it
will end up faulting on an unmapped page. This will only happen with
specific combinations of kernel size and memory size.

This patch makes sure that head.S also maps enough space to fit the
kernel pagetables as well as the kernel itself. It ends up using an
additional two pages of unreclaimable memory.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Zachary Amsden <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: "Eric W. Biederman" <[EMAIL PROTECTED]>
Cc: "H. Peter Anvin" <[EMAIL PROTECTED]>
Cc: Linus Torvalds <[EMAIL PROTECTED]>

---
 arch/i386/kernel/head.S |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

===
--- a/arch/i386/kernel/head.S
+++ b/arch/i386/kernel/head.S
@@ -34,16 +34,19 @@
 
 /*
  * This is how much memory *in addition to the memory covered up to
- * and including _end* we need mapped initially.  We need one bit for
- * each possible page, but only in low memory, which means
- * 2^32/4096/8 = 128K worst case (4G/4G split.)
+ * and including _end* we need mapped initially.
+ * We need:
+ *  - one bit for each possible page, but only in low memory, which means
+ *     2^32/4096/8 = 128K worst case (4G/4G split.)
+ *  - enough space to map all low memory, which means
+ *     (2^32/4096) / 512 + 4 pages (worst case for PAE)
  *
  * Modulo rounding, each megabyte assigned here requires a kilobyte of
  * memory, which is currently unreclaimed.
  *
  * This should be a multiple of a page.
  */
-#define INIT_MAP_BEYOND_END	(128*1024)
+#define INIT_MAP_BEYOND_END	(128*1024 + (2048 + 4)*4096)
 
 /*
Re: Crash while mapping memory in pagetable_init() (Was: Re: .config)
Zachary Amsden wrote:
> Jeremy Fitzhardinge wrote:
>> It seems to me that the problem is simply that it runs out of space.
>> head.S maps 8Mbytes of memory.

8 MB was a long time ago. head.S maps the kernel size plus
INIT_MAP_BEYOND_END, which is currently set to 128K.

>> The kernel takes ~6.8M of that, and there simply isn't enough
>> remaining space to fit the pagetables to map all memory into the
>> kernel address space.
>>
>> Here's my dump of all pte allocations. Notice the jump at c070b000
>> where it skips over the kernel, and then it just runs into the 8M limit.
>>
>> This is with CONFIG_PARAVIRT, but no CONFIG_XEN.
>>
>> I don't see why this doesn't happen all the time; I can't see anything
>> about this which is PARAVIRT-specific. But I think only specific
>> combinations of memory size and kernel size can trigger the problem,
>> because the code in head.S will often end up mapping enough memory to
>> fit everything in. It tries to map kernelsize+initial_pagetables+128k
>> of space; in this case it happens to map 8M, but if the kernel were
>> much larger it would map 12M.
>>
>> But surely this must have been seen before? Or is there something
>> subtle I'm missing?
>
> Wow, that is a huge kernel. No wonder I've never seen this. Seems
> when you go over 6 meg there will be a problem. For PAE, this requires
> page tables to map up to about 896M of lowmem - each page table can
> map 2 meg of memory, so you need up to 448 page tables, or 1792k of
> page table memory - adding 16k for pmd tables, this comes to 1808k.
> With a 6.24M kernel, you simply will run out of space to map all of
> lowmem in 8M.
>
> With a 6.8M PAE kernel, 608M of lowmem mappings will cause you to go
> beyond 8M of initially mapped space. Non-PAE kernels will be ok until
> you get to a kernel size of about 7.04M.

This means INIT_MAP_BEYOND_END is set incorrectly.

> Note you can always run out of space; to ensure safety, the init code
> needs to not use a fixed mapping size, it needs to map
> end_kernel_address + pae ? 1808k : 896k, assuming 128M vmalloc hole.

Really (pae ? 2M : 1M), in other words, plus the 128K for bootmem.
Note that this is creating page tables for, not erasing. To map 2M,
we will only use 2K of additional memory (meaning there is 50% chance
we end up using an additional 4K page.)

So the solution is simply to change INIT_MAP_BEYOND_END in head.S
appropriately.

> This could cause problems like running into initrd, however, so might
> require loader changes, or perhaps relocating the initrd. Is the
> solution to just cap the kernel size at some fixed maximum? 6.2M
> appears to be the safe limit for all configurations.

The initrd is supposed to be loaded as far away from the kernel as
possible. Mapping 2M hardly seems like a problem. When we set up the
memory manager we actively have to watch out for the pages that belong
to the initrd anyway. I have on my list to be able to pull initrd out
of highmem; that way the bootloader can always load the initrd from end
of memory.

> I'm pulling this back onto lkml; seems this is a serious bug which
> needs attention. I've also cc'd some parties that might have relevant
> knowledge. Why do I seem to recall head.S mapping 128M of mappings at
> one point in time?

You're probably confusing it with the 128K number, which was set to fit
the maximum possible memory for the bootmem pagetables.

-hpa
Re: Crash while mapping memory in pagetable_init() (Was: Re: .config)
Zachary Amsden wrote:
> Note you can always run out of space; to ensure safety, the init code
> needs to not use a fixed mapping size, it needs to map
> end_kernel_address + pae ? 1808k : 896k, assuming 128M vmalloc hole.
> This could cause problems like running into initrd, however, so might
> require loader changes, or perhaps relocating the initrd. Is the
> solution to just cap the kernel size at some fixed maximum? 6.2M
> appears to be the safe limit for all configurations.
>
> I'm pulling this back onto lkml; seems this is a serious bug which
> needs attention. I've also cc'd some parties that might have relevant
> knowledge. Why do I seem to recall head.S mapping 128M of mappings at
> one point in time?

I just posted a patch. Oops, forgot to Cc: Eric.

J
Re: Crash while mapping memory in pagetable_init() (Was: Re: .config)
Jeremy Fitzhardinge wrote:
> It seems to me that the problem is simply that it runs out of space.
> head.S maps 8Mbytes of memory. The kernel takes ~6.8M of that, and
> there simply isn't enough remaining space to fit the pagetables to map
> all memory into the kernel address space.
>
> Here's my dump of all pte allocations. Notice the jump at c070b000
> where it skips over the kernel, and then it just runs into the 8M limit.
>
> This is with CONFIG_PARAVIRT, but no CONFIG_XEN.
>
> I don't see why this doesn't happen all the time; I can't see anything
> about this which is PARAVIRT-specific. But I think only specific
> combinations of memory size and kernel size can trigger the problem,
> because the code in head.S will often end up mapping enough memory to
> fit everything in. It tries to map kernelsize+initial_pagetables+128k
> of space; in this case it happens to map 8M, but if the kernel were
> much larger it would map 12M.
>
> But surely this must have been seen before? Or is there something
> subtle I'm missing?

Wow, that is a huge kernel. No wonder I've never seen this. Seems when
you go over 6 meg there will be a problem. For PAE, this requires page
tables to map up to about 896M of lowmem - each page table can map 2 meg
of memory, so you need up to 448 page tables, or 1792k of page table
memory - adding 16k for pmd tables, this comes to 1808k. With a 6.24M
kernel, you simply will run out of space to map all of lowmem in 8M.

With a 6.8M PAE kernel, 608M of lowmem mappings will cause you to go
beyond 8M of initially mapped space. Non-PAE kernels will be ok until
you get to a kernel size of about 7.04M.

Note you can always run out of space; to ensure safety, the init code
needs to not use a fixed mapping size, it needs to map
end_kernel_address + pae ? 1808k : 896k, assuming 128M vmalloc hole.
This could cause problems like running into initrd, however, so might
require loader changes, or perhaps relocating the initrd. Is the
solution to just cap the kernel size at some fixed maximum? 6.2M
appears to be the safe limit for all configurations.

I'm pulling this back onto lkml; seems this is a serious bug which needs
attention. I've also cc'd some parties that might have relevant
knowledge. Why do I seem to recall head.S mapping 128M of mappings at
one point in time?

Zach