On Fri, Dec 30, 2022 at 06:07:24PM +0100, Jason A. Donenfeld wrote:
> Look closer at the boot process. The compressed image is initially at
> 0x100000, but it gets relocated to a safer area at the end of
> startup_64:

That is the address we're executing here from, rip here looks like 0x100xxx.

> 	/*
> 	 * Copy the compressed kernel to the end of our buffer
> 	 * where decompression in place becomes safe.
> 	 */
> 	pushq	%rsi
> 	leaq	(_bss-8)(%rip), %rsi
> 	leaq	rva(_bss-8)(%rbx), %rdi

when you get to here, it looks something like this:

	leaq	(_bss-8)(%rip), %rsi		# 0x9e7ff8
	leaq	rva(_bss-8)(%rbx), %rdi		# 0xc6eeff8

so the source address is that _bss thing and we copy...

> 	movl	$(_bss - startup_32), %ecx
> 	shrl	$3, %ecx
> 	std

... backwards since DF=1.

Up to:

# rsi = 0xffff8
# rdi = 0xbe06ff8

Ok, so the source address is 0x100000. Good.

> HOWEVER, qemu currently appends setup_data to the end of the
> compressed kernel image,

Yeah, you mean the kernel which starts executing at 0x100000, i.e., that
part which is compressed/head_64.S and which does the above and the
relocation etc.

> and this part isn't moved, and setup_data links aren't walked/relocated. So
> that means the original address remains, of 0x100000.

See above: when it starts copying the kernel image backwards to a higher
address, that last byte is at 0x9e7ff8 so I'm guessing qemu has put
setup_data *after* that address. And that doesn't get copied ofc.

So far, so good.

Now later, we extract the compressed kernel created with the mkpiggy magic:

input_data:
.incbin "arch/x86/boot/compressed/vmlinux.bin.gz"
input_data_end:

by doing

/*
 * Do the extraction, and jump to the new kernel..
 */
	pushq	%rsi			/* Save the real mode argument */	0x13d00
	movq	%rsi, %rdi		/* real mode address */			0x13d00
	leaq	boot_heap(%rip), %rsi	/* malloc area for uncompression */	0xc6ef000
	leaq	input_data(%rip), %rdx	/* input_data */			0xbe073a8
	movl	input_len(%rip), %ecx	/* input_len */				0x8cfe13
	movq	%rbp, %r8		/* output target address */		0x1000000
	movl	output_len(%rip), %r9d	/* decompressed length, end of relocs */
	call	extract_kernel		/* returns kernel location in %rax */
	popq	%rsi

(actual addresses at the end.)

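Going back to the backwards copy for a second: the register arithmetic can be
double-checked with a small userspace C sketch. This is not kernel code.
backward_movsq() and was_copied() are made-up helper names, and the 0x8e8000
length is only inferred from the register values above (0x9e7ff8 - 0xffff8),
not read out of an actual build.

```c
#include <assert.h>
#include <stdint.h>

/* Final %rsi/%rdi after "std; rep movsq" (hypothetical helper type). */
struct copy_regs {
	uint64_t rsi;
	uint64_t rdi;
};

/*
 * Model only the pointer arithmetic of a backward qword copy:
 * with DF=1, each movsq decrements %rsi and %rdi by 8.
 * `bytes` stands in for $(_bss - startup_32).
 */
static struct copy_regs backward_movsq(uint64_t rsi, uint64_t rdi, uint32_t bytes)
{
	uint64_t qwords;
	struct copy_regs r;

	assert(bytes % 8 == 0);		/* this sketch assumes a qword multiple */
	qwords = bytes >> 3;		/* shrl $3, %ecx */
	r.rsi = rsi - 8 * qwords;
	r.rdi = rdi - 8 * qwords;
	return r;
}

/*
 * Was `addr` inside the source range that the copy covered?
 * `first_rsi` is the initial %rsi, i.e. _bss - 8, so the range
 * is [_bss - bytes, _bss).
 */
static int was_copied(uint64_t addr, uint64_t first_rsi, uint32_t bytes)
{
	uint64_t end = first_rsi + 8;	/* == _bss, one past the last copied byte */

	return addr >= end - bytes && addr < end;
}
```

With the values from above, backward_movsq(0x9e7ff8, 0xc6eeff8, 0x8e8000)
ends at rsi = 0xffff8 and rdi = 0xbe06ff8, matching the trace, and
was_copied() is true for 0x100000 but false for 0x9e8000, i.e. for anything
sitting right behind _bss.
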
Now, when you say you triplefault somewhere in initialize_identity_maps() when
trying to access setup_data, then if you look a couple of lines before that
call we do call load_stage2_idt which sets up a boottime #PF handler
do_boot_page_fault() and it actually does call kernel_add_identity_map() so
*actually* it should map any unmapped setup_data addresses.

So why doesn't it do that and why do you triplefault?

Hmmm.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette