On December 30, 2022 11:54:11 AM PST, Borislav Petkov <b...@alien8.de> wrote:
>On Fri, Dec 30, 2022 at 06:07:24PM +0100, Jason A. Donenfeld wrote:
>> Look closer at the boot process. The compressed image is initially at
>> 0x100000, but it gets relocated to a safer area at the end of
>> startup_64:
>
>That is the address we're executing here from, rip here looks like 0x100xxx.
>
>> /*
>>  * Copy the compressed kernel to the end of our buffer
>>  * where decompression in place becomes safe.
>>  */
>>         pushq   %rsi
>>         leaq    (_bss-8)(%rip), %rsi
>>         leaq    rva(_bss-8)(%rbx), %rdi
>
>when you get to here, it looks something like this:
>
>        leaq    (_bss-8)(%rip), %rsi            # 0x9e7ff8
>        leaq    rva(_bss-8)(%rbx), %rdi         # 0xc6eeff8
>
>so the source address is that _bss thing and we copy...
>
>>         movl    $(_bss - startup_32), %ecx
>>         shrl    $3, %ecx
>>         std
>
>... backwards since DF=1.
>
>Up to:
>
># rsi = 0xffff8
># rdi = 0xbe06ff8
>
>Ok, so the source address is 0x100000. Good.
>
>> HOWEVER, qemu currently appends setup_data to the end of the
>> compressed kernel image,
>
>Yeah, you mean the kernel which starts executing at 0x100000, i.e., that part
>which is compressed/head_64.S and which does the above and the relocation etc.
>
>> and this part isn't moved, and setup_data links aren't walked/relocated. So
>> that means the original address remains, of 0x100000.
>
>See above: when it starts copying the kernel image backwards to a higher
>address, that last byte is at 0x9e7ff8 so I'm guessing qemu has put setup_data
>*after* that address. And that doesn't get copied ofc.
>
>So far, so good.
>
>Now later, we extract the compressed kernel created with the mkpiggy magic:
>
>input_data:
>.incbin "arch/x86/boot/compressed/vmlinux.bin.gz"
>input_data_end:
>
>by doing
>
>/*
> * Do the extraction, and jump to the new kernel..
> */
>
>        pushq   %rsi                    /* Save the real mode argument */       0x13d00
>        movq    %rsi, %rdi              /* real mode address */                 0x13d00
>        leaq    boot_heap(%rip), %rsi   /* malloc area for uncompression */     0xc6ef000
>        leaq    input_data(%rip), %rdx  /* input_data */                        0xbe073a8
>        movl    input_len(%rip), %ecx   /* input_len */                         0x8cfe13
>        movq    %rbp, %r8               /* output target address */             0x1000000
>        movl    output_len(%rip), %r9d  /* decompressed length, end of relocs */
>        call    extract_kernel          /* returns kernel location in %rax */
>        popq    %rsi
>
>(actual addresses at the end.)
>
>Now, when you say you triplefault somewhere in initialize_identity_maps() when
>trying to access setup_data, then if you look a couple of lines before that call
>we do
>
>        call load_stage2_idt
>
>which sets up a boottime #PF handler do_boot_page_fault() and it actually does
>call kernel_add_identity_map() so *actually* it should map any unmapped
>setup_data addresses.
>
>So why doesn't it do that and why do you triplefault?
>
>Hmmm.
>
See the other thread fork. They have identified the problem already.
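
For anyone checking the address arithmetic in the quoted walk-through, here is a minimal Python sketch of the backward copy. It is not from the thread: the function and variable names are hypothetical, the hex values are the ones quoted above, and the image length is an assumption inferred from the two quoted rsi values rather than read from a real build. It only models how the qword copy that follows `std` moves rsi and rdi downwards.

```python
# Sketch only: models the backward qword copy (DF=1) described in the mail,
# using the register values quoted there. All names are illustrative.
QWORD = 8

def regs_after_backward_copy(rsi, rdi, nbytes):
    """Return (rsi, rdi) after copying nbytes backwards in 8-byte steps."""
    count = nbytes >> 3                      # shrl $3, %ecx
    return rsi - count * QWORD, rdi - count * QWORD

# Values quoted in the thread.
RSI_START = 0x9e7ff8                         # (_bss-8)(%rip)
RDI_START = 0xc6eeff8                        # rva(_bss-8)(%rbx)
RSI_END = 0xffff8                            # rsi once the copy is done

# Assumption: (_bss - startup_32) is inferred from the two rsi values.
image_len = RSI_START - RSI_END

rsi_end, rdi_end = regs_after_backward_copy(RSI_START, RDI_START, image_len)
src_base = rsi_end + QWORD                   # lowest source byte copied
dst_base = rdi_end + QWORD                   # lowest destination byte written

print(hex(image_len), hex(src_base), hex(rdi_end), hex(dst_base))
```

Under these assumptions the computed final rdi agrees with the quoted 0xbe06ff8, the source base comes out at 0x100000, and the quoted input_data address 0xbe073a8 sits just above the computed destination base, i.e. inside the relocated copy. Anything placed at or above 0x9e8000 in the original image falls outside the copied range.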