Re: Fix for bug that causes "KVM_GET_SUPPORTED_CPUID failed" errors.
Yes, I saw the problem with 3.0.13. Thanks!

Gabe

On Mon, Jan 23, 2012 at 11:40 PM, Sasha Levin wrote:
> The GET_SUPPORTED_CPUID bug has been fixed and shouldn't be happening
> from v3.2 onwards.
>
> Do you still see the issue in older versions?
>
> On Mon, 2012-01-23 at 21:20 -0800, Gabe Black wrote:
>> Sorry, forgot to add a subject.
>>
>> Gabe
>>
>> On Mon, Jan 23, 2012 at 9:18 PM, Gabe Black wrote:
>>> Hi, I think I've tracked down the bug that causes
>>> "KVM_GET_SUPPORTED_CPUID failed: Argument list too long" errors when
>>> using the kvm tool. Basically, this (possibly squished) code seems to
>>> be to blame:
>>>
>>>     case 0xd: {
>>>         int i;
>>>
>>>         entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
>>>         for (i = 1; *nent < maxnent && i < 64; ++i) {
>>>             if (entry[i].eax == 0)
>>>                 continue;
>>>             do_cpuid_1_ent(&entry[i], function, i);
>>>             entry[i].flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
>>>             ++*nent;
>>>         }
>>>         break;
>>>     }
>>>
>>> You can see there's a check whether entry[i].eax is 0, but it isn't
>>> until the next line that entry[i] is actually filled in. That means
>>> that whether or not an entry is filled in for the 0xd function is
>>> essentially random, and that can lead to the loss of valid entries. It
>>> also means that nent may be incremented too often, and since all 64
>>> entries are iterated over, that can fill up the available storage and
>>> cause that error.
>>>
>>> I tested my theory by commenting out the if (100% failure rate) and
>>> moving it after do_cpuid_1_ent (100% success rate). Since this is a
>>> non-deterministic failure, that isn't really conclusive, but I'm fairly
>>> confident my fix is correct. I don't know exactly what your procedure
>>> is for submitting patches, but one is attached.
>>>
>>> Gabe
>
> --
>
> Sasha.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Fix for bug that causes "KVM_GET_SUPPORTED_CPUID failed" errors.
Sorry, forgot to add a subject.

Gabe

On Mon, Jan 23, 2012 at 9:18 PM, Gabe Black wrote:
> Hi, I think I've tracked down the bug that causes
> "KVM_GET_SUPPORTED_CPUID failed: Argument list too long" errors when
> using the kvm tool. Basically, this (possibly squished) code seems to
> be to blame:
>
>     case 0xd: {
>         int i;
>
>         entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
>         for (i = 1; *nent < maxnent && i < 64; ++i) {
>             if (entry[i].eax == 0)
>                 continue;
>             do_cpuid_1_ent(&entry[i], function, i);
>             entry[i].flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
>             ++*nent;
>         }
>         break;
>     }
>
> You can see there's a check whether entry[i].eax is 0, but it isn't
> until the next line that entry[i] is actually filled in. That means
> that whether or not an entry is filled in for the 0xd function is
> essentially random, and that can lead to the loss of valid entries. It
> also means that nent may be incremented too often, and since all 64
> entries are iterated over, that can fill up the available storage and
> cause that error.
>
> I tested my theory by commenting out the if (100% failure rate) and
> moving it after do_cpuid_1_ent (100% success rate). Since this is a
> non-deterministic failure, that isn't really conclusive, but I'm fairly
> confident my fix is correct. I don't know exactly what your procedure
> is for submitting patches, but one is attached.
>
> Gabe
[no subject]
Hi, I think I've tracked down the bug that causes "KVM_GET_SUPPORTED_CPUID failed: Argument list too long" errors when using the kvm tool. Basically, this (possibly squished) code seems to be to blame:

    case 0xd: {
        int i;

        entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
        for (i = 1; *nent < maxnent && i < 64; ++i) {
            if (entry[i].eax == 0)
                continue;
            do_cpuid_1_ent(&entry[i], function, i);
            entry[i].flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
            ++*nent;
        }
        break;
    }

You can see there's a check whether entry[i].eax is 0, but it isn't until the next line that entry[i] is actually filled in. That means that whether or not an entry is filled in for the 0xd function is essentially random, and that can lead to the loss of valid entries. It also means that nent may be incremented too often, and since all 64 entries are iterated over, that can fill up the available storage and cause that error.

I tested my theory by commenting out the if (100% failure rate) and moving it after do_cpuid_1_ent (100% success rate). Since this is a non-deterministic failure, that isn't really conclusive, but I'm fairly confident my fix is correct. I don't know exactly what your procedure is for submitting patches, but one is attached.

Gabe

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 77c9d86..35d7ae0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2414,9 +2414,9 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
 		for (i = 1; *nent < maxnent && i < 64; ++i) {
+			do_cpuid_1_ent(&entry[i], function, i);
 			if (entry[i].eax == 0)
 				continue;
-			do_cpuid_1_ent(&entry[i], function, i);
 			entry[i].flags |=
 				KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
 			++*nent;
unhandled vm exit: 0x80000021 vcpu_id 0
Hello again. I'm making more progress getting KVM going in M5, and right now I'm trying to figure out why I'm getting an unhandled vm exit with exit code 0x80000021. According to Intel's manual, something about the guest state isn't being set up correctly. I dumped the initial register state for the 0th virtual CPU and noticed that some things Intel claims are illegal show up there, for instance having paging and protected mode disabled. I'm assuming there's some cooking done to the state as presented to KVM to, for instance, substitute V8086 mode for real mode, etc. I've fixed a number of bugs in M5 that cleaned up some issues, but I'm hoping somebody with more knowledge can tell me what illegal state is still there that would make it through KVM's twiddling and cause VMX to abort. One thing that I know looks funny is that the limit on the IDT is zero, but I haven't been able to find any evidence in the manuals that that's considered wrong rather than just a bad idea.

Any help here would be very appreciated!

Gabe

    rax                rbx                rcx                rdx
    rsi 00090200       rdi                rsp                rbp
    r8                 r9                 r10                r11
    r12                r13                r14                r15
    rip 0020           rflags 0002
    cs 0008 (/ p 1 dpl 0 db 0 s 1 type a l 1 g 1 avl 0)
    ds 0010 (/ p 1 dpl 0 db 1 s 1 type 2 l 0 g 1 avl 0)
    es 0010 (/ p 1 dpl 0 db 1 s 1 type 2 l 0 g 1 avl 0)
    ss 0010 (/ p 1 dpl 0 db 1 s 1 type 2 l 0 g 1 avl 0)
    fs 0010 (/ p 1 dpl 0 db 1 s 1 type 2 l 0 g 1 avl 0)
    gs 0010 (/ p 1 dpl 0 db 1 s 1 type 2 l 0 g 1 avl 0)
    tr 0018 (/ p 1 dpl 0 db 1 s 0 type b l 0 g 1 avl 0)
    ldt (/ p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0)
    gdt 76000/17       idt 0/0
    cr0 8011  cr2 0  cr3 7  cr4 20  cr8 0  efer 500
potential tss trampling, assumptions about physical memory layout
While continuing to try to get KVM and our M5 simulator to work together, I ran into another issue. During VCPU bring-up in x86 under VMX, the function init_rmode_tss is called, which seems to be writing an initial version of a TSS into guest memory. It's not immediately clear to me why that would be a necessary part of CPU initialization since, according to my understanding of the ISA, the TSS should only be written by the CPU itself (as opposed to by a regular load or store) when doing a task switch. This doesn't present an immediate problem for us, but if CPUs are not all started at the same time, it would be possible for them to trample a shared TSS or whatever would end up beneath it.

There has also been a strange problem that crops up when the TSS is put at an unusual address (more on that next) where copy_to_user within kvm_clear_guest_page will work or not work depending on seemingly arbitrary circumstances. Are there any restrictions on how to mmap the backing memory or on how it's used at that point in CPU initialization? I'd need to look into this more to get a good idea of what's going on, but it seems to defy logic, and any guidance would be useful.

init_rmode_tss uses a function called rmode_tss_base to figure out where to put this initial TSS, and there seem to be a few assumptions about what the physical memory layout looks like that are not necessarily true. The version of the code in the kernel I'm working with (2.6.28.9) looks like the following:

    static gva_t rmode_tss_base(struct kvm *kvm)
    {
        if (!kvm->arch.tss_addr) {
            gfn_t base_gfn = kvm->memslots[0].base_gfn +
                             kvm->memslots[0].npages - 3;
            return base_gfn << PAGE_SHIFT;
        }
        return kvm->arch.tss_addr;
    }

The first assumption is that there is a valid element in the memslots array. If there isn't, the guest may be in trouble anyway, but the kernel may behave unpredictably in that case.

Second, this code seems to assume what I believe is the standard QEMU-style memory layout, with a chunk of physical memory from 0x0 to 0xa0000, then the VGA framebuffer, and then 0xc0000 and upwards. In that case, memslots[0] would end at 0xa0000, and going three pages back from the end would still be a valid real-mode address. In our case, I was not leaving a gap for VGA and instead allocating all ~500 MB of memory in one chunk, so three pages from the end of memslots[0] was WAY outside the bounds of real-mode addressing. If this TSS is indeed for use by the guest in real mode, that doesn't seem like it would work.

Finally, it assumes that memslots[0].npages is greater than 3 in the first place. If it isn't, the subtraction wraps around and you'll try to install a real-mode TSS at the end of the address space, even farther from what real mode can handle and potentially entirely invalid.

Gabe
Re: NULL pointer dereference in kernel code, ignored parameters in libkvm
Avi Kivity wrote:
> Gabe Black wrote:
>> Hi. I'm a developer on the M5 simulator (m5sim.org) working on a CPU
>> model which uses kvm as its execution engine.
>
> Neat stuff. You're using kvm to run non-x86 code on x86?

Right now we're trying to run x86 on x86, but using our device models instead of QEMU's. It would be interesting to use it for other ISAs, but as far as I know we hadn't considered that.

>> I ran into a kernel "BUG" where a NULL pointer is being dereferenced
>> in gfn_to_rmap.
>>
>> What's happening on the kernel side is that gfn_to_rmap is calling
>> gfn_to_memslot. That function looks for the gfn in the memory slots,
>> fails to find it, and returns a NULL pointer. gfn_to_rmap then tries to
>> dereference it, and the kernel kills itself. I believe the original
>> source of the call to gfn_to_memslot was mmu_alloc_roots (in 2.6.28.9,
>> it may have moved), which tries to get the page pointed to by CR3 using
>> kvm_mmu_get_page. That part may not be correct, so here's the log output
>> from the kernel.
>
> This was fixed by 89da4ff17 ("KVM: x86: check for cr3 validity in
> mmu_alloc_roots"). Did the code base you were testing contain that?

It was the 2.6.28.9 kernel, and looking at the patch and lxr it appears not.

>> The second problem was the fact that CR3 didn't point to any memory even
>> though it had a valid value (0x7000). This was because our code relied
>> on kvm_create to set up physical memory, and while it takes parameters
>> for it and passes them around, it never actually seems to do anything
>> with them. This also seems to be the case in your most recent code.
>
> You should set up the memory independently using the memory slot APIs,
> then load CR3. kvm_create() has bitrotted a bit.

Will do.

>> I am a full time employee of VMware, and while I work on M5 on my own
>> time, that places certain limits on what I can do to help fix these
>> bugs. While I probably can't implement anything, I should be able to
>> provide more information about what we're doing with M5 or about the
>> crash if that would help.
>
> I appreciate the reports. Please test latest kvm.git and let us know
> if the problems persist.

Will do.

> It would also be interesting to hear how you use kvm.

Ultimately, we'd like to use KVM for at least two things. The first is as a way to fast-forward simulations to the portion of interest before switching into something slower that can collect interesting statistics and accurately simulate performance. Our immediate goal on our way to that is to get a CPU based around KVM to boot Linux while hooked into our device models. We're very early in the process, but one challenge I'm anticipating is being able to pull the local APIC out of the virtual CPU and into M5 so that we can coordinate IPIs, etc., ourselves.

The other thing we'd like to do is to use KVM as a golden model to verify our correctness against. To do that, we'll probably need to make significant progress on the above, and then also find a mechanism to make each CPU advance in very incremental and deterministic ways. Our thought on that so far has been to set the TF bit in the guest and use the #DB to exit back to the host. We expect there will be some gotchas with this, like the hold-off after a mov to %ss, but if there would be any show-stopper problems please let us know.

Gabe