[PATCH] book3s_hv_rmhandlers: Pass the correct trap argument to kvmhv_commence_exit
In guest_exit_cont we call kvmhv_commence_exit, which expects the trap number as its argument. However, r3 doesn't contain the trap number at this point, and as a result we would be calling the function with a spurious trap number. Fix this by copying r12, which holds the trap number, into r3 before calling kvmhv_commence_exit.

Signed-off-by: Gautham R. Shenoy e...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 4d70df2..f0d7c54 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1170,6 +1170,7 @@ mc_cont:
 	bl	kvmhv_accumulate_time
 #endif

+	mr	r3, r12
 	/* Increment exit count, poke other threads to exit */
 	bl	kvmhv_commence_exit
 	nop
--
1.9.3
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
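For readers less familiar with the PPC64 calling convention: r3 carries the first function argument, so `mr r3, r12` is the moral equivalent of the following C sketch. The names below are made up for illustration and do not reflect the real assembly-level code.

```c
static int last_trap;   /* records what the callee actually received */

/* Stand-in for kvmhv_commence_exit: consumes the trap number as its
 * first argument (which the PPC64 ABI places in r3). */
static void kvmhv_commence_exit(int trap)
{
    last_trap = trap;
}

/* Stand-in for the guest_exit_cont path: r12 holds the saved trap
 * number, while r3 holds an unrelated stale value at this point. */
static void guest_exit_cont(int r12_trap, int r3_stale)
{
    (void)r3_stale;                 /* must NOT be passed through */
    kvmhv_commence_exit(r12_trap);  /* the fix: "mr r3, r12" first */
}
```

Before the fix, the call site effectively forwarded `r3_stale`, so the callee saw a spurious trap number.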
Re: [PATCH] kvm: Return -ENOMEM directly for the function, kvm_create_lapic
* Nicholas Krause xerofo...@gmail.com wrote:

> In order to make code paths easier to read in the function kvm_create_lapic, we return -ENOMEM directly when unable to allocate memory for a kvm_lapic structure pointer. This makes the code easier to read and cleaner than jumping to a goto label at the end of the function's body just to return the error code -ENOMEM.
>
> Signed-off-by: Nicholas Krause xerofo...@gmail.com
> ---
>  arch/x86/kvm/lapic.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 629af0f..88d0cce 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1687,7 +1687,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
>  	apic = kzalloc(sizeof(*apic), GFP_KERNEL);
>  	if (!apic)
> -		goto nomem;
> +		return -ENOMEM;
>
>  	vcpu->arch.apic = apic;
>
> @@ -1718,7 +1718,6 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
>  	return 0;
>  nomem_free_apic:
>  	kfree(apic);
> -nomem:
>  	return -ENOMEM;
>  }

NAK! You just half destroyed the nice error handling cascade of labels.

Thanks,

	Ingo
[Bug 98741] New: Cannot boot into kvm guest with kernel >= 3.18.x on el6.6 qemu-kvm host with virtio-blk-pci.x-data-plane=on and virtio-blk-pci.ioeventfd=on
https://bugzilla.kernel.org/show_bug.cgi?id=98741

            Bug ID: 98741
           Summary: Cannot boot into kvm guest with kernel >= 3.18.x on el6.6 qemu-kvm host with virtio-blk-pci.x-data-plane=on and virtio-blk-pci.ioeventfd=on
           Product: Virtualization
           Version: unspecified
    Kernel Version: 3.18.x, 3.19.x, 4.0.x
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kvm
          Assignee: virtualization_...@kernel-bugs.osdl.org
          Reporter: jaroslav.pulch...@gooddata.com
        Regression: No

Hello,

I'm experiencing a problem with the latest kernels while testing new features in virtual guests running on an EL6.6 host under KVM with virtio PV drivers and the data-plane feature (virtio-blk-pci.x-data-plane=on) enabled for best IO performance. The issue causes the qemu-kvm process to be stopped during guest boot.

Host log:
---
qemu-kvm: /builddir/build/BUILD/qemu-kvm-0.12.1.2/hw/msix.c:645: msix_set_mask_notifier: Assertion `!dev->msix_mask_notifier' failed.
2015-05-18 08:45:48.102+: shutting down
---

Guest log:
---
...
initcall virtio_pci_driver_init+0x0/0x1000 [virtio_pci] returned 0 after 82556 usecs
calling  ata_init+0x0/0x5d [libata] @ 204
libata version 3.00 loaded.
initcall ata_init+0x0/0x5d [libata] returned 0 after 11641 usecs
[drm:drm_framebuffer_reference] 8800da0c5ea0: FB ID: 21 (2)
[drm:drm_framebuffer_unreference] 8800da0c5ea0: FB ID: 21 (3)
calling  ata_generic_pci_driver_init+0x0/0x1000 [ata_generic] @ 204
initcall ata_generic_pci_driver_init+0x0/0x1000 [ata_generic] returned 0 after 146 usecs
calling  init+0x0/0x1000 [virtio_blk] @ 250
virtio-pci 0000:00:04.0: irq 24 for MSI/MSI-X
virtio-pci 0000:00:04.0: irq 25 for MSI/MSI-X
...
END
...
---

* Reproducible with this setup:
  Guest kernel versions: 3.18, 3.19, 4.0
  qemu-kvm: virtio-blk-pci.x-data-plane=on, virtio-blk-pci.ioeventfd=on

* Not reproducible:
  Guest kernel versions: 3.17 or older

* Not reproducible:
  Guest kernel versions: 3.18, 3.19, 4.0
  qemu-kvm: virtio-blk-pci.x-data-plane=on + virtio-blk-pci.ioeventfd=off
            virtio-blk-pci.x-data-plane=off + virtio-blk-pci.ioeventfd=on

I found this:

1/ The shutdown is triggered by reentry into qemu's virtio_blk_data_plane_start() function (see Red Hat bug #1222574).

2/ The commits which introduced this issue are:

   virtio_blk: enable VQs early on restore (6d62c37f1991aafc872f8d8be8ac60e57ede8605)
   virtio_net: enable VQs early on restore (e53fbd11e983e896adaabef2d2f1695d6e0af829)
   virtio_blk: enable VQs early (7a11370e5e6c26566904bb7f08281093a3002ff2)
   virtio_net: enable VQs early (4baf1e33d0842c9673fef4af207d4b74da8d0126)

   found by a deep dive into the git history of the virtio drivers and several rebuilds with commits reverted (from 3.17-3.18). A 3.18.x kernel built with the mentioned commits reverted can successfully boot and run without issues.

Best regards,
Jaroslav

--
You are receiving this mail because:
You are watching the assignee of the bug.
Re: [Qemu-devel] Announcing qboot, a minimal x86 firmware for QEMU
On 05/21/2015 07:21 PM, Paolo Bonzini wrote:
> On 21/05/2015 17:48, Avi Kivity wrote:
>> Lovely! Note you have memcpy.o instead of memcpy.c.
>
> Doh, and it's not used anyway. Check the repository, and let me know if OSv boots with it (it probably needs ACPI; Linux doesn't boot virtio without ACPI).

Yes, it requires ACPI. We don't implement the pre-ACPI bootstrap methods.
Re: [PATCH] kvm: Return -ENOMEM directly for the function, kvm_create_lapic
On 21/05/2015 08:09, Ingo Molnar wrote:
> * Nicholas Krause xerofo...@gmail.com wrote:
>
>> In order to make code paths easier to read in the function kvm_create_lapic, we return -ENOMEM directly when unable to allocate memory for a kvm_lapic structure pointer. This makes the code easier to read and cleaner than jumping to a goto label at the end of the function's body just to return the error code -ENOMEM.
>>
>> Signed-off-by: Nicholas Krause xerofo...@gmail.com
>> ---
>>  arch/x86/kvm/lapic.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 629af0f..88d0cce 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -1687,7 +1687,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
>>  	apic = kzalloc(sizeof(*apic), GFP_KERNEL);
>>  	if (!apic)
>> -		goto nomem;
>> +		return -ENOMEM;
>>
>>  	vcpu->arch.apic = apic;
>>
>> @@ -1718,7 +1718,6 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
>>  	return 0;
>>  nomem_free_apic:
>>  	kfree(apic);
>> -nomem:
>>  	return -ENOMEM;
>>  }
>
> NAK! You just half destroyed the nice error handling cascade of labels.

Right. What could be done is always going through kfree(apic), because it is okay to free NULL. So the nomem label moves up, and the nomem_free_apic label is not necessary anymore.

Paolo
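A minimal userspace sketch of the suggestion above, with made-up names (this is not the real lapic.c code): because kfree(NULL) — like free(NULL) — is a no-op, one cleanup label placed above the free can serve every failure path, and the second label becomes unnecessary.

```c
#include <errno.h>
#include <stdlib.h>

/* Toy stand-in for struct kvm_lapic. */
struct lapic {
    char *regs;
};

static int create_lapic(struct lapic **out)
{
    struct lapic *apic = calloc(1, sizeof(*apic));
    if (!apic)
        goto nomem;              /* apic is NULL: free(NULL) below is harmless */

    apic->regs = calloc(1, 4096);
    if (!apic->regs)
        goto nomem;              /* partially built: free what we allocated */

    *out = apic;
    return 0;

nomem:
    free(apic);                  /* safe whether or not apic was allocated */
    return -ENOMEM;
}
```

The single label keeps the error-handling cascade intact while still reading top to bottom.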
KVM call agenda for 2015-05-26
Hi,

Please send any topic that you are interested in covering.

Call details: by popular demand, a Google Calendar public entry for it:

https://www.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

(Let me know if you have any problems with the calendar entry. I just gave up on getting CEST, CET, EDT and DST right at the same time.)

If you need phone number details, contact me privately.

Thanks, Juan.
Re: [PATCH 06/12] KVM: x86: API changes for SMM support
On 21/05/2015 18:26, Radim Krčmář wrote:
> 2015-05-21 16:59+0200, Paolo Bonzini:
>> On 21/05/2015 16:49, Radim Krčmář wrote:
>>> 2015-05-08 13:20+0200, Paolo Bonzini:
>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>> @@ -202,7 +202,7 @@ struct kvm_run {
>>>>  	__u32 exit_reason;
>>>>  	__u8 ready_for_interrupt_injection;
>>>>  	__u8 if_flag;
>>>> -	__u8 padding2[2];
>>>> +	__u16 flags;
>>>
>>> (It got lost last review and I'd really like to know ... what is the advantage of giving both bytes to flags?)
>>
>> No advantage. You just should leave padding2[1] in the middle so that the offset of run->padding2[0] doesn't change.
>
> I don't get that. The position of padding should be decided by comparing probabilities of extending 'if_flag' and 'flags'.
>
>> Since it's not obvious I gave two bytes to flags, but I can do it either way.
>
> if_flag seems to be set in stone as one bit, so I'd vote for
>
> 	__u8 flags;
> 	__u8 padding2;
>
> (Or 'padding3', to prevent the same class of errors that removing it altogether does; which we didn't do for other tailed padding.)

You're right that we didn't do it. I'll change it to flags + padding2.

Paolo

> For there isn't much space left in struct kvm ...
Re: [PATCH 08/12] KVM: x86: save/load state on SMM switch
On 21/05/2015 19:00, Radim Krčmář wrote:
>> Potentially, an NMI could be latched (while in SMM or upon exit) and serviced upon exit [...]
>
> This "Potentially" could be in the sense that the whole 3rd paragraph is only applicable to some ancient SMM design :)

It could also be in the sense that you cannot exclude an NMI coming at exactly the wrong time.

> If you want to go full language lawyer, it does mention it whenever behavior is specific to a processor family. The 1st paragraph has a quite clear sentence:
>
>   If NMIs were blocked before the SMI occurred, they are blocked after execution of RSM.
>
> so I'd just ignore the 3rd paragraph ... And the APM 2:10.3.3, "Exceptions and Interrupts":
>
>   NMI—If an NMI occurs while the processor is in SMM, it is latched by the processor, but the NMI handler is not invoked until the processor leaves SMM with the execution of an RSM instruction. A pending NMI causes the handler to be invoked immediately after the RSM completes and before the first instruction in the interrupted program is executed. An SMM handler can unmask NMI interrupts by simply executing an IRET. Upon completion of the IRET instruction, the processor recognizes the pending NMI, and transfers control to the NMI handler. Once an NMI is recognized within SMM using this technique, subsequent NMIs are recognized until SMM is exited. Later SMIs cause NMIs to be masked, until the SMM handler unmasks them.
>
> makes me think that we should unmask them unconditionally, or that SMM doesn't do anything with NMI masking.

Actually I hadn't noticed this paragraph. But I read it the same as the Intel manual (i.e. what I implemented): it doesn't say anywhere that RSM may cause the processor to *set* the NMIs-masked flag. It makes no sense; as you said, it's 1 bit of state! But it seems that it's the architectural behavior. :(

> If we can choose, less NMI nesting seems like a good idea.

It would---I'm just preempting future patches from Nadav. :)

That said, even if OVMF does do IRETs in SMM (in 64-bit mode it fills in page tables lazily for memory above 4GB), we do not care about asynchronous SMIs such as those for power management. So we should never enter SMM with NMIs masked, to begin with.

Paolo
Re: [PATCH 08/12] KVM: x86: save/load state on SMM switch
On 21/05/2015 18:33, Radim Krčmář wrote:
>> Check the AMD architecture manual.
>
> I must be blind, is there more than Table 10-2?

There's Table 10-1! :DDD

Paolo
Re: [PATCH 06/12] KVM: x86: API changes for SMM support
2015-05-08 13:20+0200, Paolo Bonzini:
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> @@ -202,7 +202,7 @@ struct kvm_run {
>  	__u32 exit_reason;
>  	__u8 ready_for_interrupt_injection;
>  	__u8 if_flag;
> -	__u8 padding2[2];
> +	__u16 flags;

(It got lost last review and I'd really like to know ... what is the advantage of giving both bytes to flags?)
Re: [PATCH 07/12] KVM: x86: stubs for SMM support
2015-05-08 13:20+0200, Paolo Bonzini:
> This patch adds the interface between x86.c and the emulator: the SMBASE register, a new emulator flag, the RSM instruction. It also adds a new request bit that will be used by the KVM_SMI ioctl.
>
> Signed-off-by: Paolo Bonzini pbonz...@redhat.com
> ---
> RFC->v1: make SMBASE host-readable only
>          add support for latching an SMI
>          do not reset SMBASE on INIT
> ---
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> @@ -367,6 +367,7 @@ struct kvm_vcpu_arch {
>  	int32_t apic_arb_prio;
>  	int mp_state;
>  	u64 ia32_misc_enable_msr;
> +	u64 smbase;

smbase is u32 in hardware.

> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -2504,7 +2504,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
>  	vmx->nested.nested_vmx_misc_low &= VMX_MISC_SAVE_EFER_LMA;
>  	vmx->nested.nested_vmx_misc_low |=
>  		VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE |
> -		VMX_MISC_ACTIVITY_HLT;
> +		VMX_MISC_ACTIVITY_HLT | VMX_MISC_IA32_SMBASE_MSR;

No need to expose this feature when the MSR isn't readable.

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -2220,6 +2221,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> +	case MSR_IA32_SMBASE:
> +		if (!msr_info->host_initiated)
> +			return 1;
> +		vcpu->arch.smbase = data;
> +		break;
> @@ -2615,6 +2621,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> +	case MSR_IA32_SMBASE:
> +		if (!msr_info->host_initiated)
> +			return 1;
> +		msr_info->data = vcpu->arch.smbase;
> +		break;
Re: [PATCH 06/12] KVM: x86: API changes for SMM support
On 21/05/2015 16:49, Radim Krčmář wrote:
> 2015-05-08 13:20+0200, Paolo Bonzini:
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> @@ -202,7 +202,7 @@ struct kvm_run {
>>  	__u32 exit_reason;
>>  	__u8 ready_for_interrupt_injection;
>>  	__u8 if_flag;
>> -	__u8 padding2[2];
>> +	__u16 flags;
>
> (It got lost last review and I'd really like to know ... what is the advantage of giving both bytes to flags?)

No advantage. You just should leave padding2[1] in the middle so that the offset of run->padding2[0] doesn't change. Since it's not obvious I gave two bytes to flags, but I can do it either way.

Paolo
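The ABI point being made here can be checked mechanically: carving `flags` out of the existing padding bytes leaves every later field of `struct kvm_run` at its old offset, so existing userspace binaries still line up. A simplified stand-in (only the first few fields, with `cr8` representing the rest of the struct) is enough to demonstrate it:

```c
#include <stddef.h>
#include <stdint.h>

/* Layout before the patch: two explicit padding bytes after if_flag. */
struct run_old {
    uint32_t exit_reason;
    uint8_t  ready_for_interrupt_injection;
    uint8_t  if_flag;
    uint8_t  padding2[2];
    uint64_t cr8;               /* first field after the padding */
};

/* Layout after the patch: the padding bytes become a 16-bit flags word.
 * Nothing after it moves, because the total size of the replaced span
 * is unchanged. */
struct run_new {
    uint32_t exit_reason;
    uint8_t  ready_for_interrupt_injection;
    uint8_t  if_flag;
    uint16_t flags;
    uint64_t cr8;
};
```

The same invariant holds for the `__u8 flags; __u8 padding2;` variant proposed in the thread; what matters is that the replaced bytes occupy the same two-byte slot.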
Re: [PATCH 00/23] userfaultfd v4
Hello up there,

On Thu, May 14, 2015 at 07:30:57PM +0200, Andrea Arcangeli wrote:
> Hello everyone,
>
> This is the latest userfaultfd patchset against mm-v4.1-rc3 2015-05-14-10:04.
>
> The postcopy live migration feature on the qemu side is mostly ready to be merged and it entirely depends on the userfaultfd syscall to be merged as well. So it'd be great if this patchset could be reviewed for merging in -mm.
>
> Userfaults allow to implement on-demand paging from userland, and more generally they allow userland to take control of the behavior of page faults more efficiently than what was available before (PROT_NONE + SIGSEGV trap).
>
> The use cases are:
>
> [...]
>
> Even though there wasn't a real use case requesting it yet, it also allows to implement distributed shared memory in a way that readonly shared mappings can exist simultaneously in different hosts and they can become exclusive at the first wrprotect fault.

Sorry for maybe speaking up too late, but here is an additional real potential use case which in my view overlaps with the above:

Recently we needed to implement persistency for NumPy arrays - that is, to track changes made to array memory and transactionally either abandon the changes on transaction abort, or store them back to storage on transaction commit. Since arrays can be large, it would be slow and thus impractical to keep a copy of the original data and compare memory against it to find which array parts have changed.

So I've implemented a scheme where array data is initially PROT_READ protected; we then catch SIGSEGV, and if it is a write to an area belonging to array data, we mark that page as PROT_WRITE and continue. At commit time we know which parts were modified.

Also, since arrays can be large - bigger than RAM - and only sparse parts of them may be needed to get the needed information, for reading it also makes sense to lazily load data in the SIGSEGV handler, starting from PROT_NONE protection.
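The write-tracking scheme described above can be sketched in a few dozen lines. This is a deliberately minimal, single-region model (names invented for the sketch; the real virtmem.c linked below also handles reads from PROT_NONE, reclaim, and overlapping vmas), and it relies on Linux semantics: calling mprotect() from a SIGSEGV handler and returning causes the faulting write to be retried.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static char *region;                   /* the tracked "array" memory */
static size_t region_len;
static volatile sig_atomic_t dirty;    /* demo dirty flag; real code keeps a per-page map */

static void on_segv(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    char *addr = (char *)si->si_addr;
    if (addr >= region && addr < region + region_len) {
        long pg = sysconf(_SC_PAGESIZE);
        char *page = (char *)((uintptr_t)addr & ~((uintptr_t)pg - 1));
        /* upgrade the faulting page R -> RW and remember it was written */
        mprotect(page, (size_t)pg, PROT_READ | PROT_WRITE);
        dirty = 1;
        return;                        /* returning retries the faulting write */
    }
    _exit(1);                          /* fault outside the tracked region: give up */
}

static char *track_init(size_t len)
{
    struct sigaction sa = { 0 };
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    region_len = len;
    region = mmap(NULL, len, PROT_READ,          /* readable but not writable */
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return region == MAP_FAILED ? NULL : region;
}
```

Reads proceed at full speed with no faults; only the first write to each page pays the SIGSEGV + mprotect cost, which is exactly the overhead userfaultfd could remove.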
This is very similar to how memory-mapped files work, but adds transactionality which, as far as I know, is not provided by any currently in-kernel filesystem on Linux. The system is implemented as files, and arrays are then built on top of such memory-mapped files. So from now on we can forget about NumPy arrays and talk only about files, their mapping, lazy loading, and transactionally storing in-memory changes back to file storage.

To get this working, a custom user-space virtual memory manager is unrolled, which manages RAM memory pages and file mappings into the virtual address space, tracks page protection, and handles SIGSEGV appropriately. The gist of the virtual memory manager is here:

https://lab.nexedi.cn/kirr/wendelin.core/blob/master/include/wendelin/bigfile/virtmem.h
https://lab.nexedi.cn/kirr/wendelin.core/blob/master/bigfile/virtmem.c (vma_on_pagefault)

The operations it currently needs are:

- establishing virtual memory areas and connecting them to tracking

- changing page protection:

    PROT_NONE or absent                           - initially
    PROT_NONE -> PROT_READ                        - after read
    PROT_READ -> PROT_READWRITE                   - after write
    PROT_READWRITE -> PROT_READ                   - after commit
    PROT_READWRITE -> PROT_NONE or absent (again) - after abort
    PROT_READ -> PROT_NONE or absent (again)      - on reclaim

- working with aliasable memory (thus taken from tmpfs): there could be two overlapping-in-file mappings of a file (array), requested at different times, and changes from one mapping should propagate to the other - for common parts only one page should be memory-mapped into two places in the address space.

So what is currently lacking on the userfaultfd side is:

- the ability to remove / make PROT_NONE already-mapped pages (UFFDIO_REMAP was recently dropped)
- the ability to arbitrarily change page protection (e.g. RW -> R)
- a way to inject aliasable memory from tmpfs (or better, hugetlbfs), and into several places (UFFDIO_REMAP + some mapping-copy semantic)

The code is ugly because it is only a prototype.
You can clone/read it all from here:

https://lab.nexedi.cn/kirr/wendelin.core

The virtual memory manager even has tests, and from them you can see how the system is supposed to work (after each access: what pages are mapped, where, and how):

https://lab.nexedi.cn/kirr/wendelin.core/blob/master/bigfile/tests/test_virtmem.c

The performance is currently not great, partly because of page clearing when getting RAM from tmpfs, and partly because of mprotect/SIGSEGV/vma overhead and other dumb things on my side. I still wanted to show the use case, as userfaultfd here has the potential to remove the kernel-related overhead.

Thanks beforehand for feedback,
Kirill

P.S. Some context: http://www.wendelin.io/NXD-Wendelin.Core.Non.Secret/asEntireHTML
Announcing qboot, a minimal x86 firmware for QEMU
Some of you may have heard about the Clear Containers initiative from Intel, which couples KVM with various kernel tricks to create extremely lightweight virtual machines. The experimental Clear Containers setup requires only 18-20 MB to launch a virtual machine, and needs about 60 ms to boot.

Now, as all of you probably know, QEMU is great for running Windows or legacy Linux guests, but that flexibility comes at a hefty price. Not only does all of the emulation consume memory, it also requires some form of low-level firmware in the guest as well. All of this adds quite a bit to virtual-machine startup times (500 to 700 milliseconds is not unusual). Right? In fact, it's for this reason that Clear Containers uses kvmtool instead of QEMU.

No, wrong! In fact, reporting bad performance is pretty much the same as throwing down the gauntlet. Enter qboot, a minimal x86 firmware that runs on QEMU and, together with a slimmed-down QEMU configuration, boots a virtual machine in 40 milliseconds[2] on an Ivy Bridge Core i7 processor.

qboot is available at git://github.com/bonzini/qboot.git. In all the glory of its 8KB of code, it brings together various existing open source components:

* a minimal (really minimal) 16-bit BIOS runtime based on kvmtool's own BIOS
* a couple of hardware initialization routines written mostly from scratch, but with good help from the SeaBIOS source code
* a minimal 32-bit libc based on kvm-unit-tests
* the Linux loader from QEMU itself

The repository has more information on how to achieve fast boot times, and examples of using qboot. Right now there is a limit of 8 MB for vmlinuz+initrd+cmdline, which however should be enough for initrd-less containers.

The first commit to qboot is more or less 24 hours old, so there is definitely more work to do, in particular to extract ACPI tables from QEMU and present them to the guest. This is probably another day of work or so, and it will enable multiprocessor guests with little or no impact on the boot times.
SMBIOS information is also available from QEMU.

On the QEMU side, there is no support yet for persistent memory and the NFIT tables from ACPI 6.0. Once that (and ACPI support) is added, qboot will automatically start using it.

Happy hacking!

Paolo
[PATCH] KVM: PPC: check for lookup_linux_ptep() returning NULL
If passed a larger page size, lookup_linux_ptep() may fail, so add a check for that and bail out if that's the case. This was found with the help of a static code analysis tool.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
Signed-off-by: Laurentiu Tudor laurentiu.tu...@freescale.com
Cc: Scott Wood scottw...@freescale.com
---
based on https://github.com/agraf/linux-2.6.git kvm-ppc-next

 arch/powerpc/kvm/e500_mmu_host.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index cc536d4..249c816 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -469,7 +469,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 	pgdir = vcpu_e500->vcpu.arch.pgdir;
 	ptep = lookup_linux_ptep(pgdir, hva, &tsize_pages);
-	if (pte_present(*ptep))
+	if (ptep && pte_present(*ptep))
 		wimg = (*ptep >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK;
 	else {
 		if (printk_ratelimit())
--
1.8.3.1
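A reduced userspace model of the bug class this one-liner fixes (all names and bit layouts below are invented for the sketch, not the real PPC definitions): a lookup helper that can return NULL for an unsupported page size, and a caller that must test the pointer before dereferencing it.

```c
#include <stddef.h>

typedef unsigned long pte_t;

#define PTE_PRESENT  0x1UL
#define WIMGE_SHIFT  1
#define WIMGE_MASK   0xfUL

/* Stand-in for lookup_linux_ptep(): may return NULL on failure. */
static pte_t *lookup_ptep(pte_t *ptep, int size_ok)
{
    return size_ok ? ptep : NULL;
}

static unsigned long shadow_map_wimg(pte_t *table, int size_ok)
{
    pte_t *ptep = lookup_ptep(table, size_ok);

    /* The fix: check ptep itself, not just the PTE it points to.
     * Without the "ptep &&" a failed lookup dereferences NULL. */
    if (ptep && (*ptep & PTE_PRESENT))
        return (*ptep >> WIMGE_SHIFT) & WIMGE_MASK;
    return 0;    /* corresponds to the patch's ratelimited error path */
}
```

The pattern generalizes: any helper whose contract says "may return NULL" needs the pointer check ordered before the first dereference, which short-circuit `&&` guarantees.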
KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR
Initialize the kvmclock base at kvmclock system MSR write time, so that the guest sees kvmclock counting from zero. This matches bare-metal behaviour when kvmclock in the guest sets sched clock stable.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cc2c759..ea40d24 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2188,6 +2188,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 					&vcpu->requests);

 			ka->boot_vcpu_runs_old_kvmclock = tmp;
+
+			ka->kvmclock_offset = get_kernel_ns();
 		}

 		vcpu->arch.time = data;
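A toy model of the effect, not the real KVM code: kvmclock reports, roughly, the host clock minus the per-VM `kvmclock_offset`, so snapshotting the host clock into the offset when vcpu0 writes the system-time MSR makes the guest clock start counting from (close to) zero, as on bare metal.

```c
#include <stdint.h>

/* Per-VM offset; in the real code this lives in struct kvm_arch. */
static int64_t kvmclock_offset;

/* What the patch adds: capture the host clock at MSR-write time. */
static void vcpu0_writes_system_time_msr(int64_t host_now_ns)
{
    kvmclock_offset = host_now_ns;    /* ka->kvmclock_offset = get_kernel_ns(); */
}

/* Simplified guest-visible clock: host time minus the captured base. */
static int64_t guest_kvmclock_ns(int64_t host_now_ns)
{
    return host_now_ns - kvmclock_offset;
}
```

Without the patch the offset stays at its old value, so the guest clock starts at roughly the host's monotonic clock value, which is exactly the multi-week "time jump" reported in the thread below.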
Re: kvm: odd time values since kvmclock: set scheduler clock stable
On Mon, May 18, 2015 at 10:13:03PM -0400, Sasha Levin wrote:
> On 05/18/2015 10:02 PM, Sasha Levin wrote:
>> On 05/18/2015 08:13 PM, Marcelo Tosatti wrote:
>>> On Mon, May 18, 2015 at 07:45:41PM -0400, Sasha Levin wrote:
>>>> On 05/18/2015 06:39 PM, Marcelo Tosatti wrote:
>>>>> On Tue, May 12, 2015 at 07:17:24PM -0400, Sasha Levin wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I'm seeing an odd jump in time values during boot of a KVM guest:
>>>>>>
>>>>>> [...]
>>>>>> [0.00] tsc: Detected 2260.998 MHz processor
>>>>>> [3376355.247558] Calibrating delay loop (skipped) preset value..
>>>>>> [...]
>>>>>>
>>>>>> I've bisected it to:
>>>>>
>>>>> Paolo, Sasha,
>>>>>
>>>>> Although this might seem undesirable, there is no requirement for sched_clock to initialize at 0:
>>>>>
>>>>>  * There is no strict promise about the base, although it tends to start
>>>>>  * at 0 on boot (but people really shouldn't rely on that).
>>>>>
>>>>> Sasha, are you seeing any problem other than the apparent time jump?
>>>>
>>>> Nope, but I've looked at it again and it seems that it jumps to the host's clock (that is, in the example above the 3376355 value was the host's clock value).
>>>
>>> Sasha, that's right. It's the host monotonic clock.
>>
>> It's worth figuring out if (and what) userspace breaks on that. I know it says that you shouldn't rely on it, but I'd happily place a bet on at least one userspace treating it as seconds since boot or something similar.
>
> Didn't need to go far... In the guest:
>
> # date
> Tue May 19 02:11:46 UTC 2015
> # echo hi > /dev/kmsg
> [3907533.080112] hi
> # dmesg -T
> [Fri Jul  3 07:33:41 2015] hi

Sasha,

Can you give the suggested patch (hypervisor patch...) a try please? (With a patched guest, obviously.)

KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR
Re: Announcing qboot, a minimal x86 firmware for QEMU
On Thu, May 21, 2015 at 03:51:43PM +0200, Paolo Bonzini wrote:
> On the QEMU side, there is no support yet for persistent memory and the NFIT tables from ACPI 6.0. Once that (and ACPI support) is added, qboot will automatically start using it.

We are working on adding NFIT support into the virtual BIOS.
Re: [PATCH] KVM: PPC: check for lookup_linux_ptep() returning NULL
On Thu, 2015-05-21 at 16:26 +0300, Laurentiu Tudor wrote:
> If passed a larger page size, lookup_linux_ptep() may fail, so add a check for that and bail out if that's the case. This was found with the help of a static code analysis tool.
>
> Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
> Signed-off-by: Laurentiu Tudor laurentiu.tu...@freescale.com
> Cc: Scott Wood scottw...@freescale.com
> ---
> based on https://github.com/agraf/linux-2.6.git kvm-ppc-next
>
>  arch/powerpc/kvm/e500_mmu_host.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Scott Wood scottw...@freescale.com

-Scott
Re: [Qemu-devel] Announcing qboot, a minimal x86 firmware for QEMU
On 05/21/2015 04:51 PM, Paolo Bonzini wrote: Some of you may have heard about the Clear Containers initiative from Intel, which couple KVM with various kernel tricks to create extremely lightweight virtual machines. The experimental Clear Containers setup requires only 18-20 MB to launch a virtual machine, and needs about 60 ms to boot. Now, as all of you probably know, QEMU is great for running Windows or legacy Linux guests, but that flexibility comes at a hefty price. Not only does all of the emulation consume memory, it also requires some form of low-level firmware in the guest as well. All of this adds quite a bit to virtual-machine startup times (500 to 700 milliseconds is not unusual). Right? In fact, it's for this reason that Clear Containers uses kvmtool instead of QEMU. No, wrong! In fact, reporting bad performance is pretty much the same as throwing down the gauntlet. Enter qboot, a minimal x86 firmware that runs on QEMU and, together with a slimmed-down QEMU configuration, boots a virtual machine in 40 milliseconds[2] on an Ivy Bridge Core i7 processor. qboot is available at git://github.com/bonzini/qboot.git. In all the glory of its 8KB of code, it brings together various existing open source components: * a minimal (really minimal) 16-bit BIOS runtime based on kvmtool's own BIOS * a couple hardware initialization routines written mostly from scratch but with good help from SeaBIOS source code * a minimal 32-bit libc based on kvm-unit-tests * the Linux loader from QEMU itself The repository has more information on how to achieve fast boot times, and examples of using qboot. Right now there is a limit of 8 MB for vmlinuz+initrd+cmdline, which however should be enough for initrd-less containers. The first commit to qboot is more or less 24 hours old, so there is definitely more work to do, in particular to extract ACPI tables from QEMU and present them to the guest. 
This is probably another day of work or so, and it will enable multiprocessor guests with little or no impact on the boot times. SMBIOS information is also available from QEMU. On the QEMU side, there is no support yet for persistent memory and the NFIT tables from ACPI 6.0. Once that (and ACPI support) is added, qboot will automatically start using it. Happy hacking!

Lovely! Note you have memcpy.o instead of memcpy.c.
Re: [PATCH 00/23] userfaultfd v4
Hi Kirill,

On Thu, May 21, 2015 at 04:11:11PM +0300, Kirill Smelkov wrote:
> Sorry for maybe speaking up too late, but here is additional real

Not too late, in fact I don't think there's any change required for this at this stage, but it'd be great if you could help me to review.

> Since arrays can be large, it would be slow and thus not practical to [..]
> So I've implemented a scheme where array data is initially PROT_READ protected, then we catch SIGSEGV, if it is a write and the area belongs to an array

In the case of postcopy live migration (for qemu and/or containers) and postcopy live snapshotting, splitting the vmas is not an option because we may run out of them. If your PROT_READ areas are limited perhaps this isn't an issue, but with hundreds-of-GB guests (currently plenty in production) that need to live migrate fully reliably and fast, the vmas could exceed the limit if we were to use mprotect. If your arrays are very large and the PROT_READ areas aren't limited, using userfaultfd isn't only an optimization for you too, it's actually a must to avoid a potential -ENOMEM.

> Also, since arrays could be large - bigger than RAM, and only sparse parts of it could be needed to get needed information, for reading it also makes sense to lazily load data in the SIGSEGV handler with initial PROT_NONE protection.

Similarly I heard somebody wrote a fastresume to load the suspended (on disk) guest ram using userfaultfd. That is a slightly less fundamental case than postcopy because you could do it also with MAP_SHARED, but it's still interesting in allowing to compress or decompress the suspended ram on the fly with lz4 for example, something MAP_PRIVATE/MAP_SHARED wouldn't do (plus there's the additional benefit of not having an orphaned inode left open even if the file is deleted, which prevents unmounting the filesystem for the whole lifetime of the guest).
> This is very similar to how memory mapped files work, but adds transactionality which, as far as I know, is not provided by any currently in-kernel filesystem on Linux.

That's another benefit, yes.

> The gist of the virtual memory-manager is this:
> https://lab.nexedi.cn/kirr/wendelin.core/blob/master/include/wendelin/bigfile/virtmem.h
> https://lab.nexedi.cn/kirr/wendelin.core/blob/master/bigfile/virtmem.c (vma_on_pagefault)

I'll check it more in detail ASAP, thanks for the pointers!

> For operations it currently needs
> - establishing virtual memory areas and connecting to tracking it

That's the UFFDIO_REGISTER/UNREGISTER.

> - changing pages protection
>   PROT_NONE or absent - initially

absent is what works with -mm already. The lazy loading already works.

>   PROT_NONE -> PROT_READ - after read

Current UFFDIO_COPY will map it using vma->vm_page_prot. We'll need a new flag for UFFDIO_COPY to map it readonly. This is already contemplated:

	/*
	 * There will be a wrprotection flag later that allows to map
	 * pages wrprotected on the fly. And such a flag will be
	 * available if the wrprotection ioctl are implemented for the
	 * range according to the uffdio_register.ioctls.
	 */
#define UFFDIO_COPY_MODE_DONTWAKE	((__u64)1<<0)
	__u64 mode;

If the memory protection framework exists (either through the uffdio_register.ioctl out value, or through the uffdio_api.features out-only value) you can pass a new flag (MODE_WP) above to transition from absent to PROT_READ.

>   PROT_READ -> PROT_READWRITE - after write

This will need to add UFFDIO_MPROTECT.

>   PROT_READWRITE -> PROT_READ - after commit

UFFDIO_MPROTECT again (but harder if going from rw to ro, because of a slight mess to solve with regard to FAULT_FLAG_TRIED, in case you want to run this UFFDIO_MPROTECT without stopping the threads that are accessing the memory concurrently). And this should only work if the uffdio_register.mode had MODE_WP set, so we don't run into the races created by COWs (gup vs fork race).
>   PROT_READWRITE -> PROT_NONE or absent (again) - after abort

UFFDIO_MPROTECT again, but you won't be able to read the page contents inside the memory manager thread (the one working with userfaultfd). The manager is at all times forbidden to touch the memory it is tracking with userfaultfd (if it does it'll deadlock, but kill -9 will get rid of it). gdb, ironically because it is using an underoptimized access_process_vm, wouldn't hang, because FAULT_FLAG_RETRY won't be set in handle_userfault in the gdb context, and it'll just receive a sigbus if by mistake the user tries to touch the memory. Even if it will hang later as get_user_pages_locked|unlocked gets used there too, kill -9 would solve gdb too. Back to the problem of accessing the UFFDIO_MPROTECT(PROT_NONE) memory: to do that a new ioctl would be required. I'd rather not go back to the route of UFFDIO_REMAP, but it could copy the
Re: [PATCH 08/12] KVM: x86: save/load state on SMM switch
2015-05-21 18:23+0200, Paolo Bonzini: On 21/05/2015 18:20, Radim Krčmář wrote: 2. NMI -> SMI -> IRET -> RSM -> NMI NMI is injected; I think it shouldn't be ... have you based this behavior on the 3rd paragraph of SDM 34.8 NMI HANDLING WHILE IN SMM (A special case [...])? Yes. Well, if I were to go lawyer [...] saves the SMRAM state save map but does not save the attribute to keep NMI interrupts disabled. NMI masking is a bit, so it'd be really wasteful not to have an attribute to keep NMI enabled in the same place ... Potentially, an NMI could be latched (while in SMM or upon exit) and serviced upon exit [...] This Potentially could be in the sense that the whole 3rd paragraph is only applicable to some ancient SMM design :) The 1st paragraph has a quite clear sentence: If NMIs were blocked before the SMI occurred, they are blocked after execution of RSM. so I'd just ignore the 3rd paragraph ... And the APM 2:10.3.3 Exceptions and Interrupts: NMI—If an NMI occurs while the processor is in SMM, it is latched by the processor, but the NMI handler is not invoked until the processor leaves SMM with the execution of an RSM instruction. A pending NMI causes the handler to be invoked immediately after the RSM completes and before the first instruction in the interrupted program is executed. An SMM handler can unmask NMI interrupts by simply executing an IRET. Upon completion of the IRET instruction, the processor recognizes the pending NMI, and transfers control to the NMI handler. Once an NMI is recognized within SMM using this technique, subsequent NMIs are recognized until SMM is exited. Later SMIs cause NMIs to be masked, until the SMM handler unmasks them. makes me think that we should unmask them unconditionally or that SMM doesn't do anything with NMI masking. If we can choose, less NMI nesting seems like a good idea.
Re: Announcing qboot, a minimal x86 firmware for QEMU
On 2015-05-21 15:51, Paolo Bonzini wrote: [...]
Incidentally, I did something similar these days to get Linux booting in Jailhouse non-root cells, i.e. without BIOS and almost no hardware except memory, cpus and pci devices. Yes, it requires a bit of PV for Linux, but really little. Not aiming for speed (yet), just for less hypervisor work. Maybe there are some milliseconds to save when cutting off more hardware in an analogous way... PV pat^Whacks are here: http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/jailhouse. The boot loader is a combination of a python script [1] (result can be saved and reused - replaces ACPI) and really few lines of code [2][3]. Jan [1] https://github.com/siemens/jailhouse/blob/wip/linux-x86-inmate/tools/jailhouse-cell-linux [2] https://github.com/siemens/jailhouse/blob/wip/linux-x86-inmate/inmates/lib/x86/header.S [3] https://github.com/siemens/jailhouse/blob/wip/linux-x86-inmate/inmates/tools/x86/linux-loader.c -- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux
Re: [PATCH 08/12] KVM: x86: save/load state on SMM switch
2015-05-08 13:20+0200, Paolo Bonzini: The big ugly one. This patch adds support for switching in and out of system management mode, respectively upon receiving KVM_REQ_SMI and upon executing a RSM instruction. Both 32- and 64-bit formats are supported for the SMM state save area. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- RFC-v1: shift access rights left by 8 for 32-bit format move tracepoint to kvm_set_hflags fix NMI handling --- diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c @@ -2262,12 +2262,258 @@ static int em_lseg(struct x86_emulate_ctxt *ctxt) +static int rsm_load_seg_32(struct x86_emulate_ctxt *ctxt, u64 smbase, int n) +{ + struct desc_struct desc; + int offset; + u16 selector; + + selector = get_smstate(u32, smbase, 0x7fa8 + n * 4); (u16, the SDM says that the most significant 2 bytes are reserved anyway.) + if (n < 3) + offset = 0x7f84 + n * 12; + else + offset = 0x7f2c + (n - 3) * 12; These numbers made me look where the hell is that defined and the easiest reference seemed to be http://www.sandpile.org/x86/smm.htm, which has several layouts ... I hopefully checked the intersection of various Intels and AMDs. + set_desc_base(&desc, get_smstate(u32, smbase, offset + 8)); + set_desc_limit(&desc, get_smstate(u32, smbase, offset + 4)); + rsm_set_desc_flags(&desc, get_smstate(u32, smbase, offset)); (There wasn't a layout where this would be right, so we could save the shifting of those flags in 64 bit mode. Intel P6 was close, but they had only 2 bytes for access rights, which means they weren't shifted.) +static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt, u64 smbase) +{ + cr0 = get_smstate(u32, smbase, 0x7ffc); (I wonder why they made 'smbase + 0x8000' the default offset in the SDM, when 'smbase + 0xfe00' or 'smbase' would work as well.)
+static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt, u64 smbase) +{ + struct desc_struct desc; + u16 selector; + selector = get_smstate(u32, smbase, 0x7e90); + rsm_set_desc_flags(&desc, get_smstate(u32, smbase, 0x7e92) << 8); (Both reads should be u16. Luckily, extra data gets ignored.) static int em_rsm(struct x86_emulate_ctxt *ctxt) { + if ((ctxt->emul_flags & X86EMUL_SMM_INSIDE_NMI_MASK) == 0) + ctxt->ops->set_nmi_mask(ctxt, false); NMI is always fun ... let's see two cases: 1. NMI -> SMI -> RSM -> NMI NMI is not injected; ok. 2. NMI -> SMI -> IRET -> RSM -> NMI NMI is injected; I think it shouldn't be ... have you based this behavior on the 3rd paragraph of SDM 34.8 NMI HANDLING WHILE IN SMM (A special case [...])? Why I think we should restore the NMI mask on RSM: - It's consistent with SMI -> IRET -> NMI -> RSM -> NMI (where we, I think correctly, unmask NMIs) and the idea that SMM tries to be transparent (but maybe they didn't care about retarded SMI handlers). - APM 2:15.30.3 SMM_CTL MSR (C001_0116h) • ENTER—Bit 1. Enter SMM: map the SMRAM memory areas, record whether NMI was currently blocked and block further NMI and SMI interrupts. • EXIT—Bit 3. Exit SMM: unmap the SMRAM memory areas, restore the previous masking status of NMI and unconditionally reenable SMI. The MSR should mimic real SMM signals and does restore the NMI mask.
Re: [Qemu-devel] Announcing qboot, a minimal x86 firmware for QEMU
On 21/05/2015 17:48, Avi Kivity wrote: Lovely! Note you have memcpy.o instead of memcpy.c. Doh, and it's not used anyway. Check the repository, and let me know if OSv boots with it (it probably needs ACPI; Linux doesn't boot virtio without ACPI). Paolo
Re: [PATCH 08/12] KVM: x86: save/load state on SMM switch
On 21/05/2015 18:20, Radim Krčmář wrote: 2. NMI -> SMI -> IRET -> RSM -> NMI NMI is injected; I think it shouldn't be ... have you based this behavior on the 3rd paragraph of SDM 34.8 NMI HANDLING WHILE IN SMM (A special case [...])? Yes. Why I think we should restore the NMI mask on RSM: - It's consistent with SMI -> IRET -> NMI -> RSM -> NMI (where we, I think correctly, unmask NMIs) Yes, we do. and the idea that SMM tries to be transparent (but maybe they didn't care about retarded SMI handlers). That's my reading of that paragraph of the manual. :) - APM 2:15.30.3 SMM_CTL MSR (C001_0116h) • ENTER—Bit 1. Enter SMM: map the SMRAM memory areas, record whether NMI was currently blocked and block further NMI and SMI interrupts. • EXIT—Bit 3. Exit SMM: unmap the SMRAM memory areas, restore the previous masking status of NMI and unconditionally reenable SMI. The MSR should mimic real SMM signals and does restore the NMI mask. No idea... My implementation does restore the previous masking status, but only if it was unmasked. Paolo
Re: [PATCH 06/12] KVM: x86: API changes for SMM support
2015-05-21 16:59+0200, Paolo Bonzini: On 21/05/2015 16:49, Radim Krčmář wrote: 2015-05-08 13:20+0200, Paolo Bonzini: diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h @@ -202,7 +202,7 @@ struct kvm_run { __u32 exit_reason; __u8 ready_for_interrupt_injection; __u8 if_flag; - __u8 padding2[2]; + __u16 flags; (It got lost last review and I'd really like to know ... what is the advantage of giving both bytes to flags?) No advantage. You just should leave padding2[1] in the middle so that the offset of run->padding2[0] doesn't change. I don't get that. The position of padding should be decided by comparing probabilities of extending 'if_flag' and 'flags'. Since it's not obvious I gave two bytes to flags, but I can do it either way. if_flag seems to be set in stone as one bit, so I'd vote for __u8 flags; __u8 padding2; (Or 'padding3', to prevent the same class of errors that removing it altogether does; which we didn't do for other tailed padding.) For there isn't much space left in struct kvm ...
Re: [PATCH 08/12] KVM: x86: save/load state on SMM switch
On 21/05/2015 18:20, Radim Krčmář wrote: + set_desc_base(&desc, get_smstate(u32, smbase, offset + 8)); + set_desc_limit(&desc, get_smstate(u32, smbase, offset + 4)); + rsm_set_desc_flags(&desc, get_smstate(u32, smbase, offset)); (There wasn't a layout where this would be right, so we could save the shifting of those flags in 64 bit mode. Intel P6 was close, but they had only 2 bytes for access rights, which means they weren't shifted.) Check the AMD architecture manual. Paolo
Re: [PATCH 08/12] KVM: x86: save/load state on SMM switch
2015-05-21 18:21+0200, Paolo Bonzini: On 21/05/2015 18:20, Radim Krčmář wrote: + set_desc_base(&desc, get_smstate(u32, smbase, offset + 8)); + set_desc_limit(&desc, get_smstate(u32, smbase, offset + 4)); + rsm_set_desc_flags(&desc, get_smstate(u32, smbase, offset)); (There wasn't a layout where this would be right, so we could save the shifting of those flags in 64 bit mode. Intel P6 was close, but they had only 2 bytes for access rights, which means they weren't shifted.) Check the AMD architecture manual. I must be blind, is there more than Table 10-2? (And according to the AMD manual, we are overwriting the GDT and IDT base at offsets 0xff88 and 0xff94 with ES and CS data, so it's not the best reference for this case ...)