Re: [PATCH] KVM: nSVM/nVMX: Implement vmexit on INIT assertion
On Sun, Feb 24, 2013, Jan Kiszka wrote about [PATCH] KVM: nSVM/nVMX: Implement vmexit on INIT assertion: From: Jan Kiszka jan.kis...@siemens.com On Intel, raising INIT causing an unconditional vmexit. On AMD, this is controlled by the interception mask. Hi, I never tried to closely follow the KVM code paths related this code, but I do have one question: The VMX spec says: The INIT signal is blocked whenever a logical processor is in VMX root operation. It is not blocked in VMX non-root operation. Instead, INITs cause VM exits (see Section 22.3, Other Causes of VM Exits). So when running a non-nested L1 guest, or an L2 guest, the new behavior appears correct. However, it looks like if L1 is running in root mode (i.e., did VMXON once but not running L2 now), the INIT signal needs to be blocked, not do what it does now. It appears (but I'm not sure...) that right now, it causes the L1 guest to lock up, and is not ignored? Nadav. -- Nadav Har'El|Monday, Feb 25 2013, 15 Adar 5773 n...@math.technion.ac.il |- Phone +972-523-790466, ICQ 13349191 |If con is the opposite of pro, is http://nadav.harel.org.il |congress the opposite of progress? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug 54061] guest panic after live migration
On 02/20/2013 04:06 PM, bugzilla-dae...@bugzilla.kernel.org wrote: https://bugzilla.kernel.org/show_bug.cgi?id=54061 --- Comment #2 from Jay Ren yongjie@intel.com 2013-02-20 08:06:16 --- Marcleo, yes, after reverting that commit caf6900f2d8, live-migration can work fine. Sorry for the wrong patch and the bug is: the fast page fault path can make large-spte writeable without probably set dirty bitmap for all small pages. Two ways to fix these: 1): covert the large-spte to small sptes and wirte-protect them, then no readonly large-spte exists. 2): only allow fast page fault to fix #PF on small spte. Seems the first way is better because the second way can cause useless page table waking. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 54061] guest panic after live migration
https://bugzilla.kernel.org/show_bug.cgi?id=54061 --- Comment #3 from Anonymous Emailer anonym...@kernel-bugs.osdl.org 2013-02-25 08:26:26 --- Reply-To: xiaoguangr...@linux.vnet.ibm.com On 02/20/2013 04:06 PM, bugzilla-dae...@bugzilla.kernel.org wrote: https://bugzilla.kernel.org/show_bug.cgi?id=54061 --- Comment #2 from Jay Ren yongjie@intel.com 2013-02-20 08:06:16 --- Marcleo, yes, after reverting that commit caf6900f2d8, live-migration can work fine. Sorry for the wrong patch and the bug is: the fast page fault path can make large-spte writeable without probably set dirty bitmap for all small pages. Two ways to fix these: 1): covert the large-spte to small sptes and wirte-protect them, then no readonly large-spte exists. 2): only allow fast page fault to fix #PF on small spte. Seems the first way is better because the second way can cause useless page table waking. -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 0/4] kvm: Make ioeventfd usable on s390.
On Sun, 24 Feb 2013 11:56:39 +0200 Michael S. Tsirkin m...@redhat.com wrote: On Fri, Feb 22, 2013 at 01:09:45PM +0100, Cornelia Huck wrote: Here's the second attempt at implementing ioeventfd for s390. The patchset looks fine overall. Minor comments and questions below. Cool, thanks for reviewing. Rather than the architecture-specific functions used in v1, we now try to integrate with the kvm_io_device infrastructure. Calls to diagnose 500 subcode 3 are now mapped to _write. These devices are created on a new KVM_CSS_BUS when using a new flag KVM_IOEVENTFD_FLAG_CSS. addr and datamatch are (ab)used to contain the subchannel id and the virtqueue. A drawback is that this interface is not easily extendable should we want to attach other hypercalls or carry more payload. Under s390 kvm_hypercallX already uses diagnose 500 so that seems fine. If you want to make it more generic and support more subcodes, I think you'll have to pass an extra u64 field: to bus to both avoid overflowing int value and avoid ugly bus-specific hacks in generic code. Will we ever need that? Code using subcode 3 does not yet seem to be upstream in 3.8 so maybe yes, but you decide. Subcode 3 will be in 3.9. An alternative is to add new bus types when kvm needs to handle new subcodes. So e.g. KVM_BUS_S390_VIRTIO_CCW_NOTIFY and KVM_BUS_S390_VIRTIO_NOTIFY ? I think I'll fall back to a new bus type should we ever need a new notification type - the less strange hacks, the better. You decide, I'm fine with either approach. More minor comments and questions in response to individual patches. Another limitation is the limit of 1000 io devices per bus, which we would hit easily with a few hundred devices, but that should be fixable. v1 - v2: - Move irqfd initialization from a module init function to kvm_init, eliminating the need for a second module for kvm/s390. - Use kvm_io_device for s390 css devices. Cornelia Huck (4): KVM: Initialize irqfd from kvm_init(). KVM: Introduce KVM_CSS_BUS. KVM: ioeventfd for s390 css devices. KVM: s390: Wire up ioeventfd. Documentation/virtual/kvm/api.txt | 7 +++ arch/s390/kvm/Kconfig | 1 + arch/s390/kvm/Makefile| 2 +- arch/s390/kvm/diag.c | 25 + arch/s390/kvm/kvm-s390.c | 1 + include/linux/kvm_host.h | 14 ++ include/uapi/linux/kvm.h | 2 ++ virt/kvm/eventfd.c| 15 --- virt/kvm/kvm_main.c | 6 ++ 9 files changed, 65 insertions(+), 8 deletions(-) -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting
Avi Kivity wrote on 2013-02-25: I didn't really follow, but is the root cause the need to keep track of interrupt coalescing? If so we can recommend that users use KVM_IRQ_LINE when coalescing is unneeded, and move interrupt injection with irq coalescing support to vcpu context. So we can hide the capability KVM_CAP_IRQ_INJECT_STATUS when posted interrupt is enabled to force users doesn't to use KVM_IRQ_LINE_STATUS. Does this acceptable? The only case in KVM that need to know the interrupt injection status is vlapic timer. But since vlapic timer and vcpu are always in same pcpu, so there is no problem. It's not pleasant to cause a performance regression, so it should be a last resort (or maybe do both if it helps). On Sun, Feb 24, 2013 at 8:08 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Sun, Feb 24, 2013 at 04:19:17PM +0200, Gleb Natapov wrote: On Sun, Feb 24, 2013 at 01:55:07PM +, Zhang, Yang Z wrote: I do not think it fixes it. There is no guaranty that IPI will be processed by remote cpu while sending cpu is still in locked section, so the same race may happen regardless. As you say above there are 3 contexts, but only two use locks. See following logic, I think the problem you mentioned will not happened with current logic. get lock if test_pir (this will ensure there is no in flight IPI for same interrupt. Since we are taking the lock now, no IPI will be sent before we release the lock and no pir-irr is performed by hardware for same interrupt.) Does the PIR IPI interrupt returns synchronously _after_ PIR-IRR transfer is made? Don't think so. PIR IPI interrupt returns after remote CPU has acked interrupt receival, not after remote CPU has acked _and_ performed PIR-IRR transfer. Right? (yes, right, see step 4. below). Should try to make it easier to drop the lock later, so depend on it as little as possible (also please document what it protects in detail). Also note: 3. The processor clears the outstanding-notification bit in the posted-interrupt descriptor. This is done atomically so as to leave the remainder of the descriptor unmodified (e.g., with a locked AND operation). 4. The processor writes zero to the EOI register in the local APIC; this dismisses the interrupt with the postedinterrupt notification vector from the local APIC. 5. The logical processor performs a logical-OR of PIR into VIRR and clears PIR. No other agent can read or write a PIR bit (or group of bits) between the time it is read (to determine what to OR into VIRR) and when it is cleared. So checking the ON bit does not mean the HW has finished performing PIR-VIRR transfer (which means ON bit can only be used as an indication of whether to send interrupt, not whether PIR-VIRR transfer is finished). So its fine to do - atomic set pir - if (atomic_test_and_set(PIR ON bit)) - send IPI But HW can transfer two distinct bits, set by different serialized instances of kvm_set_irq (protected with a lock), in a single PIR-IRR pass. I do not see where those assumptions are coming from. Testing pir does not guaranty that the IPI is not processed by VCPU right now. iothread: vcpu: send_irq() lock(pir) check pir and irr set pir send IPI (*) unlock(pir) send_irq() lock(pir) receive IPI (*) atomic { pir_tmp = pir pir = 0 } check pir and irr irr = pir_tmp set pir send IPI unlock(pir) At this point both pir and irr are set and interrupt may be coalesced, but it is reported as delivered. s/set pir/injected = !t_a_s(pir)/ So what prevents the scenario above from happening? -- Gleb. Best regards, Yang -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 4/4] KVM: s390: Wire up ioeventfd.
On Sun, 24 Feb 2013 11:45:22 +0200 Michael S. Tsirkin m...@redhat.com wrote: On Fri, Feb 22, 2013 at 01:09:49PM +0100, Cornelia Huck wrote: Enable ioeventfd support on s390 and hook up diagnose 500 virtio-ccw notifications. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- arch/s390/kvm/Kconfig| 1 + arch/s390/kvm/Makefile | 2 +- arch/s390/kvm/diag.c | 25 + arch/s390/kvm/kvm-s390.c | 1 + 4 files changed, 28 insertions(+), 1 deletion(-) diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig index b58dd86..3c43e30 100644 --- a/arch/s390/kvm/Kconfig +++ b/arch/s390/kvm/Kconfig @@ -22,6 +22,7 @@ config KVM select PREEMPT_NOTIFIERS select ANON_INODES select HAVE_KVM_CPU_RELAX_INTERCEPT + select HAVE_KVM_EVENTFD ---help--- Support hosting paravirtualized guest machines using the SIE virtualization capability on the mainframe. This should work diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile index 3975722..8fe9d65 100644 --- a/arch/s390/kvm/Makefile +++ b/arch/s390/kvm/Makefile @@ -6,7 +6,7 @@ # it under the terms of the GNU General Public License (version 2 only) # as published by the Free Software Foundation. -common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o) +common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o eventfd.o) ccflags-y := -Ivirt/kvm -Iarch/s390/kvm diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c index a390687..13cf74a 100644 --- a/arch/s390/kvm/diag.c +++ b/arch/s390/kvm/diag.c @@ -104,6 +104,29 @@ static int __diag_ipl_functions(struct kvm_vcpu *vcpu) return -EREMOTE; } +static int __diag_virtio_hypercall(struct kvm_vcpu *vcpu) +{ + int ret, idx; + + /* No virtio-ccw notification? Get out quickly. */ + if (!vcpu-kvm-arch.css_support || + (vcpu-run-s.regs.gprs[1] != 3)) + return -EOPNOTSUPP; Looks like we have: arch/s390/include/uapi/asm/kvm_virtio.h:#define KVM_S390_VIRTIO_NOTIFY 0 arch/s390/include/uapi/asm/kvm_virtio.h:#define KVM_S390_VIRTIO_RESET 1 arch/s390/include/uapi/asm/kvm_virtio.h:#define KVM_S390_VIRTIO_SET_STATUS 2 KVM_S390_VIRTIO_CSS_NOTIFY is missing? Strange. Maybe it should be added in that header and included here? kvm_virtio.h is for the old-style s390-virtio mechanism. Subcode 3 will be included in 3.9. Do some guests use it? Also, do you want to support KVM_S390_VIRTIO_NOTIFY as well? We don't want to invest any more effort in the old style s390-virtio transport (other than keeping it working). Any future work will be for virtio-ccw. + + idx = srcu_read_lock(vcpu-kvm-srcu); + /* +* The layout is as follows: +* - gpr 2 contains the subchannel id (passed as addr) +* - gpr 3 contains the virtqueue index (passed as datamatch) +*/ + ret = kvm_io_bus_write(vcpu-kvm, KVM_CSS_BUS, + vcpu-run-s.regs.gprs[2], + 8, vcpu-run-s.regs.gprs[3]); + srcu_read_unlock(vcpu-kvm-srcu, idx); + /* kvm_io_bus_write returns -EOPNOTSUPP if it found no match. */ + return ret; +} + int kvm_s390_handle_diag(struct kvm_vcpu *vcpu) { int code = (vcpu-arch.sie_block-ipb 0xfff) 16; @@ -118,6 +141,8 @@ int kvm_s390_handle_diag(struct kvm_vcpu *vcpu) return __diag_time_slice_end_directed(vcpu); case 0x308: return __diag_ipl_functions(vcpu); + case 0x500: + return __diag_virtio_hypercall(vcpu); default: return -EOPNOTSUPP; } diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index f822d36..04d2454 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -142,6 +142,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_ONE_REG: case KVM_CAP_ENABLE_CAP: case KVM_CAP_S390_CSS_SUPPORT: + case KVM_CAP_IOEVENTFD: r = 1; break; case KVM_CAP_NR_VCPUS: -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 3/4] KVM: ioeventfd for s390 css devices.
On Sun, 24 Feb 2013 11:47:50 +0200 Michael S. Tsirkin m...@redhat.com wrote: On Fri, Feb 22, 2013 at 01:09:48PM +0100, Cornelia Huck wrote: Enhance KVM_IOEVENTFD with a new flag that allows to attach to s390 css devices. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- Documentation/virtual/kvm/api.txt | 7 +++ include/uapi/linux/kvm.h | 2 ++ virt/kvm/eventfd.c| 8 ++-- 3 files changed, 15 insertions(+), 2 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index c2534c3..40e799c 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1468,15 +1468,22 @@ struct kvm_ioeventfd { __u8 pad[36]; }; +For the special case of s390 css devices, the ioevent is matched to a +subchannel/virtqueue tuple instead. + The following flags are defined: #define KVM_IOEVENTFD_FLAG_DATAMATCH (1 kvm_ioeventfd_flag_nr_datamatch) #define KVM_IOEVENTFD_FLAG_PIO (1 kvm_ioeventfd_flag_nr_pio) #define KVM_IOEVENTFD_FLAG_DEASSIGN (1 kvm_ioeventfd_flag_nr_deassign) +#define KVM_IOEVENTFD_FLAG_CSS (1 kvm_ioeventfd_flag_nr_css) If datamatch flag is set, the event will be signaled only if the written value to the registered address is equal to datamatch in struct kvm_ioeventfd. +For css devices, addr contains the subchannel id and datamatch the virtqueue +index. + 4.60 KVM_DIRTY_TLB diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 9a2db57..1df0766 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -448,12 +448,14 @@ enum { kvm_ioeventfd_flag_nr_datamatch, kvm_ioeventfd_flag_nr_pio, kvm_ioeventfd_flag_nr_deassign, + kvm_ioeventfd_flag_nr_css, kvm_ioeventfd_flag_nr_max, }; #define KVM_IOEVENTFD_FLAG_DATAMATCH (1 kvm_ioeventfd_flag_nr_datamatch) #define KVM_IOEVENTFD_FLAG_PIO (1 kvm_ioeventfd_flag_nr_pio) #define KVM_IOEVENTFD_FLAG_DEASSIGN (1 kvm_ioeventfd_flag_nr_deassign) +#define KVM_IOEVENTFD_FLAG_CSS (1 kvm_ioeventfd_flag_nr_css) So let's explicitly name is KVM_IOEVENTFD_S390_VIRTIO_CCW or even KVM_IOEVENTFD_S390_VIRTIO_CCW_NOTIFY? How about KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY? Also, maybe we want to add KVM_IOEVENTFD_S390_VIRTIO_NOTIFY as well? No, since s390-virtio is a dead end. #define KVM_IOEVENTFD_VALID_FLAG_MASK ((1 kvm_ioeventfd_flag_nr_max) - 1) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index f0ced1a..7347652 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -679,11 +679,13 @@ static int kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) { int pio = args-flags KVM_IOEVENTFD_FLAG_PIO; - enum kvm_bus bus_idx = pio ? KVM_PIO_BUS : KVM_MMIO_BUS; + int css = args-flags KVM_IOEVENTFD_FLAG_CSS; + enum kvm_bus bus_idx; struct _ioeventfd*p; struct eventfd_ctx *eventfd; int ret; + bus_idx = pio ? KVM_PIO_BUS : css ? KVM_CSS_BUS: KVM_MMIO_BUS; /* must be natural-word sized */ switch (args-len) { case 1: @@ -759,11 +761,13 @@ static int kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) { int pio = args-flags KVM_IOEVENTFD_FLAG_PIO; - enum kvm_bus bus_idx = pio ? KVM_PIO_BUS : KVM_MMIO_BUS; + int css = args-flags KVM_IOEVENTFD_FLAG_CSS; + enum kvm_bus bus_idx; struct _ioeventfd*p, *tmp; struct eventfd_ctx *eventfd; int ret = -ENOENT; + bus_idx = pio ? KVM_PIO_BUS : css ? KVM_CSS_BUS: KVM_MMIO_BUS; eventfd = eventfd_ctx_fdget(args-fd); if (IS_ERR(eventfd)) return PTR_ERR(eventfd); -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 2/4] KVM: Introduce KVM_CSS_BUS.
On Sun, 24 Feb 2013 11:57:25 +0200 Michael S. Tsirkin m...@redhat.com wrote: On Fri, Feb 22, 2013 at 01:09:47PM +0100, Cornelia Huck wrote: Add a new bus type for s390 css kvm io devices. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- include/linux/kvm_host.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 3b768ef..59be516 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -148,6 +148,7 @@ struct kvm_io_bus { enum kvm_bus { KVM_MMIO_BUS, KVM_PIO_BUS, + KVM_CSS_BUS, so maybe KVM_S390_VIRTIO_CCW_NOTIFY_BUS to make the fact it's s390 and virtio specific, explicit? Hm, I don't really see a need to point out that this is s390-specific, so I think I'd prefer KVM_VIRTIO_CCW_NOTIFY_BUS. KVM_NR_BUSES }; -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: nSVM/nVMX: Implement vmexit on INIT assertion
On 2013-02-25 09:00, Nadav Har'El wrote: On Sun, Feb 24, 2013, Jan Kiszka wrote about [PATCH] KVM: nSVM/nVMX: Implement vmexit on INIT assertion: From: Jan Kiszka jan.kis...@siemens.com On Intel, raising INIT causing an unconditional vmexit. On AMD, this is controlled by the interception mask. Hi, I never tried to closely follow the KVM code paths related this code, but I do have one question: The VMX spec says: The INIT signal is blocked whenever a logical processor is in VMX root operation. It is not blocked in VMX non-root operation. Instead, INITs cause VM exits (see Section 22.3, Other Causes of VM Exits). So when running a non-nested L1 guest, or an L2 guest, the new behavior appears correct. However, it looks like if L1 is running in root mode (i.e., did VMXON once but not running L2 now), the INIT signal needs to be blocked, not do what it does now. It appears (but I'm not sure...) that right now, it causes the L1 guest to lock up, and is not ignored? Yeah, forgot about this detail. Unfortunately, this means we need to track the signal state, processing it on vmentry and, on AMD, when GIF becomes 1. Is the nested-related state already saved on AMD, Jörg? If not, adding this one would not make things worse at least. Still, missing user space save/restore already breaks reset, not only migration (dunno if this is better on AMD). Jan signature.asc Description: OpenPGP digital signature
Re: Tracing kvm: kvm_entry and kvm_exit
On Fri, Feb 22, 2013 at 11:34:27AM -0500, Mohamad Gebai wrote: I am tracing kvm using perf and I am analyzing the sequences of kvm_entry and kvm_exit tracepoints. I noticed that during the boot process of a VM, there are a lot more (2 to 3 as many times) kvm_entry event than there are kvm_exit. I tried looking around but didn't find anything that explains this. Is this missing instrumentation? Or what other path does kvm take that doesn't generate a kvm_exit event? Gleb Natapov noticed something similar when playing with the perf script I posted here: http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/104181 Perhaps there is a code path that is missing trace_kvm_exit(). We didn't investigate why it happens but the unexplained kvm_entry events only appeared at the beginning of the trace, so the theory was that events are not activated atomically by perf(1). CCing perf mailing list. It would be interesting if someone knows the answer. Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting
On Mon, Feb 25, 2013 at 08:42:52AM +, Zhang, Yang Z wrote: Avi Kivity wrote on 2013-02-25: I didn't really follow, but is the root cause the need to keep track of interrupt coalescing? If so we can recommend that users use KVM_IRQ_LINE when coalescing is unneeded, and move interrupt injection with irq coalescing support to vcpu context. So we can hide the capability KVM_CAP_IRQ_INJECT_STATUS when posted interrupt is enabled to force users doesn't to use KVM_IRQ_LINE_STATUS. Does this acceptable? The only case in KVM that need to know the interrupt injection status is vlapic timer. But since vlapic timer and vcpu are always in same pcpu, so there is no problem. Not really. The primary user of this interface is RTC interrupt re-injection for Windows guests. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting
Gleb Natapov wrote on 2013-02-25: On Mon, Feb 25, 2013 at 08:42:52AM +, Zhang, Yang Z wrote: Avi Kivity wrote on 2013-02-25: I didn't really follow, but is the root cause the need to keep track of interrupt coalescing? If so we can recommend that users use KVM_IRQ_LINE when coalescing is unneeded, and move interrupt injection with irq coalescing support to vcpu context. So we can hide the capability KVM_CAP_IRQ_INJECT_STATUS when posted interrupt is enabled to force users doesn't to use KVM_IRQ_LINE_STATUS. Does this acceptable? The only case in KVM that need to know the interrupt injection status is vlapic timer. But since vlapic timer and vcpu are always in same pcpu, so there is no problem. Not really. The primary user of this interface is RTC interrupt re-injection for Windows guests. So without KVM_IRQ_LINE_STATUS capability, RTC cannot work well? Best regards, Yang -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting
On Mon, Feb 25, 2013 at 11:04:25AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2013-02-25: On Mon, Feb 25, 2013 at 08:42:52AM +, Zhang, Yang Z wrote: Avi Kivity wrote on 2013-02-25: I didn't really follow, but is the root cause the need to keep track of interrupt coalescing? If so we can recommend that users use KVM_IRQ_LINE when coalescing is unneeded, and move interrupt injection with irq coalescing support to vcpu context. So we can hide the capability KVM_CAP_IRQ_INJECT_STATUS when posted interrupt is enabled to force users doesn't to use KVM_IRQ_LINE_STATUS. Does this acceptable? The only case in KVM that need to know the interrupt injection status is vlapic timer. But since vlapic timer and vcpu are always in same pcpu, so there is no problem. Not really. The primary user of this interface is RTC interrupt re-injection for Windows guests. So without KVM_IRQ_LINE_STATUS capability, RTC cannot work well? Windows guests may experience timedrift under CPU overcommit scenario. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting
Gleb Natapov wrote on 2013-02-25: On Mon, Feb 25, 2013 at 11:04:25AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2013-02-25: On Mon, Feb 25, 2013 at 08:42:52AM +, Zhang, Yang Z wrote: Avi Kivity wrote on 2013-02-25: I didn't really follow, but is the root cause the need to keep track of interrupt coalescing? If so we can recommend that users use KVM_IRQ_LINE when coalescing is unneeded, and move interrupt injection with irq coalescing support to vcpu context. So we can hide the capability KVM_CAP_IRQ_INJECT_STATUS when posted interrupt is enabled to force users doesn't to use KVM_IRQ_LINE_STATUS. Does this acceptable? The only case in KVM that need to know the interrupt injection status is vlapic timer. But since vlapic timer and vcpu are always in same pcpu, so there is no problem. Not really. The primary user of this interface is RTC interrupt re-injection for Windows guests. So without KVM_IRQ_LINE_STATUS capability, RTC cannot work well? Windows guests may experience timedrift under CPU overcommit scenario. Ok, I see. Seems we are stuck. :( Do you have any suggestion to solve or workaround current problem? Best regards, Yang -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] KVM updates for the 3.9 merge window
On Sun, 24 Feb 2013 16:05:31 -0800 Linus Torvalds torva...@linux-foundation.org wrote: On Wed, Feb 20, 2013 at 5:17 PM, Marcelo Tosatti mtosa...@redhat.com wrote: Please pull from git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/kvm-3.9-1 to receive the KVM updates for the 3.9 merge window [..] Ok, particularly the s390 people should check me resolution of the conflicts, since they include the renaming of IOINT_VIR to IRQIO_VIR. But the uapi header file move should be couble-checked by people who use this too. The IRQIO_VIR is now at the end of the IRQIO_xxx entries while we had it between the IRQIO_CSC and the IRQIO_PCI entry. No worry, we just change our code and use the upstream version. Just completed the double-check and pushed my branches to the linux-s390 tree. The code is in sync. -- blue skies, Martin. Reality continues to ruin my life. - Calvin. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM call agenda for 2013-02-26
Hi Please send in any agenda topics you are interested in. Later, Juan. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Fixing KVM/ARM breakage in mainline
Russell, Marcelo, Gleb, Commit 89f8833 (Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm) broke KVM/ARM. Nothing fundamental, just annoying enough. I've posted patches against next-20130215 some time ago: http://comments.gmane.org/gmane.comp.emulators.kvm.devel/105019 I've now created a branch based off mainline as of this morning, and containing these fixes: git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm/core-3.9-rc1-fixes Now the question is: who wants to merge this? The diffstat looks like this: Marc Zyngier (3): ARM: KVM: fix kvm_arch_{prepare,commit}_memory_region ARM: KVM: Rename KVM_MEMORY_SLOTS - KVM_USER_MEM_SLOTS ARM: KVM: fix compilation after removal of user_alloc from struct kvm_memory_slot arch/arm/include/asm/kvm_host.h | 2 +- arch/arm/kvm/arm.c | 4 ++-- arch/arm/kvm/mmu.c | 5 - 3 files changed, 3 insertions(+), 8 deletions(-) Thanks, M. -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fixing KVM/ARM breakage in mainline
On Mon, Feb 25, 2013 at 12:18:27PM +, Marc Zyngier wrote: Russell, Marcelo, Gleb, Commit 89f8833 (Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm) broke KVM/ARM. Nothing fundamental, just annoying enough. I've posted patches against next-20130215 some time ago: http://comments.gmane.org/gmane.comp.emulators.kvm.devel/105019 I've now created a branch based off mainline as of this morning, and containing these fixes: git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm/core-3.9-rc1-fixes Now the question is: who wants to merge this? The diffstat looks like this: I was going to merge it today. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting
On Mon, Feb 25, 2013 at 11:13:25AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2013-02-25: On Mon, Feb 25, 2013 at 11:04:25AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2013-02-25: On Mon, Feb 25, 2013 at 08:42:52AM +, Zhang, Yang Z wrote: Avi Kivity wrote on 2013-02-25: I didn't really follow, but is the root cause the need to keep track of interrupt coalescing? If so we can recommend that users use KVM_IRQ_LINE when coalescing is unneeded, and move interrupt injection with irq coalescing support to vcpu context. So we can hide the capability KVM_CAP_IRQ_INJECT_STATUS when posted interrupt is enabled to force users doesn't to use KVM_IRQ_LINE_STATUS. Does this acceptable? The only case in KVM that need to know the interrupt injection status is vlapic timer. But since vlapic timer and vcpu are always in same pcpu, so there is no problem. Not really. The primary user of this interface is RTC interrupt re-injection for Windows guests. So without KVM_IRQ_LINE_STATUS capability, RTC cannot work well? Windows guests may experience timedrift under CPU overcommit scenario. Ok, I see. Seems we are stuck. :( Do you have any suggestion to solve or workaround current problem? Depend on knowledge about atomicity (item 5 IIRC) of the sequence in the manual. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting
Marcelo Tosatti wrote on 2013-02-25: On Mon, Feb 25, 2013 at 11:13:25AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2013-02-25: On Mon, Feb 25, 2013 at 11:04:25AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2013-02-25: On Mon, Feb 25, 2013 at 08:42:52AM +, Zhang, Yang Z wrote: Avi Kivity wrote on 2013-02-25: I didn't really follow, but is the root cause the need to keep track of interrupt coalescing? If so we can recommend that users use KVM_IRQ_LINE when coalescing is unneeded, and move interrupt injection with irq coalescing support to vcpu context. So we can hide the capability KVM_CAP_IRQ_INJECT_STATUS when posted interrupt is enabled to force users doesn't to use KVM_IRQ_LINE_STATUS. Does this acceptable? The only case in KVM that need to know the interrupt injection status is vlapic timer. But since vlapic timer and vcpu are always in same pcpu, so there is no problem. Not really. The primary user of this interface is RTC interrupt re-injection for Windows guests. So without KVM_IRQ_LINE_STATUS capability, RTC cannot work well? Windows guests may experience timedrift under CPU overcommit scenario. Ok, I see. Seems we are stuck. :( Do you have any suggestion to solve or workaround current problem? Depend on knowledge about atomicity (item 5 IIRC) of the sequence in the manual. I am sure it is not atomic: tmp_pir=xhcg(pir, 0) irr |= tmp_pir Best regards, Yang -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 12/15] target-ppc: Refactor debug output macros
On 21.02.2013, at 05:25, Andreas Färber wrote: Make debug output compile-testable even if disabled. Rename dprintf() in kvm.c to kvm_dprintf() to avoid conflict with glibc. Inline DEBUG_OP check in excp_helper.c. Inline LOG_MMU_STATE() in mmu_helper.c. Inline PPC_{DEBUG_SPR,DUMP_SPR_ACCESSES} checks in translate_init.c. Signed-off-by: Andreas Färber afaer...@suse.de I assume you verified that all the bits do get optimized out, right? :) Reviewed-by: Alexander Graf ag...@suse.de Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 13/15] target-s390x: Refactor debug output macros
On 21.02.2013, at 05:25, Andreas Färber wrote: Make debug output compile-testable even if disabled. Rename dprintf() in kvm.c to kvm_dprintf() due to a conflict with glibc. Drop unused DEBUG_HELPER and LOG_HELPER() in fpu_helper.c. Drop unused LOG_DISAS() in translate.c and inline S390X_DEBUG_DISAS. Signed-off-by: Andreas Färber afaer...@suse.de Reviewed-by: Alexander Graf ag...@suse.de Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/6] kvm: add device control API
On Mon, Feb 25, 2013 at 12:11:19PM +1100, Paul Mackerras wrote: On Mon, Feb 18, 2013 at 02:21:59PM +0200, Gleb Natapov wrote: Copying Christoffer since ARM has in kernel irq chip too. On Wed, Feb 13, 2013 at 11:49:15PM -0600, Scott Wood wrote: Currently, devices that are emulated inside KVM are configured in a hardcoded manner based on an assumption that any given architecture only has one way to do it. If there's any need to access device state, it is done through inflexible one-purpose-only IOCTLs (e.g. KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is cumbersome and depletes a limited numberspace. This API provides a mechanism to instantiate a device of a certain type, returning an ID that can be used to set/get attributes of the device. Attributes may include configuration parameters (e.g. register base address), device state, operational commands, etc. It is similar to the ONE_REG API, except that it acts on devices rather than vcpus. You are not only provide different way to create in kernel irq chip you also use an alternate way to trigger interrupt lines. Before going into interface specifics lets think about whether it is really worth it? x86 obviously support old way and will have to for some, very long, time. ARM vGIC code, that is ready to go upstream, uses old way too. So it will be 2 archs against one. Christoffer do you think the proposed way it better for your needs. Are you willing to make vGIC use it? In fact there have been two distinct interrupt controller emulations for PPC posted recently, Scott's and mine, with quite different interfaces. In contrast to Scott's device control API, where the operations you would do for different interrupt controllers are quite different, mine attempted to define a much more abstract interface for interrupt controller emulations, with the idea that an interrupt controller subsystem connects a set of interrupt sources, each with some state, to a set of interrupt delivery mechanisms, one per vcpu, also with some state. I agree with Scott that it is futile to try and come up with generic irqchip configuration interface and I doubt it is needed from QEMU or other userspace pov. I looked at proposed KVM_IRQCHIP_SET_SOURCES interface and while it is possible to pass some information about pic/ioapic using it there will be a lot of information that will not fit there. For one there is global irqchips related state and proposed interface only talk about interrupt sources. Another is that using generic interface will require us to have a code that translate irqchip representation into this generic one and back for no apparent gain. Currently pic/ioapic state is very similar to what HW specification defines and it is not going to change. Looking at your implementation of KVM_IRQCHIP_SET_SOURCES I wounder how well it work for migration though. The interface suppose to transfer irqchips state as is, but I see things like that in your code: mutex_lock(ics-lock); irqp-server = val KVM_IRQ_SERVER_MASK; irqp-priority = val KVM_IRQ_PRIORITY_SHIFT; irqp-resend = 0; irqp-masked_pending = 0; irqp-asserted = 0; Why it is safe to initialize those values to default values during migration? Wouldn't it be simpler and more correct to just transfer the whole content of irqp from src to dst? Thus my interface had: - a KVM_CREATE_IRQCHIP_ARGS ioctl, with an argument structure that indicates the overall architecture of the interrupt subsystem, - KVM_IRQCHIP_SET_SOURCES and KVM_IRQCHIP_GET_SOURCES ioctls, that return or modify the state of some set of interrupt sources - a KVM_REG_PPC_ICP_STATE identifier in the ONE_REG register identifier space, that is used to save and restore the per-vcpu interrupt presentation state. Alternatively, I could modify my code to use the existing KVM_CREATE_IRQCHIP, KVM_GET_IRQCHIP, and KVM_SET_IRQCHIP ioctls. If I were to do that I would want to rename the 'pad' field in struct kvm_irqchip to 'nr' and use it with 'chip_id' to identify a range of interrupt sources to be affected. I'd also want to keep the ONE_REG identifier for the per-vcpu state. It is preferable to the interface you propose since I do not think your interface fits other interrupt controllers well. You can put nr field into dummy[] payload, or replace pad with union {pad, nr}. Or I could change over to using Scott's device control API, though I have some objections to doing that. What is your advice about which interface to use? The ideal situation would be if you and Scott agree on the details about the interface. If you don't like something about Scott's interface we can discuss it and shape it to something you agree with or even like. I already asked Scott to introduce command interface. Scott does not
Re: [PATCH v2 12/15] target-ppc: Refactor debug output macros
Am 25.02.2013 13:54, schrieb Alexander Graf: On 21.02.2013, at 05:25, Andreas Färber wrote: Make debug output compile-testable even if disabled. Rename dprintf() in kvm.c to kvm_dprintf() to avoid conflict with glibc. Inline DEBUG_OP check in excp_helper.c. Inline LOG_MMU_STATE() in mmu_helper.c. Inline PPC_{DEBUG_SPR,DUMP_SPR_ACCESSES} checks in translate_init.c. Signed-off-by: Andreas Färber afaer...@suse.de I assume you verified that all the bits do get optimized out, right? :) No, I didn't for each, my focus was to make debug code compile and keep it compiling after my CPU changes. :) Please read up on the new discussion of rth not liking static const and proposing to go back to v1, modulo do { ... } while (0). Like I said there, if finding a solution that pleases everyone fails, then I will leave it to the maintainers (i.e., you) to choose and apply a solution or to live with the resulting breakages. Andreas Reviewed-by: Alexander Graf ag...@suse.de Alex -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH ppc-next v2] target-ppc: Make host CPU a subclass of the host's CPU model
On 23.02.2013, at 22:22, Andreas Färber wrote: This avoids assigning individual class fields and contributors forgetting to add field assignments in KVM-only code. ppc_cpu_class_find_by_pvr() requires the CPU model classes to be registered, so defer host CPU type registration to kvm_arch_init(). Only register the host CPU type if there is a class with matching PVR. This lets us drop error handling from instance_init. Signed-off-by: Andreas Färber afaer...@suse.de Thanks, applied to ppc-next. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting
On Mon, Feb 25, 2013 at 11:13:25AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2013-02-25: On Mon, Feb 25, 2013 at 11:04:25AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2013-02-25: On Mon, Feb 25, 2013 at 08:42:52AM +, Zhang, Yang Z wrote: Avi Kivity wrote on 2013-02-25: I didn't really follow, but is the root cause the need to keep track of interrupt coalescing? If so we can recommend that users use KVM_IRQ_LINE when coalescing is unneeded, and move interrupt injection with irq coalescing support to vcpu context. So we can hide the capability KVM_CAP_IRQ_INJECT_STATUS when posted interrupt is enabled to force users doesn't to use KVM_IRQ_LINE_STATUS. Does this acceptable? The only case in KVM that need to know the interrupt injection status is vlapic timer. But since vlapic timer and vcpu are always in same pcpu, so there is no problem. Not really. The primary user of this interface is RTC interrupt re-injection for Windows guests. So without KVM_IRQ_LINE_STATUS capability, RTC cannot work well? Windows guests may experience timedrift under CPU overcommit scenario. Ok, I see. Seems we are stuck. :( Do you have any suggestion to solve or workaround current problem? I see a couple of possible solutions: 1. Do what Avi said. Make KVM_IRQ_LINE_STATUS be synchronous. Cons: current QEMU uses KVM_IRQ_LINE_STATUS always and it means that it will be slow on newer kernels 2. Make KVM_IRQ_LINE_STATUS report coalescing only when vcpu is not running during injection. This assumes that if vcpu is running and does not process interrupt it is guest fault and the same can happen on real HW too. Coalescing when vcpu is not running though is the result of CPU overcommit and should be reported. Cons interface definition is kind of murky. 3. Do not report KVM_IRQ_LINE_STATUS capability and move RTC to use EOI notifiers for interrupt reinjection. This requires us to add interface for reporting EOI to userspace. This is not in the scope of this patchset. Cons: need to introduce new interface (and the one that will not work on AMD BTW) Other ideas? -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 54461] New: nVMX: Downgrading nested VMX features from user-space
https://bugzilla.kernel.org/show_bug.cgi?id=54461 Summary: nVMX: Downgrading nested VMX features from user-space Product: Virtualization Version: unspecified Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: enhancement Priority: P1 Component: kvm AssignedTo: virtualization_...@kernel-bugs.osdl.org ReportedBy: n...@math.technion.ac.il Regression: No In nested SVM, the sub-features of SVM are all part of CPUID, so we already have a user-space interface (KVM_GET_SUPPORTED_CPUID/KVM_SET_CPUID2) to set a guest's features - which is, for example, useful for dictating a fixed set of features across different hosts. However, in nested VMX, the VMX sub-features offered or not to the guest are not in CPUID, but rather in MSRs, and currently the nested_vmx_setup_ctls_msrs() function determines their value - without any option for user space to control this. We need some sort of new KVM_SET_READONLY_MSRS ioctl for overriding (downgrading) nested_vmx_setup_ctls_msrs()'s decisions for a particular guest. -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 53601] nVMX meta-bug
https://bugzilla.kernel.org/show_bug.cgi?id=53601 Nadav Har'El n...@math.technion.ac.il changed: What|Removed |Added Depends on||54461 -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 54461] nVMX: Downgrading nested VMX features from user-space
https://bugzilla.kernel.org/show_bug.cgi?id=54461 Nadav Har'El n...@math.technion.ac.il changed: What|Removed |Added Blocks||53601 -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 12/15] target-ppc: Refactor debug output macros
On 25.02.2013, at 14:14, Andreas Färber wrote: Am 25.02.2013 13:54, schrieb Alexander Graf: On 21.02.2013, at 05:25, Andreas Färber wrote: Make debug output compile-testable even if disabled. Rename dprintf() in kvm.c to kvm_dprintf() to avoid conflict with glibc. Inline DEBUG_OP check in excp_helper.c. Inline LOG_MMU_STATE() in mmu_helper.c. Inline PPC_{DEBUG_SPR,DUMP_SPR_ACCESSES} checks in translate_init.c. Signed-off-by: Andreas Färber afaer...@suse.de I assume you verified that all the bits do get optimized out, right? :) No, I didn't for each, my focus was to make debug code compile and keep it compiling after my CPU changes. :) Please read up on the new discussion of rth not liking static const and proposing to go back to v1, modulo do { ... } while (0). Like I said there, if finding a solution that pleases everyone fails, then I will leave it to the maintainers (i.e., you) to choose and apply a solution or to live with the resulting breakages. I care about 2 things: * #ifdef works * if not defined then the code should get optimized out :). Everything else is icing on top. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 53761] nVMX: MSR bitmap merging
https://bugzilla.kernel.org/show_bug.cgi?id=53761 --- Comment #1 from Nadav Har'El n...@math.technion.ac.il 2013-02-25 13:59:58 --- The same technique, and code, can be reused also for merging of IO bitmaps. This will offer less of a performance impact than that on MSR bitmaps, because the KVM L0 anyway wants exit on all IO ports (except the esoteric port 0x80 used for delays). So I am guessing no performance improvement will be seen because of merging IO bitmaps. -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting
On Mon, Feb 25, 2013 at 06:55:58AM +, Zhang, Yang Z wrote: p1) cpu0 cpu1vcpu0 test_and_set_bit(PIR-A) set KVM_REQ_EVENT process REQ_EVENT PIR-A-IRR vcpu-mode=IN_GUEST if (vcpu0-guest_mode) if (!t_a_s_bit(PIR notif)) send IPI linux_pir_handler t_a_s_b(PIR-B)=1 no PIR IPI sent p2) No, this exists with your implementation not mine. As I said, in this patch, it will make request after vcpu ==guest mode, then send ipi. Even the ipi is lost, but the request still will be tracked. Because we have the follow check: vcpu-mode = GUEST_MODE (ipi may arrived here and lost) local irq disable check request (this will ensure the pir modification will be picked up by vcpu before vmentry) Please read the sequence above again, between p1) and p2). Yes, if the IPI is lost, the request _for the CPU which sent PIR IPI_ is guaranteed to be processed, but not the request for another cpu (cpu1). If no PIR IPI sent in cpu1, the delivered is false, and it will roll back to old logic: __apic_accept_irq(): if (!delivered) { kvm_make_request(KVM_REQ_EVENT, vcpu); kvm_vcpu_kick(vcpu); } This can solve the problem you mentioned above. Right? Should not be doing this kick in the first place. One result of it is that you'll always take a VM-exit if a second injection happens while a VCPU has not handled the first one. What is the drawback of the suggestion to unconditionally clear PIR notification bit on VM-entry? We can avoid it, but lets get it correct first (BTW can stick a comment saying FIXME: optimize) on that PIR clearing. cpu0 check pir(pass) check irr(pass) injected = !test_and_set_bit(pir) versus cpu1 xchg(pir) No information can be lost because all accesses to shared data are atomic. Sorry, I cannot get your point. Why 'xchg(pir)' can solve the problem I mentioned above? Can you elaborate it? The spinlock I used is trying to protect the whole procedure of sync pir to irr, not just access pir. You're right, its the same problem as with the hardware update. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting
On Mon, Feb 25, 2013 at 03:34:19PM +0200, Gleb Natapov wrote: On Mon, Feb 25, 2013 at 11:13:25AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2013-02-25: On Mon, Feb 25, 2013 at 11:04:25AM +, Zhang, Yang Z wrote: Gleb Natapov wrote on 2013-02-25: On Mon, Feb 25, 2013 at 08:42:52AM +, Zhang, Yang Z wrote: Avi Kivity wrote on 2013-02-25: I didn't really follow, but is the root cause the need to keep track of interrupt coalescing? If so we can recommend that users use KVM_IRQ_LINE when coalescing is unneeded, and move interrupt injection with irq coalescing support to vcpu context. So we can hide the capability KVM_CAP_IRQ_INJECT_STATUS when posted interrupt is enabled to force users doesn't to use KVM_IRQ_LINE_STATUS. Does this acceptable? The only case in KVM that need to know the interrupt injection status is vlapic timer. But since vlapic timer and vcpu are always in same pcpu, so there is no problem. Not really. The primary user of this interface is RTC interrupt re-injection for Windows guests. So without KVM_IRQ_LINE_STATUS capability, RTC cannot work well? Windows guests may experience timedrift under CPU overcommit scenario. Ok, I see. Seems we are stuck. :( Do you have any suggestion to solve or workaround current problem? I see a couple of possible solutions: 1. Do what Avi said. Make KVM_IRQ_LINE_STATUS be synchronous. Cons: current QEMU uses KVM_IRQ_LINE_STATUS always and it means that it will be slow on newer kernels Can add a capability to QEMU and enable APICv selectively only in newer QEMU, which can issue KVM_IRQ_LINE_STATUS on target vcpu only when necessary (and KVM_IRQ_LINE otherwise). Even a lock serializing injection is not safe because ON bit is cleared before XCHG(PIR, 0). Must do something heavier (such as running on target vcpu context). 2. Make KVM_IRQ_LINE_STATUS report coalescing only when vcpu is not running during injection. This assumes that if vcpu is running and does not process interrupt it is guest fault and the same can happen on real HW too. Coalescing when vcpu is not running though is the result of CPU overcommit and should be reported. Cons interface definition is kind of murky. 3. Do not report KVM_IRQ_LINE_STATUS capability and move RTC to use EOI notifiers for interrupt reinjection. This requires us to add interface for reporting EOI to userspace. This is not in the scope of this patchset. Cons: need to introduce new interface (and the one that will not work on AMD BTW) Breaks older userspace? Other ideas? Can HW write a 'finished' bit after 6 in the reserved area? Suppose its not a KVM-specific problem? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 54471] New: nVMX: TRUE* MSRs unnecessary
https://bugzilla.kernel.org/show_bug.cgi?id=54471 Summary: nVMX: TRUE* MSRs unnecessary Product: Virtualization Version: unspecified Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: enhancement Priority: P1 Component: kvm AssignedTo: virtualization_...@kernel-bugs.osdl.org ReportedBy: n...@math.technion.ac.il Regression: No According to the spec, the TRUE* MSRs are only necessary if bit 55 of VMX_BASIC is on. Since we don't set this bit, we don't need to provide these MSRs... So probably we shouldn't. -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 53601] nVMX meta-bug
https://bugzilla.kernel.org/show_bug.cgi?id=53601 Nadav Har'El n...@math.technion.ac.il changed: What|Removed |Added Depends on||54471 -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 54471] nVMX: TRUE* MSRs unnecessary
https://bugzilla.kernel.org/show_bug.cgi?id=54471 Nadav Har'El n...@math.technion.ac.il changed: What|Removed |Added Blocks||53601 -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting
On Mon, Feb 25, 2013 at 11:00:21AM -0300, Marcelo Tosatti wrote: I see a couple of possible solutions: 1. Do what Avi said. Make KVM_IRQ_LINE_STATUS be synchronous. Cons: current QEMU uses KVM_IRQ_LINE_STATUS always and it means that it will be slow on newer kernels Can add a capability to QEMU and enable APICv selectively only in newer QEMU, which can issue KVM_IRQ_LINE_STATUS on target vcpu only when necessary (and KVM_IRQ_LINE otherwise). Bad idea. What happens with mixed scenarios. Even a lock serializing injection is not safe because ON bit is cleared before XCHG(PIR, 0). Must do something heavier (such as running on target vcpu context). Note always running on target vcpu is likely to be slower than no-APICv. So need to do something heavier on the kernel under serialization, if firmware cannot be changed (injection from simultaneous CPUs should be rare so if data to serialize __accept_apic_irq is cache-line aligned it should reduce performance impact). 2. Make KVM_IRQ_LINE_STATUS report coalescing only when vcpu is not running during injection. This assumes that if vcpu is running and does not process interrupt it is guest fault and the same can happen on real HW too. Coalescing when vcpu is not running though is the result of CPU overcommit and should be reported. Cons interface definition is kind of murky. 3. Do not report KVM_IRQ_LINE_STATUS capability and move RTC to use EOI notifiers for interrupt reinjection. This requires us to add interface for reporting EOI to userspace. This is not in the scope of this patchset. Cons: need to introduce new interface (and the one that will not work on AMD BTW) Is there no int-ack notification at RTC HW level? Breaks older userspace? Other ideas? Can HW write a 'finished' bit after 6 in the reserved area? Suppose its not a KVM-specific problem? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug 54471] New: nVMX: TRUE* MSRs unnecessary
On 2013-02-25 15:04, bugzilla-dae...@bugzilla.kernel.org wrote: https://bugzilla.kernel.org/show_bug.cgi?id=54471 Summary: nVMX: TRUE* MSRs unnecessary Product: Virtualization Version: unspecified Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: enhancement Priority: P1 Component: kvm AssignedTo: virtualization_...@kernel-bugs.osdl.org ReportedBy: n...@math.technion.ac.il Regression: No According to the spec, the TRUE* MSRs are only necessary if bit 55 of VMX_BASIC is on. Since we don't set this bit, we don't need to provide these MSRs... So probably we shouldn't. I tend to say we should (i.e. set bit 55) as there is the risk that buggy guests could needlessly get confused when we expose very recent features (like EPT) and do not provide such old-fashioned stuff. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 54471] nVMX: TRUE* MSRs unnecessary
https://bugzilla.kernel.org/show_bug.cgi?id=54471 --- Comment #1 from Anonymous Emailer anonym...@kernel-bugs.osdl.org 2013-02-25 14:29:08 --- Reply-To: jan.kis...@siemens.com On 2013-02-25 15:04, bugzilla-dae...@bugzilla.kernel.org wrote: https://bugzilla.kernel.org/show_bug.cgi?id=54471 Summary: nVMX: TRUE* MSRs unnecessary Product: Virtualization Version: unspecified Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: enhancement Priority: P1 Component: kvm AssignedTo: virtualization_...@kernel-bugs.osdl.org ReportedBy: n...@math.technion.ac.il Regression: No According to the spec, the TRUE* MSRs are only necessary if bit 55 of VMX_BASIC is on. Since we don't set this bit, we don't need to provide these MSRs... So probably we shouldn't. I tend to say we should (i.e. set bit 55) as there is the risk that buggy guests could needlessly get confused when we expose very recent features (like EPT) and do not provide such old-fashioned stuff. Jan -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting
Marcelo Tosatti wrote on 2013-02-25: On Mon, Feb 25, 2013 at 06:55:58AM +, Zhang, Yang Z wrote: p1) cpu0 cpu1vcpu0 test_and_set_bit(PIR-A) set KVM_REQ_EVENT process REQ_EVENT PIR-A-IRR vcpu-mode=IN_GUEST if (vcpu0-guest_mode) if (!t_a_s_bit(PIR notif)) send IPI linux_pir_handler t_a_s_b(PIR-B)=1 no PIR IPI sent p2) No, this exists with your implementation not mine. As I said, in this patch, it will make request after vcpu ==guest mode, then send ipi. Even the ipi is lost, but the request still will be tracked. Because we have the follow check: vcpu-mode = GUEST_MODE (ipi may arrived here and lost) local irq disable check request (this will ensure the pir modification will be picked up by vcpu before vmentry) Please read the sequence above again, between p1) and p2). Yes, if the IPI is lost, the request _for the CPU which sent PIR IPI_ is guaranteed to be processed, but not the request for another cpu (cpu1). If no PIR IPI sent in cpu1, the delivered is false, and it will roll back to old logic: __apic_accept_irq(): if (!delivered) { kvm_make_request(KVM_REQ_EVENT, vcpu); kvm_vcpu_kick(vcpu); } This can solve the problem you mentioned above. Right? Should not be doing this kick in the first place. One result of it is that you'll always take a VM-exit if a second injection happens while a VCPU has not handled the first one. You are right. What is the drawback of the suggestion to unconditionally clear PIR notification bit on VM-entry? The only thing is we need to traverse the whole pir when calling sync_pir_to_irr even the pir is empty. We can avoid it, but lets get it correct first (BTW can stick a comment saying FIXME: optimize) on that PIR clearing. Ok, I will adopt your suggestion. BTW, Where should the comment be add? on sync_pir_to_irr()? cpu0 check pir(pass) check irr(pass) injected = !test_and_set_bit(pir) versus cpu1 xchg(pir) No information can be lost because all accesses to shared data are atomic. Sorry, I cannot get your point. Why 'xchg(pir)' can solve the problem I mentioned above? Can you elaborate it? The spinlock I used is trying to protect the whole procedure of sync pir to irr, not just access pir. You're right, its the same problem as with the hardware update. Best regards, Yang -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/6] kvm: add device control API
On 24.02.2013, at 16:46, Gleb Natapov wrote: On Thu, Feb 21, 2013 at 08:17:54PM -0600, Scott Wood wrote: On 02/21/2013 02:22:09 AM, Gleb Natapov wrote: On Wed, Feb 20, 2013 at 08:05:12PM -0600, Scott Wood wrote: On 02/20/2013 07:09:49 AM, Gleb Natapov wrote: On Tue, Feb 19, 2013 at 03:16:37PM -0600, Scott Wood wrote: On 02/19/2013 06:24:18 AM, Gleb Natapov wrote: On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote: The ability to set/get attributes is needed. Sorry, but get or set one blob of data, up to 512 bytes, for the entire irqchip is just not good enough -- assuming you don't want us to start sticking pointers and commands in *that* data. :-) Proposed interface sticks pointers into ioctl data, so why doing the same for KVM_SET_IRQCHIP/KVM_GET_IRQCHIP makes you smile. There's a difference between putting a pointer in an ioctl control structure that is specifically documented as being that way (as in ONE_REG), versus taking an ioctl that claims to be setting/getting a blob of state and embedding pointers in it. It would be like sticking a pointer in the attribute payload of this API, which I think is something to be discouraged. If documentation is what differentiate for you between silly and smart then write documentation instead of new interfaces. You mean like what we did with SREGS, that got deprecated and replaced with ONE_REG? SREGS is implemented by ppc. I see ONE_REG as addition to REGS interface. You can access all register at once or you can access them one by one. If there is a need we can add MULTIPLE_REGS that will get list of requested REGS. http://www.spinics.net/lists/kvm-ppc/msg04876.html http://www.spinics.net/lists/kvm-ppc/msg05842.html If Alex prefers ONE_REG interface this is his call :) There's a reason I prefer it :). We ran out of space in the SREGS call, we had a bit map with multiple chunks of data you could set in there and worst of all, when we ran out of space we didn't realize it until the code was already in Linus' tree! The normal here's a chunk of data ioctl style works great when you know all or most of the details of what you want to transfer. In the case of kvm, we need to expose an interface we a) Don't control itself. Hardware guys make hardware. We just follow. b) Is pretty complex. There can be a lot of state in a CPU / device. So because if that we needed something more flexible and dynamic. Something where we could add registers we forgot without jumping through lots of hoops. That's why I prefer ONE_REG for vcpus. Since I see the same danger on device emulation, I tend to prefer it here as well. The interface is not over generic i.e it does not try to replace KVM_RUN by writing special register. Sigh. And we don't necessarily want to set *everything*. What are those cases? You do need to on reset/migration. Why do we want to set all the registers on reset, rather than tell the in-kernel device to reset? The default state came from the kernel in the first place on irqchip creation... I have nothing against telling in-kernel device to reset provided there is a way to do so, which current interface lacks. Reset in userspase has its advantage too: bugs are easier to fix, there may be different kind of resets (hard/soft). The same argument holds for state access. If you suddenly realize oh, I need to also set this one hidden state bit in my hardware you only want to add that specific write on reset. If you suddenly find a race condition in migration that happens very rarely, you want to add the one register that you forgot to synchronize in addition in your new code. You don't want to change the whole ioctl passed struct. That's why we have the padding in our ioctl structs. And feature bitmaps. And once we exceed them, we're screwed. I want to make sure we don't hit that point. Actually, there's a really easy way to understand what this interface does. It basically implements a state get/set ioctl which allows for very fine grained control over your ioctl struct. It does that by only passing one register at a time. If we run into concurrency issues (multiple registers need to be set in a locked manner at the same time), we can always add a new ioctl that sets multiple attrs/regs at once. For CPUs, we haven't hit that case yet. It'd also be using KVM_SET_IRQCHIP to read data, which is the sort of thing you object to later on regarding KVM_IRQ_LINE_STATUS. Do not see why. It's either that, or have the data direction of the chip field in KVM_GET_IRQCHIP not be entirely in the get direction. Still do not follow. Example? struct kvm_irqchip has chip_id, pad, and chip. pad is insufficient to communicate attribute type plus a pointer. So if we want to provide a pointer for the kernel to write the attribute into, it has to read from memory that
[PATCH v3 0/5] kvm: Make ioeventfd usable on s390.
Here's the latest version of my patch series enabling ioeventfds on s390, again against kvm-next. Patches 1 and 2 (cleaning up initialization and exporting the virtio-ccw api) would make sense even independent of the ioeventfd enhancements. Patches 3-5 are concerned with adding a new type of ioeventfds for virtio-ccw notifications on s390. The naming is now hopefully clearer. We won't add ioeventfd support for the legacy s390-virtio transport. Please consider applying. v2 - v3: - Added a patch exporting the virtio-ccw api and use it for the diagnose implementation. - Better naming: We're dealing with virtio-ccw notifications only. v1 - v2: - Move irqfd initialization from a module init function to kvm_init, eliminating the need for a second module for kvm/s390. - Use kvm_io_device for s390 css devices. Cornelia Huck (5): KVM: Initialize irqfd from kvm_init(). KVM: s390: Export virtio-ccw api. KVM: Introduce KVM_VIRTIO_CCW_NOTIFY_BUS. KVM: ioeventfd for virtio-ccw devices. KVM: s390: Wire up ioeventfd. Documentation/virtual/kvm/api.txt | 8 arch/s390/include/uapi/asm/Kbuild | 1 + arch/s390/include/uapi/asm/virtio-ccw.h | 21 + arch/s390/kvm/Kconfig | 1 + arch/s390/kvm/Makefile | 2 +- arch/s390/kvm/diag.c| 26 ++ arch/s390/kvm/kvm-s390.c| 1 + drivers/s390/kvm/virtio_ccw.c | 5 + include/linux/kvm_host.h| 14 ++ include/uapi/linux/kvm.h| 3 +++ virt/kvm/eventfd.c | 21 ++--- virt/kvm/kvm_main.c | 6 ++ 12 files changed, 97 insertions(+), 12 deletions(-) create mode 100644 arch/s390/include/uapi/asm/virtio-ccw.h -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 1/5] KVM: Initialize irqfd from kvm_init().
Currently, eventfd introduces module_init/module_exit functions to initialize/cleanup the irqfd workqueue. This only works, however, if no other module_init/module_exit functions are built into the same module. Let's just move the initialization and cleanup to kvm_init and kvm_exit. This way, it is also clearer where kvm startup may fail. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- include/linux/kvm_host.h | 13 + virt/kvm/eventfd.c | 7 ++- virt/kvm/kvm_main.c | 6 ++ 3 files changed, 21 insertions(+), 5 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 722cae7..3b768ef 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -423,6 +423,19 @@ void kvm_vcpu_uninit(struct kvm_vcpu *vcpu); int __must_check vcpu_load(struct kvm_vcpu *vcpu); void vcpu_put(struct kvm_vcpu *vcpu); +#ifdef __KVM_HAVE_IOAPIC +int kvm_irqfd_init(void); +void kvm_irqfd_exit(void); +#else +static inline int kvm_irqfd_init(void) +{ + return 0; +} + +static inline void kvm_irqfd_exit(void) +{ +} +#endif int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, struct module *module); void kvm_exit(void); diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index b6eea5c..f0ced1a 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -544,7 +544,7 @@ void kvm_irq_routing_update(struct kvm *kvm, * aggregated from all vm* instances. We need our own isolated single-thread * queue to prevent deadlock against flushing the normal work-queue. */ -static int __init irqfd_module_init(void) +int kvm_irqfd_init(void) { irqfd_cleanup_wq = create_singlethread_workqueue(kvm-irqfd-cleanup); if (!irqfd_cleanup_wq) @@ -553,13 +553,10 @@ static int __init irqfd_module_init(void) return 0; } -static void __exit irqfd_module_exit(void) +void kvm_irqfd_exit(void) { destroy_workqueue(irqfd_cleanup_wq); } - -module_init(irqfd_module_init); -module_exit(irqfd_module_exit); #endif /* diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index adc68fe..7c188a3 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2920,6 +2920,9 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, int r; int cpu; + r = kvm_irqfd_init(); + if (r) + goto out_irqfd; r = kvm_arch_init(opaque); if (r) goto out_fail; @@ -3000,6 +3003,8 @@ out_free_0a: out_free_0: kvm_arch_exit(); out_fail: + kvm_irqfd_exit(); +out_irqfd: return r; } EXPORT_SYMBOL_GPL(kvm_init); @@ -3016,6 +3021,7 @@ void kvm_exit(void) on_each_cpu(hardware_disable_nolock, NULL, 1); kvm_arch_hardware_unsetup(); kvm_arch_exit(); + kvm_irqfd_exit(); free_cpumask_var(cpus_hardware_enabled); } EXPORT_SYMBOL_GPL(kvm_exit); -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 3/5] KVM: Introduce KVM_VIRTIO_CCW_NOTIFY_BUS.
Add a new bus type for virtio-ccw devices on s390. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- include/linux/kvm_host.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 3b768ef..206247f 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -148,6 +148,7 @@ struct kvm_io_bus { enum kvm_bus { KVM_MMIO_BUS, KVM_PIO_BUS, + KVM_VIRTIO_CCW_NOTIFY_BUS, KVM_NR_BUSES }; -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 4/5] KVM: ioeventfd for virtio-ccw devices.
Enhance KVM_IOEVENTFD with a new flag that allows to attach to virtio-ccw devices on s390 via the KVM_VIRTIO_CCW_NOTIFY_BUS. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- Documentation/virtual/kvm/api.txt | 8 include/uapi/linux/kvm.h | 3 +++ virt/kvm/eventfd.c| 14 -- 3 files changed, 23 insertions(+), 2 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index c2534c3..86232d6 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1468,15 +1468,23 @@ struct kvm_ioeventfd { __u8 pad[36]; }; +For the special case of virtio-ccw devices on s390, the ioevent is matched +to a subchannel/virtqueue tuple instead. + The following flags are defined: #define KVM_IOEVENTFD_FLAG_DATAMATCH (1 kvm_ioeventfd_flag_nr_datamatch) #define KVM_IOEVENTFD_FLAG_PIO (1 kvm_ioeventfd_flag_nr_pio) #define KVM_IOEVENTFD_FLAG_DEASSIGN (1 kvm_ioeventfd_flag_nr_deassign) +#define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \ + (1 kvm_ioeventfd_flag_nr_virtio_ccw_notify) If datamatch flag is set, the event will be signaled only if the written value to the registered address is equal to datamatch in struct kvm_ioeventfd. +For virtio-ccw devices, addr contains the subchannel id and datamatch the +virtqueue index. + 4.60 KVM_DIRTY_TLB diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 9a2db57..8f3e5ae 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -448,12 +448,15 @@ enum { kvm_ioeventfd_flag_nr_datamatch, kvm_ioeventfd_flag_nr_pio, kvm_ioeventfd_flag_nr_deassign, + kvm_ioeventfd_flag_nr_virtio_ccw_notify, kvm_ioeventfd_flag_nr_max, }; #define KVM_IOEVENTFD_FLAG_DATAMATCH (1 kvm_ioeventfd_flag_nr_datamatch) #define KVM_IOEVENTFD_FLAG_PIO (1 kvm_ioeventfd_flag_nr_pio) #define KVM_IOEVENTFD_FLAG_DEASSIGN (1 kvm_ioeventfd_flag_nr_deassign) +#define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \ + (1 kvm_ioeventfd_flag_nr_virtio_ccw_notify) #define KVM_IOEVENTFD_VALID_FLAG_MASK ((1 kvm_ioeventfd_flag_nr_max) - 1) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index f0ced1a..8de3cd7 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -679,11 +679,16 @@ static int kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) { int pio = args-flags KVM_IOEVENTFD_FLAG_PIO; - enum kvm_bus bus_idx = pio ? KVM_PIO_BUS : KVM_MMIO_BUS; + int ccw; + enum kvm_bus bus_idx; struct _ioeventfd*p; struct eventfd_ctx *eventfd; int ret; + ccw = args-flags KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY; + bus_idx = pio ? KVM_PIO_BUS : + ccw ? KVM_VIRTIO_CCW_NOTIFY_BUS : + KVM_MMIO_BUS; /* must be natural-word sized */ switch (args-len) { case 1: @@ -759,11 +764,16 @@ static int kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) { int pio = args-flags KVM_IOEVENTFD_FLAG_PIO; - enum kvm_bus bus_idx = pio ? KVM_PIO_BUS : KVM_MMIO_BUS; + int ccw; + enum kvm_bus bus_idx; struct _ioeventfd*p, *tmp; struct eventfd_ctx *eventfd; int ret = -ENOENT; + ccw = args-flags KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY; + bus_idx = pio ? KVM_PIO_BUS : + ccw ? KVM_VIRTIO_CCW_NOTIFY_BUS : + KVM_MMIO_BUS; eventfd = eventfd_ctx_fdget(args-fd); if (IS_ERR(eventfd)) return PTR_ERR(eventfd); -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 5/5] KVM: s390: Wire up ioeventfd.
Enable ioeventfd support on s390 and hook up diagnose 500 virtio-ccw notifications. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- arch/s390/kvm/Kconfig| 1 + arch/s390/kvm/Makefile | 2 +- arch/s390/kvm/diag.c | 26 ++ arch/s390/kvm/kvm-s390.c | 1 + 4 files changed, 29 insertions(+), 1 deletion(-) diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig index b58dd86..3c43e30 100644 --- a/arch/s390/kvm/Kconfig +++ b/arch/s390/kvm/Kconfig @@ -22,6 +22,7 @@ config KVM select PREEMPT_NOTIFIERS select ANON_INODES select HAVE_KVM_CPU_RELAX_INTERCEPT + select HAVE_KVM_EVENTFD ---help--- Support hosting paravirtualized guest machines using the SIE virtualization capability on the mainframe. This should work diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile index 3975722..8fe9d65 100644 --- a/arch/s390/kvm/Makefile +++ b/arch/s390/kvm/Makefile @@ -6,7 +6,7 @@ # it under the terms of the GNU General Public License (version 2 only) # as published by the Free Software Foundation. -common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o) +common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o eventfd.o) ccflags-y := -Ivirt/kvm -Iarch/s390/kvm diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c index a390687..96907f2 100644 --- a/arch/s390/kvm/diag.c +++ b/arch/s390/kvm/diag.c @@ -13,6 +13,7 @@ #include linux/kvm.h #include linux/kvm_host.h +#include asm/virtio-ccw.h #include kvm-s390.h #include trace.h #include trace-s390.h @@ -104,6 +105,29 @@ static int __diag_ipl_functions(struct kvm_vcpu *vcpu) return -EREMOTE; } +static int __diag_virtio_hypercall(struct kvm_vcpu *vcpu) +{ + int ret, idx; + + /* No virtio-ccw notification? Get out quickly. */ + if (!vcpu-kvm-arch.css_support || + (vcpu-run-s.regs.gprs[1] != KVM_S390_VIRTIO_CCW_NOTIFY)) + return -EOPNOTSUPP; + + idx = srcu_read_lock(vcpu-kvm-srcu); + /* +* The layout is as follows: +* - gpr 2 contains the subchannel id (passed as addr) +* - gpr 3 contains the virtqueue index (passed as datamatch) +*/ + ret = kvm_io_bus_write(vcpu-kvm, KVM_VIRTIO_CCW_NOTIFY_BUS, + vcpu-run-s.regs.gprs[2], + 8, vcpu-run-s.regs.gprs[3]); + srcu_read_unlock(vcpu-kvm-srcu, idx); + /* kvm_io_bus_write returns -EOPNOTSUPP if it found no match. */ + return ret; +} + int kvm_s390_handle_diag(struct kvm_vcpu *vcpu) { int code = (vcpu-arch.sie_block-ipb 0xfff) 16; @@ -118,6 +142,8 @@ int kvm_s390_handle_diag(struct kvm_vcpu *vcpu) return __diag_time_slice_end_directed(vcpu); case 0x308: return __diag_ipl_functions(vcpu); + case 0x500: + return __diag_virtio_hypercall(vcpu); default: return -EOPNOTSUPP; } diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index f822d36..04d2454 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -142,6 +142,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_ONE_REG: case KVM_CAP_ENABLE_CAP: case KVM_CAP_S390_CSS_SUPPORT: + case KVM_CAP_IOEVENTFD: r = 1; break; case KVM_CAP_NR_VCPUS: -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 0/2] qemu: ioeventfd for virtio-ccw.
Here's the current version of my patch enabling ioeventfds for virtio-ccw. The linux-headers patch only adds the new flag for ioeventfds. Wiring up ioeventfds has improved performance drastically (up to twice as fast for a simple dd). Notifier support is something we plan to do next. Please consider applying if this makes sense. v2 - v3: - Adapt to changed ioeventfd interface again. v1 - v2: - Adapt to changed ioeventfd interface. Cornelia Huck (2): linux-headers: Update with ioeventfd changes. virtio-ccw: Wire up ioeventfd. hw/s390x/css.c| 2 +- hw/s390x/css.h| 1 + hw/s390x/virtio-ccw.c | 114 ++ hw/s390x/virtio-ccw.h | 7 +++ linux-headers/linux/kvm.h | 3 ++ target-s390x/cpu.h| 16 +++ target-s390x/kvm.c| 19 7 files changed, 161 insertions(+), 1 deletion(-) -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 1/2] linux-headers: Update with ioeventfd changes.
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- linux-headers/linux/kvm.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index 5af9357..fc3d7ed 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -448,12 +448,15 @@ enum { kvm_ioeventfd_flag_nr_datamatch, kvm_ioeventfd_flag_nr_pio, kvm_ioeventfd_flag_nr_deassign, + kvm_ioeventfd_flag_nr_virtio_ccw_notify, kvm_ioeventfd_flag_nr_max, }; #define KVM_IOEVENTFD_FLAG_DATAMATCH (1 kvm_ioeventfd_flag_nr_datamatch) #define KVM_IOEVENTFD_FLAG_PIO (1 kvm_ioeventfd_flag_nr_pio) #define KVM_IOEVENTFD_FLAG_DEASSIGN (1 kvm_ioeventfd_flag_nr_deassign) +#define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \ + (1 kvm_ioeventfd_flag_nr_virtio_ccw_notify) #define KVM_IOEVENTFD_VALID_FLAG_MASK ((1 kvm_ioeventfd_flag_nr_max) - 1) -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 2/2] virtio-ccw: Wire up ioeventfd.
On hosts that support ioeventfd, make use of it for host-to-guest notifications via diagnose 500. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/s390x/css.c| 2 +- hw/s390x/css.h| 1 + hw/s390x/virtio-ccw.c | 114 ++ hw/s390x/virtio-ccw.h | 7 target-s390x/cpu.h| 16 +++ target-s390x/kvm.c| 19 + 6 files changed, 158 insertions(+), 1 deletion(-) diff --git a/hw/s390x/css.c b/hw/s390x/css.c index 85f6f22..82e6746 100644 --- a/hw/s390x/css.c +++ b/hw/s390x/css.c @@ -68,7 +68,7 @@ int css_create_css_image(uint8_t cssid, bool default_image) return 0; } -static uint16_t css_build_subchannel_id(SubchDev *sch) +uint16_t css_build_subchannel_id(SubchDev *sch) { if (channel_subsys-max_cssid 0) { return (sch-cssid 8) | (1 3) | (sch-ssid 1) | 1; diff --git a/hw/s390x/css.h b/hw/s390x/css.h index 85ed05d..b536ab5 100644 --- a/hw/s390x/css.h +++ b/hw/s390x/css.h @@ -90,6 +90,7 @@ bool css_devno_used(uint8_t cssid, uint8_t ssid, uint16_t devno); void css_subch_assign(uint8_t cssid, uint8_t ssid, uint16_t schid, uint16_t devno, SubchDev *sch); void css_sch_build_virtual_schib(SubchDev *sch, uint8_t chpid, uint8_t type); +uint16_t css_build_subchannel_id(SubchDev *sch); void css_reset(void); void css_reset_sch(SubchDev *sch); void css_queue_crw(uint8_t rsc, uint8_t erc, int chain, uint16_t rsid); diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index d92e427..23b8092 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -63,6 +63,84 @@ VirtIODevice *virtio_ccw_get_vdev(SubchDev *sch) return vdev; } +static int virtio_ccw_set_guest2host_notifier(VirtioCcwDevice *dev, int n, + bool assign, bool set_handler) +{ +VirtQueue *vq = virtio_get_queue(dev-vdev, n); +EventNotifier *notifier = virtio_queue_get_host_notifier(vq); +int r = 0; +SubchDev *sch = dev-sch; +uint32_t sch_id = (css_build_subchannel_id(sch) 16) | sch-schid; + +if (assign) { +r = event_notifier_init(notifier, 1); +if (r 0) { +error_report(%s: unable to init event notifier: %d, __func__, r); +return r; +} +virtio_queue_set_host_notifier_fd_handler(vq, true, set_handler); +s390_assign_subch_ioeventfd(event_notifier_get_fd(notifier), sch_id, +n, assign); +} else { +virtio_queue_set_host_notifier_fd_handler(vq, false, false); +s390_assign_subch_ioeventfd(event_notifier_get_fd(notifier), sch_id, +n, assign); +event_notifier_cleanup(notifier); +} +return r; +} + +static void virtio_ccw_start_ioeventfd(VirtioCcwDevice *dev) +{ +int n, r; + +if (!(dev-flags VIRTIO_CCW_FLAG_USE_IOEVENTFD) || +dev-ioeventfd_started) { +return; +} +for (n = 0; n VIRTIO_PCI_QUEUE_MAX; n++) { +if (!virtio_queue_get_num(dev-vdev, n)) { +continue; +} +r = virtio_ccw_set_guest2host_notifier(dev, n, true, true); +if (r 0) { +goto assign_error; +} +} +dev-ioeventfd_started = true; +return; + + assign_error: +while (--n = 0) { +if (!virtio_queue_get_num(dev-vdev, n)) { +continue; +} +r = virtio_ccw_set_guest2host_notifier(dev, n, false, false); +assert(r = 0); +} +dev-ioeventfd_started = false; +/* Disable ioeventfd for this device. */ +dev-flags = ~VIRTIO_CCW_FLAG_USE_IOEVENTFD; +error_report(%s: failed. Fallback to userspace (slower)., __func__); +} + +static void virtio_ccw_stop_ioeventfd(VirtioCcwDevice *dev) +{ +int n, r; + +if (!dev-ioeventfd_started) { +return; +} +for (n = 0; n VIRTIO_PCI_QUEUE_MAX; n++) { +if (!virtio_queue_get_num(dev-vdev, n)) { +continue; +} +r = virtio_ccw_set_guest2host_notifier(dev, n, false, false); +assert(r = 0); +} +dev-ioeventfd_started = false; +} + VirtualCssBus *virtual_css_bus_init(void) { VirtualCssBus *cbus; @@ -187,6 +265,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw) } break; case CCW_CMD_VDEV_RESET: +virtio_ccw_stop_ioeventfd(dev); virtio_reset(dev-vdev); ret = 0; break; @@ -313,10 +392,16 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw) ret = -EFAULT; } else { status = ldub_phys(ccw.cda); +if (!(status VIRTIO_CONFIG_S_DRIVER_OK)) { +virtio_ccw_stop_ioeventfd(dev); +} virtio_set_status(dev-vdev, status); if (dev-vdev-status == 0) { virtio_reset(dev-vdev); } +if (status VIRTIO_CONFIG_S_DRIVER_OK) { +virtio_ccw_start_ioeventfd(dev);
[PATCH v3 2/5] KVM: s390: Export virtio-ccw api.
Export the virtio-ccw api in a header for usage by other code. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- arch/s390/include/uapi/asm/Kbuild | 1 + arch/s390/include/uapi/asm/virtio-ccw.h | 21 + drivers/s390/kvm/virtio_ccw.c | 5 + 3 files changed, 23 insertions(+), 4 deletions(-) create mode 100644 arch/s390/include/uapi/asm/virtio-ccw.h diff --git a/arch/s390/include/uapi/asm/Kbuild b/arch/s390/include/uapi/asm/Kbuild index 7bf68ff..9ccd190 100644 --- a/arch/s390/include/uapi/asm/Kbuild +++ b/arch/s390/include/uapi/asm/Kbuild @@ -44,5 +44,6 @@ header-y += termios.h header-y += types.h header-y += ucontext.h header-y += unistd.h +header-y += virtio-ccw.h header-y += vtoc.h header-y += zcrypt.h diff --git a/arch/s390/include/uapi/asm/virtio-ccw.h b/arch/s390/include/uapi/asm/virtio-ccw.h new file mode 100644 index 000..a9a4ebf --- /dev/null +++ b/arch/s390/include/uapi/asm/virtio-ccw.h @@ -0,0 +1,21 @@ +/* + * Definitions for virtio-ccw devices. + * + * Copyright IBM Corp. 2013 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License (version 2 only) + * as published by the Free Software Foundation. + * + * Author(s): Cornelia Huck cornelia.h...@de.ibm.com + */ +#ifndef __KVM_VIRTIO_CCW_H +#define __KVM_VIRTIO_CCW_H + +/* Alignment of vring buffers. */ +#define KVM_VIRTIO_CCW_RING_ALIGN 4096 + +/* Subcode for diagnose 500 (virtio hypercall). */ +#define KVM_S390_VIRTIO_CCW_NOTIFY 3 + +#endif diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c index 3217dfe..c5532b3 100644 --- a/drivers/s390/kvm/virtio_ccw.c +++ b/drivers/s390/kvm/virtio_ccw.c @@ -31,6 +31,7 @@ #include asm/irq.h #include asm/cio.h #include asm/ccwdev.h +#include asm/virtio-ccw.h /* * virtio related functions @@ -79,10 +80,6 @@ struct virtio_ccw_vq_info { struct list_head node; }; -#define KVM_VIRTIO_CCW_RING_ALIGN 4096 - -#define KVM_S390_VIRTIO_CCW_NOTIFY 3 - #define CCW_CMD_SET_VQ 0x13 #define CCW_CMD_VDEV_RESET 0x33 #define CCW_CMD_SET_IND 0x43 -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/6] kvm: add device control API
On 25.02.2013, at 14:09, Gleb Natapov wrote: On Mon, Feb 25, 2013 at 12:11:19PM +1100, Paul Mackerras wrote: On Mon, Feb 18, 2013 at 02:21:59PM +0200, Gleb Natapov wrote: Copying Christoffer since ARM has in kernel irq chip too. On Wed, Feb 13, 2013 at 11:49:15PM -0600, Scott Wood wrote: Currently, devices that are emulated inside KVM are configured in a hardcoded manner based on an assumption that any given architecture only has one way to do it. If there's any need to access device state, it is done through inflexible one-purpose-only IOCTLs (e.g. KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is cumbersome and depletes a limited numberspace. This API provides a mechanism to instantiate a device of a certain type, returning an ID that can be used to set/get attributes of the device. Attributes may include configuration parameters (e.g. register base address), device state, operational commands, etc. It is similar to the ONE_REG API, except that it acts on devices rather than vcpus. You are not only provide different way to create in kernel irq chip you also use an alternate way to trigger interrupt lines. Before going into interface specifics lets think about whether it is really worth it? x86 obviously support old way and will have to for some, very long, time. ARM vGIC code, that is ready to go upstream, uses old way too. So it will be 2 archs against one. Christoffer do you think the proposed way it better for your needs. Are you willing to make vGIC use it? In fact there have been two distinct interrupt controller emulations for PPC posted recently, Scott's and mine, with quite different interfaces. In contrast to Scott's device control API, where the operations you would do for different interrupt controllers are quite different, mine attempted to define a much more abstract interface for interrupt controller emulations, with the idea that an interrupt controller subsystem connects a set of interrupt sources, each with some state, to a set of interrupt delivery mechanisms, one per vcpu, also with some state. I agree with Scott that it is futile to try and come up with generic irqchip configuration interface and I doubt it is needed from QEMU or other userspace pov. I looked at proposed KVM_IRQCHIP_SET_SOURCES interface and while it is possible to pass some information about pic/ioapic using it there will be a lot of information that will not fit there. For one there is global irqchips related state and proposed interface only talk about interrupt sources. Another is that using generic interface will require us to have a code that translate irqchip representation into this generic one and back for no apparent gain. Currently pic/ioapic state is very similar to what HW specification defines and it is not going to change. Looking at your implementation of KVM_IRQCHIP_SET_SOURCES I wounder how well it work for migration though. The interface suppose to transfer irqchips state as is, but I see things like that in your code: mutex_lock(ics-lock); irqp-server = val KVM_IRQ_SERVER_MASK; irqp-priority = val KVM_IRQ_PRIORITY_SHIFT; irqp-resend = 0; irqp-masked_pending = 0; irqp-asserted = 0; Why it is safe to initialize those values to default values during migration? Wouldn't it be simpler and more correct to just transfer the whole content of irqp from src to dst? Thus my interface had: - a KVM_CREATE_IRQCHIP_ARGS ioctl, with an argument structure that indicates the overall architecture of the interrupt subsystem, - KVM_IRQCHIP_SET_SOURCES and KVM_IRQCHIP_GET_SOURCES ioctls, that return or modify the state of some set of interrupt sources - a KVM_REG_PPC_ICP_STATE identifier in the ONE_REG register identifier space, that is used to save and restore the per-vcpu interrupt presentation state. Alternatively, I could modify my code to use the existing KVM_CREATE_IRQCHIP, KVM_GET_IRQCHIP, and KVM_SET_IRQCHIP ioctls. If I were to do that I would want to rename the 'pad' field in struct kvm_irqchip to 'nr' and use it with 'chip_id' to identify a range of interrupt sources to be affected. I'd also want to keep the ONE_REG identifier for the per-vcpu state. It is preferable to the interface you propose since I do not think your interface fits other interrupt controllers well. You can put nr field into dummy[] payload, or replace pad with union {pad, nr}. Or I could change over to using Scott's device control API, though I have some objections to doing that. What is your advice about which interface to use? The ideal situation would be if you and Scott agree on the details about the interface. If you don't like something about Scott's interface we can discuss it and shape it to something you agree with or even like. I already asked Scott to
Re: [PATCH v5 1/3] VFIO: Wrapper for getting reference to vfio_device from device
On Sat, 2013-02-23 at 23:25 -0600, Vijay Mohan Pandarathil wrote: - Added vfio_device_get_from_dev() as wrapper to get reference to vfio_device from struct device. - Added vfio_device_data() as a wrapper to get device_data from vfio_device. Signed-off-by: Vijay Mohan Pandarathil vijaymohan.pandarat...@hp.com --- drivers/vfio/vfio.c | 47 ++- include/linux/vfio.h | 3 +++ 2 files changed, 37 insertions(+), 13 deletions(-) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index 12c264d..863d1d3 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -407,12 +407,13 @@ static void vfio_device_release(struct kref *kref) } /* Device reference always implies a group reference */ -static void vfio_device_put(struct vfio_device *device) +void vfio_device_put(struct vfio_device *device) { struct vfio_group *group = device-group; kref_put_mutex(device-kref, vfio_device_release, group-device_lock); vfio_group_put(group); } +EXPORT_SYMBOL_GPL(vfio_device_put); static void vfio_device_get(struct vfio_device *device) { @@ -642,8 +643,12 @@ int vfio_add_group_dev(struct device *dev, } EXPORT_SYMBOL_GPL(vfio_add_group_dev); -/* Test whether a struct device is present in our tracking */ -static bool vfio_dev_present(struct device *dev) +/** + * This does a get on the vfio_device from device. + * Callers of this function will have to call vfio_put_device() to + * remove the reference. + */ +struct vfio_device *vfio_device_get_from_dev(struct device *dev) { I cc'd you on a patch that changes vfio_dev_present to fix a bug where the device to iommu group lookup has been shutdown. Using it for this purpose probably has the same bug. That code is in my next branch and in the most reset pull request to Linus. I think instead the code becomes much more simple. We're registering a callback for a struct device for which vfio-pci is the driver. During driver release we disable that callback for the device. Thus during the callback, we know the device is owned by vfio-pci, which means that the drvdata has what we need. So I think vfio_device_get_from_dev() simply becomes: struct vfio_device *vfio_get_device_from_dev(struct device *dev) { struct vfio_device *device = dev_get_drvdata(dev); vfio_device_get(device); return device; } EXPORT_SYMBOL_GPL(vfio_get_device_from_dev); Thanks, Alex struct iommu_group *iommu_group; struct vfio_group *group; @@ -651,25 +656,41 @@ static bool vfio_dev_present(struct device *dev) iommu_group = iommu_group_get(dev); if (!iommu_group) - return false; + return NULL; group = vfio_group_get_from_iommu(iommu_group); if (!group) { iommu_group_put(iommu_group); - return false; + return NULL; } device = vfio_group_get_device(group, dev); - if (!device) { - vfio_group_put(group); - iommu_group_put(iommu_group); - return false; - } - - vfio_device_put(device); vfio_group_put(group); iommu_group_put(iommu_group); - return true; + return device; +} +EXPORT_SYMBOL_GPL(vfio_device_get_from_dev); + +/* + * Caller must hold a reference to the vfio_device + */ +void *vfio_device_data(struct vfio_device *device) +{ + return device-device_data; +} +EXPORT_SYMBOL_GPL(vfio_device_data); + +/* Test whether a struct device is present in our tracking */ +static bool vfio_dev_present(struct device *dev) +{ + struct vfio_device *device; + + device = vfio_device_get_from_dev(dev); + if (device) { + vfio_device_put(device); + return true; + } else + return false; } /* diff --git a/include/linux/vfio.h b/include/linux/vfio.h index ab9e862..ac8d488 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -45,6 +45,9 @@ extern int vfio_add_group_dev(struct device *dev, void *device_data); extern void *vfio_del_group_dev(struct device *dev); +extern struct vfio_device *vfio_device_get_from_dev(struct device *dev); +extern void vfio_device_put(struct vfio_device *device); +extern void *vfio_device_data(struct vfio_device *device); /** * struct vfio_iommu_driver_ops - VFIO IOMMU driver callbacks -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting
I see a couple of possible solutions: 1. Do what Avi said. Make KVM_IRQ_LINE_STATUS be synchronous. Cons: current QEMU uses KVM_IRQ_LINE_STATUS always and it means that it will be slow on newer kernels You could backport the qemu change, verify that it builds, and push it to stable branches. It's hardly ideal but if nothing else comes up then it's a solution. 2. Make KVM_IRQ_LINE_STATUS report coalescing only when vcpu is not running during injection. This assumes that if vcpu is running and does not process interrupt it is guest fault and the same can happen on real HW too. Coalescing when vcpu is not running though is the result of CPU overcommit and should be reported. Cons interface definition is kind of murky. You still have a window where the vcpu is scheduled out with guest interrupts disabled, then scheduled back in and before it manages to handle the interrupt, the second one hits. It's worth testing though. 3. Do not report KVM_IRQ_LINE_STATUS capability and move RTC to use EOI notifiers for interrupt reinjection. This requires us to add interface for reporting EOI to userspace. This is not in the scope of this patchset. Cons: need to introduce new interface (and the one that will not work on AMD BTW) Other ideas? 1. inject RTC interrupt 2. not see EOI 3. inject RTC interrupt 4. due to 2, report coalesced AMD can still use IRR test-and-set. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting
On Mon, Feb 25, 2013 at 11:17:39AM -0300, Marcelo Tosatti wrote: On Mon, Feb 25, 2013 at 11:00:21AM -0300, Marcelo Tosatti wrote: I see a couple of possible solutions: 1. Do what Avi said. Make KVM_IRQ_LINE_STATUS be synchronous. Cons: current QEMU uses KVM_IRQ_LINE_STATUS always and it means that it will be slow on newer kernels Can add a capability to QEMU and enable APICv selectively only in newer QEMU, which can issue KVM_IRQ_LINE_STATUS on target vcpu only when necessary (and KVM_IRQ_LINE otherwise). Bad idea. What happens with mixed scenarios. Exactly. Even a lock serializing injection is not safe because ON bit is cleared before XCHG(PIR, 0). Must do something heavier (such as running on target vcpu context). Note always running on target vcpu is likely to be slower than no-APICv. Yes, but we will do it only for RTC interrupt. Still performance hit should be very visible when RTC is in 1000Hz mode. So need to do something heavier on the kernel under serialization, if firmware cannot be changed (injection from simultaneous CPUs should be rare so if data to serialize __accept_apic_irq is cache-line aligned it should reduce performance impact). 2. Make KVM_IRQ_LINE_STATUS report coalescing only when vcpu is not running during injection. This assumes that if vcpu is running and does not process interrupt it is guest fault and the same can happen on real HW too. Coalescing when vcpu is not running though is the result of CPU overcommit and should be reported. Cons interface definition is kind of murky. 3. Do not report KVM_IRQ_LINE_STATUS capability and move RTC to use EOI notifiers for interrupt reinjection. This requires us to add interface for reporting EOI to userspace. This is not in the scope of this patchset. Cons: need to introduce new interface (and the one that will not work on AMD BTW) Is there no int-ack notification at RTC HW level? There is, but Windows calls it twice for each injected interrupt. I tried to use it in the past to detect interrupt coalescing, but this double ack prevented me from doing so. May be revisit this approach with willingness to be more hacky about it. Also it is possible to disable RTC ack for HyperV guests. We do not do it and if we will use the ack we will not be able to. Breaks older userspace? Older userspace will have timedrif with Windows, yes. Note that we some version of Windows it has it now too. Other ideas? Can HW write a 'finished' bit after 6 in the reserved area? Suppose its not a KVM-specific problem? I still think adding locking just to obtain correct injection status is bad trade-off even if HW would allow us to get away with it. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting
On Mon, Feb 25, 2013 at 06:50:40PM +0200, Avi Kivity wrote: I see a couple of possible solutions: 1. Do what Avi said. Make KVM_IRQ_LINE_STATUS be synchronous. Cons: current QEMU uses KVM_IRQ_LINE_STATUS always and it means that it will be slow on newer kernels You could backport the qemu change, verify that it builds, and push it to stable branches. It's hardly ideal but if nothing else comes up then it's a solution. 2. Make KVM_IRQ_LINE_STATUS report coalescing only when vcpu is not running during injection. This assumes that if vcpu is running and does not process interrupt it is guest fault and the same can happen on real HW too. Coalescing when vcpu is not running though is the result of CPU overcommit and should be reported. Cons interface definition is kind of murky. You still have a window where the vcpu is scheduled out with guest interrupts disabled, then scheduled back in and before it manages to handle the interrupt, the second one hits. Yes, definitely not ideal solution. It's worth testing though. Yes again. Windows almost never disables interrupts and RTC interrupt is of highest priority. 3. Do not report KVM_IRQ_LINE_STATUS capability and move RTC to use EOI notifiers for interrupt reinjection. This requires us to add interface for reporting EOI to userspace. This is not in the scope of this patchset. Cons: need to introduce new interface (and the one that will not work on AMD BTW) Other ideas? 1. inject RTC interrupt 2. not see EOI 3. inject RTC interrupt 4. due to 2, report coalesced That's the 3 in my list, no? AMD can still use IRR test-and-set. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL] KVM ARM compilation fixes for the 3.9 merge window
Linus, Please pull from git://git.kernel.org/pub/scm/virt/kvm/kvm.git master To receive the following KVM bug fixes. They fix ARM KVM compilation breakage due to changes from kvm.git. Marc Zyngier (3): ARM: KVM: fix kvm_arch_{prepare,commit}_memory_region ARM: KVM: Rename KVM_MEMORY_SLOTS - KVM_USER_MEM_SLOTS ARM: KVM: fix compilation after removal of user_alloc from struct kvm_memory_slot arch/arm/include/asm/kvm_host.h |2 +- arch/arm/kvm/arm.c |4 ++-- arch/arm/kvm/mmu.c |5 - 3 files changed, 3 insertions(+), 8 deletions(-) -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH qom-cpu v2 4/7] cpu: Pass CPUState to cpu_interrupt()
Move it to qom/cpu.h to avoid issues with include order. Change pc_acpi_smi_interrupt() opaque to X86CPU. Signed-off-by: Andreas Färber afaer...@suse.de --- cpus.c |2 +- exec.c |2 +- hw/alpha_typhoon.c | 10 -- hw/apic.c| 21 +++-- hw/arm_pic.c |5 ++--- hw/cris_pic_cpu.c|3 +-- hw/leon3.c |3 ++- hw/lm32_boards.c |3 +-- hw/lpc_ich9.c|2 +- hw/microblaze_pic_cpu.c |3 +-- hw/milkymist.c |3 +-- hw/mips_int.c|2 +- hw/omap1.c |4 ++-- hw/openrisc_pic.c|2 +- hw/pc.c |6 +++--- hw/pc_piix.c |3 ++- hw/ppc.c |6 +++--- hw/ppc405_uc.c |4 ++-- hw/puv3.c|3 +-- hw/pxa2xx.c |7 --- hw/pxa2xx_gpio.c |2 +- hw/pxa2xx_pic.c |6 +++--- hw/sh_intc.c |5 +++-- hw/sun4m.c |9 ++--- hw/sun4u.c |4 ++-- hw/xtensa_pic.c |2 +- include/exec/cpu-all.h | 13 - include/qom/cpu.h| 24 kvm-all.c|4 +--- target-arm/helper.c |2 +- target-i386/helper.c |6 +++--- target-m68k/helper.c |2 +- target-mips/op_helper.c |8 target-ppc/excp_helper.c |2 +- target-s390x/cpu.h |6 +++--- target-s390x/helper.c|4 ++-- translate-all.c | 12 +--- 37 Dateien geändert, 106 Zeilen hinzugefügt(+), 99 Zeilen entfernt(-) diff --git a/cpus.c b/cpus.c index 8d47bfd..e919dd7 100644 --- a/cpus.c +++ b/cpus.c @@ -1309,7 +1309,7 @@ void qmp_inject_nmi(Error **errp) for (env = first_cpu; env != NULL; env = env-next_cpu) { if (!env-apic_state) { -cpu_interrupt(env, CPU_INTERRUPT_NMI); +cpu_interrupt(CPU(x86_env_get_cpu(env)), CPU_INTERRUPT_NMI); } else { apic_deliver_nmi(env-apic_state); } diff --git a/exec.c b/exec.c index 9d1605f..f686739 100644 --- a/exec.c +++ b/exec.c @@ -1467,7 +1467,7 @@ static void check_watchpoint(int offset, int len_mask, int flags) /* We re-entered the check after replacing the TB. Now raise * the debug interrupt so that is will trigger after the * current instruction. */ -cpu_interrupt(env, CPU_INTERRUPT_DEBUG); +cpu_interrupt(ENV_GET_CPU(env), CPU_INTERRUPT_DEBUG); return; } vaddr = (env-mem_io_vaddr TARGET_PAGE_MASK) + offset; diff --git a/hw/alpha_typhoon.c b/hw/alpha_typhoon.c index dfb10c7..3b15b30 100644 --- a/hw/alpha_typhoon.c +++ b/hw/alpha_typhoon.c @@ -62,10 +62,9 @@ static void cpu_irq_change(AlphaCPU *cpu, uint64_t req) { /* If there are any non-masked interrupts, tell the cpu. */ if (cpu != NULL) { -CPUAlphaState *env = cpu-env; CPUState *cs = CPU(cpu); if (req) { -cpu_interrupt(env, CPU_INTERRUPT_HARD); +cpu_interrupt(cs, CPU_INTERRUPT_HARD); } else { cpu_reset_interrupt(cs, CPU_INTERRUPT_HARD); } @@ -359,11 +358,10 @@ static void cchip_write(void *opaque, hwaddr addr, for (i = 0; i 4; ++i) { AlphaCPU *cpu = s-cchip.cpu[i]; if (cpu != NULL) { -CPUAlphaState *env = cpu-env; CPUState *cs = CPU(cpu); /* IPI can be either cleared or set by the write. */ if (newval (1 (i + 8))) { -cpu_interrupt(env, CPU_INTERRUPT_SMP); +cpu_interrupt(cs, CPU_INTERRUPT_SMP); } else { cpu_reset_interrupt(cs, CPU_INTERRUPT_SMP); } @@ -687,7 +685,7 @@ static void typhoon_set_timer_irq(void *opaque, int irq, int level) /* Set the ITI bit for this cpu. */ s-cchip.misc |= 1 (i + 4); /* And signal the interrupt. */ -cpu_interrupt(cpu-env, CPU_INTERRUPT_TIMER); +cpu_interrupt(CPU(cpu), CPU_INTERRUPT_TIMER); } } } @@ -700,7 +698,7 @@ static void typhoon_alarm_timer(void *opaque) /* Set the ITI bit for this cpu. */ s-cchip.misc |= 1 (cpu + 4); -cpu_interrupt(s-cchip.cpu[cpu]-env, CPU_INTERRUPT_TIMER); +cpu_interrupt(CPU(s-cchip.cpu[cpu]), CPU_INTERRUPT_TIMER); } PCIBus *typhoon_init(ram_addr_t ram_size, ISABus **isa_bus, diff --git a/hw/apic.c b/hw/apic.c index e113c11..f7cb8b1 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -151,15 +151,15 @@ static void apic_local_deliver(APICCommonState *s, int vector) switch ((lvt 8) 7) { case APIC_DM_SMI: -cpu_interrupt(s-cpu-env, CPU_INTERRUPT_SMI); +cpu_interrupt(CPU(s-cpu),
Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting
On Mon, Feb 25, 2013 at 7:43 PM, Gleb Natapov g...@redhat.com wrote: 3. Do not report KVM_IRQ_LINE_STATUS capability and move RTC to use EOI notifiers for interrupt reinjection. This requires us to add interface for reporting EOI to userspace. This is not in the scope of this patchset. Cons: need to introduce new interface (and the one that will not work on AMD BTW) Other ideas? 1. inject RTC interrupt 2. not see EOI 3. inject RTC interrupt 4. due to 2, report coalesced That's the 3 in my list, no? No, this idea doesn't involve user interface changes. We still report through KVM_IRQ_LINE_STATUS or whatever it's called. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting
On Mon, Feb 25, 2013 at 08:56:07PM +0200, Avi Kivity wrote: On Mon, Feb 25, 2013 at 7:43 PM, Gleb Natapov g...@redhat.com wrote: 3. Do not report KVM_IRQ_LINE_STATUS capability and move RTC to use EOI notifiers for interrupt reinjection. This requires us to add interface for reporting EOI to userspace. This is not in the scope of this patchset. Cons: need to introduce new interface (and the one that will not work on AMD BTW) Other ideas? 1. inject RTC interrupt 2. not see EOI 3. inject RTC interrupt 4. due to 2, report coalesced That's the 3 in my list, no? No, this idea doesn't involve user interface changes. We still report through KVM_IRQ_LINE_STATUS or whatever it's called. Interesting idea! -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting
On Mon, Feb 25, 2013 at 07:40:07PM +0200, Gleb Natapov wrote: On Mon, Feb 25, 2013 at 11:17:39AM -0300, Marcelo Tosatti wrote: On Mon, Feb 25, 2013 at 11:00:21AM -0300, Marcelo Tosatti wrote: I see a couple of possible solutions: 1. Do what Avi said. Make KVM_IRQ_LINE_STATUS be synchronous. Cons: current QEMU uses KVM_IRQ_LINE_STATUS always and it means that it will be slow on newer kernels Can add a capability to QEMU and enable APICv selectively only in newer QEMU, which can issue KVM_IRQ_LINE_STATUS on target vcpu only when necessary (and KVM_IRQ_LINE otherwise). Bad idea. What happens with mixed scenarios. Exactly. Even a lock serializing injection is not safe because ON bit is cleared before XCHG(PIR, 0). Must do something heavier (such as running on target vcpu context). Note always running on target vcpu is likely to be slower than no-APICv. Yes, but we will do it only for RTC interrupt. Still performance hit should be very visible when RTC is in 1000Hz mode. So older qemu-kvm on APICv enabled hardware becomes slow? If that is acceptable, fine. So need to do something heavier on the kernel under serialization, if firmware cannot be changed (injection from simultaneous CPUs should be rare so if data to serialize __accept_apic_irq is cache-line aligned it should reduce performance impact). 2. Make KVM_IRQ_LINE_STATUS report coalescing only when vcpu is not running during injection. This assumes that if vcpu is running and does not process interrupt it is guest fault and the same can happen on real HW too. Coalescing when vcpu is not running though is the result of CPU overcommit and should be reported. Cons interface definition is kind of murky. 3. Do not report KVM_IRQ_LINE_STATUS capability and move RTC to use EOI notifiers for interrupt reinjection. This requires us to add interface for reporting EOI to userspace. This is not in the scope of this patchset. Cons: need to introduce new interface (and the one that will not work on AMD BTW) Is there no int-ack notification at RTC HW level? There is, but Windows calls it twice for each injected interrupt. I tried to use it in the past to detect interrupt coalescing, but this double ack prevented me from doing so. May be revisit this approach with willingness to be more hacky about it. Also it is possible to disable RTC ack for HyperV guests. We do not do it and if we will use the ack we will not be able to. Is it OK to kill the ability to have coalesced information? Breaks older userspace? Older userspace will have timedrif with Windows, yes. Note that we some version of Windows it has it now too. Other ideas? Can HW write a 'finished' bit after 6 in the reserved area? Suppose its not a KVM-specific problem? I still think adding locking just to obtain correct injection status is bad trade-off even if HW would allow us to get away with it. Perhaps with notification at end of copy it could be lockless. With the current scheme, it is not possible to get it right because the notification bit disappears temporarily from sight. And it is not possible to distinguish between 'interrupt injected' and 'interrupt merged', as discussed here. So would have to serialize completly along the lines of: Lock and only inject if can identify that interrupt handler is not running. If its possible to drop KVM_IRQ_LINE_STATUS, then demand APICv HW to use recent software, fine (did not grasp Avi's second idea). -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/6] kvm: add device control API
On 02/23/2013 09:04:33 AM, Marcelo Tosatti wrote: On Thu, Feb 21, 2013 at 08:00:25PM -0600, Scott Wood wrote: On 02/21/2013 05:03:32 PM, Marcelo Tosatti wrote: You are not writing to the registers from the CPU point of view. That's exactly how KVM_DEV_MPIC_GRP_REGISTER is defined and implemented on MPIC (with the exception of registers whose behavior changes based on which specific vcpu you use to access them). If/when we have a need to set/get state in a different manner, that's a separate attribute group. Can you describe usage of this register again? It's used by QEMU to reflect memory space accesses to the kernel. In practice, these are either debug accesses or MSIs. Also, it is necessary to provide proper locking of device attribute write versus vcpu device access. So far we have been focusing on having a lockless vcpu path. How is device access related to vcpus? Existing irqchip code is not lockless. VCPUS access in-kernel devices. Right, VCPUs (plural) not VCPU (singular). Thus locking is needed. Yes, it is lockless (see RCU usage in virt/kvm/). virt/kvm/kvm_main.c: /* kvm_io_bus_write - called under kvm-slots_lock */ virt/kvm/ioapic.c: spin_lock(ioapic-lock); etc. Basically abstract 'device attributes' are too abstract. It's up to the device-specific documentation to make them not abstract (I admit there are a few details missing in mpic.txt that I've pointed out in this thread -- it is RFC v1 after all). This wouldn't be any different if we used separate ioctls for everything. It's like saying abstract 'ioctl' is too abstract. Perhaps a better way to put it is that its too permissive. It's no more permissive than saying go define an ioctl. It is somewhat more constrained than that, in that it is an operation on a particular device, as opposed to any possible KVM action. However, your proposed interface deals with sucky capability, versioning and namespace conflicts we have now. Note these items can probably be improved separately. Any particular proposals? Namespace conflicts: Reserve ranges for each arch. That only helps when the conflict is between different arches (as opposed to different devices), and it discourages sharing between arches. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/6] kvm: add device control API
On 02/24/2013 09:46:17 AM, Gleb Natapov wrote: On Thu, Feb 21, 2013 at 08:17:54PM -0600, Scott Wood wrote: On 02/21/2013 02:22:09 AM, Gleb Natapov wrote: On Wed, Feb 20, 2013 at 08:05:12PM -0600, Scott Wood wrote: You mean like what we did with SREGS, that got deprecated and replaced with ONE_REG? SREGS is implemented by ppc. I see ONE_REG as addition to REGS interface. You can access all register at once or you can access them one by one. If there is a need we can add MULTIPLE_REGS that will get list of requested REGS. http://www.spinics.net/lists/kvm-ppc/msg04876.html http://www.spinics.net/lists/kvm-ppc/msg05842.html If Alex prefers ONE_REG interface this is his call :) My point is the reasons we prefer it, that SREGS really is deprecated for new state on PPC (REGS is not extensible so it's been in that state for even longer -- hence SREGS), and the desire to not create that same mess again. Running out of space is just one of the ways in which SREGS is a mess, and I think trying to represent MPIC state that way would be even worse. The interface is not over generic i.e it does not try to replace KVM_RUN by writing special register. Sigh. And we don't necessarily want to set *everything*. What are those cases? You do need to on reset/migration. Why do we want to set all the registers on reset, rather than tell the in-kernel device to reset? The default state came from the kernel in the first place on irqchip creation... I have nothing against telling in-kernel device to reset provided there is a way to do so, which current interface lacks. If by current you mean what is proposed here, you can use KVM_MPIC_DEV_MPIC_REGISTER to set GCR[RESET]. I can see the appeal of an explicit reset device command, though, especially if the hardware reset bit is defined such that you're supposed to poll until it's cleared to know that the reest is complete (the KVM implementation is not likely to return before the reset is complete, but it would be better to not rely on such knowledge). Reset in userspase has its advantage too: bugs are easier to fix, They're harder to fix if fixing it requires adding something to the API. Bugs in reset are just a small portion of overall bugs, so I'm not sure how important this is. It would require that we implement additional state accessors now, rather than if and when we do migration. It could cause the total bug count to go up versus code that already works and is self-contained. :-P there may be different kind of resets (hard/soft). This can be communicated through a device-specific command if it is applicable (it isn't for MPIC). Again, we do not currently support migration on MPIC. It is a very low priority for embedded. We do not wish to rule it out entirely, but it most likely would require adding more state accesors. The interface suppose to be generic, we are not talking about MPIC specifically here. There are two separate things here. There is the device control api, which is generic, but is just a way to create devices and issue commands to them. Then there is the specific device implementation, which is not generic. The details of how migration is handled for a particular device belong to the specific device implementation. Regarding migration never say never :) I did say we don't want to rule it out -- I'm just emphasising this so that we don't get bogged down in questions of how the currently defined MPIC attributes/commands would get used for migration. If we tried to blindly add support for that now, without being able to test, or being able to refer to what QEMU normally saves/restores for MPIC, we'd probably get it wrong. Just as the untested savevm code in QEMU's hw/openpic.c is wrong/incomplete. As for other types of devices, x86 has i8254.c emulated in-kernel -- I know that's not going to switch to the new interface, but it could have if it existed back then. Since it is not going to switch it is not a good example. The point of the example is to show that irqchips aren't the only thing that could ever make sense to emulate in the kernel. I can't see into the future and tell you what the actual first non-irqchip user would be, or if there will definitely be one. If we wait until that user comes along, we'd end up saying *that* is the only user and since MPIC is not going to switch it is not a good example. If no non-irqchip user ends up coming along, what is the actual harm from not putting the words irqchip in the interface description (the generic part, not the device-specific part)? I can also see creating an in-kernel emulation device for doing MMIO filtering on some piece of embedded hardware that guests need to access with reasonable performance, but the hardware desginers screwed up the protection slightly (e.g. put other things in the same 4K page). We've done such
Re: [PATCH] x86: kvmclock: Do not setup kvmclock vsyscall in the absence of that clock
On Sat, Feb 23, 2013 at 05:05:29PM +0100, Jan Kiszka wrote: From: Jan Kiszka jan.kis...@siemens.com This fixes boot lockups with no-kvmclock, when the host is not exposing this particular feature (QEMU: -cpu ...,-kvmclock) or when the kvmclock initialization failed for whatever reason. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- Should go to 3.8 as well, I presume. arch/x86/kernel/kvmclock.c |9 - 1 files changed, 8 insertions(+), 1 deletions(-) diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index 5bedbdd..b730efa 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -160,8 +160,12 @@ int kvm_register_clock(char *txt) { int cpu = smp_processor_id(); int low, high, ret; - struct pvclock_vcpu_time_info *src = hv_clock[cpu].pvti; + struct pvclock_vcpu_time_info *src; + + if (!hv_clock) + return 0; + src = hv_clock[cpu].pvti; low = (int)__pa(src) | 1; high = ((u64)__pa(src) 32); ret = native_write_msr_safe(msr_kvm_system_time, low, high); @@ -276,6 +280,9 @@ int __init kvm_setup_vsyscall_timeinfo(void) struct pvclock_vcpu_time_info *vcpu_time; unsigned int size; + if (!hv_clock) + return 0; + size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS); preempt_disable(); -- 1.7.3.4 Reviewed-by: Marcelo Tosatti mtosa...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
buildbot failure in kvm on s390
The Buildbot has detected a new failure on builder s390 while building kvm. Full details are available at: http://buildbot.b1-systems.de/kvm/builders/s390/builds/822 Buildbot URL: http://buildbot.b1-systems.de/kvm/ Buildslave for this Build: b1_kvm_1 Build Reason: The Nightly scheduler named 'nightly_master' triggered this build Build Source Stamp: [branch master] HEAD Blamelist: BUILD FAILED: failed compile sincerely, -The Buildbot
Re: [RFC PATCH 1/6] kvm: add device control API
On Mon, Feb 25, 2013 at 12:11:19PM +1100, Paul Mackerras wrote: On Mon, Feb 18, 2013 at 02:21:59PM +0200, Gleb Natapov wrote: Copying Christoffer since ARM has in kernel irq chip too. On Wed, Feb 13, 2013 at 11:49:15PM -0600, Scott Wood wrote: Currently, devices that are emulated inside KVM are configured in a hardcoded manner based on an assumption that any given architecture only has one way to do it. If there's any need to access device state, it is done through inflexible one-purpose-only IOCTLs (e.g. KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is cumbersome and depletes a limited numberspace. This API provides a mechanism to instantiate a device of a certain type, returning an ID that can be used to set/get attributes of the device. Attributes may include configuration parameters (e.g. register base address), device state, operational commands, etc. It is similar to the ONE_REG API, except that it acts on devices rather than vcpus. You are not only provide different way to create in kernel irq chip you also use an alternate way to trigger interrupt lines. Before going into interface specifics lets think about whether it is really worth it? x86 obviously support old way and will have to for some, very long, time. ARM vGIC code, that is ready to go upstream, uses old way too. So it will be 2 archs against one. Christoffer do you think the proposed way it better for your needs. Are you willing to make vGIC use it? In fact there have been two distinct interrupt controller emulations for PPC posted recently, Scott's and mine, with quite different interfaces. In contrast to Scott's device control API, where the operations you would do for different interrupt controllers are quite different, mine attempted to define a much more abstract interface for interrupt controller emulations, with the idea that an interrupt controller subsystem connects a set of interrupt sources, each with some state, to a set of interrupt delivery mechanisms, one per vcpu, also with some state. I agree with Scott that it is futile to try and come up with generic irqchip configuration interface and I doubt it is needed from QEMU or other userspace pov. I looked at proposed KVM_IRQCHIP_SET_SOURCES interface and while it is possible to pass some information about pic/ioapic using it there will be a lot of information that will not fit there. For one there is global irqchips related state and proposed interface only talk about interrupt sources. Another is that using generic interface will require us to have a code that translate irqchip representation into this generic one and back for no apparent gain. Currently pic/ioapic state is very similar to what HW specification defines and it is not going to change. Looking at your implementation of KVM_IRQCHIP_SET_SOURCES I wounder how well it work for migration though. The interface suppose to transfer irqchips state as is, but I see things like that in your code: mutex_lock(ics-lock); irqp-server = val KVM_IRQ_SERVER_MASK; irqp-priority = val KVM_IRQ_PRIORITY_SHIFT; irqp-resend = 0; irqp-masked_pending = 0; irqp-asserted = 0; Why it is safe to initialize those values to default values during migration? Wouldn't it be simpler and more correct to just transfer the whole content of irqp from src to dst? Thus my interface had: - a KVM_CREATE_IRQCHIP_ARGS ioctl, with an argument structure that indicates the overall architecture of the interrupt subsystem, - KVM_IRQCHIP_SET_SOURCES and KVM_IRQCHIP_GET_SOURCES ioctls, that return or modify the state of some set of interrupt sources - a KVM_REG_PPC_ICP_STATE identifier in the ONE_REG register identifier space, that is used to save and restore the per-vcpu interrupt presentation state. Alternatively, I could modify my code to use the existing KVM_CREATE_IRQCHIP, KVM_GET_IRQCHIP, and KVM_SET_IRQCHIP ioctls. If I were to do that I would want to rename the 'pad' field in struct kvm_irqchip to 'nr' and use it with 'chip_id' to identify a range of interrupt sources to be affected. I'd also want to keep the ONE_REG identifier for the per-vcpu state. It is preferable to the interface you propose since I do not think your interface fits other interrupt controllers well. You can put nr field into dummy[] payload, or replace pad with union {pad, nr}. Or I could change over to using Scott's device control API, though I have some objections to doing that. What is your advice about which interface to use? The ideal situation would be if you and Scott agree on the details about the interface. If you don't like something about Scott's interface we can discuss it and shape it to something you agree with or even like. I already asked Scott to introduce command interface. Scott does not
Re: [RFC PATCH 1/6] kvm: add device control API
On 24.02.2013, at 16:46, Gleb Natapov wrote: On Thu, Feb 21, 2013 at 08:17:54PM -0600, Scott Wood wrote: On 02/21/2013 02:22:09 AM, Gleb Natapov wrote: On Wed, Feb 20, 2013 at 08:05:12PM -0600, Scott Wood wrote: On 02/20/2013 07:09:49 AM, Gleb Natapov wrote: On Tue, Feb 19, 2013 at 03:16:37PM -0600, Scott Wood wrote: On 02/19/2013 06:24:18 AM, Gleb Natapov wrote: On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote: The ability to set/get attributes is needed. Sorry, but get or set one blob of data, up to 512 bytes, for the entire irqchip is just not good enough -- assuming you don't want us to start sticking pointers and commands in *that* data. :-) Proposed interface sticks pointers into ioctl data, so why doing the same for KVM_SET_IRQCHIP/KVM_GET_IRQCHIP makes you smile. There's a difference between putting a pointer in an ioctl control structure that is specifically documented as being that way (as in ONE_REG), versus taking an ioctl that claims to be setting/getting a blob of state and embedding pointers in it. It would be like sticking a pointer in the attribute payload of this API, which I think is something to be discouraged. If documentation is what differentiate for you between silly and smart then write documentation instead of new interfaces. You mean like what we did with SREGS, that got deprecated and replaced with ONE_REG? SREGS is implemented by ppc. I see ONE_REG as addition to REGS interface. You can access all register at once or you can access them one by one. If there is a need we can add MULTIPLE_REGS that will get list of requested REGS. http://www.spinics.net/lists/kvm-ppc/msg04876.html http://www.spinics.net/lists/kvm-ppc/msg05842.html If Alex prefers ONE_REG interface this is his call :) There's a reason I prefer it :). We ran out of space in the SREGS call, we had a bit map with multiple chunks of data you could set in there and worst of all, when we ran out of space we didn't realize it until the code was already in Linus' tree! The normal here's a chunk of data ioctl style works great when you know all or most of the details of what you want to transfer. In the case of kvm, we need to expose an interface we a) Don't control itself. Hardware guys make hardware. We just follow. b) Is pretty complex. There can be a lot of state in a CPU / device. So because if that we needed something more flexible and dynamic. Something where we could add registers we forgot without jumping through lots of hoops. That's why I prefer ONE_REG for vcpus. Since I see the same danger on device emulation, I tend to prefer it here as well. The interface is not over generic i.e it does not try to replace KVM_RUN by writing special register. Sigh. And we don't necessarily want to set *everything*. What are those cases? You do need to on reset/migration. Why do we want to set all the registers on reset, rather than tell the in-kernel device to reset? The default state came from the kernel in the first place on irqchip creation... I have nothing against telling in-kernel device to reset provided there is a way to do so, which current interface lacks. Reset in userspase has its advantage too: bugs are easier to fix, there may be different kind of resets (hard/soft). The same argument holds for state access. If you suddenly realize oh, I need to also set this one hidden state bit in my hardware you only want to add that specific write on reset. If you suddenly find a race condition in migration that happens very rarely, you want to add the one register that you forgot to synchronize in addition in your new code. You don't want to change the whole ioctl passed struct. That's why we have the padding in our ioctl structs. And feature bitmaps. And once we exceed them, we're screwed. I want to make sure we don't hit that point. Actually, there's a really easy way to understand what this interface does. It basically implements a state get/set ioctl which allows for very fine grained control over your ioctl struct. It does that by only passing one register at a time. If we run into concurrency issues (multiple registers need to be set in a locked manner at the same time), we can always add a new ioctl that sets multiple attrs/regs at once. For CPUs, we haven't hit that case yet. It'd also be using KVM_SET_IRQCHIP to read data, which is the sort of thing you object to later on regarding KVM_IRQ_LINE_STATUS. Do not see why. It's either that, or have the data direction of the chip field in KVM_GET_IRQCHIP not be entirely in the get direction. Still do not follow. Example? struct kvm_irqchip has chip_id, pad, and chip. pad is insufficient to communicate attribute type plus a pointer. So if we want to provide a pointer for the kernel to write the attribute into, it has to read from memory that
Re: [RFC PATCH 1/6] kvm: add device control API
On 25.02.2013, at 14:09, Gleb Natapov wrote: On Mon, Feb 25, 2013 at 12:11:19PM +1100, Paul Mackerras wrote: On Mon, Feb 18, 2013 at 02:21:59PM +0200, Gleb Natapov wrote: Copying Christoffer since ARM has in kernel irq chip too. On Wed, Feb 13, 2013 at 11:49:15PM -0600, Scott Wood wrote: Currently, devices that are emulated inside KVM are configured in a hardcoded manner based on an assumption that any given architecture only has one way to do it. If there's any need to access device state, it is done through inflexible one-purpose-only IOCTLs (e.g. KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is cumbersome and depletes a limited numberspace. This API provides a mechanism to instantiate a device of a certain type, returning an ID that can be used to set/get attributes of the device. Attributes may include configuration parameters (e.g. register base address), device state, operational commands, etc. It is similar to the ONE_REG API, except that it acts on devices rather than vcpus. You are not only provide different way to create in kernel irq chip you also use an alternate way to trigger interrupt lines. Before going into interface specifics lets think about whether it is really worth it? x86 obviously support old way and will have to for some, very long, time. ARM vGIC code, that is ready to go upstream, uses old way too. So it will be 2 archs against one. Christoffer do you think the proposed way it better for your needs. Are you willing to make vGIC use it? In fact there have been two distinct interrupt controller emulations for PPC posted recently, Scott's and mine, with quite different interfaces. In contrast to Scott's device control API, where the operations you would do for different interrupt controllers are quite different, mine attempted to define a much more abstract interface for interrupt controller emulations, with the idea that an interrupt controller subsystem connects a set of interrupt sources, each with some state, to a set of interrupt delivery mechanisms, one per vcpu, also with some state. I agree with Scott that it is futile to try and come up with generic irqchip configuration interface and I doubt it is needed from QEMU or other userspace pov. I looked at proposed KVM_IRQCHIP_SET_SOURCES interface and while it is possible to pass some information about pic/ioapic using it there will be a lot of information that will not fit there. For one there is global irqchips related state and proposed interface only talk about interrupt sources. Another is that using generic interface will require us to have a code that translate irqchip representation into this generic one and back for no apparent gain. Currently pic/ioapic state is very similar to what HW specification defines and it is not going to change. Looking at your implementation of KVM_IRQCHIP_SET_SOURCES I wounder how well it work for migration though. The interface suppose to transfer irqchips state as is, but I see things like that in your code: mutex_lock(ics-lock); irqp-server = val KVM_IRQ_SERVER_MASK; irqp-priority = val KVM_IRQ_PRIORITY_SHIFT; irqp-resend = 0; irqp-masked_pending = 0; irqp-asserted = 0; Why it is safe to initialize those values to default values during migration? Wouldn't it be simpler and more correct to just transfer the whole content of irqp from src to dst? Thus my interface had: - a KVM_CREATE_IRQCHIP_ARGS ioctl, with an argument structure that indicates the overall architecture of the interrupt subsystem, - KVM_IRQCHIP_SET_SOURCES and KVM_IRQCHIP_GET_SOURCES ioctls, that return or modify the state of some set of interrupt sources - a KVM_REG_PPC_ICP_STATE identifier in the ONE_REG register identifier space, that is used to save and restore the per-vcpu interrupt presentation state. Alternatively, I could modify my code to use the existing KVM_CREATE_IRQCHIP, KVM_GET_IRQCHIP, and KVM_SET_IRQCHIP ioctls. If I were to do that I would want to rename the 'pad' field in struct kvm_irqchip to 'nr' and use it with 'chip_id' to identify a range of interrupt sources to be affected. I'd also want to keep the ONE_REG identifier for the per-vcpu state. It is preferable to the interface you propose since I do not think your interface fits other interrupt controllers well. You can put nr field into dummy[] payload, or replace pad with union {pad, nr}. Or I could change over to using Scott's device control API, though I have some objections to doing that. What is your advice about which interface to use? The ideal situation would be if you and Scott agree on the details about the interface. If you don't like something about Scott's interface we can discuss it and shape it to something you agree with or even like. I already asked Scott to
Re: [RFC PATCH 1/6] kvm: add device control API
On 02/23/2013 09:04:33 AM, Marcelo Tosatti wrote: On Thu, Feb 21, 2013 at 08:00:25PM -0600, Scott Wood wrote: On 02/21/2013 05:03:32 PM, Marcelo Tosatti wrote: You are not writing to the registers from the CPU point of view. That's exactly how KVM_DEV_MPIC_GRP_REGISTER is defined and implemented on MPIC (with the exception of registers whose behavior changes based on which specific vcpu you use to access them). If/when we have a need to set/get state in a different manner, that's a separate attribute group. Can you describe usage of this register again? It's used by QEMU to reflect memory space accesses to the kernel. In practice, these are either debug accesses or MSIs. Also, it is necessary to provide proper locking of device attribute write versus vcpu device access. So far we have been focusing on having a lockless vcpu path. How is device access related to vcpus? Existing irqchip code is not lockless. VCPUS access in-kernel devices. Right, VCPUs (plural) not VCPU (singular). Thus locking is needed. Yes, it is lockless (see RCU usage in virt/kvm/). virt/kvm/kvm_main.c: /* kvm_io_bus_write - called under kvm-slots_lock */ virt/kvm/ioapic.c: spin_lock(ioapic-lock); etc. Basically abstract 'device attributes' are too abstract. It's up to the device-specific documentation to make them not abstract (I admit there are a few details missing in mpic.txt that I've pointed out in this thread -- it is RFC v1 after all). This wouldn't be any different if we used separate ioctls for everything. It's like saying abstract 'ioctl' is too abstract. Perhaps a better way to put it is that its too permissive. It's no more permissive than saying go define an ioctl. It is somewhat more constrained than that, in that it is an operation on a particular device, as opposed to any possible KVM action. However, your proposed interface deals with sucky capability, versioning and namespace conflicts we have now. Note these items can probably be improved separately. Any particular proposals? Namespace conflicts: Reserve ranges for each arch. That only helps when the conflict is between different arches (as opposed to different devices), and it discourages sharing between arches. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html