Re: [PATCH v2 14/18] nvdimm: support NFIT_CMD_IMPLEMENTED function
On Fri, Aug 14, 2015 at 10:52:07PM +0800, Xiao Guangrong wrote:
> @@ -306,6 +354,18 @@ struct dsm_buffer {
>  static ram_addr_t dsm_addr;
>  static size_t dsm_size;
>
> +struct cmd_out_implemented {

QEMU coding style uses typedef struct {} CamelCase. Please follow this
convention in all user-defined structs (see ./CODING_STYLE).

>  static void dsm_write(void *opaque, hwaddr addr,
>                        uint64_t val, unsigned size)
>  {
> +    struct MemoryRegion *dsm_ram_mr = opaque;
> +    struct dsm_buffer *dsm;
> +    struct dsm_out *out;
> +    void *buf;
> +
>      assert(val == NOTIFY_VALUE);

The guest should not be able to cause an abort(3). If val !=
NOTIFY_VALUE we can do nvdebug() and then return.

> +
> +    buf = memory_region_get_ram_ptr(dsm_ram_mr);
> +    dsm = buf;
> +    out = buf;
> +
> +    le32_to_cpus(&dsm->handle);
> +    le32_to_cpus(&dsm->arg1);
> +    le32_to_cpus(&dsm->arg2);

Can SMP guests modify DSM RAM while this thread is running?

We must avoid race conditions. It's probably better to copy in data
before byte-swapping or checking input values.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
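The copy-in advice above can be sketched roughly as follows. This is a minimal illustration, not code from the patch: `struct dsm_in`, `load_le32()` and `dsm_fetch_request()` are hypothetical names, and the layout is simplified. The idea is to read the guest-visible buffer once into private memory, then byte-swap and validate only the private copy, so a second vCPU cannot change a value after it has been checked.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified request layout; the dsm_buffer in the patch differs. */
struct dsm_in {
    uint32_t handle;
    uint32_t arg1;
    uint32_t arg2;
};

/* Decode a little-endian 32-bit value independent of host endianness. */
static uint32_t load_le32(const uint8_t *b)
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/*
 * Snapshot the request out of guest-shared memory, then decode and
 * validate the snapshot only. Returns 0 on success, -1 if rejected.
 */
static int dsm_fetch_request(const void *shared, uint32_t max_arg1,
                             struct dsm_in *out)
{
    uint8_t snap[12];

    memcpy(snap, shared, sizeof(snap));   /* the only read of shared memory */

    out->handle = load_le32(snap + 0);
    out->arg1   = load_le32(snap + 4);
    out->arg2   = load_le32(snap + 8);

    if (out->arg1 > max_arg1) {
        return -1;   /* the checked value is private and cannot change later */
    }
    return 0;
}
```

A caller would pass the pointer obtained for the shared page and work exclusively with the filled-in `struct dsm_in` afterwards.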
Re: KVM slow LAMP guest
On 24-8-2015 1:26, Wanpeng Li wrote: On 8/24/15 3:18 AM, Hansa wrote: On 16-7-2015 13:27, Paolo Bonzini wrote: On 15/07/2015 22:02, C. Bröcker wrote: What OS is this? Is it RHEL/CentOS? If so, halt_poll_ns will be in 6.7 which will be out in a few days/weeks. Paolo OK. As said CentOS 6.6. But where do I put this parameter? You can add kvm.halt_poll_ns=50 to the kernel command line. If you have the parameter, you have the /sys/module/kvm/parameters/halt_poll_ns file. Hi, I upgraded to the CentOS 6.7 release which came out last month and as promised the halt_poll_ns parameter was available. Last week I tested the availability status every 5 minutes on my Wordpress VM's with the halt_poll_ns kernel param set on DOM0. I'm pleased to announce that it solves the problem! How much seconds to load your Wordpress site this time? Regards, Wanpeng Li The average is around 0.4 seconds to load my heaviest site on my slowest machine. On the VM server I issued the command below every eleven minutes: date curltest-file; _ top -b -n 1 | sed -n '7,12p' curltest-file; _ curl -o /dev/null -s -wtime_total: %{time_total}\\n https://my.domain.com | perl -pe 'BEGIN {use POSIX;} print strftime(%Y-%m-%d %H:%M:%S , localtime)' curltest-file This gives me the total time for displaying my site on a local machine. It also includes a 'top' command to display which processes are running at each sample. All is saved in a file called curltest-file. I found 7 occurrences in my curltest-file of a time_total larger than 20 seconds. Top however didn't show any significant CPU or IO activity at those sampled times. Further investigations shows me that they are related to a known (gravatar) issue in the Wordpress Jetpack plugin. I didn't include these samples in the average total. Cheers and good luck tweaking your sites! 
Best, Hansa
Re: [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM
On Fri, Aug 14, 2015 at 10:51:53PM +0800, Xiao Guangrong wrote:
> Changlog:
> - Use litten endian for DSM method, thanks for Stefan's suggestion
>
> - introduce a new parameter, @configdata, if it's false, Qemu will
>   build a static and readonly namespace in memory and use it serveing
>   for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no
>   reserved region is needed at the end of the @file, it is good for
>   the user who want to pass whole nvdimm device and make its data
>   completely be visible to guest
>
> - divide the source code into separated files and add maintain info

I have skipped ACPI patches because I'm not very familiar with that
area.

Have you thought about live migration?

Are the contents of the NVDIMM migrated since they are registered as a
RAM region?

Stefan
Re: [PATCH v3 3/4] irqchip: GIC: Convert to EOImode == 1
Hi Thomas,

On 25/08/15 16:46, Thomas Gleixner wrote:
> On Tue, 25 Aug 2015, Marc Zyngier wrote:
>> +static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE;
>> +
>>  #ifndef MAX_GIC_NR
>>  #define MAX_GIC_NR 1
>>  #endif
>> @@ -137,6 +140,14 @@ static inline unsigned int gic_irq(struct irq_data *d)
>>      return d->hwirq;
>>  }
>>
>> +static inline bool primary_gic_irq(struct irq_data *d)
>> +{
>> +    if (MAX_GIC_NR > 1)
>> +        return irq_data_get_irq_chip_data(d) == &gic_data[0];
>> +
>> +    return true;
>> +}
>> +
>>  /*
>>   * Routines to acknowledge, disable and enable interrupts
>>   */
>> @@ -164,7 +175,14 @@ static void gic_unmask_irq(struct irq_data *d)
>>
>>  static void gic_eoi_irq(struct irq_data *d)
>>  {
>> -    writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_EOI);
>> +    u32 deact_offset = GIC_CPU_EOI;
>> +
>> +    if (static_key_true(&supports_deactivate)) {
>> +        if (primary_gic_irq(d))
>> +            deact_offset = GIC_CPU_DEACTIVATE;
>
> I really wonder for the whole series whether you really want all that
> static key dance and extra conditionals in the callbacks instead of
> just using seperate irq chips for the different interrupts.

Hmmm. We definitely could have different irqchips between primary and
secondary controllers indeed. We'd still need a static key for the
gic_handle_irq path though, but that's not too bad.

Let me hack something, and I'll come back to you ;-).

    M.
--
Jazz is not dead. It just smells funny...
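The "separate irq chips" idea discussed above can be illustrated with a small user-space sketch. Everything here is hypothetical (`struct chip_ops`, `select_chip()`, the register offsets) and is not the kernel's `struct irq_chip`; it only shows the design choice of picking an operations table once at setup time so that the per-interrupt callback carries no conditionals.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative register offsets, standing in for GIC_CPU_EOI and friends. */
enum { REG_EOI = 0x10, REG_DEACTIVATE = 0x1000 };

/* A tiny stand-in for an irq chip: one callback per behaviour. */
struct chip_ops {
    const char *name;
    uint32_t (*eoi_offset)(void);
};

static uint32_t eoi_plain(void)      { return REG_EOI; }
static uint32_t eoi_deactivate(void) { return REG_DEACTIVATE; }

static const struct chip_ops chip_eoimode0 = { "eoimode0", eoi_plain };
static const struct chip_ops chip_eoimode1 = { "eoimode1", eoi_deactivate };

/*
 * The decision is taken once, when the interrupt is set up. The hot path
 * then just calls ops->eoi_offset() without testing any flags.
 */
static const struct chip_ops *select_chip(int is_primary,
                                          int supports_deactivate)
{
    return (is_primary && supports_deactivate) ? &chip_eoimode1
                                               : &chip_eoimode0;
}
```

Compared with testing a static key and `primary_gic_irq()` in every callback, the cost moves to setup time, at the price of maintaining two nearly identical tables.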
Re: [PATCH v2 2/3] KVM: dynamic halt_poll_ns adjustment
Thanks for writing v2, Wanpeng. On Mon, Aug 24, 2015 at 11:35 PM, Wanpeng Li wanpeng...@hotmail.com wrote: There is a downside of halt_poll_ns since poll is still happen for idle VCPU which can waste cpu usage. This patch adds the ability to adjust halt_poll_ns dynamically. What testing have you done with these patches? Do you know if this removes the overhead of polling in idle VCPUs? Do we lose any of the performance from always polling? There are two new kernel parameters for changing the halt_poll_ns: halt_poll_ns_grow and halt_poll_ns_shrink. A third new parameter, halt_poll_ns_max, controls the maximal halt_poll_ns; it is internally rounded down to a closest multiple of halt_poll_ns_grow. The shrink/grow matrix is suggested by David: if (poll successfully for interrupt): stay the same else if (length of kvm_vcpu_block is longer than halt_poll_ns_max): shrink else if (length of kvm_vcpu_block is less than halt_poll_ns_max): grow The way you implemented this wasn't what I expected. I thought you would time the whole function (kvm_vcpu_block). But I like your approach better. It's simpler and [by inspection] does what we want. halt_poll_ns_shrink/ | halt_poll_ns_grow| grow halt_poll_ns| shrink halt_poll_ns -+--+--- 1 | = halt_poll_ns | = 0 halt_poll_ns | *= halt_poll_ns_grow | /= halt_poll_ns_shrink otherwise| += halt_poll_ns_grow | -= halt_poll_ns_shrink I was curious why you went with this approach rather than just the middle row, or just the last row. Do you think we'll want the extra flexibility? 
Signed-off-by: Wanpeng Li wanpeng...@hotmail.com --- virt/kvm/kvm_main.c | 65 - 1 file changed, 64 insertions(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 93db833..2a4962b 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -66,9 +66,26 @@ MODULE_AUTHOR(Qumranet); MODULE_LICENSE(GPL); -static unsigned int halt_poll_ns; +#define KVM_HALT_POLL_NS 50 +#define KVM_HALT_POLL_NS_GROW 2 +#define KVM_HALT_POLL_NS_SHRINK 0 +#define KVM_HALT_POLL_NS_MAX 200 The macros are not necessary. Also, hard coding the numbers in the param definitions will make reading the comments above them easier. + +static unsigned int halt_poll_ns = KVM_HALT_POLL_NS; module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR); +/* Default doubles per-vcpu halt_poll_ns. */ +static unsigned int halt_poll_ns_grow = KVM_HALT_POLL_NS_GROW; +module_param(halt_poll_ns_grow, int, S_IRUGO); + +/* Default resets per-vcpu halt_poll_ns . */ +static unsigned int halt_poll_ns_shrink = KVM_HALT_POLL_NS_SHRINK; +module_param(halt_poll_ns_shrink, int, S_IRUGO); + +/* halt polling only reduces halt latency by 10-15 us, 2ms is enough */ Ah, I misspoke before. I was thinking about round-trip latency. The latency of a single halt is reduced by about 5-7 us. +static unsigned int halt_poll_ns_max = KVM_HALT_POLL_NS_MAX; +module_param(halt_poll_ns_max, int, S_IRUGO); We can remove halt_poll_ns_max. vcpu-halt_poll_ns can always start at zero and grow from there. Then we just need one module param to keep vcpu-halt_poll_ns from growing too large. [ It would make more sense to remove halt_poll_ns and keep halt_poll_ns_max, but since halt_poll_ns already exists in upstream kernels, we probably can't remove it. 
] + /* * Ordering of locks: * @@ -1907,6 +1924,48 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn) } EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty); +static unsigned int __grow_halt_poll_ns(unsigned int val) +{ + if (halt_poll_ns_grow 1) + return halt_poll_ns; + + val = min(val, halt_poll_ns_max); + + if (val == 0) + return halt_poll_ns; + + if (halt_poll_ns_grow halt_poll_ns) + val *= halt_poll_ns_grow; + else + val += halt_poll_ns_grow; + + return val; +} + +static unsigned int __shrink_halt_poll_ns(int val, int modifier, int minimum) minimum never gets used. +{ + if (modifier 1) + return 0; + + if (modifier halt_poll_ns) + val /= modifier; + else + val -= modifier; + + return val; +} + +static void grow_halt_poll_ns(struct kvm_vcpu *vcpu) These wrappers aren't necessary. +{ + vcpu-halt_poll_ns = __grow_halt_poll_ns(vcpu-halt_poll_ns); +} + +static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu) +{ + vcpu-halt_poll_ns = __shrink_halt_poll_ns(vcpu-halt_poll_ns, + halt_poll_ns_shrink, halt_poll_ns); +} + static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu) { if (kvm_arch_vcpu_runnable(vcpu)) { @@ -1954,6 +2013,10 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
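The grow/shrink table discussed in this review can be written out as a small sketch. The function and parameter names below are hypothetical, the tunables are passed as arguments instead of module parameters, and the overflow and underflow guards are additions that the quoted patch does not have. Values used with it are purely illustrative.

```c
#include <assert.h>

/*
 * Grow a per-vCPU poll window.
 *   grow < 1     : reset to base
 *   grow < base  : multiply by grow
 *   otherwise    : add grow
 * The result is clamped to max (an addition in this sketch).
 */
static unsigned int grow_poll_ns(unsigned int val, unsigned int grow,
                                 unsigned int base, unsigned int max)
{
    if (grow < 1 || val == 0)
        return base;

    if (grow < base) {
        if (val > max / grow)
            return max;          /* would exceed max (or overflow) */
        val *= grow;
    } else {
        if (val >= max || grow > max - val)
            return max;
        val += grow;
    }
    return val;
}

/*
 * Shrink a per-vCPU poll window.
 *   shrink < 1    : reset to 0
 *   shrink < base : divide by shrink
 *   otherwise     : subtract shrink, never going below 0
 */
static unsigned int shrink_poll_ns(unsigned int val, unsigned int shrink,
                                   unsigned int base)
{
    if (shrink < 1)
        return 0;
    if (shrink < base)
        return val / shrink;
    return val > shrink ? val - shrink : 0;
}
```

With a grow factor of 2 the window doubles on each unsuccessful short block until it reaches the cap, and a shrink value of 0 drops it straight back to zero, which matches the defaults described in the patch comments.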
Re: [PATCH v2 10/18] nvdimm: init the address region used by DSM method
On Fri, Aug 14, 2015 at 10:52:03PM +0800, Xiao Guangrong wrote:
> @@ -257,14 +258,91 @@ static void build_nfit_table(GSList *device_list, char *buf)
>      }
>  }
>
> +struct dsm_buffer {
> +    /* RAM page. */
> +    uint32_t handle;
> +    uint8_t arg0[16];
> +    uint32_t arg1;
> +    uint32_t arg2;
> +    union {
> +        char arg3[PAGE_SIZE - 3 * sizeof(uint32_t) - 16 * sizeof(uint8_t)];
> +    };
> +
> +    /* MMIO page. */
> +    union {
> +        uint32_t notify;
> +        char pedding[PAGE_SIZE];

s/pedding/padding/

> +    };
> +};
> +
> +static ram_addr_t dsm_addr;
> +static size_t dsm_size;
> +
> +static uint64_t dsm_read(void *opaque, hwaddr addr,
> +                         unsigned size)
> +{
> +    return 0;
> +}
> +
> +static void dsm_write(void *opaque, hwaddr addr,
> +                      uint64_t val, unsigned size)
> +{
> +}
> +
> +static const MemoryRegionOps dsm_ops = {
> +    .read = dsm_read,
> +    .write = dsm_write,
> +    .endianness = DEVICE_LITTLE_ENDIAN,
> +};
> +
> +static int build_dsm_buffer(void)
> +{
> +    MemoryRegion *dsm_ram_mr, *dsm_mmio_mr;
> +    ram_addr_t addr;;

s/;;/;/
Re: [Qemu-devel] [PATCH v2 13/18] nvdimm: build namespace config data
On Fri, Aug 14, 2015 at 10:52:06PM +0800, Xiao Guangrong wrote:
> +#ifdef NVDIMM_DEBUG
> +#define nvdebug(fmt, ...) fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__)
> +#else
> +#define nvdebug(...)
> +#endif

The following allows the compiler to check format strings and syntax
check the argument expressions:

#define NVDIMM_DEBUG 0  /* set to 1 for debug output */
#define nvdebug(fmt, ...) \
    if (NVDIMM_DEBUG) { \
        fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__); \
    }

This approach avoids bitrot (e.g. debug format string arguments have
become outdated).
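A compilable variant of the always-checked debug macro might look like the sketch below. The `do { } while (0)` wrapper is an addition that is not part of the suggestion above; it keeps the macro usable as a single statement in front of an `else`. `check_notify()` is a hypothetical caller, and `## __VA_ARGS__` is the GNU extension already used by the patch.

```c
#include <assert.h>
#include <stdio.h>

#define NVDIMM_DEBUG 0   /* set to 1 for debug output */

/*
 * The fprintf() call is always parsed and type-checked, so stale format
 * strings are caught at build time; with NVDIMM_DEBUG set to 0 the
 * optimizer removes the dead branch.
 */
#define nvdebug(fmt, ...)                                        \
    do {                                                         \
        if (NVDIMM_DEBUG) {                                      \
            fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__);     \
        }                                                        \
    } while (0)

/* Hypothetical caller: reject a bad value with a debug message, no abort. */
static int check_notify(unsigned int val, unsigned int expected)
{
    if (val != expected)
        nvdebug("unexpected notify value %#x\n", val);
    else
        return 0;
    return -1;
}
```

Without the wrapper, the `else` in `check_notify()` would fail to compile, because the macro's closing brace followed by `;` ends the `if` statement before the `else` is reached.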
Re: [PATCH v2 15/18] nvdimm: support NFIT_CMD_GET_CONFIG_SIZE function
On Fri, Aug 14, 2015 at 10:52:08PM +0800, Xiao Guangrong wrote:
> Function 4 is used to get Namespace lable size

s/lable/label/
Re: [PATCH v2 12/15] KVM: arm64: sync LPI configuration and pending tables
Hi Eric, On 14/08/15 12:58, Eric Auger wrote: On 07/10/2015 04:21 PM, Andre Przywara wrote: The LPI configuration and pending tables of the GICv3 LPIs are held in tables in (guest) memory. To achieve reasonable performance, we cache this data in our own data structures, so we need to sync those two views from time to time. This behaviour is well described in the GICv3 spec and is also exercised by hardware, so the sync points are well known. Provide functions that read the guest memory and store the information from the configuration and pending tables in the kernel. Signed-off-by: Andre Przywara andre.przyw...@arm.com --- would help to have change log between v1 - v2 (valid for the whole series) include/kvm/arm_vgic.h | 2 + virt/kvm/arm/its-emul.c | 124 virt/kvm/arm/its-emul.h | 3 ++ 3 files changed, 129 insertions(+) diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 2a67a10..323c33a 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -167,6 +167,8 @@ struct vgic_its { int cwriter; struct list_headdevice_list; struct list_headcollection_list; +/* memory used for buffering guest's memory */ +void*buffer_page; }; struct vgic_dist { diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c index b9c40d7..05245cb 100644 --- a/virt/kvm/arm/its-emul.c +++ b/virt/kvm/arm/its-emul.c @@ -50,6 +50,7 @@ struct its_itte { struct its_collection *collection; u32 lpi; u32 event_id; +u8 priority; bool enabled; unsigned long *pending; }; @@ -70,8 +71,124 @@ static struct its_itte *find_itte_by_lpi(struct kvm *kvm, int lpi) return NULL; } +#define LPI_PROP_ENABLE_BIT(p) ((p) LPI_PROP_ENABLED) +#define LPI_PROP_PRIORITY(p)((p) 0xfc) + +/* stores the priority and enable bit for a given LPI */ +static void update_lpi_config(struct kvm *kvm, struct its_itte *itte, u8 prop) +{ +itte-priority = LPI_PROP_PRIORITY(prop); +itte-enabled = LPI_PROP_ENABLE_BIT(prop); +} + +#define GIC_LPI_OFFSET 8192 + +/* We scan the table in chunks the size of the 
smallest page size */ 4kB chunks? Marc was complaining about this wording, I think. The rationale was that 4K is already in the code and thus does not need to be repeated in the comment, whereas the comment should explain the meaning of the value. +#define CHUNK_SIZE 4096U + #define BASER_BASE_ADDRESS(x) ((x) 0xf000ULL) +static int nr_idbits_propbase(u64 propbaser) +{ +int nr_idbits = (1U (propbaser 0x1f)) + 1; + +return max(nr_idbits, INTERRUPT_ID_BITS_ITS); +} + +/* + * Scan the whole LPI configuration table and put the LPI configuration + * data in our own data structures. This relies on the LPI being + * mapped before. + */ +static bool its_update_lpis_configuration(struct kvm *kvm) +{ +struct vgic_dist *dist = kvm-arch.vgic; +u8 *prop = dist-its.buffer_page; +u32 tsize; +gpa_t propbase; +int lpi = GIC_LPI_OFFSET; +struct its_itte *itte; +struct its_device *device; +int ret; + +propbase = BASER_BASE_ADDRESS(dist-propbaser); +tsize = nr_idbits_propbase(dist-propbaser); + +while (tsize 0) { +int chunksize = min(tsize, CHUNK_SIZE); + +ret = kvm_read_guest(kvm, propbase, prop, chunksize); I think you still have the spin_lock issue since if my understanding is correct this is called from vgic_handle_mmio_access/vcall_range_handler/gic_enable_lpis where vgic_handle_mmio_access. Or does it take another path? Well, it's (also) called on handling the INVALL command, but you are right that on that enable path the dist lock is held. I reckon that this init part isn't racy so that shouldn't be a problem (famous last words ;-). Let me see whether I can find a way to just drop the lock around the while loop. Cheers, Andre. Shouldn't we create a new kvm_io_device to avoid holding the dist lock? Eric -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
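The chunked table scan under discussion can be sketched in user space as follows. The names are hypothetical: `read_flat()` stands in for a guest-memory read such as `kvm_read_guest()`, the chunk size is reduced for illustration, and the only property decoded is bit 0 of each byte, assumed here to be the enable bit. In the kernel version, the reads are the part that should not run under a spinlock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 64U   /* illustrative; the patch uses 4096U */

/* Stand-in for a guest-memory read; returns 0 on success. */
typedef int (*read_fn)(const uint8_t *base, size_t offset,
                       void *dst, size_t len);

static int read_flat(const uint8_t *base, size_t offset,
                     void *dst, size_t len)
{
    memcpy(dst, base + offset, len);
    return 0;
}

/*
 * Walk a byte-per-entry table through a fixed bounce buffer and count
 * the entries that have bit 0 set. Returns 0 on success, -1 if a read
 * fails.
 */
static int count_enabled(const uint8_t *table, size_t tsize, read_fn rd,
                         size_t *enabled)
{
    uint8_t buf[CHUNK_SIZE];
    size_t off = 0;

    *enabled = 0;
    while (tsize > 0) {
        size_t chunk = tsize < CHUNK_SIZE ? tsize : CHUNK_SIZE;
        size_t i;

        if (rd(table, off, buf, chunk) != 0)
            return -1;
        for (i = 0; i < chunk; i++)
            *enabled += buf[i] & 1U;

        off += chunk;
        tsize -= chunk;
    }
    return 0;
}
```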
Re: [Qemu-devel] [PATCH v2 06/18] pc: implement NVDIMM device abstract
On Fri, Aug 14, 2015 at 10:51:59PM +0800, Xiao Guangrong wrote:
> +static void set_file(Object *obj, const char *str, Error **errp)
> +{
> +    PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
> +
> +    if (nvdimm->file) {
> +        g_free(nvdimm->file);
> +    }

g_free(NULL) is a nop so it's safe to replace the if with just
g_free(nvdimm->file).
Re: [PATCH v2 11/15] KVM: arm64: handle pending bit for LPIs in ITS emulation
Hi Eric, On 14/08/15 12:58, Eric Auger wrote: On 07/10/2015 04:21 PM, Andre Przywara wrote: As the actual LPI number in a guest can be quite high, but is mostly assigned using a very sparse allocation scheme, bitmaps and arrays for storing the virtual interrupt status are a waste of memory. We use our equivalent of the Interrupt Translation Table Entry (ITTE) to hold this extra status information for a virtual LPI. As the normal VGIC code cannot use it's fancy bitmaps to manage pending interrupts, we provide a hook in the VGIC code to let the ITS emulation handle the list register queueing itself. LPIs are located in a separate number range (=8192), so distinguishing them is easy. With LPIs being only edge-triggered, we get away with a less complex IRQ handling. Signed-off-by: Andre Przywara andre.przyw...@arm.com --- include/kvm/arm_vgic.h | 2 ++ virt/kvm/arm/its-emul.c | 71 virt/kvm/arm/its-emul.h | 3 ++ virt/kvm/arm/vgic-v3-emul.c | 2 ++ virt/kvm/arm/vgic.c | 72 ++--- 5 files changed, 133 insertions(+), 17 deletions(-) diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 1648668..2a67a10 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -147,6 +147,8 @@ struct vgic_vm_ops { int (*init_model)(struct kvm *); void(*destroy_model)(struct kvm *); int (*map_resources)(struct kvm *, const struct vgic_params *); + bool(*queue_lpis)(struct kvm_vcpu *); + void(*unqueue_lpi)(struct kvm_vcpu *, int irq); }; struct vgic_io_device { diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c index 7f217fa..b9c40d7 100644 --- a/virt/kvm/arm/its-emul.c +++ b/virt/kvm/arm/its-emul.c @@ -50,8 +50,26 @@ struct its_itte { struct its_collection *collection; u32 lpi; u32 event_id; + bool enabled; + unsigned long *pending; }; +#define for_each_lpi(dev, itte, kvm) \ + list_for_each_entry(dev, (kvm)-arch.vgic.its.device_list, dev_list) \ + list_for_each_entry(itte, (dev)-itt, itte_list) + You have a checkpatch error here: ERROR: Macros with complex 
values should be enclosed in parentheses #52: FILE: virt/kvm/arm/its-emul.c:57: +#define for_each_lpi(dev, itte, kvm) \ + list_for_each_entry(dev, (kvm)-arch.vgic.its.device_list, dev_list) \ + list_for_each_entry(itte, (dev)-itt, itte_list) I know about that one. The problem is that if I add the parentheses it breaks the usage below due to the curly brackets. But the definition above is just so convenient and I couldn't find another neat solution so far. If you are concerned about that I can give it another try, otherwise I tend to just ignore checkpatch here. +static struct its_itte *find_itte_by_lpi(struct kvm *kvm, int lpi) +{ can't we have the same LPI present in different interrupt translation tables? I don't know it is a sensible setting but I did not succeed in finding it was not possible. Thanks to Marc I am happy (and relieved!) to point you to 6.1.1 LPI INTIDs: The behavior of the GIC is UNPREDICTABLE if software: - Maps multiple EventID/DeviceID combinations to the same physical LPI INTID. So I exercise the freedom of UNPREDICTABLE here ;-) + struct its_device *device; + struct its_itte *itte; + + for_each_lpi(device, itte, kvm) { + if (itte-lpi == lpi) + return itte; + } + return NULL; +} + #define BASER_BASE_ADDRESS(x) ((x) 0xf000ULL) /* The distributor lock is held by the VGIC MMIO handler. */ @@ -145,6 +163,59 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu, return false; } +/* + * Find all enabled and pending LPIs and queue them into the list + * registers. + * The dist lock is held by the caller. 
+ */ +bool vits_queue_lpis(struct kvm_vcpu *vcpu) +{ + struct vgic_its *its = vcpu-kvm-arch.vgic.its; + struct its_device *device; + struct its_itte *itte; + bool ret = true; + + if (!vgic_has_its(vcpu-kvm)) + return true; + if (!its-enabled || !vcpu-kvm-arch.vgic.lpis_enabled) + return true; + + spin_lock(its-lock); + for_each_lpi(device, itte, vcpu-kvm) { + if (!itte-enabled || !test_bit(vcpu-vcpu_id, itte-pending)) + continue; + + if (!itte-collection) + continue; + + if (itte-collection-target_addr != vcpu-vcpu_id) + continue; + + __clear_bit(vcpu-vcpu_id, itte-pending); + + ret = vgic_queue_irq(vcpu, 0, itte-lpi); what if the vgic_queue_irq fails since no LR can be found, the itte-pending was cleared so we forget that LPI? shouldn't we restore the pending state in ITT? in vgic_queue_hwirq the state change only is performed if the
Re: [PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister
On Tue, 2015-08-25 at 15:47 +0800, Jason Wang wrote:
> All fields of kvm_io_range were initialized or copied explicitly
> afterwards. So switch to use kmalloc().

Is there any compiler added alignment padding in either structure? If so,
those padding areas would now be uninitialized and may leak kernel data
if copied to user-space.

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
[]
> @@ -3248,7 +3248,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
>      if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1)
>          return -ENOSPC;
>
> -    new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count + 1) *
> +    new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count + 1) *
>                sizeof(struct kvm_io_range)), GFP_KERNEL);
>      if (!new_bus)
>          return -ENOMEM;
> @@ -3280,7 +3280,7 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
>      if (r)
>          return r;
>
> -    new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count - 1) *
> +    new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count - 1) *
>                sizeof(struct kvm_io_range)), GFP_KERNEL);
>      if (!new_bus)
>          return -ENOMEM;
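The padding question can be checked with a short user-space sketch. `struct range_example` is a hypothetical layout chosen to resemble an address/length/pointer triple; it is not claimed to be the real `struct kvm_io_range`. On typical 64-bit ABIs the compiler inserts four bytes after the 32-bit member, and those bytes stay uninitialized if the object comes from a non-zeroing allocator and only the named members are assigned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for illustration only. */
struct range_example {
    uint64_t addr;
    int32_t  len;
    /* typical 64-bit ABIs insert 4 bytes of padding here */
    void    *dev;
};

/* Number of bytes in the struct that belong to no named member. */
static size_t padding_bytes(void)
{
    return sizeof(struct range_example) -
           (sizeof(uint64_t) + sizeof(int32_t) + sizeof(void *));
}

/*
 * Zero the whole object before assigning members, so that in practice
 * the padding holds zeros instead of stale heap contents.
 */
static void range_init(struct range_example *r, uint64_t addr,
                       int32_t len, void *dev)
{
    memset(r, 0, sizeof(*r));
    r->addr = addr;
    r->len = len;
    r->dev = dev;
}
```

Whether this matters for the patch depends on whether the structures are ever copied out wholesale, which is the question raised in the review.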
Re: [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM
On Fri, Aug 14, 2015 at 10:52:00PM +0800, Xiao Guangrong wrote:
> diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
> index a53d235..7a270a8 100644
> --- a/hw/mem/nvdimm/pc-nvdimm.c
> +++ b/hw/mem/nvdimm/pc-nvdimm.c
> @@ -24,6 +24,19 @@
>
>  #include "hw/mem/pc-nvdimm.h"
>
> +#define PAGE_SIZE (1UL << 12)

This macro name is likely to collide with system headers or other code.

Could you use the existing TARGET_PAGE_SIZE constant instead?
Re: [PATCH v2 08/18] nvdimm: init backend memory mapping and config data area
On Fri, Aug 14, 2015 at 10:52:01PM +0800, Xiao Guangrong wrote: The parameter @file is used as backed memory for NVDIMM which is divided into two parts if @dataconfig is true: s/dataconfig/configdata/ @@ -76,13 +109,87 @@ static void pc_nvdimm_init(Object *obj) set_configdata, NULL); } +static uint64_t get_file_size(int fd) +{ +struct stat stat_buf; +uint64_t size; + +if (fstat(fd, stat_buf) 0) { +return 0; +} + +if (S_ISREG(stat_buf.st_mode)) { +return stat_buf.st_size; +} + +if (S_ISBLK(stat_buf.st_mode) !ioctl(fd, BLKGETSIZE64, size)) { +return size; +} #ifdef __linux__ for ioctl(fd, BLKGETSIZE64, size)? There is nothing Linux-specific about emulating NVDIMMs so this code should compile on all platforms. + +return 0; +} + static void pc_nvdimm_realize(DeviceState *dev, Error **errp) { PCNVDIMMDevice *nvdimm = PC_NVDIMM(dev); +char name[512]; +void *buf; +ram_addr_t addr; +uint64_t size, nvdimm_size, config_size = MIN_CONFIG_DATA_SIZE; +int fd; if (!nvdimm-file) { error_setg(errp, file property is not set); } Missing return here. + +fd = open(nvdimm-file, O_RDWR); Does it make sense to support read-only NVDIMMs? It could be handy for sharing a read-only file between unprivileged guests. The permissions on the file would only allow read, not write. +if (fd 0) { +error_setg(errp, can not open %s, nvdimm-file); s/can not/cannot/ +return; +} + +size = get_file_size(fd); +buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); I guess the user will want to choose between MAP_SHARED and MAP_PRIVATE. This can be added in the future. +if (buf == MAP_FAILED) { +error_setg(errp, can not do mmap on %s, nvdimm-file); +goto do_close; +} + +nvdimm-config_data_size = config_size; +if (nvdimm-configdata) { +/* reserve MIN_CONFIGDATA_AREA_SIZE for configue data. 
*/ +nvdimm_size = size - config_size; +nvdimm-config_data_addr = buf + nvdimm_size; +} else { +nvdimm_size = size; +nvdimm-config_data_addr = NULL; +} + +if ((int64_t)nvdimm_size = 0) { The error cases can be detected before mmap(2). That avoids the int64_t cast and also avoids nvdimm_size underflow and the bogus nvdimm-config_data_addr calculation above. size = get_file_size(fd); if (size == 0) { error_setg(errp, empty file or unable to get file size); goto do_close; } else if (nvdimm-configdata size config_size) {{ error_setg(errp, file size is too small to store NVDIMM configure data); goto do_close; } +error_setg(errp, file size is too small to store NVDIMM + configure data); +goto do_unmap; +} + +addr = reserved_range_push(nvdimm_size); +if (!addr) { +error_setg(errp, do not have enough space for size %#lx.\n, size); error_setg() messages must not have a newline at the end. Please use %# PRIx64 instead of %#lx so compilation works on 32-bit hosts where sizeof(long) == 4. +goto do_unmap; +} + +nvdimm-device_index = new_device_index(); +sprintf(name, NVDIMM-%d, nvdimm-device_index); +memory_region_init_ram_ptr(nvdimm-mr, OBJECT(dev), name, nvdimm_size, + buf); How is the autogenerated name used? Why not just use pc-nvdimm.memory? +vmstate_register_ram(nvdimm-mr, DEVICE(dev)); +memory_region_add_subregion(get_system_memory(), addr, nvdimm-mr); + +return; fd is leaked. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
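The suggestion to detect the error cases before mmap(2) can be sketched as a pure size computation. All names below are hypothetical and error reporting is reduced to a message buffer instead of `error_setg()`. The sketch rejects a file that is exactly as large as the config area, which is a choice made here to keep the data area non-empty. `PRIx64` is used so the format stays correct on hosts where `long` is 32 bits.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Split a backing file into a data area and an optional config area.
 * Returns 0 and sets *data_size on success, or -1 with a message in err
 * (no trailing newline).
 */
static int split_backing_size(uint64_t file_size, bool has_config,
                              uint64_t config_size, uint64_t *data_size,
                              char *err, size_t err_len)
{
    if (file_size == 0) {
        snprintf(err, err_len, "empty file or unable to get file size");
        return -1;
    }
    if (has_config && file_size <= config_size) {
        snprintf(err, err_len,
                 "file size %#" PRIx64 " is too small to store config data",
                 file_size);
        return -1;
    }

    /* No subtraction happens unless it is known not to underflow. */
    *data_size = has_config ? file_size - config_size : file_size;
    return 0;
}
```

Doing the check on unsigned values up front removes the need for the `(int64_t)` cast and means mmap() is never attempted on a file that will be rejected anyway.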
Re: [PATCH v2 07/18] nvdimm: reserve address range for NVDIMM
On Fri, Aug 14, 2015 at 10:52:00PM +0800, Xiao Guangrong wrote: NVDIMM reserves all the free range above 4G to do: - Persistent Memory (PMEM) mapping - implement NVDIMM ACPI device _DSM method Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com --- hw/i386/pc.c | 12 ++-- hw/mem/nvdimm/pc-nvdimm.c | 13 + include/hw/mem/pc-nvdimm.h | 1 + 3 files changed, 24 insertions(+), 2 deletions(-) CCing Igor for memory hotplug-related changes. diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 7661ea9..41af6ea 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -64,6 +64,7 @@ #include hw/pci/pci_host.h #include acpi-build.h #include hw/mem/pc-dimm.h +#include hw/mem/pc-nvdimm.h #include qapi/visitor.h #include qapi-visit.h @@ -1302,6 +1303,7 @@ FWCfgState *pc_memory_init(MachineState *machine, MemoryRegion *ram_below_4g, *ram_above_4g; FWCfgState *fw_cfg; PCMachineState *pcms = PC_MACHINE(machine); +ram_addr_t offset; assert(machine-ram_size == below_4g_mem_size + above_4g_mem_size); @@ -1339,6 +1341,8 @@ FWCfgState *pc_memory_init(MachineState *machine, exit(EXIT_FAILURE); } +offset = 0x1ULL + above_4g_mem_size; + /* initialize hotplug memory address space */ if (guest_info-has_reserved_memory (machine-ram_size machine-maxram_size)) { @@ -1358,8 +1362,7 @@ FWCfgState *pc_memory_init(MachineState *machine, exit(EXIT_FAILURE); } -pcms-hotplug_memory.base = -ROUND_UP(0x1ULL + above_4g_mem_size, 1ULL 30); +pcms-hotplug_memory.base = ROUND_UP(offset, 1ULL 30); if (pcms-enforce_aligned_dimm) { /* size hotplug region assuming 1G page max alignment per slot */ @@ -1377,8 +1380,13 @@ FWCfgState *pc_memory_init(MachineState *machine, hotplug-memory, hotplug_mem_size); memory_region_add_subregion(system_memory, pcms-hotplug_memory.base, pcms-hotplug_memory.mr); + +offset = pcms-hotplug_memory.base + hotplug_mem_size; } + /* all the space left above 4G is reserved for NVDIMM. 
*/ +pc_nvdimm_reserve_range(offset); + /* Initialize PC system firmware */ pc_system_firmware_init(rom_memory, guest_info-isapc_ram_fw); diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c index a53d235..7a270a8 100644 --- a/hw/mem/nvdimm/pc-nvdimm.c +++ b/hw/mem/nvdimm/pc-nvdimm.c @@ -24,6 +24,19 @@ #include hw/mem/pc-nvdimm.h +#define PAGE_SIZE (1UL 12) + +static struct nvdimms_info { +ram_addr_t current_addr; +} nvdimms_info; + +/* the address range [offset, ~0ULL) is reserved for NVDIMM. */ +void pc_nvdimm_reserve_range(ram_addr_t offset) +{ +offset = ROUND_UP(offset, PAGE_SIZE); +nvdimms_info.current_addr = offset; +} + static char *get_file(Object *obj, Error **errp) { PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj); diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h index 51152b8..8601e9b 100644 --- a/include/hw/mem/pc-nvdimm.h +++ b/include/hw/mem/pc-nvdimm.h @@ -28,4 +28,5 @@ typedef struct PCNVDIMMDevice { #define PC_NVDIMM(obj) \ OBJECT_CHECK(PCNVDIMMDevice, (obj), TYPE_PC_NVDIMM) +void pc_nvdimm_reserve_range(ram_addr_t offset); #endif -- 2.4.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 3/4] irqchip: GIC: Convert to EOImode == 1
On Tue, 25 Aug 2015, Marc Zyngier wrote:
> +static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE;
> +
>  #ifndef MAX_GIC_NR
>  #define MAX_GIC_NR 1
>  #endif
> @@ -137,6 +140,14 @@ static inline unsigned int gic_irq(struct irq_data *d)
>      return d->hwirq;
>  }
>
> +static inline bool primary_gic_irq(struct irq_data *d)
> +{
> +    if (MAX_GIC_NR > 1)
> +        return irq_data_get_irq_chip_data(d) == &gic_data[0];
> +
> +    return true;
> +}
> +
>  /*
>   * Routines to acknowledge, disable and enable interrupts
>   */
> @@ -164,7 +175,14 @@ static void gic_unmask_irq(struct irq_data *d)
>
>  static void gic_eoi_irq(struct irq_data *d)
>  {
> -    writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_EOI);
> +    u32 deact_offset = GIC_CPU_EOI;
> +
> +    if (static_key_true(&supports_deactivate)) {
> +        if (primary_gic_irq(d))
> +            deact_offset = GIC_CPU_DEACTIVATE;

I really wonder for the whole series whether you really want all that
static key dance and extra conditionals in the callbacks instead of
just using separate irq chips for the different interrupts.

Thanks,

    tglx
Re: [PATCH v2 12/15] KVM: arm64: sync LPI configuration and pending tables
Hi Eric,

On 14/08/15 13:35, Eric Auger wrote:
> On 08/14/2015 01:58 PM, Eric Auger wrote:
>> On 07/10/2015 04:21 PM, Andre Przywara wrote:
>>> The LPI configuration and pending tables of the GICv3 LPIs are held
>>> in tables in (guest) memory. To achieve reasonable performance, we
>>> cache this data in our own data structures, so we need to sync those
>>> two views from time to time. This behaviour is well described in the
>>> GICv3 spec and is also exercised by hardware, so the sync points are
>>> well known.
>>>
>>> Provide functions that read the guest memory and store the
>>> information from the configuration and pending tables in the kernel.
>>>
>>> Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
>>> ---
>> would help to have change log between v1 -> v2 (valid for the whole
>> series)
>>>  include/kvm/arm_vgic.h  | 2 +
>>>  virt/kvm/arm/its-emul.c | 124
>>>  virt/kvm/arm/its-emul.h | 3 ++
>>>  3 files changed, 129 insertions(+)
>>>
>>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>>> index 2a67a10..323c33a 100644
>>> --- a/include/kvm/arm_vgic.h
>>> +++ b/include/kvm/arm_vgic.h
>>> @@ -167,6 +167,8 @@ struct vgic_its {
>>>  	int			cwriter;
>>>  	struct list_head	device_list;
>>>  	struct list_head	collection_list;
>>> +	/* memory used for buffering guest's memory */
>>> +	void			*buffer_page;
>>>  };
>>>
>>>  struct vgic_dist {
>>> diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
>>> index b9c40d7..05245cb 100644
>>> --- a/virt/kvm/arm/its-emul.c
>>> +++ b/virt/kvm/arm/its-emul.c
>>> @@ -50,6 +50,7 @@ struct its_itte {
>>>  	struct its_collection *collection;
>>>  	u32 lpi;
>>>  	u32 event_id;
>>> +	u8 priority;
>>>  	bool enabled;
>>>  	unsigned long *pending;
>>>  };
>>> @@ -70,8 +71,124 @@ static struct its_itte *find_itte_by_lpi(struct kvm *kvm, int lpi)
>>>  	return NULL;
>>>  }
>>>
>>> +#define LPI_PROP_ENABLE_BIT(p)	((p) & LPI_PROP_ENABLED)
>>> +#define LPI_PROP_PRIORITY(p)	((p) & 0xfc)
>>> +
>>> +/* stores the priority and enable bit for a given LPI */
>>> +static void update_lpi_config(struct kvm *kvm, struct its_itte *itte, u8 prop)
>>> +{
>>> +	itte->priority = LPI_PROP_PRIORITY(prop);
>>> +	itte->enabled = LPI_PROP_ENABLE_BIT(prop);
>>> +}
>>> +
>>> +#define GIC_LPI_OFFSET 8192
>>> +
>>> +/* We scan the table in chunks the size of the smallest page size */
>> 4kB chunks?
>>> +#define CHUNK_SIZE 4096U
>>> +
>>>  #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
>>>
>>> +static int nr_idbits_propbase(u64 propbaser)
>>> +{
>>> +	int nr_idbits = (1U << (propbaser & 0x1f)) + 1;
>>> +
>>> +	return max(nr_idbits, INTERRUPT_ID_BITS_ITS);
>>> +}
>>> +
>>> +/*
>>> + * Scan the whole LPI configuration table and put the LPI configuration
>>> + * data in our own data structures. This relies on the LPI being
>>> + * mapped before.
>>> + */
>>> +static bool its_update_lpis_configuration(struct kvm *kvm)
>>> +{
>>> +	struct vgic_dist *dist = &kvm->arch.vgic;
>>> +	u8 *prop = dist->its.buffer_page;
>>> +	u32 tsize;
>>> +	gpa_t propbase;
>>> +	int lpi = GIC_LPI_OFFSET;
>>> +	struct its_itte *itte;
>>> +	struct its_device *device;
>>> +	int ret;
>>> +
>>> +	propbase = BASER_BASE_ADDRESS(dist->propbaser);
>>> +	tsize = nr_idbits_propbase(dist->propbaser);
>>> +
>>> +	while (tsize > 0) {
>>> +		int chunksize = min(tsize, CHUNK_SIZE);
>>> +
>>> +		ret = kvm_read_guest(kvm, propbase, prop, chunksize);
>> I think you still have the spin_lock issue since if my understanding is
>> correct this is called from
>> vgic_handle_mmio_access/vcall_range_handler/gic_enable_lpis
>> where vgic_handle_mmio_access. Or does it take another path?
>>
>> Shouldn't we create a new kvm_io_device to avoid holding the dist lock?
>
> Sorry I forgot it was the case already. But currently we always register
> the same io ops (registration entry point being
> vgic_register_kvm_io_dev) and maybe we should have separate dispatcher
> function for dist, redist and its?

What would be the idea behind it? To have separate locks for each? I
don't think that will work, as some ITS functions are called from GICv3
register handler functions which manipulate members of the distributor
structure. So I am more in favour of dropping the dist lock in these
cases before handing off execution to ITS specific functions.

Cheers,
Andre.
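
The LPI_PROP_* macros in the quoted patch split one byte of the LPI
configuration table into an enable flag and a priority field. A minimal
userspace sketch of that decoding follows. It assumes the enable flag is
bit 0 (the patch uses the name LPI_PROP_ENABLED without showing its
value); struct lpi_config and decode_lpi_prop() are made-up names for
illustration and are not part of the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LPI_PROP_ENABLED	0x01u	/* assumption: enable flag is bit 0 */
#define LPI_PROP_PRIORITY_MASK	0xfcu	/* same mask as in the quoted patch */

/* Hypothetical stand-in for the two cached its_itte fields. */
struct lpi_config {
	uint8_t priority;
	bool enabled;
};

/* Decode one configuration-table byte into the cached representation. */
static void decode_lpi_prop(struct lpi_config *cfg, uint8_t prop)
{
	assert(cfg != NULL);
	cfg->priority = prop & LPI_PROP_PRIORITY_MASK;
	cfg->enabled = (prop & LPI_PROP_ENABLED) != 0;
}
```

Under that assumption a property byte of 0xa1 decodes to priority 0xa0
with the LPI enabled, and 0xa0 to the same priority with it disabled.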
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 3/3] kvm: add tracepoint for fast mmio
On 08/25/2015 07:34 PM, Michael S. Tsirkin wrote:
> On Tue, Aug 25, 2015 at 03:47:15PM +0800, Jason Wang wrote:
>> Cc: Gleb Natapov <g...@kernel.org>
>> Cc: Paolo Bonzini <pbonz...@redhat.com>
>> Cc: Michael S. Tsirkin <m...@redhat.com>
>> Signed-off-by: Jason Wang <jasow...@redhat.com>
>> ---
>>  arch/x86/kvm/trace.h | 17 +
>>  arch/x86/kvm/vmx.c   | 1 +
>>  arch/x86/kvm/x86.c   | 1 +
>>  3 files changed, 19 insertions(+)
>>
>> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
>> index 4eae7c3..2d4e81a 100644
>> --- a/arch/x86/kvm/trace.h
>> +++ b/arch/x86/kvm/trace.h
>> @@ -128,6 +128,23 @@ TRACE_EVENT(kvm_pio,
>>  		  __entry->count > 1 ? "(...)" : "")
>>  );
>>
>> +TRACE_EVENT(kvm_fast_mmio,
>> +	TP_PROTO(u64 gpa),
>> +	TP_ARGS(gpa),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(u64,	gpa)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->gpa	= gpa;
>> +	),
>> +
>> +	TP_printk("fast mmio at gpa 0x%llx", __entry->gpa)
>> +);
>> +
>> +
>> +
>
> don't add multiple empty lines please.

Ok
Re: [PATCH V2 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses
On 08/25/2015 07:33 PM, Michael S. Tsirkin wrote:
> On Tue, Aug 25, 2015 at 03:47:14PM +0800, Jason Wang wrote:
>> We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
>> and another for KVM_FAST_MMIO_BUS. This leads to two issues:
>>
>> - kvm_io_bus_destroy() knows nothing about the devices on two buses
>>   pointing to a single dev, which will lead to a double free [1]
>>   during exit.
>> - wildcard eventfd ignores data len, so it was registered as a
>>   kvm_io_range with zero length. This will fail the binary search in
>>   kvm_io_bus_get_first_dev() when we try to emulate through
>>   KVM_MMIO_BUS. This will cause a userspace io emulation request
>>   instead of an eventfd notification (virtqueue kick will be trapped
>>   by qemu instead of vhost in this case).
>>
>> Fix this by not registering wildcard mmio eventfd on two buses.
>> Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the double
>> free issue of kvm_io_bus_destroy(). For the arch/setups that do not
>> utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
>> KVM_FAST_MMIO_BUS first to see if it has a match.
>>
>> [1] Panic caused by double free:
>>
>> CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu
>> Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
>> task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
>> RIP: 0010:[c07e25d8] [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
>> RSP: 0018:88020e7f3bc8 EFLAGS: 00010292
>> RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
>> RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
>> RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
>> R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
>> R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
>> FS: 7fc1ee3e6700() GS:88023e24() knlGS:
>> CS: 0010 DS: ES: CR0: 80050033
>> CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
>> Stack:
>>  88021e7cc000 88020e7f3be8 c07e2622 88020e7f3c38 c07df69a
>>  880232524160 88020e792d80 880219b78c00 0008 8802321686a8
>> Call Trace:
>>  [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
>>  [c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
>>  [c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
>>  [811f69f7] __fput+0xe7/0x250
>>  [811f6bae] fput+0xe/0x10
>>  [81093f04] task_work_run+0xd4/0xf0
>>  [81079358] do_exit+0x368/0xa50
>>  [81082c8f] ? recalc_sigpending+0x1f/0x60
>>  [81079ad5] do_group_exit+0x45/0xb0
>>  [81085c71] get_signal+0x291/0x750
>>  [810144d8] do_signal+0x28/0xab0
>>  [810f3a3b] ? do_futex+0xdb/0x5d0
>>  [810b7028] ? __wake_up_locked_key+0x18/0x20
>>  [810f3fa6] ? SyS_futex+0x76/0x170
>>  [81014fc9] do_notify_resume+0x69/0xb0
>>  [817cb9af] int_signal+0x12/0x17
>> Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 01 10 00 00
>> RIP [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
>>  RSP 88020e7f3bc8
>>
>> Cc: Gleb Natapov <g...@kernel.org>
>> Cc: Paolo Bonzini <pbonz...@redhat.com>
>> Cc: Michael S. Tsirkin <m...@redhat.com>
>> Signed-off-by: Jason Wang <jasow...@redhat.com>
>
> I'm worried that this slows down the regular MMIO.

I doubt whether or not it was measurable.

> Could you share performance #s please?
> You need a mix of len=0 and len=2 matches.

Ok.

> One solution for the first issue is to create two ioeventfd objects
> instead.

Sounds good.

> For the second issue, we could change bsearch compare function instead.

What do you mean by second issue ?

> Again, affects all devices so performance #s would be needed.
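
The second issue from the commit message (a zero-length range failing
the binary search) can be reproduced with a small userspace model. This
is a sketch, not the kernel code: struct io_range, range_cmp() and
find_range() are hypothetical stand-ins, and the comparison is assumed
to report a match only when the accessed range fits inside the
registered one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a registered I/O range. */
struct io_range {
	uint64_t addr;
	int len;
};

/*
 * bsearch() comparison: 0 only when the looked-up range
 * [addr, addr + len) lies inside the registered entry.
 */
static int range_cmp(const void *k, const void *e)
{
	const struct io_range *key = k;
	const struct io_range *ent = e;

	if (key->addr < ent->addr)
		return -1;
	if (key->addr + key->len > ent->addr + ent->len)
		return 1;
	return 0;
}

/* Returns the index of a matching entry in the sorted table, or -1. */
static int find_range(const struct io_range *tbl, size_t n,
		      uint64_t addr, int len)
{
	struct io_range key = { .addr = addr, .len = len };
	const struct io_range *hit;

	assert(tbl != NULL || n == 0);
	hit = bsearch(&key, tbl, n, sizeof(*tbl), range_cmp);
	return hit ? (int)(hit - tbl) : -1;
}
```

With a table sorted by address, a 2-byte access at the address of a
zero-length entry compares as greater than that entry, so the search
reports no match. Only a lookup that also uses length 0 finds it, which
is the behaviour the commit message describes for wildcard eventfds.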
Re: [PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister
On Wed, 2015-08-26 at 13:39 +0800, Jason Wang wrote:
> On 08/25/2015 11:29 PM, Joe Perches wrote:
>> On Tue, 2015-08-25 at 15:47 +0800, Jason Wang wrote:
>>> All fields of kvm_io_range were initialized or copied explicitly
>>> afterwards. So switch to use kmalloc().
>>
>> Is there any compiler added alignment padding
>> in either structure? If so, those padding
>> areas would now be uninitialized and may leak
>> kernel data if copied to user-space.
>
> I get your concern, but I don't see a way to copy them to userspace,
> did you?

I didn't look. I just wanted you to be aware
there's a difference and a reason why kzalloc
might be used even though all structure members
are initialized.
Re: [PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister
On 08/26/2015 01:45 PM, Joe Perches wrote:
> On Wed, 2015-08-26 at 13:39 +0800, Jason Wang wrote:
>> On 08/25/2015 11:29 PM, Joe Perches wrote:
>>> On Tue, 2015-08-25 at 15:47 +0800, Jason Wang wrote:
>>>> All fields of kvm_io_range were initialized or copied explicitly
>>>> afterwards. So switch to use kmalloc().
>>>
>>> Is there any compiler added alignment padding
>>> in either structure? If so, those padding
>>> areas would now be uninitialized and may leak
>>> kernel data if copied to user-space.
>>
>> I get your concern, but I don't see a way to copy them to userspace,
>> did you?
>
> I didn't look. I just wanted you to be aware
> there's a difference and a reason why kzalloc
> might be used even though all structure members
> are initialized.

I see, thanks for the reminder. Looks like we are safe, and I will add
something like "kvm_io_range was never accessed by userspace" to the
commit log if there's a new version.
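
Joe's point about padding can be illustrated outside the kernel. The
sketch below uses a made-up struct padded (not kvm_io_range, whose
layout is not shown in this thread) and plain memset() in place of
kzalloc(); it only shows that a structure can contain bytes that belong
to no member and that zeroing the whole object covers them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct for illustration; not the kernel's kvm_io_range. */
struct padded {
	uint8_t  tag;	/* 1 byte; the compiler usually pads after it */
	uint64_t addr;	/* so that this member is suitably aligned */
};

/* Bytes inside struct padded that belong to no member. */
static size_t padding_bytes(void)
{
	return sizeof(struct padded) - sizeof(uint8_t) - sizeof(uint64_t);
}

/* kzalloc()-like initialisation: zero the whole object, then set members. */
static void init_zeroed(struct padded *p, uint8_t tag, uint64_t addr)
{
	assert(p != NULL);
	memset(p, 0, sizeof(*p));
	p->tag = tag;
	p->addr = addr;
}

/* Returns 1 if the gap between tag and addr holds only zero bytes. */
static int gap_is_zero(const struct padded *p)
{
	const unsigned char *raw = (const unsigned char *)p;
	size_t i;

	assert(p != NULL);
	for (i = offsetof(struct padded, tag) + sizeof(p->tag);
	     i < offsetof(struct padded, addr); i++)
		if (raw[i] != 0)
			return 0;
	return 1;
}
```

Without the memset() step, as with a kmalloc()-style allocation,
whatever was previously in the gap stays there. That only matters if
the raw bytes of the object are later copied somewhere they can be
observed, which is the case Joe asks about.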
Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses
On 08/25/2015 07:51 PM, Michael S. Tsirkin wrote:
> On Tue, Aug 25, 2015 at 05:05:47PM +0800, Jason Wang wrote:
>> We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
>> and another for KVM_FAST_MMIO_BUS. This leads to two issues:
>>
>> - kvm_io_bus_destroy() knows nothing about the devices on two buses
>>   pointing to a single dev, which will lead to a double free [1]
>>   during exit.
>> - wildcard eventfd ignores data len, so it was registered as a
>>   kvm_io_range with zero length. This will fail the binary search in
>>   kvm_io_bus_get_first_dev() when we try to emulate through
>>   KVM_MMIO_BUS. This will cause a userspace io emulation request
>>   instead of an eventfd notification (virtqueue kick will be trapped
>>   by qemu instead of vhost in this case).
>>
>> Fix this by not registering wildcard mmio eventfd on two buses.
>> Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the double
>> free issue of kvm_io_bus_destroy(). For the arch/setups that do not
>> utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
>> KVM_FAST_MMIO_BUS first to see if it has a match.
>>
>> [1] Panic caused by double free:
>>
>> CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu
>> Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
>> task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
>> RIP: 0010:[c07e25d8] [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
>> RSP: 0018:88020e7f3bc8 EFLAGS: 00010292
>> RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
>> RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
>> RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
>> R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
>> R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
>> FS: 7fc1ee3e6700() GS:88023e24() knlGS:
>> CS: 0010 DS: ES: CR0: 80050033
>> CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
>> Stack:
>>  88021e7cc000 88020e7f3be8 c07e2622 88020e7f3c38 c07df69a
>>  880232524160 88020e792d80 880219b78c00 0008 8802321686a8
>> Call Trace:
>>  [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
>>  [c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
>>  [c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
>>  [811f69f7] __fput+0xe7/0x250
>>  [811f6bae] fput+0xe/0x10
>>  [81093f04] task_work_run+0xd4/0xf0
>>  [81079358] do_exit+0x368/0xa50
>>  [81082c8f] ? recalc_sigpending+0x1f/0x60
>>  [81079ad5] do_group_exit+0x45/0xb0
>>  [81085c71] get_signal+0x291/0x750
>>  [810144d8] do_signal+0x28/0xab0
>>  [810f3a3b] ? do_futex+0xdb/0x5d0
>>  [810b7028] ? __wake_up_locked_key+0x18/0x20
>>  [810f3fa6] ? SyS_futex+0x76/0x170
>>  [81014fc9] do_notify_resume+0x69/0xb0
>>  [817cb9af] int_signal+0x12/0x17
>> Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 01 10 00 00
>> RIP [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
>>  RSP 88020e7f3bc8
>>
>> Cc: Gleb Natapov <g...@kernel.org>
>> Cc: Paolo Bonzini <pbonz...@redhat.com>
>> Cc: Michael S. Tsirkin <m...@redhat.com>
>> Signed-off-by: Jason Wang <jasow...@redhat.com>
>> ---
>> Changes from V2:
>> - Tweak styles and comment suggested by Cornelia.
>> Changes from v1:
>> - change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
>>   needed to save lots of unnecessary changes.
>> ---
>>  virt/kvm/eventfd.c  | 31 +--
>>  virt/kvm/kvm_main.c | 16 ++--
>>  2 files changed, 23 insertions(+), 24 deletions(-)
>>
>> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
>> index 9ff4193..c3ffdc3 100644
>> --- a/virt/kvm/eventfd.c
>> +++ b/virt/kvm/eventfd.c
>> @@ -762,13 +762,16 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p)
>>  	return false;
>>  }
>>
>> -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
>> +static enum kvm_bus ioeventfd_bus_from_args(struct kvm_ioeventfd *args)
>>  {
>> -	if (flags & KVM_IOEVENTFD_FLAG_PIO)
>> +	if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
>>  		return KVM_PIO_BUS;
>> -	if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
>> +	if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
>>  		return KVM_VIRTIO_CCW_NOTIFY_BUS;
>> -	return KVM_MMIO_BUS;
>> +	/* When length is ignored, MMIO is put on a separate bus, for
>> +	 * faster lookups.
>> +	 */
>> +	return args->len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;
>>  }
>>
>>  static int
>> @@ -779,7 +782,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
>>  	struct eventfd_ctx *eventfd;
>>  	int ret;
>>
>> -	bus_idx = ioeventfd_bus_from_flags(args->flags);
>> +	bus_idx = ioeventfd_bus_from_args(args);
>>  	/* must be
Re: [PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister
On 08/25/2015 11:29 PM, Joe Perches wrote:
> On Tue, 2015-08-25 at 15:47 +0800, Jason Wang wrote:
>> All fields of kvm_io_range were initialized or copied explicitly
>> afterwards. So switch to use kmalloc().
>
> Is there any compiler added alignment padding
> in either structure? If so, those padding
> areas would now be uninitialized and may leak
> kernel data if copied to user-space.

I get your concern, but I don't see a way to copy them to userspace,
did you?
Re: [PATCH 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses
On 08/25/2015 11:04 AM, Jason Wang wrote:
[...]
>>> @@ -900,10 +899,11 @@ kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
>>>  		if (!p->wildcard && p->datamatch != args->datamatch)
>>>  			continue;
>>>
>>> -		kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
>>>  		if (!p->length) {
>>>  			kvm_io_bus_unregister_dev(kvm, KVM_FAST_MMIO_BUS,
>>>  						  &p->dev);
>>> +		} else {
>>> +			kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
>>>  		}
>>
>> Similar comments here... do you want to check for bus_idx ==
>> KVM_MMIO_BUS as well?
>
> Good catch. I think keeping the original code as is will also be ok to
> solve this (with changing the bus_idx to KVM_FAST_MMIO_BUS during
> registering if it was a wildcard mmio).
>
>> Do you need to handle the ioeventfd_count changes on the fast mmio bus
>> as well?
>
> Yes. So actually, it needs some changes: checking the return value of
> kvm_io_bus_unregister_dev() and deciding which bus the device belongs
> to.

Looks like it will be cleaner to just change
ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS accordingly. Will
post V2 soon.
Re: [PATCH V2 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses
On Tue, 25 Aug 2015 15:47:14 +0800
Jason Wang <jasow...@redhat.com> wrote:

> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 9ff4193..95f2901 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -762,13 +762,15 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p)
>  	return false;
>  }
>
> -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
> +static enum kvm_bus ioeventfd_bus_from_flags(struct kvm_ioeventfd *args)

ioeventfd_bus_from_args()? But _from_flags() is not wrong either :)

>  {
> -	if (flags & KVM_IOEVENTFD_FLAG_PIO)
> +	if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
>  		return KVM_PIO_BUS;
> -	if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
> +	if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
>  		return KVM_VIRTIO_CCW_NOTIFY_BUS;
> -	return KVM_MMIO_BUS;
> +	if (args->len)
> +		return KVM_MMIO_BUS;
> +	return KVM_FAST_MMIO_BUS;

Hm...

	/* When length is ignored, MMIO is put on a separate bus, for
	 * faster lookups.
	 */
	return args->len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;

>  }
>
>  static int

This version of the patch looks nice and compact. Regardless whether you
want to follow my (minor) style suggestions, consider this patch

Acked-by: Cornelia Huck <cornelia.h...@de.ibm.com>
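
The selection logic discussed above can be tried in isolation. The
following is a minimal sketch with made-up names and flag values (enum
bus, FLAG_PIO, FLAG_CCW_NOTIFY, struct ioeventfd_args and bus_for() are
all hypothetical; the real definitions live in the KVM headers). It
mirrors the order of checks in the patch: the two flags first, then the
length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative names and values only. */
enum bus { PIO_BUS, CCW_NOTIFY_BUS, MMIO_BUS, FAST_MMIO_BUS };

#define FLAG_PIO	(1u << 0)	/* hypothetical bit position */
#define FLAG_CCW_NOTIFY	(1u << 1)	/* hypothetical bit position */

struct ioeventfd_args {
	uint32_t flags;
	uint32_t len;	/* 0 means "ignore the access length" */
};

static enum bus bus_from_args(const struct ioeventfd_args *args)
{
	assert(args != NULL);
	if (args->flags & FLAG_PIO)
		return PIO_BUS;
	if (args->flags & FLAG_CCW_NOTIFY)
		return CCW_NOTIFY_BUS;
	/* Length-ignoring MMIO goes to the separate, faster bus. */
	return args->len ? MMIO_BUS : FAST_MMIO_BUS;
}

/* Convenience wrapper for quick experiments. */
static enum bus bus_for(uint32_t flags, uint32_t len)
{
	struct ioeventfd_args args = { .flags = flags, .len = len };

	return bus_from_args(&args);
}
```

Because the flag checks come first, a PIO registration with length 0
still lands on the PIO bus; only plain MMIO registrations are split by
length.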
[PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister
All fields of kvm_io_range were initialized or copied explicitly
afterwards. So switch to use kmalloc().

Cc: Gleb Natapov <g...@kernel.org>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 virt/kvm/kvm_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8b8a444..0d79fe8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3248,7 +3248,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
 	if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1)
 		return -ENOSPC;

-	new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count + 1) *
+	new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count + 1) *
 			  sizeof(struct kvm_io_range)), GFP_KERNEL);
 	if (!new_bus)
 		return -ENOMEM;
@@ -3280,7 +3280,7 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
 	if (r)
 		return r;

-	new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count - 1) *
+	new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count - 1) *
 			  sizeof(struct kvm_io_range)), GFP_KERNEL);
 	if (!new_bus)
 		return -ENOMEM;
--
2.1.4
[PATCH V2 3/3] kvm: add tracepoint for fast mmio
Cc: Gleb Natapov <g...@kernel.org>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 arch/x86/kvm/trace.h | 17 +
 arch/x86/kvm/vmx.c   | 1 +
 arch/x86/kvm/x86.c   | 1 +
 3 files changed, 19 insertions(+)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 4eae7c3..2d4e81a 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -128,6 +128,23 @@ TRACE_EVENT(kvm_pio,
 		  __entry->count > 1 ? "(...)" : "")
 );

+TRACE_EVENT(kvm_fast_mmio,
+	TP_PROTO(u64 gpa),
+	TP_ARGS(gpa),
+
+	TP_STRUCT__entry(
+		__field(u64,	gpa)
+	),
+
+	TP_fast_assign(
+		__entry->gpa	= gpa;
+	),
+
+	TP_printk("fast mmio at gpa 0x%llx", __entry->gpa)
+);
+
+
+
 /*
  * Tracepoint for cpuid.
  */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 83b7b5c..a55d279 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5831,6 +5831,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
 	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
 		skip_emulated_instruction(vcpu);
+		trace_kvm_fast_mmio(gpa);
 		return 1;
 	}

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f0f6ec..36cf78e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8254,6 +8254,7 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
 EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);

 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
--
2.1.4
[PATCH V2 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses
We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
and another for KVM_FAST_MMIO_BUS. This leads to two issues:

- kvm_io_bus_destroy() knows nothing about the devices on two buses
  pointing to a single dev, which will lead to a double free [1]
  during exit.
- wildcard eventfd ignores data len, so it was registered as a
  kvm_io_range with zero length. This will fail the binary search in
  kvm_io_bus_get_first_dev() when we try to emulate through
  KVM_MMIO_BUS. This will cause a userspace io emulation request
  instead of an eventfd notification (virtqueue kick will be trapped
  by qemu instead of vhost in this case).

Fix this by not registering wildcard mmio eventfd on two buses.
Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the double
free issue of kvm_io_bus_destroy(). For the arch/setups that do not
utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
KVM_FAST_MMIO_BUS first to see if it has a match.

[1] Panic caused by double free:

CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu
Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
RIP: 0010:[c07e25d8] [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
RSP: 0018:88020e7f3bc8 EFLAGS: 00010292
RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
FS: 7fc1ee3e6700() GS:88023e24() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
Stack:
 88021e7cc000 88020e7f3be8 c07e2622 88020e7f3c38 c07df69a
 880232524160 88020e792d80 880219b78c00 0008 8802321686a8
Call Trace:
 [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
 [c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
 [c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
 [811f69f7] __fput+0xe7/0x250
 [811f6bae] fput+0xe/0x10
 [81093f04] task_work_run+0xd4/0xf0
 [81079358] do_exit+0x368/0xa50
 [81082c8f] ? recalc_sigpending+0x1f/0x60
 [81079ad5] do_group_exit+0x45/0xb0
 [81085c71] get_signal+0x291/0x750
 [810144d8] do_signal+0x28/0xab0
 [810f3a3b] ? do_futex+0xdb/0x5d0
 [810b7028] ? __wake_up_locked_key+0x18/0x20
 [810f3fa6] ? SyS_futex+0x76/0x170
 [81014fc9] do_notify_resume+0x69/0xb0
 [817cb9af] int_signal+0x12/0x17
Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 01 10 00 00
RIP [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
 RSP 88020e7f3bc8

Cc: Gleb Natapov <g...@kernel.org>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
Changes from v1:
- change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
  needed to save lots of unnecessary changes.
---
 virt/kvm/eventfd.c  | 30 --
 virt/kvm/kvm_main.c | 16 ++--
 2 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 9ff4193..95f2901 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -762,13 +762,15 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p)
 	return false;
 }

-static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
+static enum kvm_bus ioeventfd_bus_from_flags(struct kvm_ioeventfd *args)
 {
-	if (flags & KVM_IOEVENTFD_FLAG_PIO)
+	if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
 		return KVM_PIO_BUS;
-	if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
+	if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
 		return KVM_VIRTIO_CCW_NOTIFY_BUS;
-	return KVM_MMIO_BUS;
+	if (args->len)
+		return KVM_MMIO_BUS;
+	return KVM_FAST_MMIO_BUS;
 }

 static int
@@ -779,7 +781,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 	struct eventfd_ctx *eventfd;
 	int ret;

-	bus_idx = ioeventfd_bus_from_flags(args->flags);
+	bus_idx = ioeventfd_bus_from_flags(args);
 	/* must be natural-word sized, or 0 to ignore length */
 	switch (args->len) {
 	case 0:
@@ -843,16 +845,6 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 	if (ret < 0)
 		goto unlock_fail;

-	/* When length is ignored, MMIO is also put on a separate bus, for
-	 * faster lookups.
-	 */
-	if (!args->len && !(args->flags
[PATCH v7 16/17] KVM: Warn if 'SN' is set during posting interrupts by software
Currently, we don't support urgent interrupts; all interrupts
are recognized as non-urgent interrupts, so we cannot post
interrupts when 'SN' is set.

If the vcpu is in guest mode, it cannot have been scheduled out,
and that's the only case when SN is set currently, so warn if
SN is set.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/kvm/vmx.c | 16
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 64e35ea..eb640a1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4494,6 +4494,22 @@ static inline bool kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_SMP
 	if (vcpu->mode == IN_GUEST_MODE) {
+		struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+		/*
+		 * Currently, we don't support urgent interrupt,
+		 * all interrupts are recognized as non-urgent
+		 * interrupt, so we cannot post interrupts when
+		 * 'SN' is set.
+		 *
+		 * If the vcpu is in guest mode, it means it is
+		 * running instead of being scheduled out and
+		 * waiting in the run queue, and that's the only
+		 * case when 'SN' is set currently, warning if
+		 * 'SN' is set.
+		 */
+		WARN_ON_ONCE(pi_test_sn(&vmx->pi_desc));
+
 		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
 				POSTED_INTR_VECTOR);
 		return true;
--
2.1.0
[PATCH v2 1/3] KVM: make halt_poll_ns per-VCPU
Change halt_poll_ns into per-VCPU variable, seeded from module parameter,
to allow greater flexibility.

Signed-off-by: Wanpeng Li <wanpeng...@hotmail.com>
---
 include/linux/kvm_host.h | 1 +
 virt/kvm/kvm_main.c      | 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 81089cf..1bef9e2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -242,6 +242,7 @@ struct kvm_vcpu {
 	int sigset_active;
 	sigset_t sigset;
 	struct kvm_vcpu_stat stat;
+	unsigned int halt_poll_ns;

 #ifdef CONFIG_HAS_IOMEM
 	int mmio_needed;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d8db2f8f..93db833 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -217,6 +217,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 	vcpu->kvm = kvm;
 	vcpu->vcpu_id = id;
 	vcpu->pid = NULL;
+	vcpu->halt_poll_ns = halt_poll_ns;
 	init_waitqueue_head(&vcpu->wq);
 	kvm_async_pf_vcpu_init(vcpu);

@@ -1930,8 +1931,8 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 	bool waited = false;

 	start = cur = ktime_get();
-	if (halt_poll_ns) {
-		ktime_t stop = ktime_add_ns(ktime_get(), halt_poll_ns);
+	if (vcpu->halt_poll_ns) {
+		ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);

 		do {
 			/*
--
1.9.1
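
The poll-then-block idea behind halt_poll_ns can be sketched in
userspace. Everything below is an illustration under assumptions:
struct vcpu_model, vcpu_model_init() and poll_before_block() are
made-up names, and the clock is passed in as a function pointer so the
sketch does not depend on any OS timer API. It only shows the shape of
the loop: a per-VCPU window seeded from a global default, and polling
until either work shows up or the window expires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-VCPU state touched by the patch. */
struct vcpu_model {
	unsigned int halt_poll_ns;	/* 0 disables polling for this VCPU */
};

/* Seed the per-VCPU value from a module-wide default. */
static void vcpu_model_init(struct vcpu_model *v, unsigned int module_default_ns)
{
	assert(v != NULL);
	v->halt_poll_ns = module_default_ns;
}

/*
 * Poll until `pending` reports work or the per-VCPU window expires.
 * Returns true when polling found work, false when the caller should
 * go on to block.
 */
static bool poll_before_block(const struct vcpu_model *v,
			      uint64_t (*now_ns)(void),
			      bool (*pending)(void))
{
	uint64_t stop;

	assert(v != NULL && now_ns != NULL && pending != NULL);
	if (!v->halt_poll_ns)
		return false;

	stop = now_ns() + v->halt_poll_ns;
	do {
		if (pending())
			return true;
	} while (now_ns() < stop);

	return false;
}

/* Demo helpers: a fake clock that advances 100 ns per call. */
static uint64_t fake_now_ns(void)
{
	static uint64_t t;

	t += 100;
	return t;
}

static bool always_pending(void) { return true; }
static bool never_pending(void) { return false; }
```

Keeping the window in the per-VCPU structure means it can later be
adjusted for one VCPU without touching the others, which appears to be
the flexibility the commit message refers to.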
[PATCH v7 02/17] KVM: Add some helper functions for Posted-Interrupts
This patch adds some helper functions to manipulate the
Posted-Interrupts Descriptor.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/kvm/vmx.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 271dd70..316f9bf 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -443,6 +443,8 @@ struct nested_vmx {
 };

 #define POSTED_INTR_ON  0
+#define POSTED_INTR_SN  1
+
 /* Posted-Interrupt Descriptor */
 struct pi_desc {
 	u32 pir[8];     /* Posted interrupt requested */
@@ -483,6 +485,30 @@ static int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
 	return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
 }

+static void pi_clear_sn(struct pi_desc *pi_desc)
+{
+	return clear_bit(POSTED_INTR_SN,
+			(unsigned long *)&pi_desc->control);
+}
+
+static void pi_set_sn(struct pi_desc *pi_desc)
+{
+	return set_bit(POSTED_INTR_SN,
+			(unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_on(struct pi_desc *pi_desc)
+{
+	return test_bit(POSTED_INTR_ON,
+			(unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_sn(struct pi_desc *pi_desc)
+{
+	return test_bit(POSTED_INTR_SN,
+			(unsigned long *)&pi_desc->control);
+}
+
 struct vcpu_vmx {
 	struct kvm_vcpu       vcpu;
 	unsigned long         host_rsp;
--
2.1.0
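
The helpers above are thin wrappers around atomic bit operations on the
descriptor's control word. A rough userspace analogue, using C11
atomics instead of the kernel's set_bit()/clear_bit()/test_bit(), could
look like this. struct pi_desc_model and the model_* functions are
made-up names, and the layout is reduced to the two fields mentioned in
the patch; the real descriptor has more fields than shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POSTED_INTR_ON	0	/* bit numbers as defined in the patch */
#define POSTED_INTR_SN	1

/* Hypothetical, reduced model of the descriptor. */
struct pi_desc_model {
	uint32_t pir[8];	/* posted interrupt requests, 256 bits */
	atomic_uint control;	/* ON and SN live in the low bits */
};

static void model_set_sn(struct pi_desc_model *d)
{
	assert(d != NULL);
	atomic_fetch_or(&d->control, 1u << POSTED_INTR_SN);
}

static void model_clear_sn(struct pi_desc_model *d)
{
	assert(d != NULL);
	atomic_fetch_and(&d->control, ~(1u << POSTED_INTR_SN));
}

static bool model_test_sn(struct pi_desc_model *d)
{
	assert(d != NULL);
	return (atomic_load(&d->control) >> POSTED_INTR_SN) & 1u;
}

static bool model_test_on(struct pi_desc_model *d)
{
	assert(d != NULL);
	return (atomic_load(&d->control) >> POSTED_INTR_ON) & 1u;
}
```

The updates are read-modify-write operations on a single word, so
setting or clearing SN does not disturb a concurrent change to ON.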
[PATCH v7 05/17] KVM: Add interfaces to control PI outside vmx
This patch adds pi_clear_sn and pi_set_sn to struct kvm_x86_ops,
so we can set/clear SN outside vmx.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 3 +++
 arch/x86/kvm/vmx.c              | 13 +
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d50c1d3..c4f99f1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -860,6 +860,9 @@ struct kvm_x86_ops {
 					   gfn_t offset, unsigned long mask);

 	u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
+
+	void (*pi_clear_sn)(struct kvm_vcpu *vcpu);
+	void (*pi_set_sn)(struct kvm_vcpu *vcpu);
 	/* pmu operations of sub-arch */
 	const struct kvm_pmu_ops *pmu_ops;
 };
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 81a995c..234f720 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -615,6 +615,16 @@ struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
 	return &(to_vmx(vcpu)->pi_desc);
 }

+static void vmx_pi_clear_sn(struct kvm_vcpu *vcpu)
+{
+	pi_clear_sn(vcpu_to_pi_desc(vcpu));
+}
+
+static void vmx_pi_set_sn(struct kvm_vcpu *vcpu)
+{
+	pi_set_sn(vcpu_to_pi_desc(vcpu));
+}
+
 static unsigned long shadow_read_only_fields[] = {
 	/*
 	 * We do NOT shadow fields that are modified when L0
@@ -10471,6 +10481,9 @@ static struct kvm_x86_ops vmx_x86_ops = {

 	.get_pi_desc_addr = vmx_get_pi_desc_addr,

+	.pi_clear_sn = vmx_pi_clear_sn,
+	.pi_set_sn = vmx_pi_set_sn,
+
 	.pmu_ops = &intel_pmu_ops,
 };

--
2.1.0
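
Exposing the two helpers through the ops structure lets generic code
toggle SN without knowing the backend. A minimal sketch of that
indirection follows. All names here (struct vcpu_model, struct arch_ops,
backend_*, generic_set_suppress()) are hypothetical, and the NULL check
on the callback is a design choice of the sketch, not something the
patch shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical VCPU model: SN reduced to a single flag. */
struct vcpu_model {
	bool sn;
};

/* Backend hooks, in the spirit of the two new kvm_x86_ops members. */
struct arch_ops {
	void (*pi_set_sn)(struct vcpu_model *v);
	void (*pi_clear_sn)(struct vcpu_model *v);
};

static void backend_set_sn(struct vcpu_model *v)   { v->sn = true; }
static void backend_clear_sn(struct vcpu_model *v) { v->sn = false; }

/* A backend that implements both hooks. */
static const struct arch_ops backend_ops = {
	.pi_set_sn = backend_set_sn,
	.pi_clear_sn = backend_clear_sn,
};

/* A backend without posted-interrupt support. */
static const struct arch_ops no_pi_ops = { NULL, NULL };

/* Generic code: only goes through the ops table. */
static void generic_set_suppress(const struct arch_ops *ops,
				 struct vcpu_model *v, bool suppress)
{
	void (*fn)(struct vcpu_model *);

	assert(ops != NULL && v != NULL);
	fn = suppress ? ops->pi_set_sn : ops->pi_clear_sn;
	if (fn)
		fn(v);
}
```

Calling generic_set_suppress() with backend_ops flips the flag, while
the same call with no_pi_ops leaves the VCPU untouched.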
[PATCH v7 00/17] Add VT-d Posted-Interrupts support
VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt. With VT-d Posted-Interrupts enabled, external interrupts from direct-assigned devices can be delivered to guests without VMM intervention when guest is running in non-root mode.

You can find the VT-d Posted-Interrupts Spec. in the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

v7:
* Define two weak irq bypass callbacks:
  - kvm_arch_irq_bypass_start()
  - kvm_arch_irq_bypass_stop()
* Remove the x86 dummy implementation of the above two functions.
* Print some useful information instead of WARN_ON() when the irq bypass consumer unregistration fails.
* Fix an issue when calling pi_pre_block and pi_post_block.

v6:
* Rebase on 4.2.0-rc6
* Rebase on https://lkml.org/lkml/2015/8/6/526 and http://www.gossamer-threads.com/lists/linux/kernel/2235623
* Make the add_consumer and del_consumer callbacks static
* Remove pointless INIT_LIST_HEAD to '&vdev->ctx[vector].producer.node'
* Use dev_info instead of WARN_ON() when irq_bypass_register_producer fails
* Remove optional dummy callbacks for irq producer

v4:
* For lowest-priority interrupts, only support single-CPU destination interrupts at the current stage; more common lowest-priority support will be added later.
* According to Marcelo's suggestion, when vCPU is blocked, we handle the posted-interrupts in the HLT emulation path.
* Some small changes (coding style, typo, add some code comments)

v3:
* Adjust the Posted-interrupts Descriptor updating logic when vCPU is preempted or blocked.
* KVM_DEV_VFIO_DEVICE_POSTING_IRQ --> KVM_DEV_VFIO_DEVICE_POST_IRQ
* __KVM_HAVE_ARCH_KVM_VFIO_POSTING --> __KVM_HAVE_ARCH_KVM_VFIO_POST
* Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which can be used to change back to remapping mode.
* Fix typo

v2:
* Use VFIO framework to enable this feature; the VFIO part of this series is based on Eric's patch "[PATCH v3 0/8] KVM-VFIO IRQ forward control"
* Rebase this patchset on git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git, then revise some irq logic based on the new hierarchy irqdomain patches provided by Jiang Liu <jiang@linux.intel.com>

Feng Wu (17):
  KVM: Extend struct pi_desc for VT-d Posted-Interrupts
  KVM: Add some helper functions for Posted-Interrupts
  KVM: Define a new interface kvm_intr_is_single_vcpu()
  KVM: Get Posted-Interrupts descriptor address from 'struct kvm_vcpu'
  KVM: Add interfaces to control PI outside vmx
  KVM: Make struct kvm_irq_routing_table accessible
  KVM: make kvm_set_msi_irq() public
  vfio: Select IRQ_BYPASS_MANAGER for vfio PCI devices
  vfio: Register/unregister irq_bypass_producer
  KVM: x86: Update IRTE for posted-interrupts
  KVM: Define two weak arch callbacks for irq bypass manager
  KVM: Implement IRQ bypass consumer callbacks for x86
  KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
  KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
  KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  KVM: Warn if 'SN' is set during posting interrupts by software
  iommu/vt-d: Add a command line parameter for VT-d posted-interrupts

 Documentation/kernel-parameters.txt |   1 +
 arch/x86/include/asm/kvm_host.h     |  20 +++
 arch/x86/kvm/Kconfig                |   1 +
 arch/x86/kvm/irq_comm.c             |  28 +++-
 arch/x86/kvm/vmx.c                  | 288 +++-
 arch/x86/kvm/x86.c                  | 167 +++--
 drivers/iommu/irq_remapping.c       |  12 +-
 drivers/vfio/pci/Kconfig            |   1 +
 drivers/vfio/pci/vfio_pci_intrs.c   |   9 ++
 drivers/vfio/pci/vfio_pci_private.h |   2 +
 include/linux/kvm_host.h            |  28
 include/linux/kvm_irqfd.h           |   2 +
 virt/kvm/eventfd.c                  |  22 ++-
 virt/kvm/irqchip.c                  |  10 --
 virt/kvm/kvm_main.c                 |   3 +
 15 files changed, 565 insertions(+), 29 deletions(-)

--
2.1.0
[PATCH v7 06/17] KVM: Make struct kvm_irq_routing_table accessible
Move struct kvm_irq_routing_table from irqchip.c to kvm_host.h, so we can use it outside of irqchip.c.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 include/linux/kvm_host.h | 14 ++++++++++++++
 virt/kvm/irqchip.c       | 10 ----------
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5ac8d21..5f183fb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -328,6 +328,20 @@ struct kvm_kernel_irq_routing_entry {
 	struct hlist_node link;
 };

+#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
+
+struct kvm_irq_routing_table {
+	int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
+	u32 nr_rt_entries;
+	/*
+	 * Array indexed by gsi. Each entry contains list of irq chips
+	 * the gsi is connected to.
+	 */
+	struct hlist_head map[0];
+};
+
+#endif
+
 #ifndef KVM_PRIVATE_MEM_SLOTS
 #define KVM_PRIVATE_MEM_SLOTS 0
 #endif
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 21c1424..2cf45d3 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -31,16 +31,6 @@
 #include <trace/events/kvm.h>
 #include "irq.h"

-struct kvm_irq_routing_table {
-	int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
-	u32 nr_rt_entries;
-	/*
-	 * Array indexed by gsi. Each entry contains list of irq chips
-	 * the gsi is connected to.
-	 */
-	struct hlist_head map[0];
-};
-
 int kvm_irq_map_gsi(struct kvm *kvm,
 		    struct kvm_kernel_irq_routing_entry *entries, int gsi)
 {
--
2.1.0
[PATCH v7 03/17] KVM: Define a new interface kvm_intr_is_single_vcpu()
This patch defines a new interface kvm_intr_is_single_vcpu(), which returns whether the interrupt is for a single CPU or not. It is used by VT-d PI, since for now we only support single-CPU interrupts. For lowest-priority interrupts, if the user configures them via /proc/irq or uses irqbalance to make them single-CPU, we can use PI to deliver them. Full functionality of lowest-priority support will be added later.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/irq_comm.c         | 24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 49ec903..af11bca 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1204,4 +1204,7 @@ int __x86_set_memory_region(struct kvm *kvm,
 int x86_set_memory_region(struct kvm *kvm,
 			  const struct kvm_userspace_memory_region *mem);

+bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+			     struct kvm_vcpu **dest_vcpu);
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 9efff9e..a9572a13 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -297,6 +297,30 @@ out:
 	return r;
 }

+bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+			     struct kvm_vcpu **dest_vcpu)
+{
+	int i, r = 0;
+	struct kvm_vcpu *vcpu;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_apic_present(vcpu))
+			continue;
+
+		if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
+					irq->dest_id, irq->dest_mode))
+			continue;
+
+		r++;
+		*dest_vcpu = vcpu;
+	}
+
+	if (r == 1)
+		return true;
+	else
+		return false;
+}
+
 #define IOAPIC_ROUTING_ENTRY(irq) \
 	{ .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP,	\
 	  .u.irqchip = { .irqchip = KVM_IRQCHIP_IOAPIC, .pin = (irq) } }
--
2.1.0
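To illustrate the counting logic of kvm_intr_is_single_vcpu() outside the kernel, here is a small userspace sketch. It is an assumption-laden model: the demo_* names are hypothetical, and destination matching is reduced to a bitmask test standing in for kvm_apic_match_dest(), which handles shorthand and physical/logical modes in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vCPU model: one logical-ID bit per vCPU. */
struct demo_vcpu {
	bool apic_present;
	unsigned int logical_id;
};

/*
 * Return true only if exactly one present vCPU matches dest_mask.
 * As in the patch, *dest_vcpu is overwritten on every match, so it is
 * only meaningful when the function returns true.
 */
static bool demo_intr_is_single_vcpu(struct demo_vcpu *vcpus, size_t nr_vcpus,
				     unsigned int dest_mask,
				     struct demo_vcpu **dest_vcpu)
{
	size_t i, matches = 0;

	for (i = 0; i < nr_vcpus; i++) {
		if (!vcpus[i].apic_present)
			continue;
		/* stand-in for kvm_apic_match_dest() */
		if (!(vcpus[i].logical_id & dest_mask))
			continue;
		matches++;
		*dest_vcpu = &vcpus[i];
	}

	return matches == 1;
}
```

A mask that selects two present vCPUs, or only a vCPU without an APIC, yields false, which is the case where the series falls back to interrupt remapping.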
Re: [PATCH V2 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses
On 08/25/2015 04:20 PM, Cornelia Huck wrote:
> On Tue, 25 Aug 2015 15:47:14 +0800
> Jason Wang <jasow...@redhat.com> wrote:
>
>> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
>> index 9ff4193..95f2901 100644
>> --- a/virt/kvm/eventfd.c
>> +++ b/virt/kvm/eventfd.c
>> @@ -762,13 +762,15 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p)
>>  	return false;
>>  }
>>
>> -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
>> +static enum kvm_bus ioeventfd_bus_from_flags(struct kvm_ioeventfd *args)
>
> ioeventfd_bus_from_args()? But _from_flags() is not wrong either :)
>
>>  {
>> -	if (flags & KVM_IOEVENTFD_FLAG_PIO)
>> +	if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
>>  		return KVM_PIO_BUS;
>> -	if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
>> +	if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
>>  		return KVM_VIRTIO_CCW_NOTIFY_BUS;
>> -	return KVM_MMIO_BUS;
>> +	if (args->len)
>> +		return KVM_MMIO_BUS;
>> +	return KVM_FAST_MMIO_BUS;
>
> Hm...
>
> 	/* When length is ignored, MMIO is put on a separate bus, for
> 	 * faster lookups.
> 	 */
> 	return args->len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;
>
>>  }
>>
>>  static int
>
> This version of the patch looks nice and compact. Regardless whether you
> want to follow my (minor) style suggestions, consider this patch
>
> Acked-by: Cornelia Huck <cornelia.h...@de.ibm.com>

Thanks for the review. V3 posted :)
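For reference, the bus-selection rule discussed above can be sketched as a standalone function. This is only an illustration: the demo_* enum, flag values and struct are hypothetical stand-ins for enum kvm_bus, the KVM_IOEVENTFD_FLAG_* bits and struct kvm_ioeventfd, and are not taken from the kernel headers.

```c
#include <assert.h>

/* Hypothetical stand-ins for enum kvm_bus and the ioeventfd flags. */
enum demo_bus {
	DEMO_MMIO_BUS,
	DEMO_PIO_BUS,
	DEMO_VIRTIO_CCW_NOTIFY_BUS,
	DEMO_FAST_MMIO_BUS,
};

#define DEMO_FLAG_PIO			(1u << 1)
#define DEMO_FLAG_VIRTIO_CCW_NOTIFY	(1u << 3)

struct demo_ioeventfd_args {
	unsigned int flags;
	unsigned int len;	/* 0 means "ignore length" (wildcard) */
};

static enum demo_bus demo_bus_from_args(const struct demo_ioeventfd_args *args)
{
	if (args->flags & DEMO_FLAG_PIO)
		return DEMO_PIO_BUS;
	if (args->flags & DEMO_FLAG_VIRTIO_CCW_NOTIFY)
		return DEMO_VIRTIO_CCW_NOTIFY_BUS;
	/* Wildcard (zero-length) MMIO goes on the separate fast bus. */
	return args->len ? DEMO_MMIO_BUS : DEMO_FAST_MMIO_BUS;
}
```

The explicit flags take precedence; only a flag-less request is split by length, so a zero-length MMIO eventfd lands on exactly one bus.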
[PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses
A wildcard mmio eventfd is currently registered on two buses: KVM_MMIO_BUS and KVM_FAST_MMIO_BUS. This leads to two issues:

- kvm_io_bus_destroy() does not know that the devices on the two buses point to a single dev, which leads to a double free [1] during exit.
- A wildcard eventfd ignores the data len, so it is registered as a kvm_io_range with zero length. This makes the binary search in kvm_io_bus_get_first_dev() fail when we try to emulate through KVM_MMIO_BUS, which causes a userspace io emulation request instead of an eventfd notification (the virtqueue kick is trapped by qemu instead of vhost in this case).

Fix this by not registering the wildcard mmio eventfd on two buses. Instead, register it only on KVM_FAST_MMIO_BUS. This fixes the double free in kvm_io_bus_destroy(). For archs/setups that do not utilize KVM_FAST_MMIO_BUS, try KVM_FAST_MMIO_BUS first, before searching KVM_MMIO_BUS, to see if it has a match.

[1] Panic caused by double free:

CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu
Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
RIP: 0010:[c07e25d8] [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
RSP: 0018:88020e7f3bc8 EFLAGS: 00010292
RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
FS: 7fc1ee3e6700() GS:88023e24() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
Stack:
 88021e7cc000 88020e7f3be8 c07e2622
 88020e7f3c38 c07df69a 880232524160 88020e792d80
 880219b78c00 0008 8802321686a8
Call Trace:
 [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
 [c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
 [c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
 [811f69f7] __fput+0xe7/0x250
 [811f6bae] fput+0xe/0x10
 [81093f04] task_work_run+0xd4/0xf0
 [81079358] do_exit+0x368/0xa50
 [81082c8f] ? recalc_sigpending+0x1f/0x60
 [81079ad5] do_group_exit+0x45/0xb0
 [81085c71] get_signal+0x291/0x750
 [810144d8] do_signal+0x28/0xab0
 [810f3a3b] ? do_futex+0xdb/0x5d0
 [810b7028] ? __wake_up_locked_key+0x18/0x20
 [810f3fa6] ? SyS_futex+0x76/0x170
 [81014fc9] do_notify_resume+0x69/0xb0
 [817cb9af] int_signal+0x12/0x17
Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 01 10 00 00
RIP [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
 RSP 88020e7f3bc8

Cc: Gleb Natapov <g...@kernel.org>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
Changes from V2:
- Tweak styles and comment suggested by Cornelia.
Changes from v1:
- change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when needed to save lots of unnecessary changes.
---
 virt/kvm/eventfd.c  | 31 +--
 virt/kvm/kvm_main.c | 16 ++--
 2 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 9ff4193..c3ffdc3 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -762,13 +762,16 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p)
 	return false;
 }

-static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
+static enum kvm_bus ioeventfd_bus_from_args(struct kvm_ioeventfd *args)
 {
-	if (flags & KVM_IOEVENTFD_FLAG_PIO)
+	if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
 		return KVM_PIO_BUS;
-	if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
+	if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
 		return KVM_VIRTIO_CCW_NOTIFY_BUS;
-	return KVM_MMIO_BUS;
+	/* When length is ignored, MMIO is put on a separate bus, for
+	 * faster lookups.
+	 */
+	return args->len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;
 }

 static int
@@ -779,7 +782,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 	struct eventfd_ctx *eventfd;
 	int ret;

-	bus_idx = ioeventfd_bus_from_flags(args->flags);
+	bus_idx = ioeventfd_bus_from_args(args);
 	/* must be natural-word sized, or 0 to ignore length */
 	switch (args->len) {
 	case 0:
@@ -843,16 +846,6 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 	if (ret < 0)
 		goto unlock_fail;

-	/* When length
[PATCH V3 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister
All fields of kvm_io_range were initialized or copied explicitly afterwards. So switch to use kmalloc().

Cc: Gleb Natapov <g...@kernel.org>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 virt/kvm/kvm_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8b8a444..0d79fe8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3248,7 +3248,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
 	if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1)
 		return -ENOSPC;

-	new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count + 1) *
+	new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count + 1) *
 			  sizeof(struct kvm_io_range)), GFP_KERNEL);
 	if (!new_bus)
 		return -ENOMEM;
@@ -3280,7 +3280,7 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
 	if (r)
 		return r;

-	new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count - 1) *
+	new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count - 1) *
 			  sizeof(struct kvm_io_range)), GFP_KERNEL);
 	if (!new_bus)
 		return -ENOMEM;
--
2.1.4
[PATCH v7 09/17] vfio: Register/unregister irq_bypass_producer
This patch adds the registration/unregistration of an irq_bypass_producer for MSI/MSIx on vfio pci devices.

v6:
- Make the add_consumer and del_consumer callbacks static
- Remove pointless INIT_LIST_HEAD to '&vdev->ctx[vector].producer.node'
- Use dev_info instead of WARN_ON() when irq_bypass_register_producer fails
- Remove optional dummy callbacks for irq producer

Signed-off-by: Feng Wu <feng...@intel.com>
---
 drivers/vfio/pci/vfio_pci_intrs.c   | 9 +++++++++
 drivers/vfio/pci/vfio_pci_private.h | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 1f577b4..c65299d 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -319,6 +319,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,

 	if (vdev->ctx[vector].trigger) {
 		free_irq(irq, vdev->ctx[vector].trigger);
+		irq_bypass_unregister_producer(&vdev->ctx[vector].producer);
 		kfree(vdev->ctx[vector].name);
 		eventfd_ctx_put(vdev->ctx[vector].trigger);
 		vdev->ctx[vector].trigger = NULL;
@@ -360,6 +361,14 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 		return ret;
 	}

+	vdev->ctx[vector].producer.token = trigger;
+	vdev->ctx[vector].producer.irq = irq;
+	ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
+	if (unlikely(ret))
+		dev_info(&pdev->dev,
+		"irq bypass producer (token %p) registeration fails: %d\n",
+		vdev->ctx[vector].producer.token, ret);
+
 	vdev->ctx[vector].trigger = trigger;

 	return 0;
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index ae0e1b4..0e7394f 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -13,6 +13,7 @@

 #include <linux/mutex.h>
 #include <linux/pci.h>
+#include <linux/irqbypass.h>

 #ifndef VFIO_PCI_PRIVATE_H
 #define VFIO_PCI_PRIVATE_H
@@ -29,6 +30,7 @@ struct vfio_pci_irq_ctx {
 	struct virqfd *mask;
 	char *name;
 	bool masked;
+	struct irq_bypass_producer producer;
 };

 struct vfio_pci_device {
--
2.1.0
Re: [PATCH v2 08/15] KVM: arm64: introduce ITS emulation file with stub functions
Salut Eric,

>> diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
>> index 5269ad1..f5865e7 100644
>> --- a/virt/kvm/arm/vgic-v3-emul.c
>> +++ b/virt/kvm/arm/vgic-v3-emul.c
>> @@ -48,6 +48,7 @@
>>  #include <asm/kvm_mmu.h>
>>
>>  #include "vgic.h"
>> +#include "its-emul.h"
>>
>>  static bool handle_mmio_rao_wi(struct kvm_vcpu *vcpu,
>>  			       struct kvm_exit_mmio *mmio, phys_addr_t offset)
>> @@ -530,9 +531,20 @@ static bool handle_mmio_ctlr_redist(struct kvm_vcpu *vcpu,
>>  				    struct kvm_exit_mmio *mmio,
>>  				    phys_addr_t offset)
>>  {
>> -	/* since we don't support LPIs, this register is zero for now */
>> -	vgic_reg_access(mmio, NULL, offset,
>> -			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +	u32 reg;
>> +
>> +	if (!vgic_has_its(vcpu->kvm)) {
>> +		vgic_reg_access(mmio, NULL, offset,
>> +				ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
>> +		return false;
>> +	}
> can't we remove above block and ...
>> +	reg = dist->lpis_enabled ? GICR_CTLR_ENABLE_LPIS : 0;
>> +	vgic_reg_access(mmio, &reg, offset,
>> +			ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
>> +	if (!dist->lpis_enabled && (reg & GICR_CTLR_ENABLE_LPIS
> add vgic_has_its(vcpu->kvm) && above?

Yeah, makes some sense. Changed that.

> Besides Reviewed-by: Eric Auger <eric.au...@linaro.org>

Merci!
André

> Eric
>> )) {
>> +		/* Eventually do something */
>> +	}
>>  	return false;
>>  }
>>
>> @@ -861,6 +873,12 @@ static int vgic_v3_map_resources(struct kvm *kvm,
>>  		rdbase += GIC_V3_REDIST_SIZE;
>>  	}
>>
>> +	if (vgic_has_its(kvm)) {
>> +		ret = vits_init(kvm);
>> +		if (ret)
>> +			goto out_unregister;
>> +	}
>> +
>>  	dist->redist_iodevs = iodevs;
>>  	dist->ready = true;
>>  	goto out;
[PATCH V3 3/3] kvm: add tracepoint for fast mmio
Cc: Gleb Natapov <g...@kernel.org>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 arch/x86/kvm/trace.h | 17 +++++++++++++++++
 arch/x86/kvm/vmx.c   |  1 +
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 19 insertions(+)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 4eae7c3..2d4e81a 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -128,6 +128,23 @@ TRACE_EVENT(kvm_pio,
 		  __entry->count > 1 ? "(...)" : "")
 );

+TRACE_EVENT(kvm_fast_mmio,
+	TP_PROTO(u64 gpa),
+	TP_ARGS(gpa),
+
+	TP_STRUCT__entry(
+		__field(u64,	gpa)
+	),
+
+	TP_fast_assign(
+		__entry->gpa	= gpa;
+	),
+
+	TP_printk("fast mmio at gpa 0x%llx", __entry->gpa)
+);
+
+
+
 /*
  * Tracepoint for cpuid.
  */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 83b7b5c..a55d279 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5831,6 +5831,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
 	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
 		skip_emulated_instruction(vcpu);
+		trace_kvm_fast_mmio(gpa);
 		return 1;
 	}

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f0f6ec..36cf78e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8254,6 +8254,7 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
 EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);

 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
--
2.1.4
[PATCH v7 13/17] KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
This patch adds an arch-specific hook 'arch_update' to 'struct kvm_kernel_irqfd'. On the Intel side, it is used to update the IRTE when VT-d posted-interrupts is used.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/x86.c              |  5 +++++
 include/linux/kvm_host.h        | 11 +++++++++++
 include/linux/kvm_irqfd.h       |  2 ++
 virt/kvm/eventfd.c              | 12 +++++++++++-
 5 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3038c1b..22269b4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -176,6 +176,8 @@ enum {
  */
 #define KVM_APIC_PV_EOI_PENDING	1

+#define __KVM_HAVE_ARCH_IRQFD_INIT 1
+
 struct kvm_kernel_irq_routing_entry;

 /*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be4b561..ef93fdc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8355,6 +8355,11 @@ void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
 		       " fails: %d\n", irqfd->consumer.token, ret);
 }

+void kvm_arch_irqfd_init(struct kvm_kernel_irqfd *irqfd)
+{
+	irqfd->arch_update = kvm_arch_update_pi_irte;
+}
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5f183fb..f4005dc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -34,6 +34,8 @@

 #include <asm/kvm_host.h>

+struct kvm_kernel_irqfd;
+
 /*
  * The bit 16 ~ bit 31 of kvm_memory_region::flags are internally used
  * in kvm, other bits are visible for userspace which are defined in
@@ -1145,6 +1147,15 @@ extern struct kvm_device_ops kvm_xics_ops;
 extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
 extern struct kvm_device_ops kvm_arm_vgic_v3_ops;

+#ifdef __KVM_HAVE_ARCH_IRQFD_INIT
+void kvm_arch_irqfd_init(struct kvm_kernel_irqfd *irqfd);
+#else
+static inline void kvm_arch_irqfd_init(struct kvm_kernel_irqfd *irqfd)
+{
+	irqfd->arch_update = NULL;
+}
+#endif
+
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT

 static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index 0c1de05..b7aab52 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -66,6 +66,8 @@ struct kvm_kernel_irqfd {
 	struct work_struct shutdown;
 	struct irq_bypass_consumer consumer;
 	struct irq_bypass_producer *producer;
+	int (*arch_update)(struct kvm *kvm, unsigned int host_irq,
+			   uint32_t guest_irq, bool set);
 };

 #endif /* __LINUX_KVM_IRQFD_H */
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index f3050b9..b2d9066 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -288,6 +288,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	INIT_LIST_HEAD(&irqfd->list);
 	INIT_WORK(&irqfd->inject, irqfd_inject);
 	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
+	kvm_arch_irqfd_init(irqfd);
 	seqcount_init(&irqfd->irq_entry_sc);

 	f = fdget(args->fd);
@@ -580,13 +581,22 @@ kvm_irqfd_release(struct kvm *kvm)
  */
 void kvm_irq_routing_update(struct kvm *kvm)
 {
+	int ret;
 	struct kvm_kernel_irqfd *irqfd;

 	spin_lock_irq(&kvm->irqfds.lock);

-	list_for_each_entry(irqfd, &kvm->irqfds.items, list)
+	list_for_each_entry(irqfd, &kvm->irqfds.items, list) {
 		irqfd_update(kvm, irqfd);

+		if (irqfd->arch_update && irqfd->producer) {
+			ret = irqfd->arch_update(
+					irqfd->kvm, irqfd->producer->irq,
+					irqfd->gsi, 1);
+			WARN_ON(ret);
+		}
+	}
+
 	spin_unlock_irq(&kvm->irqfds.lock);
 }

--
2.1.0
[PATCH v7 11/17] KVM: Define two weak arch callbacks for irq bypass manager
Define two weak arch callbacks so that archs that don't need them don't need to define them.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 virt/kvm/eventfd.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index d7a230f..f3050b9 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -256,6 +256,16 @@ static void irqfd_update(struct kvm *kvm, struct kvm_kernel_irqfd *irqfd)
 	write_seqcount_end(&irqfd->irq_entry_sc);
 }

+void __attribute__((weak)) kvm_arch_irq_bypass_stop(
+				struct irq_bypass_consumer *cons)
+{
+}
+
+void __attribute__((weak)) kvm_arch_irq_bypass_start(
+				struct irq_bypass_consumer *cons)
+{
+}
+
 static int
 kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 {
--
2.1.0
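As a quick illustration of the weak-symbol mechanism this patch relies on, here is a userspace sketch for GCC or Clang. The demo_* names are hypothetical, and unlike the patch's void no-op defaults, the default here returns 0 purely so that its effect can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct irq_bypass_consumer. */
struct demo_consumer { void *token; };

/*
 * Weak default. If another object file defines a non-weak function with
 * the same name, the linker picks that one and this body is discarded,
 * so architectures that need no callback simply define nothing.
 */
__attribute__((weak)) int demo_arch_irq_bypass_stop(struct demo_consumer *cons)
{
	(void)cons;
	return 0;	/* nothing to do by default */
}

/* Generic code calls the hook unconditionally, with no #ifdef needed. */
static int demo_pause_consumer(struct demo_consumer *cons)
{
	return demo_arch_irq_bypass_stop(cons);
}
```

The trade-off against the #ifdef plus static inline approach (used for kvm_arch_irqfd_init() elsewhere in this series) is that the weak default is a real out-of-line call, but no per-arch header macro has to be defined.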
[PATCH v7 10/17] KVM: x86: Update IRTE for posted-interrupts
This patch adds the routine to update IRTE for posted-interrupts when guest changes the interrupt configuration.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/kvm/x86.c | 73 ++
 1 file changed, 73 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5ef2560..8f09a76 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -63,6 +63,7 @@
 #include <asm/fpu/internal.h> /* Ugh! */
 #include <asm/pvclock.h>
 #include <asm/div64.h>
+#include <asm/irq_remapping.h>

 #define MAX_IO_MSRS 256
 #define KVM_MAX_MCE_BANKS 32
@@ -8248,6 +8249,78 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
 }
 EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);

+/*
+ * kvm_arch_update_pi_irte - set IRTE for Posted-Interrupts
+ *
+ * @kvm: kvm
+ * @host_irq: host irq of the interrupt
+ * @guest_irq: gsi of the interrupt
+ * @set: set or unset PI
+ * returns 0 on success, < 0 on failure
+ */
+int kvm_arch_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
+			    uint32_t guest_irq, bool set)
+{
+	struct kvm_kernel_irq_routing_entry *e;
+	struct kvm_irq_routing_table *irq_rt;
+	struct kvm_lapic_irq irq;
+	struct kvm_vcpu *vcpu;
+	struct vcpu_data vcpu_info;
+	int idx, ret = -EINVAL;
+
+	if (!irq_remapping_cap(IRQ_POSTING_CAP))
+		return 0;
+
+	idx = srcu_read_lock(&kvm->irq_srcu);
+	irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
+	BUG_ON(guest_irq >= irq_rt->nr_rt_entries);
+
+	hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
+		if (e->type != KVM_IRQ_ROUTING_MSI)
+			continue;
+		/*
+		 * VT-d PI cannot support posting multicast/broadcast
+		 * interrupts to a VCPU, we still use interrupt remapping
+		 * for these kind of interrupts.
+		 *
+		 * For lowest-priority interrupts, we only support
+		 * those with single CPU as the destination, e.g. user
+		 * configures the interrupts via /proc/irq or uses
+		 * irqbalance to make the interrupts single-CPU.
+		 *
+		 * We will support full lowest-priority interrupt later.
+		 *
+		 */
+
+		kvm_set_msi_irq(e, &irq);
+		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
+			continue;
+
+		vcpu_info.pi_desc_addr = kvm_x86_ops->get_pi_desc_addr(vcpu);
+		vcpu_info.vector = irq.vector;
+
+		if (set)
+			ret = irq_set_vcpu_affinity(host_irq, &vcpu_info);
+		else {
+			/* suppress notification event before unposting */
+			kvm_x86_ops->pi_set_sn(vcpu);
+			ret = irq_set_vcpu_affinity(host_irq, NULL);
+			kvm_x86_ops->pi_clear_sn(vcpu);
+		}
+
+		if (ret < 0) {
+			printk(KERN_INFO "%s: failed to update PI IRTE\n",
+					__func__);
+			goto out;
+		}
+	}
+
+	ret = 0;
+out:
+	srcu_read_unlock(&kvm->irq_srcu, idx);
+	return ret;
+}
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
--
2.1.0
[PATCH v7 08/17] vfio: Select IRQ_BYPASS_MANAGER for vfio PCI devices
Enable irq bypass manager for vfio PCI devices.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 drivers/vfio/pci/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 579d83b..02912f1 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -2,6 +2,7 @@ config VFIO_PCI
 	tristate "VFIO support for PCI devices"
 	depends on VFIO && PCI && EVENTFD
 	select VFIO_VIRQFD
+	select IRQ_BYPASS_MANAGER
 	help
 	  Support for the PCI VFIO bus driver. This is required to make
 	  use of PCI drivers using the VFIO framework.
--
2.1.0
[PATCH v7 12/17] KVM: Implement IRQ bypass consumer callbacks for x86
Implement the following callbacks for x86:

- kvm_arch_irq_bypass_add_producer
- kvm_arch_irq_bypass_del_producer
- kvm_arch_irq_bypass_stop: dummy callback
- kvm_arch_irq_bypass_resume: dummy callback

and set CONFIG_HAVE_KVM_IRQ_BYPASS for x86.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/Kconfig            |  1 +
 arch/x86/kvm/x86.c              | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 82d0709..3038c1b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -24,6 +24,7 @@
 #include <linux/perf_event.h>
 #include <linux/pvclock_gtod.h>
 #include <linux/clocksource.h>
+#include <linux/irqbypass.h>

 #include <asm/pvclock-abi.h>
 #include <asm/desc.h>
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index c951d44..b90776f 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -30,6 +30,7 @@ config KVM
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_IRQFD
 	select IRQ_BYPASS_MANAGER
+	select HAVE_KVM_IRQ_BYPASS
 	select HAVE_KVM_IRQ_ROUTING
 	select HAVE_KVM_EVENTFD
 	select KVM_APIC_ARCHITECTURE
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f09a76..be4b561 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -50,6 +50,8 @@
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/pvclock_gtod.h>
+#include <linux/kvm_irqfd.h>
+#include <linux/irqbypass.h>
 #include <trace/events/kvm.h>

 #define CREATE_TRACE_POINTS
@@ -8321,6 +8323,38 @@ out:
 	return ret;
 }

+int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
+				      struct irq_bypass_producer *prod)
+{
+	struct kvm_kernel_irqfd *irqfd =
+		container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+	irqfd->producer = prod;
+
+	return kvm_arch_update_pi_irte(irqfd->kvm, prod->irq, irqfd->gsi, 1);
+}
+
+void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
+				      struct irq_bypass_producer *prod)
+{
+	int ret;
+	struct kvm_kernel_irqfd *irqfd =
+		container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+	irqfd->producer = NULL;
+
+	/*
+	 * When producer of consumer is unregistered, we change back to
+	 * remapped mode, so we can re-use the current implementation
+	 * when the irq is masked/disabed or the consumer side (KVM
+	 * int this case doesn't want to receive the interrupts.
+	*/
+	ret = kvm_arch_update_pi_irte(irqfd->kvm, prod->irq, irqfd->gsi, 0);
+	if (ret)
+		printk(KERN_INFO "irq bypass consumer (token %p) unregistration"
+		       " fails: %d\n", irqfd->consumer.token, ret);
+}
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
--
2.1.0
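Both callbacks above recover the enclosing irqfd from the embedded consumer with container_of(). A minimal userspace sketch of that idiom follows; the demo_* types are hypothetical, and the macro is a simplified version built on offsetof() rather than the kernel's type-checked definition.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: step back from a member to its parent struct. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the bypass consumer/producer and the irqfd. */
struct demo_consumer { void *token; };
struct demo_producer { int irq; };

struct demo_irqfd {
	int gsi;
	struct demo_consumer consumer;	/* embedded, not a pointer */
	struct demo_producer *producer;
};

/*
 * The bypass manager only hands back the consumer pointer; the irqfd
 * that embeds it is recovered by pointer arithmetic. Returns the gsi of
 * the recovered irqfd so the result can be checked.
 */
static int demo_add_producer(struct demo_consumer *cons,
			     struct demo_producer *prod)
{
	struct demo_irqfd *irqfd =
		demo_container_of(cons, struct demo_irqfd, consumer);

	irqfd->producer = prod;
	return irqfd->gsi;
}
```

This only works because the consumer is embedded by value in the irqfd; passing a consumer that lives anywhere else would yield a bogus parent pointer.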
[PATCH v7 14/17] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
This patch updates the Posted-Interrupts Descriptor when vCPU is preempted.

sched out:
- Set 'SN' to suppress future non-urgent interrupts posted for the vCPU.

sched in:
- Clear 'SN'
- Change NDST if vCPU is scheduled to a different CPU
- Set 'NV' to POSTED_INTR_VECTOR

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/kvm/vmx.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 234f720..9c87064 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -45,6 +45,7 @@
 #include <asm/debugreg.h>
 #include <asm/kexec.h>
 #include <asm/apic.h>
+#include <asm/irq_remapping.h>

 #include "trace.h"
 #include "pmu.h"
@@ -2001,10 +2002,60 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
 		vmx->loaded_vmcs->cpu = cpu;
 	}
+
+	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
+		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+		struct pi_desc old, new;
+		unsigned int dest;
+
+		do {
+			old.control = new.control = pi_desc->control;
+
+			/*
+			 * If 'nv' field is POSTED_INTR_WAKEUP_VECTOR, there
+			 * are two possible cases:
+			 * 1. After running 'pi_pre_block', context switch
+			 *    happened. For this case, 'sn' was set in
+			 *    vmx_vcpu_put(), so we need to clear it here.
+			 * 2. After running 'pi_pre_block', we were blocked,
+			 *    and woken up by some other guy. For this case,
+			 *    we don't need to do anything, 'pi_post_block'
+			 *    will do everything for us. However, we cannot
+			 *    check whether it is case #1 or case #2 here
+			 *    (maybe, not needed), so we also clear sn here,
+			 *    I think it is not a big deal.
+			 */
+			if (pi_desc->nv != POSTED_INTR_WAKEUP_VECTOR) {
+				if (vcpu->cpu != cpu) {
+					dest = cpu_physical_id(cpu);
+
+					if (x2apic_enabled())
+						new.ndst = dest;
+					else
+						new.ndst = (dest << 8) & 0xFF00;
+				}
+
+				/* set 'NV' to 'notification vector' */
+				new.nv = POSTED_INTR_VECTOR;
+			}
+
+			/* Allow posting non-urgent interrupts */
+			new.sn = 0;
+		} while (cmpxchg(&pi_desc->control, old.control,
+				new.control) != old.control);
+	}
 }

 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
+		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+		/* Set SN when the vCPU is preempted */
+		if (vcpu->preempted)
+			pi_set_sn(pi_desc);
+	}
+
 	__vmx_load_host_state(to_vmx(vcpu));
 	if (!vmm_exclusive) {
 		__loaded_vmcs_clear(to_vmx(vcpu)->loaded_vmcs);
--
2.1.0
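The sched-in path updates several fields of the 64-bit control word in one compare-and-exchange loop, because hardware may set bits in the descriptor concurrently. Below is a userspace sketch of that read-modify-retry pattern using C11 atomics instead of the kernel's cmpxchg(). The demo_* names and masks are assumptions derived from the bit layout in patch 01/17 (ON bit 0, SN bit 1, NV bits 23:16, NDST bits 63:32), and the sketch updates NV and NDST unconditionally, unlike the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Field positions inside the 64-bit control word (assumed layout). */
#define DEMO_SN_BIT	(UINT64_C(1) << 1)
#define DEMO_NV_SHIFT	16
#define DEMO_NV_MASK	(UINT64_C(0xff) << DEMO_NV_SHIFT)
#define DEMO_NDST_SHIFT	32
#define DEMO_LOW32_MASK	UINT64_C(0xffffffff)

/*
 * Clear SN, set NV and NDST atomically. If another agent changes the
 * control word between the load and the exchange, the exchange fails,
 * 'old' is refreshed with the current value, and the loop recomputes.
 */
static void demo_pi_sched_in(_Atomic uint64_t *control, uint8_t nv,
			     uint32_t ndst)
{
	uint64_t old = atomic_load(control);
	uint64_t new;

	do {
		new = old & ~DEMO_SN_BIT;
		new = (new & ~DEMO_NV_MASK) | ((uint64_t)nv << DEMO_NV_SHIFT);
		new = (new & DEMO_LOW32_MASK) |
		      ((uint64_t)ndst << DEMO_NDST_SHIFT);
	} while (!atomic_compare_exchange_weak(control, &old, new));
}
```

Bits the loop does not touch, such as ON, are carried over from whatever value was current at the successful exchange, which is the property a plain store could not guarantee.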
[PATCH v7 01/17] KVM: Extend struct pi_desc for VT-d Posted-Interrupts
Extend struct pi_desc for VT-d Posted-Interrupts.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/kvm/vmx.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 83b7b5c..271dd70 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -446,8 +446,24 @@ struct nested_vmx {
 /* Posted-Interrupt Descriptor */
 struct pi_desc {
 	u32 pir[8];     /* Posted interrupt requested */
-	u32 control;	/* bit 0 of control is outstanding notification bit */
-	u32 rsvd[7];
+	union {
+		struct {
+				/* bit 256 - Outstanding Notification */
+			u16	on	: 1,
+				/* bit 257 - Suppress Notification */
+				sn	: 1,
+				/* bit 271:258 - Reserved */
+				rsvd_1	: 14;
+				/* bit 279:272 - Notification Vector */
+			u8	nv;
+				/* bit 287:280 - Reserved */
+			u8	rsvd_2;
+				/* bit 319:288 - Notification Destination */
+			u32	ndst;
+		};
+		u64 control;
+	};
+	u32 rsvd[6];
 } __aligned(64);
 
 static bool pi_test_and_set_on(struct pi_desc *pi_desc)
-- 
2.1.0
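As a sanity check of the layout above, here is a small userspace sketch that mirrors the structure with fixed-width types and verifies its size and the position of the control word. The names pi_desc_demo and demo_control() are made up for this illustration, and the bit-level check assumes a little-endian target with GCC/Clang bitfield allocation (lowest bits first), which is the case the kernel code targets but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the extended descriptor (illustrative only). */
struct pi_desc_demo {
    uint32_t pir[8];                 /* bits 0..255: posted interrupt requests */
    union {
        struct {
            uint16_t on     : 1,     /* bit 256 */
                     sn     : 1,     /* bit 257 */
                     rsvd_1 : 14;    /* bits 271:258 */
            uint8_t  nv;             /* bits 279:272 */
            uint8_t  rsvd_2;         /* bits 287:280 */
            uint32_t ndst;           /* bits 319:288 */
        };
        uint64_t control;            /* the same 64 bits, for cmpxchg */
    };
    uint32_t rsvd[6];
} __attribute__((aligned(64)));

/* The descriptor must stay exactly one 64-byte block. */
static_assert(sizeof(struct pi_desc_demo) == 64, "descriptor must be 64 bytes");
/* Byte 32 is bit 256, where the control fields start. */
static_assert(offsetof(struct pi_desc_demo, control) == 32, "control at bit 256");

/* Fill individual fields and return the aliased 64-bit control word. */
static uint64_t demo_control(uint8_t nv, uint32_t ndst, int sn)
{
    struct pi_desc_demo d = {{0}};

    d.nv = nv;
    d.ndst = ndst;
    d.sn = sn ? 1 : 0;
    return d.control;
}
```

Shrinking rsvd from 7 to 6 words is what keeps the total at 64 bytes once control grows from 32 to 64 bits.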
[PATCH v7 17/17] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts
Enable VT-d Posted-Interrupts and add a command line
parameter for it.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 Documentation/kernel-parameters.txt |  1 +
 drivers/iommu/irq_remapping.c       | 12
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1d6f045..52aca36 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1547,6 +1547,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 		nosid	disable Source ID checking
 		no_x2apic_optout
 			BIOS x2APIC opt-out request will be ignored
+		nopost	disable Interrupt Posting
 
 	iomem=		Disable strict checking of access to MMIO memory
 		strict	regions from userspace.
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 2d99930..d8c3997 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -22,7 +22,7 @@ int irq_remap_broken;
 int disable_sourceid_checking;
 int no_x2apic_optout;
 
-int disable_irq_post = 1;
+int disable_irq_post = 0;
 
 static int disable_irq_remap;
 static struct irq_remap_ops *remap_ops;
@@ -58,14 +58,18 @@ static __init int setup_irqremap(char *str)
 		return -EINVAL;
 
 	while (*str) {
-		if (!strncmp(str, "on", 2))
+		if (!strncmp(str, "on", 2)) {
 			disable_irq_remap = 0;
-		else if (!strncmp(str, "off", 3))
+			disable_irq_post = 0;
+		} else if (!strncmp(str, "off", 3)) {
 			disable_irq_remap = 1;
-		else if (!strncmp(str, "nosid", 5))
+			disable_irq_post = 1;
+		} else if (!strncmp(str, "nosid", 5))
 			disable_sourceid_checking = 1;
 		else if (!strncmp(str, "no_x2apic_optout", 16))
 			no_x2apic_optout = 1;
+		else if (!strncmp(str, "nopost", 6))
+			disable_irq_post = 1;
 
 		str += strcspn(str, ",");
 		while (*str == ',')
-- 
2.1.0
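To see how the comma-separated option string is consumed after this change, the parsing loop can be lifted into a standalone function. This is a sketch for experimentation, not the kernel function: struct demo_opts and demo_parse_irqremap() are hypothetical names, and the globals of irq_remapping.c are replaced by a returned struct.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the file-scope flags in irq_remapping.c. */
struct demo_opts {
    int disable_irq_remap;
    int disable_irq_post;
    int disable_sourceid_checking;
    int no_x2apic_optout;
};

/*
 * Walk a comma-separated option string the same way the patched
 * setup_irqremap() does: match a keyword by prefix, then skip to the
 * next comma-delimited token. Later tokens override earlier ones.
 */
static struct demo_opts demo_parse_irqremap(const char *str)
{
    struct demo_opts o = {0, 0, 0, 0};

    while (*str) {
        if (!strncmp(str, "on", 2)) {
            o.disable_irq_remap = 0;
            o.disable_irq_post = 0;
        } else if (!strncmp(str, "off", 3)) {
            o.disable_irq_remap = 1;
            o.disable_irq_post = 1;
        } else if (!strncmp(str, "nosid", 5)) {
            o.disable_sourceid_checking = 1;
        } else if (!strncmp(str, "no_x2apic_optout", 16)) {
            o.no_x2apic_optout = 1;
        } else if (!strncmp(str, "nopost", 6)) {
            o.disable_irq_post = 1;
        }

        str += strcspn(str, ",");   /* jump to the next comma (or the end) */
        while (*str == ',')
            str++;                  /* tolerate empty tokens such as ",," */
    }
    return o;
}
```

With this model, "on,nopost" leaves remapping enabled while turning posting off, whereas "off" disables both, which matches the intent that posting depends on remapping.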
[PATCH v7 15/17] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
This patch updates the Posted-Interrupts Descriptor when vCPU
is blocked.

pre-block:
- Add the vCPU to the blocked per-CPU list
- Set 'NV' to POSTED_INTR_WAKEUP_VECTOR

post-block:
- Remove the vCPU from the per-CPU list

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/include/asm/kvm_host.h |   5 ++
 arch/x86/kvm/vmx.c              | 151
 arch/x86/kvm/x86.c              |  55 ---
 include/linux/kvm_host.h        |   3 +
 virt/kvm/kvm_main.c             |   3 +
 5 files changed, 207 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 22269b4..32af275 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -554,6 +554,8 @@ struct kvm_vcpu_arch {
 	 */
 	bool write_fault_to_shadow_pgtable;
 
+	bool halted;
+
 	/* set at EPT violation at this point */
 	unsigned long exit_qualification;
 
@@ -868,6 +870,9 @@ struct kvm_x86_ops {
 
 	void (*pi_clear_sn)(struct kvm_vcpu *vcpu);
 	void (*pi_set_sn)(struct kvm_vcpu *vcpu);
+
+	int (*pi_pre_block)(struct kvm_vcpu *vcpu);
+	void (*pi_post_block)(struct kvm_vcpu *vcpu);
 	/* pmu operations of sub-arch */
 	const struct kvm_pmu_ops *pmu_ops;
 };
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9c87064..64e35ea 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -888,6 +888,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
 static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
 static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
 
+/*
+ * We maintain a per-CPU linked-list of vCPU, so in wakeup_handler() we
+ * can find which vCPU should be woken up.
+ */
+static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
+static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
+
 static unsigned long *vmx_io_bitmap_a;
 static unsigned long *vmx_io_bitmap_b;
 static unsigned long *vmx_msr_bitmap_legacy;
@@ -2981,6 +2988,8 @@ static int hardware_enable(void)
 		return -EBUSY;
 
 	INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
+	INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
+	spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
 
 	/*
 	 * Now we can enable the vmclear operation in kdump
@@ -6106,6 +6115,25 @@ static void update_ple_window_actual_max(void)
 					    ple_window_grow, INT_MIN);
 }
 
+/*
+ * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
+ */
+static void wakeup_handler(void)
+{
+	struct kvm_vcpu *vcpu;
+	int cpu = smp_processor_id();
+
+	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+	list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
+			blocked_vcpu_list) {
+		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+		if (pi_test_on(pi_desc) == 1)
+			kvm_vcpu_kick(vcpu);
+	}
+	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+}
+
 static __init int hardware_setup(void)
 {
 	int r = -ENOMEM, i, msr;
@@ -6290,6 +6318,8 @@ static __init int hardware_setup(void)
 		kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
 	}
 
+	kvm_set_posted_intr_wakeup_handler(wakeup_handler);
+
 	return alloc_kvm_area();
 
 out8:
@@ -10414,6 +10444,124 @@ static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_mmu_clear_dirty_pt_masked(kvm, memslot, offset, mask);
 }
 
+/*
+ * This routine does the following things for vCPU which is going
+ * to be blocked if VT-d PI is enabled.
+ * - Store the vCPU to the wakeup list, so when interrupts happen
+ *   we can find the right vCPU to wake up.
+ * - Change the Posted-interrupt descriptor as below:
+ *      'NDST' <-- vcpu->pre_pcpu
+ *      'NV' <-- POSTED_INTR_WAKEUP_VECTOR
+ * - If 'ON' is set during this process, which means at least one
+ *   interrupt is posted for this vCPU, we cannot block it, in
+ *   this case, return 1, otherwise, return 0.
+ *
+ */
+static int vmx_pi_pre_block(struct kvm_vcpu *vcpu)
+{
+	unsigned long flags;
+	unsigned int dest;
+	struct pi_desc old, new;
+	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+	if (!irq_remapping_cap(IRQ_POSTING_CAP))
+		return 0;
+
+	vcpu->pre_pcpu = vcpu->cpu;
+	spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+			  vcpu->pre_pcpu), flags);
+	list_add_tail(&vcpu->blocked_vcpu_list,
+		      &per_cpu(blocked_vcpu_on_cpu,
+		      vcpu->pre_pcpu));
+	spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
+			       vcpu->pre_pcpu), flags);
+
+	do {
+		old.control = new.control = pi_desc->control;
+
+		/*
+		 * We should not block the vCPU if
+		 * an interrupt is posted for it.
+
[PATCH v7 04/17] KVM: Get Posted-Interrupts descriptor address from 'struct kvm_vcpu'
Define an interface to get PI descriptor address from the vCPU structure.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/vmx.c              | 11 +++
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index af11bca..d50c1d3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -858,6 +858,8 @@ struct kvm_x86_ops {
 	void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
 					   struct kvm_memory_slot *slot,
 					   gfn_t offset, unsigned long mask);
+
+	u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
 	/* pmu operations of sub-arch */
 	const struct kvm_pmu_ops *pmu_ops;
 };
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 316f9bf..81a995c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -610,6 +610,10 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
 #define FIELD64(number, name)	[number] = VMCS12_OFFSET(name), \
 				[number##_HIGH] = VMCS12_OFFSET(name)+4
 
+struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
+{
+	return &(to_vmx(vcpu)->pi_desc);
+}
 
 static unsigned long shadow_read_only_fields[] = {
 	/*
@@ -4487,6 +4491,11 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu)
 	return;
 }
 
+static u64 vmx_get_pi_desc_addr(struct kvm_vcpu *vcpu)
+{
+	return __pa((u64)vcpu_to_pi_desc(vcpu));
+}
+
 /*
  * Set up the vmcs's constant host-state fields, i.e., host-state fields that
  * will not change in the lifetime of the guest.
@@ -10460,6 +10469,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.flush_log_dirty = vmx_flush_log_dirty,
 	.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
 
+	.get_pi_desc_addr = vmx_get_pi_desc_addr,
+
 	.pmu_ops = &intel_pmu_ops,
 };
-- 
2.1.0
[PATCH v7 07/17] KVM: make kvm_set_msi_irq() public
Make kvm_set_msi_irq() public, we can use this function outside.

Signed-off-by: Feng Wu <feng...@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 4
 arch/x86/kvm/irq_comm.c         | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c4f99f1..82d0709 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -175,6 +175,8 @@ enum {
  */
 #define KVM_APIC_PV_EOI_PENDING	1
 
+struct kvm_kernel_irq_routing_entry;
+
 /*
  * We don't want allocation failures within the mmu code, so we preallocate
  * enough memory for a single page fault in a cache.
@@ -1212,4 +1214,6 @@ int x86_set_memory_region(struct kvm *kvm,
 bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			     struct kvm_vcpu **dest_vcpu);
 
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+		     struct kvm_lapic_irq *irq);
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index a9572a13..1319c60 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -91,8 +91,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 	return r;
 }
 
-static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
-				   struct kvm_lapic_irq *irq)
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+		     struct kvm_lapic_irq *irq)
 {
 	trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
 
-- 
2.1.0
Re: [PATCH v2 09/15] KVM: arm64: implement basic ITS register handlers
Hi Eric, diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c index 659dd39..b498f06 100644 --- a/virt/kvm/arm/its-emul.c +++ b/virt/kvm/arm/its-emul.c @@ -32,10 +32,62 @@ #include vgic.h #include its-emul.h +#define BASER_BASE_ADDRESS(x) ((x) 0xf000ULL) + +/* The distributor lock is held by the VGIC MMIO handler. */ static bool handle_mmio_misc_gits(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { + struct vgic_its *its = vcpu-kvm-arch.vgic.its; + u32 reg; + bool was_enabled; + + switch (offset ~3) { + case 0x00: /* GITS_CTLR */ + /* We never defer any command execution. */ + reg = GITS_CTLR_QUIESCENT; + if (its-enabled) + reg |= GITS_CTLR_ENABLE; + was_enabled = its-enabled; + vgic_reg_access(mmio, reg, offset 3, + ACCESS_READ_VALUE | ACCESS_WRITE_VALUE); + its-enabled = !!(reg GITS_CTLR_ENABLE); + return !was_enabled its-enabled; + case 0x04: /* GITS_IIDR */ + reg = (PRODUCT_ID_KVM 24) | (IMPLEMENTER_ARM 0); + vgic_reg_access(mmio, reg, offset 3, + ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED); + break; + case 0x08: /* GITS_TYPER */ + /* + * We use linear CPU numbers for redistributor addressing, + * so GITS_TYPER.PTA is 0. + * To avoid memory waste on the guest side, we keep the + * number of IDBits and DevBits low for the time being. + * This could later be made configurable by userland. + * Since we have all collections in linked list, we claim + * that we can hold all of the collection tables in our + * own memory and that the ITT entry size is 1 byte (the + * smallest possible one). + */ + reg = GITS_TYPER_PLPIS; + reg |= 0xff GITS_TYPER_HWCOLLCNT_SHIFT; + reg |= 0x0f GITS_TYPER_DEVBITS_SHIFT; + reg |= 0x0f GITS_TYPER_IDBITS_SHIFT; + vgic_reg_access(mmio, reg, offset 3, + ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED); + break; + case 0x0c: + /* The upper 32bits of TYPER are all 0 for the time being. + * Should we need more than 256 collections, we can enable + * some bits in here. 
+ */ + vgic_reg_access(mmio, NULL, offset 3, + ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED); + break; + } + return false; } @@ -43,20 +95,142 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { + u32 reg = 0; + int idreg = (offset ~3) + GITS_IDREGS_BASE; + + switch (idreg) { + case GITS_PIDR2: + reg = GIC_PIDR2_ARCH_GICv3; + break; + case GITS_PIDR4: + /* This is a 64K software visible page */ + reg = 0x40; + break; + /* Those are the ID registers for (any) GIC. */ + case GITS_CIDR0: + reg = 0x0d; + break; + case GITS_CIDR1: + reg = 0xf0; + break; + case GITS_CIDR2: + reg = 0x05; + break; + case GITS_CIDR3: + reg = 0xb1; + break; + } + vgic_reg_access(mmio, reg, offset 3, + ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED); return false; } +static int vits_handle_command(struct kvm_vcpu *vcpu, u64 *its_cmd) +{ + return -ENODEV; +} + static bool handle_mmio_gits_cbaser(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { + struct vgic_its *its = vcpu-kvm-arch.vgic.its; + int mode = ACCESS_READ_VALUE; + + mode |= its-enabled ? ACCESS_WRITE_IGNORED : ACCESS_WRITE_VALUE; + + vgic_handle_base_register(vcpu, mmio, offset, its-cbaser, mode); + + /* Writing CBASER resets the read pointer. */ + if (mmio-is_write) + its-creadr = 0; + return false; } +static int its_cmd_buffer_size(struct kvm *kvm) +{ + struct vgic_its *its = kvm-arch.vgic.its; + + return ((its-cbaser 0xff) + 1) 12; +} + +static gpa_t its_cmd_buffer_base(struct kvm *kvm) +{ + struct vgic_its *its = kvm-arch.vgic.its; + + return BASER_BASE_ADDRESS(its-cbaser); +} + +/* + * By writing to CWRITER the guest announces new commands to be processed. + * Since we cannot read from guest memory inside the ITS
Re: KVM slow LAMP guest
On 8/25/15 11:42 PM, Hansa wrote:
> On 24-8-2015 1:26, Wanpeng Li wrote:
>> On 8/24/15 3:18 AM, Hansa wrote:
>>> On 16-7-2015 13:27, Paolo Bonzini wrote:
>>>> On 15/07/2015 22:02, C. Bröcker wrote:
>>>>>> What OS is this? Is it RHEL/CentOS? If so, halt_poll_ns will be in 6.7
>>>>>> which will be out in a few days/weeks.
>>>>>>
>>>>>> Paolo
>>>>> OK. As said CentOS 6.6.
>>>>> But where do I put this parameter?
>>>> You can add kvm.halt_poll_ns=50 to the kernel command line. If you have
>>>> the parameter, you have the /sys/module/kvm/parameters/halt_poll_ns file.
>>> Hi,
>>>
>>> I upgraded to the CentOS 6.7 release which came out last month and as
>>> promised the halt_poll_ns parameter was available.
>>> Last week I tested the availability status every 5 minutes on my
>>> Wordpress VM's with the halt_poll_ns kernel param set on DOM0. I'm
>>> pleased to announce that it solves the problem!
>> How much seconds to load your Wordpress site this time?
>>
>> Regards,
>> Wanpeng Li
> The average is around 0.4 seconds to load my heaviest site on my
> slowest machine.

Nice!

> On the VM server I issued the command below every eleven minutes:
>
> date >> curltest-file; _
> top -b -n 1 | sed -n '7,12p' >> curltest-file; _
> curl -o /dev/null -s -w "time_total: %{time_total}\\n" https://my.domain.com | perl -pe 'BEGIN {use POSIX;} print strftime("%Y-%m-%d %H:%M:%S ", localtime)' >> curltest-file
>
> This gives me the total time for displaying my site on a local machine.
> It also includes a 'top' command to display which processes are running
> at each sample. All is saved in a file called curltest-file.
>
> I found 7 occurrences in my curltest-file of a time_total larger than
> 20 seconds. Top however didn't show any significant CPU or IO activity
> at those sampled times. Further investigations shows me that they are
> related to a known (gravatar) issue in the Wordpress Jetpack plugin.
> I didn't include these samples in the average total.

If you just use halt_poll_ns or both halt_poll_ns and idle=poll in guest?

Regards,
Wanpeng Li

> Cheers and good luck tweaking your sites!
>
> Best,
> Hansa
Re: KVM slow LAMP guest
On 26-8-2015 0:33, Wanpeng Li wrote:
>> On the VM server I issued the command below every eleven minutes:
>>
>> date >> curltest-file; _
>> top -b -n 1 | sed -n '7,12p' >> curltest-file; _
>> curl -o /dev/null -s -w "time_total: %{time_total}\\n" https://my.domain.com | perl -pe 'BEGIN {use POSIX;} print strftime("%Y-%m-%d %H:%M:%S ", localtime)' >> curltest-file
>>
>> This gives me the total time for displaying my site on a local machine.
>> It also includes a 'top' command to display which processes are running
>> at each sample. All is saved in a file called curltest-file.
>>
>> I found 7 occurrences in my curltest-file of a time_total larger than
>> 20 seconds. Top however didn't show any significant CPU or IO activity
>> at those sampled times. Further investigations shows me that they are
>> related to a known (gravatar) issue in the Wordpress Jetpack plugin.
>> I didn't include these samples in the average total.
>
> If you just use halt_poll_ns or both halt_poll_ns and idle=poll in guest?

I just use kvm.halt_poll_ns=50
Should I try some different tests?
Re: [PATCH v7 10/17] KVM: x86: Update IRTE for posted-interrupts
On Tue, 2015-08-25 at 16:50 +0800, Feng Wu wrote:
> This patch adds the routine to update IRTE for posted-interrupts
> when guest changes the interrupt configuration.
>
> Signed-off-by: Feng Wu <feng...@intel.com>
> ---
>  arch/x86/kvm/x86.c | 73 ++
>  1 file changed, 73 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5ef2560..8f09a76 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -63,6 +63,7 @@
>  #include <asm/fpu/internal.h> /* Ugh! */
>  #include <asm/pvclock.h>
>  #include <asm/div64.h>
> +#include <asm/irq_remapping.h>
>
>  #define MAX_IO_MSRS 256
>  #define KVM_MAX_MCE_BANKS 32
> @@ -8248,6 +8249,78 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
>  }
>  EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
>
> +/*
> + * kvm_arch_update_pi_irte - set IRTE for Posted-Interrupts
> + *
> + * @kvm: kvm
> + * @host_irq: host irq of the interrupt
> + * @guest_irq: gsi of the interrupt
> + * @set: set or unset PI
> + * returns 0 on success, < 0 on failure
> + */
> +int kvm_arch_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
> +			    uint32_t guest_irq, bool set)
> +{
> +	struct kvm_kernel_irq_routing_entry *e;
> +	struct kvm_irq_routing_table *irq_rt;
> +	struct kvm_lapic_irq irq;
> +	struct kvm_vcpu *vcpu;
> +	struct vcpu_data vcpu_info;
> +	int idx, ret = -EINVAL;
> +
> +	if (!irq_remapping_cap(IRQ_POSTING_CAP))
> +		return 0;
> +
> +	idx = srcu_read_lock(&kvm->irq_srcu);
> +	irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
> +	BUG_ON(guest_irq >= irq_rt->nr_rt_entries);
> +
> +	hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
> +		if (e->type != KVM_IRQ_ROUTING_MSI)
> +			continue;
> +		/*
> +		 * VT-d PI cannot support posting multicast/broadcast
> +		 * interrupts to a VCPU, we still use interrupt remapping
> +		 * for these kind of interrupts.
> +		 *
> +		 * For lowest-priority interrupts, we only support
> +		 * those with single CPU as the destination, e.g. user
> +		 * configures the interrupts via /proc/irq or uses
> +		 * irqbalance to make the interrupts single-CPU.
> +		 *
> +		 * We will support full lowest-priority interrupt later.
> +		 *
> +		 */
> +
> +		kvm_set_msi_irq(e, &irq);
> +		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
> +			continue;
> +
> +		vcpu_info.pi_desc_addr = kvm_x86_ops->get_pi_desc_addr(vcpu);
> +		vcpu_info.vector = irq.vector;
> +
> +		if (set)
> +			ret = irq_set_vcpu_affinity(host_irq, &vcpu_info);
> +		else {
> +			/* suppress notification event before unposting */
> +			kvm_x86_ops->pi_set_sn(vcpu);
> +			ret = irq_set_vcpu_affinity(host_irq, NULL);
> +			kvm_x86_ops->pi_clear_sn(vcpu);
> +		}

Can we add trace events so that we have a way to tell when PI is being
enabled/disabled other than performance heuristics?

Thanks,
Alex

> +
> +		if (ret < 0) {
> +			printk(KERN_INFO "%s: failed to update PI IRTE\n",
> +					__func__);
> +			goto out;
> +		}
> +	}
> +
> +	ret = 0;
> +out:
> +	srcu_read_unlock(&kvm->irq_srcu, idx);
> +	return ret;
> +}
> +
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses
On Tue, 25 Aug 2015 17:05:47 +0800 Jason Wang jasow...@redhat.com wrote: We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS and another is KVM_FAST_MMIO_BUS. This leads to issue: - kvm_io_bus_destroy() knows nothing about the devices on two buses points to a single dev. Which will lead double free [1] during exit. - wildcard eventfd ignores data len, so it was registered as a kvm_io_range with zero length. This will fail the binary search in kvm_io_bus_get_first_dev() when we try to emulate through KVM_MMIO_BUS. This will cause userspace io emulation request instead of a eventfd notification (virtqueue kick will be trapped by qemu instead of vhost in this case). Fixing this by don't register wildcard mmio eventfd on two buses. Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the double free issue of kvm_io_bus_destroy(). For the arch/setups that does not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try KVM_FAST_MMIO_BUS first to see it it has a match. 
[1] Panic caused by double free: CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013 task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f RIP: 0010:[c07e25d8] [c07e25d8] ioeventfd_release+0x28/0x60 [kvm] RSP: 0018:88020e7f3bc8 EFLAGS: 00010292 RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900 RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000 R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880 FS: 7fc1ee3e6700() GS:88023e24() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0 Stack: 88021e7cc000 88020e7f3be8 c07e2622 88020e7f3c38 c07df69a 880232524160 88020e792d80 880219b78c00 0008 8802321686a8 Call Trace: [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm] [c07df69a] kvm_put_kvm+0xca/0x210 [kvm] [c07df818] kvm_vcpu_release+0x18/0x20 [kvm] [811f69f7] __fput+0xe7/0x250 [811f6bae] fput+0xe/0x10 [81093f04] task_work_run+0xd4/0xf0 [81079358] do_exit+0x368/0xa50 [81082c8f] ? recalc_sigpending+0x1f/0x60 [81079ad5] do_group_exit+0x45/0xb0 [81085c71] get_signal+0x291/0x750 [810144d8] do_signal+0x28/0xab0 [810f3a3b] ? do_futex+0xdb/0x5d0 [810b7028] ? __wake_up_locked_key+0x18/0x20 [810f3fa6] ? SyS_futex+0x76/0x170 [81014fc9] do_notify_resume+0x69/0xb0 [817cb9af] int_signal+0x12/0x17 Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 01 10 00 00 RIP [c07e25d8] ioeventfd_release+0x28/0x60 [kvm] RSP 88020e7f3bc8 Cc: Gleb Natapov g...@kernel.org Cc: Paolo Bonzini pbonz...@redhat.com Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- Changes from V2: - Tweak styles and comment suggested by Cornelia. 
Changes from v1: - change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when needed to save lots of unnecessary changes. --- virt/kvm/eventfd.c | 31 +-- virt/kvm/kvm_main.c | 16 ++-- 2 files changed, 23 insertions(+), 24 deletions(-) Acked-by: Cornelia Huck cornelia.h...@de.ibm.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
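The second problem in the commit message (a zero-length range defeating the bus lookup) can be reproduced with a toy model. The sketch below is a simplified stand-in, not the kernel's kvm_io_bus code: struct demo_range, demo_range_cmp() and demo_find() are hypothetical, and the comparison rule (a key matches only if it lies entirely inside the registered range) is an assumption chosen to mirror the behaviour the commit message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registered I/O range: start address plus length in bytes. */
struct demo_range {
    uint64_t addr;
    int len;
};

/* 0 if 'key' lies entirely inside 'entry', otherwise an ordering hint. */
static int demo_range_cmp(const struct demo_range *key,
                          const struct demo_range *entry)
{
    if (key->addr < entry->addr)
        return -1;
    if (key->addr + key->len > entry->addr + entry->len)
        return 1;
    return 0;
}

/* Binary search over a table sorted by address; returns index or -1. */
static int demo_find(const struct demo_range *tbl, size_t n,
                     const struct demo_range *key)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = demo_range_cmp(key, &tbl[mid]);

        if (c == 0)
            return (int)mid;
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}
```

In this model a 4-byte write at the address of a wildcard (len == 0) entry never compares equal, because the key always extends past the end of the zero-length range. That is the reason a write emulated through the ordinary MMIO bus would miss the eventfd and fall back to userspace.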
Re: [PATCH V3 3/3] kvm: add tracepoint for fast mmio
On Tue, Aug 25, 2015 at 05:05:48PM +0800, Jason Wang wrote:
> Cc: Gleb Natapov <g...@kernel.org>
> Cc: Paolo Bonzini <pbonz...@redhat.com>
> Cc: Michael S. Tsirkin <m...@redhat.com>
> Signed-off-by: Jason Wang <jasow...@redhat.com>
> ---
>  arch/x86/kvm/trace.h | 17 +
>  arch/x86/kvm/vmx.c   |  1 +
>  arch/x86/kvm/x86.c   |  1 +
>  3 files changed, 19 insertions(+)
>
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 4eae7c3..2d4e81a 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -128,6 +128,23 @@ TRACE_EVENT(kvm_pio,
>  		  __entry->count > 1 ? "(...)" : "")
>  );
>
> +TRACE_EVENT(kvm_fast_mmio,
> +	TP_PROTO(u64 gpa),
> +	TP_ARGS(gpa),
> +
> +	TP_STRUCT__entry(
> +		__field(u64,	gpa)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->gpa	= gpa;
> +	),
> +
> +	TP_printk("fast mmio at gpa 0x%llx", __entry->gpa)
> +);
> +
> +
> +

don't add multiple empty lines pls.

>  /*
>   * Tracepoint for cpuid.
>   */
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 83b7b5c..a55d279 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5831,6 +5831,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
>  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
>  	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
>  		skip_emulated_instruction(vcpu);
> +		trace_kvm_fast_mmio(gpa);
>  		return 1;
>  	}
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8f0f6ec..36cf78e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8254,6 +8254,7 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
>  EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
>
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
> --
> 2.1.4
Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses
On Tue, Aug 25, 2015 at 05:05:47PM +0800, Jason Wang wrote: We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS and another is KVM_FAST_MMIO_BUS. This leads to issue: - kvm_io_bus_destroy() knows nothing about the devices on two buses points to a single dev. Which will lead double free [1] during exit. - wildcard eventfd ignores data len, so it was registered as a kvm_io_range with zero length. This will fail the binary search in kvm_io_bus_get_first_dev() when we try to emulate through KVM_MMIO_BUS. This will cause userspace io emulation request instead of a eventfd notification (virtqueue kick will be trapped by qemu instead of vhost in this case). Fixing this by don't register wildcard mmio eventfd on two buses. Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the double free issue of kvm_io_bus_destroy(). For the arch/setups that does not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try KVM_FAST_MMIO_BUS first to see it it has a match. [1] Panic caused by double free: CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013 task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f RIP: 0010:[c07e25d8] [c07e25d8] ioeventfd_release+0x28/0x60 [kvm] RSP: 0018:88020e7f3bc8 EFLAGS: 00010292 RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900 RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000 R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880 FS: 7fc1ee3e6700() GS:88023e24() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0 Stack: 88021e7cc000 88020e7f3be8 c07e2622 88020e7f3c38 c07df69a 880232524160 88020e792d80 880219b78c00 0008 8802321686a8 Call Trace: [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm] [c07df69a] kvm_put_kvm+0xca/0x210 [kvm] [c07df818] kvm_vcpu_release+0x18/0x20 [kvm] [811f69f7] 
__fput+0xe7/0x250 [811f6bae] fput+0xe/0x10 [81093f04] task_work_run+0xd4/0xf0 [81079358] do_exit+0x368/0xa50 [81082c8f] ? recalc_sigpending+0x1f/0x60 [81079ad5] do_group_exit+0x45/0xb0 [81085c71] get_signal+0x291/0x750 [810144d8] do_signal+0x28/0xab0 [810f3a3b] ? do_futex+0xdb/0x5d0 [810b7028] ? __wake_up_locked_key+0x18/0x20 [810f3fa6] ? SyS_futex+0x76/0x170 [81014fc9] do_notify_resume+0x69/0xb0 [817cb9af] int_signal+0x12/0x17 Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 01 10 00 00 RIP [c07e25d8] ioeventfd_release+0x28/0x60 [kvm] RSP 88020e7f3bc8 Cc: Gleb Natapov g...@kernel.org Cc: Paolo Bonzini pbonz...@redhat.com Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- Changes from V2: - Tweak styles and comment suggested by Cornelia. Changes from v1: - change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when needed to save lots of unnecessary changes. --- virt/kvm/eventfd.c | 31 +-- virt/kvm/kvm_main.c | 16 ++-- 2 files changed, 23 insertions(+), 24 deletions(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 9ff4193..c3ffdc3 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -762,13 +762,16 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p) return false; } -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags) +static enum kvm_bus ioeventfd_bus_from_args(struct kvm_ioeventfd *args) { - if (flags KVM_IOEVENTFD_FLAG_PIO) + if (args-flags KVM_IOEVENTFD_FLAG_PIO) return KVM_PIO_BUS; - if (flags KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY) + if (args-flags KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY) return KVM_VIRTIO_CCW_NOTIFY_BUS; - return KVM_MMIO_BUS; + /* When length is ignored, MMIO is put on a separate bus, for + * faster lookups. + */ + return args-len ? 
KVM_MMIO_BUS : KVM_FAST_MMIO_BUS; } static int @@ -779,7 +782,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) struct eventfd_ctx *eventfd; int ret; - bus_idx = ioeventfd_bus_from_flags(args-flags); + bus_idx = ioeventfd_bus_from_args(args); /* must be natural-word sized, or 0 to ignore length */ switch (args-len) { case 0: @@ -843,16 +846,6 @@
[PATCH v3 1/4] irqchip: GICv3: Convert to EOImode == 1
So far, GICv3 has been used with EOImode == 0. The effect of this
mode is to perform the priority drop and the deactivation of the
interrupt at the same time.

While this works perfectly for Linux (we only have a single priority),
it causes issues when an interrupt is forwarded to a guest, and when
we want the guest to perform the EOI itself.

For this case, the GIC architecture provides EOImode == 1, where:
- A write to ICC_EOIR1_EL1 drops the priority of the interrupt and
  leaves it active. Other interrupts at the same priority level can
  now be taken, but the active interrupt cannot be taken again
- A write to ICC_DIR_EL1 marks the interrupt as inactive, meaning
  it can now be taken again.

This patch converts the driver to be able to use this new mode,
depending on whether or not the kernel can behave as a hypervisor.
No feature change.

Reviewed-by: Eric Auger <eric.au...@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
---
 drivers/irqchip/irq-gic-v3.c       | 39 ++
 include/linux/irqchip/arm-gic-v3.h |  9 +
 2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index c52f7ba..addd2ee 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -30,6 +30,7 @@
 #include <asm/cputype.h>
 #include <asm/exception.h>
 #include <asm/smp_plat.h>
+#include <asm/virt.h>
 
 #include "irq-gic-common.h"
 #include "irqchip.h"
@@ -50,6 +51,7 @@ struct gic_chip_data {
 };
 
 static struct gic_chip_data gic_data __read_mostly;
+static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE;
 
 #define gic_data_rdist()		(this_cpu_ptr(gic_data.rdists.rdist))
 #define gic_data_rdist_rd_base()	(gic_data_rdist()->rd_base)
@@ -293,7 +295,14 @@ static int gic_irq_get_irqchip_state(struct irq_data *d,
 
 static void gic_eoi_irq(struct irq_data *d)
 {
-	gic_write_eoir(gic_irq(d));
+	if (static_key_true(&supports_deactivate)) {
+		/* No need to deactivate an LPI */
+		if (gic_irq(d) >= 8192)
+			return;
+		gic_write_dir(gic_irq(d));
+	} else {
+		gic_write_eoir(gic_irq(d));
+	}
 }
 
 static int gic_set_type(struct irq_data *d, unsigned int type)
@@ -343,15 +352,26 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
 
 		if (likely(irqnr > 15 && irqnr < 1020) || irqnr >= 8192) {
 			int err;
+
+			if (static_key_true(&supports_deactivate))
+				gic_write_eoir(irqnr);
+
 			err = handle_domain_irq(gic_data.domain, irqnr, regs);
 			if (err) {
 				WARN_ONCE(true, "Unexpected interrupt received!\n");
-				gic_write_eoir(irqnr);
+				if (static_key_true(&supports_deactivate)) {
+					if (irqnr < 8192)
+						gic_write_dir(irqnr);
+				} else {
+					gic_write_eoir(irqnr);
+				}
 			}
 			continue;
 		}
 		if (irqnr < 16) {
 			gic_write_eoir(irqnr);
+			if (static_key_true(&supports_deactivate))
+				gic_write_dir(irqnr);
 #ifdef CONFIG_SMP
 			handle_IPI(irqnr, regs);
 #else
@@ -451,8 +471,13 @@ static void gic_cpu_sys_reg_init(void)
 	/* Set priority mask register */
 	gic_write_pmr(DEFAULT_PMR_VALUE);
 
-	/* EOI deactivates interrupt too (mode 0) */
-	gic_write_ctlr(ICC_CTLR_EL1_EOImode_drop_dir);
+	if (static_key_true(&supports_deactivate)) {
+		/* EOI drops priority only (mode 1) */
+		gic_write_ctlr(ICC_CTLR_EL1_EOImode_drop);
+	} else {
+		/* EOI deactivates interrupt too (mode 0) */
+		gic_write_ctlr(ICC_CTLR_EL1_EOImode_drop_dir);
+	}
 
 	/* ... and let's hit the road... */
 	gic_write_grpen1(1);
@@ -820,6 +845,12 @@ static int __init gic_of_init(struct device_node *node, struct device_node *pare
 	if (of_property_read_u64(node, "redistributor-stride", &redist_stride))
 		redist_stride = 0;
 
+	if (!is_hyp_mode_available())
+		static_key_slow_dec(&supports_deactivate);
+
+	if (static_key_true(&supports_deactivate))
+		pr_info("GIC: Using split EOI/Deactivate mode\n");
+
 	gic_data.dist_base = dist_base;
 	gic_data.redist_regions = rdist_regs;
 	gic_data.nr_redist_regions = nr_redist_regions;
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index ffbc034..bc98832 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++
[PATCH v3 3/4] irqchip: GIC: Convert to EOImode == 1
So far, GICv2 has been used with EOImode == 0. The effect of this mode is to perform the priority drop and the deactivation of the interrupt at the same time. While this works perfectly for Linux (we only have a single priority), it causes issues when an interrupt is forwarded to a guest, and when we want the guest to perform the EOI itself. For this case, the GIC architecture provides EOImode == 1, where: - A write to the EOI register drops the priority of the interrupt and leaves it active. Other interrupts at the same priority level can now be taken, but the active interrupt cannot be taken again - A write to the DIR marks the interrupt as inactive, meaning it can now be taken again. We only enable this feature when booted in HYP mode and that the device-tree reported a suitable CPU interface. Observable behaviour should remain unchanged. Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- drivers/irqchip/irq-gic.c | 51 +++-- include/linux/irqchip/arm-gic.h | 4 2 files changed, 53 insertions(+), 2 deletions(-) diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index 4dd8826..505aaf3 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -46,6 +46,7 @@ #include asm/irq.h #include asm/exception.h #include asm/smp_plat.h +#include asm/virt.h #include irq-gic-common.h #include irqchip.h @@ -82,6 +83,8 @@ static DEFINE_RAW_SPINLOCK(irq_controller_lock); #define NR_GIC_CPU_IF 8 static u8 gic_cpu_map[NR_GIC_CPU_IF] __read_mostly; +static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE; + #ifndef MAX_GIC_NR #define MAX_GIC_NR 1 #endif @@ -137,6 +140,14 @@ static inline unsigned int gic_irq(struct irq_data *d) return d-hwirq; } +static inline bool primary_gic_irq(struct irq_data *d) +{ + if (MAX_GIC_NR 1) + return irq_data_get_irq_chip_data(d) == gic_data[0]; + + return true; +} + /* * Routines to acknowledge, disable and enable interrupts */ @@ -164,7 +175,14 @@ static void gic_unmask_irq(struct irq_data *d) static void 
gic_eoi_irq(struct irq_data *d) { - writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_EOI); + u32 deact_offset = GIC_CPU_EOI; + + if (static_key_true(supports_deactivate)) { + if (primary_gic_irq(d)) + deact_offset = GIC_CPU_DEACTIVATE; + } + + writel_relaxed(gic_irq(d), gic_cpu_base(d) + deact_offset); } static int gic_irq_set_irqchip_state(struct irq_data *d, @@ -272,11 +290,15 @@ static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs) irqnr = irqstat GICC_IAR_INT_ID_MASK; if (likely(irqnr 15 irqnr 1021)) { + if (static_key_true(supports_deactivate)) + writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI); handle_domain_irq(gic-domain, irqnr, regs); continue; } if (irqnr 16) { writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI); + if (static_key_true(supports_deactivate)) + writel_relaxed(irqstat, cpu_base + GIC_CPU_DEACTIVATE); #ifdef CONFIG_SMP handle_IPI(irqnr, regs); #endif @@ -359,6 +381,10 @@ static void gic_cpu_if_up(void) { void __iomem *cpu_base = gic_data_cpu_base(gic_data[0]); u32 bypass = 0; + u32 mode = 0; + + if (static_key_true(supports_deactivate)) + mode = GIC_CPU_CTRL_EOImodeNS; /* * Preserve bypass disable bits to be written back later @@ -366,7 +392,7 @@ static void gic_cpu_if_up(void) bypass = readl(cpu_base + GIC_CPU_CTRL); bypass = GICC_DIS_BYPASS_MASK; - writel_relaxed(bypass | GICC_ENABLE, cpu_base + GIC_CPU_CTRL); + writel_relaxed(bypass | mode | GICC_ENABLE, cpu_base + GIC_CPU_CTRL); } @@ -986,6 +1012,8 @@ void __init gic_init_bases(unsigned int gic_nr, int irq_start, register_cpu_notifier(gic_cpu_notifier); #endif set_handle_irq(gic_handle_irq); + if (static_key_true(supports_deactivate)) + pr_info(GIC: Using split EOI/Deactivate mode\n); } gic_dist_init(gic); @@ -1001,6 +1029,7 @@ gic_of_init(struct device_node *node, struct device_node *parent) { void __iomem *cpu_base; void __iomem *dist_base; + struct resource cpu_res; u32 percpu_offset; int irq; @@ -1013,6 +1042,16 @@ gic_of_init(struct device_node *node, struct 
device_node *parent) cpu_base = of_iomap(node, 1); WARN(!cpu_base, unable to map gic cpu registers\n); + of_address_to_resource(node, 1, cpu_res); + + /* +* Disable split EOI/Deactivate if either HYP is not available +* or the CPU interface is too small. +*/
Re: [PATCH v2 10/15] KVM: arm64: add data structures to model ITS interrupt translation
Hi Eric, On 13/08/15 16:46, Eric Auger wrote: On 07/10/2015 04:21 PM, Andre Przywara wrote: The GICv3 Interrupt Translation Service (ITS) uses tables in memory to allow a sophisticated interrupt routing. It features device tables, an interrupt table per device and a table connecting collections to actual CPUs (aka. redistributors in the GICv3 lingo). Since the interrupt numbers for the LPIs are allocated quite sparsely and the range can be quite huge (8192 LPIs being the minimum), using bitmaps or arrays for storing information is a waste of memory. We use linked lists instead, which we iterate linearily. This works very well with the actual number of LPIs/MSIs in the guest being quite low. Should the number of LPIs exceed the number where iterating through lists seems acceptable, we can later revisit this and use more efficient data structures. Signed-off-by: Andre Przywara andre.przyw...@arm.com --- include/kvm/arm_vgic.h | 3 +++ virt/kvm/arm/its-emul.c | 48 2 files changed, 51 insertions(+) diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index b432055..1648668 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -25,6 +25,7 @@ #include linux/spinlock.h #include linux/types.h #include kvm/iodev.h +#include linux/list.h #define VGIC_NR_IRQS_LEGACY 256 #define VGIC_NR_SGIS16 @@ -162,6 +163,8 @@ struct vgic_its { u64 cbaser; int creadr; int cwriter; +struct list_headdevice_list; +struct list_headcollection_list; }; struct vgic_dist { diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c index b498f06..7f217fa 100644 --- a/virt/kvm/arm/its-emul.c +++ b/virt/kvm/arm/its-emul.c @@ -21,6 +21,7 @@ #include linux/kvm.h #include linux/kvm_host.h #include linux/interrupt.h +#include linux/list.h #include linux/irqchip/arm-gic-v3.h #include kvm/arm_vgic.h @@ -32,6 +33,25 @@ #include vgic.h #include its-emul.h +struct its_device { +struct list_head dev_list; +struct list_head itt; +u32 device_id; +}; + +struct its_collection { +struct 
list_head coll_list; +u32 collection_id; +u32 target_addr; +}; + +struct its_itte { +struct list_head itte_list; +struct its_collection *collection; +u32 lpi; +u32 event_id; +}; + #define BASER_BASE_ADDRESS(x) ((x) 0xf000ULL) /* The distributor lock is held by the VGIC MMIO handler. */ @@ -311,6 +331,9 @@ int vits_init(struct kvm *kvm) spin_lock_init(its-lock); +INIT_LIST_HEAD(its-device_list); +INIT_LIST_HEAD(its-collection_list); + its-enabled = false; return -ENXIO; @@ -320,11 +343,36 @@ void vits_destroy(struct kvm *kvm) { struct vgic_dist *dist = kvm-arch.vgic; struct vgic_its *its = dist-its; +struct its_device *dev; +struct its_itte *itte; +struct list_head *dev_cur, *dev_temp; +struct list_head *cur, *temp; if (!vgic_has_its(kvm)) return; +if (!its-device_list.next) Why not using list_empty? But I think I would simply remove this since the empty case if handle below... list_empty() requires the list to be initialized before. This check here is to detect that map_resources was never called (this is only done on the first VCPU run) and thus device_list is basically still all zeroes. If we abort the guest without ever running a VCPU (for instance because some initialization failed), we call vits_destroy() anyway (because this is called when tearing down the VGIC device). So the check is here to detect early that vits_destroy() has been called without the ITS ever been fully initialized. This fixed a real bug when the guest start was aborted before the ITS was ever used. I will add a comment to make this clear. +return; + +spin_lock(its-lock); +list_for_each_safe(dev_cur, dev_temp, its-device_list) { +dev = container_of(dev_cur, struct its_device, dev_list); isn't the usage of list_for_each_entry_safe more synthetic here? If I got this correctly, we need the _safe variant if we want to remove the list item within the loop. Or am I missing something here? Cheers, Andre. 
+list_for_each_safe(cur, temp, dev-itt) { +itte = (container_of(cur, struct its_itte, itte_list)); same Eric +list_del(cur); +kfree(itte); +} +list_del(dev_cur); +kfree(dev); +} + +list_for_each_safe(cur, temp, its-collection_list) { +list_del(cur); +kfree(container_of(cur, struct its_collection, coll_list)); +} + kfree(dist-pendbaser); its-enabled = false; +spin_unlock(its-lock); } -- To unsubscribe from this list: send the
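On the question of the _safe iterator: a minimal userspace sketch of why the next pointer has to be saved before an entry is unlinked and freed. The list helpers below are simplified stand-ins written for this illustration, not the kernel's list.h, and its_device_model, add_device() and free_all_devices() are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static void list_add(struct list_head *n, struct list_head *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->next = e->prev = NULL;
}

/* Hypothetical model of an entry embedded in a list. */
struct its_device_model {
    struct list_head dev_list;
    unsigned device_id;
};

#define container_of_model(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static int add_device(struct list_head *head, unsigned id)
{
    struct its_device_model *dev = malloc(sizeof(*dev));

    if (!dev)
        return -1;
    dev->device_id = id;
    list_add(&dev->dev_list, head);
    return 0;
}

/* Frees every entry; returns how many were freed. */
static unsigned free_all_devices(struct list_head *head)
{
    struct list_head *cur = head->next, *tmp;
    unsigned freed = 0;

    while (cur != head) {
        tmp = cur->next;    /* saved first: cur is invalid after free() */
        list_del(cur);
        free(container_of_model(cur, struct its_device_model, dev_list));
        freed++;
        cur = tmp;
    }
    return freed;
}
```

A plain (non-safe) walk would read cur->next after cur had been freed, which is the situation the _safe variants exist to avoid.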
Re: [PATCH V2 3/3] kvm: add tracepoint for fast mmio
On Tue, Aug 25, 2015 at 03:47:15PM +0800, Jason Wang wrote: Cc: Gleb Natapov g...@kernel.org Cc: Paolo Bonzini pbonz...@redhat.com Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- arch/x86/kvm/trace.h | 17 + arch/x86/kvm/vmx.c | 1 + arch/x86/kvm/x86.c | 1 + 3 files changed, 19 insertions(+) diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h index 4eae7c3..2d4e81a 100644 --- a/arch/x86/kvm/trace.h +++ b/arch/x86/kvm/trace.h @@ -128,6 +128,23 @@ TRACE_EVENT(kvm_pio, __entry-count 1 ? (...) : ) ); +TRACE_EVENT(kvm_fast_mmio, + TP_PROTO(u64 gpa), + TP_ARGS(gpa), + + TP_STRUCT__entry( + __field(u64,gpa) + ), + + TP_fast_assign( + __entry-gpa= gpa; + ), + + TP_printk(fast mmio at gpa 0x%llx, __entry-gpa) +); + + + don't add multiple empty lines please. /* * Tracepoint for cpuid. */ diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 83b7b5c..a55d279 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5831,6 +5831,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu) gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS); if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) { skip_emulated_instruction(vcpu); + trace_kvm_fast_mmio(gpa); return 1; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8f0f6ec..36cf78e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8254,6 +8254,7 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm) EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit); +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr); -- 2.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses
On Tue, Aug 25, 2015 at 05:05:47PM +0800, Jason Wang wrote: We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS and another is KVM_FAST_MMIO_BUS. This leads to issue: - kvm_io_bus_destroy() knows nothing about the devices on two buses points to a single dev. Which will lead double free [1] during exit. - wildcard eventfd ignores data len, so it was registered as a kvm_io_range with zero length. This will fail the binary search in kvm_io_bus_get_first_dev() when we try to emulate through KVM_MMIO_BUS. This will cause userspace io emulation request instead of a eventfd notification (virtqueue kick will be trapped by qemu instead of vhost in this case). Fixing this by don't register wildcard mmio eventfd on two buses. Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the double free issue of kvm_io_bus_destroy(). For the arch/setups that does not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try KVM_FAST_MMIO_BUS first to see it it has a match. [1] Panic caused by double free: CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013 task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f RIP: 0010:[c07e25d8] [c07e25d8] ioeventfd_release+0x28/0x60 [kvm] RSP: 0018:88020e7f3bc8 EFLAGS: 00010292 RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900 RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000 R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880 FS: 7fc1ee3e6700() GS:88023e24() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0 Stack: 88021e7cc000 88020e7f3be8 c07e2622 88020e7f3c38 c07df69a 880232524160 88020e792d80 880219b78c00 0008 8802321686a8 Call Trace: [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm] [c07df69a] kvm_put_kvm+0xca/0x210 [kvm] [c07df818] kvm_vcpu_release+0x18/0x20 [kvm] [811f69f7] 
__fput+0xe7/0x250 [811f6bae] fput+0xe/0x10 [81093f04] task_work_run+0xd4/0xf0 [81079358] do_exit+0x368/0xa50 [81082c8f] ? recalc_sigpending+0x1f/0x60 [81079ad5] do_group_exit+0x45/0xb0 [81085c71] get_signal+0x291/0x750 [810144d8] do_signal+0x28/0xab0 [810f3a3b] ? do_futex+0xdb/0x5d0 [810b7028] ? __wake_up_locked_key+0x18/0x20 [810f3fa6] ? SyS_futex+0x76/0x170 [81014fc9] do_notify_resume+0x69/0xb0 [817cb9af] int_signal+0x12/0x17 Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 01 10 00 00 RIP [c07e25d8] ioeventfd_release+0x28/0x60 [kvm] RSP 88020e7f3bc8 Cc: Gleb Natapov g...@kernel.org Cc: Paolo Bonzini pbonz...@redhat.com Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com Saw v3 too late. Pls see my comments on v2. --- Changes from V2: - Tweak styles and comment suggested by Cornelia. Changes from v1: - change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when needed to save lots of unnecessary changes. --- virt/kvm/eventfd.c | 31 +-- virt/kvm/kvm_main.c | 16 ++-- 2 files changed, 23 insertions(+), 24 deletions(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 9ff4193..c3ffdc3 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -762,13 +762,16 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p) return false; } -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags) +static enum kvm_bus ioeventfd_bus_from_args(struct kvm_ioeventfd *args) { - if (flags KVM_IOEVENTFD_FLAG_PIO) + if (args-flags KVM_IOEVENTFD_FLAG_PIO) return KVM_PIO_BUS; - if (flags KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY) + if (args-flags KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY) return KVM_VIRTIO_CCW_NOTIFY_BUS; - return KVM_MMIO_BUS; + /* When length is ignored, MMIO is put on a separate bus, for + * faster lookups. + */ + return args-len ? 
KVM_MMIO_BUS : KVM_FAST_MMIO_BUS; } static int @@ -779,7 +782,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) struct eventfd_ctx *eventfd; int ret; - bus_idx = ioeventfd_bus_from_flags(args-flags); + bus_idx = ioeventfd_bus_from_args(args); /* must be natural-word sized, or 0 to ignore length */ switch (args-len) {
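For readers of the archive, where the & and -> operators were lost: a small sketch of the bus selection as it appears to be intended in the hunk above. The flag bits, enum values and struct here are stand-ins made up for this illustration and are not the kernel's definitions; only the order of the checks follows the patch.

```c
#include <stdint.h>

/* Stand-in flag bits and bus ids for illustration only. */
#define FLAG_PIO                (1u << 0)
#define FLAG_VIRTIO_CCW_NOTIFY  (1u << 1)

enum bus_model {
    BUS_MMIO,
    BUS_PIO,
    BUS_VIRTIO_CCW_NOTIFY,
    BUS_FAST_MMIO
};

/* Hypothetical reduced form of the ioeventfd arguments. */
struct ioeventfd_args_model {
    uint32_t flags;
    uint32_t len;
};

static enum bus_model bus_from_args(const struct ioeventfd_args_model *args)
{
    if (args->flags & FLAG_PIO)
        return BUS_PIO;
    if (args->flags & FLAG_VIRTIO_CCW_NOTIFY)
        return BUS_VIRTIO_CCW_NOTIFY;
    /* len == 0 means the length is ignored (wildcard): fast MMIO bus only. */
    return args->len ? BUS_MMIO : BUS_FAST_MMIO;
}
```

With this shape a wildcard MMIO eventfd ends up on exactly one bus, which is what removes the double registration described in the commit message.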
[PATCH v3 0/4] irqchip: GICv2/v3: Add support for irq_vcpu_affinity
The GICv2 and GICv3 architectures allow an active physical interrupt to be forwarded to a guest, and the guest to indirectly perform the deactivation of the interrupt by performing an EOI on the virtual interrupt (see for example the GICv2 spec, 3.2.1). This allows some substantial performance improvement for level triggered interrupts that otherwise have to be masked/unmasked in VFIO, not to mention the required trap back to KVM when the guest performs an EOI. To enable this, the GICs need to be switched to a different EOImode, where a taken interrupt can be left active (which prevents the same interrupt from being taken again), while other interrupts are still being processed normally. We also use the new irq_set_vcpu_affinity hook that was introduced for Intel's Posted Interrupts to determine whether or not to perform the deactivation at EOI-time. As all of this only makes sense when the kernel can behave as a hypervisor, we only enable this mode on detecting that the kernel was actually booted in HYP mode, and that the GIC supports this feature. This series is a complete rework of a RFC I sent over a year ago: http://lists.infradead.org/pipermail/linux-arm-kernel/2014-June/266328.html Since then, a lot has been either merged (the irqchip_state) or reworked (my active-timer series: http://www.spinics.net/lists/kvm/msg118768.html), and this implements the last few bits for Eric Auger's series to finally make it into the kernel: https://lkml.org/lkml/2015/7/2/268 https://lkml.org/lkml/2015/7/6/291 With all these patches combined, physical interrupt routing from the kernel into a VM becomes possible. This has been tested on Juno (GICv2) and FastModel (GICv3), and Eric tested it on a Calxeda Midway. 
A branch is available at:
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git irq/gic-irq-vcpu-affinity-v2

* From v2:
- Another small fix from Eric
- Some commit message cleanups

* From v1:
- Fixes after review from Eric
- Got rid of the cascaded GICv2 hack (it was broken anyway)
- Folded the LPI deactivation patch (it makes more sense as part of the main one)
- Some clarifying comments about the deactivate on mask
- I haven't retained Eric's Reviewed/Tested-by, as the code has significantly changed on GICv2

Marc Zyngier (4):
irqchip: GICv3: Convert to EOImode == 1
irqchip: GICv3: Don't deactivate interrupts forwarded to a guest
irqchip: GIC: Convert to EOImode == 1
irqchip: GIC: Don't deactivate interrupts forwarded to a guest

drivers/irqchip/irq-gic-v3.c | 70 +--
drivers/irqchip/irq-gic.c | 111 -
include/linux/irqchip/arm-gic-v3.h | 9 +++
include/linux/irqchip/arm-gic.h | 4 ++
4 files changed, 188 insertions(+), 6 deletions(-)

-- 2.1.4
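To make the difference between the two EOI modes described in this cover letter concrete, here is a tiny software model of a single interrupt line. It is only a sketch of the state transitions as the text describes them; irq_model and the model_* helpers are hypothetical names and do not touch any GIC register.

```c
#include <stdbool.h>

/* Hypothetical model of one interrupt line; not driver code. */
enum eoi_mode {
    EOIMODE_DROP_DIR = 0,   /* EOI drops priority and deactivates */
    EOIMODE_DROP = 1        /* EOI drops priority only */
};

struct irq_model {
    bool active;            /* active: cannot be taken again */
    bool priority_raised;   /* running priority not yet dropped */
};

/* Acknowledge: the interrupt becomes active and raises the running priority. */
static void model_ack(struct irq_model *irq)
{
    irq->active = true;
    irq->priority_raised = true;
}

/* EOI write: always drops priority; deactivates only in mode 0. */
static void model_eoi(struct irq_model *irq, enum eoi_mode mode)
{
    irq->priority_raised = false;
    if (mode == EOIMODE_DROP_DIR)
        irq->active = false;
}

/* Deactivate write: the separate second step used in mode 1. */
static void model_dir(struct irq_model *irq)
{
    irq->active = false;
}

/* Can this interrupt be taken again? */
static bool model_can_retake(const struct irq_model *irq)
{
    return !irq->active;
}
```

In mode 1 the window between model_eoi() and model_dir() is where a forwarded interrupt stays active while other interrupts are handled normally; for a forwarded interrupt the second step is left to the guest.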
[PATCH v3 4/4] irqchip: GIC: Don't deactivate interrupts forwarded to a guest
Commit 0a4377de3056 (genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPU) added just what we needed at the lowest level to allow an interrupt to be deactivated by a guest. When such a request reaches the GIC, it knows it doesn't need to perform the deactivation anymore, and can safely leave the guest do its magic. This of course requires additional support in both VFIO and KVM. Reviewed-by: Eric Auger eric.au...@linaro.org Tested-by: Eric Auger eric.au...@linaro.org Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- drivers/irqchip/irq-gic.c | 60 +++ 1 file changed, 60 insertions(+) diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index 505aaf3..5e48850 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -148,6 +148,36 @@ static inline bool primary_gic_irq(struct irq_data *d) return true; } +static inline bool cascading_gic_irq(struct irq_data *d) +{ + void *data = irq_data_get_irq_handler_data(d); + + /* +* If handler_data pointing to one of the secondary GICs, then +* this is a cascading interrupt, and it cannot possibly be +* forwarded. +*/ + if (data = (void *)(gic_data + 1) + data (void *)(gic_data + MAX_GIC_NR)) + return true; + + return false; +} + +static inline bool forwarded_irq(struct irq_data *d) +{ + /* +* A forwarded interrupt: +* - is on the primary GIC +* - has its handler_data set to a value +* - that isn't a secondary GIC +*/ + if (primary_gic_irq(d) d-handler_data !cascading_gic_irq(d)) + return true; + + return false; +} + /* * Routines to acknowledge, disable and enable interrupts */ @@ -166,6 +196,18 @@ static int gic_peek_irq(struct irq_data *d, u32 offset) static void gic_mask_irq(struct irq_data *d) { gic_poke_irq(d, GIC_DIST_ENABLE_CLEAR); + /* +* When masking a forwarded interrupt, make sure it is +* deactivated as well. 
+* +* This ensures that an interrupt that is getting +* disabled/masked will not get stuck, because there is +* noone to deactivate it (guest is being terminated). +*/ + if (static_key_true(supports_deactivate)) { + if (forwarded_irq(d)) + gic_poke_irq(d, GIC_DIST_ACTIVE_CLEAR); + } } static void gic_unmask_irq(struct irq_data *d) @@ -178,6 +220,10 @@ static void gic_eoi_irq(struct irq_data *d) u32 deact_offset = GIC_CPU_EOI; if (static_key_true(supports_deactivate)) { + /* Do not deactivate an IRQ forwarded to a vcpu. */ + if (forwarded_irq(d)) + return; + if (primary_gic_irq(d)) deact_offset = GIC_CPU_DEACTIVATE; } @@ -251,6 +297,19 @@ static int gic_set_type(struct irq_data *d, unsigned int type) return gic_configure_irq(gicirq, type, base, NULL); } +static int gic_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu) +{ + /* Only interrupts on the primary GIC can be forwarded to a vcpu. */ + if (static_key_true(supports_deactivate)) { + if (primary_gic_irq(d) !cascading_gic_irq(d)) { + d-handler_data = vcpu; + return 0; + } + } + + return -EINVAL; +} + #ifdef CONFIG_SMP static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val, bool force) @@ -346,6 +405,7 @@ static struct irq_chip gic_chip = { #endif .irq_get_irqchip_state = gic_irq_get_irqchip_state, .irq_set_irqchip_state = gic_irq_set_irqchip_state, + .irq_set_vcpu_affinity = gic_irq_set_vcpu_affinity, .flags = IRQCHIP_SET_TYPE_MASKED, }; -- 2.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
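Since the archive dropped the comparison and logical operators from the hunk above, here is a standalone sketch of the forwarded/cascading test as it reads when one assumes the lost operators were >=, < and &&. The structures are reduced models invented for this example (irq_model stands in for struct irq_data, and MAX_GIC_NR is set to 2 purely for illustration).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GIC_NR 2    /* assumed value for illustration */

struct gic_chip_data {
    int dummy;
};

static struct gic_chip_data gic_data[MAX_GIC_NR];

/* Hypothetical reduced stand-in for struct irq_data. */
struct irq_model {
    struct gic_chip_data *chip_data;    /* GIC this interrupt belongs to */
    void *handler_data;                 /* vcpu cookie, secondary GIC, or NULL */
};

static bool primary_gic_irq(const struct irq_model *d)
{
    return d->chip_data == &gic_data[0];
}

/* handler_data pointing into gic_data[1..MAX_GIC_NR-1] means a cascade. */
static bool cascading_gic_irq(const struct irq_model *d)
{
    uintptr_t p = (uintptr_t)d->handler_data;
    uintptr_t lo = (uintptr_t)(gic_data + 1);
    uintptr_t hi = (uintptr_t)(gic_data + MAX_GIC_NR);

    return p >= lo && p < hi;
}

/* Forwarded: on the primary GIC, handler_data set, and not a cascade. */
static bool forwarded_irq(const struct irq_model *d)
{
    return primary_gic_irq(d) && d->handler_data != NULL &&
           !cascading_gic_irq(d);
}
```

The sketch compares addresses as uintptr_t rather than as raw pointers, a choice made here only so the userspace example does not rely on relational comparison of unrelated pointers.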
Re: [PATCH V2 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses
On Tue, Aug 25, 2015 at 03:47:14PM +0800, Jason Wang wrote: We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS and another is KVM_FAST_MMIO_BUS. This leads to issue: - kvm_io_bus_destroy() knows nothing about the devices on two buses points to a single dev. Which will lead double free [1] during exit. - wildcard eventfd ignores data len, so it was registered as a kvm_io_range with zero length. This will fail the binary search in kvm_io_bus_get_first_dev() when we try to emulate through KVM_MMIO_BUS. This will cause userspace io emulation request instead of a eventfd notification (virtqueue kick will be trapped by qemu instead of vhost in this case). Fixing this by don't register wildcard mmio eventfd on two buses. Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the double free issue of kvm_io_bus_destroy(). For the arch/setups that does not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try KVM_FAST_MMIO_BUS first to see it it has a match. [1] Panic caused by double free: CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013 task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f RIP: 0010:[c07e25d8] [c07e25d8] ioeventfd_release+0x28/0x60 [kvm] RSP: 0018:88020e7f3bc8 EFLAGS: 00010292 RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900 RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000 R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880 FS: 7fc1ee3e6700() GS:88023e24() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0 Stack: 88021e7cc000 88020e7f3be8 c07e2622 88020e7f3c38 c07df69a 880232524160 88020e792d80 880219b78c00 0008 8802321686a8 Call Trace: [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm] [c07df69a] kvm_put_kvm+0xca/0x210 [kvm] [c07df818] kvm_vcpu_release+0x18/0x20 [kvm] [811f69f7] 
__fput+0xe7/0x250 [811f6bae] fput+0xe/0x10 [81093f04] task_work_run+0xd4/0xf0 [81079358] do_exit+0x368/0xa50 [81082c8f] ? recalc_sigpending+0x1f/0x60 [81079ad5] do_group_exit+0x45/0xb0 [81085c71] get_signal+0x291/0x750 [810144d8] do_signal+0x28/0xab0 [810f3a3b] ? do_futex+0xdb/0x5d0 [810b7028] ? __wake_up_locked_key+0x18/0x20 [810f3fa6] ? SyS_futex+0x76/0x170 [81014fc9] do_notify_resume+0x69/0xb0 [817cb9af] int_signal+0x12/0x17 Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 01 10 00 00 RIP [c07e25d8] ioeventfd_release+0x28/0x60 [kvm] RSP 88020e7f3bc8 Cc: Gleb Natapov g...@kernel.org Cc: Paolo Bonzini pbonz...@redhat.com Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com I'm worried that this slows down the regular MMIO. Could you share performance #s please? You need a mix of len=0 and len=2 matches. One solution for the first issue is to create two ioeventfd objects instead. For the second issue, we could change bsearch compare function instead. Again, affects all devices to performance #s would be needed. --- Changes from v1: - change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when needed to save lots of unnecessary changes. 
--- virt/kvm/eventfd.c | 30 -- virt/kvm/kvm_main.c | 16 ++-- 2 files changed, 22 insertions(+), 24 deletions(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 9ff4193..95f2901 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -762,13 +762,15 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p) return false; } -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags) +static enum kvm_bus ioeventfd_bus_from_flags(struct kvm_ioeventfd *args) { - if (flags KVM_IOEVENTFD_FLAG_PIO) + if (args-flags KVM_IOEVENTFD_FLAG_PIO) return KVM_PIO_BUS; - if (flags KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY) + if (args-flags KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY) return KVM_VIRTIO_CCW_NOTIFY_BUS; - return KVM_MMIO_BUS; + if (args-len) + return KVM_MMIO_BUS; + return KVM_FAST_MMIO_BUS; } static int @@ -779,7 +781,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) struct eventfd_ctx *eventfd; int ret; - bus_idx =
[PATCH v3 2/4] irqchip: GICv3: Don't deactivate interrupts forwarded to a guest
Commit 0a4377de3056 (genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPU) added just what we needed at the lowest level to allow an interrupt to be deactivated by a guest. When such a request reaches the GIC, it knows it doesn't need to perform the deactivation anymore, and can safely leave the guest do its magic. This of course requires additional support in both VFIO and KVM. Reviewed-by: Eric Auger eric.au...@linaro.org Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- drivers/irqchip/irq-gic-v3.c | 35 +-- 1 file changed, 33 insertions(+), 2 deletions(-) diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index addd2ee..5aa9bf6 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -70,6 +70,11 @@ static inline int gic_irq_in_rdist(struct irq_data *d) return gic_irq(d) 32; } +static inline bool forwarded_irq(struct irq_data *d) +{ + return d-handler_data != NULL; +} + static inline void __iomem *gic_dist_base(struct irq_data *d) { if (gic_irq_in_rdist(d))/* SGI+PPI - SGI_base for this CPU */ @@ -231,6 +236,18 @@ static void gic_poke_irq(struct irq_data *d, u32 offset) static void gic_mask_irq(struct irq_data *d) { gic_poke_irq(d, GICD_ICENABLER); + /* +* When masking a forwarded interrupt, make sure it is +* deactivated as well. +* +* This ensures that an interrupt that is getting +* disabled/masked will not get stuck, because there is +* noone to deactivate it (guest is being terminated). +*/ + if (static_key_true(supports_deactivate)) { + if (forwarded_irq(d)) + gic_poke_irq(d, GICD_ICACTIVER); + } } static void gic_unmask_irq(struct irq_data *d) @@ -296,8 +313,11 @@ static int gic_irq_get_irqchip_state(struct irq_data *d, static void gic_eoi_irq(struct irq_data *d) { if (static_key_true(supports_deactivate)) { - /* No need to deactivate an LPI */ - if (gic_irq(d) = 8192) + /* +* No need to deactivate an LPI, or an interrupt that +* is is getting forwarded to a vcpu. 
+*/ + if (gic_irq(d) = 8192 || forwarded_irq(d)) return; gic_write_dir(gic_irq(d)); } else { @@ -331,6 +351,16 @@ static int gic_set_type(struct irq_data *d, unsigned int type) return gic_configure_irq(irq, type, base, rwp_wait); } +static int gic_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu) +{ + if (static_key_true(supports_deactivate)) { + d-handler_data = vcpu; + return 0; + } + + return -EINVAL; +} + static u64 gic_mpidr_to_affinity(u64 mpidr) { u64 aff; @@ -683,6 +713,7 @@ static struct irq_chip gic_chip = { .irq_set_affinity = gic_set_affinity, .irq_get_irqchip_state = gic_irq_get_irqchip_state, .irq_set_irqchip_state = gic_irq_set_irqchip_state, + .irq_set_vcpu_affinity = gic_irq_set_vcpu_affinity, .flags = IRQCHIP_SET_TYPE_MASKED, }; -- 2.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
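The decision made in gic_eoi_irq() after this patch can be summarised as a small pure function. This is a sketch under the assumption that the comparison lost in the archive was >= 8192 (the start of the LPI range); eoi_action and gic_v3_eoi_action() are names invented for this example.

```c
#include <stdbool.h>

/* What the EOI callback ends up writing, in this simplified model. */
enum eoi_action {
    ACT_WRITE_EOIR,     /* mode 0: a single EOI write does everything */
    ACT_WRITE_DIR,      /* mode 1: deactivate explicitly */
    ACT_NONE            /* mode 1: nothing to deactivate from the host */
};

#define LPI_BASE 8192u  /* first LPI interrupt number */

static enum eoi_action gic_v3_eoi_action(bool supports_deactivate,
                                         unsigned int irq, bool forwarded)
{
    if (!supports_deactivate)
        return ACT_WRITE_EOIR;
    /* LPIs have no active state; forwarded IRQs are deactivated by the guest. */
    if (irq >= LPI_BASE || forwarded)
        return ACT_NONE;
    return ACT_WRITE_DIR;
}
```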
RE: [PATCH v7 10/17] KVM: x86: Update IRTE for posted-interrupts
-Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Wednesday, August 26, 2015 3:58 AM To: Wu, Feng Cc: pbonz...@redhat.com; j...@8bytes.org; mtosa...@redhat.com; eric.au...@linaro.org; kvm@vger.kernel.org; io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org Subject: Re: [PATCH v7 10/17] KVM: x86: Update IRTE for posted-interrupts On Tue, 2015-08-25 at 16:50 +0800, Feng Wu wrote: This patch adds the routine to update IRTE for posted-interrupts when guest changes the interrupt configuration. Signed-off-by: Feng Wu feng...@intel.com --- arch/x86/kvm/x86.c | 73 ++ 1 file changed, 73 insertions(+) + kvm_set_msi_irq(e, irq); + if (!kvm_intr_is_single_vcpu(kvm, irq, vcpu)) + continue; + + vcpu_info.pi_desc_addr = kvm_x86_ops-get_pi_desc_addr(vcpu); + vcpu_info.vector = irq.vector; + + if (set) + ret = irq_set_vcpu_affinity(host_irq, vcpu_info); + else { + /* suppress notification event before unposting */ + kvm_x86_ops-pi_set_sn(vcpu); + ret = irq_set_vcpu_affinity(host_irq, NULL); + kvm_x86_ops-pi_clear_sn(vcpu); + } Can we add trace events so that we have a way to tell when PI is being enabled/disabled other than performance heuristics? Thanks, Sure, I will add it. Thanks, Feng Alex
Re: I'm now looking into kvm-unit-tests and encounted with some problems.
You should add the kvm mailing list too.

-----Original Message-----
From: Jinjian (Ken)
Sent: August 25, 2015 21:42
To: drjo...@redhat.com; pbonz...@redhat.com
Cc: Huangpeng (Peter); Gonglei (Arei); Zhanghailiang
Subject: I'm now looking into kvm-unit-tests and encounted with some problems.

Hi all,

I'm now looking into kvm-unit-tests and have encountered some problems.

1. When I run run_test.sh, it reports "exec: {config_fd}: not found". How and where should this be defined?

2. All tests run with -smp 2 (or 3) hang. For example, I run the apic unit test with the following command:

   qemu-kvm --enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -serial stdio -device pci-testdev -kernel x86/apic.flat -smp 2 -vnc none

   Result: the test enters an endless loop. The related code in x86/apic.c is:

   test_sti_nmi()
       on_cpu_async(1, sti_loop, 0);

   static void sti_loop(void *ignore)
   {
       unsigned k = 0;

       while (sti_loop_active) {
           sti_nop((char *)(ulong)((k++ * 4096) % (128 * 1024 * 1024)));
       }
   }

3. The s3 unit test hangs. I run it with the following command:

   qemu-kvm --enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -serial stdio -device pci-testdev -kernel x86/s3.flat -vnc none

   It hangs at the resume event. Logs:

   RSDP is at f62c0
   RSDT is at 7fe16a9
   FADT is at 7fe0bda
   FACS is at 7fe
   resume vector addr is 7fe000c
   copy resume code from 400350

4. QEMU exits when running the emulator unit test, and the test still fails after the problematic code is commented out. I run it with the following command:

   qemu-kvm --enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -serial stdio -device pci-testdev -kernel x86/emulator.flat -vnc none

   Result: QEMU exits during test_muldiv. Logs:

   PASS: imul rax, mem, imm
   unhandled cpu excecption 8

   If the code that causes QEMU to exit is commented out, the test still fails. Logs:

   FAIL: mov null, %ss

Question: what do you think is causing these problems?

Looking forward to your reply. Thank you in advance.
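The address arithmetic in the sti_loop quoted in item 2 can be checked in isolation. The following is only a user-space sketch, not part of kvm-unit-tests; the names sti_nop_offset, sti_pages_in_window, STI_PAGE_SIZE and STI_WINDOW_SIZE are hypothetical, and it assumes a 32-bit unsigned int as in the quoted code. It suggests that the offsets simply cycle through a 128 MiB window one page at a time, so the loop can only end when sti_loop_active is cleared elsewhere.

```c
#include <assert.h>

/* Hypothetical names; the constants are the literals from the quoted loop. */
#define STI_PAGE_SIZE   4096u
#define STI_WINDOW_SIZE (128u * 1024u * 1024u)

/* Offset that the k-th iteration of the quoted loop passes to sti_nop(). */
unsigned long sti_nop_offset(unsigned k)
{
	/* The window is assumed to be a whole number of pages. */
	assert(STI_WINDOW_SIZE % STI_PAGE_SIZE == 0);

	/* Unsigned multiplication wraps, exactly as in the quoted expression. */
	return (unsigned long)((k * STI_PAGE_SIZE) % STI_WINDOW_SIZE);
}

/* Number of distinct page offsets visited before the sequence repeats. */
unsigned sti_pages_in_window(void)
{
	return STI_WINDOW_SIZE / STI_PAGE_SIZE;
}
```

Under these assumptions the offset returns to 0 after sti_pages_in_window() iterations, so nothing in the arithmetic itself terminates the loop.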
[PATCH v2 0/3] KVM: Dynamic halt_poll_ns
v1 -> v2:
 * change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of the module parameter
 * use the shrink/grow matrix suggested by David
 * set halt_poll_ns_max to 2ms

halt_poll_ns has a downside: polling still happens for an idle VCPU, which wastes CPU time. This patchset adds the ability to adjust halt_poll_ns dynamically.

There are two new kernel parameters for changing halt_poll_ns: halt_poll_ns_grow and halt_poll_ns_shrink. A third new parameter, halt_poll_ns_max, controls the maximal halt_poll_ns; it is internally rounded down to the closest multiple of halt_poll_ns_grow.

The shrink/grow matrix is suggested by David:

  if (poll successfully for interrupt): stay the same
  else if (length of kvm_vcpu_block is longer than halt_poll_ns_max): shrink
  else if (length of kvm_vcpu_block is less than halt_poll_ns_max): grow

  halt_poll_ns_shrink/ |                      |
  halt_poll_ns_grow    | grow halt_poll_ns    | shrink halt_poll_ns
  ---------------------+----------------------+------------------------
  < 1                  | = halt_poll_ns       | = 0
  < halt_poll_ns       | *= halt_poll_ns_grow | /= halt_poll_ns_shrink
  otherwise            | += halt_poll_ns_grow | -= halt_poll_ns_shrink

Wanpeng Li (3):
  KVM: make halt_poll_ns per-VCPU
  KVM: dynamic halt_poll_ns adjustment
  KVM: trace kvm_halt_poll_ns grow/shrink

 include/linux/kvm_host.h   |  1 +
 include/trace/events/kvm.h | 30 ++
 virt/kvm/kvm_main.c        | 78 --
 3 files changed, 106 insertions(+), 3 deletions(-)

--
1.9.1
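The three rows of the matrix can be written out as two small pure functions. This is only a user-space sketch of the table as described in this cover letter, not the kernel code; struct poll_params, grow_poll_ns and shrink_poll_ns are hypothetical names, and the clamp at zero in the subtraction row is an added assumption to avoid unsigned wrap-around.

```c
#include <assert.h>

/* Hypothetical container for the module parameters named in the cover letter. */
struct poll_params {
	unsigned int base;   /* halt_poll_ns */
	unsigned int grow;   /* halt_poll_ns_grow */
	unsigned int shrink; /* halt_poll_ns_shrink */
};

/* "grow halt_poll_ns" column of the matrix. */
unsigned int grow_poll_ns(unsigned int val, const struct poll_params *p)
{
	assert(p);

	if (p->grow < 1)        /* row "< 1": reset to the base value */
		return p->base;
	if (p->grow < p->base)  /* row "< halt_poll_ns": multiply */
		return val * p->grow;
	return val + p->grow;   /* row "otherwise": add */
}

/* "shrink halt_poll_ns" column of the matrix. */
unsigned int shrink_poll_ns(unsigned int val, const struct poll_params *p)
{
	assert(p);

	if (p->shrink < 1)        /* row "< 1": stop polling */
		return 0;
	if (p->shrink < p->base)  /* row "< halt_poll_ns": divide */
		return val / p->shrink;
	/* row "otherwise": subtract; clamping at 0 is an assumption of this sketch */
	return val > p->shrink ? val - p->shrink : 0;
}
```

Which row applies depends only on how the grow or shrink parameter compares with halt_poll_ns, so small values act as factors and large values act as absolute steps in nanoseconds.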
[PATCH v2 3/3] KVM: trace kvm_halt_poll_ns grow/shrink
Tracepoint for dynamic halt_poll_ns, fired on every potential change.

Signed-off-by: Wanpeng Li <wanpeng...@hotmail.com>
---
 include/trace/events/kvm.h | 30 ++
 virt/kvm/kvm_main.c        |  8
 2 files changed, 38 insertions(+)

diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index a44062d..75ddf80 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -356,6 +356,36 @@ TRACE_EVENT(
 		  __entry->address)
 );

+TRACE_EVENT(kvm_halt_poll_ns,
+	TP_PROTO(bool grow, unsigned int vcpu_id, int new, int old),
+	TP_ARGS(grow, vcpu_id, new, old),
+
+	TP_STRUCT__entry(
+		__field(bool, grow)
+		__field(unsigned int, vcpu_id)
+		__field(int, new)
+		__field(int, old)
+	),
+
+	TP_fast_assign(
+		__entry->grow		= grow;
+		__entry->vcpu_id	= vcpu_id;
+		__entry->new		= new;
+		__entry->old		= old;
+	),
+
+	TP_printk("vcpu %u: halt_poll_ns %d (%s %d)",
+			__entry->vcpu_id,
+			__entry->new,
+			__entry->grow ? "grow" : "shrink",
+			__entry->old)
+);
+
+#define trace_kvm_halt_poll_ns_grow(vcpu_id, new, old) \
+	trace_kvm_halt_poll_ns(true, vcpu_id, new, old)
+#define trace_kvm_halt_poll_ns_shrink(vcpu_id, new, old) \
+	trace_kvm_halt_poll_ns(false, vcpu_id, new, old)
+
 #endif

 #endif /* _TRACE_KVM_MAIN_H */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2a4962b..04f62e0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1957,13 +1957,21 @@ static unsigned int __shrink_halt_poll_ns(int val, int modifier, int minimum)

 static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
 {
+	int old = vcpu->halt_poll_ns;
+
 	vcpu->halt_poll_ns = __grow_halt_poll_ns(vcpu->halt_poll_ns);
+
+	trace_kvm_halt_poll_ns_grow(vcpu->vcpu_id, vcpu->halt_poll_ns, old);
 }

 static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu)
 {
+	int old = vcpu->halt_poll_ns;
+
 	vcpu->halt_poll_ns = __shrink_halt_poll_ns(vcpu->halt_poll_ns,
 			halt_poll_ns_shrink, halt_poll_ns);
+
+	trace_kvm_halt_poll_ns_shrink(vcpu->vcpu_id, vcpu->halt_poll_ns, old);
 }

 static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
--
1.9.1
[PATCH v2 2/3] KVM: dynamic halt_poll_ns adjustment
halt_poll_ns has a downside: polling still happens for an idle VCPU, which wastes CPU time. This patch adds the ability to adjust halt_poll_ns dynamically.

There are two new kernel parameters for changing halt_poll_ns: halt_poll_ns_grow and halt_poll_ns_shrink. A third new parameter, halt_poll_ns_max, controls the maximal halt_poll_ns; it is internally rounded down to the closest multiple of halt_poll_ns_grow.

The shrink/grow matrix is suggested by David:

  if (poll successfully for interrupt): stay the same
  else if (length of kvm_vcpu_block is longer than halt_poll_ns_max): shrink
  else if (length of kvm_vcpu_block is less than halt_poll_ns_max): grow

  halt_poll_ns_shrink/ |                      |
  halt_poll_ns_grow    | grow halt_poll_ns    | shrink halt_poll_ns
  ---------------------+----------------------+------------------------
  < 1                  | = halt_poll_ns       | = 0
  < halt_poll_ns       | *= halt_poll_ns_grow | /= halt_poll_ns_shrink
  otherwise            | += halt_poll_ns_grow | -= halt_poll_ns_shrink

Signed-off-by: Wanpeng Li <wanpeng...@hotmail.com>
---
 virt/kvm/kvm_main.c | 65 -
 1 file changed, 64 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 93db833..2a4962b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -66,9 +66,26 @@
 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");

-static unsigned int halt_poll_ns;
+#define KVM_HALT_POLL_NS 50
+#define KVM_HALT_POLL_NS_GROW 2
+#define KVM_HALT_POLL_NS_SHRINK 0
+#define KVM_HALT_POLL_NS_MAX 200
+
+static unsigned int halt_poll_ns = KVM_HALT_POLL_NS;
 module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR);

+/* Default doubles per-vcpu halt_poll_ns. */
+static unsigned int halt_poll_ns_grow = KVM_HALT_POLL_NS_GROW;
+module_param(halt_poll_ns_grow, int, S_IRUGO);
+
+/* Default resets per-vcpu halt_poll_ns. */
+static unsigned int halt_poll_ns_shrink = KVM_HALT_POLL_NS_SHRINK;
+module_param(halt_poll_ns_shrink, int, S_IRUGO);
+
+/* halt polling only reduces halt latency by 10-15 us, 2ms is enough */
+static unsigned int halt_poll_ns_max = KVM_HALT_POLL_NS_MAX;
+module_param(halt_poll_ns_max, int, S_IRUGO);
+
 /*
  * Ordering of locks:
  *
@@ -1907,6 +1924,48 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);

+static unsigned int __grow_halt_poll_ns(unsigned int val)
+{
+	if (halt_poll_ns_grow < 1)
+		return halt_poll_ns;
+
+	val = min(val, halt_poll_ns_max);
+
+	if (val == 0)
+		return halt_poll_ns;
+
+	if (halt_poll_ns_grow < halt_poll_ns)
+		val *= halt_poll_ns_grow;
+	else
+		val += halt_poll_ns_grow;
+
+	return val;
+}
+
+static unsigned int __shrink_halt_poll_ns(int val, int modifier, int minimum)
+{
+	if (modifier < 1)
+		return 0;
+
+	if (modifier < halt_poll_ns)
+		val /= modifier;
+	else
+		val -= modifier;
+
+	return val;
+}
+
+static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
+{
+	vcpu->halt_poll_ns = __grow_halt_poll_ns(vcpu->halt_poll_ns);
+}
+
+static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu)
+{
+	vcpu->halt_poll_ns = __shrink_halt_poll_ns(vcpu->halt_poll_ns,
+			halt_poll_ns_shrink, halt_poll_ns);
+}
+
 static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
 {
 	if (kvm_arch_vcpu_runnable(vcpu)) {
@@ -1954,6 +2013,10 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 			break;

 		waited = true;
+		if (vcpu->halt_poll_ns > halt_poll_ns_max)
+			shrink_halt_poll_ns(vcpu);
+		else
+			grow_halt_poll_ns(vcpu);
 		schedule();
 	}

--
1.9.1
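To see how the per-vcpu value moves from one block to the next, the control flow of this patch can be modelled in user space. The sketch below is not the kernel code: struct hp_cfg, model_grow, model_shrink and model_step are hypothetical names, the comparison in model_step is assumed to be "greater than", the clamp in the subtraction branch is an addition of this sketch, and the numbers used with it are illustrative rather than the patch defaults.

```c
#include <assert.h>

/* Hypothetical stand-ins for the four module parameters in the patch. */
struct hp_cfg {
	unsigned int base;   /* halt_poll_ns */
	unsigned int grow;   /* halt_poll_ns_grow */
	unsigned int shrink; /* halt_poll_ns_shrink */
	unsigned int max;    /* halt_poll_ns_max */
};

/* Follows the structure of __grow_halt_poll_ns(): cap first, then grow. */
unsigned int model_grow(unsigned int val, const struct hp_cfg *c)
{
	if (c->grow < 1)
		return c->base;

	if (val > c->max)        /* val = min(val, max) */
		val = c->max;

	if (val == 0)            /* polling was off: restart from the base value */
		return c->base;

	if (c->grow < c->base)
		val *= c->grow;
	else
		val += c->grow;

	return val;
}

/* Follows the structure of __shrink_halt_poll_ns(). */
unsigned int model_shrink(unsigned int val, const struct hp_cfg *c)
{
	if (c->shrink < 1)
		return 0;

	if (c->shrink < c->base)
		return val / c->shrink;

	/* Clamp at 0: an assumption of this sketch, not present in the patch. */
	return val > c->shrink ? val - c->shrink : 0;
}

/* One pass through the branch added to kvm_vcpu_block(). */
unsigned int model_step(unsigned int cur, const struct hp_cfg *c)
{
	assert(c);
	return cur > c->max ? model_shrink(cur, c) : model_grow(cur, c);
}
```

With the illustrative values base 500, grow 2, shrink 0 and max 2000, repeated calls to model_step starting from 500 give 1000, 2000, 4000, 0 and then 500 again: the cap is applied before the multiplication, so the value can overshoot the maximum once before the shrink branch resets it.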
Re: KVM slow LAMP guest
On 8/26/15 6:41 AM, Hansa wrote:
> On 26-8-2015 0:33, Wanpeng Li wrote:
>>> On the VM server I issued the command below every eleven minutes:
>>>
>>> date >> curltest-file;
>>> top -b -n 1 | sed -n '7,12p' >> curltest-file;
>>> curl -o /dev/null -s -w "time_total: %{time_total}\\n" https://my.domain.com | perl -pe 'BEGIN {use POSIX;} print strftime("%Y-%m-%d %H:%M:%S ", localtime)' >> curltest-file
>>>
>>> This gives me the total time for displaying my site on a local machine. It also includes a 'top' command to display which processes are running at each sample. All is saved in a file called curltest-file.
>>>
>>> I found 7 occurrences in my curltest-file of a time_total larger than 20 seconds. Top however didn't show any significant CPU or IO activity at those sampled times. Further investigation shows that they are related to a known (gravatar) issue in the Wordpress Jetpack plugin. I didn't include these samples in the average total.
>>
>> Do you use just halt_poll_ns, or both halt_poll_ns and idle=poll in the guest?
>
> I just use kvm.halt_poll_ns=50
> Should I try some different tests?

Looks good to me currently. In my testing, each vCPU consumes almost half of a pCPU's capacity on the host when idle=poll is added, which is not suitable for some cloud computing scenarios, since vCPUs have a high overcommit ratio on the host.

Regards,
Wanpeng Li
Fwd: [ERROR] INIT: PANIC: segmentation violation! sleeping for 30 seconds.
KVM Team,

We have a user who is experiencing an odd issue when trying to create VMs on his system. It causes the entire host to crash, printing the following message to the console:

[ERROR] INIT: PANIC: segmentation violation! sleeping for 30 seconds.

We have never seen this issue before, but the user's hardware is quite old (circa 2008). The odd thing is that he has no issues using alternative hypervisors such as Microsoft Hyper-V or XenServer on the same hardware.

System Information:
  Linux Kernel 4.0.4
  QEMU 2.3.0
  CPU Intel(R) Core(TM)2 CPU E8400 @ 3.00GHz (fam: 06, model: 17, stepping: 0a)

What other information can I provide to help track down the root cause of this issue?

Thank you for your time.

Sincerest Regards,
Jonathan Panozzo
Re: [PATCH] kvm/powerpc: fix a build error in e500_tlb.c
Kevin Hao <haokexin at gmail.com> writes:

> We pass the wrong number of arguments when invoking trace_kvm_stlb_inval,
> which causes the following build error:
>
> arch/powerpc/kvm/e500_tlb.c: In function 'kvmppc_e500_stlbe_invalidate':
> arch/powerpc/kvm/e500_tlb.c:230: error: too many arguments to function 'trace_kvm_stlb_inval'
>
> Signed-off-by: Kevin Hao <haokexin at gmail.com>
> ---
>  arch/powerpc/kvm/e500_tlb.c |3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
> index 21011e1..1261a21 100644
> --- a/arch/powerpc/kvm/e500_tlb.c
> +++ b/arch/powerpc/kvm/e500_tlb.c
> @@ -226,8 +226,7 @@ static void kvmppc_e500_stlbe_invalidate(struct kvmppc_vcpu_e500 *vcpu_e500,
>
>  	kvmppc_e500_shadow_release(vcpu_e500, tlbsel, esel);
>  	stlbe->mas1 = 0;
> -	trace_kvm_stlb_inval(index_of(tlbsel, esel), stlbe->mas1, stlbe->mas2,
> -			     stlbe->mas3, stlbe->mas7);
> +	trace_kvm_stlb_inval(index_of(tlbsel, esel));
>  }
>
>  static void kvmppc_e500_tlb1_invalidate(struct kvmppc_vcpu_e500 *vcpu_e500,

It worked; I was able to build the image after this change.

I have one question: how can I check on PowerPC that KVM is enabled, or that VT is enabled? Is there any CLI available to check this? Please share it.
Re: I'm now looking into kvm-unit-tests and encounted with some problems.
To Peter,

Thank you very much! I'm sorry, this is my first mail to the kvm list and I have no experience with it.

On 2015/8/26 8:40, Huangpeng (Peter) wrote:
> You should add the kvm mailing list too.
>
> -----Original Message-----
> From: Jinjian (Ken)
> Sent: August 25, 2015 21:42
> To: drjo...@redhat.com; pbonz...@redhat.com
> Cc: Huangpeng (Peter); Gonglei (Arei); Zhanghailiang
> Subject: I'm now looking into kvm-unit-tests and encounted with some problems.
>
> Hi all,
>
> I'm now looking into kvm-unit-tests and have encountered some problems.
>
> 1. When I run run_test.sh, it reports "exec: {config_fd}: not found". How and where should this be defined?
>
> 2. All tests run with -smp 2 (or 3) hang. For example, I run the apic unit test with the following command:
>
>    qemu-kvm --enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -serial stdio -device pci-testdev -kernel x86/apic.flat -smp 2 -vnc none
>
>    Result: the test enters an endless loop. The related code in x86/apic.c is:
>
>    test_sti_nmi()
>        on_cpu_async(1, sti_loop, 0);
>
>    static void sti_loop(void *ignore)
>    {
>        unsigned k = 0;
>
>        while (sti_loop_active) {
>            sti_nop((char *)(ulong)((k++ * 4096) % (128 * 1024 * 1024)));
>        }
>    }
>
> 3. The s3 unit test hangs. I run it with the following command:
>
>    qemu-kvm --enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -serial stdio -device pci-testdev -kernel x86/s3.flat -vnc none
>
>    It hangs at the resume event. Logs:
>
>    RSDP is at f62c0
>    RSDT is at 7fe16a9
>    FADT is at 7fe0bda
>    FACS is at 7fe
>    resume vector addr is 7fe000c
>    copy resume code from 400350
>
> 4. QEMU exits when running the emulator unit test, and the test still fails after the problematic code is commented out. I run it with the following command:
>
>    qemu-kvm --enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -serial stdio -device pci-testdev -kernel x86/emulator.flat -vnc none
>
>    Result: QEMU exits during test_muldiv. Logs:
>
>    PASS: imul rax, mem, imm
>    unhandled cpu excecption 8
>
>    If the code that causes QEMU to exit is commented out, the test still fails. Logs:
>
>    FAIL: mov null, %ss
>
> Question: what do you think is causing these problems?
>
> Looking forward to your reply. Thank you in advance.