Re: [PATCH 00/16] implement vNVDIMM
On 07/02/2015 02:17 PM, Michael S. Tsirkin wrote: On Wed, Jul 01, 2015 at 10:50:16PM +0800, Xiao Guangrong wrote: hw/acpi/aml-build.c | 32 +- hw/i386/acpi-build.c|9 +- hw/i386/acpi-dsdt.dsl |2 +- hw/i386/pc.c| 11 +- hw/mem/Makefile.objs|1 + hw/mem/pc-nvdimm.c | 1040 +++ include/hw/acpi/aml-build.h |5 +- include/hw/mem/pc-nvdimm.h | 56 +++ 8 files changed, 1149 insertions(+), 7 deletions(-) create mode 100644 hw/mem/pc-nvdimm.c create mode 100644 include/hw/mem/pc-nvdimm.h Given the amount of code, this is definitely not 2.4 material. Maybe others will have the time to review it before this, but in any case please remember to repost after 2.4 is out. I see, thanks for your reminder, Michael! -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PULL] virtio/vhost: cross endian support
On Wed, Jul 01, 2015 at 12:02:50PM -0700, Linus Torvalds wrote:
> On Wed, Jul 1, 2015 at 2:31 AM, Michael S. Tsirkin m...@redhat.com wrote:
> > virtio/vhost: cross endian support
>
> Ugh. Does this really have to be dynamic?
>
> Can't virtio do the sane thing, and just use a _fixed_ endianness?
>
> Doing an unconditional byte swap is faster and simpler than the crazy conditionals. That's true regardless of endianness, but gets to be even more so if the fixed endianness is little-endian, since BE is not-so-slowly fading from the world.
>
> Linus

Yea, well - support for legacy BE guests on the new LE hosts is exactly the motivation for this.

I dislike it too, but there are two redeeming properties that made me merge this:

1. It's a trivial amount of code: since we wrap host/guest accesses anyway, almost all of it is well hidden from drivers.

2. Sane platforms would never set flags like VHOST_CROSS_ENDIAN_LEGACY - and when it's clear, there's zero overhead (at some point it was tested by compiling with and without the patches, got the same stripped binary).

Maybe we could create a Kconfig symbol to enforce point (2): prevent people from enabling it e.g. on x86. I will look into this - but it can be done by a patch on top, so I think this can be merged as is.

Or do you know of someone using kernel with all config options enabled indiscriminately?

Thanks,

--
MST
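For readers following the thread, the trade-off being argued can be sketched in a few lines of C. This is only an illustration: load_le16(), load_be16() and load_guest16() are hypothetical names, not the virtio or vhost accessors, and the run-time flag stands in for whatever per-device state the real code consults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixed little-endian read: correct on any host, no run-time decision. */
uint16_t load_le16(const uint8_t *p)
{
    assert(p != 0);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Fixed big-endian read, shown for comparison. */
uint16_t load_be16(const uint8_t *p)
{
    assert(p != 0);
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical "dynamic" accessor: the byte order is chosen per call
 * from a flag, which is the conditional variant criticised above.
 */
uint16_t load_guest16(const uint8_t *p, bool guest_is_big_endian)
{
    return guest_is_big_endian ? load_be16(p) : load_le16(p);
}
```

The fixed variants have no branch at all; the dynamic one costs a test per access unless, as described in the reply, the flag is known to be clear at build time.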
RE: [PATCH 1/7] KVM: api: add kvm_irq_routing_extended_msi
Hello! -Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Eric Auger Sent: Monday, June 29, 2015 6:37 PM To: eric.au...@st.com; eric.au...@linaro.org; linux-arm-ker...@lists.infradead.org; marc.zyng...@arm.com; christoffer.d...@linaro.org; andre.przyw...@arm.com; kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org Cc: linux-ker...@vger.kernel.org; patc...@linaro.org; p.fe...@samsung.com; pbonz...@redhat.com Subject: [PATCH 1/7] KVM: api: add kvm_irq_routing_extended_msi On ARM, the MSI msg (address and data) comes along with out-of-band device ID information. The device ID encodes the device that composes the MSI msg. Let's create a new routing entry type, dubbed KVM_IRQ_ROUTING_EXTENDED_MSI and use the __u32 pad space to convey the device ID. Signed-off-by: Eric Auger eric.au...@linaro.org --- RFC - PATCH - remove kvm_irq_routing_extended_msi and use union instead --- Documentation/virtual/kvm/api.txt | 9 - include/uapi/linux/kvm.h | 6 +- 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index d20fd94..6426ae9 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1414,7 +1414,10 @@ struct kvm_irq_routing_entry { __u32 gsi; __u32 type; __u32 flags; - __u32 pad; + union { + __u32 pad; + __u32 devid; + }; union { struct kvm_irq_routing_irqchip irqchip; struct kvm_irq_routing_msi msi; devid is actually a part of MSI bunch. Shouldn't it be a part of struct kvm_irq_routing_msi then? It also has reserved pad. @@ -1427,6 +1430,10 @@ struct kvm_irq_routing_entry { #define KVM_IRQ_ROUTING_IRQCHIP 1 #define KVM_IRQ_ROUTING_MSI 2 #define KVM_IRQ_ROUTING_S390_ADAPTER 3 +#define KVM_IRQ_ROUTING_EXTENDED_MSI 4 + +In case of KVM_IRQ_ROUTING_EXTENDED_MSI routing type, devid is used to convey +the device ID. No flags are specified so far, the corresponding field must be set to zero. 
What if we use KVM_MSI_VALID_DEVID flag instead of new KVM_IRQ_ROUTING_EXTENDED_MSI definition? I believe this would make an API more consistent and introduce less new definitions. diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 2a23705..8484681 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -841,12 +841,16 @@ struct kvm_irq_routing_s390_adapter { #define KVM_IRQ_ROUTING_IRQCHIP 1 #define KVM_IRQ_ROUTING_MSI 2 #define KVM_IRQ_ROUTING_S390_ADAPTER 3 +#define KVM_IRQ_ROUTING_EXTENDED_MSI 4 struct kvm_irq_routing_entry { __u32 gsi; __u32 type; __u32 flags; - __u32 pad; + union { + __u32 pad; + __u32 devid; + }; union { struct kvm_irq_routing_irqchip irqchip; struct kvm_irq_routing_msi msi; -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Kind regards, Pavel Fedin Expert Engineer Samsung Electronics Research center Russia -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PULL] virtio/vhost: cross endian support
On Wed, Jul 01, 2015 at 12:03:59PM -0700, Linus Torvalds wrote:
> On Wed, Jul 1, 2015 at 12:02 PM, Linus Torvalds torva...@linux-foundation.org wrote:
> > Doing a unconditional byte swap is faster and simpler than the crazy conditionals.
>
> Unconditional endianness not only makes for simpler and faster code, it also ends up being easier to debug and add things like type annotations for sparse.
>
> Linus

At least this last one is well covered by these patches: this uses separate sparse types so all accesses are statically verified by sparse to use the correct accessor.

--
MST
RE: [PATCH 7/7] KVM: arm: implement kvm_set_msi by gsi direct mapping
Hello!

> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Eric Auger
> Sent: Monday, June 29, 2015 6:37 PM
> To: eric.au...@st.com; eric.au...@linaro.org; linux-arm-ker...@lists.infradead.org; marc.zyng...@arm.com; christoffer.d...@linaro.org; andre.przyw...@arm.com; kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org; patc...@linaro.org; p.fe...@samsung.com; pbonz...@redhat.com
> Subject: [PATCH 7/7] KVM: arm: implement kvm_set_msi by gsi direct mapping
>
> If the ITS modality is not available, let's simply support MSI injection by transforming the MSI.data into an SPI ID.
>
> This becomes possible to use KVM_SIGNAL_MSI ioctl for arm too.
>
> Signed-off-by: Eric Auger eric.au...@linaro.org
> ---
>  arch/arm/kvm/Kconfig | 1 +
>  virt/kvm/arm/vgic.c | 5 +
>  2 files changed, 6 insertions(+)
>
> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> index 151e710..0f58baf 100644
> --- a/arch/arm/kvm/Kconfig
> +++ b/arch/arm/kvm/Kconfig
> @@ -31,6 +31,7 @@ config KVM
>  	select KVM_VFIO
>  	select HAVE_KVM_EVENTFD
>  	select HAVE_KVM_IRQFD
> +	select HAVE_KVM_MSI
>  	select HAVE_KVM_IRQCHIP
>  	select HAVE_KVM_IRQ_ROUTING
>  	depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 0b4c48c..b3c10dc 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -2314,6 +2314,11 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
>  			return kvm->arch.vgic.vm_ops.inject_msi(kvm, msi);
>  		else
>  			return -ENODEV;
> +	case KVM_IRQ_ROUTING_MSI:
> +		if (kvm->arch.vgic.vm_ops.inject_msi)
> +			return -EINVAL;
> +		else
> +			return kvm_vgic_inject_irq(kvm, 0, e->msi.data, level);

Given API change i suggest (using KVM_MSI_VALID_DEVID flag), we could get rid of all these if()'s here. Just forward all parameters to vGIC implementation code and let it do its checks.

>  	default:
>  		return -EINVAL;
>  	}
> --
> 1.9.1

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
Re: [PATCH 00/16] implement vNVDIMM
On Wed, Jul 01, 2015 at 10:50:16PM +0800, Xiao Guangrong wrote: hw/acpi/aml-build.c | 32 +- hw/i386/acpi-build.c|9 +- hw/i386/acpi-dsdt.dsl |2 +- hw/i386/pc.c| 11 +- hw/mem/Makefile.objs|1 + hw/mem/pc-nvdimm.c | 1040 +++ include/hw/acpi/aml-build.h |5 +- include/hw/mem/pc-nvdimm.h | 56 +++ 8 files changed, 1149 insertions(+), 7 deletions(-) create mode 100644 hw/mem/pc-nvdimm.c create mode 100644 include/hw/mem/pc-nvdimm.h Given the amount of code, this is definitely not 2.4 material. Maybe others will have the time to review it before this, but in any case please remember to repost after 2.4 is out. Thanks! -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PULL] virtio/vhost: cross endian support
On Thu, 2 Jul 2015 08:01:28 +0200 Michael S. Tsirkin m...@redhat.com wrote:
> On Wed, Jul 01, 2015 at 12:02:50PM -0700, Linus Torvalds wrote:
> > On Wed, Jul 1, 2015 at 2:31 AM, Michael S. Tsirkin m...@redhat.com wrote:
> > > virtio/vhost: cross endian support
> >
> > Ugh. Does this really have to be dynamic?
> >
> > Can't virtio do the sane thing, and just use a _fixed_ endianness?
> >
> > Doing an unconditional byte swap is faster and simpler than the crazy conditionals. That's true regardless of endianness, but gets to be even more so if the fixed endianness is little-endian, since BE is not-so-slowly fading from the world.
> >
> > Linus
>
> Yea, well - support for legacy BE guests on the new LE hosts is exactly the motivation for this.
>
> I dislike it too, but there are two redeeming properties that made me merge this:
>
> 1. It's a trivial amount of code: since we wrap host/guest accesses anyway, almost all of it is well hidden from drivers.
>
> 2. Sane platforms would never set flags like VHOST_CROSS_ENDIAN_LEGACY - and when it's clear, there's zero overhead (at some point it was tested by compiling with and without the patches, got the same stripped binary).
>
> Maybe we could create a Kconfig symbol to enforce point (2): prevent people from enabling it e.g. on x86. I will look into this - but it can be done by a patch on top, so I think this can be merged as is.

This cross-endian *oddity* is targeting PowerPC book3s_64 processors... I am not aware of any other users. Maybe create a symbol that would be only selected by PPC_BOOK3S_64 ?

> Or do you know of someone using kernel with all config options enabled indiscriminately?
>
> Thanks,
Re: [PATCH] MAINTAINERS: separate section for s390 virtio drivers
On 01.07.2015 at 17:15, Cornelia Huck wrote:
> The s390-specific virtio drivers have probably more to do with virtio than with kvm today; let's move them out into a separate section to reflect this and to be able to add relevant mailing lists.
>
> CC: Christian Borntraeger borntrae...@de.ibm.com
> Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com

Acked-by: Christian Borntraeger borntrae...@de.ibm.com

> ---
>  MAINTAINERS | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 246d9d8..fca5c00 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5766,7 +5766,6 @@ S: Supported
>  F: Documentation/s390/kvm.txt
>  F: arch/s390/include/asm/kvm*
>  F: arch/s390/kvm/
> -F: drivers/s390/kvm/
>
>  KERNEL VIRTUAL MACHINE (KVM) FOR ARM
>  M: Christoffer Dall christoffer.d...@linaro.org
> @@ -10671,6 +10670,15 @@ F: drivers/block/virtio_blk.c
>  F: include/linux/virtio_*.h
>  F: include/uapi/linux/virtio_*.h
>
> +VIRTIO DRIVERS FOR S390
> +M: Christian Borntraeger borntrae...@de.ibm.com
> +M: Cornelia Huck cornelia.h...@de.ibm.com
> +L: linux-s...@vger.kernel.org
> +L: virtualizat...@lists.linux-foundation.org
> +L: kvm@vger.kernel.org
> +S: Supported
> +F: drivers/s390/kvm/
> +
>  VIRTIO HOST (VHOST)
>  M: Michael S. Tsirkin m...@redhat.com
>  L: kvm@vger.kernel.org
[PATCH] arm/run: don't enable KVM if system can't do it
As ARM (and no doubt other systems) can also run tests in pure TCG mode we might as well not bother enabling accel=kvm if we aren't on a real ARM based system. This prevents us seeing ugly warning messages when testing TCG.

Signed-off-by: Alex Bennée alex.ben...@linaro.org
---
 arm/run | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arm/run b/arm/run
index 662a856..2bdb4be 100755
--- a/arm/run
+++ b/arm/run
@@ -33,7 +33,13 @@ if $qemu $M -chardev testdev,id=id -initrd . 2>&1 \
 	exit 2
 fi

-M='-machine virt,accel=kvm:tcg'
+host=`uname -m | sed -e 's/arm.*/arm/'`
+if [ ${host} = arm ] || [ ${host} = aarch64 ]; then
+M='-machine virt,accel=kvm:tcg'
+else
+M='-machine virt,accel=tcg'
+fi
+
 chr_testdev='-device virtio-serial-device'
 chr_testdev+=' -device virtconsole,chardev=ctd -chardev testdev,id=ctd'
--
2.4.5
Re: [PATCH v3 0/2] vhost: support more than 64 memory regions
On Wed, Jul 01, 2015 at 11:07:08AM +0200, Igor Mammedov wrote:
> changes since v2:
> * drop cache patches for now as suggested
> * add max_mem_regions module parameter instead of unconditionally increasing limit
> * drop bsearch patch since it's already queued

I get non-trivial conflicts with this - could you rebase it so it applies to my tree please?

> References to previous versions:
> v2: https://lkml.org/lkml/2015/6/17/276
> v1: http://www.spinics.net/lists/kvm/msg117654.html
>
> Series allows to tweak vhost's memory regions count limit.
>
> It fixes VM crashing on memory hotplug due to vhost refusing accepting more than 64 memory regions with max_mem_regions set to more than 262 slots in default QEMU configuration.
>
> Igor Mammedov (2):
>   vhost: extend memory regions allocation to vmalloc
>   vhost: add max_mem_regions module parameter
>
>  drivers/vhost/vhost.c | 30 +++---
>  1 file changed, 23 insertions(+), 7 deletions(-)
>
> --
> 1.8.3.1
Re: [PATCH 14/16] nvdimm: support NFIT_CMD_GET_CONFIG_SIZE function
On Wed, Jul 01, 2015 at 10:50:30PM +0800, Xiao Guangrong wrote:
> +static uint32_t dsm_cmd_config_size(struct dsm_buffer *in, struct dsm_out *out)
> +{
> +    GSList *list = get_nvdimm_built_list();
> +    PCNVDIMMDevice *nvdimm = get_nvdimm_device_by_handle(list, in->handle);
> +    uint32_t status = NFIT_STATUS_NON_EXISTING_MEM_DEV;
> +
> +    if (!nvdimm) {
> +        goto exit;
> +    }
> +
> +    status = NFIT_STATUS_SUCCESS;
> +    out->cmd_config_size.config_size = nvdimm->config_data_size;
> +    out->cmd_config_size.max_xfer = max_xfer_config_size();

cpu_to_*() missing? It should be possible to emulate NVDIMMs for an x86_64 guest on a big-endian host, for example.
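The review comment is about storing guest-visible fields in a fixed byte order, which QEMU does with its cpu_to_le32() family of helpers. A host-independent sketch of the same idea follows; struct config_size_reply, store_le32() and fill_reply() are made-up names for illustration and are not the patch's data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reply buffer: two 32-bit fields the guest reads as little-endian. */
struct config_size_reply {
    uint8_t config_size[4];
    uint8_t max_xfer[4];
};

/* Store v as little-endian bytes regardless of the host's byte order. */
void store_le32(uint8_t *dst, uint32_t v)
{
    assert(dst != 0);
    dst[0] = (uint8_t)(v & 0xff);
    dst[1] = (uint8_t)((v >> 8) & 0xff);
    dst[2] = (uint8_t)((v >> 16) & 0xff);
    dst[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Fill both fields in guest byte order. */
void fill_reply(struct config_size_reply *out, uint32_t config_size,
                uint32_t max_xfer)
{
    store_le32(out->config_size, config_size);
    store_le32(out->max_xfer, max_xfer);
}
```

A plain assignment of a host-order uint32_t would produce the same bytes only on a little-endian host, which is the case the reviewer points out.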
Re: [Qemu-devel] [PATCH 00/16] implement vNVDIMM
On Thu, Jul 02, 2015 at 09:31:23AM +0100, Stefan Hajnoczi wrote: On Thu, Jul 02, 2015 at 02:34:05PM +0800, Xiao Guangrong wrote: On 07/02/2015 02:17 PM, Michael S. Tsirkin wrote: On Wed, Jul 01, 2015 at 10:50:16PM +0800, Xiao Guangrong wrote: hw/acpi/aml-build.c | 32 +- hw/i386/acpi-build.c|9 +- hw/i386/acpi-dsdt.dsl |2 +- hw/i386/pc.c| 11 +- hw/mem/Makefile.objs|1 + hw/mem/pc-nvdimm.c | 1040 +++ include/hw/acpi/aml-build.h |5 +- include/hw/mem/pc-nvdimm.h | 56 +++ 8 files changed, 1149 insertions(+), 7 deletions(-) create mode 100644 hw/mem/pc-nvdimm.c create mode 100644 include/hw/mem/pc-nvdimm.h Given the amount of code, this is definitely not 2.4 material. Maybe others will have the time to review it before this, but in any case please remember to repost after 2.4 is out. I see, thanks for your reminder, Michael! I will review the series now. Here is the QEMU release schedule: http://qemu-project.org/Planning/2.4 Hard freeze - 7 July QEMU 2.4 release - 4 August It could be merged into a maintainer's tree when the -next branches are opened (it's up to each maintainer but for the block and net trees I do that at hard freeze time). Absolutely, but I'm not sure I'll do a next tree this time around. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 1/7] KVM: api: add kvm_irq_routing_extended_msi
Hello!

> What if we use KVM_MSI_VALID_DEVID flag instead of new KVM_IRQ_ROUTING_EXTENDED_MSI definition? I believe this would make an API more consistent and introduce less new definitions.

I have just found one more flaw in your implementation. If you take a look at irqfd_wakeup()...

--- cut ---
/* An event has been signaled, inject an interrupt */
if (irq.type == KVM_IRQ_ROUTING_MSI)
	kvm_set_msi(&irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1, false);
else
	schedule_work(&irqfd->inject);
--- cut ---

You apparently missed KVM_IRQ_ROUTING_EXTENDED_MSI here, as well as in irqfd_update(). But, if you accept my API proposal, this becomes irrelevant.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
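A rough sketch of why the flag-based proposal would leave existing type checks untouched. Everything here is a simplified stand-in: struct routing_entry is not the kernel's struct kvm_irq_routing_entry, and the bit value chosen for MSI_VALID_DEVID is an assumption, since the thread does not give one.

```c
#include <assert.h>
#include <stdint.h>

#define ROUTING_MSI     2u          /* mirrors KVM_IRQ_ROUTING_MSI in the quoted patch */
#define MSI_VALID_DEVID (1u << 0)   /* assumed bit position, for illustration only */

/* Simplified stand-in for a routing entry. */
struct routing_entry {
    uint32_t gsi;
    uint32_t type;
    uint32_t flags;
    uint32_t devid;   /* only meaningful when MSI_VALID_DEVID is set */
};

/* Existing-style check: true for any MSI route, with or without a device ID. */
int is_msi_route(const struct routing_entry *e)
{
    assert(e != 0);
    return e->type == ROUTING_MSI;
}

/* Returns 1 and stores the device ID if the entry carries one, else 0. */
int msi_devid(const struct routing_entry *e, uint32_t *devid)
{
    if (!is_msi_route(e) || !(e->flags & MSI_VALID_DEVID))
        return 0;
    *devid = e->devid;
    return 1;
}
```

With a separate routing type, every comparison against the MSI type would need a second case; with a flag, only code that actually consumes the device ID has to look at it.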
Re: [PULL] virtio/vhost: cross endian support
On Thu, Jul 02, 2015 at 11:12:56AM +0200, Greg Kurz wrote:
> On Thu, 2 Jul 2015 08:01:28 +0200 Michael S. Tsirkin m...@redhat.com wrote:
> > On Wed, Jul 01, 2015 at 12:02:50PM -0700, Linus Torvalds wrote:
> > > On Wed, Jul 1, 2015 at 2:31 AM, Michael S. Tsirkin m...@redhat.com wrote:
> > > > virtio/vhost: cross endian support
> > >
> > > Ugh. Does this really have to be dynamic?
> > >
> > > Can't virtio do the sane thing, and just use a _fixed_ endianness?
> > >
> > > Doing an unconditional byte swap is faster and simpler than the crazy conditionals. That's true regardless of endianness, but gets to be even more so if the fixed endianness is little-endian, since BE is not-so-slowly fading from the world.
> > >
> > > Linus
> >
> > Yea, well - support for legacy BE guests on the new LE hosts is exactly the motivation for this.
> >
> > I dislike it too, but there are two redeeming properties that made me merge this:
> >
> > 1. It's a trivial amount of code: since we wrap host/guest accesses anyway, almost all of it is well hidden from drivers.
> >
> > 2. Sane platforms would never set flags like VHOST_CROSS_ENDIAN_LEGACY - and when it's clear, there's zero overhead (at some point it was tested by compiling with and without the patches, got the same stripped binary).
> >
> > Maybe we could create a Kconfig symbol to enforce point (2): prevent people from enabling it e.g. on x86. I will look into this - but it can be done by a patch on top, so I think this can be merged as is.
>
> This cross-endian *oddity* is targeting PowerPC book3s_64 processors... I am not aware of any other users. Maybe create a symbol that would be only selected by PPC_BOOK3S_64 ?

I think some ARM systems are trying to support cross-endian configurations as well.

Besides that, yes, this is more or less what I had in mind.

> > Or do you know of someone using kernel with all config options enabled indiscriminately?
> >
> > Thanks,
Re: [Qemu-devel] [PATCH 00/16] implement vNVDIMM
On 02/07/2015 11:20, Stefan Hajnoczi wrote:
> > Currently, the NVDIMM driver has been merged into upstream Linux Kernel and this patchset tries to enable it in virtualization field
>
> From a device model perspective, have you checked whether it makes sense to integrate nvdimms into the pc-dimm and hostmem code that is used for memory hotplug and NUMA?
>
> The NVDIMM device in your patches is a completely new TYPE_DEVICE so it doesn't share any interfaces or code with existing memory devices. Maybe that is the right solution here because NVDIMMs have different characteristics, but I'm not sure.

The hostmem code should definitely be shared, e.g. by adding a new file property to the memory-backend-file class. ivshmem can also use it---CCing Marc-André.

I don't know about the pc-dimm devices. If the NVDIMM devices can do _OST and can be hotplugged, then the answer is probably yes.

Paolo
Re: [Qemu-devel] [PATCH 00/16] implement vNVDIMM
On Thu, Jul 02, 2015 at 02:34:05PM +0800, Xiao Guangrong wrote: On 07/02/2015 02:17 PM, Michael S. Tsirkin wrote: On Wed, Jul 01, 2015 at 10:50:16PM +0800, Xiao Guangrong wrote: hw/acpi/aml-build.c | 32 +- hw/i386/acpi-build.c|9 +- hw/i386/acpi-dsdt.dsl |2 +- hw/i386/pc.c| 11 +- hw/mem/Makefile.objs|1 + hw/mem/pc-nvdimm.c | 1040 +++ include/hw/acpi/aml-build.h |5 +- include/hw/mem/pc-nvdimm.h | 56 +++ 8 files changed, 1149 insertions(+), 7 deletions(-) create mode 100644 hw/mem/pc-nvdimm.c create mode 100644 include/hw/mem/pc-nvdimm.h Given the amount of code, this is definitely not 2.4 material. Maybe others will have the time to review it before this, but in any case please remember to repost after 2.4 is out. I see, thanks for your reminder, Michael! I will review the series now. Here is the QEMU release schedule: http://qemu-project.org/Planning/2.4 Hard freeze - 7 July QEMU 2.4 release - 4 August It could be merged into a maintainer's tree when the -next branches are opened (it's up to each maintainer but for the block and net trees I do that at hard freeze time). pgpGg9qlhEWNe.pgp Description: PGP signature
Re: [PATCH v7 09/11] KVM: arm64: guest debug, HW assisted debug support
Hi Alex,

On Wed, Jul 01, 2015 at 07:29:01PM +0100, Alex Bennée wrote:
> This adds support for userspace to control the HW debug registers for guest debug. In the debug ioctl we copy IMPDEF registers into a new register set called host_debug_state. We use the recently introduced vcpu parameter debug_ptr to select which register set is copied into the real registers when world switch occurs.
>
> I've made some helper functions from hw_breakpoint.c more widely available for re-use.
>
> As with single step we need to tweak the guest registers to enable the exceptions so we need to save and restore those bits.
>
> Two new capabilities have been added to the KVM_EXTENSION ioctl to allow userspace to query the number of hardware break and watch points available on the host hardware.
>
> Signed-off-by: Alex Bennée alex.ben...@linaro.org
> Reviewed-by: Christoffer Dall christoffer.d...@linaro.org

[...]

> diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
> index e7d934d..ac07f2a 100644
> --- a/arch/arm64/kernel/hw_breakpoint.c
> +++ b/arch/arm64/kernel/hw_breakpoint.c
> @@ -50,13 +50,13 @@ static int core_num_brps;
>  static int core_num_wrps;
>
>  /* Determine number of BRP registers available. */
> -static int get_num_brps(void)
> +int get_num_brps(void)
>  {
>  	return ((read_cpuid(ID_AA64DFR0_EL1) >> 12) & 0xf) + 1;
>  }
>
>  /* Determine number of WRP registers available. */
> -static int get_num_wrps(void)
> +int get_num_wrps(void)
>  {
>  	return ((read_cpuid(ID_AA64DFR0_EL1) >> 20) & 0xf) + 1;
>  }

Sorry, just noticed this, but we already have a public interface for figuring these numbers out as required by perf. Can't you use hw_breakpoint_slots(...) instead of exposing these internal helpers?

Will
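For context, the two helpers in the quoted hunk each read a 4-bit field of ID_AA64DFR0_EL1 (starting at bit 12 for breakpoints and bit 20 for watchpoints) and add one, because the field holds the count minus one. A stand-alone sketch with a plain integer in place of the register read; debug_reg_count() is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extract the 4-bit field that starts at bit `shift` and add one,
 * since the field encodes "number of registers minus one".
 */
unsigned int debug_reg_count(uint64_t dfr0, unsigned int shift)
{
    assert(shift <= 60);
    return (unsigned int)((dfr0 >> shift) & 0xf) + 1;
}
```

For example, a register value with 5 in bits 15:12 and 3 in bits 23:20 describes 6 breakpoint and 4 watchpoint registers.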
[PATCH v3] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
This builds on the ability to run more than one vcore on a physical core by using the micro-threading (split-core) modes of the POWER8 chip. Previously, only vcores from the same VM could be run together, and (on POWER8) only if they had just one thread per core. With the ability to split the core on guest entry and unsplit it on guest exit, we can run up to 8 vcpu threads from up to 4 different VMs, and we can run multiple vcores with 2 or 4 vcpus per vcore. Dynamic micro-threading is only available if the static configuration of the cores is whole-core mode (unsplit), and only on POWER8. To manage this, we introduce a new kvm_split_mode struct which is shared across all of the subcores in the core, with a pointer in the paca on each thread. In addition we extend the core_info struct to have information on each subcore. When deciding whether to add a vcore to the set already on the core, we now have two possibilities: (a) piggyback the vcore onto an existing subcore, or (b) start a new subcore. Currently, when any vcpu needs to exit the guest and switch to host virtual mode, we interrupt all the threads in all subcores and switch the core back to whole-core mode. It may be possible in future to allow some of the subcores to keep executing in the guest while subcore 0 switches to the host, but that is not implemented in this patch. This adds a module parameter called dynamic_mt_modes which controls which micro-threading (split-core) modes the code will consider, as a bitmap. In other words, if it is 0, no micro-threading mode is considered; if it is 2, only 2-way micro-threading is considered; if it is 4, only 4-way, and if it is 6, both 2-way and 4-way micro-threading mode will be considered. The default is 6. With this, we now have secondary threads which are the primary thread for their subcore and therefore need to do the MMU switch. 
These threads will need to be started even if they have no vcpu to run, so we use the vcore pointer in the PACA rather than the vcpu pointer to trigger them. It is now possible for thread 0 to find that an exit has been requested before it gets to switch the subcore state to the guest. In that case we haven't added the guest's timebase offset to the timebase, so we need to be careful not to subtract the offset in the guest exit path. In fact we just skip the whole path that switches back to host context, since we haven't switched to the guest context. Signed-off-by: Paul Mackerras pau...@samba.org --- v3: Rename MAX_THREADS to MAX_SMT_THREADS to avoid a compile warning arch/powerpc/include/asm/kvm_book3s_asm.h | 20 ++ arch/powerpc/include/asm/kvm_host.h | 3 + arch/powerpc/kernel/asm-offsets.c | 7 + arch/powerpc/kvm/book3s_hv.c | 367 ++ arch/powerpc/kvm/book3s_hv_builtin.c | 25 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 113 +++-- 6 files changed, 473 insertions(+), 62 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h index 5bdfb5d..57d5dfe 100644 --- a/arch/powerpc/include/asm/kvm_book3s_asm.h +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h @@ -25,6 +25,12 @@ #define XICS_MFRR 0xc #define XICS_IPI 2 /* interrupt source # for IPIs */ +/* Maximum number of threads per physical core */ +#define MAX_SMT_THREADS8 + +/* Maximum number of subcores per physical core */ +#define MAX_SUBCORES 4 + #ifdef __ASSEMBLY__ #ifdef CONFIG_KVM_BOOK3S_HANDLER @@ -65,6 +71,19 @@ kvmppc_resume_\intno: #else /*__ASSEMBLY__ */ +struct kvmppc_vcore; + +/* Struct used for coordinating micro-threading (split-core) mode changes */ +struct kvm_split_mode { + unsigned long rpr; + unsigned long pmmar; + unsigned long ldbar; + u8 subcore_size; + u8 do_nap; + u8 napped[MAX_SMT_THREADS]; + struct kvmppc_vcore *master_vcs[MAX_SUBCORES]; +}; + /* * This struct goes in the PACA on 64-bit processors. 
It is used
 * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
 	u64 host_spurr;
 	u64 host_dscr;
 	u64 dec_expires;
+	struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
 	u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc)	((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)	(VCORE_EXIT_MAP(vc) != 0)

+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ		0x1
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --git
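The dynamic_mt_modes parameter described in the commit message is a bitmap in which the bit value matches the split factor (2 for 2-way, 4 for 4-way, 6 for both). A minimal sketch of that interpretation; mt_mode_allowed() is a hypothetical helper, and the kernel code may test the bitmap differently.

```c
#include <assert.h>
#include <stdbool.h>

/* True if an n-way split (n = 2 or 4) is enabled in the bitmap. */
bool mt_mode_allowed(unsigned int dynamic_mt_modes, unsigned int n_way)
{
    assert(n_way == 2 || n_way == 4);
    return (dynamic_mt_modes & n_way) != 0;
}
```

With the default of 6 both modes are considered; with 0 the core is never split dynamically.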
[PATCH v3] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
v3: Rename MAX_THREADS to MAX_SMT_THREADS to avoid a compile warning

 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
 arch/powerpc/include/asm/kvm_host.h       |   3 +
 arch/powerpc/kernel/asm-offsets.c         |   7 +
 arch/powerpc/kvm/book3s_hv.c              | 367 ++
 arch/powerpc/kvm/book3s_hv_builtin.c      |  25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 113 +++--
 6 files changed, 473 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..57d5dfe 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR		0xc
 #define XICS_IPI		2	/* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_SMT_THREADS		8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES		4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+	unsigned long	rpr;
+	unsigned long	pmmar;
+	unsigned long	ldbar;
+	u8		subcore_size;
+	u8		do_nap;
+	u8		napped[MAX_SMT_THREADS];
+	struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
 	u64 host_spurr;
 	u64 host_dscr;
 	u64 dec_expires;
+	struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
 	u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc)	((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)	(VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ		0x1
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --git
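The dynamic_mt_modes values listed in the commit message (0, 2, 4, 6) read
naturally as a bitmap in which the value N being set allows an N-way split.
The following is a minimal userspace sketch under that reading only; the
helper name mt_mode_allowed is hypothetical and does not appear in the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not taken from the patch: returns true when an
 * n_way split (2 or 4) is enabled in a dynamic_mt_modes-style bitmap,
 * where the value 2 enables 2-way and the value 4 enables 4-way mode.
 */
static bool mt_mode_allowed(unsigned int modes, unsigned int n_way)
{
	if (n_way != 2 && n_way != 4)
		return false;	/* only these two split modes are described */
	return (modes & n_way) != 0;
}
```

With the default of 6 both calls, mt_mode_allowed(6, 2) and
mt_mode_allowed(6, 4), are true; with 0 neither is.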
Re: [Qemu-devel] [PATCH 00/16] implement vNVDIMM
On Wed, Jul 01, 2015 at 10:50:16PM +0800, Xiao Guangrong wrote:
> == Background ==
> NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported
> on Intel's platform. They are discovered via ACPI and configured by _DSM
> method of NVDIMM device in ACPI. There has some supporting documents which
> can be found at:
> ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
> NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
> DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
> Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
>
> Currently, the NVDIMM driver has been merged into upstream Linux Kernel and
> this patchset tries to enable it in virtualization field

From a device model perspective, have you checked whether it makes sense
to integrate nvdimms into the pc-dimm and hostmem code that is used for
memory hotplug and NUMA?

The NVDIMM device in your patches is a completely new TYPE_DEVICE so it
doesn't share any interfaces or code with existing memory devices.
Maybe that is the right solution here because NVDIMMs have different
characteristics, but I'm not sure.
[RFC 04/17] VFIO: pci: initialize vfio_device_external_ops
Signed-off-by: Eric Auger <eric.au...@linaro.org>

---

v6: creation
---
 drivers/vfio/pci/vfio_pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 964ad57..1e48125 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -929,6 +929,7 @@ static const struct vfio_device_ops vfio_pci_ops = {
 	.write		= vfio_pci_write,
 	.mmap		= vfio_pci_mmap,
 	.request	= vfio_pci_request,
+	.external_ops	= NULL,
 };
 
 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
-- 
1.9.1
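As a side note on the C semantics involved: members that are not named in a
designated initializer are implicitly zero-initialized, so spelling out a
NULL member documents intent rather than changing behaviour. The sketch
below illustrates this with a made-up ops table; struct demo_ops and its
members are illustrative stand-ins, not the VFIO definitions.

```c
#include <stddef.h>

/* Stand-in for a driver ops table; member names are illustrative only. */
struct demo_ops {
	int (*read)(void);
	int (*write)(void);
	const void *external_ops;
};

static int demo_read(void)
{
	return 1;
}

/* write and external_ops are not named, so C sets them to null pointers. */
static const struct demo_ops implicit_ops = {
	.read = demo_read,
};

/* Same table with the member spelled out, in the style of the patch. */
static const struct demo_ops explicit_ops = {
	.read = demo_read,
	.external_ops = NULL,
};
```

Both tables end up with external_ops equal to NULL.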
Re: [RFC 16/17] KVM: eventfd: add irq bypass consumer management
On 02/07/2015 15:17, Eric Auger wrote:
> This patch adds the registration/unregistration of an
> irq_bypass_consumer on irqfd assignment/deassignment.
>
> Signed-off-by: Eric Auger <eric.au...@linaro.org>
> ---
>  virt/kvm/eventfd.c | 22 +++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index f3da161..425a47b 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -34,6 +34,7 @@
>  #include <linux/srcu.h>
>  #include <linux/slab.h>
>  #include <linux/seqlock.h>
> +#include <linux/irqbypass.h>
>  #include <trace/events/kvm.h>
>  
>  #include <kvm/iodev.h>
> @@ -93,6 +94,7 @@ struct _irqfd {
>  	struct list_head list;
>  	poll_table pt;
>  	struct work_struct shutdown;
> +	struct irq_bypass_consumer *cons;
>  };
>  
>  static struct workqueue_struct *irqfd_cleanup_wq;
> @@ -429,7 +431,21 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
>  	 */
>  	fdput(f);
>  
> -	/* irq_bypass_register_consumer(); */
> +	irqfd->cons = kzalloc(sizeof(struct irq_bypass_consumer),
> +			      GFP_KERNEL);

Apart from the struct embedding technique I suggested in patch 12, this
looks very reasonable.

Thanks!

Paolo

> +	if (!irqfd->cons) {
> +		ret = -ENOMEM;
> +		goto fail;
> +	}
> +	irqfd->cons->token = (void *)irqfd->eventfd;
> +	irqfd->cons->gsi = irqfd->gsi;
> +	irqfd->cons->kvm = kvm;
> +	irqfd->cons->add_producer = kvm_arch_add_producer;
> +	irqfd->cons->del_producer = kvm_arch_del_producer;
> +	irqfd->cons->stop_consumer = kvm_arch_stop_consumer;
> +	irqfd->cons->resume_consumer = kvm_arch_resume_consumer;
> +	ret = irq_bypass_register_consumer(irqfd->cons);
> +	WARN_ON(ret);
>  
>  	return 0;
>  
> @@ -530,8 +546,6 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
>  	struct _irqfd *irqfd, *tmp;
>  	struct eventfd_ctx *eventfd;
>  
> -	/* irq_bypass_unregister_consumer() */
> -
>  	eventfd = eventfd_ctx_fdget(args->fd);
>  	if (IS_ERR(eventfd))
>  		return PTR_ERR(eventfd);
> @@ -550,6 +564,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
>  			irqfd->irq_entry.type = 0;
>  			write_seqcount_end(&irqfd->irq_entry_sc);
>  			irqfd_deactivate(irqfd);
> +			irq_bypass_unregister_consumer(irqfd->cons);
> +			kfree(irqfd->cons);
>  		}
>  	}
>  
Re: [PATCH] arm/run: don't enable KVM if system can't do it
On Thu, Jul 02, 2015 at 03:45:17PM +0200, Paolo Bonzini wrote:
>
>
> On 02/07/2015 13:51, Andrew Jones wrote:
> > 4) I recently mentioned[*] it might be nice to add a '-force-tcg' type
> >    of arm/run command line option, allowing tcg to be used even if it's
> >    possible to use kvm. Adding that at the same time would be nice.
>
> Can you just use --no-kvm?  It is equivalent to -machine accel=tcg,

Sounds perfect. Thanks!

> and it overrides previous -machine accel=foo options.
>
> Paolo
>
> ps: I also share the yay feeling, of course!
Re: [RFC 16/17] KVM: eventfd: add irq bypass consumer management
On 07/02/2015 03:42 PM, Paolo Bonzini wrote:
>
>
> On 02/07/2015 15:17, Eric Auger wrote:
>> This patch adds the registration/unregistration of an
>> irq_bypass_consumer on irqfd assignment/deassignment.
>>
>> Signed-off-by: Eric Auger <eric.au...@linaro.org>
>> ---
>>  virt/kvm/eventfd.c | 22 +++---
>>  1 file changed, 19 insertions(+), 3 deletions(-)
>>
>> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
>> index f3da161..425a47b 100644
>> --- a/virt/kvm/eventfd.c
>> +++ b/virt/kvm/eventfd.c
>> @@ -34,6 +34,7 @@
>>  #include <linux/srcu.h>
>>  #include <linux/slab.h>
>>  #include <linux/seqlock.h>
>> +#include <linux/irqbypass.h>
>>  #include <trace/events/kvm.h>
>>  
>>  #include <kvm/iodev.h>
>> @@ -93,6 +94,7 @@ struct _irqfd {
>>  	struct list_head list;
>>  	poll_table pt;
>>  	struct work_struct shutdown;
>> +	struct irq_bypass_consumer *cons;
>>  };
>>  
>>  static struct workqueue_struct *irqfd_cleanup_wq;
>> @@ -429,7 +431,21 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
>>  	 */
>>  	fdput(f);
>>  
>> -	/* irq_bypass_register_consumer(); */
>> +	irqfd->cons = kzalloc(sizeof(struct irq_bypass_consumer),
>> +			      GFP_KERNEL);
>
> Apart from the struct embedding technique I suggested in patch 12, this
> looks very reasonable.
>
> Thanks!

Hi Paolo,

thanks for the swift feedback. I will respin shortly with the advised
embedding technique.

Best Regards

Eric
>
> Paolo
>
>> +	if (!irqfd->cons) {
>> +		ret = -ENOMEM;
>> +		goto fail;
>> +	}
>> +	irqfd->cons->token = (void *)irqfd->eventfd;
>> +	irqfd->cons->gsi = irqfd->gsi;
>> +	irqfd->cons->kvm = kvm;
>> +	irqfd->cons->add_producer = kvm_arch_add_producer;
>> +	irqfd->cons->del_producer = kvm_arch_del_producer;
>> +	irqfd->cons->stop_consumer = kvm_arch_stop_consumer;
>> +	irqfd->cons->resume_consumer = kvm_arch_resume_consumer;
>> +	ret = irq_bypass_register_consumer(irqfd->cons);
>> +	WARN_ON(ret);
>>  
>>  	return 0;
>>  
>> @@ -530,8 +546,6 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
>>  	struct _irqfd *irqfd, *tmp;
>>  	struct eventfd_ctx *eventfd;
>>  
>> -	/* irq_bypass_unregister_consumer() */
>> -
>>  	eventfd = eventfd_ctx_fdget(args->fd);
>>  	if (IS_ERR(eventfd))
>>  		return PTR_ERR(eventfd);
>> @@ -550,6 +564,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
>>  			irqfd->irq_entry.type = 0;
>>  			write_seqcount_end(&irqfd->irq_entry_sc);
>>  			irqfd_deactivate(irqfd);
>> +			irq_bypass_unregister_consumer(irqfd->cons);
>> +			kfree(irqfd->cons);
>>  		}
>>  	}
>>  
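The "struct embedding technique" is only referred to in this thread (it was
suggested on patch 12, which is not shown here), so what follows is an
assumption about what is meant: placing the consumer inside the irqfd
structure instead of allocating it separately, and recovering the enclosing
structure from the member pointer. This is a userspace sketch; the demo_*
names are hypothetical, and demo_container_of stands in for the kernel's
container_of() macro.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins; the fields are illustrative, not the RFC's. */
struct demo_consumer {
	void *token;
	int gsi;
};

struct demo_irqfd {
	int gsi;
	struct demo_consumer cons;	/* embedded, no separate allocation */
};

/* Recover the enclosing irqfd from a pointer to its embedded consumer. */
static struct demo_irqfd *irqfd_from_consumer(struct demo_consumer *cons)
{
	return demo_container_of(cons, struct demo_irqfd, cons);
}
```

With this layout there is no separate allocation to fail and no separate
free to pair with it, which is presumably the appeal of the suggestion.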
Re: [PATCH] arm/run: don't enable KVM if system can't do it
On Thu, Jul 02, 2015 at 02:17:18PM +0100, Alex Bennée wrote:
>
> Andrew Jones <drjo...@redhat.com> writes:
>
> > On Thu, Jul 02, 2015 at 12:05:31PM +0100, Alex Bennée wrote:
> > > As ARM (and no doubt other systems) can also run tests in pure TCG mode
> > > we might as well not bother enabling accel=kvm if we aren't on a real
> > > ARM based system. This prevents us seeing ugly warning messages when
> > > testing TCG.
> >
> > First, YAY! We're getting contributions to kvm-unit-tests/arm! :-)
>
> well so far I've been noodling about looking at it for KVM Guest Debug
> testing. I've a hideous branch on github that attempts to test exercise
> the debug register trapping code. However that falls down as I really
> need to find an easy way of attaching GDB to the qemu-gdb stub while the
> test is running.
>
> However with the TCG multi-thread work coming up I certainly see the
> need to exercise QEMU in a way that the internal TCG test code might
> have trouble with.
>
> > > Signed-off-by: Alex Bennée <alex.ben...@linaro.org>
> > > ---
> > >  arm/run | 8 +++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arm/run b/arm/run
> > > index 662a856..2bdb4be 100755
> > > --- a/arm/run
> > > +++ b/arm/run
> > > @@ -33,7 +33,13 @@ if $qemu $M -chardev testdev,id=id -initrd . 2>&1 \
> > >  	exit 2
> > >  fi
> > >  
> > > -M='-machine virt,accel=kvm:tcg'
> > > +host=`uname -m | sed -e 's/arm.*/arm/'`
> > > +if [ ${host} = arm ] || [ ${host} = aarch64 ]; then
> > > +    M='-machine virt,accel=kvm:tcg'
> > > +else
> > > +    M='-machine virt,accel=tcg'
> > > +fi
> >
> > I think this is a good idea, although I had actually left that warning
> > on purpose. Originally, the plan was for these unit tests to be kvm
> > specific. If they could be developed with the aid of tcg, and even used
> > to test tcg, then fine, but running them on tcg should always complain,
> > in order to make sure that the test output clearly showed that it had
> > not been running on kvm. Developing unit tests for tcg is also a good
> > idea though, and there's really no reason not to share this framework.
> >
> > So, for this patch I'd prefer we do a few things differently;
> >
> > 1) we should be able to integrate this new condition with the
> >    arm64 must use '-cpu host' with kvm condition that is lower
> >    down. And, let's just make this $HOST variable one that
> >    ./configure prepares, allowing that arm64 condition to
> >    s/$(arch)/$HOST/ and avoiding the need to duplicate the
> >    sed -e 's/arm.*/arm/'
>
> Yeah makes sense.
>
> > 2) we might as well do something like
> >
> >    M='-machine virt'
> >    if using-kvm
> >        M+=',accel=kvm'
> >    else
> >        M+=',accel=tcg'
> >    fi
> >
> >    now, since we don't want to use the accel fallback feature anymore
> >
> > 3) outputting which one we're using might still be nice, otherwise
> >    one must inspect the qemu command line in the logs to find out
> >
> > 4) I recently mentioned[*] it might be nice to add a '-force-tcg' type
> >    of arm/run command line option, allowing tcg to be used even if it's
> >    possible to use kvm. Adding that at the same time would be nice.
>
> Would it also be useful for other arches? Does run-tests.sh pass

Maybe someday, so we might as well add it there. As long as it allows
current command lines to keep working as they have, then why not.

> > 5) we use tabs for indentation in arm/run, and only bother with the
> >    variable's {}, if necessary
>
> My shell quoting was rusty. I think $(host) was calling the host command
> for some reason.

Yes, $(cmd) executes cmd. ${var} is correct, but only necessary if
you're substituting a substring. For example

 X=FOO
 echo ${X}_BAR

will echo FOO_BAR, but

 echo $X_BAR

will echo whatever the variable X_BAR is. It's not necessary to use
the {} in most cases though, space and some other characters, like /,
automatically end the variable name.

> > 6) we should post patches with [kvm-unit-tests PATCH] to avoid
> >    confusion with other kvm postings. (I screwed that up on my
> >    last two postings...).
>
> /me ponders if he can just config git for that.

You can. Add

[format]
	subjectprefix = kvm-unit-tests PATCH

to your kvm-unit-tests/.git/config. I just hadn't bothered until now...

> I'll patch the readme ;-)

Contributing code !AND! updating the readme! Double YAY!

Thanks,
drew
Re: [PATCH 9/9] qemu/kvm: kvm hyper-v based guest crash event handling
On 02/07/2015 15:19, Andrey Smetanin wrote:
>>> +    if (has_msr_hv_crash) {
>>> +        env->msr_hv_crash_ctl = HV_X64_MSR_CRASH_CTL_NOTIFY;
>> The value is always host-defined, so I think it doesn't need a field in
>> CPUX86State.  On the other hand, this:
> Kernel just works with that value, kernel doesn't setup it. The user
> space is allowed to setup this msr if qemu option hv-crash is on. So the
> code env->msr_hv_crash_ctl = HV_X64_MSR_CRASH_CTL_NOTIFY; setups msr in
> user space at cpu reset. When cpu setup it's registers these msr's
> values are uploaded into kernel. Anyway we need a code that initially
> set up crash ctl msr with value HV_X64_MSR_CRASH_CTL_NOTIFY. And I think
> that code should be user space.
> Any objections ?

Yes, that's correct.  What I'm saying, is that the value can be
hard-coded and doesn't need a field in CPUX86State.  If you want to
leave the field that's also okay, but even then it should not be part of
the migration state.

Paolo
[RFC 16/17] KVM: eventfd: add irq bypass consumer management
This patch adds the registration/unregistration of an
irq_bypass_consumer on irqfd assignment/deassignment.

Signed-off-by: Eric Auger <eric.au...@linaro.org>
---
 virt/kvm/eventfd.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index f3da161..425a47b 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -34,6 +34,7 @@
 #include <linux/srcu.h>
 #include <linux/slab.h>
 #include <linux/seqlock.h>
+#include <linux/irqbypass.h>
 #include <trace/events/kvm.h>
 
 #include <kvm/iodev.h>
@@ -93,6 +94,7 @@ struct _irqfd {
 	struct list_head list;
 	poll_table pt;
 	struct work_struct shutdown;
+	struct irq_bypass_consumer *cons;
 };
 
 static struct workqueue_struct *irqfd_cleanup_wq;
@@ -429,7 +431,21 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	 */
 	fdput(f);
 
-	/* irq_bypass_register_consumer(); */
+	irqfd->cons = kzalloc(sizeof(struct irq_bypass_consumer),
+			      GFP_KERNEL);
+	if (!irqfd->cons) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+	irqfd->cons->token = (void *)irqfd->eventfd;
+	irqfd->cons->gsi = irqfd->gsi;
+	irqfd->cons->kvm = kvm;
+	irqfd->cons->add_producer = kvm_arch_add_producer;
+	irqfd->cons->del_producer = kvm_arch_del_producer;
+	irqfd->cons->stop_consumer = kvm_arch_stop_consumer;
+	irqfd->cons->resume_consumer = kvm_arch_resume_consumer;
+	ret = irq_bypass_register_consumer(irqfd->cons);
+	WARN_ON(ret);
 
 	return 0;
 
@@ -530,8 +546,6 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
 	struct _irqfd *irqfd, *tmp;
 	struct eventfd_ctx *eventfd;
 
-	/* irq_bypass_unregister_consumer() */
-
 	eventfd = eventfd_ctx_fdget(args->fd);
 	if (IS_ERR(eventfd))
 		return PTR_ERR(eventfd);
@@ -550,6 +564,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
 			irqfd->irq_entry.type = 0;
 			write_seqcount_end(&irqfd->irq_entry_sc);
 			irqfd_deactivate(irqfd);
+			irq_bypass_unregister_consumer(irqfd->cons);
+			kfree(irqfd->cons);
 		}
 	}
 
-- 
1.9.1
Re: [PATCH 9/9] qemu/kvm: kvm hyper-v based guest crash event handling
On Wed, 2015-07-01 at 17:07 +0200, Paolo Bonzini wrote:
>
> On 30/06/2015 13:33, Denis V. Lunev wrote:
> > +static int kvm_arch_handle_hv_crash(CPUState *cs)
> > +{
> > +    X86CPU *cpu = X86_CPU(cs);
> > +    CPUX86State *env = &cpu->env;
> > +
> > +    /* Mark that Hyper-v guest crash occurred */
> > +    env->hv_crash_occurred = 1;
>
> This need not be a hv crash.  You can add crash_occurred to CPUState
> directly, and set it in qemu_system_guest_panicked:
>
>     if (current_cpu) {
>         current_cpu->crash_occurred = true;
>     }
>
> Then you would add two subsections: one for crash_occurred in exec.c
> (attached to vmstate_cpu_common), one for hyperv crash params in
> target-i386/machine.c.  This also gives an idea about splitting the
> patch: first the introduction of qemu_system_guest_panicked and
> crash_occurred, second the Hyper-V specific bits.
>
> > +    if (cpu->hyperv_crash) {
> > +        c->edx |= HV_X64_GUEST_CRASH_MSR_AVAILABLE;
> > +        has_msr_hv_crash = true;
>
> You can only set this to true if the kernel also supports the MSRs.
>
> > +    }
> > +
> >      c = &cpuid_data.entries[cpuid_i++];
> >      c->function = HYPERV_CPUID_ENLIGHTMENT_INFO;
> >      if (cpu->hyperv_relaxed_timing) {
> > @@ -761,6 +767,10 @@ void kvm_arch_reset_vcpu(X86CPU *cpu)
> >      } else {
> >          env->mp_state = KVM_MP_STATE_RUNNABLE;
> >      }
> > +    if (has_msr_hv_crash) {
> > +        env->msr_hv_crash_ctl = HV_X64_MSR_CRASH_CTL_NOTIFY;
>
> The value is always host-defined, so I think it doesn't need a field in
> CPUX86State.  On the other hand, this:

Kernel just works with that value, kernel doesn't setup it. The user
space is allowed to setup this msr if qemu option hv-crash is on. So the
code env->msr_hv_crash_ctl = HV_X64_MSR_CRASH_CTL_NOTIFY; setups msr in
user space at cpu reset. When cpu setup it's registers these msr's
values are uploaded into kernel. Anyway we need a code that initially
set up crash ctl msr with value HV_X64_MSR_CRASH_CTL_NOTIFY. And I think
that code should be user space.
Any objections ?

> > +static bool hyperv_crash_enable_needed(void *opaque)
> > +{
> > +    X86CPU *cpu = opaque;
> > +    CPUX86State *env = &cpu->env;
> > +
> > +    return (env->msr_hv_crash_ctl & HV_X64_MSR_CRASH_CTL_CONTENTS) ?
> > +            true : false;
> > +}
> > +
>
> can just check if any of the params fields is nonzero.

If we setup crash ctl msr by user space, we need it to migrate.

> Thanks,
>
> Paolo
>
> > +        env->hv_crash_occurred = 0;
> > +    }
[RFC 15/17] KVM: arm/arm64: implement IRQ bypass consumer functions
- kvm_arch_add_producer: perform VGIC/irqchip settings for forwarding
- kvm_arch_del_producer: same for inverse operation
- kvm_arch_stop_consumer: halt guest execution
- kvm_arch_resume_consumer: resume guest execution

Signed-off-by: Eric Auger <eric.au...@linaro.org>
---
 arch/arm/kvm/arm.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 4be6715..f9b9b1e 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -1146,6 +1146,28 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
 	return NULL;
 }
 
+void kvm_arch_add_producer(struct irq_bypass_consumer *cons,
+			   struct irq_bypass_producer *prod)
+{
+	kvm_vgic_set_forward(cons->kvm, prod->irq, cons->gsi);
+}
+void kvm_arch_del_producer(struct irq_bypass_consumer *cons,
+			   struct irq_bypass_producer *prod)
+{
+	kvm_vgic_unset_forward(cons->kvm, prod->irq, cons->gsi,
+			       &prod->active);
+}
+
+void kvm_arch_stop_consumer(struct irq_bypass_consumer *cons)
+{
+	kvm_arm_halt_guest(cons->kvm);
+}
+
+void kvm_arch_resume_consumer(struct irq_bypass_consumer *cons)
+{
+	kvm_arm_resume_guest(cons->kvm);
+}
+
 /**
  * Initialize Hyp-mode and memory mappings on all CPUs.
  */
-- 
1.9.1
[PATCH v4 2/2] vhost: add max_mem_regions module parameter
it became possible to use a bigger amount of memory
slots, which is used by memory hotplug for
registering hotplugged memory.
However QEMU crashes if it's used with more than ~60
pc-dimm devices and vhost-net enabled since host kernel
in module vhost-net refuses to accept more than 64
memory regions.

Allow to tweak limit via max_mem_regions module parameter
with default value set to 64 slots.

Signed-off-by: Igor Mammedov <imamm...@redhat.com>
---
 drivers/vhost/vhost.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 6488011..9a68e2e 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -29,8 +29,12 @@
 
 #include "vhost.h"
 
+static ushort max_mem_regions = 64;
+module_param(max_mem_regions, ushort, 0444);
+MODULE_PARM_DESC(max_mem_regions,
+	"Maximum number of memory regions in memory map. (default: 64)");
+
 enum {
-	VHOST_MEMORY_MAX_NREGIONS = 64,
 	VHOST_MEMORY_F_LOG = 0x1,
 };
 
@@ -696,7 +700,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 		return -EFAULT;
 	if (mem.padding)
 		return -EOPNOTSUPP;
-	if (mem.nregions > VHOST_MEMORY_MAX_NREGIONS)
+	if (mem.nregions > max_mem_regions)
 		return -E2BIG;
 	newmem = vhost_kvzalloc(size + mem.nregions * sizeof(*m->regions));
 	if (!newmem)
-- 
1.8.3.1
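The effect of the patch above is that the region-count check compares
against a value fixed at module load time instead of a compile-time
constant. A minimal userspace sketch of that check follows; check_nregions
is a hypothetical helper, and the plain static variable merely stands in for
the module parameter.

```c
#include <errno.h>

/* Stand-in for the module parameter; 64 is the default named in the patch. */
static unsigned short max_mem_regions = 64;

/* Hypothetical helper mirroring the bound check in vhost_set_memory(). */
static int check_nregions(unsigned int nregions)
{
	if (nregions > max_mem_regions)
		return -E2BIG;
	return 0;
}
```

With the default, 64 regions are accepted and 65 are rejected with -E2BIG;
raising the limit changes only where that boundary sits.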
[PATCH v4 1/2] vhost: extend memory regions allocation to vmalloc
with large number of memory regions we could end up with
high order allocations and kmalloc could fail if
host is under memory pressure.
Considering that memory regions array is used on hot path
try harder to allocate using kmalloc and if it fails resort
to vmalloc.
It's still better than just failing vhost_set_memory() and
causing guest crash due to it when a new memory hotplugged
to guest.

I'll still look at QEMU side solution to reduce amount of
memory regions it feeds to vhost to make things even better,
but it doesn't hurt for kernel to behave smarter and don't
crash older QEMU's which could use large amount of memory
regions.

Signed-off-by: Igor Mammedov <imamm...@redhat.com>
---
 drivers/vhost/vhost.c | 20
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 71bb468..6488011 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -544,7 +544,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev, bool locked)
 		fput(dev->log_file);
 	dev->log_file = NULL;
 	/* No one will access memory at this point */
-	kfree(dev->memory);
+	kvfree(dev->memory);
 	dev->memory = NULL;
 	WARN_ON(!list_empty(&dev->work_list));
 	if (dev->worker) {
@@ -674,6 +674,18 @@ static int vhost_memory_reg_sort_cmp(const void *p1, const void *p2)
 	return 0;
 }
 
+static void *vhost_kvzalloc(unsigned long size)
+{
+	void *n = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
+
+	if (!n) {
+		n = vzalloc(size);
+		if (!n)
+			return ERR_PTR(-ENOMEM);
+	}
+	return n;
+}
+
 static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 {
 	struct vhost_memory mem, *newmem, *oldmem;
@@ -686,7 +698,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 		return -EOPNOTSUPP;
 	if (mem.nregions > VHOST_MEMORY_MAX_NREGIONS)
 		return -E2BIG;
-	newmem = kmalloc(size + mem.nregions * sizeof *m->regions, GFP_KERNEL);
+	newmem = vhost_kvzalloc(size + mem.nregions * sizeof(*m->regions));
 	if (!newmem)
 		return -ENOMEM;
 
@@ -700,7 +712,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 		vhost_memory_reg_sort_cmp, NULL);
 
 	if (!memory_access_ok(d, newmem, 0)) {
-		kfree(newmem);
+		kvfree(newmem);
 		return -EFAULT;
 	}
 	oldmem = d->memory;
@@ -712,7 +724,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 		d->vqs[i]->memory = newmem;
 		mutex_unlock(&d->vqs[i]->mutex);
 	}
-	kfree(oldmem);
+	kvfree(oldmem);
 	return 0;
 }
 
-- 
1.8.3.1
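The allocation strategy described in the commit message (try the fast
contiguous allocator first, fall back to a slower one that copes with large
sizes) can be sketched in userspace as below. All demo_* names are
hypothetical stand-ins: demo_fast_alloc imitates kmalloc failing for large
requests and demo_slow_alloc plays the role of vzalloc. The sketch returns
NULL on failure, which is an assumption chosen to match the `if (!newmem)`
test at the call site in the diff; the posted helper returns
ERR_PTR(-ENOMEM) instead.

```c
#include <stdlib.h>

/*
 * Stand-in for a contiguous allocator that refuses large requests,
 * loosely imitating kmalloc failing for high-order allocations.
 */
static void *demo_fast_alloc(size_t size)
{
	return size <= 4096 ? calloc(1, size) : NULL;
}

/* Stand-in for the slower fallback allocator. */
static void *demo_slow_alloc(size_t size)
{
	return calloc(1, size);
}

/*
 * Try the fast allocator first, then fall back. Returns zeroed memory
 * or NULL; *used_fallback reports which path produced the result.
 */
static void *demo_kvzalloc(size_t size, int *used_fallback)
{
	void *p = demo_fast_alloc(size);

	*used_fallback = 0;
	if (!p) {
		p = demo_slow_alloc(size);
		*used_fallback = 1;
	}
	return p;
}
```

In the kernel the two allocators need different release paths, which is why
the diff switches every kfree() of this buffer to kvfree(); in this
userspace sketch a plain free() covers both.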
[RFC 17/17] VFIO: platform: add irq bypass producer management
This patch adds irq_bypass_producer registration/unregistration.
VFIO producer callbacks are populated:
- stop/resume producer simply consist in disabling/enabling the host irq
- add/del consumer: basically set the automasked flag to false/true

The vfio_platform_device pointer is passed as producer opaque. We also
cache the device handle in vfio_platform_device. This makes possible
to easily retrieve the vfio_device at registration.

Signed-off-by: Eric Auger <eric.au...@linaro.org>
---
 drivers/vfio/platform/vfio_platform_common.c  |  2 +
 drivers/vfio/platform/vfio_platform_irq.c     | 83 +++
 drivers/vfio/platform/vfio_platform_private.h |  2 +
 3 files changed, 87 insertions(+)

diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index 9acfca6..12d4540 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -546,6 +546,8 @@ int vfio_platform_probe_common(struct vfio_platform_device *vdev,
 	if (!vdev)
 		return -EINVAL;
 
+	vdev->dev = dev;
+
 	group = iommu_group_get(dev);
 	if (!group) {
 		pr_err("VFIO: No IOMMU group for device %s\n", vdev->name);
diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index f6d83ed..0061e6e 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -20,6 +20,7 @@
 #include <linux/types.h>
 #include <linux/vfio.h>
 #include <linux/irq.h>
+#include <linux/irqbypass.h>
 
 #include "vfio_platform_private.h"
 
@@ -185,6 +186,70 @@ static irqreturn_t vfio_handler(int irq, void *dev_id)
 	return ret;
 }
 
+static void vfio_platform_stop_producer(struct irq_bypass_producer *prod)
+{
+	pr_info("%s disable %d\n", __func__, prod->irq);
+	disable_irq(prod->irq);
+}
+
+static void vfio_platform_resume_producer(struct irq_bypass_producer *prod)
+{
+	pr_info("%s enable %d\n", __func__, prod->irq);
+	enable_irq(prod->irq);
+}
+
+static void vfio_platform_add_consumer(struct irq_bypass_producer *prod,
+				       struct irq_bypass_consumer *cons)
+{
+	int i, ret;
+	struct vfio_platform_device *vdev =
+		(struct vfio_platform_device *)prod->opaque;
+
+	pr_info("%s irq=%d gsi =%d\n", __func__, prod->irq, cons->gsi);
+
+	for (i = 0; i < vdev->num_irqs; i++) {
+		if (vdev->irqs[i].prod == prod)
+			break;
+	}
+	WARN_ON(i == vdev->num_irqs);
+
+	//TODO
+	/*
+	 * if the IRQ is active at irqchip level or VFIO (auto)masked
+	 * this means the host IRQ is already under injection in the
+	 * guest and this not safe to change the forwarding state at
+	 * that stage.
+	 * It is not possible to differentiate user-space masking
+	 * from auto-masking, leading to possible false detection of
+	 * active state.
+	 */
+	prod->active = vfio_external_is_active(prod->vdev, i, 0, 0);
+
+	ret = vfio_external_set_automasked(prod->vdev, i, 0, 0, false);
+	WARN_ON(ret);
+}
+
+static void vfio_platform_del_consumer(struct irq_bypass_producer *prod,
+				       struct irq_bypass_consumer *cons)
+{
+	int i;
+	struct vfio_platform_device *vdev =
+		(struct vfio_platform_device *)prod->opaque;
+
+	pr_info("%s irq=%d gsi =%d\n", __func__, prod->irq, cons->gsi);
+
+	for (i = 0; i < vdev->num_irqs; i++) {
+		if (vdev->irqs[i].prod == prod)
+			break;
+	}
+	WARN_ON(i == vdev->num_irqs);
+
+	if (prod->active)
+		vfio_external_mask(prod->vdev, i, 0, 0);
+
+	vfio_external_set_automasked(prod->vdev, i, 0, 0, true);
+}
+
 static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
 			    int fd, irq_handler_t handler)
 {
@@ -192,8 +257,11 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
 	struct eventfd_ctx *trigger;
 	int ret;
 
+
 	if (irq->trigger) {
 		free_irq(irq->hwirq, irq);
+		irq_bypass_unregister_producer(irq->prod);
+		kfree(irq->prod);
 		kfree(irq->name);
 		eventfd_ctx_put(irq->trigger);
 		irq->trigger = NULL;
@@ -225,6 +293,21 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
 		return ret;
 	}
 
+	irq->prod = kzalloc(sizeof(struct irq_bypass_producer),
+			    GFP_KERNEL);
+	if (!irq->prod)
+		return -ENOMEM;
+	irq->prod->token = (void *)trigger;
+	irq->prod->irq = irq->hwirq;
+	irq->prod->vdev = vfio_device_get_from_dev(vdev->dev);
+	irq->prod->opaque = (void *)vdev;
+	irq->prod->add_consumer = vfio_platform_add_consumer;
+	irq->prod->del_consumer =
[RFC 11/17] VFIO: platform: select IRQ_BYPASS_MANAGER
Select IRQ_BYPASS_MANAGER when CONFIG_VFIO_PLATFORM is set

Signed-off-by: Eric Auger <eric.au...@linaro.org>
---
 drivers/vfio/platform/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
index bb30128..c2f3dce 100644
--- a/drivers/vfio/platform/Kconfig
+++ b/drivers/vfio/platform/Kconfig
@@ -2,6 +2,7 @@ config VFIO_PLATFORM
 	tristate "VFIO support for platform devices"
 	depends on VFIO && EVENTFD && (ARM || ARM64)
 	select VFIO_VIRQFD
+	select IRQ_BYPASS_MANAGER
 	help
 	  Support for platform devices with VFIO. This is required to make
 	  use of platform devices present on the system using the VFIO
-- 
1.9.1
[RFC 14/17] KVM: arm/arm64: vgic: forwarding control
Implements kvm_vgic_[set|unset]_forward.

Handle low-level VGIC programming: physical IRQ/guest IRQ mapping,
list register cleanup, VGIC state machine. Also interacts with
the irqchip.

Signed-off-by: Eric Auger <eric.au...@linaro.org>

---

bypass rfc:
- rename kvm_arch_{set|unset}_forward into
  kvm_vgic_{set|unset}_forward. Remove __KVM_HAVE_ARCH_HALT_GUEST.
  The function is bound to be called by ARM code only.

v4 -> v5:
- fix arm64 compilation issues, ie. also defines
  __KVM_HAVE_ARCH_HALT_GUEST for arm64

v3 -> v4:
- code originally located in kvm_vfio_arm.c
- kvm_arch_vfio_{set|unset}_forward renamed into
  kvm_arch_{set|unset}_forward
- split into 2 functions (set/unset) since unset does not fail anymore
- unset can be invoked at whatever time. Extra care is taken to handle
  transition in VGIC state machine, LR cleanup, ...

v2 -> v3:
- renaming of kvm_arch_set_fwd_state into kvm_arch_vfio_set_forward
- takes a bool arg instead of kvm_fwd_irq_action enum
- removal of KVM_VFIO_IRQ_CLEANUP
- platform device check now happens here
- more precise errors returned
- irq_eoi handled externally to this patch (VGIC)
- correct enable_irq bug done twice
- reword the commit message
- correct check of platform_bus_type
- use raw_spin_lock_irqsave and check the validity of the handler
---
 include/kvm/arm_vgic.h |   7 ++
 virt/kvm/arm/vgic.c    | 195 +
 2 files changed, 202 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 5d47d60..93b379f 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -353,6 +353,13 @@ int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
 bool vgic_get_phys_irq_active(struct irq_phys_map *map);
 void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
 
+int kvm_vgic_set_forward(struct kvm *kvm,
+			 unsigned int host_irq, unsigned int guest_irq);
+
+void kvm_vgic_unset_forward(struct kvm *kvm,
+			    unsigned int host_irq, unsigned int guest_irq,
+			    bool *active);
+
 #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
 #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
 #define vgic_ready(k)		((k)->arch.vgic.ready)
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index eef35d9..9efc839 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -2402,3 +2402,198 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
 {
 	return 0;
 }
+
+/**
+ * kvm_vgic_set_forward - Set IRQ forwarding
+ *
+ * @kvm: handle to the VM
+ * @host_irq: physical IRQ number
+ * @guest_irq: virtual IRQ number
+ *
+ * This function is supposed to be called only if the IRQ
+ * is not in progress: ie. not active at GIC level and not
+ * currently under injection in the KVM. The physical IRQ must
+ * also be disabled and the guest must have been exited and
+ * prevented from being re-entered.
+ */
+int kvm_vgic_set_forward(struct kvm *kvm,
+			 unsigned int host_irq,
+			 unsigned int guest_irq)
+{
+	struct irq_desc *desc = irq_to_desc(host_irq);
+	struct irq_phys_map *map = NULL;
+	struct irq_data *d;
+	unsigned long flags;
+	struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, 0);
+	int spi_id = guest_irq + VGIC_NR_PRIVATE_IRQS;
+	struct vgic_dist *dist = &kvm->arch.vgic;
+
+	kvm_debug("%s host_irq=%d guest_irq=%d\n",
+		__func__, host_irq, guest_irq);
+
+	if (!vcpu)
+		return 0;
+
+	spin_lock(&dist->lock);
+
+	raw_spin_lock_irqsave(&desc->lock, flags);
+	d = &desc->irq_data;
+	irqd_set_irq_forwarded(d);
+	/*
+	 * next physical IRQ will be be handled as forwarded
+	 * by the host (priority drop only)
+	 */
+
+	raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+	/*
+	 * need to release the dist spin_lock here since
+	 * vgic_map_phys_irq can sleep
+	 */
+	spin_unlock(&dist->lock);
+	map = vgic_map_phys_irq(vcpu, spi_id, host_irq, false);
+	/*
+	 * next guest_irq injection will be considered as
+	 * forwarded and next flush will program LR
+	 * without maintenance IRQ but with HW bit set
+	 */
+	return !map;
+}
+
+/**
+ * kvm_vgic_unset_forward - Unset IRQ forwarding
+ *
+ * @kvm: handle to the VM
+ * @host_irq: physical IRQ number
+ * @guest_irq: virtual IRQ number
+ * @active: returns whether the physical IRQ is active
+ *
+ * This function must be called when the host_irq is disabled
+ * and guest has been exited and prevented from being re-entered.
+ *
+ */
+void kvm_vgic_unset_forward(struct kvm *kvm,
+			    unsigned int host_irq,
+			    unsigned int guest_irq,
+			    bool *active)
+{
+	struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, 0);
+	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+
[RFC 13/17] KVM: introduce kvm_arch functions for IRQ bypass
This patch introduces
- kvm_arch_add_producer
- kvm_arch_del_producer
- kvm_arch_stop_consumer
- kvm_arch_resume_consumer

They make it possible to specialize the KVM IRQ bypass consumer.

Signed-off-by: Eric Auger <eric.au...@linaro.org>
---
 include/linux/kvm_host.h | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9564fd7..8e981e9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -24,6 +24,7 @@
 #include <linux/err.h>
 #include <linux/irqflags.h>
 #include <linux/context_tracking.h>
+#include <linux/irqbypass.h>
 #include <asm/signal.h>

 #include <linux/kvm.h>
@@ -1133,5 +1134,31 @@ static inline void kvm_vcpu_set_dy_eligible(struct kvm_vcpu *vcpu, bool val)
 {
 }
 #endif /* CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT */
+
+#ifdef CONFIG_IRQ_BYPASS_MANAGER
+
+void kvm_arch_add_producer(struct irq_bypass_consumer *,
+			   struct irq_bypass_producer *);
+void kvm_arch_del_producer(struct irq_bypass_consumer *,
+			   struct irq_bypass_producer *);
+void kvm_arch_stop_consumer(struct irq_bypass_consumer *);
+void kvm_arch_resume_consumer(struct irq_bypass_consumer *);
+
+#else
+void kvm_arch_add_producer(struct irq_bypass_consumer *,
+			   struct irq_bypass_producer *)
+{
+}
+void kvm_arch_del_producer(struct irq_bypass_consumer *,
+			   struct irq_bypass_producer *)
+{
+}
+void kvm_arch_stop_consumer(struct irq_bypass_consumer *)
+{
+}
+void kvm_arch_resume_consumer(struct irq_bypass_consumer *)
+{
+}
+#endif /* CONFIG_IRQ_BYPASS_MANAGER */
 #endif

--
1.9.1
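A note on the fallback branch of this patch: as posted, the stubs used when CONFIG_IRQ_BYPASS_MANAGER is not set look like ordinary external function definitions with unnamed parameters. Assuming kvm_host.h is included from more than one file, that would normally end in duplicate definitions at link time, and unnamed parameters in a definition are not accepted by C before C23. The usual kernel idiom is a static inline stub. Below is a minimal user-space sketch of that idiom, not the patch itself: DEMO_IRQ_BYPASS_MANAGER and demo_attach_then_detach() are made-up names for illustration, and the two struct types are left opaque.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins; the series declares the real types in linux/irqbypass.h. */
struct irq_bypass_consumer;
struct irq_bypass_producer;

/*
 * DEMO_IRQ_BYPASS_MANAGER is a made-up switch playing the role of
 * CONFIG_IRQ_BYPASS_MANAGER. It is left undefined here, so the stubs are used.
 */
#ifdef DEMO_IRQ_BYPASS_MANAGER
void kvm_arch_add_producer(struct irq_bypass_consumer *cons,
			   struct irq_bypass_producer *prod);
void kvm_arch_del_producer(struct irq_bypass_consumer *cons,
			   struct irq_bypass_producer *prod);
void kvm_arch_stop_consumer(struct irq_bypass_consumer *cons);
void kvm_arch_resume_consumer(struct irq_bypass_consumer *cons);
#else
/* static inline: each file including the header gets its own empty copy. */
static inline void kvm_arch_add_producer(struct irq_bypass_consumer *cons,
					 struct irq_bypass_producer *prod)
{
	(void)cons;
	(void)prod;
}

static inline void kvm_arch_del_producer(struct irq_bypass_consumer *cons,
					 struct irq_bypass_producer *prod)
{
	(void)cons;
	(void)prod;
}

static inline void kvm_arch_stop_consumer(struct irq_bypass_consumer *cons)
{
	(void)cons;
}

static inline void kvm_arch_resume_consumer(struct irq_bypass_consumer *cons)
{
	(void)cons;
}
#endif

/* Hypothetical caller: the same source builds with or without the manager. */
static inline int demo_attach_then_detach(struct irq_bypass_consumer *cons,
					  struct irq_bypass_producer *prod)
{
	kvm_arch_stop_consumer(cons);
	kvm_arch_add_producer(cons, prod);
	kvm_arch_resume_consumer(cons);

	kvm_arch_stop_consumer(cons);
	kvm_arch_del_producer(cons, prod);
	kvm_arch_resume_consumer(cons);
	return 0;
}
```

With the switch undefined, demo_attach_then_detach(NULL, NULL) simply runs through the empty stubs and returns 0, so callers need no #ifdef of their own.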
[RFC 07/17] KVM: arm: rename pause into power_off
The kvm_vcpu_arch pause field is renamed into power_off to prepare
for the introduction of a new pause field.

Signed-off-by: Eric Auger <eric.au...@linaro.org>

v4 -> v5:
- fix compilation issue on arm64 (add power_off field in kvm_host.h)
---
 arch/arm/include/asm/kvm_host.h   |  4 ++--
 arch/arm/kvm/arm.c                | 10 +-
 arch/arm/kvm/psci.c               | 10 +-
 arch/arm64/include/asm/kvm_host.h |  4 ++--
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index e896d2c..304004d 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -129,8 +129,8 @@ struct kvm_vcpu_arch {
 	 * here.
 	 */

-	/* Don't run the guest on this vcpu */
-	bool pause;
+	/* vcpu power-off state */
+	bool power_off;

 	/* IO related fields */
 	struct kvm_decode mmio_decode;
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index bcdf799..7537e68 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -475,7 +475,7 @@ static void vcpu_pause(struct kvm_vcpu *vcpu)
 {
 	wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);

-	wait_event_interruptible(*wq, !vcpu->arch.pause);
+	wait_event_interruptible(*wq, !vcpu->arch.power_off);
 }

 static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
@@ -525,7 +525,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)

 		update_vttbr(vcpu->kvm);

-		if (vcpu->arch.pause)
+		if (vcpu->arch.power_off)
 			vcpu_pause(vcpu);

 		/*
@@ -766,12 +766,12 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
 	vcpu_reset_hcr(vcpu);

 	/*
-	 * Handle the start in power-off case by marking the VCPU as paused.
+	 * Handle the start in power-off case.
 	 */
 	if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
-		vcpu->arch.pause = true;
+		vcpu->arch.power_off = true;
 	else
-		vcpu->arch.pause = false;
+		vcpu->arch.power_off = false;

 	return 0;
 }
diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index 4b94b51..134971a 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -63,7 +63,7 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)

 static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
 {
-	vcpu->arch.pause = true;
+	vcpu->arch.power_off = true;
 }

 static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
@@ -87,7 +87,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	 */
 	if (!vcpu)
 		return PSCI_RET_INVALID_PARAMS;
-	if (!vcpu->arch.pause) {
+	if (!vcpu->arch.power_off) {
 		if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
 			return PSCI_RET_ALREADY_ON;
 		else
@@ -115,7 +115,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	 * the general puspose registers are undefined upon CPU_ON.
 	 */
 	*vcpu_reg(vcpu, 0) = context_id;
-	vcpu->arch.pause = false;
+	vcpu->arch.power_off = false;
 	smp_mb();		/* Make sure the above is visible */

 	wq = kvm_arch_vcpu_wq(vcpu);
@@ -152,7 +152,7 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
 	kvm_for_each_vcpu(i, tmp, kvm) {
 		mpidr = kvm_vcpu_get_mpidr_aff(tmp);
 		if (((mpidr & target_affinity_mask) == target_affinity) &&
-		    !tmp->arch.pause) {
+		    !tmp->arch.power_off) {
 			return PSCI_0_2_AFFINITY_LEVEL_ON;
 		}
 	}
@@ -175,7 +175,7 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
 	 * re-initialized.
 	 */
 	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
-		tmp->arch.pause = true;
+		tmp->arch.power_off = true;
 		kvm_vcpu_kick(tmp);
 	}

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2709db2..009da6b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -122,8 +122,8 @@ struct kvm_vcpu_arch {
 	 * here.
 	 */

-	/* Don't run the guest */
-	bool pause;
+	/* vcpu power-off state */
+	bool power_off;

 	/* IO related fields */
 	struct kvm_decode mmio_decode;
--
1.9.1
[RFC 06/17] VFIO: add vfio_external_{mask|is_active|set_automasked}
Introduces 3 new external functions aimed at doing actions on VFIO
devices:
- mask VFIO IRQ
- get the active status of VFIO IRQ (active at interrupt controller
  level or masked by the level-sensitive automasking).
- change the automasked property and switch the IRQ handler
  (between automasked/non automasked)

Their implementation is based on bus specific callbacks.

Note there is no way to discriminate between user-space masking and
automasked handler masking. As a consequence, is_active will return
true in case the IRQ was masked by the user-space.

Signed-off-by: Eric Auger <eric.au...@linaro.org>

---

v5 -> v6:
- implementation now uses external ops
- prototype changed (index, start, count) and returns int

V4: creation
---
 drivers/vfio/vfio.c  | 39 +++
 include/linux/vfio.h | 16
 2 files changed, 55 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 2fb29df..af6901e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1527,6 +1527,45 @@ long vfio_external_check_extension(struct vfio_group *group, unsigned long arg)
 }
 EXPORT_SYMBOL_GPL(vfio_external_check_extension);

+int vfio_external_mask(struct vfio_device *vdev, unsigned index,
+		       unsigned start, unsigned count)
+{
+	if (vdev->ops->external_ops &&
+	    vdev->ops->external_ops->mask)
+		return vdev->ops->external_ops->mask(vdev->device_data,
+						     index, start, count);
+	else
+		return -ENXIO;
+}
+EXPORT_SYMBOL_GPL(vfio_external_mask);
+
+int vfio_external_is_active(struct vfio_device *vdev, unsigned index,
+			    unsigned start, unsigned count)
+{
+	if (vdev->ops->external_ops &&
+	    vdev->ops->external_ops->is_active)
+		return vdev->ops->external_ops->is_active(vdev->device_data,
+							  index, start, count);
+	else
+		return -ENXIO;
+}
+EXPORT_SYMBOL_GPL(vfio_external_is_active);
+
+int vfio_external_set_automasked(struct vfio_device *vdev,
+				 unsigned index, unsigned start,
+				 unsigned count, bool automasked)
+{
+	if (vdev->ops->external_ops &&
+	    vdev->ops->external_ops->set_automasked)
+		return vdev->ops->external_ops->set_automasked(
+							vdev->device_data,
+							index, start,
+							count, automasked);
+	else
+		return -ENXIO;
+}
+EXPORT_SYMBOL_GPL(vfio_external_set_automasked);
+
 /**
  * Module/class support
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index d79e8a9..31d3c95 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -107,6 +107,22 @@ extern int vfio_external_user_iommu_id(struct vfio_group *group);
 extern long vfio_external_check_extension(struct vfio_group *group,
 					  unsigned long arg);

+extern int vfio_external_mask(struct vfio_device *vdev, unsigned index,
+			      unsigned start, unsigned count);
+/*
+ * returns whether the VFIO IRQ is active:
+ * true if not yet deactivated at interrupt controller level or if
+ * automasked (level sensitive IRQ). Unfortunately there is no way to
+ * discriminate between handler auto-masking and user-space masking
+ */
+extern int vfio_external_is_active(struct vfio_device *vdev,
+				   unsigned index, unsigned start,
+				   unsigned count);
+
+extern int vfio_external_set_automasked(struct vfio_device *vdev,
+					unsigned index, unsigned start,
+					unsigned count, bool automasked);
+
 struct pci_dev;
 #ifdef CONFIG_EEH
 extern void vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
--
1.9.1
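The three wrappers in this patch share one pattern: call the bus-specific callback if the driver provides one, otherwise return -ENXIO. Here is a small user-space sketch of that dispatch, assuming nothing beyond what the patch shows. All demo_* names are invented for illustration and the toy "bus driver" just keeps an array of masked flags; it is not the vfio-platform implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for vfio_device_external_ops, vfio_device_ops, vfio_device. */
struct demo_external_ops {
	int (*mask)(void *device_data, unsigned index,
		    unsigned start, unsigned count);
	int (*is_active)(void *device_data, unsigned index,
			 unsigned start, unsigned count);
};

struct demo_device_ops {
	const struct demo_external_ops *external_ops;
};

struct demo_device {
	const struct demo_device_ops *ops;
	void *device_data;
};

/* Dispatch to the bus callback when present, else report -ENXIO. */
int demo_external_mask(struct demo_device *vdev, unsigned index,
		       unsigned start, unsigned count)
{
	if (vdev->ops->external_ops && vdev->ops->external_ops->mask)
		return vdev->ops->external_ops->mask(vdev->device_data,
						     index, start, count);
	return -ENXIO;
}

int demo_external_is_active(struct demo_device *vdev, unsigned index,
			    unsigned start, unsigned count)
{
	if (vdev->ops->external_ops && vdev->ops->external_ops->is_active)
		return vdev->ops->external_ops->is_active(vdev->device_data,
							  index, start, count);
	return -ENXIO;
}

/* Toy bus driver: device_data is an array of per-IRQ "masked" flags. */
int demo_bus_mask(void *device_data, unsigned index,
		  unsigned start, unsigned count)
{
	bool *masked = device_data;

	(void)start;
	(void)count;
	masked[index] = true;
	return 0;
}

/* A masked IRQ is reported as active, whoever masked it. */
int demo_bus_is_active(void *device_data, unsigned index,
		       unsigned start, unsigned count)
{
	const bool *masked = device_data;

	(void)start;
	(void)count;
	return masked[index];
}

const struct demo_external_ops demo_bus_external_ops = {
	.mask = demo_bus_mask,
	.is_active = demo_bus_is_active,
};

/* One driver with external ops, one without. */
const struct demo_device_ops demo_bus_ops = {
	.external_ops = &demo_bus_external_ops,
};
const struct demo_device_ops demo_plain_ops = {
	.external_ops = NULL,
};
```

A device built on demo_plain_ops gets -ENXIO from both wrappers, while one built on demo_bus_ops has the call forwarded with its device_data. The toy is_active also mirrors the limitation mentioned in the commit message: it cannot tell who masked the IRQ.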
[RFC 10/17] KVM: arm: select IRQ_BYPASS_MANAGER
Select IRQ_BYPASS_MANAGER when CONFIG_KVM is set

Signed-off-by: Eric Auger <eric.au...@linaro.org>
---
 arch/arm/kvm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index bfb915d..7d38d25 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -31,6 +31,7 @@ config KVM
 	select KVM_VFIO
 	select HAVE_KVM_EVENTFD
 	select HAVE_KVM_IRQFD
+	select IRQ_BYPASS_MANAGER
 	depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
 	---help---
 	  Support hosting virtualized guest machines.
--
1.9.1
[RFC 12/17] irq: bypass: Extend skeleton for ARM forwarding control
- [add,del]_[consumer,producer] updated to take both the consumer and
  producer handles. This is required to combine info from both,
  typically to link the source irq owned by the producer with the gsi
  owned by the consumer (forwarded IRQ setup).
- new functions are added: [stop,resume]_[consumer,producer]. Those are
  needed for forwarding since the state change requires intermingling
  actions at the consumer and the producer.
- On handshake, we now call connect and disconnect, which implement the
  more complex sequence.
- new fields are added on producer side: linux irq, vfio_device handle,
  active, which reflects whether the source is active (at interrupt
  controller level or at VFIO level, i.e. automasked), and finally an
  opaque pointer which will be used to point to the
  vfio_platform_device in this series.
- new fields on consumer side: the kvm handle, the gsi

Integration of the posted interrupt series will help to refine those
choices.

Signed-off-by: Eric Auger <eric.au...@linaro.org>

---

- connect/disconnect could become a cb too. For forwarding it may make
  sense to have failure at connection: this would happen when the
  physical IRQ is either active at irqchip level or VFIO masked. This
  means some of the cb should return an error and this error management
  could be prod/cons specific. Where to attach the connect/disconnect
  cb: to the cons or prod, to both?
- Hence it may be sensible to do the list_add only if connect returns 0
- disconnect would not be allowed to fail.
--- include/linux/irqbypass.h | 26 ++--- kernel/irq/bypass.c | 48 +++ 2 files changed, 67 insertions(+), 7 deletions(-) diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h index 718508e..591ae3f 100644 --- a/include/linux/irqbypass.h +++ b/include/linux/irqbypass.h @@ -3,17 +3,37 @@ #include linux/list.h +struct vfio_device; +struct irq_bypass_consumer; +struct kvm; + struct irq_bypass_producer { struct list_head node; void *token; - /* TBD */ + unsigned int irq; /* host physical irq */ + struct vfio_device *vdev; /* vfio device that requested irq */ + /* is irq active at irqchip or VFIO masked? */ + bool active; + void *opaque; + void (*stop_producer)(struct irq_bypass_producer *); + void (*resume_producer)(struct irq_bypass_producer *); + void (*add_consumer)(struct irq_bypass_producer *, +struct irq_bypass_consumer *); + void (*del_consumer)(struct irq_bypass_producer *, +struct irq_bypass_consumer *); }; struct irq_bypass_consumer { struct list_head node; void *token; - void (*add_producer)(struct irq_bypass_producer *); - void (*del_producer)(struct irq_bypass_producer *); + unsigned int gsi; /* the guest gsi */ + struct kvm *kvm; + void (*stop_consumer)(struct irq_bypass_consumer *); + void (*resume_consumer)(struct irq_bypass_consumer *); + void (*add_producer)(struct irq_bypass_consumer *, +struct irq_bypass_producer *); + void (*del_producer)(struct irq_bypass_consumer *, +struct irq_bypass_producer *); }; int irq_bypass_register_producer(struct irq_bypass_producer *); diff --git a/kernel/irq/bypass.c b/kernel/irq/bypass.c index 5d0f92b..fb31fef 100644 --- a/kernel/irq/bypass.c +++ b/kernel/irq/bypass.c @@ -19,6 +19,46 @@ static LIST_HEAD(producers); static LIST_HEAD(consumers); static DEFINE_MUTEX(lock); +/* lock must be hold when calling connect */ +static void connect(struct irq_bypass_producer *prod, + struct irq_bypass_consumer *cons) +{ + pr_info( %s prod(%d) - cons(%d)\n, + __func__, prod-irq, cons-gsi); + if (prod-stop_producer) + 
prod-stop_producer(prod); + if (cons-stop_consumer) + cons-stop_consumer(cons); + if (prod-add_consumer) + prod-add_consumer(prod, cons); + if (cons-add_producer) + cons-add_producer(cons, prod); + if (cons-resume_consumer) + cons-resume_consumer(cons); + if (prod-resume_producer) + prod-resume_producer(prod); +} + +/* lock must be hold when calling disconnect */ +static void disconnect(struct irq_bypass_producer *prod, + struct irq_bypass_consumer *cons) +{ + pr_info( %s prod(%d) - cons(%d)\n, + __func__, prod-irq, cons-gsi); + if (prod-stop_producer) + prod-stop_producer(prod); + if (cons-stop_consumer) + cons-stop_consumer(cons); + if (cons-del_producer) + cons-del_producer(cons, prod); + if (prod-del_consumer) + prod-del_consumer(prod, cons); + if (cons-resume_consumer) + cons-resume_consumer(cons); + if
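For readers following the handshake, the connect() sequence above boils down to: stop both sides, cross-register, then resume in the reverse order. The user-space sketch below replays that order with callbacks that only record their own name. It is an illustration under stated assumptions: the demo_* structs keep just the callbacks, and the trace buffer and helper names are made up. disconnect() is not sketched since its tail is not visible here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct demo_consumer;

/* Reduced stand-ins for irq_bypass_producer / irq_bypass_consumer. */
struct demo_producer {
	void (*stop_producer)(struct demo_producer *);
	void (*resume_producer)(struct demo_producer *);
	void (*add_consumer)(struct demo_producer *, struct demo_consumer *);
};

struct demo_consumer {
	void (*stop_consumer)(struct demo_consumer *);
	void (*resume_consumer)(struct demo_consumer *);
	void (*add_producer)(struct demo_consumer *, struct demo_producer *);
};

/* Illustration-only trace of the callbacks, in call order. */
static char demo_trace[256];

static void demo_log(const char *step)
{
	size_t used = strlen(demo_trace);

	snprintf(demo_trace + used, sizeof(demo_trace) - used, "%s ", step);
}

static void demo_stop_producer(struct demo_producer *p)
{
	(void)p;
	demo_log("stop_producer");
}

static void demo_resume_producer(struct demo_producer *p)
{
	(void)p;
	demo_log("resume_producer");
}

static void demo_add_consumer(struct demo_producer *p, struct demo_consumer *c)
{
	(void)p;
	(void)c;
	demo_log("add_consumer");
}

static void demo_stop_consumer(struct demo_consumer *c)
{
	(void)c;
	demo_log("stop_consumer");
}

static void demo_resume_consumer(struct demo_consumer *c)
{
	(void)c;
	demo_log("resume_consumer");
}

static void demo_add_producer(struct demo_consumer *c, struct demo_producer *p)
{
	(void)c;
	(void)p;
	demo_log("add_producer");
}

/* Same call order as connect() in the patch; absent callbacks are skipped. */
void demo_connect(struct demo_producer *prod, struct demo_consumer *cons)
{
	if (prod->stop_producer)
		prod->stop_producer(prod);
	if (cons->stop_consumer)
		cons->stop_consumer(cons);
	if (prod->add_consumer)
		prod->add_consumer(prod, cons);
	if (cons->add_producer)
		cons->add_producer(cons, prod);
	if (cons->resume_consumer)
		cons->resume_consumer(cons);
	if (prod->resume_producer)
		prod->resume_producer(prod);
}
```

With all six callbacks set, the trace shows the producer stopped first and resumed last, so the IRQ source stays quiet for the whole window in which the consumer is reconfigured.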
[RFC 00/17] ARM IRQ forward control based on IRQ bypass manager
This series makes it possible to set up ARM IRQ forwarding between a
VFIO platform device physical IRQ and a guest virtual IRQ. The setting
is coordinated by the prototype IRQ bypass manager.

This kernel integration now seems preferred to the previous kvm-vfio
device user API ([RFC v6 00/16] KVM-VFIO IRQ forward control,
https://lkml.org/lkml/2015/4/13/353). Some rationale can be found in
the IRQ bypass manager thread: https://lkml.org/lkml/2015/6/29/268

The principle is that the VFIO platform driver registers a producer
struct on VFIO_IRQ_SET_ACTION_TRIGGER while KVM irqfd registers a
consumer struct on the irqfd assignment. This leads to a handshake
based on a match of the eventfd context (used as token). When either
the producer or the consumer module disappears, it is unregistered and
the link is disconnected.

Structure of the series:
[1-6]   Modifications in the VFIO (platform) driver to prepare for
        dynamic switch between automasked/masked mode
[7-8]   Introduce halt/resume guest capability
[9]     irq bypass manager proto as sent by Alex
[10-17] Adaptations to support forwarding on top of IRQ bypass manager

Dependencies:
1- [PATCH 00/10] arm/arm64: KVM: Active interrupt state switching for
   shared devices (http://www.spinics.net/lists/kvm/msg117411.html)
2- RFC ARM: Forwarding physical interrupts to a guest VM
   (http://lwn.net/Articles/603514/)
3- IRQ bypass manager proto: https://lkml.org/lkml/2015/6/29/268
4- [RFC v2 0/4] chip/vgic adaptations for forwarded irq
   http://lists.infradead.org/pipermail/linux-arm-kernel/2015-February/323183.html

All those pieces can be found at:
https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.2-rc1-bypass-fwd

More background on ARM IRQ forwarding can be found in the text below
and at http://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf.

A forwarded IRQ is deactivated by the guest and not by the host. When
the guest deactivates the associated virtual IRQ, the interrupt
controller automatically completes the physical IRQ. Obviously this
requires some HW support in the interrupt controller. This is the case
for ARM GICv2. The direct benefit is that, for a level sensitive IRQ,
a VM exit can be avoided on forwarded IRQ completion.

When the IRQ is forwarded, the VFIO platform driver does not need to
mask the physical IRQ anymore before signaling the eventfd. Indeed
genirq lowers the running priority, which lets other physical IRQs hit,
but not that one.

Besides, the injection is still based on irqfd triggering. The only
impact on the irqfd process is that resamplefd is not called anymore on
virtual IRQ completion, since deactivation is not trapped by KVM.

This was tested on Calxeda Midway, assigning the xgmac main IRQ.

kvm-vfio v6 -> RFC based on IRQ bypass manager:
- see previous history in https://lkml.org/lkml/2015/4/13/353

Best Regards

Eric

Alex Williamson (1):
  bypass: IRQ bypass manager proto by Alex

Eric Auger (16):
  VFIO: platform: test forwarded state when selecting IRQ handler
  VFIO: platform: single handler using function pointer
  VFIO: Introduce vfio_device_external_ops
  VFIO: pci: initialize vfio_device_external_ops
  VFIO: platform: implement vfio_device_external_ops callbacks
  VFIO: add vfio_external_{mask|is_active|set_automasked}
  KVM: arm: rename pause into power_off
  kvm: arm/arm64: implement kvm_arm_[halt,resume]_guest
  KVM: arm: select IRQ_BYPASS_MANAGER
  VFIO: platform: select IRQ_BYPASS_MANAGER
  irq: bypass: Extend skeleton for ARM forwarding control
  KVM: introduce kvm_arch functions for IRQ bypass
  KVM: arm/arm64: vgic: forwarding control
  KVM: arm/arm64: implement IRQ bypass consumer functions
  KVM: eventfd: add irq bypass consumer management
  VFIO: platform: add irq bypass producer management

 arch/arm/include/asm/kvm_host.h               |   5 +-
 arch/arm/kvm/Kconfig                          |   1 +
 arch/arm/kvm/arm.c                            |  60 +++-
 arch/arm/kvm/psci.c                           |  10 +-
 arch/arm64/include/asm/kvm_host.h             |   3 +
 arch/x86/kvm/Kconfig                          |   1 +
 drivers/vfio/pci/Kconfig                      |   1 +
 drivers/vfio/pci/vfio_pci.c                   |   1 +
 drivers/vfio/pci/vfio_pci_intrs.c             |   6 +
 drivers/vfio/platform/Kconfig                 |   1 +
 drivers/vfio/platform/vfio_platform_common.c  |   9 ++
 drivers/vfio/platform/vfio_platform_irq.c     | 160 -
 drivers/vfio/platform/vfio_platform_private.h |  14 ++
 drivers/vfio/vfio.c                           |  39 ++
 include/kvm/arm_vgic.h                        |   7 +
 include/linux/irqbypass.h                     |  43 ++
 include/linux/kvm_host.h                      |  27
 include/linux/vfio.h                          |  34 +
 kernel/irq/Kconfig                            |   3 +
 kernel/irq/Makefile                           |   1 +
 kernel/irq/bypass.c                           | 156 +
 virt/kvm/arm/vgic.c
[RFC 09/17] bypass: IRQ bypass manager proto by Alex
From: Alex Williamson alex.william...@redhat.com There are plenty of details to be filled in, but I think the basics looks something like the code below. The IRQ bypass manager just defines a pair of structures, one for interrupt producers and one for interrupt consumers. I'm certain that we'll need more callbacks than I've defined below, but figuring out what those should be for the best abstraction is the hardest part of this idea. The manager provides both registration and de-registration interfaces for both types of objects and keeps lists for each, protected by a lock. The manager doesn't even really need to know what the match token is, but I assume for our purposes it will be an eventfd_ctx. On the vfio side, the producer struct would be embedded in the vfio_pci_irq_ctx struct. KVM would probably embed the consumer struct in _irqfd. As I've coded below, the IRQ bypass manager calls the consumer callbacks, so the producer struct would need fields or callbacks to provide the consumer the info it needs. AIUI the Posted Interrupt model, VFIO only needs to provide data to the consumer. For IRQ Forwarding, I think the producer needs to be informed when bypass is active to model the incoming interrupt as edge vs level. I've prototyped the base IRQ bypass manager here as static, but I don't see any reason it couldn't be a module that's loaded by dependency when either vfio-pci or kvm-intel is loaded (or other producer/consumer objects). Is this a reasonable starting point to craft the additional fields and callbacks and interaction of who calls who that we need to support Posted Interrupts and IRQ Forwarding? Is the AMD version of this still alive? 
Thanks, Alex --- arch/x86/kvm/Kconfig | 1 + drivers/vfio/pci/Kconfig | 1 + drivers/vfio/pci/vfio_pci_intrs.c | 6 ++ include/linux/irqbypass.h | 23 kernel/irq/Kconfig| 3 + kernel/irq/Makefile | 1 + kernel/irq/bypass.c | 116 ++ virt/kvm/eventfd.c| 4 ++ 8 files changed, 155 insertions(+) create mode 100644 include/linux/irqbypass.h create mode 100644 kernel/irq/bypass.c diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index d8a1d56..86d0d77 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -61,6 +61,7 @@ config KVM_INTEL depends on KVM # for perf_guest_get_msrs(): depends on CPU_SUP_INTEL + select IRQ_BYPASS_MANAGER ---help--- Provides support for KVM on Intel processors equipped with the VT extensions. diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig index 579d83b..02912f1 100644 --- a/drivers/vfio/pci/Kconfig +++ b/drivers/vfio/pci/Kconfig @@ -2,6 +2,7 @@ config VFIO_PCI tristate VFIO support for PCI devices depends on VFIO PCI EVENTFD select VFIO_VIRQFD + select IRQ_BYPASS_MANAGER help Support for the PCI VFIO bus driver. This is required to make use of PCI drivers using the VFIO framework. diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index 1f577b4..4e053be 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -181,6 +181,7 @@ static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd) if (vdev-ctx[0].trigger) { free_irq(pdev-irq, vdev); + /* irq_bypass_unregister_producer(); */ kfree(vdev-ctx[0].name); eventfd_ctx_put(vdev-ctx[0].trigger); vdev-ctx[0].trigger = NULL; @@ -214,6 +215,8 @@ static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd) return ret; } + /* irq_bypass_register_producer(); */ + /* * INTx disable will stick across the new irq setup, * disable_irq won't. 
@@ -319,6 +322,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, if (vdev-ctx[vector].trigger) { free_irq(irq, vdev-ctx[vector].trigger); + /* irq_bypass_unregister_producer(); */ kfree(vdev-ctx[vector].name); eventfd_ctx_put(vdev-ctx[vector].trigger); vdev-ctx[vector].trigger = NULL; @@ -360,6 +364,8 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, return ret; } + /* irq_bypass_register_producer(); */ + vdev-ctx[vector].trigger = trigger; return 0; diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h new file mode 100644 index 000..718508e --- /dev/null +++ b/include/linux/irqbypass.h @@ -0,0 +1,23 @@ +#ifndef IRQBYPASS_H +#define IRQBYPASS_H + +#include linux/list.h + +struct irq_bypass_producer { + struct list_head node; + void *token; + /* TBD */ +}; + +struct irq_bypass_consumer { + struct list_head
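To make the idea in Alex's description concrete: the manager keeps two lists, and registering on either side looks for an entry with the same token on the other side and, on a match, calls the consumer callback. The sketch below is only a guess at that shape in user space. Fixed arrays replace list_head, the lock is omitted because the demo is single-threaded, and every demo_* name is invented here; the actual bypass.c may well differ.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX 8

struct demo_producer {
	void *token;		/* e.g. an eventfd context */
};

struct demo_consumer {
	void *token;
	void (*add_producer)(struct demo_producer *);
	void (*del_producer)(struct demo_producer *);
};

/* Fixed arrays stand in for the two lists kept by the manager. */
static struct demo_producer *demo_producers[DEMO_MAX];
static struct demo_consumer *demo_consumers[DEMO_MAX];
static size_t demo_nr_producers, demo_nr_consumers;

/* Number of producer/consumer pairs matched so far (illustration only). */
int demo_links;

void demo_count_add(struct demo_producer *prod)
{
	(void)prod;
	demo_links++;
}

int demo_register_producer(struct demo_producer *prod)
{
	size_t i;

	if (demo_nr_producers == DEMO_MAX)
		return -1;
	demo_producers[demo_nr_producers++] = prod;

	/* Connect to any already-registered consumer holding the same token. */
	for (i = 0; i < demo_nr_consumers; i++)
		if (demo_consumers[i]->token == prod->token &&
		    demo_consumers[i]->add_producer)
			demo_consumers[i]->add_producer(prod);
	return 0;
}

int demo_register_consumer(struct demo_consumer *cons)
{
	size_t i;

	if (demo_nr_consumers == DEMO_MAX)
		return -1;
	demo_consumers[demo_nr_consumers++] = cons;

	/* Connect to any already-registered producer holding the same token. */
	for (i = 0; i < demo_nr_producers; i++)
		if (demo_producers[i]->token == cons->token &&
		    cons->add_producer)
			cons->add_producer(demo_producers[i]);
	return 0;
}
```

Registration order does not matter in this sketch: whichever side arrives second triggers the match, which is the property the eventfd-based handshake relies on.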
[RFC 02/17] VFIO: platform: single handler using function pointer
A single handler is now registered whatever the use case: automasked
or not. A function pointer is set according to the wished behavior
and the handler calls this function.

The irq lock is taken/released in the root handler. eventfd_signal can
be called in regions not allowed to sleep.

Signed-off-by: Eric Auger <eric.au...@linaro.org>

---

v4: creation
---
 drivers/vfio/platform/vfio_platform_irq.c     | 21 +++--
 drivers/vfio/platform/vfio_platform_private.h |  1 +
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 132bb3f..8eb65c1 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -147,11 +147,8 @@ static int vfio_platform_set_irq_unmask(struct vfio_platform_device *vdev,
 static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
 {
 	struct vfio_platform_irq *irq_ctx = dev_id;
-	unsigned long flags;
 	int ret = IRQ_NONE;

-	spin_lock_irqsave(&irq_ctx->lock, flags);
-
 	if (!irq_ctx->masked) {
 		ret = IRQ_HANDLED;

@@ -160,8 +157,6 @@ static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
 		irq_ctx->masked = true;
 	}

-	spin_unlock_irqrestore(&irq_ctx->lock, flags);
-
 	if (ret == IRQ_HANDLED)
 		eventfd_signal(irq_ctx->trigger, 1);

@@ -177,6 +172,19 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }

+static irqreturn_t vfio_handler(int irq, void *dev_id)
+{
+	struct vfio_platform_irq *irq_ctx = dev_id;
+	unsigned long flags;
+	irqreturn_t ret;
+
+	spin_lock_irqsave(&irq_ctx->lock, flags);
+	ret = irq_ctx->handler(irq, dev_id);
+	spin_unlock_irqrestore(&irq_ctx->lock, flags);
+
+	return ret;
+}
+
 static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
 			    int fd, irq_handler_t handler)
 {
@@ -206,9 +214,10 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
 	}

 	irq->trigger = trigger;
+	irq->handler = handler;

 	irq_set_status_flags(irq->hwirq, IRQ_NOAUTOEN);
-	ret = request_irq(irq->hwirq, handler, 0, irq->name, irq);
+	ret = request_irq(irq->hwirq, vfio_handler, 0, irq->name, irq);
 	if (ret) {
 		kfree(irq->name);
 		eventfd_ctx_put(trigger);
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 1c9b3d5..413f575 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -37,6 +37,7 @@ struct vfio_platform_irq {
 	spinlock_t		lock;
 	struct virqfd		*unmask;
 	struct virqfd		*mask;
+	irqreturn_t (*handler)(int irq, void *dev_id);
 };

 struct vfio_platform_region {
--
1.9.1
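The shape of this change is a root handler that owns the locking and delegates to whichever behavior is currently selected through a function pointer. A user-space sketch of that shape follows, with heavy simplifications: a bool stands in for the spinlock, a counter stands in for eventfd_signal(), and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_irqreturn { DEMO_IRQ_NONE, DEMO_IRQ_HANDLED };

struct demo_irq {
	bool locked;	/* stands in for the spinlock */
	bool masked;
	int signals;	/* stands in for eventfd_signal() calls */
	enum demo_irqreturn (*handler)(struct demo_irq *irq);
};

/* Automasked behavior: first hit masks and signals, later hits are ignored. */
enum demo_irqreturn demo_automasked_handler(struct demo_irq *irq)
{
	assert(irq->locked);	/* only reached through demo_handler() */
	if (irq->masked)
		return DEMO_IRQ_NONE;
	irq->masked = true;	/* the driver also disables the IRQ here */
	irq->signals++;
	return DEMO_IRQ_HANDLED;
}

/* Non-automasked behavior: always signal. */
enum demo_irqreturn demo_plain_handler(struct demo_irq *irq)
{
	assert(irq->locked);
	irq->signals++;
	return DEMO_IRQ_HANDLED;
}

/* Root handler: the only one registered; takes the lock and delegates. */
enum demo_irqreturn demo_handler(struct demo_irq *irq)
{
	enum demo_irqreturn ret;

	irq->locked = true;		/* spin_lock_irqsave() in the driver */
	ret = irq->handler(irq);
	irq->locked = false;		/* spin_unlock_irqrestore() */
	return ret;
}
```

Switching between automasked and non-automasked mode then only means storing a different pointer in the handler field, with no need to free and request the IRQ again.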
[RFC 08/17] kvm: arm/arm64: implement kvm_arm_[halt,resume]_guest
On halt, the guest is forced to exit and prevented from being
re-entered. This is synchronous.

Those two operations will be needed for IRQ forwarding setting.

Signed-off-by: Eric Auger <eric.au...@linaro.org>

---
RFC:
- rename the function and this latter becomes static
- remove __KVM_HAVE_ARCH_HALT_GUEST

v4 -> v5: add arm64 support
- also defines __KVM_HAVE_ARCH_HALT_GUEST for arm64
- add pause field
---
 arch/arm/include/asm/kvm_host.h   |  3 +++
 arch/arm/kvm/arm.c                | 32 +---
 arch/arm64/include/asm/kvm_host.h |  3 +++
 3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 304004d..899ae27 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -132,6 +132,9 @@ struct kvm_vcpu_arch {
 	/* vcpu power-off state */
 	bool power_off;

+	/* Don't run the guest */
+	bool pause;
+
 	/* IO related fields */
 	struct kvm_decode mmio_decode;

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 7537e68..4be6715 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -471,11 +471,36 @@ bool kvm_arch_intc_initialized(struct kvm *kvm)
 	return vgic_initialized(kvm);
 }

+static void kvm_arm_halt_guest(struct kvm *kvm)
+{
+	int i;
+	struct kvm_vcpu *vcpu;
+
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		vcpu->arch.pause = true;
+	force_vm_exit(cpu_all_mask);
+}
+
+static void kvm_arm_resume_guest(struct kvm *kvm)
+{
+	int i;
+	struct kvm_vcpu *vcpu;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
+
+		vcpu->arch.pause = false;
+		wake_up_interruptible(wq);
+	}
+}
+
+
 static void vcpu_pause(struct kvm_vcpu *vcpu)
 {
 	wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);

-	wait_event_interruptible(*wq, !vcpu->arch.power_off);
+	wait_event_interruptible(*wq, ((!vcpu->arch.power_off) &&
+				       (!vcpu->arch.pause)));
 }

 static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
@@ -525,7 +550,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)

 		update_vttbr(vcpu->kvm);

-		if (vcpu->arch.power_off)
+		if (vcpu->arch.power_off || vcpu->arch.pause)
 			vcpu_pause(vcpu);

 		/*
@@ -551,7 +576,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			run->exit_reason = KVM_EXIT_INTR;
 		}

-		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
+		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
+			vcpu->arch.pause) {
 			local_irq_enable();
 			preempt_enable();
 			kvm_vgic_sync_hwstate(vcpu);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 009da6b..69e3785 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -125,6 +125,9 @@ struct kvm_vcpu_arch {
 	/* vcpu power-off state */
 	bool power_off;

+	/* Don't run the guest */
+	bool pause;
+
 	/* IO related fields */
 	struct kvm_decode mmio_decode;

--
1.9.1
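After this patch a vcpu has two independent reasons not to run: it is powered off (PSCI) or the whole guest is halted (pause). Assuming the two tests in the vcpu_pause() wait condition are joined by a logical AND, a vcpu may run only when both flags are clear, and resuming the guest must not wake a vcpu that is still powered off. A tiny user-space model of that interaction follows; the demo_* names are invented, and the forced VM exit and wait queue are reduced to comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_vcpu {
	bool power_off;	/* PSCI power state */
	bool pause;	/* set while the whole guest is halted */
};

struct demo_vm {
	struct demo_vcpu *vcpus;
	size_t nr_vcpus;
};

/* Run condition: neither powered off nor paused. */
bool demo_vcpu_may_run(const struct demo_vcpu *vcpu)
{
	return !vcpu->power_off && !vcpu->pause;
}

void demo_halt_guest(struct demo_vm *vm)
{
	size_t i;

	for (i = 0; i < vm->nr_vcpus; i++)
		vm->vcpus[i].pause = true;
	/* the patch then forces every vcpu out of the guest */
}

void demo_resume_guest(struct demo_vm *vm)
{
	size_t i;

	for (i = 0; i < vm->nr_vcpus; i++)
		vm->vcpus[i].pause = false;
	/* the patch then wakes each vcpu's wait queue */
}
```

Keeping the two flags separate is what lets a halt/resume cycle around the forwarding setup leave the PSCI power state of each vcpu untouched.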
Re: copy_huge_page: unable to handle kernel NULL pointer dereference at 0000000000000008
But you are very appositely mistaken: copy_huge_page() used to make the same mistake, and Dave Hansen fixed it back in v3.13, but the fix never went to the stable trees.

commit 30b0a105d9f7141e4cbf72ae5511832457d89788
Author: Dave Hansen dave.han...@linux.intel.com
Date:   Thu Nov 21 14:31:58 2013 -0800

    mm: thp: give transparent hugepage code a separate copy_page

    Right now, the migration code in migrate_page_copy() uses
    copy_huge_page() for hugetlbfs and thp pages:

        if (PageHuge(page) || PageTransHuge(page))
                copy_huge_page(newpage, page);

    So, yay for code reuse. But:

        void copy_huge_page(struct page *dst, struct page *src)
        {
                struct hstate *h = page_hstate(src);

    and a non-hugetlbfs page has no page_hstate(). This works 99% of the
    time because page_hstate() determines the hstate from the page order
    alone. Since the page order of a THP page matches the default
    hugetlbfs page order, it works.

    But, if you change the default huge page size on the boot
    command-line (say default_hugepagesz=1G), then we might not even
    *have* a 2MB hstate so page_hstate() returns null and
    copy_huge_page() oopses pretty fast since copy_huge_page()
    dereferences the hstate:

        void copy_huge_page(struct page *dst, struct page *src)
        {
                struct hstate *h = page_hstate(src);
                if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
        ...

    Mel noticed that the migration code is really the only user of these
    functions. This moves all the copy code over to migrate.c and makes
    copy_huge_page() work for THP by checking for it explicitly.

    I believe the bug was introduced in commit b32967ff101a ("mm: numa:
    Add THP migration for the NUMA working set scanning fault case")

    [a...@linux-foundation.org: fix coding-style and comment text, per Naoya Horiguchi]
    Signed-off-by: Dave Hansen dave.han...@linux.intel.com
    Acked-by: Mel Gorman mgor...@suse.de
    Reviewed-by: Naoya Horiguchi n-horigu...@ah.jp.nec.com
    Cc: Hillf Danton dhi...@gmail.com
    Cc: Andrea Arcangeli aarca...@redhat.com
    Tested-by: Dave Jiang dave.ji...@intel.com
    Signed-off-by: Andrew Morton a...@linux-foundation.org
    Signed-off-by: Linus Torvalds torva...@linux-foundation.org

Thanks, the issue is fixed on 3.10 with trivial patch modification.

Ping? 3.10 still misses that..
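For anyone skimming the thread, the failure mode described in the commit message can be modelled in a few lines of user-space C. This is only a sketch, not kernel code: every `toy_*` name is a hypothetical stand-in, and it assumes 4K base pages, so a 2MB page has order 9 and a 1GB page has order 18. The lookup goes by page order alone, and the copy helper checks for THP explicitly before touching the size class, which is the shape of the fix.

```c
#include <stddef.h>

/* Hypothetical stand-in for a hugetlbfs size class (struct hstate). */
struct toy_hstate {
	unsigned int order;	/* log2 of base pages per huge page */
};

/* Hypothetical stand-in for a compound page. */
struct toy_page {
	unsigned int order;	/* log2 of base pages in this page */
	int is_hugetlb;		/* nonzero for hugetlbfs, zero for THP */
};

/* Find the size class from the page order alone; NULL if none matches. */
static const struct toy_hstate *
toy_page_hstate(const struct toy_hstate *tbl, size_t n,
		const struct toy_page *p)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].order == p->order)
			return &tbl[i];
	return NULL;
}

/*
 * Number of base pages to copy, or -1 if no size class matches.
 * THP pages are checked for first, so the lookup result is never
 * dereferenced for them.
 */
static long toy_nr_pages_to_copy(const struct toy_hstate *tbl, size_t n,
				 const struct toy_page *p)
{
	const struct toy_hstate *h;

	if (!p->is_hugetlb)
		return 1L << p->order;

	h = toy_page_hstate(tbl, n, p);
	if (!h)
		return -1;
	return 1L << h->order;
}
```

With a table that only holds the 1GB class (the default_hugepagesz=1G case), the lookup for an order-9 THP page comes back NULL, which is the pointer the old code dereferenced.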
[RFC 05/17] VFIO: platform: implement vfio_device_external_ops callbacks
This patch adds the implementation for the 3 external callbacks of vfio_device_external_ops struct, namely mask, is_active, set_automasked. Also vfio_device_ops and vfio_device_external_ops are set accordingly.

Signed-off-by: Eric Auger eric.au...@linaro.org
---
v6: creation
---
 drivers/vfio/platform/vfio_platform_common.c  |  7
 drivers/vfio/platform/vfio_platform_irq.c     | 49 +++
 drivers/vfio/platform/vfio_platform_private.h | 11 ++
 3 files changed, 67 insertions(+)

diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index e43efb5..9acfca6 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -520,6 +520,12 @@ static int vfio_platform_mmap(void *device_data, struct vm_area_struct *vma)
 	return -EINVAL;
 }
 
+static struct vfio_device_external_ops vfio_platform_external_ops = {
+	.mask = vfio_platform_external_mask,
+	.is_active = vfio_platform_external_is_active,
+	.set_automasked = vfio_platform_external_set_automasked,
+};
+
 static const struct vfio_device_ops vfio_platform_ops = {
 	.name		= "vfio-platform",
 	.open		= vfio_platform_open,
@@ -528,6 +534,7 @@ static const struct vfio_device_ops vfio_platform_ops = {
 	.read		= vfio_platform_read,
 	.write		= vfio_platform_write,
 	.mmap		= vfio_platform_mmap,
+	.external_ops	= &vfio_platform_external_ops
 };
 
 int vfio_platform_probe_common(struct vfio_platform_device *vdev,
diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 8eb65c1..f6d83ed 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -231,6 +231,55 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
 	return 0;
 }
 
+int vfio_platform_external_mask(void *device_data, unsigned index,
+				unsigned start, unsigned count)
+{
+	struct vfio_platform_device *vdev = device_data;
+
+	vfio_platform_mask(&vdev->irqs[index]);
+	return 0;
+}
+
+int vfio_platform_external_is_active(void *device_data, unsigned index,
+				     unsigned start, unsigned count)
+{
+	unsigned long flags;
+	struct vfio_platform_device *vdev = device_data;
+	struct vfio_platform_irq *irq = &vdev->irqs[index];
+	bool active, masked, outstanding;
+	int ret;
+
+	spin_lock_irqsave(&irq->lock, flags);
+
+	ret = irq_get_irqchip_state(irq->hwirq, IRQCHIP_STATE_ACTIVE, &active);
+	BUG_ON(ret);
+	masked = irq->masked;
+	outstanding = active || masked;
+
+	spin_unlock_irqrestore(&irq->lock, flags);
+	return outstanding;
+}
+
+int vfio_platform_external_set_automasked(void *device_data, unsigned index,
+					  unsigned start, unsigned count,
+					  bool automasked)
+{
+	unsigned long flags;
+	struct vfio_platform_device *vdev = device_data;
+	struct vfio_platform_irq *irq = &vdev->irqs[index];
+
+	spin_lock_irqsave(&irq->lock, flags);
+	if (automasked) {
+		irq->flags |= VFIO_IRQ_INFO_AUTOMASKED;
+		irq->handler = vfio_automasked_irq_handler;
+	} else {
+		irq->flags &= ~VFIO_IRQ_INFO_AUTOMASKED;
+		irq->handler = vfio_irq_handler;
+	}
+	spin_unlock_irqrestore(&irq->lock, flags);
+	return 0;
+}
+
 static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
 					 unsigned index, unsigned start,
 					 unsigned count, uint32_t flags,
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 413f575..5f46c68 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -90,4 +90,15 @@ extern int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
 					unsigned start, unsigned count, void *data);
 
+extern int vfio_platform_external_mask(void *device_data, unsigned index,
+				       unsigned start, unsigned count);
+extern int vfio_platform_external_is_active(void *device_data,
+					    unsigned index, unsigned start,
+					    unsigned count);
+extern int vfio_platform_external_set_automasked(void *device_data,
+						 unsigned index,
+						 unsigned start,
+						 unsigned count,
+						 bool
[RFC 03/17] VFIO: Introduce vfio_device_external_ops
New bus callbacks are introduced. They correspond to external functions. To avoid messing up the main vfio_device_ops struct, a new vfio_device_external_ops struct is introduced.

Signed-off-by: Eric Auger eric.au...@linaro.org
---
v6: creation
---
 include/linux/vfio.h | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index ddb4409..d79e8a9 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -19,6 +19,23 @@
 #include <uapi/linux/vfio.h>
 
 /**
+ * struct vfio_device_external_ops - VFIO bus driver device callbacks
+ *	used as external API
+ * @mask: mask any IRQ defined by triplet
+ * @is_active: returns whether any IRQ defined by triplet is active
+ * @set_automasked: sets the automasked flag of triplet's IRQ
+ */
+struct vfio_device_external_ops {
+	int	(*mask)(void *device_data, unsigned index, unsigned start,
+			unsigned count);
+	int	(*is_active)(void *device_data, unsigned index, unsigned start,
+			     unsigned count);
+	int	(*set_automasked)(void *device_data, unsigned index,
+				  unsigned start, unsigned count,
+				  bool automasked);
+};
+
+/**
  * struct vfio_device_ops - VFIO bus driver device callbacks
  *
  * @open: Called when userspace creates new file descriptor for device
@@ -42,6 +59,7 @@ struct vfio_device_ops {
 			 unsigned long arg);
 	int	(*mmap)(void *device_data, struct vm_area_struct *vma);
 	void	(*request)(void *device_data, unsigned int count);
+	struct vfio_device_external_ops *external_ops;
 };
 
 extern int vfio_add_group_dev(struct device *dev,
-- 
1.9.1
Re: [PATCH] arm/run: don't enable KVM if system can't do it
On 02/07/2015 13:51, Andrew Jones wrote:
> 4) I recently mentioned[*] it might be nice to add a '-force-tcg' type
>    of arm/run command line option, allowing tcg to be used even if it's
>    possible to use kvm. Adding that at the same time would be nice.

Can you just use --no-kvm? It is equivalent to -machine accel=tcg, and it overrides previous -machine accel=foo options.

Paolo

ps: I also share the yay feeling, of course!
[RFC 01/17] VFIO: platform: test forwarded state when selecting IRQ handler
In case the IRQ is forwarded, the VFIO platform IRQ handler does not need to disable the IRQ anymore.

When setting the IRQ handler we now also test the forwarded state. In case the IRQ is forwarded we select the vfio_irq_handler.

Signed-off-by: Eric Auger eric.au...@linaro.org
---
v3 -> v4:
- change title

v2 -> v3:
- forwarded state was tested in the handler. Now the forwarded state is tested before setting the handler. This definitively limits the dynamics of forwarded state changes but I don't think there is a use case where we need to be able to change the state at any time.

Conflicts:
	drivers/vfio/platform/vfio_platform_irq.c
---
 drivers/vfio/platform/vfio_platform_irq.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 88bba57..132bb3f 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -229,8 +229,13 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
 {
 	struct vfio_platform_irq *irq = &vdev->irqs[index];
 	irq_handler_t handler;
+	struct irq_data *d;
+	bool is_forwarded;
 
-	if (vdev->irqs[index].flags & VFIO_IRQ_INFO_AUTOMASKED)
+	d = irq_get_irq_data(irq->hwirq);
+	is_forwarded = irqd_irq_forwarded(d);
+
+	if (vdev->irqs[index].flags & VFIO_IRQ_INFO_AUTOMASKED && !is_forwarded)
 		handler = vfio_automasked_irq_handler;
 	else
 		handler = vfio_irq_handler;
-- 
1.9.1
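The selection rule in this patch can be stated as a tiny pure function, which may help when reasoning about which handler a given IRQ ends up with. The sketch below is user-space C with hypothetical names (`TOY_IRQ_INFO_AUTOMASKED`, `toy_select_handler`); the flag value is made up for illustration and is not the VFIO UAPI constant.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit, standing in for VFIO_IRQ_INFO_AUTOMASKED. */
#define TOY_IRQ_INFO_AUTOMASKED (1u << 1)

/* Which of the two handlers would be installed. */
enum toy_handler {
	TOY_HANDLER_PLAIN,	/* stands in for vfio_irq_handler */
	TOY_HANDLER_AUTOMASKED	/* stands in for vfio_automasked_irq_handler */
};

/*
 * The automasked handler is only chosen when the IRQ is flagged as
 * automasked AND it is not forwarded; a forwarded IRQ always gets the
 * plain handler, since it no longer needs to be disabled on delivery.
 */
static enum toy_handler toy_select_handler(uint32_t flags, bool is_forwarded)
{
	if ((flags & TOY_IRQ_INFO_AUTOMASKED) && !is_forwarded)
		return TOY_HANDLER_AUTOMASKED;
	return TOY_HANDLER_PLAIN;
}
```

Because the decision is taken once, when the trigger is set, a later change of the forwarded state would not be picked up until the handler is set again, which is the limitation the v2 -> v3 note mentions.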
Re: [RFC 13/17] KVM: introduce kvm_arch functions for IRQ bypass
On 02/07/2015 15:17, Eric Auger wrote:
> +#ifdef CONFIG_IRQ_BYPASS_MANAGER

Please use a separate symbol CONFIG_KVM_HAVE_IRQ_BYPASS.

> +void kvm_arch_add_producer(struct irq_bypass_consumer *,
> +			   struct irq_bypass_producer *);

add_irq_bypass_producer, and so on below.

Paolo

> +void kvm_arch_del_producer(struct irq_bypass_consumer *,
> +			   struct irq_bypass_producer *);
> +void kvm_arch_stop_consumer(struct irq_bypass_consumer *);
> +void kvm_arch_resume_consumer(struct irq_bypass_consumer *);
> +
> +#else
> +void kvm_arch_add_producer(struct irq_bypass_consumer *,
> +			   struct irq_bypass_producer *)
> +{
> +}
> +void kvm_arch_del_producer(struct irq_bypass_consumer *,
> +			   struct irq_bypass_producer *)
> +{
> +}
> +void kvm_arch_stop_consumer(struct irq_bypass_consumer *)
> +{
> +}
> +void kvm_arch_resume_consumer(struct irq_bypass_consumer *)
> +{
> +}
Re: [PATCH] arm/run: don't enable KVM if system can't do it
On Thu, Jul 02, 2015 at 12:05:31PM +0100, Alex Bennée wrote:
> As ARM (and no doubt other systems) can also run tests in pure TCG mode
> we might as well not bother enabling accel=kvm if we aren't on a real
> ARM based system. This prevents us seeing ugly warning messages when
> testing TCG.

First, YAY! We're getting contributions to kvm-unit-tests/arm!

> Signed-off-by: Alex Bennée alex.ben...@linaro.org
> ---
>  arm/run | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arm/run b/arm/run
> index 662a856..2bdb4be 100755
> --- a/arm/run
> +++ b/arm/run
> @@ -33,7 +33,13 @@ if $qemu $M -chardev testdev,id=id -initrd . 2>&1 \
>  	exit 2
>  fi
>
> -M='-machine virt,accel=kvm:tcg'
> +host=`uname -m | sed -e 's/arm.*/arm/'`
> +if [ ${host} = arm ] || [ ${host} = aarch64 ]; then
> +    M='-machine virt,accel=kvm:tcg'
> +else
> +    M='-machine virt,accel=tcg'
> +fi

I think this is a good idea, although I had actually left that warning on purpose. Originally, the plan was for these unit tests to be kvm specific. If they could be developed with the aid of tcg, and even used to test tcg, then fine, but running them on tcg should always complain, in order to make sure that the test output clearly showed that it had not been running on kvm. Developing unit tests for tcg is also a good idea though, and there's really no reason not to share this framework.

So, for this patch I'd prefer we do a few things differently;

1) we should be able to integrate this new condition with the arm64 must use '-cpu host' with kvm condition that is lower down. And, let's just make this $HOST variable one that ./configure prepares, allowing that arm64 condition to s/$(arch)/$HOST/ and avoiding the need to duplicate the sed -e 's/arm.*/arm/'

2) we might as well do something like

     M='-machine virt'
     if using-kvm
         M+=',accel=kvm'
     else
         M+=',accel=tcg'
     fi

   now, since we don't want to use the accel fallback feature anymore

3) outputting which one we're using might still be nice, otherwise one must inspect the qemu command line in the logs to find out

4) I recently mentioned[*] it might be nice to add a '-force-tcg' type of arm/run command line option, allowing tcg to be used even if it's possible to use kvm. Adding that at the same time would be nice.

5) we use tabs for indentation in arm/run, and only bother with the variable's {}, if necessary

6) we should post patches with [kvm-unit-tests PATCH] to avoid confusion with other kvm postings. (I screwed that up on my last two postings...).

Thanks!
drew

[*] https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07514.html

> +
>  chr_testdev='-device virtio-serial-device'
>  chr_testdev+=' -device virtconsole,chardev=ctd -chardev testdev,id=ctd'
> --
> 2.4.5
[PATCH v4 0/2] vhost: support more than 64 memory regions
changes since v3:
  * rebased on top of vhost-next branch
changes since v2:
  * drop cache patches for now as suggested
  * add max_mem_regions module parameter instead of unconditionally increasing limit
  * drop bsearch patch since it's already queued

References to previous versions:
v2: https://lkml.org/lkml/2015/6/17/276
v1: http://www.spinics.net/lists/kvm/msg117654.html

The series makes vhost's limit on the number of memory regions tunable. With max_mem_regions set to more than 262 slots, it fixes the VM crashing on memory hotplug in the default QEMU configuration, which happened because vhost refused to accept more than 64 memory regions.

Igor Mammedov (2):
  vhost: extend memory regions allocation to vmalloc
  vhost: add max_mem_regions module parameter

 drivers/vhost/vhost.c | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

-- 
1.8.3.1
Re: [PATCH] arm/run: don't enable KVM if system can't do it
Andrew Jones drjo...@redhat.com writes:

> On Thu, Jul 02, 2015 at 12:05:31PM +0100, Alex Bennée wrote:
>> As ARM (and no doubt other systems) can also run tests in pure TCG mode
>> we might as well not bother enabling accel=kvm if we aren't on a real
>> ARM based system. This prevents us seeing ugly warning messages when
>> testing TCG.
>
> First, YAY! We're getting contributions to kvm-unit-tests/arm!

:-) well so far I've been noodling about looking at it for KVM Guest Debug testing. I've a hideous branch on github that attempts to test exercise the debug register trapping code. However that falls down as I really need to find an easy way of attaching GDB to the qemu-gdb stub while the test is running.

However with the TCG multi-thread work coming up I certainly see the need to exercise QEMU in a way that the internal TCG test code might have trouble with.

>> Signed-off-by: Alex Bennée alex.ben...@linaro.org
>> ---
>>  arm/run | 8 +++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/arm/run b/arm/run
>> index 662a856..2bdb4be 100755
>> --- a/arm/run
>> +++ b/arm/run
>> @@ -33,7 +33,13 @@ if $qemu $M -chardev testdev,id=id -initrd . 2>&1 \
>>  	exit 2
>>  fi
>>
>> -M='-machine virt,accel=kvm:tcg'
>> +host=`uname -m | sed -e 's/arm.*/arm/'`
>> +if [ ${host} = arm ] || [ ${host} = aarch64 ]; then
>> +    M='-machine virt,accel=kvm:tcg'
>> +else
>> +    M='-machine virt,accel=tcg'
>> +fi
>
> I think this is a good idea, although I had actually left that warning
> on purpose. Originally, the plan was for these unit tests to be kvm
> specific. If they could be developed with the aid of tcg, and even used
> to test tcg, then fine, but running them on tcg should always complain,
> in order to make sure that the test output clearly showed that it had
> not been running on kvm. Developing unit tests for tcg is also a good
> idea though, and there's really no reason not to share this framework.
>
> So, for this patch I'd prefer we do a few things differently;
>
> 1) we should be able to integrate this new condition with the arm64
>    must use '-cpu host' with kvm condition that is lower down. And,
>    let's just make this $HOST variable one that ./configure prepares,
>    allowing that arm64 condition to s/$(arch)/$HOST/ and avoiding the
>    need to duplicate the sed -e 's/arm.*/arm/'

Yeah makes sense.

> 2) we might as well do something like
>
>      M='-machine virt'
>      if using-kvm
>          M+=',accel=kvm'
>      else
>          M+=',accel=tcg'
>      fi
>
>    now, since we don't want to use the accel fallback feature anymore
>
> 3) outputting which one we're using might still be nice, otherwise one
>    must inspect the qemu command line in the logs to find out
>
> 4) I recently mentioned[*] it might be nice to add a '-force-tcg' type
>    of arm/run command line option, allowing tcg to be used even if it's
>    possible to use kvm. Adding that at the same time would be nice.

Would it also be useful for other arches? Does run-tests.sh pass

> 5) we use tabs for indentation in arm/run, and only bother with the
>    variable's {}, if necessary

My shell quoting was rusty. I think $(host) was calling the host command for some reason.

> 6) we should post patches with [kvm-unit-tests PATCH] to avoid
>    confusion with other kvm postings. (I screwed that up on my last two
>    postings...).

/me ponders if he can just config git for that. I'll patch the readme ;-)

> Thanks!
> drew
>
> [*] https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07514.html
>
>> +
>>  chr_testdev='-device virtio-serial-device'
>>  chr_testdev+=' -device virtconsole,chardev=ctd -chardev testdev,id=ctd'
>> --
>> 2.4.5

-- 
Alex Bennée
Re: [RFC 12/17] irq: bypass: Extend skeleton for ARM forwarding control
On 02/07/2015 15:17, Eric Auger wrote:
> - new fields are added on producer side: linux irq, vfio_device handle,
>   active which reflects whether the source is active (at interrupt
>   controller level or at VFIO level - automasked -) and finally an
>   opaque pointer which will be used to point to the
>   vfio_platform_device in this series.

Linux IRQ and active should be okay. As to the vfio_device handle, you should link it from the vfio_platform_device instead. And for the vfio_platform_device, you can link it from the vfio_platform_irq instead.

Once you've done this, embed the irq_bypass_producer struct in the vfio_platform_irq struct; in the new kvm_arch_* functions, go back to the vfio_platform_irq struct via container_of. From there you can retrieve pointers to the vfio_platform_device and the vfio_device.

> - new fields on consumer side: the kvm handle, the gsi

You do not need to add these. Instead, add the kvm handle to irqfd only. Like above, embed the irq_bypass_consumer struct in the irqfd struct; in the new kvm_arch_* functions, go back to the vfio_platform_irq struct via container_of.

Paolo
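The embedding suggested above for the producer side can be sketched in user-space C. Everything here is hypothetical and simplified: the `toy_*` structs only mimic the relationship between the producer, the per-IRQ struct and the device, and `toy_container_of` is a plain offsetof-based equivalent of the kernel's container_of() without its type checking.

```c
#include <stddef.h>

/* User-space equivalent of the kernel's container_of() macro. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct irq_bypass_producer. */
struct toy_bypass_producer {
	int irq;
	int active;
};

/* Hypothetical stand-in for struct vfio_platform_device. */
struct toy_platform_device {
	const char *name;
};

/* Hypothetical stand-in for struct vfio_platform_irq. */
struct toy_platform_irq {
	int hwirq;
	struct toy_platform_device *vdev;	/* back-link to the device */
	struct toy_bypass_producer producer;	/* embedded, not pointed to */
};

/*
 * Given only the producer, recover the enclosing per-IRQ struct and
 * follow its back-link to the device. No opaque pointer is needed in
 * the producer itself.
 */
static struct toy_platform_device *
toy_producer_to_device(struct toy_bypass_producer *prod)
{
	struct toy_platform_irq *irq =
		toy_container_of(prod, struct toy_platform_irq, producer);

	return irq->vdev;
}
```

The point of the layout is that the generic producer struct stays free of VFIO-specific fields; anything the arch code needs is reached from the enclosing struct.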
Re: [PATCH v7 09/11] KVM: arm64: guest debug, HW assisted debug support
Will Deacon will.dea...@arm.com writes:

Are you happy with this?:

Subject: [PATCH v8 09/11] KVM: arm64: guest debug, HW assisted debug support

This adds support for userspace to control the HW debug registers for guest debug. In the debug ioctl we copy an IMPDEF registers into a new register set called host_debug_state.

We use the recently introduced vcpu parameter debug_ptr to select which register set is copied into the real registers when world switch occurs.

I've made some helper functions from hw_breakpoint.c more widely available for re-use.

As with single step we need to tweak the guest registers to enable the exceptions so we need to save and restore those bits.

Two new capabilities have been added to the KVM_EXTENSION ioctl to allow userspace to query the number of hardware break and watch points available on the host hardware.

Signed-off-by: Alex Bennée alex.ben...@linaro.org
Reviewed-by: Christoffer Dall christoffer.d...@linaro.org
---
v2
  - switched to C setup
  - replace host debug registers directly into context
  - minor tweak to api docs
  - setup right register for debug
  - add FAR_EL2 to debug exit structure
  - add support for trapping debug register access
v3
  - remove stray trace statement
  - fix spacing around operators (various)
  - clean-up usage of trap_debug
  - introduce debug_ptr, replace excessive memcpy stuff
  - don't use memcpy in ioctl, just assign
  - update cap ioctl documentation
  - reword a number comments
  - rename host_debug_state->external_debug_state
v4
  - use the new u32/u64 split debug_ptr approach
  - fix some wording/comments
v5
  - don't set MDSCR_EL1.KDE (not needed)
v6
  - update wording given change in commentary
  - KVM_GUESTDBG_USE_HW_BP->KVM_GUESTDBG_USE_HW
v7
  - fix merge conflicts from ioctl move to guest.c
  - use kvm_arm_reset_debug_ptr to reset ptr
  - a BUG_ON() test has been added to trap failure to reset debug_ptr
  - debugging->debug in kvm_host.h comment
  - s/defined// s/to// in commit msg
  - rm ref to introducing debug_ptr in commit msg
  - add r-b tag
v8
  - use hw_breakpoint_slots() instead
---
 Documentation/virtual/kvm/api.txt |  7 ++-
 arch/arm64/include/asm/kvm_host.h |  6 +-
 arch/arm64/kvm/debug.c            | 40 ++-
 arch/arm64/kvm/guest.c            |  7 +++
 arch/arm64/kvm/handle_exit.c      |  6 ++
 arch/arm64/kvm/reset.c            | 13 +
 arch/arm64/kvm/sys_regs.c         |  3 ---
 include/uapi/linux/kvm.h          |  2 ++
 8 files changed, 74 insertions(+), 10 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 33c8143..ada57df 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2668,7 +2668,7 @@ The top 16 bits of the control field are architecture specific control
 flags which can include the following:
 
   - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64]
-  - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390]
+  - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390, arm64]
   - KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86]
   - KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86]
   - KVM_GUESTDBG_EXIT_PENDING: trigger an immediate guest exit [s390]
@@ -2683,6 +2683,11 @@ updated to the correct (supplied) values.
 The second part of the structure is architecture specific and
 typically contains a set of debug registers.
 
+For arm64 the number of debug registers is implementation defined and
+can be determined by querying the KVM_CAP_GUEST_DEBUG_HW_BPS and
+KVM_CAP_GUEST_DEBUG_HW_WPS capabilities which return a positive number
+indicating the number of supported registers.
+
 When debug events exit the main run loop with the reason
 KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run
 structure containing architecture specific debug information.
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 461d288..6c745e0 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -116,13 +116,17 @@ struct kvm_vcpu_arch {
 	 * debugging the guest from the host and to maintain separate host and
 	 * guest state during world switches. vcpu_debug_state are the debug
 	 * registers of the vcpu as the guest sees them. host_debug_state are
-	 * the host registers which are saved and restored during world switches.
+	 * the host registers which are saved and restored during
+	 * world switches. external_debug_state contains the debug
+	 * values we want to debug the guest. This is set via the
+	 * KVM_SET_GUEST_DEBUG ioctl.
 	 *
 	 * debug_ptr points to the set of debug registers that should be loaded
 	 * onto the hardware when running the guest.
 	 */
 	struct kvm_guest_debug_arch *debug_ptr;
 	struct
Re: [PATCH 7/7] KVM: arm: implement kvm_set_msi by gsi direct mapping
Hi Pavel,

On 07/02/2015 09:53 AM, Pavel Fedin wrote:
> Hello!
>
>> -Original Message-
>> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Eric Auger
>> Sent: Monday, June 29, 2015 6:37 PM
>> To: eric.au...@st.com; eric.au...@linaro.org; linux-arm-ker...@lists.infradead.org; marc.zyng...@arm.com; christoffer.d...@linaro.org; andre.przyw...@arm.com; kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org
>> Cc: linux-ker...@vger.kernel.org; patc...@linaro.org; p.fe...@samsung.com; pbonz...@redhat.com
>> Subject: [PATCH 7/7] KVM: arm: implement kvm_set_msi by gsi direct mapping
>>
>> If the ITS modality is not available, let's simply support MSI
>> injection by transforming the MSI.data into an SPI ID.
>>
>> This becomes possible to use KVM_SIGNAL_MSI ioctl for arm too.
>>
>> Signed-off-by: Eric Auger eric.au...@linaro.org
>> ---
>>  arch/arm/kvm/Kconfig | 1 +
>>  virt/kvm/arm/vgic.c  | 5 +
>>  2 files changed, 6 insertions(+)
>>
>> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
>> index 151e710..0f58baf 100644
>> --- a/arch/arm/kvm/Kconfig
>> +++ b/arch/arm/kvm/Kconfig
>> @@ -31,6 +31,7 @@ config KVM
>>  	select KVM_VFIO
>>  	select HAVE_KVM_EVENTFD
>>  	select HAVE_KVM_IRQFD
>> +	select HAVE_KVM_MSI
>>  	select HAVE_KVM_IRQCHIP
>>  	select HAVE_KVM_IRQ_ROUTING
>>  	depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index 0b4c48c..b3c10dc 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -2314,6 +2314,11 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
>>  			return kvm->arch.vgic.vm_ops.inject_msi(kvm, msi);
>>  		else
>>  			return -ENODEV;
>> +	case KVM_IRQ_ROUTING_MSI:
>> +		if (kvm->arch.vgic.vm_ops.inject_msi)
>> +			return -EINVAL;
>> +		else
>> +			return kvm_vgic_inject_irq(kvm, 0, e->msi.data, level);
>
> Given API change i suggest (using KVM_MSI_VALID_DEVID flag), we could
> get rid of all these if()'s here. Just forward all parameters to vGIC
> implementation code and let it do its checks.

I don't understand this comment. Here this is the kernel struct that is used (struct kvm_kernel_irq_routing_entry) and not the user one (kvm_irq_routing_entry). The kernel struct does not have the flag field. Another reason I think to keep using the type for homogeneity.

To be noted that in the kernel struct, the devid is passed in kvm_extended_msi, as you suggested for the user-space struct.

Thanks

Eric

>>  	default:
>>  		return -EINVAL;
>>  	}
>> --
>> 1.9.1
>
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
Re: [PATCH 1/7] KVM: api: add kvm_irq_routing_extended_msi
Hi Eric,

On 02/07/15 15:49, Eric Auger wrote:
> Hi Pavel,
> On 07/02/2015 09:26 AM, Pavel Fedin wrote:
>> Hello!
>>
>>> -Original Message-
>>> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Eric Auger
>>> Sent: Monday, June 29, 2015 6:37 PM
>>> To: eric.au...@st.com; eric.au...@linaro.org; linux-arm-ker...@lists.infradead.org; marc.zyng...@arm.com; christoffer.d...@linaro.org; andre.przyw...@arm.com; kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org
>>> Cc: linux-ker...@vger.kernel.org; patc...@linaro.org; p.fe...@samsung.com; pbonz...@redhat.com
>>> Subject: [PATCH 1/7] KVM: api: add kvm_irq_routing_extended_msi
>>>
>>> On ARM, the MSI msg (address and data) comes along with
>>> out-of-band device ID information. The device ID encodes the device
>>> that composes the MSI msg. Let's create a new routing entry type,
>>> dubbed KVM_IRQ_ROUTING_EXTENDED_MSI and use the __u32 pad space
>>> to convey the device ID.
>>>
>>> Signed-off-by: Eric Auger eric.au...@linaro.org
>>> ---
>>> RFC -> PATCH
>>> - remove kvm_irq_routing_extended_msi and use union instead
>>> ---
>>>  Documentation/virtual/kvm/api.txt | 9 -
>>>  include/uapi/linux/kvm.h          | 6 +-
>>>  2 files changed, 13 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>> index d20fd94..6426ae9 100644
>>> --- a/Documentation/virtual/kvm/api.txt
>>> +++ b/Documentation/virtual/kvm/api.txt
>>> @@ -1414,7 +1414,10 @@ struct kvm_irq_routing_entry {
>>>  	__u32 gsi;
>>>  	__u32 type;
>>>  	__u32 flags;
>>> -	__u32 pad;
>>> +	union {
>>> +		__u32 pad;
>>> +		__u32 devid;
>>> +	};
>>>  	union {
>>>  		struct kvm_irq_routing_irqchip irqchip;
>>>  		struct kvm_irq_routing_msi msi;
>>
>> devid is actually a part of MSI bunch. Shouldn't it be a part of
>> struct kvm_irq_routing_msi then? It also has reserved pad.
>
> Well this makes sense to me to associate the devid to the msi and put
> devid in the pad field of struct kvm_irq_routing_msi.
>
> André, Christoffer, would you agree on this change? - I would like to
> avoid doing/undoing things ;-) -

Yes, that makes sense to me. TBH I haven't had a closer look at the patches yet, but clearly devid belongs into struct kvm_irq_routing_msi.

>>> @@ -1427,6 +1430,10 @@ struct kvm_irq_routing_entry {
>>>  #define KVM_IRQ_ROUTING_IRQCHIP 1
>>>  #define KVM_IRQ_ROUTING_MSI 2
>>>  #define KVM_IRQ_ROUTING_S390_ADAPTER 3
>>> +#define KVM_IRQ_ROUTING_EXTENDED_MSI 4
>>> +
>>> +In case of KVM_IRQ_ROUTING_EXTENDED_MSI routing type, devid is used to convey
>>> +the device ID.
>>>
>>>  No flags are specified so far, the corresponding field must be set to zero.
>>
>> What if we use KVM_MSI_VALID_DEVID flag instead of new
>> KVM_IRQ_ROUTING_EXTENDED_MSI definition? I believe this would make an
>> API more consistent and introduce less new definitions.
>
> do you mean using type == KVM_IRQ_ROUTING_MSI and flag ==
> KVM_MSI_VALID_DEVID? Not sure this is simpler/clearer. s390 paved the
> way for new routing entry types. I add a new one here.

I tend to agree with Pavel's solution. When hacking IRQ routing support into kvmtool I saw that it's nasty being forced to differentiate between the two MSI routing types. Actually userland should be able to query the kernel about what kind of routing it requires. Also there is the issue that we must _not_ set the flag on x86, since that breaks older kernels (due to that check that Eric removes in 3/7). So from my point of view the cleanest solution would be to always use KVM_IRQ_ROUTING_MSI, and add the device ID if the kernel needs it (true for ITS guests, false for GICv2M, x86, ...)

I am looking for a clever solution for this now.

Cheers,
Andre.

>> Another solution may be to use new KVM_IRQ_ROUTING_EXTENDED_MSI type
>> and add struct kvm_msi ext_msi in kvm_irq_routing_entry union. It is 8
>> words as well. But most probably this is even uglier.
>
> Let's see if this thread is heading to a consensus...
>
> Best Regards
>
> Eric
>
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index 2a23705..8484681 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -841,12 +841,16 @@ struct kvm_irq_routing_s390_adapter {
>>>  #define KVM_IRQ_ROUTING_IRQCHIP 1
>>>  #define KVM_IRQ_ROUTING_MSI 2
>>>  #define KVM_IRQ_ROUTING_S390_ADAPTER 3
>>> +#define KVM_IRQ_ROUTING_EXTENDED_MSI 4
>>>
>>>  struct kvm_irq_routing_entry {
>>>  	__u32 gsi;
>>>  	__u32 type;
>>>  	__u32 flags;
>>> -	__u32 pad;
>>> +	union {
>>> +		__u32 pad;
>>> +		__u32 devid;
>>> +	};
>>>  	union {
>>>  		struct kvm_irq_routing_irqchip irqchip;
>>>  		struct kvm_irq_routing_msi msi;
>>> --
>>> 1.9.1
>>
>> Kind regards,
>> Pavel Fedin
>> Expert Engineer
>> Samsung Electronics Research center Russia
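To make the alternative discussed in this message concrete, here is a rough user-space sketch of the "always use the MSI type, add the device ID only when needed" layout. It is an illustration under stated assumptions, not the KVM UAPI: all `toy_*` and `TOY_*` names are hypothetical, the flag value is invented, and the entry is reduced to the MSI case. The device ID reuses the pad word of the MSI payload, so the payload size does not change, and a flag tells the consumer whether that word is meaningful.

```c
#include <stdint.h>

#define TOY_ROUTING_MSI		2		/* mirrors the MSI type value in the patch */
#define TOY_MSI_VALID_DEVID	(1u << 0)	/* hypothetical flag value */

/* MSI payload; the device ID reuses the former pad word. */
struct toy_routing_msi {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
	union {
		uint32_t pad;
		uint32_t devid;
	};
};

/* Routing entry reduced to the MSI case for this sketch. */
struct toy_routing_entry {
	uint32_t gsi;
	uint32_t type;
	uint32_t flags;
	uint32_t pad;
	struct toy_routing_msi msi;
};

/*
 * Fill an MSI route. The type is the same with or without a device ID;
 * only the flag and the reused pad word differ.
 */
static void toy_fill_msi_route(struct toy_routing_entry *e, uint32_t gsi,
			       uint32_t addr_lo, uint32_t addr_hi,
			       uint32_t data, int has_devid, uint32_t devid)
{
	e->gsi = gsi;
	e->type = TOY_ROUTING_MSI;
	e->flags = has_devid ? TOY_MSI_VALID_DEVID : 0;
	e->pad = 0;
	e->msi.address_lo = addr_lo;
	e->msi.address_hi = addr_hi;
	e->msi.data = data;
	e->msi.pad = 0;		/* stays zero when no device ID is supplied */
	if (has_devid)
		e->msi.devid = devid;
}
```

A caller that never has a device ID (the x86 case raised above) leaves both the flag and the pad word at zero, so an entry built this way looks exactly like one built before the change.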
Re: [PATCH 1/7] KVM: api: add kvm_irq_routing_extended_msi
Hi Andre,

On 07/02/2015 05:14 PM, Andre Przywara wrote:
> Hi Eric,
>
> On 02/07/15 15:49, Eric Auger wrote:
>> Hi Pavel,
>> On 07/02/2015 09:26 AM, Pavel Fedin wrote:
>>> Hello!
>>>
>>>> -Original Message-
>>>> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Eric Auger
>>>> Sent: Monday, June 29, 2015 6:37 PM
>>>> To: eric.au...@st.com; eric.au...@linaro.org; linux-arm-ker...@lists.infradead.org; marc.zyng...@arm.com; christoffer.d...@linaro.org; andre.przyw...@arm.com; kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org
>>>> Cc: linux-ker...@vger.kernel.org; patc...@linaro.org; p.fe...@samsung.com; pbonz...@redhat.com
>>>> Subject: [PATCH 1/7] KVM: api: add kvm_irq_routing_extended_msi
>>>>
>>>> On ARM, the MSI msg (address and data) comes along with
>>>> out-of-band device ID information. The device ID encodes the device
>>>> that composes the MSI msg. Let's create a new routing entry type,
>>>> dubbed KVM_IRQ_ROUTING_EXTENDED_MSI and use the __u32 pad space
>>>> to convey the device ID.
>>>>
>>>> Signed-off-by: Eric Auger eric.au...@linaro.org
>>>> ---
>>>> RFC -> PATCH
>>>> - remove kvm_irq_routing_extended_msi and use union instead
>>>> ---
>>>>  Documentation/virtual/kvm/api.txt | 9 -
>>>>  include/uapi/linux/kvm.h          | 6 +-
>>>>  2 files changed, 13 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>> index d20fd94..6426ae9 100644
>>>> --- a/Documentation/virtual/kvm/api.txt
>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>> @@ -1414,7 +1414,10 @@ struct kvm_irq_routing_entry {
>>>>  	__u32 gsi;
>>>>  	__u32 type;
>>>>  	__u32 flags;
>>>> -	__u32 pad;
>>>> +	union {
>>>> +		__u32 pad;
>>>> +		__u32 devid;
>>>> +	};
>>>>  	union {
>>>>  		struct kvm_irq_routing_irqchip irqchip;
>>>>  		struct kvm_irq_routing_msi msi;
>>>
>>> devid is actually a part of MSI bunch. Shouldn't it be a part of
>>> struct kvm_irq_routing_msi then? It also has reserved pad.
>>
>> Well this makes sense to me to associate the devid to the msi and put
>> devid in the pad field of struct kvm_irq_routing_msi.
>>
>> André, Christoffer, would you agree on this change? - I would like to
>> avoid doing/undoing things ;-) -
>
> Yes, that makes sense to me. TBH I haven't had a closer look at the
> patches yet, but clearly devid belongs into struct kvm_irq_routing_msi.

thanks for your quick reply. OK so let's go with that change.

>>>> @@ -1427,6 +1430,10 @@ struct kvm_irq_routing_entry {
>>>>  #define KVM_IRQ_ROUTING_IRQCHIP 1
>>>>  #define KVM_IRQ_ROUTING_MSI 2
>>>>  #define KVM_IRQ_ROUTING_S390_ADAPTER 3
>>>> +#define KVM_IRQ_ROUTING_EXTENDED_MSI 4
>>>> +
>>>> +In case of KVM_IRQ_ROUTING_EXTENDED_MSI routing type, devid is used to convey
>>>> +the device ID.
>>>>
>>>>  No flags are specified so far, the corresponding field must be set to zero.
>>>
>>> What if we use KVM_MSI_VALID_DEVID flag instead of new
>>> KVM_IRQ_ROUTING_EXTENDED_MSI definition? I believe this would make an
>>> API more consistent and introduce less new definitions.
>>
>> do you mean using type == KVM_IRQ_ROUTING_MSI and flag ==
>> KVM_MSI_VALID_DEVID? Not sure this is simpler/clearer. s390 paved the
>> way for new routing entry types. I add a new one here.
>
> I tend to agree with Pavel's solution. When hacking IRQ routing support
> into kvmtool I saw that it's nasty being forced to differentiate
> between the two MSI routing types. Actually userland should be able to
> query the kernel about what kind of routing it requires. Also there is
> the issue that we must _not_ set the flag on x86, since that breaks
> older kernels (due to that check that Eric removes in 3/7). So from my
> point of view the cleanest solution would be to always use
> KVM_IRQ_ROUTING_MSI, and add the device ID if the kernel needs it (true
> for ITS guests, false for GICv2M, x86, ...)
>
> I am looking for a clever solution for this now.

OK thanks for sharing. I need some more time to study qemu code too.

- Eric

> Cheers,
> Andre.
>
>>> Another solution may be to use new KVM_IRQ_ROUTING_EXTENDED_MSI type
>>> and add struct kvm_msi ext_msi in kvm_irq_routing_entry union. It is 8
>>> words as well. But most probably this is even uglier.
>>
>> Let's see if this thread is heading to a consensus...
>>
>> Best Regards
>>
>> Eric
>>
>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>> index 2a23705..8484681 100644
>>>> --- a/include/uapi/linux/kvm.h
>>>> +++ b/include/uapi/linux/kvm.h
>>>> @@ -841,12 +841,16 @@ struct kvm_irq_routing_s390_adapter {
>>>>  #define KVM_IRQ_ROUTING_IRQCHIP 1
>>>>  #define KVM_IRQ_ROUTING_MSI 2
>>>>  #define KVM_IRQ_ROUTING_S390_ADAPTER 3
>>>> +#define KVM_IRQ_ROUTING_EXTENDED_MSI 4
>>>>
>>>>  struct kvm_irq_routing_entry {
>>>>  	__u32 gsi;
>>>>  	__u32 type;
>>>>  	__u32 flags;
>>>> -	__u32 pad;
>>>> +	union {
>>>> +		__u32 pad;
>>>> +		__u32 devid;
>>>> +	};
>>>>  	union {
>>>>  		struct kvm_irq_routing_irqchip irqchip;
>>>>  		struct kvm_irq_routing_msi msi;
>>>> --
>>>> 1.9.1
>>>
>>> Kind regards,
Re: [PATCH 1/7] KVM: api: add kvm_irq_routing_extended_msi
On 07/02/2015 05:39 PM, Pavel Fedin wrote:
> Hello!
>
>> OK thanks for sharing. I need some more time to study qemu code too.
>
> I am currently working on supporting this in qemu. Not ready yet, need some time. But, with
> API i suggest, things are really much-much simpler.

OK so both of you say the same thing. Will respin accordingly

Eric

> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
Re: [PATCH 1/7] KVM: api: add kvm_irq_routing_extended_msi
Hi Pavel, On 07/02/2015 09:26 AM, Pavel Fedin wrote: Hello! -Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Eric Auger Sent: Monday, June 29, 2015 6:37 PM To: eric.au...@st.com; eric.au...@linaro.org; linux-arm-ker...@lists.infradead.org; marc.zyng...@arm.com; christoffer.d...@linaro.org; andre.przyw...@arm.com; kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org Cc: linux-ker...@vger.kernel.org; patc...@linaro.org; p.fe...@samsung.com; pbonz...@redhat.com Subject: [PATCH 1/7] KVM: api: add kvm_irq_routing_extended_msi On ARM, the MSI msg (address and data) comes along with out-of-band device ID information. The device ID encodes the device that composes the MSI msg. Let's create a new routing entry type, dubbed KVM_IRQ_ROUTING_EXTENDED_MSI and use the __u32 pad space to convey the device ID. Signed-off-by: Eric Auger eric.au...@linaro.org --- RFC - PATCH - remove kvm_irq_routing_extended_msi and use union instead --- Documentation/virtual/kvm/api.txt | 9 - include/uapi/linux/kvm.h | 6 +- 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index d20fd94..6426ae9 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1414,7 +1414,10 @@ struct kvm_irq_routing_entry { __u32 gsi; __u32 type; __u32 flags; -__u32 pad; +union { +__u32 pad; +__u32 devid; +}; union { struct kvm_irq_routing_irqchip irqchip; struct kvm_irq_routing_msi msi; devid is actually a part of MSI bunch. Shouldn't it be a part of struct kvm_irq_routing_msi then? It also has reserved pad. Well this makes sense to me to associate the devid to the msi and put devid in the pad field of struct kvm_irq_routing_msi. André, Christoffer, would you agree on this change? 
- I would like to avoid doing/undoing things ;-) - @@ -1427,6 +1430,10 @@ struct kvm_irq_routing_entry { #define KVM_IRQ_ROUTING_IRQCHIP 1 #define KVM_IRQ_ROUTING_MSI 2 #define KVM_IRQ_ROUTING_S390_ADAPTER 3 +#define KVM_IRQ_ROUTING_EXTENDED_MSI 4 + +In case of KVM_IRQ_ROUTING_EXTENDED_MSI routing type, devid is used to convey +the device ID. No flags are specified so far, the corresponding field must be set to zero. What if we use KVM_MSI_VALID_DEVID flag instead of new KVM_IRQ_ROUTING_EXTENDED_MSI definition? I believe this would make an API more consistent and introduce less new definitions. do you mean using type == KVM_IRQ_ROUTING_MSI and flag == KVM_MSI_VALID_DEVID? Not sure this is simpler/clearer. s390 paved the way for new routing entry types. I add a new one here. Another solution may be to use new KVM_IRQ_ROUTING_EXTENDED_MSI type and add struct kvm_msi ext_msi in kvm_irq_routing_entry union. It is 8 words as well. But most probably this is even uglier. Let's see if this thread is heading to a consensus... 
Best Regards Eric diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 2a23705..8484681 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -841,12 +841,16 @@ struct kvm_irq_routing_s390_adapter { #define KVM_IRQ_ROUTING_IRQCHIP 1 #define KVM_IRQ_ROUTING_MSI 2 #define KVM_IRQ_ROUTING_S390_ADAPTER 3 +#define KVM_IRQ_ROUTING_EXTENDED_MSI 4 struct kvm_irq_routing_entry { __u32 gsi; __u32 type; __u32 flags; -__u32 pad; +union { +__u32 pad; +__u32 devid; +}; union { struct kvm_irq_routing_irqchip irqchip; struct kvm_irq_routing_msi msi; -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Kind regards, Pavel Fedin Expert Engineer Samsung Electronics Research center Russia -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
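A minimal sketch of the flag-based alternative discussed above, assuming the device ID is carried in the pad word of the MSI payload and announced through a flag. The types `routing_entry` and `routing_msi`, the helpers `make_msi_entry` and `validate_msi_entry`, and the flag value are local stand-ins chosen for illustration; they are not the uapi definitions from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors for illustration only; names and the flag value are assumptions. */
#define ROUTING_MSI      2          /* same value as KVM_IRQ_ROUTING_MSI in the thread */
#define MSI_VALID_DEVID  (1U << 0)  /* stands in for the proposed KVM_MSI_VALID_DEVID */

struct routing_msi {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
    union {
        uint32_t pad;    /* must be zero when no device ID is supplied */
        uint32_t devid;  /* meaningful only when MSI_VALID_DEVID is set */
    };
};

struct routing_entry {
    uint32_t gsi;
    uint32_t type;
    uint32_t flags;
    uint32_t pad;
    union {
        struct routing_msi msi;
        uint32_t raw[8];  /* the thread mentions an 8-word union */
    } u;
};

/* Build an MSI entry; has_devid is 0 on platforms that have no device ID. */
static struct routing_entry make_msi_entry(uint32_t gsi, uint64_t addr,
                                           uint32_t data, int has_devid,
                                           uint32_t devid)
{
    struct routing_entry e;

    memset(&e, 0, sizeof(e));
    e.gsi = gsi;
    e.type = ROUTING_MSI;
    e.u.msi.address_lo = (uint32_t)addr;
    e.u.msi.address_hi = (uint32_t)(addr >> 32);
    e.u.msi.data = data;
    if (has_devid) {
        e.flags = MSI_VALID_DEVID;
        e.u.msi.devid = devid;
    }
    return e;
}

/* Reject unknown flags, and a non-zero pad when the flag is clear. */
static int validate_msi_entry(const struct routing_entry *e)
{
    if (e->type != ROUTING_MSI)
        return -1;
    if (e->flags & ~MSI_VALID_DEVID)
        return -1;
    if (!(e->flags & MSI_VALID_DEVID) && e->u.msi.pad != 0)
        return -1;
    return 0;
}
```

With this layout a single routing type serves both cases: callers that never set the flag keep sending a zeroed pad word, which is the backward-compatibility concern raised for x86.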
Re: [PATCH 1/7] KVM: api: add kvm_irq_routing_extended_msi
On 07/02/2015 10:41 AM, Pavel Fedin wrote:
> Hello!
>
>> What if we use KVM_MSI_VALID_DEVID flag instead of new KVM_IRQ_ROUTING_EXTENDED_MSI
>> definition? I believe this would make an API more consistent and introduce less new
>> definitions.
>
> I have just found one more flaw in your implementation. If you take a look at
> irqfd_wakeup()...
> --- cut ---
>     /* An event has been signaled, inject an interrupt */
>     if (irq.type == KVM_IRQ_ROUTING_MSI)
>         kvm_set_msi(&irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1,
>                     false);
>     else
>         schedule_work(&irqfd->inject);
> --- cut ---
> You apparently missed KVM_IRQ_ROUTING_EXTENDED_MSI here, as well as in irqfd_update(). But,
> if you accept my API proposal, this becomes irrelevant.

Hi Pavel,

thanks for spotting this bug. Whatever the user API choice, I will respin shortly, fixing this plus the one reported by André.

Thanks for the review.

Best Regards

Eric

> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
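A small sketch of the dispatch issue Pavel points out, assuming the separate extended type were kept: every place that tests for the MSI type has to recognise both values. The enum constants mirror the routing type numbers quoted in the thread; `route_is_msi` and `choose_inject_path` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Routing type numbers as quoted in the patch; the names are local stand-ins. */
enum {
    ROUTE_IRQCHIP      = 1,
    ROUTE_MSI          = 2,
    ROUTE_S390_ADAPTER = 3,
    ROUTE_EXTENDED_MSI = 4,
};

enum inject_path {
    INJECT_DIRECT,   /* inject immediately, as the kvm_set_msi() branch does */
    INJECT_DEFERRED, /* defer to a work item, as the schedule_work() branch does */
};

/* Both MSI flavours must be treated alike wherever the type is tested. */
static bool route_is_msi(unsigned int type)
{
    return type == ROUTE_MSI || type == ROUTE_EXTENDED_MSI;
}

static enum inject_path choose_inject_path(unsigned int type)
{
    return route_is_msi(type) ? INJECT_DIRECT : INJECT_DEFERRED;
}
```

Under the flag-based proposal the second comparison disappears, since only one MSI routing type exists.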
RE: [PATCH 7/7] KVM: arm: implement kvm_set_msi by gsi direct mapping
Hello!

>> Given API change i suggest (using KVM_MSI_VALID_DEVID flag), we could get rid of all these
>> if()'s here. Just forward all parameters to vGIC implementation code and let it do its
>> checks.
> I don't understand this comment. Here this is the kernel struct that is used
> (struct kvm_kernel_irq_routing_entry) and not the user one (kvm_irq_routing_entry).
> The kernel struct does not have the flag field.

Easy. ARM code can always use struct kvm_extended_msi, and flags can go to this structure.

> Another reason I think to keep using the type for homogeneity.

Homogeneity is perfect IMHO.

If that would be simpler for you, i could post a patch for this which i made on top of your series. Sorry, i don't have time to respin the whole thing, busy with qemu GICv3 fight :)

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
RE: [PATCH 1/7] KVM: api: add kvm_irq_routing_extended_msi
Hello!

> OK thanks for sharing. I need some more time to study qemu code too.

I am currently working on supporting this in qemu. Not ready yet, need some time. But, with API i suggest, things are really much-much simpler.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
Re: [PATCH v3 0/2] arm/arm64: KVM: Optimize arm64 fp/simd, saves 30-50% on exits
On Thu, Jul 02, 2015 at 10:49:03AM -0700, Mario Smarduch wrote:
> On 07/01/2015 02:49 AM, Christoffer Dall wrote:
>> On Wed, Jun 24, 2015 at 05:04:10PM -0700, Mario Smarduch wrote:
>>> Currently we save/restore fp/simd on each exit. First patch optimizes arm64
>>> save/restore, we only do so on Guest access. hackbench and
>>> several lmbench tests show anywhere from 30% to above 50% optimization
>>> achieved.
>>>
>>> In second patch 32-bit handler is updated to keep exit handling consistent
>>> with 64-bit code.
>>
>> 30-50% of what? The overhead or overall performance?
>
> Yes, so considering all exits to Host KVM anywhere from 30 to 50%
> didn't require an fp/simd switch.
>
> Anything else you like to see added here?

No, I'm good with them. Marc is handling the tree these days so I'll leave it up to him if we want to adjust patch 1 or what to do.

Thanks!
-Christoffer
RE: [RFC 09/17] bypass: IRQ bypass manager proto by Alex
-Original Message- From: Eric Auger [mailto:eric.au...@linaro.org] Sent: Thursday, July 02, 2015 9:17 PM To: eric.au...@st.com; eric.au...@linaro.org; linux-arm-ker...@lists.infradead.org; kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org; christoffer.d...@linaro.org; marc.zyng...@arm.com; alex.william...@redhat.com; pbonz...@redhat.com; avi.kiv...@gmail.com; mtosa...@redhat.com; Wu, Feng; j...@8bytes.org; b.rey...@virtualopensystems.com Cc: linux-ker...@vger.kernel.org; patc...@linaro.org Subject: [RFC 09/17] bypass: IRQ bypass manager proto by Alex From: Alex Williamson alex.william...@redhat.com There are plenty of details to be filled in, but I think the basics looks something like the code below. The IRQ bypass manager just defines a pair of structures, one for interrupt producers and one for interrupt consumers. I'm certain that we'll need more callbacks than I've defined below, but figuring out what those should be for the best abstraction is the hardest part of this idea. The manager provides both registration and de-registration interfaces for both types of objects and keeps lists for each, protected by a lock. The manager doesn't even really need to know what the match token is, but I assume for our purposes it will be an eventfd_ctx. On the vfio side, the producer struct would be embedded in the vfio_pci_irq_ctx struct. KVM would probably embed the consumer struct in _irqfd. As I've coded below, the IRQ bypass manager calls the consumer callbacks, so the producer struct would need fields or callbacks to provide the consumer the info it needs. AIUI the Posted Interrupt model, VFIO only needs to provide data to the consumer. For IRQ Forwarding, I think the producer needs to be informed when bypass is active to model the incoming interrupt as edge vs level. 
I've prototyped the base IRQ bypass manager here as static, but I don't see any reason it couldn't be a module that's loaded by dependency when either vfio-pci or kvm-intel is loaded (or other producer/consumer objects). Is this a reasonable starting point to craft the additional fields and callbacks and interaction of who calls who that we need to support Posted Interrupts and IRQ Forwarding? Is the AMD version of this still alive? Thanks, Alex In fact, I also implement a RFC patch for this new framework. I am thinking, can we discuss all the requirements for irq forwarding and posted interrupts, and make it a separate patchset as a general layer? Then we can continue to push arch specific stuff, it is more clear and easy. Thanks, Feng --- arch/x86/kvm/Kconfig | 1 + drivers/vfio/pci/Kconfig | 1 + drivers/vfio/pci/vfio_pci_intrs.c | 6 ++ include/linux/irqbypass.h | 23 kernel/irq/Kconfig| 3 + kernel/irq/Makefile | 1 + kernel/irq/bypass.c | 116 ++ virt/kvm/eventfd.c| 4 ++ 8 files changed, 155 insertions(+) create mode 100644 include/linux/irqbypass.h create mode 100644 kernel/irq/bypass.c diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index d8a1d56..86d0d77 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -61,6 +61,7 @@ config KVM_INTEL depends on KVM # for perf_guest_get_msrs(): depends on CPU_SUP_INTEL + select IRQ_BYPASS_MANAGER ---help--- Provides support for KVM on Intel processors equipped with the VT extensions. diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig index 579d83b..02912f1 100644 --- a/drivers/vfio/pci/Kconfig +++ b/drivers/vfio/pci/Kconfig @@ -2,6 +2,7 @@ config VFIO_PCI tristate VFIO support for PCI devices depends on VFIO PCI EVENTFD select VFIO_VIRQFD + select IRQ_BYPASS_MANAGER help Support for the PCI VFIO bus driver. This is required to make use of PCI drivers using the VFIO framework. 
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index 1f577b4..4e053be 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -181,6 +181,7 @@ static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd) if (vdev-ctx[0].trigger) { free_irq(pdev-irq, vdev); + /* irq_bypass_unregister_producer(); */ kfree(vdev-ctx[0].name); eventfd_ctx_put(vdev-ctx[0].trigger); vdev-ctx[0].trigger = NULL; @@ -214,6 +215,8 @@ static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd) return ret; } + /* irq_bypass_register_producer(); */ + /* * INTx disable will stick across the new irq setup, * disable_irq won't. @@ -319,6 +322,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev, if
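A simplified, single-threaded sketch of the token-matching idea Alex describes: producers and consumers register independently, and the manager pairs the two sides whose opaque tokens are equal. The fixed-size arrays, the `peer` back-pointers and all names here are assumptions made for brevity; the prototype in the patch uses linked lists protected by a mutex and invokes callbacks instead of storing peers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTRIES 8

struct consumer;
struct producer { void *token; struct consumer *peer; };
struct consumer { void *token; struct producer *peer; };

struct manager {
    struct producer *producers[MAX_ENTRIES];
    struct consumer *consumers[MAX_ENTRIES];
    size_t nprod, ncons;
};

/* Register a producer and pair it with a consumer holding the same token. */
static int register_producer(struct manager *m, struct producer *p)
{
    size_t i;

    if (m->nprod == MAX_ENTRIES)
        return -1;
    for (i = 0; i < m->nprod; i++)
        if (m->producers[i]->token == p->token)
            return -1; /* one producer per token */
    p->peer = NULL;
    m->producers[m->nprod++] = p;
    for (i = 0; i < m->ncons; i++) {
        if (m->consumers[i]->token == p->token) {
            p->peer = m->consumers[i];
            m->consumers[i]->peer = p;
            break;
        }
    }
    return 0;
}

/* Register a consumer and pair it with a producer holding the same token. */
static int register_consumer(struct manager *m, struct consumer *c)
{
    size_t i;

    if (m->ncons == MAX_ENTRIES)
        return -1;
    for (i = 0; i < m->ncons; i++)
        if (m->consumers[i]->token == c->token)
            return -1; /* one consumer per token */
    c->peer = NULL;
    m->consumers[m->ncons++] = c;
    for (i = 0; i < m->nprod; i++) {
        if (m->producers[i]->token == c->token) {
            c->peer = m->producers[i];
            m->producers[i]->peer = c;
            break;
        }
    }
    return 0;
}

/* Drop a producer and break its pairing, if any. */
static void unregister_producer(struct manager *m, struct producer *p)
{
    size_t i;

    for (i = 0; i < m->nprod; i++) {
        if (m->producers[i] != p)
            continue;
        if (p->peer) {
            p->peer->peer = NULL;
            p->peer = NULL;
        }
        m->producers[i] = m->producers[--m->nprod];
        return;
    }
}
```

The manager compares tokens only as pointers and never dereferences them, which matches the remark that it does not need to know what the match token is.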
RE: [RFC 12/17] irq: bypass: Extend skeleton for ARM forwarding control
> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Thursday, July 02, 2015 9:41 PM
> To: Eric Auger; eric.au...@st.com; linux-arm-ker...@lists.infradead.org; kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org; christoffer.d...@linaro.org; marc.zyng...@arm.com; alex.william...@redhat.com; avi.kiv...@gmail.com; mtosa...@redhat.com; Wu, Feng; j...@8bytes.org; b.rey...@virtualopensystems.com
> Cc: linux-ker...@vger.kernel.org; patc...@linaro.org
> Subject: Re: [RFC 12/17] irq: bypass: Extend skeleton for ARM forwarding control
>
> On 02/07/2015 15:17, Eric Auger wrote:
>> - new fields are added on producer side: linux irq, vfio_device handle,
>>   active which reflects whether the source is active (at interrupt
>>   controller level or at VFIO level - automasked -) and finally an
>>   opaque pointer which will be used to point to the vfio_platform_device
>>   in this series.
>
> Linux IRQ and active should be okay. As to the vfio_device handle, you
> should link it from the vfio_platform_device instead. And for the
> vfio_platform_device, you can link it from the vfio_platform_irq instead.
>
> Once you've done this, embed the irq_bypass_producer struct in the
> vfio_platform_irq struct; in the new kvm_arch_* functions, go back to
> the vfio_platform_irq struct via container_of. From there you can
> retrieve pointers to the vfio_platform_device and the vfio_device.
>
>> - new fields on consumer side: the kvm handle, the gsi
>
> You do not need to add these. Instead, add the kvm handle to irqfd
> only. Like above, embed the irq_bypass_consumer struct in the irqfd
> struct; in the new kvm_arch_* functions, go back to the
> vfio_platform_irq struct via container_of.

I also need the gsi field here, for posted-interrupts, I need 'gsi', 'irq' to update the IRTE.

Thanks,
Feng

> Paolo
RE: [RFC 12/17] irq: bypass: Extend skeleton for ARM forwarding control
> -----Original Message-----
> From: Eric Auger [mailto:eric.au...@linaro.org]
> Sent: Thursday, July 02, 2015 9:17 PM
> To: eric.au...@st.com; eric.au...@linaro.org; linux-arm-ker...@lists.infradead.org; kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org; christoffer.d...@linaro.org; marc.zyng...@arm.com; alex.william...@redhat.com; pbonz...@redhat.com; avi.kiv...@gmail.com; mtosa...@redhat.com; Wu, Feng; j...@8bytes.org; b.rey...@virtualopensystems.com
> Cc: linux-ker...@vger.kernel.org; patc...@linaro.org
> Subject: [RFC 12/17] irq: bypass: Extend skeleton for ARM forwarding control
>
> - [add,del]_[consumer,producer] updated to take both the consumer and
>   producer handles. This is requested to combine info from both,
>   typically to link the source irq owned by the producer with the gsi
>   owned by the consumer (forwarded IRQ setup).
> - new functions are added: [stop,resume]_[consumer, producer]. Those are
>   needed for forwarding since the state change requires to intermingle
>   actions at consumer, producer.
> - On handshake, we now call connect, disconnect which features the more
>   complex sequence.
> - new fields are added on producer side: linux irq, vfio_device handle,
>   active which reflects whether the source is active (at interrupt
>   controller level or at VFIO level - automasked -) and finally an
>   opaque pointer which will be used to point to the vfio_platform_device
>   in this series.
> - new fields on consumer side: the kvm handle, the gsi
>
> Integration of posted interrupt series will help to refine those choices

On PI side, I need another field as below,

struct irq_bypass_consumer {
        struct list_head node;
        void *token;
+       unsigned irq;    /*got from producer when registered*/
        void (*add_producer)(struct irq_bypass_producer *,
                             struct irq_bypass_consumer *);
        void (*del_producer)(struct irq_bypass_producer *,
                             struct irq_bypass_consumer *);
+       void (*update)(struct irq_bypass_consumer *);
};

'update' is used to update the IRTE, while irq is initialized when registered, which is used to find the right IRTE.

Thanks, Feng Signed-off-by: Eric Auger eric.au...@linaro.org --- - connect/disconnect could become a cb too. For forwarding it may make sense to have failure at connection: this would happen when the physical IRQ is either active at irqchip level or VFIO masked. This means some of the cb should return an error and this error management could be prod/cons specific. Where to attach the connect/disconnect cb: to the cons or prod, to both? - Hence may be sensible to do the list_add only if connect returns 0 - disconnect would not be allowed to fail. --- include/linux/irqbypass.h | 26 ++--- kernel/irq/bypass.c | 48 +++ 2 files changed, 67 insertions(+), 7 deletions(-) diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h index 718508e..591ae3f 100644 --- a/include/linux/irqbypass.h +++ b/include/linux/irqbypass.h @@ -3,17 +3,37 @@ #include linux/list.h +struct vfio_device; +struct irq_bypass_consumer; +struct kvm; + struct irq_bypass_producer { struct list_head node; void *token; - /* TBD */ + unsigned int irq; /* host physical irq */ + struct vfio_device *vdev; /* vfio device that requested irq */ + /* is irq active at irqchip or VFIO masked? 
*/ + bool active; + void *opaque; + void (*stop_producer)(struct irq_bypass_producer *); + void (*resume_producer)(struct irq_bypass_producer *); + void (*add_consumer)(struct irq_bypass_producer *, + struct irq_bypass_consumer *); + void (*del_consumer)(struct irq_bypass_producer *, + struct irq_bypass_consumer *); }; struct irq_bypass_consumer { struct list_head node; void *token; - void (*add_producer)(struct irq_bypass_producer *); - void (*del_producer)(struct irq_bypass_producer *); + unsigned int gsi; /* the guest gsi */ + struct kvm *kvm; + void (*stop_consumer)(struct irq_bypass_consumer *); + void (*resume_consumer)(struct irq_bypass_consumer *); + void (*add_producer)(struct irq_bypass_consumer *, + struct irq_bypass_producer *); + void (*del_producer)(struct irq_bypass_consumer *, + struct irq_bypass_producer *); }; int irq_bypass_register_producer(struct irq_bypass_producer *); diff --git a/kernel/irq/bypass.c b/kernel/irq/bypass.c index 5d0f92b..fb31fef 100644 --- a/kernel/irq/bypass.c +++ b/kernel/irq/bypass.c @@ -19,6 +19,46 @@ static LIST_HEAD(producers); static LIST_HEAD(consumers); static DEFINE_MUTEX(lock); +/* lock must be hold when calling connect */ +static void connect(struct irq_bypass_producer *prod, + struct
Re: [PATCH 7/7] KVM: arm: implement kvm_set_msi by gsi direct mapping
Hi Andre,

On 07/02/2015 07:10 PM, Andre Przywara wrote:
> Hi Eric,
>
> On 29/06/15 16:37, Eric Auger wrote:
>> If the ITS modality is not available, let's simply support MSI
>> injection by transforming the MSI.data into an SPI ID.
>>
>> This becomes possible to use KVM_SIGNAL_MSI ioctl for arm too.
>>
>> Signed-off-by: Eric Auger eric.au...@linaro.org
>> ---
>>  arch/arm/kvm/Kconfig | 1 +
>>  virt/kvm/arm/vgic.c  | 5 +
>>  2 files changed, 6 insertions(+)
>>
>> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
>> index 151e710..0f58baf 100644
>> --- a/arch/arm/kvm/Kconfig
>> +++ b/arch/arm/kvm/Kconfig
>> @@ -31,6 +31,7 @@ config KVM
>>      select KVM_VFIO
>>      select HAVE_KVM_EVENTFD
>>      select HAVE_KVM_IRQFD
>> +    select HAVE_KVM_MSI
>>      select HAVE_KVM_IRQCHIP
>>      select HAVE_KVM_IRQ_ROUTING
>>      depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index 0b4c48c..b3c10dc 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -2314,6 +2314,11 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
>>              return kvm->arch.vgic.vm_ops.inject_msi(kvm, msi);
>>          else
>>              return -ENODEV;
>> +    case KVM_IRQ_ROUTING_MSI:
>> +        if (kvm->arch.vgic.vm_ops.inject_msi)
>> +            return -EINVAL;
>> +        else
>> +            return kvm_vgic_inject_irq(kvm, 0, e->msi.data, level);
>
> If you add:
>
> static int vgic_v2m_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
> {
>     return kvm_vgic_inject_irq(kvm, 0, msi->data, 1);
> }
>
> to vgic-v2-emul.c and wire it up accordingly, you can simplify the above
> kvm_set_msi, getting rid of all those extra case handling. This also helps
> merging KVM_IRQ_ROUTING_MSI and the extended case. I have hacked this
> up and it seems to work for me.

OK thanks

I will respin either today or on monday.

Best Regards

Eric

> Cheers,
> Andre.
RE: [RFC 12/17] irq: bypass: Extend skeleton for ARM forwarding control
> -----Original Message-----
> From: Wu, Feng
> Sent: Friday, July 03, 2015 10:20 AM
> To: Paolo Bonzini; Eric Auger; eric.au...@st.com; linux-arm-ker...@lists.infradead.org; kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org; christoffer.d...@linaro.org; marc.zyng...@arm.com; alex.william...@redhat.com; avi.kiv...@gmail.com; mtosa...@redhat.com; j...@8bytes.org; b.rey...@virtualopensystems.com
> Cc: linux-ker...@vger.kernel.org; patc...@linaro.org; Wu, Feng
> Subject: RE: [RFC 12/17] irq: bypass: Extend skeleton for ARM forwarding control
>
>> -----Original Message-----
>> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
>> Sent: Thursday, July 02, 2015 9:41 PM
>> To: Eric Auger; eric.au...@st.com; linux-arm-ker...@lists.infradead.org; kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org; christoffer.d...@linaro.org; marc.zyng...@arm.com; alex.william...@redhat.com; avi.kiv...@gmail.com; mtosa...@redhat.com; Wu, Feng; j...@8bytes.org; b.rey...@virtualopensystems.com
>> Cc: linux-ker...@vger.kernel.org; patc...@linaro.org
>> Subject: Re: [RFC 12/17] irq: bypass: Extend skeleton for ARM forwarding control
>>
>> On 02/07/2015 15:17, Eric Auger wrote:
>>> - new fields are added on producer side: linux irq, vfio_device handle,
>>>   active which reflects whether the source is active (at interrupt
>>>   controller level or at VFIO level - automasked -) and finally an
>>>   opaque pointer which will be used to point to the vfio_platform_device
>>>   in this series.
>>
>> Linux IRQ and active should be okay. As to the vfio_device handle, you
>> should link it from the vfio_platform_device instead. And for the
>> vfio_platform_device, you can link it from the vfio_platform_irq instead.
>>
>> Once you've done this, embed the irq_bypass_producer struct in the
>> vfio_platform_irq struct; in the new kvm_arch_* functions, go back to
>> the vfio_platform_irq struct via container_of. From there you can
>> retrieve pointers to the vfio_platform_device and the vfio_device.
>>
>>> - new fields on consumer side: the kvm handle, the gsi
>>
>> You do not need to add these. Instead, add the kvm handle to irqfd
>> only. Like above, embed the irq_bypass_consumer struct in the irqfd
>> struct; in the new kvm_arch_* functions, go back to the
>> vfio_platform_irq struct via container_of.
>
> I also need the gsi field here, for posted-interrupts, I need 'gsi', 'irq' to update
> the IRTE.

Oh... we can get gsi from irq_bypass_consumer -> _irqfd -> gsi, so it is not needed in irq_bypass_consumer. Got it! :)

Thanks,
Feng

> Thanks,
> Feng
>
>> Paolo
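A minimal sketch of the embedding pattern Paolo recommends and Feng then applies to the gsi: the generic consumer struct is a member of the owner's struct, and the owner is recovered with `container_of`. The `container_of` macro is redefined here in a simplified userspace form, and `bypass_consumer`, `irqfd_like` and `consumer_gsi` are hypothetical stand-ins for the kernel's types.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic part seen by the bypass manager; it carries no gsi or kvm handle. */
struct bypass_consumer {
    void *token;
};

/* Owner struct, standing in for the irqfd; it embeds the consumer. */
struct irqfd_like {
    int gsi;
    void *kvm;
    struct bypass_consumer consumer;
};

/* A callback that only receives the consumer can still reach the gsi. */
static int consumer_gsi(struct bypass_consumer *cons)
{
    struct irqfd_like *irqfd = container_of(cons, struct irqfd_like, consumer);

    return irqfd->gsi;
}
```

This keeps owner-specific fields out of the shared struct, which is why the gsi and kvm handle do not need to be added to the consumer itself.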
Re: [RFC 09/17] bypass: IRQ bypass manager proto by Alex
Hi Feng, On 07/03/2015 04:16 AM, Wu, Feng wrote: -Original Message- From: Eric Auger [mailto:eric.au...@linaro.org] Sent: Thursday, July 02, 2015 9:17 PM To: eric.au...@st.com; eric.au...@linaro.org; linux-arm-ker...@lists.infradead.org; kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org; christoffer.d...@linaro.org; marc.zyng...@arm.com; alex.william...@redhat.com; pbonz...@redhat.com; avi.kiv...@gmail.com; mtosa...@redhat.com; Wu, Feng; j...@8bytes.org; b.rey...@virtualopensystems.com Cc: linux-ker...@vger.kernel.org; patc...@linaro.org Subject: [RFC 09/17] bypass: IRQ bypass manager proto by Alex From: Alex Williamson alex.william...@redhat.com There are plenty of details to be filled in, but I think the basics looks something like the code below. The IRQ bypass manager just defines a pair of structures, one for interrupt producers and one for interrupt consumers. I'm certain that we'll need more callbacks than I've defined below, but figuring out what those should be for the best abstraction is the hardest part of this idea. The manager provides both registration and de-registration interfaces for both types of objects and keeps lists for each, protected by a lock. The manager doesn't even really need to know what the match token is, but I assume for our purposes it will be an eventfd_ctx. On the vfio side, the producer struct would be embedded in the vfio_pci_irq_ctx struct. KVM would probably embed the consumer struct in _irqfd. As I've coded below, the IRQ bypass manager calls the consumer callbacks, so the producer struct would need fields or callbacks to provide the consumer the info it needs. AIUI the Posted Interrupt model, VFIO only needs to provide data to the consumer. For IRQ Forwarding, I think the producer needs to be informed when bypass is active to model the incoming interrupt as edge vs level. 
I've prototyped the base IRQ bypass manager here as static, but I don't see any reason it couldn't be a module that's loaded by dependency when either vfio-pci or kvm-intel is loaded (or other producer/consumer objects). Is this a reasonable starting point to craft the additional fields and callbacks and interaction of who calls who that we need to support Posted Interrupts and IRQ Forwarding? Is the AMD version of this still alive? Thanks, Alex In fact, I also implement a RFC patch for this new framework. I am thinking, can we discuss all the requirements for irq forwarding and posted interrupts, and make it a separate patchset as a general layer? Then we can continue to push arch specific stuff, it is more clear and easy. Sure. I intend to respin today according to Paolo's directives and I will put common patches in a separate series. Let's see next week with Alex how he prefers things to be handled. Best Regards Eric Thanks, Feng --- arch/x86/kvm/Kconfig | 1 + drivers/vfio/pci/Kconfig | 1 + drivers/vfio/pci/vfio_pci_intrs.c | 6 ++ include/linux/irqbypass.h | 23 kernel/irq/Kconfig| 3 + kernel/irq/Makefile | 1 + kernel/irq/bypass.c | 116 ++ virt/kvm/eventfd.c| 4 ++ 8 files changed, 155 insertions(+) create mode 100644 include/linux/irqbypass.h create mode 100644 kernel/irq/bypass.c diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index d8a1d56..86d0d77 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -61,6 +61,7 @@ config KVM_INTEL depends on KVM # for perf_guest_get_msrs(): depends on CPU_SUP_INTEL +select IRQ_BYPASS_MANAGER ---help--- Provides support for KVM on Intel processors equipped with the VT extensions. 
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig index 579d83b..02912f1 100644 --- a/drivers/vfio/pci/Kconfig +++ b/drivers/vfio/pci/Kconfig @@ -2,6 +2,7 @@ config VFIO_PCI tristate VFIO support for PCI devices depends on VFIO PCI EVENTFD select VFIO_VIRQFD +select IRQ_BYPASS_MANAGER help Support for the PCI VFIO bus driver. This is required to make use of PCI drivers using the VFIO framework. diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index 1f577b4..4e053be 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -181,6 +181,7 @@ static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd) if (vdev-ctx[0].trigger) { free_irq(pdev-irq, vdev); +/* irq_bypass_unregister_producer(); */ kfree(vdev-ctx[0].name); eventfd_ctx_put(vdev-ctx[0].trigger); vdev-ctx[0].trigger = NULL; @@ -214,6 +215,8 @@ static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd) return ret; } +/*
Re: [PATCH 7/7] KVM: arm: implement kvm_set_msi by gsi direct mapping
Hi Eric,

On 29/06/15 16:37, Eric Auger wrote:
> If the ITS modality is not available, let's simply support MSI
> injection by transforming the MSI.data into an SPI ID.
>
> This becomes possible to use KVM_SIGNAL_MSI ioctl for arm too.
>
> Signed-off-by: Eric Auger eric.au...@linaro.org
> ---
>  arch/arm/kvm/Kconfig | 1 +
>  virt/kvm/arm/vgic.c  | 5 +
>  2 files changed, 6 insertions(+)
>
> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> index 151e710..0f58baf 100644
> --- a/arch/arm/kvm/Kconfig
> +++ b/arch/arm/kvm/Kconfig
> @@ -31,6 +31,7 @@ config KVM
>      select KVM_VFIO
>      select HAVE_KVM_EVENTFD
>      select HAVE_KVM_IRQFD
> +    select HAVE_KVM_MSI
>      select HAVE_KVM_IRQCHIP
>      select HAVE_KVM_IRQ_ROUTING
>      depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 0b4c48c..b3c10dc 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -2314,6 +2314,11 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
>              return kvm->arch.vgic.vm_ops.inject_msi(kvm, msi);
>          else
>              return -ENODEV;
> +    case KVM_IRQ_ROUTING_MSI:
> +        if (kvm->arch.vgic.vm_ops.inject_msi)
> +            return -EINVAL;
> +        else
> +            return kvm_vgic_inject_irq(kvm, 0, e->msi.data, level);

If you add:

static int vgic_v2m_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
{
    return kvm_vgic_inject_irq(kvm, 0, msi->data, 1);
}

to vgic-v2-emul.c and wire it up accordingly, you can simplify the above kvm_set_msi, getting rid of all those extra case handling. This also helps merging KVM_IRQ_ROUTING_MSI and the extended case. I have hacked this up and it seems to work for me.

Cheers,
Andre.
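A sketch of the simplification Andre suggests, assuming each interrupt controller model installs its own `inject_msi` callback so the common entry point needs no per-type branches. The structs `vm`, `vm_ops` and `msi_msg` and the functions `v2m_inject_msi` and `set_msi` are hypothetical models of the idea, not the vgic code; the injection itself is reduced to recording the SPI number.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct msi_msg {
    uint32_t data;
    uint32_t devid; /* ignored by the v2m-style model below */
};

struct vm;

struct vm_ops {
    int (*inject_msi)(struct vm *vm, const struct msi_msg *msi);
};

struct vm {
    struct vm_ops ops;
    uint32_t last_spi; /* records the last injected SPI, for the example */
};

/* v2m-style model: the MSI data word is used directly as the SPI number. */
static int v2m_inject_msi(struct vm *vm, const struct msi_msg *msi)
{
    vm->last_spi = msi->data;
    return 0;
}

/* Common entry point: one indirect call, no switch on the routing type. */
static int set_msi(struct vm *vm, const struct msi_msg *msi)
{
    if (!vm->ops.inject_msi)
        return -ENODEV;
    return vm->ops.inject_msi(vm, msi);
}
```

A model that understands device IDs would install a different callback and read `devid`, which is how this arrangement helps merge the plain and extended MSI cases.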
Re: [PATCH v3 0/2] arm/arm64: KVM: Optimize arm64 fp/simd, saves 30-50% on exits
On 07/01/2015 02:49 AM, Christoffer Dall wrote:
> On Wed, Jun 24, 2015 at 05:04:10PM -0700, Mario Smarduch wrote:
>> Currently we save/restore fp/simd on each exit. First patch optimizes arm64
>> save/restore, we only do so on Guest access. hackbench and
>> several lmbench tests show anywhere from 30% to above 50% optimization
>> achieved.
>>
>> In second patch 32-bit handler is updated to keep exit handling consistent
>> with 64-bit code.
>
> 30-50% of what? The overhead or overall performance?

So, considering all exits to host KVM, anywhere from 30 to 50% of them
didn't require an fp/simd switch.

Anything else you'd like to see added here?

>> Changes since v1:
>> - Addressed Marc's comments
>> - Verified optimization improvements with lmbench and hackbench, updated
>>   commit message
>>
>> Changes since v2:
>> - only for patch 2/2
>> - Reworked trapping to vfp access handler
>>
>> Changes since v3:
>> - Only for patch 2/2
>> - Removed load_vcpu in switch_to_guest_vfp per Marc's comment
>> - Got another chance to replace an unreferenced label with a comment
>>
>> Mario Smarduch (2):
>>   Optimize arm64 skip 30-50% vfp/simd save/restore on exits
>>   keep arm vfp/simd exit handling consistent with arm64
>>
>>  arch/arm/kvm/interrupts.S        | 14 +++-
>>  arch/arm64/include/asm/kvm_arm.h |  5 -
>>  arch/arm64/kvm/hyp.S             | 46 +++---
>>  3 files changed, 55 insertions(+), 10 deletions(-)
>>
>> --
>> 1.7.9.5
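For readers following along, the lazy scheme discussed here can be modelled in a few lines of plain C. This is only a sketch under stated assumptions: the struct and function names are invented for illustration, and the real patches do this in hyp assembly with trap control bits rather than booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one vcpu; none of these names come from the patches. */
struct lazy_vcpu {
	bool fp_trap_enabled;		/* guest fp/simd accesses trap to the host */
	bool guest_fp_loaded;		/* guest fp/simd state is live in hardware */
	unsigned int fp_switches;	/* exits that needed a full save/restore */
};

/* On guest entry, arm the trap instead of switching registers eagerly. */
static void lazy_enter_guest(struct lazy_vcpu *v)
{
	assert(v != NULL);
	v->fp_trap_enabled = true;
	v->guest_fp_loaded = false;
}

/* First guest fp/simd access: host state is saved, guest state loaded. */
static void lazy_fp_trap(struct lazy_vcpu *v)
{
	assert(v != NULL);
	v->fp_trap_enabled = false;
	v->guest_fp_loaded = true;
}

/* On exit, pay for the switch back only if the guest touched fp/simd. */
static void lazy_exit_guest(struct lazy_vcpu *v)
{
	assert(v != NULL);
	if (v->guest_fp_loaded) {
		v->fp_switches++;
		v->guest_fp_loaded = false;
	}
	v->fp_trap_enabled = false;
}
```

In this model, an exit where the guest never trapped leaves fp_switches untouched, which corresponds to the exits that skip the fp/simd switch in the numbers quoted above.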
Re: [Qemu-devel] [PATCH 00/16] implement vNVDIMM
On 02/07/2015 20:01, Xiao Guangrong wrote:
> Thanks for your review, Stefan and Paolo!
>
> On 07/02/2015 05:52 PM, Paolo Bonzini wrote:
>> On 02/07/2015 11:20, Stefan Hajnoczi wrote:
>>>> Currently, the NVDIMM driver has been merged into upstream Linux Kernel and
>>>> this patchset tries to enable it in virtualization field
>>>
>>> From a device model perspective, have you checked whether it makes sense
>>> to integrate nvdimms into the pc-dimm and hostmem code that is used for
>>> memory hotplug and NUMA?
>>>
>>> The NVDIMM device in your patches is a completely new TYPE_DEVICE so it
>>> doesn't share any interfaces or code with existing memory devices.
>>> Maybe that is the right solution here because NVDIMMs have different
>>> characteristics, but I'm not sure.
>>
>> The hostmem code should definitely be shared, e.g. by adding a new
>> file property to the memory-backend-file class. ivshmem can also use
>> it---CCing Marc-André.
>
> However, file-based memory used by NVDIMM is special: it divides the file
> into two parts, one part is used as PMEM and another part is used to store
> NVDIMM's configuration data.
>
> Maybe we can introduce end-reserved property to reserve specified size at
> the end of the file. Or create a new class type based on
> memory-backend-file (named nvdimm-backend-file) class to hide this magic
> thing?

I need to read the code then. :)

Paolo
Re: [Qemu-devel] [PATCH 00/16] implement vNVDIMM
Thanks for your review, Stefan and Paolo!

On 07/02/2015 05:52 PM, Paolo Bonzini wrote:
> On 02/07/2015 11:20, Stefan Hajnoczi wrote:
>>> Currently, the NVDIMM driver has been merged into upstream Linux Kernel and
>>> this patchset tries to enable it in virtualization field
>>
>> From a device model perspective, have you checked whether it makes sense
>> to integrate nvdimms into the pc-dimm and hostmem code that is used for
>> memory hotplug and NUMA?
>>
>> The NVDIMM device in your patches is a completely new TYPE_DEVICE so it
>> doesn't share any interfaces or code with existing memory devices.
>> Maybe that is the right solution here because NVDIMMs have different
>> characteristics, but I'm not sure.
>
> The hostmem code should definitely be shared, e.g. by adding a new
> file property to the memory-backend-file class. ivshmem can also use
> it---CCing Marc-André.

However, file-based memory used by NVDIMM is special: it divides the file
into two parts, one used as PMEM and the other used to store the NVDIMM's
configuration data.

Maybe we can introduce an end-reserved property to reserve a specified size
at the end of the file. Or create a new class type based on
memory-backend-file (named nvdimm-backend-file) to hide this magic?

> I don't know about the pc-dimm devices. If the NVDIMM devices can do
> _OST and can be hotplugged, then the answer is probably yes.

_OST is not needed for NVDIMM. NVDIMM is completely different from a DIMM
memory device in ACPI: it has a different HID, method objects, memory range
detection, device organization, etc. So I prefer introducing a new device
class for NVDIMM.

For hotplug, NVDIMM and DIMM can share some logic, e.g. free-address-range
management and slot management (though the new object initialization in
ACPI is completely different); we can abstract these operations as a common
part.

NUMA detection also differs between NVDIMM and DIMM: NVDIMM needs to report
its NUMA affinity in the SPA table. But I think they can share some common
functions.

BTW, I am going to implement vNVDIMM hotplug once the Linux NVDIMM driver
supports it.
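The end-reserved idea floated in this message boils down to simple offset arithmetic on the backing file. The sketch below only illustrates that arithmetic; the struct, the function name, and the choice to put PMEM at offset 0 with the configuration area in the tail are assumptions for the example, not something the patchset defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of how one backing file would be carved up. */
struct nvdimm_layout {
	uint64_t pmem_offset;	/* start of the guest-visible PMEM region */
	uint64_t pmem_size;
	uint64_t config_offset;	/* start of the NVDIMM configuration data */
	uint64_t config_size;
};

/*
 * Split a file of file_size bytes, reserving end_reserved bytes at the
 * tail for configuration data. Returns false when the reserved tail
 * would leave no room for any PMEM.
 */
static bool split_backing_file(uint64_t file_size, uint64_t end_reserved,
			       struct nvdimm_layout *out)
{
	assert(out != NULL);
	if (end_reserved >= file_size)
		return false;

	out->pmem_offset = 0;
	out->pmem_size = file_size - end_reserved;
	out->config_offset = out->pmem_size;
	out->config_size = end_reserved;
	return true;
}
```

Whether this lives in a property on memory-backend-file or in a dedicated backend class is exactly the open question in the thread; the arithmetic is the same either way.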
Re: [Qemu-devel] [PATCH 14/16] nvdimm: support NFIT_CMD_GET_CONFIG_SIZE function
On 07/02/2015 05:23 PM, Stefan Hajnoczi wrote:
> On Wed, Jul 01, 2015 at 10:50:30PM +0800, Xiao Guangrong wrote:
>> +static uint32_t dsm_cmd_config_size(struct dsm_buffer *in, struct dsm_out *out)
>> +{
>> +    GSList *list = get_nvdimm_built_list();
>> +    PCNVDIMMDevice *nvdimm = get_nvdimm_device_by_handle(list, in->handle);
>> +    uint32_t status = NFIT_STATUS_NON_EXISTING_MEM_DEV;
>> +
>> +    if (!nvdimm) {
>> +        goto exit;
>> +    }
>> +
>> +    status = NFIT_STATUS_SUCCESS;
>> +    out->cmd_config_size.config_size = nvdimm->config_data_size;
>> +    out->cmd_config_size.max_xfer = max_xfer_config_size();
>
> cpu_to_*() missing? It should be possible to emulate NVDIMMs for a
> x86_64 guest on a big-endian host, for example.

Indeed, will fix it in the next version, thank you for pointing it out.
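As an illustration of the point about byte order, the sketch below writes two 32-bit fields in little-endian order regardless of the host's endianness. It is a standalone approximation of what a cpu_to_le32()-style conversion achieves; store_le32, fill_config_size_reply and the 8-byte reply layout are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in little-endian byte order on any host. */
static void store_le32(uint8_t *dst, uint32_t value)
{
	assert(dst != NULL);
	dst[0] = (uint8_t)(value & 0xff);
	dst[1] = (uint8_t)((value >> 8) & 0xff);
	dst[2] = (uint8_t)((value >> 16) & 0xff);
	dst[3] = (uint8_t)((value >> 24) & 0xff);
}

/* Hypothetical reply layout: two consecutive little-endian 32-bit fields. */
static void fill_config_size_reply(uint8_t out[8], uint32_t config_size,
				   uint32_t max_xfer)
{
	store_le32(out, config_size);
	store_le32(out + 4, max_xfer);
}
```

Because the bytes are produced by shifts rather than by copying the in-memory representation, the guest sees the same buffer whether the emulator runs on a little-endian or a big-endian host.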
[PATCH 3/12] kvm: add hyper-v crash msrs values
From: Andrey Smetanin asmeta...@virtuozzo.com

Added Hyper-V crash msrs values - HV_X64_MSR_CRASH*.

Signed-off-by: Andrey Smetanin asmeta...@virtuozzo.com
Signed-off-by: Denis V. Lunev d...@openvz.org
Reviewed-by: Peter Hornyack peterhorny...@google.com
CC: Paolo Bonzini pbonz...@redhat.com
CC: Gleb Natapov g...@kernel.org
---
 arch/x86/include/uapi/asm/hyperv.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index ce6068d..8fba544 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -199,6 +199,17 @@
 #define HV_X64_MSR_STIMER3_CONFIG		0x400000B6
 #define HV_X64_MSR_STIMER3_COUNT		0x400000B7
 
+/* Hyper-V guest crash notification MSR's */
+#define HV_X64_MSR_CRASH_P0			0x40000100
+#define HV_X64_MSR_CRASH_P1			0x40000101
+#define HV_X64_MSR_CRASH_P2			0x40000102
+#define HV_X64_MSR_CRASH_P3			0x40000103
+#define HV_X64_MSR_CRASH_P4			0x40000104
+#define HV_X64_MSR_CRASH_CTL			0x40000105
+#define HV_X64_MSR_CRASH_CTL_NOTIFY		(1ULL << 63)
+#define HV_X64_MSR_CRASH_PARAMS		\
+		(1 + (HV_X64_MSR_CRASH_P4 - HV_X64_MSR_CRASH_P0))
+
 #define HV_X64_MSR_HYPERCALL_ENABLE		0x00000001
 #define HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT	12
 #define HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_MASK	\
-- 
2.1.4
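A small standalone sketch may help show how these constants fit together: the five parameter MSRs are contiguous, so HV_X64_MSR_CRASH_PARAMS evaluates to 5 and an MSR number maps to an array slot by subtraction. The MSR numbers below are the ones used in the upstream Linux header (0x40000100 through 0x40000105); crash_param_index is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define HV_X64_MSR_CRASH_P0		0x40000100
#define HV_X64_MSR_CRASH_P4		0x40000104
#define HV_X64_MSR_CRASH_CTL		0x40000105
#define HV_X64_MSR_CRASH_CTL_NOTIFY	(1ULL << 63)
#define HV_X64_MSR_CRASH_PARAMS \
	(1 + (HV_X64_MSR_CRASH_P4 - HV_X64_MSR_CRASH_P0))

/* Hypothetical helper: map a crash parameter MSR to its array slot, or -1. */
static int crash_param_index(uint32_t msr)
{
	int index;

	if (msr < HV_X64_MSR_CRASH_P0 || msr > HV_X64_MSR_CRASH_P4)
		return -1;

	index = (int)(msr - HV_X64_MSR_CRASH_P0);
	assert(index < HV_X64_MSR_CRASH_PARAMS);
	return index;
}
```

The control MSR sits right after P4 but is deliberately outside the parameter range, so it gets its own handling instead of an array slot.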
[PATCH 1/12] kvm/x86: move Hyper-V MSR's/hypercall code into hyperv.c file
From: Andrey Smetanin asmeta...@virtuozzo.com This patch introduce Hyper-V related source code file - hyperv.c and per vm and per vcpu hyperv context structures. All Hyper-V MSR's and hypercall code moved into hyperv.c. All Hyper-V kvm/vcpu fields moved into appropriate hyperv context structures. Copyrights and authors information copied from x86.c to hyperv.c. Signed-off-by: Andrey Smetanin asmeta...@virtuozzo.com Signed-off-by: Denis V. Lunev d...@openvz.org Reviewed-by: Peter Hornyack peterhorny...@google.com CC: Paolo Bonzini pbonz...@redhat.com CC: Gleb Natapov g...@kernel.org --- arch/x86/include/asm/kvm_host.h | 20 ++- arch/x86/kvm/Makefile | 4 +- arch/x86/kvm/hyperv.c | 307 arch/x86/kvm/hyperv.h | 32 + arch/x86/kvm/lapic.h| 2 +- arch/x86/kvm/x86.c | 265 +- arch/x86/kvm/x86.h | 5 + 7 files changed, 366 insertions(+), 269 deletions(-) create mode 100644 arch/x86/kvm/hyperv.c create mode 100644 arch/x86/kvm/hyperv.h diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c7fa57b..78616aa 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -358,6 +358,11 @@ struct kvm_mtrr { struct list_head head; }; +/* Hyper-V per vcpu emulation context */ +struct kvm_vcpu_hv { + u64 hv_vapic; +}; + struct kvm_vcpu_arch { /* * rip and regs accesses must go through @@ -514,8 +519,7 @@ struct kvm_vcpu_arch { /* used for guest single stepping over the given code position */ unsigned long singlestep_rip; - /* fields used by HYPER-V emulation */ - u64 hv_vapic; + struct kvm_vcpu_hv hyperv; cpumask_var_t wbinvd_dirty_mask; @@ -586,6 +590,13 @@ struct kvm_apic_map { struct kvm_lapic *logical_map[16][16]; }; +/* Hyper-V emulation context */ +struct kvm_hv { + u64 hv_guest_os_id; + u64 hv_hypercall; + u64 hv_tsc_page; +}; + struct kvm_arch { unsigned int n_used_mmu_pages; unsigned int n_requested_mmu_pages; @@ -643,10 +654,7 @@ struct kvm_arch { /* reads protected by irq_srcu, writes by irq_lock */ struct hlist_head 
mask_notifier_list; - /* fields used by HYPER-V emulation */ - u64 hv_guest_os_id; - u64 hv_hypercall; - u64 hv_tsc_page; + struct kvm_hv hyperv; #ifdef CONFIG_KVM_MMU_AUDIT int audit_point; diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index 67d215c..a1ff508 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -12,7 +12,9 @@ kvm-y += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \ kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \ - i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o + i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \ + hyperv.o + kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)+= assigned-dev.o iommu.o kvm-intel-y+= vmx.o pmu_intel.o kvm-amd-y += svm.o pmu_amd.o diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c new file mode 100644 index 000..2b49f10 --- /dev/null +++ b/arch/x86/kvm/hyperv.c @@ -0,0 +1,307 @@ +/* + * KVM Microsoft Hyper-V emulation + * + * derived from arch/x86/kvm/x86.c + * + * Copyright (C) 2006 Qumranet, Inc. + * Copyright (C) 2008 Qumranet, Inc. + * Copyright IBM Corporation, 2008 + * Copyright 2010 Red Hat, Inc. and/or its affiliates. + * Copyright (C) 2015 Andrey Smetanin asmeta...@virtuozzo.com + * + * Authors: + * Avi Kivity a...@qumranet.com + * Yaniv Kamay ya...@qumranet.com + * Amit Shahamit.s...@qumranet.com + * Ben-Ami Yassour ben...@il.ibm.com + * Andrey Smetanin asmeta...@virtuozzo.com + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. 
+ * + */ + +#include x86.h +#include lapic.h +#include hyperv.h + +#include linux/kvm_host.h +#include trace/events/kvm.h + +#include trace.h + +static bool kvm_hv_msr_partition_wide(u32 msr) +{ + bool r = false; + + switch (msr) { + case HV_X64_MSR_GUEST_OS_ID: + case HV_X64_MSR_HYPERCALL: + case HV_X64_MSR_REFERENCE_TSC: + case HV_X64_MSR_TIME_REF_COUNT: + r = true; + break; + } + + return r; +} + +static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data) +{ + struct kvm *kvm = vcpu-kvm; + struct kvm_hv *hv = kvm-arch.hyperv; + + switch (msr) { + case HV_X64_MSR_GUEST_OS_ID: + hv-hv_guest_os_id = data; + /* setting guest os id to zero disables hypercall page */ + if (!hv-hv_guest_os_id) + hv-hv_hypercall =
[PATCH 5/12] kvm: added KVM_REQ_HV_CRASH value to notify qemu about hyper-v crash
From: Andrey Smetanin asmeta...@virtuozzo.com

Added KVM_REQ_HV_CRASH, a vcpu request used to notify user space (QEMU)
about a Hyper-V crash.

Signed-off-by: Andrey Smetanin asmeta...@virtuozzo.com
Signed-off-by: Denis V. Lunev d...@openvz.org
Reviewed-by: Peter Hornyack peterhorny...@google.com
CC: Paolo Bonzini pbonz...@redhat.com
CC: Gleb Natapov g...@kernel.org
---
 include/linux/kvm_host.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2b2edf1..a377e00 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -139,6 +139,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_DISABLE_IBS       24
 #define KVM_REQ_APIC_PAGE_RELOAD  25
 #define KVM_REQ_SMI               26
+#define KVM_REQ_HV_CRASH          27
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID		0
 #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID	1
-- 
2.1.4
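For context, a vcpu request is just a numbered bit in a per-vcpu bitmap: one side raises it, and the vcpu loop tests and clears it before entering the guest. The sketch below models that with plain, non-atomic operations; the kernel uses atomic bit helpers, and the req_vcpu struct and function names here are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REQ_HV_CRASH 27	/* same bit number as KVM_REQ_HV_CRASH in the patch */

/* Hypothetical vcpu holding only the pending-request bitmap. */
struct req_vcpu {
	unsigned long requests;
};

/* Raise a request; non-atomic here, unlike the kernel's version. */
static void make_request(int req, struct req_vcpu *vcpu)
{
	assert(vcpu != NULL && req >= 0 && req < 32);
	vcpu->requests |= 1UL << req;
}

/* Test-and-clear: returns true at most once per raised request. */
static bool check_request(int req, struct req_vcpu *vcpu)
{
	unsigned long bit;

	assert(vcpu != NULL && req >= 0 && req < 32);
	bit = 1UL << req;
	if (!(vcpu->requests & bit))
		return false;
	vcpu->requests &= ~bit;
	return true;
}
```

The test-and-clear behaviour is what makes a request a one-shot notification: the crash is reported to user space once, not on every subsequent guest entry.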
[PATCH 2/12] kvm: introduce vcpu_debug = kvm_debug + vcpu context
From: Andrey Smetanin asmeta...@virtuozzo.com

vcpu_debug is a useful macro like kvm_debug, but it additionally includes
the vcpu context in its output.

Signed-off-by: Andrey Smetanin asmeta...@virtuozzo.com
Signed-off-by: Denis V. Lunev d...@openvz.org
Reviewed-by: Peter Hornyack peterhorny...@google.com
CC: Paolo Bonzini pbonz...@redhat.com
CC: Gleb Natapov g...@kernel.org
---
 include/linux/kvm_host.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9564fd7..2b2edf1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -424,6 +424,9 @@ struct kvm {
 #define vcpu_unimpl(vcpu, fmt, ...)					\
 	kvm_pr_unimpl("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
 
+#define vcpu_debug(vcpu, fmt, ...)					\
+	kvm_debug("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
+
 static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
 {
 	smp_rmb();
-- 
2.1.4
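The macro relies on two things: string-literal concatenation to prepend the "vcpu%i " prefix, and the GNU `## __VA_ARGS__` extension so that it also works with no extra arguments. A user-space approximation, assuming GCC or Clang and formatting into a buffer instead of the kernel log, could look like this (vcpu_format, describe_crash and dbg_vcpu are made-up names):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical vcpu with only the field the prefix needs. */
struct dbg_vcpu {
	int vcpu_id;
};

/*
 * Same shape as vcpu_debug(), but writing into a caller-supplied buffer.
 * "## __VA_ARGS__" is a GNU extension that drops the trailing comma when
 * no variadic arguments are given.
 */
#define vcpu_format(buf, len, vcpu, fmt, ...) \
	snprintf((buf), (len), "vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)

/* Example caller: formats one crash parameter with the vcpu prefix. */
static int describe_crash(char *buf, size_t len, const struct dbg_vcpu *vcpu,
			  unsigned long long p0)
{
	assert(buf != NULL && vcpu != NULL);
	return vcpu_format(buf, len, vcpu, "hv crash (0x%llx)\n", p0);
}
```

Since fmt must be a string literal for the concatenation to work, callers cannot pass a runtime format string; the same restriction applies to the kernel macro.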
[PATCH 6/12] kvm/x86: mark hyper-v crash msrs as partition wide
From: Andrey Smetanin asmeta...@virtuozzo.com

Hyper-V crash MSRs are per VM, not per vcpu, so mark them as partition
wide.

Signed-off-by: Andrey Smetanin asmeta...@virtuozzo.com
Signed-off-by: Denis V. Lunev d...@openvz.org
Reviewed-by: Peter Hornyack peterhorny...@google.com
CC: Paolo Bonzini pbonz...@redhat.com
CC: Gleb Natapov g...@kernel.org
---
 arch/x86/kvm/hyperv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 2b49f10..af83c96 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -39,6 +39,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
 	case HV_X64_MSR_HYPERCALL:
 	case HV_X64_MSR_REFERENCE_TSC:
 	case HV_X64_MSR_TIME_REF_COUNT:
+	case HV_X64_MSR_CRASH_CTL:
+	case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
 		r = true;
 		break;
 	}
-- 
2.1.4
[PATCH v4 0/12] HyperV equivalent of pvpanic driver
Windows 2012 guests can notify the hypervisor about a guest crash
(Windows bugcheck, i.e. BSOD) by writing to specific Hyper-V MSRs. This
series makes KVM handle these MSRs and send a notification to user space,
which allows QEMU/libvirt to gather a Windows guest crash dump.

The idea is to provide functionality equal to the pvpanic device without
the QEMU guest agent for Windows.

The idea is borrowed from the Linux HyperV bus driver and validated
against Windows 2k12.

Changes from v3:
* remove unused HV_X64_MSR_CRASH_CTL_NOTIFY
* added documentation section about KVM_SYSTEM_EVENT_CRASH
* allow only supported values inside crash ctl msr
* qemu: split patch into generic crash handling patches and hyperv specific
* qemu: skip migration of crash ctl msr value

Changes from v2:
* forbid modification of crash ctl msr by guest
* qemu_system_guest_panicked usage in pvpanic and s390x
* hyper-v crash handler move from generic kvm to i386
* hyper-v crash handler: skip fetching crash msrs, just mark crash occurred
* sync with linux-next 20150629
* patch 11 squashed to patch 10
* patch 9 squashed to patch 7

Changes from v1:
* hyperv code move to hyperv.c
* added read handlers of crash data msrs
* added per vm and per cpu hyperv context structures
* added saving crash msrs inside qemu cpu state
* added qemu fetch and update of crash msrs
* added qemu crash msrs store in cpu state and its migration

Signed-off-by: Andrey Smetanin asmeta...@virtuozzo.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Gleb Natapov g...@kernel.org
CC: Paolo Bonzini pbonz...@redhat.com
[PATCH 4/12] kvm/x86: added hyper-v crash msrs into kvm hyperv context
From: Andrey Smetanin asmeta...@virtuozzo.com

Added crash variables to the kvm Hyper-V context as storage for the
Hyper-V crash MSRs.

Signed-off-by: Andrey Smetanin asmeta...@virtuozzo.com
Signed-off-by: Denis V. Lunev d...@openvz.org
Reviewed-by: Peter Hornyack peterhorny...@google.com
CC: Paolo Bonzini pbonz...@redhat.com
CC: Gleb Natapov g...@kernel.org
---
 arch/x86/include/asm/kvm_host.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 78616aa..697c1f3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -595,6 +595,10 @@ struct kvm_hv {
 	u64 hv_guest_os_id;
 	u64 hv_hypercall;
 	u64 hv_tsc_page;
+
+	/* Hyper-v based guest crash (NT kernel bugcheck) parameters */
+	u64 hv_crash_param[HV_X64_MSR_CRASH_PARAMS];
+	u64 hv_crash_ctl;
 };
 
 struct kvm_arch {
-- 
2.1.4
[PATCH 7/12] kvm/x86: added hyper-v crash data and ctl msr's get/set'ers
From: Andrey Smetanin asmeta...@virtuozzo.com

Added Hyper-V crash MSR (HV_X64_MSR_CRASH*) data and control getters and
setters. User space should check that these MSRs are available by checking
the KVM_CAP_HYPERV_MSR_CRASH capability.

User space is allowed to set up the Hyper-V crash ctl MSR. This MSR should
be set to HV_X64_MSR_CRASH_CTL_NOTIFY so that the Hyper-V guest knows it
can send crash data to the host. But the Hyper-V guest also notifies about
a crash event by writing the same HV_X64_MSR_CRASH_CTL_NOTIFY value into
the crash ctl MSR. So both user space and the guest write the same value
into the ctl MSR, and this patch distinguishes the moment of an actual
guest crash by checking the host-initiated flag from the MSR info. The
patch also prevents modification of the crash ctl MSR by the guest.

Signed-off-by: Andrey Smetanin asmeta...@virtuozzo.com
Signed-off-by: Denis V. Lunev d...@openvz.org
Reviewed-by: Peter Hornyack peterhorny...@google.com
CC: Paolo Bonzini pbonz...@redhat.com
CC: Gleb Natapov g...@kernel.org
---
 arch/x86/kvm/hyperv.c    | 74 ++--
 arch/x86/kvm/hyperv.h    |  2 +-
 arch/x86/kvm/x86.c       |  8 +-
 include/uapi/linux/kvm.h |  1 +
 4 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index af83c96..a8160d2 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -48,7 +48,63 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
 	return r;
 }
 
-static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
+static int kvm_hv_msr_get_crash_data(struct kvm_vcpu *vcpu,
+				     u32 index, u64 *pdata)
+{
+	struct kvm_hv *hv = &vcpu->kvm->arch.hyperv;
+
+	if (WARN_ON_ONCE(index >= ARRAY_SIZE(hv->hv_crash_param)))
+		return -EINVAL;
+
+	*pdata = hv->hv_crash_param[index];
+	return 0;
+}
+
+static int kvm_hv_msr_get_crash_ctl(struct kvm_vcpu *vcpu, u64 *pdata)
+{
+	struct kvm_hv *hv = &vcpu->kvm->arch.hyperv;
+
+	*pdata = hv->hv_crash_ctl;
+	return 0;
+}
+
+static int kvm_hv_msr_set_crash_ctl(struct kvm_vcpu *vcpu, u64 data, bool host)
+{
+	struct kvm_hv *hv = &vcpu->kvm->arch.hyperv;
+
+	if (host)
+		hv->hv_crash_ctl = data & HV_X64_MSR_CRASH_CTL_NOTIFY;
+
+	if (!host && (data & HV_X64_MSR_CRASH_CTL_NOTIFY)) {
+
+		vcpu_debug(vcpu, "hv crash (0x%llx 0x%llx 0x%llx 0x%llx 0x%llx)\n",
+			  hv->hv_crash_param[0],
+			  hv->hv_crash_param[1],
+			  hv->hv_crash_param[2],
+			  hv->hv_crash_param[3],
+			  hv->hv_crash_param[4]);
+
+		/* Send notification about crash to user space */
+		kvm_make_request(KVM_REQ_HV_CRASH, vcpu);
+	}
+
+	return 0;
+}
+
+static int kvm_hv_msr_set_crash_data(struct kvm_vcpu *vcpu,
+				     u32 index, u64 data)
+{
+	struct kvm_hv *hv = &vcpu->kvm->arch.hyperv;
+
+	if (WARN_ON_ONCE(index >= ARRAY_SIZE(hv->hv_crash_param)))
+		return -EINVAL;
+
+	hv->hv_crash_param[index] = data;
+	return 0;
+}
+
+static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
+			     bool host)
 {
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_hv *hv = &kvm->arch.hyperv;
@@ -101,6 +157,12 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		mark_page_dirty(kvm, gfn);
 		break;
 	}
+	case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
+		return kvm_hv_msr_set_crash_data(vcpu,
+						 msr - HV_X64_MSR_CRASH_P0,
+						 data);
+	case HV_X64_MSR_CRASH_CTL:
+		return kvm_hv_msr_set_crash_ctl(vcpu, data, host);
 	default:
 		vcpu_unimpl(vcpu, "Hyper-V uhandled wrmsr: 0x%x data 0x%llx\n",
 			    msr, data);
@@ -173,6 +235,12 @@ static int kvm_hv_get_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case HV_X64_MSR_REFERENCE_TSC:
 		data = hv->hv_tsc_page;
 		break;
+	case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
+		return kvm_hv_msr_get_crash_data(vcpu,
+						 msr - HV_X64_MSR_CRASH_P0,
+						 pdata);
+	case HV_X64_MSR_CRASH_CTL:
+		return kvm_hv_msr_get_crash_ctl(vcpu, pdata);
 	default:
 		vcpu_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr);
 		return 1;
@@ -217,13 +285,13 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	return 0;
 }
 
-int kvm_hv_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
+int kvm_hv_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host)
 {
 	if (kvm_hv_msr_partition_wide(msr)) {
 		int r;
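The host-versus-guest distinction in the ctl setter is easy to lose in the diff, so here is a standalone model of just that logic. It is a sketch, not the kernel code: crash_ctx and set_crash_ctl are hypothetical names, and a boolean stands in for raising KVM_REQ_HV_CRASH.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CRASH_CTL_NOTIFY (1ULL << 63)

/* Hypothetical per-VM crash state. */
struct crash_ctx {
	uint64_t crash_ctl;	/* value user space advertised to the guest */
	bool crash_requested;	/* stands in for the KVM_REQ_HV_CRASH request */
};

/*
 * Host (user space) writes configure the MSR, keeping only the NOTIFY bit.
 * Guest writes never change the stored value; a guest write with NOTIFY
 * set is interpreted as "the guest has crashed".
 */
static int set_crash_ctl(struct crash_ctx *hv, uint64_t data, bool host)
{
	assert(hv != NULL);
	if (host)
		hv->crash_ctl = data & CRASH_CTL_NOTIFY;
	else if (data & CRASH_CTL_NOTIFY)
		hv->crash_requested = true;
	return 0;
}
```

Keying on who performed the write is what lets the same NOTIFY value mean "feature enabled" when it comes from user space and "crash happened" when it comes from the guest.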
[PATCH 08/12] kvm/x86: add sending hyper-v crash notification to user space
From: Andrey Smetanin asmeta...@virtuozzo.com

The notification is sent by exiting the vcpu to user space when
KVM_REQ_HV_CRASH is set for the vcpu. On exit to user space the kvm_run
structure contains a system_event with type KVM_SYSTEM_EVENT_CRASH to
signal that a guest crash occurred.

Signed-off-by: Andrey Smetanin asmeta...@virtuozzo.com
Signed-off-by: Denis V. Lunev d...@openvz.org
Reviewed-by: Peter Hornyack peterhorny...@google.com
CC: Paolo Bonzini pbonz...@redhat.com
CC: Gleb Natapov g...@kernel.org
---
 Documentation/virtual/kvm/api.txt | 5 +++++
 arch/x86/kvm/x86.c                | 6 ++++++
 include/uapi/linux/kvm.h          | 1 +
 3 files changed, 12 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index a7926a9..a4ebcb7 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3277,6 +3277,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
 		struct {
 #define KVM_SYSTEM_EVENT_SHUTDOWN       1
 #define KVM_SYSTEM_EVENT_RESET          2
+#define KVM_SYSTEM_EVENT_CRASH          3
 			__u32 type;
 			__u64 flags;
 		} system_event;
@@ -3296,6 +3297,10 @@ Valid values for 'type' are:
   KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
    As with SHUTDOWN, userspace can choose to ignore the request, or
    to schedule the reset to occur in the future and may call KVM_RUN again.
+  KVM_SYSTEM_EVENT_CRASH -- the guest crash occurred and the guest
+   has requested a crash condition maintenance. Userspace can choose
+   to ignore the request, or to gather VM memory core dump and/or
+   reset/shutdown of the VM.
 
 		/* Fix the size of the union. */
 		char padding[256];
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b4c2767..28e79c0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6265,6 +6265,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			vcpu_scan_ioapic(vcpu);
 		if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
 			kvm_vcpu_reload_apic_access_page(vcpu);
+		if (kvm_check_request(KVM_REQ_HV_CRASH, vcpu)) {
+			vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+			vcpu->run->system_event.type = KVM_SYSTEM_EVENT_CRASH;
+			r = 0;
+			goto out;
+		}
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5da4ca3..c8c6b8b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -317,6 +317,7 @@ struct kvm_run {
 		struct {
 #define KVM_SYSTEM_EVENT_SHUTDOWN       1
 #define KVM_SYSTEM_EVENT_RESET          2
+#define KVM_SYSTEM_EVENT_CRASH          3
 			__u32 type;
 			__u64 flags;
 		} system_event;
-- 
2.1.4
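On the user-space side, handling this exit amounts to a switch on system_event.type after KVM_RUN returns. The sketch below shows that dispatch against a cut-down mock of the run structure so it compiles without kernel headers. The event type values 1 to 3 mirror the patch; MOCK_EXIT_SYSTEM_EVENT is a placeholder value, and the vm_action policy is an assumption about what a VMM might choose to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Event types as defined in the patch. */
#define SYSTEM_EVENT_SHUTDOWN	1
#define SYSTEM_EVENT_RESET	2
#define SYSTEM_EVENT_CRASH	3

/* Placeholder exit reason; the real value comes from linux/kvm.h. */
#define MOCK_EXIT_SYSTEM_EVENT	1000

/* Hypothetical cut-down version of struct kvm_run. */
struct mock_run {
	uint32_t exit_reason;
	struct {
		uint32_t type;
		uint64_t flags;
	} system_event;
};

/* What the VMM decides to do next; purely illustrative policy. */
enum vm_action {
	VM_CONTINUE,
	VM_SHUTDOWN,
	VM_RESET,
	VM_DUMP_AND_STOP,
};

static enum vm_action handle_exit(const struct mock_run *run)
{
	assert(run != NULL);
	if (run->exit_reason != MOCK_EXIT_SYSTEM_EVENT)
		return VM_CONTINUE;

	switch (run->system_event.type) {
	case SYSTEM_EVENT_SHUTDOWN:
		return VM_SHUTDOWN;
	case SYSTEM_EVENT_RESET:
		return VM_RESET;
	case SYSTEM_EVENT_CRASH:
		/* The guest reported a crash: collect a dump, then stop. */
		return VM_DUMP_AND_STOP;
	default:
		return VM_CONTINUE;
	}
}
```

As the documentation hunk says, user space may also ignore the event; returning VM_CONTINUE for CRASH would model that choice.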
[PATCH 07/12] kvm/x86: added hyper-v crash data and ctl msr's get/set'ers
From: Andrey Smetanin <asmeta...@virtuozzo.com>

Added getters and setters for the Hyper-V crash MSRs (HV_X64_MSR_CRASH*),
both data and control. Userspace should check that these MSRs are
available by testing the KVM_CAP_HYPERV_MSR_CRASH capability.

User space is allowed to set up the Hyper-V crash ctl MSR. This MSR should
be set to the HV_X64_MSR_CRASH_CTL_NOTIFY value so that the Hyper-V guest
knows it can send crash data to the host. However, the Hyper-V guest reports
a crash event by writing the same HV_X64_MSR_CRASH_CTL_NOTIFY value into
the crash ctl MSR. Since both user space and the guest write the same value
into the ctl MSR, this patch distinguishes the moment of an actual guest
crash by checking the host-initiated flag from the MSR info.

The patch also prevents the guest from modifying the crash ctl MSR.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
Reviewed-by: Peter Hornyack <peterhorny...@google.com>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Gleb Natapov <g...@kernel.org>
---
 arch/x86/kvm/hyperv.c    | 74 ++--
 arch/x86/kvm/hyperv.h    |  2 +-
 arch/x86/kvm/x86.c       |  8 +-
 include/uapi/linux/kvm.h |  1 +
 4 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index af83c96..a8160d2 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -48,7 +48,63 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
 	return r;
 }
 
-static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
+static int kvm_hv_msr_get_crash_data(struct kvm_vcpu *vcpu,
+				     u32 index, u64 *pdata)
+{
+	struct kvm_hv *hv = &vcpu->kvm->arch.hyperv;
+
+	if (WARN_ON_ONCE(index >= ARRAY_SIZE(hv->hv_crash_param)))
+		return -EINVAL;
+
+	*pdata = hv->hv_crash_param[index];
+	return 0;
+}
+
+static int kvm_hv_msr_get_crash_ctl(struct kvm_vcpu *vcpu, u64 *pdata)
+{
+	struct kvm_hv *hv = &vcpu->kvm->arch.hyperv;
+
+	*pdata = hv->hv_crash_ctl;
+	return 0;
+}
+
+static int kvm_hv_msr_set_crash_ctl(struct kvm_vcpu *vcpu, u64 data, bool host)
+{
+	struct kvm_hv *hv = &vcpu->kvm->arch.hyperv;
+
+	if (host)
+		hv->hv_crash_ctl = data & HV_X64_MSR_CRASH_CTL_NOTIFY;
+
+	if (!host && (data & HV_X64_MSR_CRASH_CTL_NOTIFY)) {
+
+		vcpu_debug(vcpu, "hv crash (0x%llx 0x%llx 0x%llx 0x%llx 0x%llx)\n",
+			  hv->hv_crash_param[0],
+			  hv->hv_crash_param[1],
+			  hv->hv_crash_param[2],
+			  hv->hv_crash_param[3],
+			  hv->hv_crash_param[4]);
+
+		/* Send notification about crash to user space */
+		kvm_make_request(KVM_REQ_HV_CRASH, vcpu);
+	}
+
+	return 0;
+}
+
+static int kvm_hv_msr_set_crash_data(struct kvm_vcpu *vcpu,
+				     u32 index, u64 data)
+{
+	struct kvm_hv *hv = &vcpu->kvm->arch.hyperv;
+
+	if (WARN_ON_ONCE(index >= ARRAY_SIZE(hv->hv_crash_param)))
+		return -EINVAL;
+
+	hv->hv_crash_param[index] = data;
+	return 0;
+}
+
+static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
+			     bool host)
 {
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_hv *hv = &kvm->arch.hyperv;
@@ -101,6 +157,12 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		mark_page_dirty(kvm, gfn);
 		break;
 	}
+	case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
+		return kvm_hv_msr_set_crash_data(vcpu,
+						 msr - HV_X64_MSR_CRASH_P0,
+						 data);
+	case HV_X64_MSR_CRASH_CTL:
+		return kvm_hv_msr_set_crash_ctl(vcpu, data, host);
 	default:
 		vcpu_unimpl(vcpu, "Hyper-V uhandled wrmsr: 0x%x data 0x%llx\n",
 			    msr, data);
@@ -173,6 +235,12 @@ static int kvm_hv_get_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case HV_X64_MSR_REFERENCE_TSC:
 		data = hv->hv_tsc_page;
 		break;
+	case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
+		return kvm_hv_msr_get_crash_data(vcpu,
+						 msr - HV_X64_MSR_CRASH_P0,
+						 pdata);
+	case HV_X64_MSR_CRASH_CTL:
+		return kvm_hv_msr_get_crash_ctl(vcpu, pdata);
 	default:
 		vcpu_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr);
 		return 1;
@@ -217,13 +285,13 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	return 0;
 }
 
-int kvm_hv_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
+int kvm_hv_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host)
 {
 	if (kvm_hv_msr_partition_wide(msr)) {
 		int r;
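The host/guest distinction described in the commit message can be modelled outside the kernel. The following is only a sketch under stated assumptions: `struct demo_hv`, `demo_set_crash_ctl` and the `crash_requested` flag are hypothetical stand-ins for `struct kvm_hv`, `kvm_hv_msr_set_crash_ctl` and the `KVM_REQ_HV_CRASH` request, and the NOTIFY bit is assumed to be bit 63.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to be bit 63 of the crash control MSR. */
#define DEMO_CRASH_CTL_NOTIFY (1ULL << 63)

/* Hypothetical stand-in for the crash-related part of struct kvm_hv. */
struct demo_hv {
	uint64_t crash_ctl;       /* value last stored by the host */
	bool     crash_requested; /* stands in for KVM_REQ_HV_CRASH */
};

/*
 * Model of a write to the crash control MSR.
 * host == true:  write initiated by user space; only the NOTIFY bit is kept.
 * host == false: write issued by the guest; the stored value is left alone,
 *                and a write with NOTIFY set is treated as a crash report.
 */
static int demo_set_crash_ctl(struct demo_hv *hv, uint64_t data, bool host)
{
	assert(hv != NULL);

	if (host)
		hv->crash_ctl = data & DEMO_CRASH_CTL_NOTIFY;

	if (!host && (data & DEMO_CRASH_CTL_NOTIFY))
		hv->crash_requested = true;

	return 0;
}
```

A host write therefore arms the MSR without raising a crash, while the same value written by the guest raises the crash request and leaves the stored control value unchanged.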
[PATCH 12/12] qemu/kvm/x86: hyper-v crash msrs set/get'ers and migration
From: Andrey Smetanin asmeta...@virtuozzo.com KVM Hyper-V based guests can notify hypervisor about occurred guest crash by writing into Hyper-V crash MSR's. This patch does handling and migration of HV_X64_MSR_CRASH_P0-P4, HV_X64_MSR_CRASH_CTL msrs. User can enable these MSR's by 'hv-crash' option. Signed-off-by: Andrey Smetanin asmeta...@virtuozzo.com Signed-off-by: Denis V. Lunev d...@openvz.org CC: Paolo Bonzini pbonz...@redhat.com CC: Andreas Färber afaer...@suse.de --- linux-headers/asm-x86/hyperv.h | 13 + linux-headers/linux/kvm.h | 1 + target-i386/cpu-qom.h | 1 + target-i386/cpu.c | 1 + target-i386/cpu.h | 2 ++ target-i386/kvm.c | 27 +++ target-i386/machine.c | 26 ++ 7 files changed, 71 insertions(+) diff --git a/linux-headers/asm-x86/hyperv.h b/linux-headers/asm-x86/hyperv.h index ce6068d..5f88dc7 100644 --- a/linux-headers/asm-x86/hyperv.h +++ b/linux-headers/asm-x86/hyperv.h @@ -108,6 +108,8 @@ #define HV_X64_HYPERCALL_PARAMS_XMM_AVAILABLE (1 4) /* Support for a virtual guest idle state is available */ #define HV_X64_GUEST_IDLE_STATE_AVAILABLE (1 5) +/* Guest crash data handler available */ +#define HV_X64_GUEST_CRASH_MSR_AVAILABLE (1 10) /* * Implementation recommendations. 
Indicates which behaviors the hypervisor @@ -199,6 +201,17 @@ #define HV_X64_MSR_STIMER3_CONFIG 0x40B6 #define HV_X64_MSR_STIMER3_COUNT 0x40B7 +/* Hypev-V guest crash notification MSR's */ +#define HV_X64_MSR_CRASH_P00x4100 +#define HV_X64_MSR_CRASH_P10x4101 +#define HV_X64_MSR_CRASH_P20x4102 +#define HV_X64_MSR_CRASH_P30x4103 +#define HV_X64_MSR_CRASH_P40x4104 +#define HV_X64_MSR_CRASH_CTL 0x4105 +#define HV_X64_MSR_CRASH_CTL_NOTIFY(1ULL 63) +#define HV_X64_MSR_CRASH_PARAMS\ + (1 + (HV_X64_MSR_CRASH_P4 - HV_X64_MSR_CRASH_P0)) + #define HV_X64_MSR_HYPERCALL_ENABLE0x0001 #define HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT12 #define HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_MASK \ diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index 409be37..efe720e 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -818,6 +818,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_DISABLE_QUIRKS 116 #define KVM_CAP_X86_SMM 117 #define KVM_CAP_MULTI_ADDRESS_SPACE 118 +#define KVM_CAP_HYPERV_MSR_CRASH 119 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h index 7a4fddd..c35b624 100644 --- a/target-i386/cpu-qom.h +++ b/target-i386/cpu-qom.h @@ -89,6 +89,7 @@ typedef struct X86CPU { bool hyperv_relaxed_timing; int hyperv_spinlock_attempts; bool hyperv_time; +bool hyperv_crash; bool check_cpuid; bool enforce_cpuid; bool expose_kvm; diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 36b07f9..04a8408 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -3117,6 +3117,7 @@ static Property x86_cpu_properties[] = { DEFINE_PROP_BOOL(hv-relaxed, X86CPU, hyperv_relaxed_timing, false), DEFINE_PROP_BOOL(hv-vapic, X86CPU, hyperv_vapic, false), DEFINE_PROP_BOOL(hv-time, X86CPU, hyperv_time, false), +DEFINE_PROP_BOOL(hv-crash, X86CPU, hyperv_crash, false), DEFINE_PROP_BOOL(check, X86CPU, check_cpuid, false), DEFINE_PROP_BOOL(enforce, X86CPU, enforce_cpuid, false), DEFINE_PROP_BOOL(kvm, X86CPU, expose_kvm, true), diff --git 
a/target-i386/cpu.h b/target-i386/cpu.h index 603aaf0..6c2352a 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -21,6 +21,7 @@ #include config.h #include qemu-common.h +#include asm/hyperv.h #ifdef TARGET_X86_64 #define TARGET_LONG_BITS 64 @@ -904,6 +905,7 @@ typedef struct CPUX86State { uint64_t msr_hv_guest_os_id; uint64_t msr_hv_vapic; uint64_t msr_hv_tsc; +uint64_t msr_hv_crash_prm[HV_X64_MSR_CRASH_PARAMS]; /* exception/interrupt handling */ int error_code; diff --git a/target-i386/kvm.c b/target-i386/kvm.c index daced5c..f3456af 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -79,6 +79,7 @@ static int lm_capable_kernel; static bool has_msr_hv_hypercall; static bool has_msr_hv_vapic; static bool has_msr_hv_tsc; +static bool has_msr_hv_crash; static bool has_msr_mtrr; static bool has_msr_xss; @@ -515,6 +516,12 @@ int kvm_arch_init_vcpu(CPUState *cs) c-eax |= 0x200; has_msr_hv_tsc = true; } +if (cpu-hyperv_crash +kvm_check_extension(cs-kvm_state, KVM_CAP_HYPERV_MSR_CRASH) 0) { +c-edx |= HV_X64_GUEST_CRASH_MSR_AVAILABLE; +has_msr_hv_crash = true; +} + c = cpuid_data.entries[cpuid_i++]; c-function =
[PATCH v5 0/12] HyperV equivalent of pvpanic driver
Windows 2012 guests can notify the hypervisor about a guest crash
(Windows bugcheck, i.e. BSOD) by writing to specific Hyper-V MSRs. This
series handles these MSRs in KVM and sends a notification to user space,
which allows QEMU/libvirt to gather a Windows guest crash dump.

The idea is to provide functionality equal to the pvpanic device without
the QEMU guest agent for Windows.

The idea is borrowed from the Linux Hyper-V bus driver and validated
against Windows 2k12.

Changes from v4:
* fixed typo in the email of Andreas Färber <afaer...@suse.de>;
  my vim behaves strangely on lines with extended German characters

Changes from v3:
* remove unused HV_X64_MSR_CRASH_CTL_NOTIFY
* added documentation section about KVM_SYSTEM_EVENT_CRASH
* allow only supported values inside crash ctl msr
* qemu: split patch into generic crash handling patches and hyperv specific
* qemu: skip migration of crash ctl msr value

Changes from v2:
* forbid modification of crash ctl msr by guest
* qemu_system_guest_panicked usage in pvpanic and s390x
* hyper-v crash handler moved from generic kvm to i386
* hyper-v crash handler: skip fetching crash msrs, just mark crash occurred
* sync with linux-next 20150629
* patch 11 squashed to patch 10
* patch 9 squashed to patch 7

Changes from v1:
* hyperv code moved to hyperv.c
* added read handlers of crash data msrs
* added per vm and per cpu hyperv context structures
* added saving crash msrs inside qemu cpu state
* added qemu fetch and update of crash msrs
* added qemu crash msrs store in cpu state and its migration

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Gleb Natapov <g...@kernel.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
[PATCH 02/12] kvm: introduce vcpu_debug = kvm_debug + vcpu context
From: Andrey Smetanin <asmeta...@virtuozzo.com>

vcpu_debug is a useful macro like kvm_debug, but it additionally includes
the vcpu context in its output.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
Reviewed-by: Peter Hornyack <peterhorny...@google.com>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Gleb Natapov <g...@kernel.org>
---
 include/linux/kvm_host.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9564fd7..2b2edf1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -424,6 +424,9 @@ struct kvm {
 #define vcpu_unimpl(vcpu, fmt, ...)					\
 	kvm_pr_unimpl("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
 
+#define vcpu_debug(vcpu, fmt, ...)					\
+	kvm_debug("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
+
 static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
 {
 	smp_rmb();
-- 
2.1.4
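For readers unfamiliar with the macro pattern, here is a userspace sketch of how such a prefixing macro behaves. It is an illustration only: `struct demo_vcpu`, `demo_vcpu_format` and `demo_log_crash_param` are hypothetical names, and `snprintf` stands in for the kernel's `kvm_debug`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for struct kvm_vcpu; only the id is modelled. */
struct demo_vcpu {
	int vcpu_id;
};

/*
 * Same shape as vcpu_debug(): the literal "vcpu%i " is concatenated with
 * the caller's format string, and the vcpu id becomes the first argument.
 * "## __VA_ARGS__" is a GNU extension (GCC and Clang) that drops the
 * trailing comma when no variadic arguments are passed.
 */
#define demo_vcpu_format(buf, size, vcpu, fmt, ...) \
	snprintf((buf), (size), "vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)

/* Format one crash parameter with the vcpu prefix; returns the length. */
static int demo_log_crash_param(char *buf, size_t size,
				const struct demo_vcpu *vcpu,
				unsigned long long param)
{
	assert(buf != NULL && vcpu != NULL);
	return demo_vcpu_format(buf, size, vcpu, "hv crash (0x%llx)", param);
}
```

Because the prefix is joined at compile time by string-literal concatenation, `fmt` has to be a string literal at every call site.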
Re: [PATCH 05/10] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs
On Wed, Jul 01, 2015 at 07:18:40PM +0100, Marc Zyngier wrote:
On 01/07/15 12:58, Christoffer Dall wrote:
On Wed, Jul 01, 2015 at 10:17:52AM +0100, Marc Zyngier wrote:
On 30/06/15 21:19, Christoffer Dall wrote:
On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:

We only set the irq_queued flag for level interrupts, meaning that !vgic_irq_is_queued(vcpu, irq) is a good enough predicate for all interrupts. This will allow us to inject edge HW interrupts, for which the state ACTIVE+PENDING is not allowed.

I don't understand this; ACTIVE+PENDING is allowed for edge interrupts. Do you mean that if we set the HW bit in the LR, then we are linking to an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW GIC side? Why is this relevant here? I feel like I'm missing context.

I've probably taken a shortcut here - bear with me while I'm trying to explain the issue. For HW interrupts, we shouldn't even try to use the state bits in the LR, because that state is contained in the physical distributor. Setting the HW bit really means there is something going on at the distributor level, just go there.

ok, so by HW interrupts you mean virtual interrupts with the HW bit in the LR set, correct?

Yes, sorry.

If we were to inject an ACTIVE+PENDING interrupt at the LR level, we'd basically lose the second interrupt because that state is simply not considered.

Huh? Which second interrupt? I looked at the spec and it says don't use the state bits for HW interrupts, so isn't it simply not supported to set these bits at all and that's it?

I managed to confuse myself reading the same bit. It says (GICv3 spec): A hypervisor must only use the pending and active state for software originated interrupts, which are typically associated with virtual devices, or SGIs. That's the PENDING+ACTIVE state, and not the pending and active bits like I read it initially.

Now consider the following scenario:
- We inject a virtual edge interrupt
- We mark the corresponding physical interrupt as active.
- Queue interrupt in an LR
- Resume vcpu

Now, we inject another edge interrupt, the vcpu exits for whatever reason, and the previously injected interrupt is still active. The normal vGIC flow would be to mark the interrupt as ACTIVE+PENDING in the LR, and resume the vcpu. But the above states that this is invalid for HW generated interrupts.

Right, ok, so we must resample the pending state even for an edge-triggered interrupt once it's EOIed, because we cannot put it in the LR despite it being pending on the physical distributor?

Incidentally, we do not need to set the EOI_INT bit, because when the guest EOIs the interrupt, it will also deactivate it on the physical distributor and the hardware will then take the pending physical interrupt, we will handle it in the host, etc. etc.

If we had a different *shared* device than the timer which is edge-triggered, don't we then also need to capture the physical distributor's pending state along with the state of the device unless we assume that upon restoring the state for the device count on the device to have another rising/falling edge to trigger the interrupt again? (I assume the line would always go high for a level-triggered interrupt in this case).

So the trick we're using is to only inject the active interrupt, and prevent anything else from being injected until we can confirm that the active state has been cleared at the physical level. Does it make any sense?

Sort of, but what I don't understand now is how the guest ever sees the interrupt then. If we always inject the virtual interrupt by setting the active state on the physical distributor, and we can't inject this as active+pending, and the guest doesn't see the state in the LR, then how does this ever raise a virtual interrupt and how does the guest see an interrupt which is only PENDING so that it can ack it etc. etc.? Maybe I don't fully understand how the HW bit works after all...

The way the spec is written is slightly misleading. But the gist of it is that we still signal the guest using the PENDING bit in the LR, and switch the LR as usual. It is just that we can't use the PENDING+ACTIVE state (apparently, this can lead to a double deactivation). Not sure the above makes sense. Beer time, I suppose.

It does make sense, I just had to sleep on it and see the code as a whole instead of trying to understand it by just looking at this patch individually.

Thanks,
-Christoffer
Re: [PATCH 07/12] kvm/x86: added hyper-v crash data and ctl msr's get/set'ers
On 02/07/2015 18:07, Denis V. Lunev wrote:
> From: Andrey Smetanin <asmeta...@virtuozzo.com>
>
> Added hyper-v crash msr's(HV_X64_MSR_CRASH*) data and control
> geters and setters. Userspace should check that such msr's
> available by check of KVM_CAP_HYPERV_MSR_CRASH capability.

It should use the existing KVM_GET_SUPPORTED_MSRS infrastructure. See
emulated_msrs where other Hyper-V MSRs are listed.

Paolo
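One way to read the suggestion above: instead of probing a dedicated capability, user space scans the list of MSR indices the kernel reports and looks for the crash MSRs. The sketch below is an assumption-laden illustration, not the QEMU implementation. The list is passed in as a plain array (in practice it would presumably come from the kernel, for example via the KVM_GET_MSR_INDEX_LIST ioctl), `demo_has_crash_msrs` is a hypothetical helper, and the MSR numbers are assumed to be the 0x40000100..0x40000105 range from the Hyper-V specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed MSR numbers for HV_X64_MSR_CRASH_P0 .. HV_X64_MSR_CRASH_CTL. */
#define DEMO_MSR_CRASH_P0    0x40000100u
#define DEMO_MSR_CRASH_CTL   0x40000105u
#define DEMO_CRASH_MSR_COUNT 6

/*
 * Return true only if every crash MSR (P0..P4 and CTL) appears in the
 * list of MSR indices reported as supported.
 */
static bool demo_has_crash_msrs(const uint32_t *indices, size_t n)
{
	bool seen[DEMO_CRASH_MSR_COUNT] = { false };
	size_t found = 0;
	size_t i;

	assert(indices != NULL || n == 0);

	for (i = 0; i < n; i++) {
		uint32_t msr = indices[i];

		if (msr >= DEMO_MSR_CRASH_P0 && msr <= DEMO_MSR_CRASH_CTL) {
			size_t slot = msr - DEMO_MSR_CRASH_P0;

			if (!seen[slot]) {
				seen[slot] = true;
				found++;
			}
		}
	}
	return found == DEMO_CRASH_MSR_COUNT;
}
```

Requiring all six MSRs avoids advertising the feature to the guest when the kernel handles only part of the set.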
[PATCH 8/12] kvm/x86: add sending hyper-v crash notification to user space
From: Andrey Smetanin <asmeta...@virtuozzo.com>

The notification is sent by exiting the vcpu to user space if
KVM_REQ_HV_CRASH is set for the vcpu. On exit to user space the kvm_run
structure contains a system_event of type KVM_SYSTEM_EVENT_CRASH to
signal that a guest crash occurred.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
Reviewed-by: Peter Hornyack <peterhorny...@google.com>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Gleb Natapov <g...@kernel.org>
---
 Documentation/virtual/kvm/api.txt | 5 +++++
 arch/x86/kvm/x86.c                | 6 ++++++
 include/uapi/linux/kvm.h          | 1 +
 3 files changed, 12 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index a7926a9..a4ebcb7 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3277,6 +3277,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
 		struct {
 #define KVM_SYSTEM_EVENT_SHUTDOWN       1
 #define KVM_SYSTEM_EVENT_RESET          2
+#define KVM_SYSTEM_EVENT_CRASH          3
 			__u32 type;
 			__u64 flags;
 		} system_event;
@@ -3296,6 +3297,10 @@ Valid values for 'type' are:
   KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
    As with SHUTDOWN, userspace can choose to ignore the request, or
    to schedule the reset to occur in the future and may call KVM_RUN again.
+  KVM_SYSTEM_EVENT_CRASH -- the guest crash occurred and the guest
+   has requested a crash condition maintenance. Userspace can choose
+   to ignore the request, or to gather VM memory core dump and/or
+   reset/shutdown of the VM.
 
 		/* Fix the size of the union. */
 		char padding[256];
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b4c2767..28e79c0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6265,6 +6265,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			vcpu_scan_ioapic(vcpu);
 		if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
 			kvm_vcpu_reload_apic_access_page(vcpu);
+		if (kvm_check_request(KVM_REQ_HV_CRASH, vcpu)) {
+			vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+			vcpu->run->system_event.type = KVM_SYSTEM_EVENT_CRASH;
+			r = 0;
+			goto out;
+		}
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5da4ca3..c8c6b8b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -317,6 +317,7 @@ struct kvm_run {
 		struct {
 #define KVM_SYSTEM_EVENT_SHUTDOWN       1
 #define KVM_SYSTEM_EVENT_RESET          2
+#define KVM_SYSTEM_EVENT_CRASH          3
 			__u32 type;
 			__u64 flags;
 		} system_event;
-- 
2.1.4
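On the user-space side, the new event type is one more case in whatever code inspects `system_event.type` after a KVM_EXIT_SYSTEM_EVENT exit. The following sketch shows only the dispatch. The event numbers are copied from the patch, while `struct demo_system_event`, `enum demo_action` and `demo_handle_system_event` are hypothetical names, and the policy chosen for each event is an assumption rather than what any particular VMM does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Event numbers as defined in the patch; redefined locally so the sketch
 * builds without kernel headers. */
#define DEMO_SYSTEM_EVENT_SHUTDOWN 1
#define DEMO_SYSTEM_EVENT_RESET    2
#define DEMO_SYSTEM_EVENT_CRASH    3

/* Hypothetical mirror of the system_event member of struct kvm_run. */
struct demo_system_event {
	uint32_t type;
	uint64_t flags;
};

/* Hypothetical policy decisions a VMM might take. */
enum demo_action {
	DEMO_ACTION_IGNORE,
	DEMO_ACTION_SHUTDOWN,
	DEMO_ACTION_RESET,
	DEMO_ACTION_PAUSE_FOR_DUMP
};

/* Map a system event to an action; unknown types are ignored. */
static enum demo_action
demo_handle_system_event(const struct demo_system_event *ev)
{
	assert(ev != NULL);

	switch (ev->type) {
	case DEMO_SYSTEM_EVENT_SHUTDOWN:
		return DEMO_ACTION_SHUTDOWN;
	case DEMO_SYSTEM_EVENT_RESET:
		return DEMO_ACTION_RESET;
	case DEMO_SYSTEM_EVENT_CRASH:
		/* Stop the guest so that a memory dump can be collected. */
		return DEMO_ACTION_PAUSE_FOR_DUMP;
	default:
		return DEMO_ACTION_IGNORE;
	}
}
```

Pausing rather than resetting on a crash keeps guest memory intact, which matches the documented option of gathering a core dump before any reset or shutdown.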
[PATCH 03/12] kvm: add hyper-v crash msrs values
From: Andrey Smetanin <asmeta...@virtuozzo.com>

Added the Hyper-V crash MSR values (HV_X64_MSR_CRASH*).

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
Reviewed-by: Peter Hornyack <peterhorny...@google.com>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Gleb Natapov <g...@kernel.org>
---
 arch/x86/include/uapi/asm/hyperv.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index ce6068d..8fba544 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -199,6 +199,17 @@
 #define HV_X64_MSR_STIMER3_CONFIG		0x400000B6
 #define HV_X64_MSR_STIMER3_COUNT		0x400000B7
 
+/* Hyper-V guest crash notification MSR's */
+#define HV_X64_MSR_CRASH_P0			0x40000100
+#define HV_X64_MSR_CRASH_P1			0x40000101
+#define HV_X64_MSR_CRASH_P2			0x40000102
+#define HV_X64_MSR_CRASH_P3			0x40000103
+#define HV_X64_MSR_CRASH_P4			0x40000104
+#define HV_X64_MSR_CRASH_CTL			0x40000105
+#define HV_X64_MSR_CRASH_CTL_NOTIFY		(1ULL << 63)
+#define HV_X64_MSR_CRASH_PARAMS		\
+		(1 + (HV_X64_MSR_CRASH_P4 - HV_X64_MSR_CRASH_P0))
+
 #define HV_X64_MSR_HYPERCALL_ENABLE		0x00000001
 #define HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT	12
 #define HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_MASK	\
-- 
2.1.4
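The HV_X64_MSR_CRASH_PARAMS expression is just the size of the contiguous P0..P4 range, and the same subtraction turns an MSR number into an array index. A small sketch of that arithmetic follows. The `DEMO_` names and `demo_crash_param_index` are hypothetical, and the MSR numbers are assumed to be 0x40000100..0x40000105 as in the Hyper-V specification.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MSR numbers, mirroring the HV_X64_MSR_CRASH_* definitions. */
#define DEMO_MSR_CRASH_P0  0x40000100u
#define DEMO_MSR_CRASH_P4  0x40000104u
#define DEMO_MSR_CRASH_CTL 0x40000105u

/* Number of parameter MSRs: P0..P4 inclusive, hence the "1 +". */
#define DEMO_MSR_CRASH_PARAMS (1 + (DEMO_MSR_CRASH_P4 - DEMO_MSR_CRASH_P0))

/*
 * Translate an MSR number into an index into the crash parameter array.
 * Returns -1 for anything outside P0..P4, including the control MSR.
 */
static int demo_crash_param_index(uint32_t msr)
{
	int index;

	if (msr < DEMO_MSR_CRASH_P0 || msr > DEMO_MSR_CRASH_P4)
		return -1;

	index = (int)(msr - DEMO_MSR_CRASH_P0);
	assert(index < (int)DEMO_MSR_CRASH_PARAMS);
	return index;
}
```

Deriving the count from the first and last MSR numbers keeps the array size correct as long as the parameter MSRs stay contiguous.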
[PATCH 06/12] kvm/x86: mark hyper-v crash msrs as partition wide
From: Andrey Smetanin <asmeta...@virtuozzo.com>

The Hyper-V crash MSRs are per VM, not per vcpu, so mark them as
partition wide.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
Reviewed-by: Peter Hornyack <peterhorny...@google.com>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Gleb Natapov <g...@kernel.org>
---
 arch/x86/kvm/hyperv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 2b49f10..af83c96 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -39,6 +39,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
 	case HV_X64_MSR_HYPERCALL:
 	case HV_X64_MSR_REFERENCE_TSC:
 	case HV_X64_MSR_TIME_REF_COUNT:
+	case HV_X64_MSR_CRASH_CTL:
+	case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
 		r = true;
 		break;
 	}
-- 
2.1.4
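"Partition wide" means the storage lives in the per-VM structure, so a value written through one vcpu is visible through every other vcpu of the same VM. A minimal model of that property is sketched below. `struct demo_vm`, `struct demo_vcpu` and the two helpers are hypothetical and only imitate the relationship between `struct kvm` and `struct kvm_vcpu`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CRASH_PARAMS 5

/* Hypothetical per-VM state: one copy shared by all vcpus. */
struct demo_vm {
	uint64_t crash_param[DEMO_CRASH_PARAMS];
};

/* Hypothetical vcpu: holds no crash state of its own, only a VM pointer. */
struct demo_vcpu {
	int id;
	struct demo_vm *vm;
};

/* Store a crash parameter through any vcpu; -1 on a bad index. */
static int demo_write_crash_param(struct demo_vcpu *vcpu, unsigned int index,
				  uint64_t data)
{
	assert(vcpu != NULL && vcpu->vm != NULL);

	if (index >= DEMO_CRASH_PARAMS)
		return -1;

	vcpu->vm->crash_param[index] = data;
	return 0;
}

/* Read a crash parameter through any vcpu; -1 on a bad index. */
static int demo_read_crash_param(const struct demo_vcpu *vcpu,
				 unsigned int index, uint64_t *pdata)
{
	assert(vcpu != NULL && vcpu->vm != NULL && pdata != NULL);

	if (index >= DEMO_CRASH_PARAMS)
		return -1;

	*pdata = vcpu->vm->crash_param[index];
	return 0;
}
```

With per-vcpu storage, user space would have to know which vcpu the guest crashed on before it could read the parameters back; per-VM storage removes that dependency.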
[PATCH 09/12] qemu: added qemu_system_guest_panicked() - generic guest panic handler
From: Andrey Smetanin <asmeta...@virtuozzo.com>

There are pieces of guest panic handling code that can be shared in one
generic function.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Andreas Färber <afaer...@suse.de>
---
 hw/misc/pvpanic.c       |  3 +--
 include/sysemu/sysemu.h |  1 +
 target-s390x/kvm.c      | 11 ++-
 vl.c                    |  6 ++
 4 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c
index 994f8af..3709488 100644
--- a/hw/misc/pvpanic.c
+++ b/hw/misc/pvpanic.c
@@ -41,8 +41,7 @@ static void handle_event(int event)
     }
 
     if (event & PVPANIC_PANICKED) {
-        qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE, &error_abort);
-        vm_stop(RUN_STATE_GUEST_PANICKED);
+        qemu_system_guest_panicked();
         return;
     }
 }
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index df80951..70164c9 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -68,6 +68,7 @@ int qemu_reset_requested_get(void);
 void qemu_system_killed(int signal, pid_t pid);
 void qemu_devices_reset(void);
 void qemu_system_reset(bool report);
+void qemu_system_guest_panicked(void);
 
 void qemu_add_exit_notifier(Notifier *notify);
 void qemu_remove_exit_notifier(Notifier *notify);
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 135111a..e5bd3ef 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -1796,13 +1796,6 @@ static bool is_special_wait_psw(CPUState *cs)
     return cs->kvm_run->psw_addr == 0xfffUL;
 }
 
-static void guest_panicked(void)
-{
-    qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE,
-                                   &error_abort);
-    vm_stop(RUN_STATE_GUEST_PANICKED);
-}
-
 static void unmanageable_intercept(S390CPU *cpu, const char *str, int pswoffset)
 {
     CPUState *cs = CPU(cpu);
@@ -1811,7 +1804,7 @@ static void unmanageable_intercept(S390CPU *cpu, const char *str, int pswoffset)
                  str, cs->cpu_index, ldq_phys(cs->as, cpu->env.psa + pswoffset),
                  ldq_phys(cs->as, cpu->env.psa + pswoffset + 8));
     s390_cpu_halt(cpu);
-    guest_panicked();
+    qemu_system_guest_panicked();
 }
 
 static int handle_intercept(S390CPU *cpu)
@@ -1844,7 +1837,7 @@ static int handle_intercept(S390CPU *cpu)
             if (is_special_wait_psw(cs)) {
                 qemu_system_shutdown_request();
             } else {
-                guest_panicked();
+                qemu_system_guest_panicked();
             }
         }
         r = EXCP_HALTED;
diff --git a/vl.c b/vl.c
index 69ad90c..38eee1f 100644
--- a/vl.c
+++ b/vl.c
@@ -1721,6 +1721,12 @@ void qemu_system_reset(bool report)
     cpu_synchronize_all_post_reset();
 }
 
+void qemu_system_guest_panicked(void)
+{
+    qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE, &error_abort);
+    vm_stop(RUN_STATE_GUEST_PANICKED);
+}
+
 void qemu_system_reset_request(void)
 {
     if (no_reboot) {
-- 
2.1.4
[PATCH 11/12] qemu: add crash_occurred flag into CPUState
From: Andrey Smetanin <asmeta...@virtuozzo.com>

The CPUState->crash_occurred field marks that a guest crash occurred.
This value is added to the cpu common migration subsection.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Andreas Färber <afaer...@suse.de>
---
 exec.c            | 19 +++
 include/qom/cpu.h |  1 +
 vl.c              |  3 +++
 3 files changed, 23 insertions(+)

diff --git a/exec.c b/exec.c
index f7883d2..adf49e8 100644
--- a/exec.c
+++ b/exec.c
@@ -465,6 +465,24 @@ static const VMStateDescription vmstate_cpu_common_exception_index = {
     }
 };
 
+static bool cpu_common_crash_occurred_needed(void *opaque)
+{
+    CPUState *cpu = opaque;
+
+    return cpu->crash_occurred != 0;
+}
+
+static const VMStateDescription vmstate_cpu_common_crash_occurred = {
+    .name = "cpu_common/crash_occurred",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = cpu_common_crash_occurred_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(crash_occurred, CPUState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 const VMStateDescription vmstate_cpu_common = {
     .name = "cpu_common",
     .version_id = 1,
@@ -478,6 +496,7 @@ const VMStateDescription vmstate_cpu_common = {
     },
     .subsections = (const VMStateDescription*[]) {
         &vmstate_cpu_common_exception_index,
+        &vmstate_cpu_common_crash_occurred,
         NULL
     }
 };
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 39f0f19..f559a69 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -263,6 +263,7 @@ struct CPUState {
     bool created;
     bool stop;
     bool stopped;
+    uint32_t crash_occurred;
     volatile sig_atomic_t exit_request;
     uint32_t interrupt_request;
     int singlestep_enabled;
diff --git a/vl.c b/vl.c
index 38eee1f..9e0aee5 100644
--- a/vl.c
+++ b/vl.c
@@ -1723,6 +1723,9 @@ void qemu_system_reset(bool report)
 
 void qemu_system_guest_panicked(void)
 {
+    if (current_cpu) {
+        current_cpu->crash_occurred = 1;
+    }
     qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE, &error_abort);
     vm_stop(RUN_STATE_GUEST_PANICKED);
 }
-- 
2.1.4
[PATCH 10/12] qemu/kvm: added kvm system event crash handler
From: Andrey Smetanin <asmeta...@virtuozzo.com>

The KVM kernel module can receive guest crash events. This patch calls
the appropriate handler for a kernel guest crash event. A guest crash
event is recognized by the KVM_SYSTEM_EVENT_CRASH type of system event.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Andreas Färber <afaer...@suse.de>
---
 kvm-all.c                 | 4 ++++
 linux-headers/linux/kvm.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/kvm-all.c b/kvm-all.c
index 53e01d4..7a959b6 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1844,6 +1844,10 @@ int kvm_cpu_exec(CPUState *cpu)
                 qemu_system_reset_request();
                 ret = EXCP_INTERRUPT;
                 break;
+            case KVM_SYSTEM_EVENT_CRASH:
+                qemu_system_guest_panicked();
+                ret = 0;
+                break;
             default:
                 DPRINTF("kvm_arch_handle_exit\n");
                 ret = kvm_arch_handle_exit(cpu, run);
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index fad9e5c..409be37 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -317,6 +317,7 @@ struct kvm_run {
 		struct {
 #define KVM_SYSTEM_EVENT_SHUTDOWN       1
 #define KVM_SYSTEM_EVENT_RESET          2
+#define KVM_SYSTEM_EVENT_CRASH          3
 			__u32 type;
 			__u64 flags;
 		} system_event;
-- 
2.1.4
[PATCH 04/12] kvm/x86: added hyper-v crash msrs into kvm hyperv context
From: Andrey Smetanin <asmeta...@virtuozzo.com>

Added hv crash variables to the kvm Hyper-V context as storage for the
Hyper-V crash MSRs.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
Reviewed-by: Peter Hornyack <peterhorny...@google.com>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Gleb Natapov <g...@kernel.org>
---
 arch/x86/include/asm/kvm_host.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 78616aa..697c1f3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -595,6 +595,10 @@ struct kvm_hv {
 	u64 hv_guest_os_id;
 	u64 hv_hypercall;
 	u64 hv_tsc_page;
+
+	/* Hyper-v based guest crash (NT kernel bugcheck) parameters */
+	u64 hv_crash_param[HV_X64_MSR_CRASH_PARAMS];
+	u64 hv_crash_ctl;
 };
 
 struct kvm_arch {
-- 
2.1.4
[PATCH 05/12] kvm: added KVM_REQ_HV_CRASH value to notify qemu about hyper-v crash
From: Andrey Smetanin <asmeta...@virtuozzo.com>

Added KVM_REQ_HV_CRASH, a vcpu request used to notify user space (QEMU)
about a Hyper-V crash.

Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
Signed-off-by: Denis V. Lunev <d...@openvz.org>
Reviewed-by: Peter Hornyack <peterhorny...@google.com>
CC: Paolo Bonzini <pbonz...@redhat.com>
CC: Gleb Natapov <g...@kernel.org>
---
 include/linux/kvm_host.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2b2edf1..a377e00 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -139,6 +139,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_DISABLE_IBS       24
 #define KVM_REQ_APIC_PAGE_RELOAD  25
 #define KVM_REQ_SMI               26
+#define KVM_REQ_HV_CRASH          27
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID		0
 #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID	1
-- 
2.1.4
Re: [PATCH 12/12] qemu/kvm/x86: hyper-v crash msrs set/get'ers and migration
On 02/07/2015 18:07, Denis V. Lunev wrote:
> +    if (cpu->hyperv_crash &&
> +        kvm_check_extension(cs->kvm_state, KVM_CAP_HYPERV_MSR_CRASH) > 0) {
> +        c->edx |= HV_X64_GUEST_CRASH_MSR_AVAILABLE;
> +        has_msr_hv_crash = true;
> +    }
> +

Please patch kvm_get_supported_msrs instead of adding a capability.

The QEMU parts are otherwise okay.

Paolo
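Whichever probing mechanism is used, the gating logic in the quoted hunk stays the same: the CPUID feature bit is advertised only when the user enabled the option and the kernel can back it. A sketch of that gating follows. `demo_hv_feature_edx` is a hypothetical helper, the feature bit is assumed to be bit 10 as in the series, and `kernel_has_crash_msrs` stands for the result of either a capability check or a scan of the supported MSR list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match HV_X64_GUEST_CRASH_MSR_AVAILABLE from the series. */
#define DEMO_GUEST_CRASH_MSR_AVAILABLE (1u << 10)

/*
 * Compute the EDX value for the Hyper-V features leaf.
 * The crash bit is set only if the user asked for it (hv_crash_enabled)
 * and the kernel handles the crash MSRs (kernel_has_crash_msrs).
 */
static uint32_t demo_hv_feature_edx(uint32_t edx, bool hv_crash_enabled,
				    bool kernel_has_crash_msrs)
{
	/* Start from a known state so a stale bit is never passed through. */
	edx &= ~DEMO_GUEST_CRASH_MSR_AVAILABLE;

	if (hv_crash_enabled && kernel_has_crash_msrs)
		edx |= DEMO_GUEST_CRASH_MSR_AVAILABLE;

	/* Post-condition: the bit is never set without kernel support. */
	assert(kernel_has_crash_msrs ||
	       !(edx & DEMO_GUEST_CRASH_MSR_AVAILABLE));
	return edx;
}
```

Advertising the bit without kernel support would let the guest write MSRs that nothing handles, so both conditions are required.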