Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On Thu, May 14, 2015 at 11:37:46AM +0100, Peter Maydell wrote: On 14 May 2015 at 11:31, Andrew Jones drjo...@redhat.com wrote: Forgot to (4): switch from setting userspace's mapping to device memory to normal, non-cacheable. Using device memory caused a problem that Alex Graf found, and Peter Maydell suggested using normal, non-cacheable instead. Did you check that non-cacheable is definitely the correct kind of Normal memory attribute we want? (ie not write-through). I was concerned that write-through wouldn't be sufficient. If the guest writes to its non-cached memory, and QEMU needs to see what it wrote, then won't write-through fail to work? Unless we some how invalidate the cache first? drew ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On Thu, May 14, 2015 at 01:38:38PM +0200, Paolo Bonzini wrote: On 14/05/2015 13:36, Christoffer Dall wrote: (It's probably worth looking at the documentation in the first hunk too, under the commit message.) Why is this a hack/unintuitive? Is the semantics of the QEMU PCI bus not simply that MMIO regions are coherent? Only until device assignment gets into the picture. Will UEFI have to deal with device assignment in any respect? Why not? For example you could do network boot from an assigned network card. In fact, anything that UEFI has to deal with, the OS has to deal with too. If you need a UEFI hack, chances are you need or will need a Linux hack too. Fair enough. I was thinking that UEFI needs to be built with knowledge of all the hardware present including any passthrough devices, but I guess this is plainly not true with PCI (and might not even be true with the level of DT parsing we do for the virtual platform). So, getting back to my original question. Is the point then that UEFI must assume (from ACPI/DT) the cache-coherency properties of the PCI controller which exists in hardware on the system you're running on, even for the virtual PCI bus because that will be the semantics for assigned devices? And in that case, we have no way to distinguish between passthrough devices and virtual devices plugged into the virtual PCI bus? What about the idea of having two virtual PCI buses on your system where one is always cache-coherent and uses for virtual devices, and the other is whatever the hardware is and used for passthrough devices? -Christoffer ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On 05/14/15 15:00, Andrew Jones wrote: On Thu, May 14, 2015 at 01:38:11PM +0100, Peter Maydell wrote: On 14 May 2015 at 13:28, Paolo Bonzini pbonz...@redhat.com wrote: Well, PCI BARs are generally MMIO resources, and hence should not be cached. As an optimization, OS drivers can mark them as cacheable or write-combining or something like that, but in general it's a safe default to leave them uncached---one would think. Isn't this handled by the OS mapping them in the 'prefetchable' MMIO window rather than the 'non-prefetchable' one? (QEMU's generic-PCIe device doesn't yet support the prefetchable window.) I was thinking (with my limited PCI knowledge) the same thing, and was planning on experimenting with that. This could be supported in UEFI as well, with the following steps: - the DTB that QEMU provides UEFI with should advertise such a prefetchable window. - The driver in UEFI that parses the DTB should understand that DTB node (well, record type), and store the appropriate base size into some new dynamic PCDs (= basically, firmware wide global variables; PCD = platform configuration database) - The entry point of the host bridge driver would call gDS-AddMemorySpace() twice, separately for the two different windows, with their appropriate caching attributes. - The host bridge driver needs to be extended so that TypePMem32 requests are not rejected (like now); they should be handled similarly to TypeMem32. Except, the gDS-AllocateMemorySpace() call should allocate from the prefetchable range (determined by the new PCDs above). - QEMU's emulated devices should then expose their BARs as prefetchable (so that the above branch would be taken in the host bridge driver). (Of course, if QEMU intends to emulate PCI devices somewhat realistically, then QEMU should claim non-prefetchable for BARs that would not be prefetchable on physical hardware either, and then the hypervisor should accommodate the firmware's UC mapping and say hey I know better, we're virtual in fact, and override the attribute (- use WB instead of UC). With which we'd be back to square one...) Thanks Laszlo ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v3 5/5] arm64: KVM: Switch vgic save/restore to alternative_insn
On Fri, Mar 27, 2015 at 01:09:25PM +, Marc Zyngier wrote: So far, we configured the world-switch by having a small array of pointers to the save and restore functions, depending on the GIC used on the platform. Loading these values each time is a bit silly (they never change), and it makes sense to rely on the instruction patching instead. This leads to a nice cleanup of the code. Acked-by: Will Deacon will.dea...@arm.com Signed-off-by: Marc Zyngier marc.zyng...@arm.com I gave this a quick spin on Juno as well and works as expected: Reviewed-by: Christoffer Dall christoffer.d...@linaro.org ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On 14 May 2015 at 13:28, Paolo Bonzini pbonz...@redhat.com wrote: Well, PCI BARs are generally MMIO resources, and hence should not be cached. As an optimization, OS drivers can mark them as cacheable or write-combining or something like that, but in general it's a safe default to leave them uncached---one would think. Isn't this handled by the OS mapping them in the 'prefetchable' MMIO window rather than the 'non-prefetchable' one? (QEMU's generic-PCIe device doesn't yet support the prefetchable window.) -- PMM ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On Thu, May 14, 2015 at 02:08:49PM +0200, Paolo Bonzini wrote: On 14/05/2015 14:00, Christoffer Dall wrote: So, getting back to my original question. Is the point then that UEFI must assume (from ACPI/DT) the cache-coherency properties of the PCI controller which exists in hardware on the system you're running on, even for the virtual PCI bus because that will be the semantics for assigned devices? And in that case, we have no way to distinguish between passthrough devices and virtual devices plugged into the virtual PCI bus? Well, we could use the subsystem id. But it's a hack, and may cause incompatibilities with some drivers. Michael, any ideas? What about the idea of having two virtual PCI buses on your system where one is always cache-coherent and uses for virtual devices, and the other is whatever the hardware is and used for passthrough devices? I think that was rejected before. Do you remember where? I just remember Catalin mentioning the idea to me verbally. Besides the slightly heavy added use of resources etc. it seems like it would address some of our issues in a good way. But I'm still not sure why UEFI/Linux currently sees our PCI bus as being non-coherent when in fact it is and we have no passthrough issues currently. Are all PCI controllers always non-coherent for some reason and therefore we model it as such too? -Christoffer ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On Thu, May 14, 2015 at 02:28:49PM +0200, Paolo Bonzini wrote: On 14/05/2015 14:24, Christoffer Dall wrote: On Thu, May 14, 2015 at 02:08:49PM +0200, Paolo Bonzini wrote: On 14/05/2015 14:00, Christoffer Dall wrote: So, getting back to my original question. Is the point then that UEFI must assume (from ACPI/DT) the cache-coherency properties of the PCI controller which exists in hardware on the system you're running on, even for the virtual PCI bus because that will be the semantics for assigned devices? And in that case, we have no way to distinguish between passthrough devices and virtual devices plugged into the virtual PCI bus? Well, we could use the subsystem id. But it's a hack, and may cause incompatibilities with some drivers. Michael, any ideas? What about the idea of having two virtual PCI buses on your system where one is always cache-coherent and uses for virtual devices, and the other is whatever the hardware is and used for passthrough devices? I think that was rejected before. Do you remember where? I just remember Catalin mentioning the idea to me verbally. In the last centithread on the subject. :) At least I and Peter disagreed. It's not about the heavy added use of resources, it's more about it being really easy to misconfigure. But I'm still not sure why UEFI/Linux currently sees our PCI bus as being non-coherent when in fact it is and we have no passthrough issues currently. Are all PCI controllers always non-coherent for some reason and therefore we model it as such too? Well, PCI BARs are generally MMIO resources, and hence should not be cached. As an optimization, OS drivers can mark them as cacheable or write-combining or something like that, but in general it's a safe default to leave them uncached---one would think. ok, I guess this series makes sense then, assuming it works, and assuming we don't kill performance by going to RAM all the time when we don't have to... Thanks, -Christoffer ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On Thu, May 14, 2015 at 01:38:11PM +0100, Peter Maydell wrote: On 14 May 2015 at 13:28, Paolo Bonzini pbonz...@redhat.com wrote: Well, PCI BARs are generally MMIO resources, and hence should not be cached. As an optimization, OS drivers can mark them as cacheable or write-combining or something like that, but in general it's a safe default to leave them uncached---one would think. Isn't this handled by the OS mapping them in the 'prefetchable' MMIO window rather than the 'non-prefetchable' one? (QEMU's generic-PCIe device doesn't yet support the prefetchable window.) I was thinking (with my limited PCI knowledge) the same thing, and was planning on experimenting with that. drew ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On 14 May 2015 at 14:03, Andrew Jones drjo...@redhat.com wrote: On Thu, May 14, 2015 at 11:37:46AM +0100, Peter Maydell wrote: On 14 May 2015 at 11:31, Andrew Jones drjo...@redhat.com wrote: Forgot to (4): switch from setting userspace's mapping to device memory to normal, non-cacheable. Using device memory caused a problem that Alex Graf found, and Peter Maydell suggested using normal, non-cacheable instead. Did you check that non-cacheable is definitely the correct kind of Normal memory attribute we want? (ie not write-through). I was concerned that write-through wouldn't be sufficient. If the guest writes to its non-cached memory, and QEMU needs to see what it wrote, then won't write-through fail to work? Unless we some how invalidate the cache first? Well, I meant more that the correct mapping for userspace is the same as the guest, whatever that is, and so somebody needs to look at what the guest actually does rather than merely hoping NormalNC is OK. (For instance, do we need to provide support for QEMU to map both NC and writethrough?) -- PMM ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC/RFT PATCH v2 1/3] arm/arm64: pageattr: add set_memory_nc
On Thu, May 14, 2015 at 01:05:09PM +0200, Christoffer Dall wrote: On Wed, May 13, 2015 at 01:31:52PM +0200, Andrew Jones wrote: Provide a method to change normal, cacheable memory to non-cacheable. KVM will make use of this to keep emulated device memory regions coherent with the guest. Signed-off-by: Andrew Jones drjo...@redhat.com Reviewed-by: Christoffer Dall christoffer.d...@linaro.org But you obviously need Russell and Will/Catalin to ack/merge this. I guess this patch is going to go away in the next round. You've pointed out that I screwed stuff up royally with my over eagerness to reuse code. I need to reimplement change_memory_common, but a version that takes an mm, which is more or less what I did in the last version of this series, back when I was pinning pages. drew ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On 05/14/15 14:34, Christoffer Dall wrote: On Thu, May 14, 2015 at 02:28:49PM +0200, Paolo Bonzini wrote: On 14/05/2015 14:24, Christoffer Dall wrote: On Thu, May 14, 2015 at 02:08:49PM +0200, Paolo Bonzini wrote: On 14/05/2015 14:00, Christoffer Dall wrote: So, getting back to my original question. Is the point then that UEFI must assume (from ACPI/DT) the cache-coherency properties of the PCI controller which exists in hardware on the system you're running on, even for the virtual PCI bus because that will be the semantics for assigned devices? And in that case, we have no way to distinguish between passthrough devices and virtual devices plugged into the virtual PCI bus? Well, we could use the subsystem id. But it's a hack, and may cause incompatibilities with some drivers. Michael, any ideas? What about the idea of having two virtual PCI buses on your system where one is always cache-coherent and uses for virtual devices, and the other is whatever the hardware is and used for passthrough devices? I think that was rejected before. Do you remember where? I just remember Catalin mentioning the idea to me verbally. In the last centithread on the subject. :) At least I and Peter disagreed. It's not about the heavy added use of resources, it's more about it being really easy to misconfigure. But I'm still not sure why UEFI/Linux currently sees our PCI bus as being non-coherent when in fact it is and we have no passthrough issues currently. Are all PCI controllers always non-coherent for some reason and therefore we model it as such too? Well, PCI BARs are generally MMIO resources, and hence should not be cached. As an optimization, OS drivers can mark them as cacheable or write-combining or something like that, but in general it's a safe default to leave them uncached---one would think. ok, I guess this series makes sense then, assuming it works, and assuming we don't kill performance by going to RAM all the time when we don't have to... The idea Paolo and I had discussed in the past is: - Remove the kludge from UEFI, and map all MMIO BARs as uncached by default. This should be a theoretically correct approach, and for assigned devices, correct in practice too. - At an appropriate place in the firmware (specifically, right before this line: [1]), when PCI devices have been enumerated, but their particular drivers (especially VGA) have not been connected yet, check the subsystem id / vendor id / etc for each, and if we can tell it's virtual, then set the attributes for all of its MMIO bars to writeback. It doesn't seem hard to implement, I've just been shying away from actually coding it up because I'd like to see it make a difference for an actual assigned device. That is, reproducing the current (statically kludged) behavior wouldn't be hard, but I prefer not to write a new patch until I can test it both ways. UC is broken and WB works for virtual devices, fine; now let's see if the exact reverse holds for assigned devices in reality. ... Testing of which will require someone to send a PCI card (NIC or GPU) -- with an AARCH64 UEFI driver oprom on it -- to my place, so that I can plug into my Mustang. ;) Thanks Laszlo [1] https://github.com/tianocore/edk2/blob/99d9ade85aad554a0fa08fff8586b0fd40570ac3/ArmPlatformPkg/ArmVirtualizationPkg/Library/PlatformIntelBdsLib/IntelBdsPlatform.c#L366 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On 14/05/2015 14:00, Christoffer Dall wrote: So, getting back to my original question. Is the point then that UEFI must assume (from ACPI/DT) the cache-coherency properties of the PCI controller which exists in hardware on the system you're running on, even for the virtual PCI bus because that will be the semantics for assigned devices? And in that case, we have no way to distinguish between passthrough devices and virtual devices plugged into the virtual PCI bus? Well, we could use the subsystem id. But it's a hack, and may cause incompatibilities with some drivers. Michael, any ideas? What about the idea of having two virtual PCI buses on your system where one is always cache-coherent and uses for virtual devices, and the other is whatever the hardware is and used for passthrough devices? I think that was rejected before. Paolo ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On Thu, May 14, 2015 at 03:32:10PM +0200, Laszlo Ersek wrote: On 05/14/15 15:00, Andrew Jones wrote: On Thu, May 14, 2015 at 01:38:11PM +0100, Peter Maydell wrote: On 14 May 2015 at 13:28, Paolo Bonzini pbonz...@redhat.com wrote: Well, PCI BARs are generally MMIO resources, and hence should not be cached. As an optimization, OS drivers can mark them as cacheable or write-combining or something like that, but in general it's a safe default to leave them uncached---one would think. Isn't this handled by the OS mapping them in the 'prefetchable' MMIO window rather than the 'non-prefetchable' one? (QEMU's generic-PCIe device doesn't yet support the prefetchable window.) I was thinking (with my limited PCI knowledge) the same thing, and was planning on experimenting with that. This could be supported in UEFI as well, with the following steps: - the DTB that QEMU provides UEFI with should advertise such a prefetchable window. - The driver in UEFI that parses the DTB should understand that DTB node (well, record type), and store the appropriate base size into some new dynamic PCDs (= basically, firmware wide global variables; PCD = platform configuration database) - The entry point of the host bridge driver would call gDS-AddMemorySpace() twice, separately for the two different windows, with their appropriate caching attributes. - The host bridge driver needs to be extended so that TypePMem32 requests are not rejected (like now); they should be handled similarly to TypeMem32. Except, the gDS-AllocateMemorySpace() call should allocate from the prefetchable range (determined by the new PCDs above). - QEMU's emulated devices should then expose their BARs as prefetchable (so that the above branch would be taken in the host bridge driver). (Of course, if QEMU intends to emulate PCI devices somewhat realistically, then QEMU should claim non-prefetchable for BARs that would not be prefetchable on physical hardware either, and then the hypervisor should accommodate the firmware's UC mapping and say hey I know better, we're virtual in fact, and override the attribute (- use WB instead of UC). With which we'd be back to square one...) Thanks Laszlo Prefetcheable is unrelated to BAR caching or drivers, it's a way to tell host bridges they can do limited tweaks to downstream transactions in a specific range. Really non-prefetcheable BARs are mostly those where read has side-effects, which is best avoided. this does not mean it's ok to reorder transactions or cache them. -- MST ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 1/5] VFIO: platform: add reset_list and register/unregister functions
On Thu, 2015-05-14 at 10:25 +0200, Eric Auger wrote: Hi Alex, On 05/13/2015 08:32 PM, Alex Williamson wrote: On Thu, 2015-05-07 at 16:27 +0200, Eric Auger wrote: vfio_platform_common now stores a lists of available reset functions. Two functions are exposed to register/unregister a reset function. A reset function is paired with a compat string. Signed-off-by: Eric Auger eric.au...@linaro.org --- drivers/vfio/platform/vfio_platform_common.c | 63 +++ drivers/vfio/platform/vfio_platform_private.h | 13 ++ 2 files changed, 76 insertions(+) diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c index abcff7a..edbf24c 100644 --- a/drivers/vfio/platform/vfio_platform_common.c +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -23,6 +23,9 @@ #include vfio_platform_private.h +struct list_head reset_list; +LIST_HEAD(reset_list); + Redundant? Static? static, yes static DEFINE_MUTEX(driver_lock); static int vfio_platform_regions_init(struct vfio_platform_device *vdev) @@ -511,6 +514,13 @@ EXPORT_SYMBOL_GPL(vfio_platform_probe_common); struct vfio_platform_device *vfio_platform_remove_common(struct device *dev) { struct vfio_platform_device *vdev; + struct vfio_platform_reset_node *iter, *tmp; + + list_for_each_entry_safe(iter, tmp, reset_list, link) { + list_del(iter-link); + kfree(iter-compat); + kfree(iter); + } This doesn't make sense. We allow reset functions to be registered and unregistered, but we forget them all when any device is released?! I acknowledge indeed. Can I rely on the reset module exit and associated unregister_reset or shall I take this action in the vfio driver itself, core? If we were to load the reset modules via request_module() from vfio-platform, I think they would stay loaded as long as vfio-platform is loaded and automatically unload if vfio-platform is unloaded. Our reference via try_module_get() for an in-use reset function should prevent the module from being unloaded while we're dependent on it. I think that covers everything we need. A user is free to rmmod the reset module, but if it's in use it shouldn't work, and we can request it again for the next device. vdev = vfio_del_group_dev(dev); if (vdev) @@ -519,3 +529,56 @@ struct vfio_platform_device *vfio_platform_remove_common(struct device *dev) return vdev; } EXPORT_SYMBOL_GPL(vfio_platform_remove_common); + +int vfio_platform_register_reset(char *compat, struct module *reset_owner, + vfio_platform_reset_fn_t reset) +{ + struct vfio_platform_reset_node *node, *iter; + bool found = false; + + list_for_each_entry(iter, reset_list, link) { + if (!strcmp(iter-compat, compat)) { + found = true; Just return errno here ok + break; + } + } + if (found) + return -EINVAL; + + node = kmalloc(sizeof(*node), GFP_KERNEL); + if (!node) + return -ENOMEM; + + node-compat = kstrdup(compat, GFP_KERNEL); + if (!node-compat) Leaking node ok + return -ENOMEM; + + node-owner = reset_owner; + node-reset = reset; + + list_add(node-link, reset_list); Isn't this racy? Don't we need some locks around the list? I will add a lock to protect access to the list. + return 0; +} +EXPORT_SYMBOL_GPL(vfio_platform_register_reset); + +int vfio_platform_unregister_reset(char *compat) +{ + struct vfio_platform_reset_node *iter; + bool found = false; + + list_for_each_entry(iter, reset_list, link) { + if (!strcmp(iter-compat, compat)) { Return errno here ok + found = true; + break; + } + } + if (!found) + return -EINVAL; + + list_del(iter-link); Racy + kfree(iter-compat); + kfree(iter); + return 0; +} +EXPORT_SYMBOL_GPL(vfio_platform_unregister_reset); + diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h index 5d31e04..da2d60b 100644 --- a/drivers/vfio/platform/vfio_platform_private.h +++ b/drivers/vfio/platform/vfio_platform_private.h @@ -69,6 +69,15 @@ struct vfio_platform_device { int (*get_irq)(struct vfio_platform_device *vdev, int i); }; +typedef int (*vfio_platform_reset_fn_t)(struct vfio_platform_device *vdev); Seems like this ought to be in a non-private header if we're exporting the [un]register functions. I considered the vfio reset modules were internal to the vfio subsystem but if you prefer I can expose that in vfio.h. I guess register/unregister should become an external API then? The patch descriptions implied that it was intended for external reset
Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On 05/14/15 12:30, Christoffer Dall wrote: On Wed, May 13, 2015 at 01:31:51PM +0200, Andrew Jones wrote: Introduce a new memory region flag, KVM_MEM_UNCACHED, which is needed by ARM. This flag informs KVM that the given memory region is typically mapped by the guest as non-cacheable. KVM for ARM then ensures that that memory is indeed mapped non-cacheable by the guest, and also remaps that region as non-cacheable for userspace, allowing them both to maintain a coherent view. Changes since v1: 1) don't pin pages [Paolo] 2) ensure the guest maps the memory non-cacheable [me] 3) clean up memslot flag documentation [Christoffer] changes 1 and 2 effectively redesigned/rewrote v1. Find v1 here http://www.spinics.net/lists/kvm-arm/msg14022.html The QEMU series for v1 hasn't really changed. Only the linux header hack needed to bump KVM_CAP_UNCACHED_MEM from 107 to 116. Find the series here http://www.spinics.net/lists/kvm-arm/msg14026.html Testing: This series still needs lots of testing, but I thought I'd kick it to the list early, as there's been recent interest in solving this problem, and I'd like to get test results and opinions on this approach from others sooner than later. I've tested with AAVMF (UEFI for AArch64 mach-virt guests). AAVMF has a kludge in it to avoid the coherency problem. How does the 'kludge' work? https://github.com/tianocore/edk2/commit/f9a8be42 (It's probably worth looking at the documentation in the first hunk too, under the commit message.) Thanks Laszlo I've tested both with and without that kludge active. Both worked for me (almost). Sometimes with the non-kludged version I was still able to see a bit of corruption in grub's output after edk2 loaded it - not much, and not always, but something. Remind me, this is a VGA framebuffer corruption with a PCI-plugged VGA card? Thanks, -Christoffer Anyway, it's quite frustrating, as I'm not sure what I'm missing... This series applies to Linus' 110bc76729d4, but I tested with a version backported to the current RHELSA kernel. Thanks for reviews and testing! drew Andrew Jones (3): arm/arm64: pageattr: add set_memory_nc KVM: promote KVM_MEMSLOT_INCOHERENT to uapi arm/arm64: KVM: implement 'uncached' mem coherency Documentation/virtual/kvm/api.txt | 20 -- arch/arm/include/asm/cacheflush.h | 1 + arch/arm/include/asm/kvm_mmu.h| 5 - arch/arm/include/asm/pgtable-3level.h | 1 + arch/arm/include/asm/pgtable.h| 1 + arch/arm/include/uapi/asm/kvm.h | 1 + arch/arm/kvm/arm.c| 1 + arch/arm/kvm/mmu.c| 39 ++- arch/arm/mm/pageattr.c| 7 +++ arch/arm64/include/asm/cacheflush.h | 1 + arch/arm64/include/asm/kvm_mmu.h | 5 - arch/arm64/include/asm/memory.h | 1 + arch/arm64/include/asm/pgtable.h | 1 + arch/arm64/include/uapi/asm/kvm.h | 1 + arch/arm64/mm/pageattr.c | 8 +++ include/linux/kvm_host.h | 1 - include/uapi/linux/kvm.h | 2 ++ virt/kvm/kvm_main.c | 7 ++- 18 files changed, 79 insertions(+), 24 deletions(-) -- 2.1.0 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On Thu, May 14, 2015 at 01:09:34PM +0200, Laszlo Ersek wrote: On 05/14/15 12:30, Christoffer Dall wrote: On Wed, May 13, 2015 at 01:31:51PM +0200, Andrew Jones wrote: Introduce a new memory region flag, KVM_MEM_UNCACHED, which is needed by ARM. This flag informs KVM that the given memory region is typically mapped by the guest as non-cacheable. KVM for ARM then ensures that that memory is indeed mapped non-cacheable by the guest, and also remaps that region as non-cacheable for userspace, allowing them both to maintain a coherent view. Changes since v1: 1) don't pin pages [Paolo] 2) ensure the guest maps the memory non-cacheable [me] 3) clean up memslot flag documentation [Christoffer] changes 1 and 2 effectively redesigned/rewrote v1. Find v1 here http://www.spinics.net/lists/kvm-arm/msg14022.html The QEMU series for v1 hasn't really changed. Only the linux header hack needed to bump KVM_CAP_UNCACHED_MEM from 107 to 116. Find the series here http://www.spinics.net/lists/kvm-arm/msg14026.html Testing: This series still needs lots of testing, but I thought I'd kick it to the list early, as there's been recent interest in solving this problem, and I'd like to get test results and opinions on this approach from others sooner than later. I've tested with AAVMF (UEFI for AArch64 mach-virt guests). AAVMF has a kludge in it to avoid the coherency problem. How does the 'kludge' work? https://github.com/tianocore/edk2/commit/f9a8be42 (It's probably worth looking at the documentation in the first hunk too, under the commit message.) Why is this a hack/unintuitive? Is the semantics of the QEMU PCI bus not simply that MMIO regions are coherent? -Christoffer ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On 14/05/2015 13:29, Christoffer Dall wrote: (It's probably worth looking at the documentation in the first hunk too, under the commit message.) Why is this a hack/unintuitive? Is the semantics of the QEMU PCI bus not simply that MMIO regions are coherent? Only until device assignment gets into the picture. Paolo ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On Thu, May 14, 2015 at 01:31:03PM +0200, Paolo Bonzini wrote: On 14/05/2015 13:29, Christoffer Dall wrote: (It's probably worth looking at the documentation in the first hunk too, under the commit message.) Why is this a hack/unintuitive? Is the semantics of the QEMU PCI bus not simply that MMIO regions are coherent? Only until device assignment gets into the picture. Will UEFI have to deal with device assignment in any respect? -Christoffer ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On 14/05/2015 13:36, Christoffer Dall wrote: (It's probably worth looking at the documentation in the first hunk too, under the commit message.) Why is this a hack/unintuitive? Is the semantics of the QEMU PCI bus not simply that MMIO regions are coherent? Only until device assignment gets into the picture. Will UEFI have to deal with device assignment in any respect? Why not? For example you could do network boot from an assigned network card. In fact, anything that UEFI has to deal with, the OS has to deal with too. If you need a UEFI hack, chances are you need or will need a Linux hack too. Paolo ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 2/5] VFIO: platform: add get_device callback
On 05/13/2015 08:32 PM, Alex Williamson wrote: On Thu, 2015-05-07 at 16:27 +0200, Eric Auger wrote: It is needed to introduce a new callback enabling to retrieve the struct device* from the vfio_platform_device. Implementation depends on the underlying device, platform or amba. This will be used to retrieve the compatibility string of the device. Signed-off-by: Eric Auger eric.au...@linaro.org --- drivers/vfio/platform/vfio_amba.c | 9 + drivers/vfio/platform/vfio_platform.c | 10 ++ drivers/vfio/platform/vfio_platform_private.h | 1 + 3 files changed, 20 insertions(+) diff --git a/drivers/vfio/platform/vfio_amba.c b/drivers/vfio/platform/vfio_amba.c index ff0331f..fd68115 100644 --- a/drivers/vfio/platform/vfio_amba.c +++ b/drivers/vfio/platform/vfio_amba.c @@ -48,6 +48,14 @@ static int get_amba_irq(struct vfio_platform_device *vdev, int i) return ret ? ret : -ENXIO; } +static struct device *vfio_amba_get_device(struct vfio_platform_device *vdev) +{ +struct amba_device *pdev = (struct amba_device *) vdev-opaque; +struct device *dev = pdev-dev; + +return dev; +} *get* device implies a increment in the reference count. There's nothing like that happening here. Do you just want a to_device wrapper? Yes you already told that in the past and I should have learnt ... Why doesn't struct vfio_platform_device just cache a pointer to the struct device? simpler indeed if agreed. + static int vfio_amba_probe(struct amba_device *adev, const struct amba_id *id) { struct vfio_platform_device *vdev; @@ -67,6 +75,7 @@ static int vfio_amba_probe(struct amba_device *adev, const struct amba_id *id) vdev-flags = VFIO_DEVICE_FLAGS_AMBA; vdev-get_resource = get_amba_resource; vdev-get_irq = get_amba_irq; +vdev-get_device = vfio_amba_get_device; ret = vfio_platform_probe_common(vdev, adev-dev); if (ret) { diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c index cef645c..c025d76 100644 --- a/drivers/vfio/platform/vfio_platform.c +++ b/drivers/vfio/platform/vfio_platform.c @@ -51,6 +51,15 @@ static int get_platform_irq(struct vfio_platform_device *vdev, int i) return platform_get_irq(pdev, i); } +static struct device *vfio_platform_get_device( +struct vfio_platform_device *vdev) +{ +struct platform_device *pdev = (struct platform_device *) vdev-opaque; +struct device *dev = pdev-dev; + +return dev; +} + static int vfio_platform_probe(struct platform_device *pdev) { struct vfio_platform_device *vdev; @@ -65,6 +74,7 @@ static int vfio_platform_probe(struct platform_device *pdev) vdev-flags = VFIO_DEVICE_FLAGS_PLATFORM; vdev-get_resource = get_platform_resource; vdev-get_irq = get_platform_irq; +vdev-get_device = vfio_platform_get_device; ret = vfio_platform_probe_common(vdev, pdev-dev); if (ret) diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h index da2d60b..68909a4 100644 --- a/drivers/vfio/platform/vfio_platform_private.h +++ b/drivers/vfio/platform/vfio_platform_private.h @@ -67,6 +67,7 @@ struct vfio_platform_device { struct resource* (*get_resource)(struct vfio_platform_device *vdev, int i); int (*get_irq)(struct vfio_platform_device *vdev, int i); +struct device *(*get_device)(struct vfio_platform_device *vdev); }; typedef int (*vfio_platform_reset_fn_t)(struct vfio_platform_device *vdev); ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 5/5] VFIO: platform: VFIO platform Calxeda xgmac reset module
Alex, On 05/13/2015 08:33 PM, Alex Williamson wrote: On Thu, 2015-05-07 at 16:27 +0200, Eric Auger wrote: This patch introduces a module that registers and implements a basic reset function for the Calxeda xgmac device. This latter basically disables interrupts and stops DMA transfers. The reset function code is inherited from the native calxeda xgmac driver. Signed-off-by: Eric Auger eric.au...@linaro.org --- drivers/vfio/platform/Kconfig | 2 + drivers/vfio/platform/Makefile | 2 + drivers/vfio/platform/reset/Kconfig| 7 ++ drivers/vfio/platform/reset/Makefile | 5 + .../platform/reset/vfio_platform_calxedaxgmac.c| 121 + 5 files changed, 137 insertions(+) create mode 100644 drivers/vfio/platform/reset/Kconfig create mode 100644 drivers/vfio/platform/reset/Makefile create mode 100644 drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig index 9a4403e..1df7477 100644 --- a/drivers/vfio/platform/Kconfig +++ b/drivers/vfio/platform/Kconfig @@ -18,3 +18,5 @@ config VFIO_AMBA framework. If you don't know what to do here, say N. + +source drivers/vfio/platform/reset/Kconfig diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile index 81de144..9ce8afe 100644 --- a/drivers/vfio/platform/Makefile +++ b/drivers/vfio/platform/Makefile @@ -2,7 +2,9 @@ vfio-platform-y := vfio_platform.o vfio_platform_common.o vfio_platform_irq.o obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o +obj-$(CONFIG_VFIO_PLATFORM) += reset/ vfio-amba-y := vfio_amba.o obj-$(CONFIG_VFIO_AMBA) += vfio-amba.o +obj-$(CONFIG_VFIO_AMBA) += reset/ diff --git a/drivers/vfio/platform/reset/Kconfig b/drivers/vfio/platform/reset/Kconfig new file mode 100644 index 000..746b96b --- /dev/null +++ b/drivers/vfio/platform/reset/Kconfig @@ -0,0 +1,7 @@ +config VFIO_PLATFORM_CALXEDAXGMAC_RESET +tristate VFIO support for calxeda xgmac reset +depends on VFIO_PLATFORM +help + Enables the VFIO platform driver to handle reset for Calxeda xgmac + + If you don't know what to do here, say N. diff --git a/drivers/vfio/platform/reset/Makefile b/drivers/vfio/platform/reset/Makefile new file mode 100644 index 000..2a486af --- /dev/null +++ b/drivers/vfio/platform/reset/Makefile @@ -0,0 +1,5 @@ +vfio-platform-calxedaxgmac-y := vfio_platform_calxedaxgmac.o + +ccflags-y += -Idrivers/vfio/platform + +obj-$(CONFIG_VFIO_PLATFORM_CALXEDAXGMAC_RESET) += vfio-platform-calxedaxgmac.o diff --git a/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c b/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c new file mode 100644 index 000..3c6e428 --- /dev/null +++ b/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c @@ -0,0 +1,121 @@ +/* + * VFIO platform driver specialized for Calxeda xgmac reset + * reset code is inherited from calxeda xgmac native driver + * + * Copyright 2010-2011 Calxeda, Inc. + * Copyright (c) 2015 Linaro Ltd. + * www.linaro.org + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. + */ + +#include linux/module.h +#include linux/kernel.h +#include linux/init.h +#include linux/io.h + +#include vfio_platform_private.h + +#define DRIVER_VERSION 0.1 +#define DRIVER_AUTHOR Eric Auger eric.au...@linaro.org +#define DRIVER_DESC Reset support for Calxeda xgmac vfio platform device + +#define CALXEDAXGMAC_COMPAT calxeda,hb-xgmac + +/* XGMAC Register definitions */ +#define XGMAC_CONTROL 0x /* MAC Configuration */ + +/* DMA Control and Status Registers */ +#define XGMAC_DMA_CONTROL 0x0f18 /* Ctrl (Operational Mode) */ +#define XGMAC_DMA_INTR_ENA 0x0f1c /* Interrupt Enable */ + +/* DMA Control registe defines */ +#define DMA_CONTROL_ST 0x2000 /* Start/Stop Transmission */ +#define DMA_CONTROL_SR 0x0002 /* Start/Stop Receive */ + +/* Common MAC defines */ +#define MAC_ENABLE_TX 0x0008 /* Transmitter Enable */ +#define MAC_ENABLE_RX 0x0004 /* Receiver Enable */ + +static inline void xgmac_mac_disable(void __iomem *ioaddr) +{ +u32 value = readl(ioaddr + XGMAC_DMA_CONTROL);
Re: [PATCH 1/5] VFIO: platform: add reset_list and register/unregister functions
Hi Alex, On 05/13/2015 08:32 PM, Alex Williamson wrote: On Thu, 2015-05-07 at 16:27 +0200, Eric Auger wrote: vfio_platform_common now stores a lists of available reset functions. Two functions are exposed to register/unregister a reset function. A reset function is paired with a compat string. Signed-off-by: Eric Auger eric.au...@linaro.org --- drivers/vfio/platform/vfio_platform_common.c | 63 +++ drivers/vfio/platform/vfio_platform_private.h | 13 ++ 2 files changed, 76 insertions(+) diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c index abcff7a..edbf24c 100644 --- a/drivers/vfio/platform/vfio_platform_common.c +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -23,6 +23,9 @@ #include vfio_platform_private.h +struct list_head reset_list; +LIST_HEAD(reset_list); + Redundant? Static? static, yes static DEFINE_MUTEX(driver_lock); static int vfio_platform_regions_init(struct vfio_platform_device *vdev) @@ -511,6 +514,13 @@ EXPORT_SYMBOL_GPL(vfio_platform_probe_common); struct vfio_platform_device *vfio_platform_remove_common(struct device *dev) { struct vfio_platform_device *vdev; +struct vfio_platform_reset_node *iter, *tmp; + +list_for_each_entry_safe(iter, tmp, reset_list, link) { +list_del(iter-link); +kfree(iter-compat); +kfree(iter); +} This doesn't make sense. We allow reset functions to be registered and unregistered, but we forget them all when any device is released?! I acknowledge indeed. Can I rely on the reset module exit and associated unregister_reset or shall I take this action in the vfio driver itself, core? vdev = vfio_del_group_dev(dev); if (vdev) @@ -519,3 +529,56 @@ struct vfio_platform_device *vfio_platform_remove_common(struct device *dev) return vdev; } EXPORT_SYMBOL_GPL(vfio_platform_remove_common); + +int vfio_platform_register_reset(char *compat, struct module *reset_owner, + vfio_platform_reset_fn_t reset) +{ +struct vfio_platform_reset_node *node, *iter; +bool found = false; + +list_for_each_entry(iter, reset_list, link) { +if (!strcmp(iter-compat, compat)) { +found = true; Just return errno here ok +break; +} +} +if (found) +return -EINVAL; + +node = kmalloc(sizeof(*node), GFP_KERNEL); +if (!node) +return -ENOMEM; + +node-compat = kstrdup(compat, GFP_KERNEL); +if (!node-compat) Leaking node ok +return -ENOMEM; + +node-owner = reset_owner; +node-reset = reset; + +list_add(node-link, reset_list); Isn't this racy? Don't we need some locks around the list? I will add a lock to protect access to the list. +return 0; +} +EXPORT_SYMBOL_GPL(vfio_platform_register_reset); + +int vfio_platform_unregister_reset(char *compat) +{ +struct vfio_platform_reset_node *iter; +bool found = false; + +list_for_each_entry(iter, reset_list, link) { +if (!strcmp(iter-compat, compat)) { Return errno here ok +found = true; +break; +} +} +if (!found) +return -EINVAL; + +list_del(iter-link); Racy +kfree(iter-compat); +kfree(iter); +return 0; +} +EXPORT_SYMBOL_GPL(vfio_platform_unregister_reset); + diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h index 5d31e04..da2d60b 100644 --- a/drivers/vfio/platform/vfio_platform_private.h +++ b/drivers/vfio/platform/vfio_platform_private.h @@ -69,6 +69,15 @@ struct vfio_platform_device { int (*get_irq)(struct vfio_platform_device *vdev, int i); }; +typedef int (*vfio_platform_reset_fn_t)(struct vfio_platform_device *vdev); Seems like this ought to be in a non-private header if we're exporting the [un]register functions. I considered the vfio reset modules were internal to the vfio subsystem but if you prefer I can expose that in vfio.h. I guess register/unregister should become an external API then? Thanks Eric + +struct vfio_platform_reset_node { +struct list_head link; +char *compat; +struct module *owner; +vfio_platform_reset_fn_t reset; +}; + extern int vfio_platform_probe_common(struct vfio_platform_device *vdev, struct device *dev); extern struct vfio_platform_device *vfio_platform_remove_common @@ -82,4 +91,8 @@ extern int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev, unsigned start, unsigned count, void *data); +extern int vfio_platform_register_reset(char *compat, struct module *owner, +
Re: [PATCH 3/5] VFIO: platform: add reset callback
On 05/13/2015 08:32 PM, Alex Williamson wrote: On Thu, 2015-05-07 at 16:27 +0200, Eric Auger wrote: A new reset callback is introduced. If this callback is populated, the reset is invoked on device release or upon userspace ioctl. The modality is exposed on VFIO_DEVICE_GET_INFO. Signed-off-by: Eric Auger eric.au...@linaro.org --- v2 - v3: - change patch title and commit message. Use IS_ERR_OR_NULL to anticipate for getter returned value. v1 - v2: - On VFIO_DEVICE_RESET returns -EINVAL in case the callback is not populated (instead of -ENOTTY) --- drivers/vfio/platform/vfio_platform_common.c | 12 ++-- drivers/vfio/platform/vfio_platform_private.h | 1 + 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c index edbf24c..0d10018 100644 --- a/drivers/vfio/platform/vfio_platform_common.c +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -103,6 +103,8 @@ static void vfio_platform_release(void *device_data) mutex_lock(driver_lock); if (!(--vdev-refcnt)) { +if (!IS_ERR_OR_NULL(vdev-reset)) +vdev-reset(vdev); It seems sloppy to have a function pointer that could be ERR_PTR. The reset lookup function currently can return errors. I will change this. Should we also be trying to reset on device open? may be yes. I was influenced by qemu use case where reset is also explicitly invoked through iotcl by the reset notifier. If we do so we might have duplicated resets. - Eric vfio_platform_regions_cleanup(vdev); vfio_platform_irq_cleanup(vdev); } @@ -162,6 +164,8 @@ static long vfio_platform_ioctl(void *device_data, if (info.argsz minsz) return -EINVAL; +if (!IS_ERR_OR_NULL(vdev-reset)) +vdev-flags |= VFIO_DEVICE_FLAGS_RESET; info.flags = vdev-flags; info.num_regions = vdev-num_regions; info.num_irqs = vdev-num_irqs; @@ -255,8 +259,12 @@ static long vfio_platform_ioctl(void *device_data, return ret; -} else if (cmd == VFIO_DEVICE_RESET) -return -EINVAL; +} else if (cmd == VFIO_DEVICE_RESET) { +if (!IS_ERR_OR_NULL(vdev-reset)) +return vdev-reset(vdev); +else +return -EINVAL; +} return -ENOTTY; } diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h index 68909a4..84ccf6d 100644 --- a/drivers/vfio/platform/vfio_platform_private.h +++ b/drivers/vfio/platform/vfio_platform_private.h @@ -68,6 +68,7 @@ struct vfio_platform_device { (*get_resource)(struct vfio_platform_device *vdev, int i); int (*get_irq)(struct vfio_platform_device *vdev, int i); struct device *(*get_device)(struct vfio_platform_device *vdev); +int (*reset)(struct vfio_platform_device *vdev); }; typedef int (*vfio_platform_reset_fn_t)(struct vfio_platform_device *vdev); ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 4/5] VFIO: platform: populate reset function according to compat
On 05/13/2015 08:33 PM, Alex Williamson wrote: On Thu, 2015-05-07 at 16:27 +0200, Eric Auger wrote: Add the reset function lookup according to the device compat string. This lookup is added at different places: - on VFIO_DEVICE_GET_INFO - on VFIO_DEVICE_RESET - on device release A reference to the module implementing the reset function is taken on first reset function lookup and released on vfio platform device release. Signed-off-by: Eric Auger eric.au...@linaro.org --- drivers/vfio/platform/vfio_platform_common.c | 50 1 file changed, 50 insertions(+) diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c index 0d10018..bd7e44c 100644 --- a/drivers/vfio/platform/vfio_platform_common.c +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -28,6 +28,52 @@ LIST_HEAD(reset_list); static DEFINE_MUTEX(driver_lock); +static vfio_platform_reset_fn_t vfio_platform_lookup_reset(const char *compat) +{ +struct vfio_platform_reset_node *iter; + +list_for_each_entry(iter, reset_list, link) { Racy ok +if (!strcmp(iter-compat, compat) +try_module_get(iter-owner)) +return iter-reset; +} + +return ERR_PTR(-ENODEV); return NULL imo ok +} + +static vfio_platform_reset_fn_t vfio_platform_get_reset( +struct vfio_platform_device *vdev) +{ +struct device *dev = vdev-get_device(vdev); +const char *compat_str_array[2]; +vfio_platform_reset_fn_t reset; +int ret; + +if (!IS_ERR_OR_NULL(vdev-reset)) +return vdev-reset; + +ret = device_property_read_string_array(dev, compatible, +compat_str_array, 2); +if (!ret) +return ERR_PTR(ret); + +reset = vfio_platform_lookup_reset(compat_str_array[0]); +return reset; Something got allocated into compat_str_array and gets leaked here. is there any allocation? device_property_read_string_array does not return -ENOMEM. +} + +static void vfio_platform_put_reset(struct vfio_platform_device *vdev) +{ +struct vfio_platform_reset_node *iter; + +list_for_each_entry(iter, reset_list, link) { Racy ok +if (iter-reset == vdev-reset) { +module_put(iter-owner); +vdev-reset = NULL; +return; +} +} +} + static int vfio_platform_regions_init(struct vfio_platform_device *vdev) { int cnt = 0, i; @@ -103,10 +149,12 @@ static void vfio_platform_release(void *device_data) mutex_lock(driver_lock); if (!(--vdev-refcnt)) { +vdev-reset = vfio_platform_get_reset(vdev); if (!IS_ERR_OR_NULL(vdev-reset)) vdev-reset(vdev); vfio_platform_regions_cleanup(vdev); vfio_platform_irq_cleanup(vdev); +vfio_platform_put_reset(vdev); } mutex_unlock(driver_lock); @@ -164,6 +212,7 @@ static long vfio_platform_ioctl(void *device_data, if (info.argsz minsz) return -EINVAL; +vdev-reset = vfio_platform_get_reset(vdev); if (!IS_ERR_OR_NULL(vdev-reset)) vdev-flags |= VFIO_DEVICE_FLAGS_RESET; info.flags = vdev-flags; @@ -260,6 +309,7 @@ static long vfio_platform_ioctl(void *device_data, return ret; } else if (cmd == VFIO_DEVICE_RESET) { +vdev-reset = vfio_platform_get_reset(vdev); if (!IS_ERR_OR_NULL(vdev-reset)) return vdev-reset(vdev); else I count 3 gets and 1 put, isn't the module reference count increase showing that? vfio_platform_get_reset simply returns if the function pointer already is populated so there is no systematic ref increment. This looks like it hasn't been tested. It did testing with external and in-kernel modules through Why would we do a get every time we want to do a reset? My doubt were about the order of probing between the vfio-platform_driver and the vfio reset module? This question was the rationale of this implementation choice. But again the actual ref count increment is devised to be done once on the first entry point (iotcl or internal release) Do one get when the device is registered or opened and one put when the device is unregistered or closed. We don't want erratic userspace behavior that the reset property of a device can come and go. ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On Wed, May 13, 2015 at 01:31:51PM +0200, Andrew Jones wrote: Introduce a new memory region flag, KVM_MEM_UNCACHED, which is needed by ARM. This flag informs KVM that the given memory region is typically mapped by the guest as non-cacheable. KVM for ARM then ensures that that memory is indeed mapped non-cacheable by the guest, and also remaps that region as non-cacheable for userspace, allowing them both to maintain a coherent view. Changes since v1: 1) don't pin pages [Paolo] 2) ensure the guest maps the memory non-cacheable [me] 3) clean up memslot flag documentation [Christoffer] Forgot to (4): switch from setting userspace's mapping to device memory to normal, non-cacheable. Using device memory caused a problem that Alex Graf found, and Peter Maydell suggested using normal, non-cacheable instead. changes 1 and 2 effectively redesigned/rewrote v1. Find v1 here http://www.spinics.net/lists/kvm-arm/msg14022.html The QEMU series for v1 hasn't really changed. Only the linux header hack needed to bump KVM_CAP_UNCACHED_MEM from 107 to 116. Find the series here http://www.spinics.net/lists/kvm-arm/msg14026.html Testing: This series still needs lots of testing, but I thought I'd kick it to the list early, as there's been recent interest in solving this problem, and I'd like to get test results and opinions on this approach from others sooner than later. I've tested with AAVMF (UEFI for AArch64 mach-virt guests). AAVMF has a kludge in it to avoid the coherency problem. I've tested both with and without that kludge active. Both worked for me (almost). Sometimes with the non-kludged version I was still able to see a bit of corruption in grub's output after edk2 loaded it - not much, and not always, but something. Anyway, it's quite frustrating, as I'm not sure what I'm missing... This series applies to Linus' 110bc76729d4, but I tested with a version backported to the current RHELSA kernel. Thanks for reviews and testing! drew Andrew Jones (3): arm/arm64: pageattr: add set_memory_nc KVM: promote KVM_MEMSLOT_INCOHERENT to uapi arm/arm64: KVM: implement 'uncached' mem coherency Documentation/virtual/kvm/api.txt | 20 -- arch/arm/include/asm/cacheflush.h | 1 + arch/arm/include/asm/kvm_mmu.h| 5 - arch/arm/include/asm/pgtable-3level.h | 1 + arch/arm/include/asm/pgtable.h| 1 + arch/arm/include/uapi/asm/kvm.h | 1 + arch/arm/kvm/arm.c| 1 + arch/arm/kvm/mmu.c| 39 ++- arch/arm/mm/pageattr.c| 7 +++ arch/arm64/include/asm/cacheflush.h | 1 + arch/arm64/include/asm/kvm_mmu.h | 5 - arch/arm64/include/asm/memory.h | 1 + arch/arm64/include/asm/pgtable.h | 1 + arch/arm64/include/uapi/asm/kvm.h | 1 + arch/arm64/mm/pageattr.c | 8 +++ include/linux/kvm_host.h | 1 - include/uapi/linux/kvm.h | 2 ++ virt/kvm/kvm_main.c | 7 ++- 18 files changed, 79 insertions(+), 24 deletions(-) -- 2.1.0 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED
On 14 May 2015 at 11:31, Andrew Jones drjo...@redhat.com wrote: Forgot to (4): switch from setting userspace's mapping to device memory to normal, non-cacheable. Using device memory caused a problem that Alex Graf found, and Peter Maydell suggested using normal, non-cacheable instead. Did you check that non-cacheable is definitely the correct kind of Normal memory attribute we want? (ie not write-through). -- PMM ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC/RFT PATCH v2 3/3] arm/arm64: KVM: implement 'uncached' mem coherency
On Wed, May 13, 2015 at 01:31:54PM +0200, Andrew Jones wrote: When S1 and S2 memory attributes combine wrt to caching policy, non-cacheable types take precedence. If a guest maps a region as device memory, which KVM userspace is using to emulate the device using normal, cacheable memory, then we lose coherency. With KVM_MEM_UNCACHED, KVM userspace can now hint to KVM which memory regions are likely to be problematic. With this patch, as pages of these types of regions are faulted into the guest, not only do we flush the page's dcache, but we also change userspace's mapping to NC in order to maintain coherency. What if the guest doesn't do what we expect? While we can't force a guest to use cacheable memory, we can take advantage of the non-cacheable precedence, and force it to use non-cacheable. So, this patch also introduces PAGE_S2_NORMAL_NC, and uses it on KVM_MEM_UNCACHED regions to force them to NC. We now have both guest and userspace on the same page (pun intended) I'd like to revisit the overall approach here. Is doing non-cached accesses in both the guest and host really the right thing to do here? The semantics of the device becomes that it is cache coherent (because QEMU is), and I think Marc argued that Linux/UEFI should simply be adapted to handle whatever emulated devices we have as coherent. I also remember someone arguing that would be wrong (Peter?). Finally, does this address all cache coherency issues with emulated devices? Some VOS guys had seen things still not working with this approach, unsure why... I'd like to avoid us merging this only to merge a more complete solution in a few weeks which reverts this solution... More comments/questions below: Signed-off-by: Andrew Jones drjo...@redhat.com --- arch/arm/include/asm/kvm_mmu.h| 5 - arch/arm/include/asm/pgtable-3level.h | 1 + arch/arm/include/asm/pgtable.h| 1 + arch/arm/kvm/mmu.c| 37 +++ arch/arm64/include/asm/kvm_mmu.h | 5 - arch/arm64/include/asm/memory.h | 1 + arch/arm64/include/asm/pgtable.h | 1 + 7 files changed, 36 insertions(+), 15 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 405aa18833073..e8034a80b12e5 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -214,8 +214,11 @@ static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t pfn, while (size) { void *va = kmap_atomic_pfn(pfn); - if (need_flush) + if (need_flush) { kvm_flush_dcache_to_poc(va, PAGE_SIZE); + if (ipa_uncached) + set_memory_nc((unsigned long)va, 1); nit: consider moving this outside the need_flush + } if (icache_is_pipt()) __cpuc_coherent_user_range((unsigned long)va, diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h index a745a2a53853c..39b3f7a40e663 100644 --- a/arch/arm/include/asm/pgtable-3level.h +++ b/arch/arm/include/asm/pgtable-3level.h @@ -121,6 +121,7 @@ * 2nd stage PTE definitions for LPAE. */ #define L_PTE_S2_MT_UNCACHED (_AT(pteval_t, 0x0) 2) /* strongly ordered */ +#define L_PTE_S2_MT_NORMAL_NC(_AT(pteval_t, 0x5) 2) /* normal non-cacheable */ #define L_PTE_S2_MT_WRITETHROUGH (_AT(pteval_t, 0xa) 2) /* normal inner write-through */ #define L_PTE_S2_MT_WRITEBACK(_AT(pteval_t, 0xf) 2) /* normal inner write-back */ #define L_PTE_S2_MT_DEV_SHARED (_AT(pteval_t, 0x1) 2) /* device */ diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h index f40354198bad4..ae13ca8b0a23d 100644 --- a/arch/arm/include/asm/pgtable.h +++ b/arch/arm/include/asm/pgtable.h @@ -100,6 +100,7 @@ extern pgprot_t pgprot_s2_device; #define PAGE_HYP _MOD_PROT(pgprot_kernel, L_PTE_HYP) #define PAGE_HYP_DEVICE _MOD_PROT(pgprot_hyp_device, L_PTE_HYP) #define PAGE_S2 _MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY) +#define PAGE_S2_NORMAL_NC__pgprot((pgprot_val(PAGE_S2) ~L_PTE_S2_MT_MASK) | L_PTE_S2_MT_NORMAL_NC) #define PAGE_S2_DEVICE _MOD_PROT(pgprot_s2_device, L_PTE_S2_RDONLY) #define __PAGE_NONE __pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN | L_PTE_NONE) diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index bc1665acd73e7..6b3bd8061bd2a 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -1220,7 +1220,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, struct vm_area_struct *vma; pfn_t pfn; pgprot_t mem_type = PAGE_S2; - bool fault_ipa_uncached; + bool fault_ipa_uncached = false; bool logging_active =
Re: [RFC/RFT PATCH v2 1/3] arm/arm64: pageattr: add set_memory_nc
On Wed, May 13, 2015 at 01:31:52PM +0200, Andrew Jones wrote: Provide a method to change normal, cacheable memory to non-cacheable. KVM will make use of this to keep emulated device memory regions coherent with the guest. Signed-off-by: Andrew Jones drjo...@redhat.com Reviewed-by: Christoffer Dall christoffer.d...@linaro.org But you obviously need Russell and Will/Catalin to ack/merge this. -Christoffer ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm