Re: [RFC 4/4] virtio: Add platform specific DMA API translation for virito devices
On 07/23/2018 07:46 AM, Anshuman Khandual wrote:
> On 07/20/2018 06:45 PM, Michael S. Tsirkin wrote:
>> On Fri, Jul 20, 2018 at 09:29:41AM +0530, Anshuman Khandual wrote:
>>> Subject: Re: [RFC 4/4] virtio: Add platform specific DMA API translation for
>>> virito devices
>>
>> s/virito/virtio/
>
> Oops, will fix it. Thanks for pointing out.
>
>>> This adds a hook which a platform can define in order to allow it to
>>> override a virtio device's DMA OPS irrespective of whether it has the
>>> flag VIRTIO_F_IOMMU_PLATFORM set or not. We want to use this to do
>>> bounce-buffering of data on the new secure pSeries platform, currently
>>> under development, where a KVM host cannot access all of the memory
>>> space of a secure KVM guest. The host can only access the pages which
>>> the guest has explicitly requested to be shared with the host, thus
>>> the virtio implementation in the guest has to copy data to and from
>>> shared pages.
>>>
>>> With this hook, the platform code in the secure guest can force the
>>> use of swiotlb for virtio buffers, with a back-end for swiotlb which
>>> will use a pool of pre-allocated shared pages. Thus all data being
>>> sent or received by virtio devices will be copied through pages which
>>> the host has access to.
>>>
>>> Signed-off-by: Anshuman Khandual
>>> ---
>>>  arch/powerpc/include/asm/dma-mapping.h | 6 ++
>>>  arch/powerpc/platforms/pseries/iommu.c | 6 ++
>>>  drivers/virtio/virtio.c                | 7 +++
>>>  3 files changed, 19 insertions(+)
>>>
>>> diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
>>> index 8fa3945..bc5a9d3 100644
>>> --- a/arch/powerpc/include/asm/dma-mapping.h
>>> +++ b/arch/powerpc/include/asm/dma-mapping.h
>>> @@ -116,3 +116,9 @@ extern u64 __dma_get_required_mask(struct device *dev);
>>>
>>>  #endif /* __KERNEL__ */
>>>  #endif /* _ASM_DMA_MAPPING_H */
>>> +
>>> +#define platform_override_dma_ops platform_override_dma_ops
>>> +
>>> +struct virtio_device;
>>> +
>>> +extern void platform_override_dma_ops(struct virtio_device *vdev);
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>>> index 06f0296..5773bc7 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -38,6 +38,7 @@
>>>  #include
>>>  #include
>>>  #include
>>> +#include
>>>  #include
>>>  #include
>>>  #include
>>> @@ -1396,3 +1397,8 @@ static int __init disable_multitce(char *str)
>>>  __setup("multitce=", disable_multitce);
>>>
>>>  machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
>>> +
>>> +void platform_override_dma_ops(struct virtio_device *vdev)
>>> +{
>>> +	/* Override vdev->parent.dma_ops if required */
>>> +}
>>> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
>>> index 6b13987..432c332 100644
>>> --- a/drivers/virtio/virtio.c
>>> +++ b/drivers/virtio/virtio.c
>>> @@ -168,6 +168,12 @@ EXPORT_SYMBOL_GPL(virtio_add_status);
>>>
>>>  const struct dma_map_ops virtio_direct_dma_ops;
>>>
>>> +#ifndef platform_override_dma_ops
>>> +static inline void platform_override_dma_ops(struct virtio_device *vdev)
>>> +{
>>> +}
>>> +#endif
>>> +
>>>  int virtio_finalize_features(struct virtio_device *dev)
>>>  {
>>>  	int ret = dev->config->finalize_features(dev);
>>> @@ -179,6 +185,7 @@ int virtio_finalize_features(struct virtio_device *dev)
>>>  	if (virtio_has_iommu_quirk(dev))
>>>  		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
>>>
>>> +	platform_override_dma_ops(dev);
>>
>> Is there a single place where virtio_has_iommu_quirk is called now?
>
> Not other than this one. But in the proposed implementation of
> platform_override_dma_ops on powerpc, we will again check on
> virtio_has_iommu_quirk before overriding it with SWIOTLB.
>
> void platform_override_dma_ops(struct virtio_device *vdev)
> {
> 	if (is_ultravisor_platform() && virtio_has_iommu_quirk(vdev))
> 		set_dma_ops(vdev->dev.parent, &swiotlb_dma_ops);
> }
>
>> If so, we could put this into virtio_has_iommu_quirk then.
>
> Did you mean platform_override_dma_ops instead? If so, yes that
> is possible. The default implementation of platform_override_dma_ops
> should just check the VIRTIO_F_IOMMU_PLATFORM feature and override
> with virtio_direct_dma_ops, but an arch implementation can check
> whatever else it would like and override appropriately.
>
> The default platform_override_dma_ops will be like this:
>
> #ifndef platform_override_dma_ops
> static inline void platform_override_dma_ops(struct virtio_device *vdev)
> {
> 	if (!virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
> 		set_dma_ops(vdev->dev.parent, &virtio_direct_dma_ops);
> }
> #endif
>
> The proposed powerpc implementation will be like this instead:
>
> void platform_override_dma_ops(struct virtio_device *vdev)
> {
> 	if (virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
> 		return;
>
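For readers unfamiliar with the trick used in the patch, the per-arch override
relies on a common kernel idiom: the arch header defines the function and also
a macro with the same name, so the generic code's #ifndef fallback is compiled
only when no arch override exists. Below is a minimal stand-alone, user-space
illustration of that idiom; the names mirror the patch but the bodies are
purely illustrative and not taken from it.

/* build with: cc -std=c99 -o override_demo override_demo.c */
#include <stdio.h>

struct virtio_device { const char *name; };

/* --- what an arch header (e.g. asm/dma-mapping.h) would provide --- */
#define platform_override_dma_ops platform_override_dma_ops
static void platform_override_dma_ops(struct virtio_device *vdev)
{
	/* an arch would call set_dma_ops() here; we just report */
	printf("%s: arch override ran\n", vdev->name);
}
/* ------------------------------------------------------------------- */

/* Generic fallback: compiled out whenever the arch macro is defined. */
#ifndef platform_override_dma_ops
static void platform_override_dma_ops(struct virtio_device *vdev)
{
	/* nothing to do by default */
}
#endif

int main(void)
{
	struct virtio_device vdev = { .name = "virtio0" };

	platform_override_dma_ops(&vdev);	/* prints the arch message */
	return 0;
}

Because generic code only ever calls platform_override_dma_ops(), an arch opts
in simply by defining both the symbol and the macro in its header; no Kconfig
switch or weak symbol is needed.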
Re: [RFC 0/4] Virtio uses DMA API for all devices
On 07/23/2018 02:38 PM, Michael S. Tsirkin wrote:
> On Mon, Jul 23, 2018 at 11:58:23AM +0530, Anshuman Khandual wrote:
>> On 07/20/2018 06:46 PM, Michael S. Tsirkin wrote:
>>> On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
>>>> This patch series is the follow up on the discussions we had before about
>>>> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
>>>> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
>>>> were suggestions about doing away with two different paths of transactions
>>>> with the host/QEMU, first being the direct GPA and the other being the DMA
>>>> API based translations.
>>>>
>>>> First patch attempts to create a direct GPA mapping based DMA operations
>>>> structure called 'virtio_direct_dma_ops' with exact same implementation of
>>>> the direct GPA path which virtio core currently has but just wrapped in a
>>>> DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of the
>>>> arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
>>>> existing semantics. The second patch does exactly that inside the function
>>>> virtio_finalize_features(). The third patch removes the default direct GPA
>>>> path from virtio core forcing it to use DMA API callbacks for all devices.
>>>> Now with that change, every device must have a DMA operations structure
>>>> associated with it. The fourth patch adds an additional hook which gives
>>>> the platform an opportunity to do yet another override if required. This
>>>> platform hook can be used on POWER Ultravisor based protected guests to
>>>> load up SWIOTLB DMA callbacks to do the required bounce buffering (as
>>>> discussed previously in the above mentioned thread, the host is allowed to
>>>> access only parts of the guest GPA range) into the shared memory for all
>>>> I/O scatter gather buffers to be consumed on the host side.
>>>>
>>>> Please go through these patches and review whether this approach broadly
>>>> makes sense. I will appreciate suggestions, inputs, comments regarding the
>>>> patches or the approach in general. Thank you.
>>>
>>> I like how patches 1-3 look. Could you test performance
>>> with/without to see whether the extra indirection through
>>> use of DMA ops causes a measurable slow-down?
>>
>> I ran this simple DD command 10 times where /dev/vda is a virtio block
>> device of 10GB size.
>>
>> dd if=/dev/zero of=/dev/vda bs=8M count=1024 oflag=direct
>>
>> With and without the patches the bandwidth, which has a fairly wide range,
>> does not look that different.
>>
>> Without patches
>> ===============
>>
>> -- 1 --
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.95557 s, 4.4 GB/s
>> -- 2 --
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.05176 s, 4.2 GB/s
>> -- 3 --
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.88314 s, 4.6 GB/s
>> -- 4 --
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.84899 s, 4.6 GB/s
>> -- 5 --
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.37184 s, 1.6 GB/s
>> -- 6 --
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.9205 s, 4.5 GB/s
>> -- 7 --
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.85166 s, 1.3 GB/s
>> -- 8 --
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.74049 s, 4.9 GB/s
>> -- 9 --
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.31699 s, 1.4 GB/s
>> -- 10 --
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.47057 s, 3.5 GB/s
>>
>>
>> With patches
>> ============
>>
>> -- 1 --
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.25993 s, 3.8 GB/s
>> -- 2 --
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.82438 s, 4.7 GB/s
>> -- 3 --
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.93856 s, 4.4 GB/s
>> -- 4 --
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.83405 s, 4.7 GB/s
>> -- 5 --
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 7.50199 s, 1.1 GB/s
>> -- 6 --
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.28742 s, 3.8 GB/s
>> -- 7 --
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.74958 s, 1.5 GB/s
>> -- 8 --
>>
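The 'virtio_direct_dma_ops' structure that patch 1 introduces is, per the
cover letter above, the existing no-translation (direct GPA) path expressed
through dma_map_ops callbacks. The following is only a rough sketch of what
such a wrapper could look like, not the code from the series; the coherent
allocation and scatter-gather callbacks are omitted for brevity.

#include <linux/dma-mapping.h>
#include <linux/io.h>

/*
 * Direct GPA "mapping": the DMA address handed to the device is simply
 * the page's (guest-)physical address, which is exactly what the virtio
 * core used to compute internally when VIRTIO_F_IOMMU_PLATFORM is absent.
 */
static dma_addr_t virtio_direct_map_page(struct device *dev, struct page *page,
					 unsigned long offset, size_t size,
					 enum dma_data_direction dir,
					 unsigned long attrs)
{
	return page_to_phys(page) + offset;
}

static void virtio_direct_unmap_page(struct device *dev, dma_addr_t addr,
				     size_t size, enum dma_data_direction dir,
				     unsigned long attrs)
{
	/* Nothing to tear down for a 1:1 mapping. */
}

static int virtio_direct_dma_supported(struct device *dev, u64 mask)
{
	/* No translation, so any mask the device offers is acceptable. */
	return 1;
}

const struct dma_map_ops virtio_direct_dma_ops = {
	.map_page	= virtio_direct_map_page,
	.unmap_page	= virtio_direct_unmap_page,
	.dma_supported	= virtio_direct_dma_supported,
};

With something of this shape installed via set_dma_ops(), the virtio core can
call the ordinary DMA API on every device and still get the legacy direct-GPA
behaviour when the IOMMU feature bit is not negotiated.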
Re: [PATCH net-next 0/6] virtio_net: Add ethtool stat items
From: Toshiaki Makita
Date: Mon, 23 Jul 2018 23:36:03 +0900

> From: Toshiaki Makita
>
> Add some ethtool stat items useful for performance analysis.
>
> Signed-off-by: Toshiaki Makita

Michael and Jason, any objections to these new stats?
Re: [PATCH v36 0/5] Virtio-balloon: support free page reporting
On 07/23/2018 10:36 PM, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (m...@redhat.com) wrote:
>> On Fri, Jul 20, 2018 at 04:33:00PM +0800, Wei Wang wrote:
>>> This patch series is separated from the previous "Virtio-balloon
>>> Enhancement" series. The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT,
>>> implemented by this series enables the virtio-balloon driver to report
>>> hints of guest free pages to the host. It can be used to accelerate live
>>> migration of VMs. Here is an introduction of this usage:
>>>
>>> Live migration needs to transfer the VM's memory from the source machine
>>> to the destination round by round. For the 1st round, all the VM's memory
>>> is transferred. From the 2nd round, only the pieces of memory that were
>>> written by the guest (after the 1st round) are transferred. One method
>>> that is popularly used by the hypervisor to track which part of memory is
>>> written is to write-protect all the guest memory.
>>>
>>> This feature enables the optimization by skipping the transfer of guest
>>> free pages during VM live migration. It is not concerned that the memory
>>> pages are used after they are given to the hypervisor as a hint of the
>>> free pages, because they will be tracked by the hypervisor and transferred
>>> in the subsequent round if they are used and written.
>>>
>>> * Tests
>>> - Test Environment
>>>   Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>>>   Guest: 8G RAM, 4 vCPU
>>>   Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
>>>
>>> - Test Results
>>>   - Idle Guest Live Migration Time (results are averaged over 10 runs):
>>>     - Optimization v.s. Legacy = 409ms vs 1757ms --> ~77% reduction
>>>       (setting page poisoning zero and enabling ksm don't affect the
>>>       comparison result)
>>>   - Guest with Linux Compilation Workload (make bzImage -j4):
>>>     - Live Migration Time (average)
>>>       Optimization v.s. Legacy = 1407ms v.s. 2528ms --> ~44% reduction
>>>     - Linux Compilation Time
>>>       Optimization v.s. Legacy = 5min4s v.s. 5min12s --> no obvious difference
>>
>> I'd like to see dgilbert's take on whether this kind of gain justifies
>> adding a PV interfaces, and what kind of guest workload is appropriate.
>>
>> Cc'd.
>
> Well, 44% is great ... although the measurement is a bit weird.
>
> a) A 2 second downtime is very large; 300-500ms is more normal

No problem, I will set downtime to 400ms for the tests.

> b) I'm not sure what the 'average' is - is that just between a bunch of
>    repeated migrations?

Yes, just repeatedly do the tests (source<>destination migration) and get an
averaged result.

> c) What load was running in the guest during the live migration?

The first one above just uses a guest without running any specific workload
(named idle guests). The second one uses a guest with the Linux compilation
workload running.

> An interesting measurement to add would be to do the same test but with a
> VM with a lot more RAM but the same load; you'd hope the gain would be even
> better. It would be interesting, especially because the users who are
> interested are people creating VMs allocated with lots of extra memory (for
> the worst case) but most of the time migrating when it's fairly idle.

OK. I will add tests of a guest with larger memory.

Best,
Wei
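The round-based behaviour described in the cover letter (send everything
except hinted-free pages in round 1, then resend only pages dirtied since the
previous round) can be illustrated with a small stand-alone simulation. The
page count and helper names below are invented purely for illustration and
are not taken from QEMU or the patch series.

/* build with: cc -std=c99 -o hint_demo hint_demo.c */
#include <stdbool.h>
#include <stdio.h>

#define NPAGES 16

/*
 * Toy model of one migration round: in round 1 every page is sent unless
 * the guest has hinted it is free; later rounds resend only pages dirtied
 * since the last round.  A hinted page reused by the guest gets dirtied,
 * so it is picked up again in a subsequent round, as the cover letter notes.
 */
static int migrate_round(const bool *dirty, const bool *free_hint,
			 bool first_round)
{
	int sent = 0;

	for (int i = 0; i < NPAGES; i++) {
		if (first_round) {
			if (!free_hint[i])
				sent++;
		} else if (dirty[i]) {
			sent++;
		}
	}
	return sent;
}

int main(void)
{
	bool free_hint[NPAGES] = { 0 };
	bool dirty[NPAGES] = { 0 };

	/* Guest reports pages 8..15 as free before round 1 starts. */
	for (int i = 8; i < NPAGES; i++)
		free_hint[i] = true;

	/* Guest reuses (and dirties) page 9 while round 1 is in flight. */
	dirty[9] = true;

	printf("round 1 sends %d pages\n", migrate_round(dirty, free_hint, true));
	printf("round 2 sends %d pages\n", migrate_round(dirty, free_hint, false));
	return 0;
}

In this toy run round 1 transfers 8 pages instead of 16, and the reused page 9
is caught by the dirty tracking and sent in round 2, which is the correctness
argument the cover letter makes for handing out free-page hints.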