Re: [PATCH] Rescan the entire target on transport reset when LUN is 0
On Tue, Sep 08, 2020 at 05:53:16PM +, Felipe Franciosi wrote: > > > > On Sep 8, 2020, at 3:22 PM, Paolo Bonzini wrote: > > > > On 28/08/20 14:21, Matej Genci wrote: > >> VirtIO 1.0 spec says > >>The removed and rescan events ... when sent for LUN 0, they MAY > >>apply to the entire target so the driver can ask the initiator > >>to rescan the target to detect this. > >> > >> This change introduces the behaviour described above by scanning the > >> entire scsi target when LUN is set to 0. This is both a functional and a > >> performance fix. It aligns the driver with the spec and allows control > >> planes to hotplug targets with large numbers of LUNs without having to > >> request a RESCAN for each one of them. > >> > >> Signed-off-by: Matej Genci > >> Suggested-by: Felipe Franciosi > >> --- > >> drivers/scsi/virtio_scsi.c | 7 ++- > >> 1 file changed, 6 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c > >> index bfec84aacd90..a4b9bc7b4b4a 100644 > >> --- a/drivers/scsi/virtio_scsi.c > >> +++ b/drivers/scsi/virtio_scsi.c > >> @@ -284,7 +284,12 @@ static void virtscsi_handle_transport_reset(struct > >> virtio_scsi *vscsi, > >> > >>switch (virtio32_to_cpu(vscsi->vdev, event->reason)) { > >>case VIRTIO_SCSI_EVT_RESET_RESCAN: > >> - scsi_add_device(shost, 0, target, lun); > >> + if (lun == 0) { > >> + scsi_scan_target(>shost_gendev, 0, target, > >> + SCAN_WILD_CARD, SCSI_SCAN_INITIAL); > >> + } else { > >> + scsi_add_device(shost, 0, target, lun); > >> + } > >>break; > >>case VIRTIO_SCSI_EVT_RESET_REMOVED: > >>sdev = scsi_device_lookup(shost, 0, target, lun); > >> > > > > > > Acked-by: Paolo Bonzini > > Cc: sta...@vger.kernel.org > > Thanks, Paolo. > > I'm Cc'ing stable as I believe this fixes a driver bug where it > doesn't follow the spec. Per commit message, today devices are > required to issue RESCAN events for each LUN behind a target when > hotplugging, or risking the driver not seeing the new LUNs. > > Is this enough? Or should we resend after merge per below? > https://www.kernel.org/doc/Documentation/process/stable-kernel-rules.rst You need to let stable know the git commit id of the patch in Linus's tree if the cc: stable is not on the final commit that gets merged. thanks, greg k-h ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 6/7] xen/balloon: try to merge system ram resources
On 08.09.20 22:10, David Hildenbrand wrote: Let's try to merge system ram resources we add, to minimize the number of resources in /proc/iomem. We don't care about the boundaries of individual chunks we added. Cc: Andrew Morton Cc: Michal Hocko Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Stefano Stabellini Cc: Roger Pau Monné Cc: Julien Grall Cc: Pankaj Gupta Cc: Baoquan He Cc: Wei Yang Signed-off-by: David Hildenbrand Reviewed-by: Juergen Gross Juergen --- drivers/xen/balloon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index 7bac38764513d..b57b2067ecbfb 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -331,7 +331,7 @@ static enum bp_state reserve_additional_memory(void) mutex_unlock(_mutex); /* add_memory_resource() requires the device_hotplug lock */ lock_device_hotplug(); - rc = add_memory_resource(nid, resource, 0); + rc = add_memory_resource(nid, resource, MEMHP_MERGE_RESOURCE); unlock_device_hotplug(); mutex_lock(_mutex); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 3/7] mm/memory_hotplug: prepare passing flags to add_memory() and friends
On 08.09.20 22:10, David Hildenbrand wrote: We soon want to pass flags, e.g., to mark added System RAM resources. mergeable. Prepare for that. This patch is based on a similar patch by Oscar Salvador: https://lkml.kernel.org/r/20190625075227.15193-3-osalva...@suse.de Acked-by: Wei Liu Cc: Andrew Morton Cc: Michal Hocko Cc: Dan Williams Cc: Jason Gunthorpe Cc: Pankaj Gupta Cc: Baoquan He Cc: Wei Yang Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: "Rafael J. Wysocki" Cc: Len Brown Cc: Greg Kroah-Hartman Cc: Vishal Verma Cc: Dave Jiang Cc: "K. Y. Srinivasan" Cc: Haiyang Zhang Cc: Stephen Hemminger Cc: Wei Liu Cc: Heiko Carstens Cc: Vasily Gorbik Cc: Christian Borntraeger Cc: David Hildenbrand Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Stefano Stabellini Cc: "Oliver O'Halloran" Cc: Pingfan Liu Cc: Nathan Lynch Cc: Libor Pechacek Cc: Anton Blanchard Cc: Leonardo Bras Cc: linuxppc-...@lists.ozlabs.org Cc: linux-a...@vger.kernel.org Cc: linux-nvd...@lists.01.org Cc: linux-hyp...@vger.kernel.org Cc: linux-s...@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: xen-de...@lists.xenproject.org Signed-off-by: David Hildenbrand Reviewed-by: Juergen Gross (Xen related part) Juergen --- arch/powerpc/platforms/powernv/memtrace.c | 2 +- arch/powerpc/platforms/pseries/hotplug-memory.c | 2 +- drivers/acpi/acpi_memhotplug.c | 2 +- drivers/base/memory.c | 2 +- drivers/dax/kmem.c | 2 +- drivers/hv/hv_balloon.c | 2 +- drivers/s390/char/sclp_cmd.c| 2 +- drivers/virtio/virtio_mem.c | 2 +- drivers/xen/balloon.c | 2 +- include/linux/memory_hotplug.h | 10 ++ mm/memory_hotplug.c | 15 --- 11 files changed, 23 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c index 13b369d2cc454..a7475d18c671c 100644 --- a/arch/powerpc/platforms/powernv/memtrace.c +++ b/arch/powerpc/platforms/powernv/memtrace.c @@ -224,7 +224,7 @@ static int memtrace_online(void) ent->mem = 0; } - if (add_memory(ent->nid, ent->start, ent->size)) { + if (add_memory(ent->nid, ent->start, ent->size, 0)) { pr_err("Failed to add trace memory to node %d\n", ent->nid); ret += 1; diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c index 5d545b78111f9..54a888ea7f751 100644 --- a/arch/powerpc/platforms/pseries/hotplug-memory.c +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c @@ -606,7 +606,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb) block_sz = memory_block_size_bytes(); /* Add the memory */ - rc = __add_memory(lmb->nid, lmb->base_addr, block_sz); + rc = __add_memory(lmb->nid, lmb->base_addr, block_sz, 0); if (rc) { invalidate_lmb_associativity_index(lmb); return rc; diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c index e294f44a78504..d91b3584d4b2b 100644 --- a/drivers/acpi/acpi_memhotplug.c +++ b/drivers/acpi/acpi_memhotplug.c @@ -207,7 +207,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device) if (node < 0) node = memory_add_physaddr_to_nid(info->start_addr); - result = __add_memory(node, info->start_addr, info->length); + result = __add_memory(node, info->start_addr, info->length, 0); /* * If the memory block has been used by the kernel, add_memory() diff --git a/drivers/base/memory.c b/drivers/base/memory.c index 4db3c660de831..2287bcf86480e 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -432,7 +432,7 @@ static ssize_t probe_store(struct device *dev, struct device_attribute *attr, nid = memory_add_physaddr_to_nid(phys_addr); ret = __add_memory(nid, phys_addr, - MIN_MEMORY_BLOCK_SIZE * sections_per_block); + MIN_MEMORY_BLOCK_SIZE * sections_per_block, 0); if (ret) goto out; diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c index 7dcb2902e9b1b..8e66b28ef5bc6 100644 --- a/drivers/dax/kmem.c +++ b/drivers/dax/kmem.c @@ -95,7 +95,7 @@ int dev_dax_kmem_probe(struct dev_dax *dev_dax) * this as RAM automatically. */ rc = add_memory_driver_managed(numa_node, range.start, - range_len(), kmem_name); + range_len(), kmem_name, 0); res->flags |= IORESOURCE_BUSY; if (rc) {
Re: [PATCH] vhost: new vhost_vdpa SET/GET_BACKEND_FEATURES handlers
- Original Message - > This commit introduced vhost_vdpa_set/get_backend_features() to > resolve these issues: > (1)In vhost_vdpa ioctl SET_BACKEND_FEATURES path, currect code > would try to acquire vhost dev mutex twice > (first shown in vhost_vdpa_unlocked_ioctl), which can lead > to a dead lock issue. > (2)SET_BACKEND_FEATURES was blindly added to vring ioctl instead > of vdpa device ioctl > > To resolve these issues, this commit (1)removed mutex operations > in vhost_set_backend_features. (2)Handle ioctl > SET/GET_BACKEND_FEATURES in vdpa ioctl. (3)introduce a new > function vhost_net_set_backend_features() for vhost_net, > which is a wrap of vhost_set_backend_features() with > necessary mutex lockings. > > Signed-off-by: Zhu Lingshan So this patch touches not only vhost-vDPA. Though the function looks correct, I'd prefer you do cleanup on top of my patches which has been tested by Eli. Note that, vhost generic handlers tend to hold mutex by itself. So we probably need more thought on this. Thanks > --- > drivers/vhost/net.c | 9 - > drivers/vhost/vdpa.c | 47 ++- > drivers/vhost/vhost.c | 2 -- > 3 files changed, 41 insertions(+), 17 deletions(-) > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c > index 531a00d703cd..e01da77538c8 100644 > --- a/drivers/vhost/net.c > +++ b/drivers/vhost/net.c > @@ -1679,6 +1679,13 @@ static long vhost_net_set_owner(struct vhost_net *n) > return r; > } > > +static void vhost_net_set_backend_features(struct vhost_dev *dev, u64 > features) > +{ > + mutex_lock(>mutex); > + vhost_set_backend_features(dev, features); > + mutex_unlock(>mutex); > +} > + > static long vhost_net_ioctl(struct file *f, unsigned int ioctl, > unsigned long arg) > { > @@ -1715,7 +1722,7 @@ static long vhost_net_ioctl(struct file *f, unsigned > int ioctl, > return -EFAULT; > if (features & ~VHOST_NET_BACKEND_FEATURES) > return -EOPNOTSUPP; > - vhost_set_backend_features(>dev, features); > + vhost_net_set_backend_features(>dev, features); > return 0; > case VHOST_RESET_OWNER: > return vhost_net_reset_owner(n); > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > index 3fab94f88894..ade33c566a81 100644 > --- a/drivers/vhost/vdpa.c > +++ b/drivers/vhost/vdpa.c > @@ -344,6 +344,33 @@ static long vhost_vdpa_set_config_call(struct vhost_vdpa > *v, u32 __user *argp) > return 0; > } > > + > +static long vhost_vdpa_get_backend_features(void __user *argp) > +{ > + u64 features = VHOST_VDPA_BACKEND_FEATURES; > + u64 __user *featurep = argp; > + long r; > + > + r = copy_to_user(featurep, , sizeof(features)); > + > + return r; > +} > +static long vhost_vdpa_set_backend_features(struct vhost_vdpa *v, void > __user *argp) > +{ > + u64 __user *featurep = argp; > + u64 features; > + > + if (copy_from_user(, featurep, sizeof(features))) > + return -EFAULT; > + > + if (features & ~VHOST_VDPA_BACKEND_FEATURES) > + return -EOPNOTSUPP; > + > + vhost_set_backend_features(>vdev, features); > + > + return 0; > +} > + > static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd, > void __user *argp) > { > @@ -353,8 +380,6 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, > unsigned int cmd, > struct vdpa_callback cb; > struct vhost_virtqueue *vq; > struct vhost_vring_state s; > - u64 __user *featurep = argp; > - u64 features; > u32 idx; > long r; > > @@ -381,18 +406,6 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, > unsigned int cmd, > > vq->last_avail_idx = vq_state.avail_index; > break; > - case VHOST_GET_BACKEND_FEATURES: > - features = VHOST_VDPA_BACKEND_FEATURES; > - if (copy_to_user(featurep, , sizeof(features))) > - return -EFAULT; > - return 0; > - case VHOST_SET_BACKEND_FEATURES: > - if (copy_from_user(, featurep, sizeof(features))) > - return -EFAULT; > - if (features & ~VHOST_VDPA_BACKEND_FEATURES) > - return -EOPNOTSUPP; > - vhost_set_backend_features(>vdev, features); > - return 0; > } > > r = vhost_vring_ioctl(>vdev, cmd, argp); > @@ -476,6 +489,12 @@ static long vhost_vdpa_unlocked_ioctl(struct file > *filep, > case VHOST_VDPA_SET_CONFIG_CALL: > r = vhost_vdpa_set_config_call(v, argp); > break; > + case VHOST_SET_BACKEND_FEATURES: > + r = vhost_vdpa_set_backend_features(v, argp); > + break; > + case VHOST_GET_BACKEND_FEATURES: > + r = vhost_vdpa_get_backend_features(argp); > + break; >
Re: [PATCH v2] vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK
- Original Message - > set_map() is used by mlx5 vdpa to create a memory region based on the > address map passed by the iotlb argument. If we get successive calls, we > will destroy the current memory region and build another one based on > the new address mapping. We also need to setup the hardware resources > since they depend on the memory region. > > If these calls happen before DRIVER_OK, It means that driver VQs may > also not been setup and we may not create them yet. In this case we want > to avoid setting up the other resources and defer this till we get > DRIVER OK. > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") > Signed-off-by: Eli Cohen > --- > V1->V2: Improve changelog description > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c > b/drivers/vdpa/mlx5/net/mlx5_vnet.c > index 9df69d5efe8c..c89cd48a0aab 100644 > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c > @@ -1645,6 +1645,9 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_net > *ndev, struct vhost_iotlb * > if (err) > goto err_mr; > > + if (!(ndev->mvdev.status & VIRTIO_CONFIG_S_DRIVER_OK)) > + return 0; > + Is there any reason that we still need to do vq suspending and saving before? Thanks > restore_channels_info(ndev); > err = setup_driver(ndev); > if (err) > -- > 2.26.0 > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK
- Original Message - > On Mon, Sep 07, 2020 at 06:53:23AM -0400, Jason Wang wrote: > > > > > > - Original Message - > > > If the memory map changes before the driver status is > > > VIRTIO_CONFIG_S_DRIVER_OK, don't attempt to create resources because it > > > may fail. For example, if the VQ is not ready there is no point in > > > creating resources. > > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 > > > devices") > > > Signed-off-by: Eli Cohen > > > --- > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +++ > > > 1 file changed, 3 insertions(+) > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > index 9df69d5efe8c..c89cd48a0aab 100644 > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > @@ -1645,6 +1645,9 @@ static int mlx5_vdpa_change_map(struct > > > mlx5_vdpa_net > > > *ndev, struct vhost_iotlb * > > > if (err) > > > goto err_mr; > > > > > > + if (!(ndev->mvdev.status & VIRTIO_CONFIG_S_DRIVER_OK)) > > > + return 0; > > > + > > > > I'm not sure I get this. > > > > It looks to me if set_map() is called before DRIVER_OK, we won't build > > any mapping? > > > What would prevent that? Is it some qemu logic you're relying upon? Ok, I think the map is still there, we just avoid to create some resources. > With current qemu 5.1 with lack of batching support, I get plenty calls > to set_map which result in calls to mlx5_vdpa_change_map(). > If that happens before VIRTIO_CONFIG_S_DRIVER_OK then Imay fail (in case > I was not called to set VQs ready). Right, this could be solved by adding the batched IOTLB updating. Thanks > > > > > > restore_channels_info(ndev); > > > err = setup_driver(ndev); > > > if (err) > > > -- > > > 2.26.0 > > > > > > > > > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Intel-gfx] [PATCH 0/8] Convert the intel iommu driver to the dma-iommu api
On Tue, 8 Sep 2020 at 16:56, Tvrtko Ursulin wrote: > > > On 08/09/2020 16:44, Logan Gunthorpe wrote: > > On 2020-09-08 9:28 a.m., Tvrtko Ursulin wrote: > >>> > >>> diff --git a/drivers/gpu/drm/i915/i915_scatterlist.h > >>> b/drivers/gpu/drm/i915/i915 > >>> index b7b59328cb76..9367ac801f0c 100644 > >>> --- a/drivers/gpu/drm/i915/i915_scatterlist.h > >>> +++ b/drivers/gpu/drm/i915/i915_scatterlist.h > >>> @@ -27,13 +27,19 @@ static __always_inline struct sgt_iter { > >>>} __sgt_iter(struct scatterlist *sgl, bool dma) { > >>> struct sgt_iter s = { .sgp = sgl }; > >>> > >>> + if (sgl && !sg_dma_len(s.sgp)) > >> > >> I'd extend the condition to be, just to be safe: > >> if (dma && sgl && !sg_dma_len(s.sgp)) > >> > > > > Right, good catch, that's definitely necessary. > > > >>> + s.sgp = NULL; > >>> + > >>> if (s.sgp) { > >>> s.max = s.curr = s.sgp->offset; > >>> - s.max += s.sgp->length; > >>> - if (dma) > >>> + > >>> + if (dma) { > >>> + s.max += sg_dma_len(s.sgp); > >>> s.dma = sg_dma_address(s.sgp); > >>> - else > >>> + } else { > >>> + s.max += s.sgp->length; > >>> s.pfn = page_to_pfn(sg_page(s.sgp)); > >>> + } > >> > >> Otherwise has this been tested or alternatively how to test it? (How to > >> repro the issue.) > > > > It has not been tested. To test it, you need Tom's patch set without the > > last "DO NOT MERGE" patch: > > > > https://lkml.kernel.org/lkml/20200907070035.ga25...@infradead.org/T/ > > Tom, do you have a branch somewhere I could pull from? (Just being lazy > about downloading a bunch of messages from the archives.) I don't unfortunately. I'm working locally with poor internet. > > What GPU is in your Lenovo x1 carbon 5th generation and what > graphical/desktop setup I need to repro? Is this enough info?: $ lspci -vnn | grep VGA -A 12 00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 620 [8086:5916] (rev 02) (prog-if 00 [VGA controller]) Subsystem: Lenovo ThinkPad X1 Carbon 5th Gen [17aa:224f] Flags: bus master, fast devsel, latency 0, IRQ 148 Memory at eb00 (64-bit, non-prefetchable) [size=16M] Memory at 6000 (64-bit, prefetchable) [size=256M] I/O ports at e000 [size=64] [virtual] Expansion ROM at 000c [disabled] [size=128K] Capabilities: [40] Vendor Specific Information: Len=0c Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [d0] Power Management version 2 Capabilities: [100] Process Address Space ID (PASID) Capabilities: [200] Address Translation Service (ATS) > > Regards, > > Tvrtko ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v2 7/7] hv_balloon: try to merge system ram resources
Let's try to merge system ram resources we add, to minimize the number of resources in /proc/iomem. We don't care about the boundaries of individual chunks we added. Cc: Andrew Morton Cc: Michal Hocko Cc: "K. Y. Srinivasan" Cc: Haiyang Zhang Cc: Stephen Hemminger Cc: Wei Liu Cc: Pankaj Gupta Cc: Baoquan He Cc: Wei Yang Signed-off-by: David Hildenbrand --- drivers/hv/hv_balloon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c index 0194bed1a5736..b64d2efbefe71 100644 --- a/drivers/hv/hv_balloon.c +++ b/drivers/hv/hv_balloon.c @@ -726,7 +726,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size, nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn)); ret = add_memory(nid, PFN_PHYS((start_pfn)), - (HA_CHUNK << PAGE_SHIFT), 0); + (HA_CHUNK << PAGE_SHIFT), MEMHP_MERGE_RESOURCE); if (ret) { pr_err("hot_add memory failed error is %d\n", ret); -- 2.26.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v2 5/7] virtio-mem: try to merge system ram resources
virtio-mem adds memory in memory block granularity, to be able to remove it in the same granularity again later, and to grow slowly on demand. This, however, results in quite a lot of resources when adding a lot of memory. Resources are effectively stored in a list-based tree. Having a lot of resources not only wastes memory, it also makes traversing that tree more expensive, and makes /proc/iomem explode in size (e.g., requiring kexec-tools to manually merge resources later when e.g., trying to create a kdump header). Before this patch, we get (/proc/iomem) when hotplugging 2G via virtio-mem on x86-64: [...] 1-13fff : System RAM 14000-33fff : virtio0 14000-147ff : System RAM (virtio_mem) 14800-14fff : System RAM (virtio_mem) 15000-157ff : System RAM (virtio_mem) 15800-15fff : System RAM (virtio_mem) 16000-167ff : System RAM (virtio_mem) 16800-16fff : System RAM (virtio_mem) 17000-177ff : System RAM (virtio_mem) 17800-17fff : System RAM (virtio_mem) 18000-187ff : System RAM (virtio_mem) 18800-18fff : System RAM (virtio_mem) 19000-197ff : System RAM (virtio_mem) 19800-19fff : System RAM (virtio_mem) 1a000-1a7ff : System RAM (virtio_mem) 1a800-1afff : System RAM (virtio_mem) 1b000-1b7ff : System RAM (virtio_mem) 1b800-1bfff : System RAM (virtio_mem) 328000-32 : PCI Bus :00 With this patch, we get (/proc/iomem): [...] fffc- : Reserved 1-13fff : System RAM 14000-33fff : virtio0 14000-1bfff : System RAM (virtio_mem) 328000-32 : PCI Bus :00 Of course, with more hotplugged memory, it gets worse. When unplugging memory blocks again, try_remove_memory() (via offline_and_remove_memory()) will properly split the resource up again. Cc: Andrew Morton Cc: Michal Hocko Cc: Dan Williams Cc: Michael S. Tsirkin Cc: Jason Wang Cc: Pankaj Gupta Cc: Baoquan He Cc: Wei Yang Signed-off-by: David Hildenbrand --- drivers/virtio/virtio_mem.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index 314ab753139d1..ba4de598f6636 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -424,7 +424,8 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id) dev_dbg(>vdev->dev, "adding memory block: %lu\n", mb_id); return add_memory_driver_managed(nid, addr, memory_block_size_bytes(), -vm->resource_name, 0); +vm->resource_name, +MEMHP_MERGE_RESOURCE); } /* -- 2.26.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v2 6/7] xen/balloon: try to merge system ram resources
Let's try to merge system ram resources we add, to minimize the number of resources in /proc/iomem. We don't care about the boundaries of individual chunks we added. Cc: Andrew Morton Cc: Michal Hocko Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Stefano Stabellini Cc: Roger Pau Monné Cc: Julien Grall Cc: Pankaj Gupta Cc: Baoquan He Cc: Wei Yang Signed-off-by: David Hildenbrand --- drivers/xen/balloon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index 7bac38764513d..b57b2067ecbfb 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -331,7 +331,7 @@ static enum bp_state reserve_additional_memory(void) mutex_unlock(_mutex); /* add_memory_resource() requires the device_hotplug lock */ lock_device_hotplug(); - rc = add_memory_resource(nid, resource, 0); + rc = add_memory_resource(nid, resource, MEMHP_MERGE_RESOURCE); unlock_device_hotplug(); mutex_lock(_mutex); -- 2.26.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v2 4/7] mm/memory_hotplug: MEMHP_MERGE_RESOURCE to specify merging of System RAM resources
Some add_memory*() users add memory in small, contiguous memory blocks. Examples include virtio-mem, hyper-v balloon, and the XEN balloon. This can quickly result in a lot of memory resources, whereby the actual resource boundaries are not of interest (e.g., it might be relevant for DIMMs, exposed via /proc/iomem to user space). We really want to merge added resources in this scenario where possible. Let's provide a flag (MEMHP_MERGE_RESOURCE) to specify that a resource either created within add_memory*() or passed via add_memory_resource() shall be marked mergeable and merged with applicable siblings. To implement that, we need a kernel/resource interface to mark selected System RAM resources mergeable (IORESOURCE_SYSRAM_MERGEABLE) and trigger merging. Note: We really want to merge after the whole operation succeeded, not directly when adding a resource to the resource tree (it would break add_memory_resource() and require splitting resources again when the operation failed - e.g., due to -ENOMEM). Cc: Andrew Morton Cc: Michal Hocko Cc: Dan Williams Cc: Jason Gunthorpe Cc: Kees Cook Cc: Ard Biesheuvel Cc: Thomas Gleixner Cc: "K. Y. Srinivasan" Cc: Haiyang Zhang Cc: Stephen Hemminger Cc: Wei Liu Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Stefano Stabellini Cc: Roger Pau Monné Cc: Julien Grall Cc: Pankaj Gupta Cc: Baoquan He Cc: Wei Yang Signed-off-by: David Hildenbrand --- include/linux/ioport.h | 4 +++ include/linux/memory_hotplug.h | 9 + kernel/resource.c | 60 ++ mm/memory_hotplug.c| 7 4 files changed, 80 insertions(+) diff --git a/include/linux/ioport.h b/include/linux/ioport.h index d7620d7c941a0..7e61389dcb017 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -60,6 +60,7 @@ struct resource { /* IORESOURCE_SYSRAM specific bits. */ #define IORESOURCE_SYSRAM_DRIVER_MANAGED 0x0200 /* Always detected via a driver. */ +#define IORESOURCE_SYSRAM_MERGEABLE0x0400 /* Resource can be merged. */ #define IORESOURCE_EXCLUSIVE 0x0800 /* Userland may not map this resource */ @@ -253,6 +254,9 @@ extern void __release_region(struct resource *, resource_size_t, extern void release_mem_region_adjustable(struct resource *, resource_size_t, resource_size_t); #endif +#ifdef CONFIG_MEMORY_HOTPLUG +extern void merge_system_ram_resource(struct resource *res); +#endif /* Wrappers for managed devices */ struct device; diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 5cd48332ce119..feb4aac03f2eb 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -68,6 +68,15 @@ struct mhp_params { pgprot_t pgprot; }; +/* Flags used for add_memory() and friends. */ + +/* + * Allow merging of the added System RAM resource with adjacent, mergeable + * resources. After a successful call to add_memory_resource() with this flag + * set, the resource pointer must no longer be used as it might be stale. + */ +#define MEMHP_MERGE_RESOURCE 1 + /* * Zone resizing functions * diff --git a/kernel/resource.c b/kernel/resource.c index 36b3552210120..7a91b935f4c20 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -1363,6 +1363,66 @@ void release_mem_region_adjustable(struct resource *parent, } #endif /* CONFIG_MEMORY_HOTREMOVE */ +#ifdef CONFIG_MEMORY_HOTPLUG +static bool system_ram_resources_mergeable(struct resource *r1, + struct resource *r2) +{ + /* We assume either r1 or r2 is IORESOURCE_SYSRAM_MERGEABLE. */ + return r1->flags == r2->flags && r1->end + 1 == r2->start && + r1->name == r2->name && r1->desc == r2->desc && + !r1->child && !r2->child; +} + +/* + * merge_system_ram_resource - mark the System RAM resource mergeable and try to + * merge it with adjacent, mergeable resources + * @res: resource descriptor + * + * This interface is intended for memory hotplug, whereby lots of contiguous + * system ram resources are added (e.g., via add_memory*()) by a driver, and + * the actual resource boundaries are not of interest (e.g., it might be + * relevant for DIMMs). Only resources that are marked mergeable, that have the + * same parent, and that don't have any children are considered. All mergeable + * resources must be immutable during the request. + * + * Note: + * - The caller has to make sure that no pointers to resources that are + * marked mergeable are used anymore after this call - the resource might + * be freed and the pointer might be stale! + * - release_mem_region_adjustable() will split on demand on memory hotunplug + */ +void merge_system_ram_resource(struct resource *res) +{ + const unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; + struct resource *cur; + + if (WARN_ON_ONCE((res->flags & flags) !=
[PATCH v2 0/7] mm/memory_hotplug: selective merging of system ram resources
Some add_memory*() users add memory in small, contiguous memory blocks. Examples include virtio-mem, hyper-v balloon, and the XEN balloon. This can quickly result in a lot of memory resources, whereby the actual resource boundaries are not of interest (e.g., it might be relevant for DIMMs, exposed via /proc/iomem to user space). We really want to merge added resources in this scenario where possible. Resources are effectively stored in a list-based tree. Having a lot of resources not only wastes memory, it also makes traversing that tree more expensive, and makes /proc/iomem explode in size (e.g., requiring kexec-tools to manually merge resources when creating a kdump header. The current kexec-tools resource count limit does not allow for more than ~100GB of memory with a memory block size of 128MB on x86-64). Let's allow to selectively merge system ram resources by specifying a new flag for add_memory*(). Patch #5 contains a /proc/iomem example. Only tested with virtio-mem. v1 -> v2: - I had another look at v1 after vacation and didn't like it - it felt like a hack. So I want forward and added a proper flag to add_memory*(), and introduce a clean (non-racy) way to mark System RAM resources mergeable. - "kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED" -- Clean that flag up, felt wrong in the PnP section - "mm/memory_hotplug: prepare passing flags to add_memory() and friends" -- Previously sent in other context - decided to keep Wei's ack - "mm/memory_hotplug: MEMHP_MERGE_RESOURCE to specify merging of System RAM resources" -- Cleaner approach to get the job done by using proper flags and only merging the single, specified resource - "virtio-mem: try to merge system ram resources" "xen/balloon: try to merge system ram resources" "hv_balloon: try to merge system ram resources" -- Use the new flag MEMHP_MERGE_RESOURCE, much cleaner RFC -> v1: - Switch from rather generic "merge_child_mem_resources()" where a resource name has to be specified to "merge_system_ram_resources(). - Smaller comment/documentation/patch description changes/fixes David Hildenbrand (7): kernel/resource: make release_mem_region_adjustable() never fail kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED mm/memory_hotplug: prepare passing flags to add_memory() and friends mm/memory_hotplug: MEMHP_MERGE_RESOURCE to specify merging of System RAM resources virtio-mem: try to merge system ram resources xen/balloon: try to merge system ram resources hv_balloon: try to merge system ram resources arch/powerpc/platforms/powernv/memtrace.c | 2 +- .../platforms/pseries/hotplug-memory.c| 2 +- drivers/acpi/acpi_memhotplug.c| 2 +- drivers/base/memory.c | 2 +- drivers/dax/kmem.c| 2 +- drivers/hv/hv_balloon.c | 2 +- drivers/s390/char/sclp_cmd.c | 2 +- drivers/virtio/virtio_mem.c | 3 +- drivers/xen/balloon.c | 2 +- include/linux/ioport.h| 12 +- include/linux/memory_hotplug.h| 19 ++- kernel/kexec_file.c | 2 +- kernel/resource.c | 109 ++ mm/memory_hotplug.c | 48 +++- 14 files changed, 141 insertions(+), 68 deletions(-) -- 2.26.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v2 3/7] mm/memory_hotplug: prepare passing flags to add_memory() and friends
We soon want to pass flags, e.g., to mark added System RAM resources. mergeable. Prepare for that. This patch is based on a similar patch by Oscar Salvador: https://lkml.kernel.org/r/20190625075227.15193-3-osalva...@suse.de Acked-by: Wei Liu Cc: Andrew Morton Cc: Michal Hocko Cc: Dan Williams Cc: Jason Gunthorpe Cc: Pankaj Gupta Cc: Baoquan He Cc: Wei Yang Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: "Rafael J. Wysocki" Cc: Len Brown Cc: Greg Kroah-Hartman Cc: Vishal Verma Cc: Dave Jiang Cc: "K. Y. Srinivasan" Cc: Haiyang Zhang Cc: Stephen Hemminger Cc: Wei Liu Cc: Heiko Carstens Cc: Vasily Gorbik Cc: Christian Borntraeger Cc: David Hildenbrand Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Stefano Stabellini Cc: "Oliver O'Halloran" Cc: Pingfan Liu Cc: Nathan Lynch Cc: Libor Pechacek Cc: Anton Blanchard Cc: Leonardo Bras Cc: linuxppc-...@lists.ozlabs.org Cc: linux-a...@vger.kernel.org Cc: linux-nvd...@lists.01.org Cc: linux-hyp...@vger.kernel.org Cc: linux-s...@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: xen-de...@lists.xenproject.org Signed-off-by: David Hildenbrand --- arch/powerpc/platforms/powernv/memtrace.c | 2 +- arch/powerpc/platforms/pseries/hotplug-memory.c | 2 +- drivers/acpi/acpi_memhotplug.c | 2 +- drivers/base/memory.c | 2 +- drivers/dax/kmem.c | 2 +- drivers/hv/hv_balloon.c | 2 +- drivers/s390/char/sclp_cmd.c| 2 +- drivers/virtio/virtio_mem.c | 2 +- drivers/xen/balloon.c | 2 +- include/linux/memory_hotplug.h | 10 ++ mm/memory_hotplug.c | 15 --- 11 files changed, 23 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c index 13b369d2cc454..a7475d18c671c 100644 --- a/arch/powerpc/platforms/powernv/memtrace.c +++ b/arch/powerpc/platforms/powernv/memtrace.c @@ -224,7 +224,7 @@ static int memtrace_online(void) ent->mem = 0; } - if (add_memory(ent->nid, ent->start, ent->size)) { + if (add_memory(ent->nid, ent->start, ent->size, 0)) { pr_err("Failed to add trace memory to node %d\n", ent->nid); ret += 1; diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c index 5d545b78111f9..54a888ea7f751 100644 --- a/arch/powerpc/platforms/pseries/hotplug-memory.c +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c @@ -606,7 +606,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb) block_sz = memory_block_size_bytes(); /* Add the memory */ - rc = __add_memory(lmb->nid, lmb->base_addr, block_sz); + rc = __add_memory(lmb->nid, lmb->base_addr, block_sz, 0); if (rc) { invalidate_lmb_associativity_index(lmb); return rc; diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c index e294f44a78504..d91b3584d4b2b 100644 --- a/drivers/acpi/acpi_memhotplug.c +++ b/drivers/acpi/acpi_memhotplug.c @@ -207,7 +207,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device) if (node < 0) node = memory_add_physaddr_to_nid(info->start_addr); - result = __add_memory(node, info->start_addr, info->length); + result = __add_memory(node, info->start_addr, info->length, 0); /* * If the memory block has been used by the kernel, add_memory() diff --git a/drivers/base/memory.c b/drivers/base/memory.c index 4db3c660de831..2287bcf86480e 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -432,7 +432,7 @@ static ssize_t probe_store(struct device *dev, struct device_attribute *attr, nid = memory_add_physaddr_to_nid(phys_addr); ret = __add_memory(nid, phys_addr, - MIN_MEMORY_BLOCK_SIZE * sections_per_block); + MIN_MEMORY_BLOCK_SIZE * sections_per_block, 0); if (ret) goto out; diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c index 7dcb2902e9b1b..8e66b28ef5bc6 100644 --- a/drivers/dax/kmem.c +++ b/drivers/dax/kmem.c @@ -95,7 +95,7 @@ int dev_dax_kmem_probe(struct dev_dax *dev_dax) * this as RAM automatically. */ rc = add_memory_driver_managed(numa_node, range.start, - range_len(), kmem_name); + range_len(), kmem_name, 0); res->flags |= IORESOURCE_BUSY; if (rc) { diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
[PATCH v2 2/7] kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED
IORESOURCE_MEM_DRIVER_MANAGED currently uses an unused PnP bit, which is always set to 0 by hardware. This is far from beautiful (and confusing), and the bit only applies to SYSRAM. So let's move it out of the bus-specific (PnP) defined bits. We'll add another SYSRAM specific bit soon. If we ever need more bits for other purposes, we can steal some from "desc", or reshuffle/regroup what we have. Cc: Andrew Morton Cc: Michal Hocko Cc: Dan Williams Cc: Jason Gunthorpe Cc: Kees Cook Cc: Ard Biesheuvel Cc: Pankaj Gupta Cc: Baoquan He Cc: Wei Yang Cc: Eric Biederman Cc: Thomas Gleixner Cc: Greg Kroah-Hartman Cc: ke...@lists.infradead.org Signed-off-by: David Hildenbrand --- include/linux/ioport.h | 4 +++- kernel/kexec_file.c| 2 +- mm/memory_hotplug.c| 4 ++-- 3 files changed, 6 insertions(+), 4 deletions(-) diff --git a/include/linux/ioport.h b/include/linux/ioport.h index 52a91f5fa1a36..d7620d7c941a0 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -58,6 +58,9 @@ struct resource { #define IORESOURCE_EXT_TYPE_BITS 0x0100/* Resource extended types */ #define IORESOURCE_SYSRAM 0x0100 /* System RAM (modifier) */ +/* IORESOURCE_SYSRAM specific bits. */ +#define IORESOURCE_SYSRAM_DRIVER_MANAGED 0x0200 /* Always detected via a driver. */ + #define IORESOURCE_EXCLUSIVE 0x0800 /* Userland may not map this resource */ #define IORESOURCE_DISABLED0x1000 @@ -103,7 +106,6 @@ struct resource { #define IORESOURCE_MEM_32BIT (3<<3) #define IORESOURCE_MEM_SHADOWABLE (1<<5) /* dup: IORESOURCE_SHADOWABLE */ #define IORESOURCE_MEM_EXPANSIONROM(1<<6) -#define IORESOURCE_MEM_DRIVER_MANAGED (1<<7) /* PnP I/O specific bits (IORESOURCE_BITS) */ #define IORESOURCE_IO_16BIT_ADDR (1<<0) diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index ca40bef75a616..dfeeed1aed084 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -520,7 +520,7 @@ static int locate_mem_hole_callback(struct resource *res, void *arg) /* Returning 0 will take to next memory range */ /* Don't use memory that will be detected and handled by a driver. */ - if (res->flags & IORESOURCE_MEM_DRIVER_MANAGED) + if (res->flags & IORESOURCE_SYSRAM_DRIVER_MANAGED) return 0; if (sz < kbuf->memsz) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 4c47b68a9f4b5..8e1cd18b5cf14 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -105,7 +105,7 @@ static struct resource *register_memory_resource(u64 start, u64 size, unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; if (strcmp(resource_name, "System RAM")) - flags |= IORESOURCE_MEM_DRIVER_MANAGED; + flags |= IORESOURCE_SYSRAM_DRIVER_MANAGED; /* * Make sure value parsed from 'mem=' only restricts memory adding @@ -1160,7 +1160,7 @@ EXPORT_SYMBOL_GPL(add_memory); * * For this memory, no entries in /sys/firmware/memmap ("raw firmware-provided * memory map") are created. Also, the created memory resource is flagged - * with IORESOURCE_MEM_DRIVER_MANAGED, so in-kernel users can special-case + * with IORESOURCE_SYSRAM_DRIVER_MANAGED, so in-kernel users can special-case * this memory as well (esp., not place kexec images onto it). * * The resource_name (visible via /proc/iomem) has to have the format -- 2.26.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v2 1/7] kernel/resource: make release_mem_region_adjustable() never fail
Let's make sure splitting a resource on memory hotunplug will never fail. This will become more relevant once we merge selected System RAM resources - then, we'll trigger that case more often on memory hotunplug. In general, this function is already unlikely to fail. When we remove memory, we free up quite a lot of metadata (memmap, page tables, memory block device, etc.). The only reason it could really fail would be when injecting allocation errors. All other error cases inside release_mem_region_adjustable() seem to be sanity checks if the function would be abused in different context - let's add WARN_ON_ONCE() in these cases so we can catch them. Cc: Andrew Morton Cc: Michal Hocko Cc: Dan Williams Cc: Jason Gunthorpe Cc: Kees Cook Cc: Ard Biesheuvel Cc: Pankaj Gupta Cc: Baoquan He Cc: Wei Yang Signed-off-by: David Hildenbrand --- include/linux/ioport.h | 4 ++-- kernel/resource.c | 49 -- mm/memory_hotplug.c| 22 +-- 3 files changed, 31 insertions(+), 44 deletions(-) diff --git a/include/linux/ioport.h b/include/linux/ioport.h index 6c2b06fe8beb7..52a91f5fa1a36 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -248,8 +248,8 @@ extern struct resource * __request_region(struct resource *, extern void __release_region(struct resource *, resource_size_t, resource_size_t); #ifdef CONFIG_MEMORY_HOTREMOVE -extern int release_mem_region_adjustable(struct resource *, resource_size_t, - resource_size_t); +extern void release_mem_region_adjustable(struct resource *, resource_size_t, + resource_size_t); #endif /* Wrappers for managed devices */ diff --git a/kernel/resource.c b/kernel/resource.c index f1175ce93a1d5..36b3552210120 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -1258,21 +1258,28 @@ EXPORT_SYMBOL(__release_region); * assumes that all children remain in the lower address entry for * simplicity. Enhance this logic when necessary. */ -int release_mem_region_adjustable(struct resource *parent, - resource_size_t start, resource_size_t size) +void release_mem_region_adjustable(struct resource *parent, + resource_size_t start, resource_size_t size) { + struct resource *new_res = NULL; + bool alloc_nofail = false; struct resource **p; struct resource *res; - struct resource *new_res; resource_size_t end; - int ret = -EINVAL; end = start + size - 1; - if ((start < parent->start) || (end > parent->end)) - return ret; + if (WARN_ON_ONCE((start < parent->start) || (end > parent->end))) + return; - /* The alloc_resource() result gets checked later */ - new_res = alloc_resource(GFP_KERNEL); + /* +* We free up quite a lot of memory on memory hotunplug (esp., memap), +* just before releasing the region. This is highly unlikely to +* fail - let's play save and make it never fail as the caller cannot +* perform any error handling (e.g., trying to re-add memory will fail +* similarly). +*/ +retry: + new_res = alloc_resource(GFP_KERNEL | alloc_nofail ? __GFP_NOFAIL : 0); p = >child; write_lock(_lock); @@ -1298,7 +1305,6 @@ int release_mem_region_adjustable(struct resource *parent, * so if we are dealing with them, let us just back off here. */ if (!(res->flags & IORESOURCE_SYSRAM)) { - ret = 0; break; } @@ -1315,20 +1321,23 @@ int release_mem_region_adjustable(struct resource *parent, /* free the whole entry */ *p = res->sibling; free_resource(res); - ret = 0; } else if (res->start == start && res->end != end) { /* adjust the start */ - ret = __adjust_resource(res, end + 1, - res->end - end); + WARN_ON_ONCE(__adjust_resource(res, end + 1, + res->end - end)); } else if (res->start != start && res->end == end) { /* adjust the end */ - ret = __adjust_resource(res, res->start, - start - res->start); + WARN_ON_ONCE(__adjust_resource(res, res->start, + start - res->start)); } else { - /* split into two entries */ + /* split into two entries - we need a new resource */ if (!new_res) { -
Re: [PATCH v7 67/72] x86/smpboot: Load TSS and getcpu GDT entry before loading IDT
On Tue, Sep 08, 2020 at 07:20:42PM +0200, Borislav Petkov wrote: > On Mon, Sep 07, 2020 at 03:16:08PM +0200, Joerg Roedel wrote: > > +void cpu_init_exception_handling(void) > > +{ > > + struct tss_struct *tss = this_cpu_ptr(_tss_rw); > > + int cpu = raw_smp_processor_id(); > > + > > + /* paranoid_entry() gets the CPU number from the GDT */ > > + setup_getcpu(cpu); > > + > > + /* IST vectors need TSS to be set up. */ > > + tss_setup_ist(tss); > > + tss_setup_io_bitmap(tss); > > + set_tss_desc(cpu, _cpu_entry_area(cpu)->tss.x86_tss); > > + > > + load_TR_desc(); > > Aha, this is what you mean here in your 0th message. I'm guessing it is > ok to do those things twice in start_secondary... Yes, I think its best to do it twice, so that cpu_init() stays the CPU state barrier it should be, independent of what happens before. Joerg ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] Rescan the entire target on transport reset when LUN is 0
> On Sep 8, 2020, at 3:22 PM, Paolo Bonzini wrote: > > On 28/08/20 14:21, Matej Genci wrote: >> VirtIO 1.0 spec says >>The removed and rescan events ... when sent for LUN 0, they MAY >>apply to the entire target so the driver can ask the initiator >>to rescan the target to detect this. >> >> This change introduces the behaviour described above by scanning the >> entire scsi target when LUN is set to 0. This is both a functional and a >> performance fix. It aligns the driver with the spec and allows control >> planes to hotplug targets with large numbers of LUNs without having to >> request a RESCAN for each one of them. >> >> Signed-off-by: Matej Genci >> Suggested-by: Felipe Franciosi >> --- >> drivers/scsi/virtio_scsi.c | 7 ++- >> 1 file changed, 6 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c >> index bfec84aacd90..a4b9bc7b4b4a 100644 >> --- a/drivers/scsi/virtio_scsi.c >> +++ b/drivers/scsi/virtio_scsi.c >> @@ -284,7 +284,12 @@ static void virtscsi_handle_transport_reset(struct >> virtio_scsi *vscsi, >> >> switch (virtio32_to_cpu(vscsi->vdev, event->reason)) { >> case VIRTIO_SCSI_EVT_RESET_RESCAN: >> -scsi_add_device(shost, 0, target, lun); >> +if (lun == 0) { >> +scsi_scan_target(>shost_gendev, 0, target, >> + SCAN_WILD_CARD, SCSI_SCAN_INITIAL); >> +} else { >> +scsi_add_device(shost, 0, target, lun); >> +} >> break; >> case VIRTIO_SCSI_EVT_RESET_REMOVED: >> sdev = scsi_device_lookup(shost, 0, target, lun); >> > > > Acked-by: Paolo Bonzini Cc: sta...@vger.kernel.org Thanks, Paolo. I'm Cc'ing stable as I believe this fixes a driver bug where it doesn't follow the spec. Per commit message, today devices are required to issue RESCAN events for each LUN behind a target when hotplugging, or risking the driver not seeing the new LUNs. Is this enough? Or should we resend after merge per below? https://www.kernel.org/doc/Documentation/process/stable-kernel-rules.rst F. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v7 71/72] x86/efi: Add GHCB mappings when SEV-ES is active
+ Ard so that he can ack the efi bits. On Mon, Sep 07, 2020 at 03:16:12PM +0200, Joerg Roedel wrote: > From: Tom Lendacky > > Calling down to EFI runtime services can result in the firmware performing > VMGEXIT calls. The firmware is likely to use the GHCB of the OS (e.g., for > setting EFI variables), so each GHCB in the system needs to be identity > mapped in the EFI page tables, as unencrypted, to avoid page faults. > > Signed-off-by: Tom Lendacky > [ jroe...@suse.de: Moved GHCB mapping loop to sev-es.c ] > Signed-off-by: Joerg Roedel > --- > arch/x86/boot/compressed/sev-es.c | 1 + > arch/x86/include/asm/sev-es.h | 2 ++ > arch/x86/kernel/sev-es.c | 30 ++ > arch/x86/platform/efi/efi_64.c| 10 ++ > 4 files changed, 43 insertions(+) > > diff --git a/arch/x86/boot/compressed/sev-es.c > b/arch/x86/boot/compressed/sev-es.c > index 45702b866c33..0a9a248ca33d 100644 > --- a/arch/x86/boot/compressed/sev-es.c > +++ b/arch/x86/boot/compressed/sev-es.c > @@ -12,6 +12,7 @@ > */ > #include "misc.h" > > +#include > #include > #include > #include > diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h > index e919f09ae33c..cf1d957c7091 100644 > --- a/arch/x86/include/asm/sev-es.h > +++ b/arch/x86/include/asm/sev-es.h > @@ -102,11 +102,13 @@ static __always_inline void sev_es_nmi_complete(void) > if (static_branch_unlikely(_es_enable_key)) > __sev_es_nmi_complete(); > } > +extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd); > #else > static inline void sev_es_ist_enter(struct pt_regs *regs) { } > static inline void sev_es_ist_exit(void) { } > static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { > return 0; } > static inline void sev_es_nmi_complete(void) { } > +static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; } > #endif > > #endif > diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c > index 9ab3a4dfecd8..4e2b7e4d9b87 100644 > --- a/arch/x86/kernel/sev-es.c > +++ b/arch/x86/kernel/sev-es.c > @@ -491,6 +491,36 @@ int sev_es_setup_ap_jump_table(struct real_mode_header > *rmh) > return 0; > } > > +/* > + * This is needed by the OVMF UEFI firmware which will use whatever it finds > in > + * the GHCB MSR as its GHCB to talk to the hypervisor. So make sure the > per-cpu > + * runtime GHCBs used by the kernel are also mapped in the EFI page-table. > + */ > +int __init sev_es_efi_map_ghcbs(pgd_t *pgd) > +{ > + struct sev_es_runtime_data *data; > + unsigned long address, pflags; > + int cpu; > + u64 pfn; > + > + if (!sev_es_active()) > + return 0; > + > + pflags = _PAGE_NX | _PAGE_RW; > + > + for_each_possible_cpu(cpu) { > + data = per_cpu(runtime_data, cpu); > + > + address = __pa(>ghcb_page); > + pfn = address >> PAGE_SHIFT; > + > + if (kernel_map_pages_in_pgd(pgd, pfn, address, 1, pflags)) > + return 1; > + } > + > + return 0; > +} > + > static enum es_result vc_handle_msr(struct ghcb *ghcb, struct es_em_ctxt > *ctxt) > { > struct pt_regs *regs = ctxt->regs; > diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c > index 6af4da1149ba..8f5759df7776 100644 > --- a/arch/x86/platform/efi/efi_64.c > +++ b/arch/x86/platform/efi/efi_64.c > @@ -47,6 +47,7 @@ > #include > #include > #include > +#include > > /* > * We allocate runtime services regions top-down, starting from -4G, i.e. > @@ -229,6 +230,15 @@ int __init efi_setup_page_tables(unsigned long > pa_memmap, unsigned num_pages) > return 1; > } > > + /* > + * When SEV-ES is active, the GHCB as set by the kernel will be used > + * by firmware. Create a 1:1 unencrypted mapping for each GHCB. > + */ > + if (sev_es_efi_map_ghcbs(pgd)) { > + pr_err("Failed to create 1:1 mapping for the GHCBs!\n"); > + return 1; > + } > + > /* >* When making calls to the firmware everything needs to be 1:1 >* mapped and addressable with 32-bit pointers. Map the kernel > -- > 2.28.0 > -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v7 67/72] x86/smpboot: Load TSS and getcpu GDT entry before loading IDT
On Mon, Sep 07, 2020 at 03:16:08PM +0200, Joerg Roedel wrote: > From: Joerg Roedel > > The IDT on 64bit contains vectors which use paranoid_entry() and/or IST > stacks. To make these vectors work the TSS and the getcpu GDT entry need > to be set up before the IDT is loaded. > > Signed-off-by: Joerg Roedel > --- > arch/x86/include/asm/processor.h | 1 + > arch/x86/kernel/cpu/common.c | 23 +++ > arch/x86/kernel/smpboot.c| 2 +- > 3 files changed, 25 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/include/asm/processor.h > b/arch/x86/include/asm/processor.h > index d8a82e650810..5ac507586769 100644 > --- a/arch/x86/include/asm/processor.h > +++ b/arch/x86/include/asm/processor.h > @@ -696,6 +696,7 @@ extern void load_direct_gdt(int); > extern void load_fixmap_gdt(int); > extern void load_percpu_segment(int); > extern void cpu_init(void); > +extern void cpu_init_exception_handling(void); > extern void cr4_init(void); > > static inline unsigned long get_debugctlmsr(void) > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c > index 1d65365363a1..a9527c0c38fb 100644 > --- a/arch/x86/kernel/cpu/common.c > +++ b/arch/x86/kernel/cpu/common.c > @@ -1854,6 +1854,29 @@ static inline void tss_setup_io_bitmap(struct > tss_struct *tss) > #endif > } > > +/* > + * Setup everything needed to handle exceptions from the IDT, including the > IST > + * exceptions which use paranoid_entry() > + */ > +void cpu_init_exception_handling(void) > +{ > + struct tss_struct *tss = this_cpu_ptr(_tss_rw); > + int cpu = raw_smp_processor_id(); > + > + /* paranoid_entry() gets the CPU number from the GDT */ > + setup_getcpu(cpu); > + > + /* IST vectors need TSS to be set up. */ > + tss_setup_ist(tss); > + tss_setup_io_bitmap(tss); > + set_tss_desc(cpu, _cpu_entry_area(cpu)->tss.x86_tss); > + > + load_TR_desc(); Aha, this is what you mean here in your 0th message. I'm guessing it is ok to do those things twice in start_secondary... -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication
On Tue, 1 Sep 2020 16:50:03 +0800 Jason Wang wrote: > On 2020/9/1 下午1:24, Kishon Vijay Abraham I wrote: > > Hi, > > > > On 28/08/20 4:04 pm, Cornelia Huck wrote: > >> On Thu, 9 Jul 2020 14:26:53 +0800 > >> Jason Wang wrote: > >> > >> [Let me note right at the beginning that I first noted this while > >> listening to Kishon's talk at LPC on Wednesday. I might be very > >> confused about the background here, so let me apologize beforehand for > >> any confusion I might spread.] > >> > >>> On 2020/7/8 下午9:13, Kishon Vijay Abraham I wrote: > Hi Jason, > > On 7/8/2020 4:52 PM, Jason Wang wrote: > > On 2020/7/7 下午10:45, Kishon Vijay Abraham I wrote: > >> Hi Jason, > >> > >> On 7/7/2020 3:17 PM, Jason Wang wrote: > >>> On 2020/7/6 下午5:32, Kishon Vijay Abraham I wrote: > Hi Jason, > > On 7/3/2020 12:46 PM, Jason Wang wrote: > > On 2020/7/2 下午9:35, Kishon Vijay Abraham I wrote: > >> Hi Jason, > >> > >> On 7/2/2020 3:40 PM, Jason Wang wrote: > >>> On 2020/7/2 下午5:51, Michael S. Tsirkin wrote: > On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay > Abraham I wrote: > > This series enhances Linux Vhost support to enable SoC-to-SoC > > communication over MMIO. This series enables rpmsg > > communication between > > two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2 > > > > 1) Modify vhost to use standard Linux driver model > > 2) Add support in vring to access virtqueue over MMIO > > 3) Add vhost client driver for rpmsg > > 4) Add PCIe RC driver (uses virtio) and PCIe EP driver > > (uses vhost) for > > rpmsg communication between two SoCs connected to > > each other > > 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg > > communication > > between two SoCs connected via NTB > > 6) Add configfs to configure the components > > > > UseCase1 : > > > > VHOST RPMSG VIRTIO RPMSG > > + + > > | | > > | | > > | | > > | | > > +-v--+ +--v---+ > > | Linux | | Linux | > > | Endpoint | | Root Complex | > > | <-> | > > | | | | > > | SOC1 | | SOC2 | > > ++ +--+ > > > > UseCase 2: > > > > VHOST RPMSG VIRTIO RPMSG > > + + > > | | > > | | > > | | > > | | > > +--v--+ +--v--+ > > | | | | > > | HOST1 | | > > HOST2 | > > | | | | > > +--^--+ +--^--+ > > | | > > | | > > +-+ > > > > > > | +--v--+ +--v--+ | > > | | | | | | > > | | EP | | EP > > | | > > | | CONTROLLER1 | | > > CONTROLLER2 | | > > | | <---> | | > > | | | | | | > > | | | | | | > > | | | SoC With Multiple EP Instances > > | | | > > | | | (Configured using NTB Function) > > | | | > > | +-+ +-+ | > > +-+ > > > > > >> > >> First of all, to clarify the terminology: > >> Is "vhost rpmsg" acting as what the virtio standard calls the 'device', > >> and "virtio rpmsg" as the 'driver'? Or is the "vhost" part mostly just > > > > Right, vhost_rpmsg is 'device' and virtio_rpmsg is 'driver'. > >> virtqueues + the exiting vhost interfaces? > > > > It's implemented to provide the full 'device' functionality. Ok. > >> > > > > Software Layering: > > > > The high-level SW layering should look something like > > below. This series > > adds support only for RPMSG VHOST, however
Re: [PATCH] Rescan the entire target on transport reset when LUN is 0
On 28/08/20 14:21, Matej Genci wrote: > VirtIO 1.0 spec says > The removed and rescan events ... when sent for LUN 0, they MAY > apply to the entire target so the driver can ask the initiator > to rescan the target to detect this. > > This change introduces the behaviour described above by scanning the > entire scsi target when LUN is set to 0. This is both a functional and a > performance fix. It aligns the driver with the spec and allows control > planes to hotplug targets with large numbers of LUNs without having to > request a RESCAN for each one of them. > > Signed-off-by: Matej Genci > Suggested-by: Felipe Franciosi > --- > drivers/scsi/virtio_scsi.c | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c > index bfec84aacd90..a4b9bc7b4b4a 100644 > --- a/drivers/scsi/virtio_scsi.c > +++ b/drivers/scsi/virtio_scsi.c > @@ -284,7 +284,12 @@ static void virtscsi_handle_transport_reset(struct > virtio_scsi *vscsi, > > switch (virtio32_to_cpu(vscsi->vdev, event->reason)) { > case VIRTIO_SCSI_EVT_RESET_RESCAN: > - scsi_add_device(shost, 0, target, lun); > + if (lun == 0) { > + scsi_scan_target(>shost_gendev, 0, target, > + SCAN_WILD_CARD, SCSI_SCAN_INITIAL); > + } else { > + scsi_add_device(shost, 0, target, lun); > + } > break; > case VIRTIO_SCSI_EVT_RESET_REMOVED: > sdev = scsi_device_lookup(shost, 0, target, lun); > Acked-by: Paolo Bonzini ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 0/4] Add a vhost RPMsg API
On Wed, Aug 26, 2020 at 07:46:32PM +0200, Guennadi Liakhovetski wrote: > Hi, > > Next update: OK could we get some acks from rpmsg folks on this please? It's been quite a while, patchset is not huge. > v5: > - don't hard-code message layout > > v4: > - add endianness conversions to comply with the VirtIO standard > > v3: > - address several checkpatch warnings > - address comments from Mathieu Poirier > > v2: > - update patch #5 with a correct vhost_dev_init() prototype > - drop patch #6 - it depends on a different patch, that is currently > an RFC > - address comments from Pierre-Louis Bossart: > * remove "default n" from Kconfig > > Linux supports RPMsg over VirtIO for "remote processor" / AMP use > cases. It can however also be used for virtualisation scenarios, > e.g. when using KVM to run Linux on both the host and the guests. > This patch set adds a wrapper API to facilitate writing vhost > drivers for such RPMsg-based solutions. The first use case is an > audio DSP virtualisation project, currently under development, ready > for review and submission, available at > https://github.com/thesofproject/linux/pull/1501/commits > > Thanks > Guennadi > > Guennadi Liakhovetski (4): > vhost: convert VHOST_VSOCK_SET_RUNNING to a generic ioctl > rpmsg: move common structures and defines to headers > rpmsg: update documentation > vhost: add an RPMsg API > > Documentation/rpmsg.txt | 6 +- > drivers/rpmsg/virtio_rpmsg_bus.c | 78 +-- > drivers/vhost/Kconfig| 7 + > drivers/vhost/Makefile | 3 + > drivers/vhost/rpmsg.c| 373 +++ > drivers/vhost/vhost_rpmsg.h | 74 ++ > include/linux/virtio_rpmsg.h | 83 +++ > include/uapi/linux/rpmsg.h | 3 + > include/uapi/linux/vhost.h | 4 +- > 9 files changed, 551 insertions(+), 80 deletions(-) > create mode 100644 drivers/vhost/rpmsg.c > create mode 100644 drivers/vhost/vhost_rpmsg.h > create mode 100644 include/linux/virtio_rpmsg.h > > -- > 2.28.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] Rescan the entire target on transport reset when LUN is 0
On Fri, Aug 28, 2020 at 12:21:35PM +, Matej Genci wrote: > VirtIO 1.0 spec says > The removed and rescan events ... when sent for LUN 0, they MAY > apply to the entire target so the driver can ask the initiator > to rescan the target to detect this. > > This change introduces the behaviour described above by scanning the > entire scsi target when LUN is set to 0. This is both a functional and a > performance fix. It aligns the driver with the spec and allows control > planes to hotplug targets with large numbers of LUNs without having to > request a RESCAN for each one of them. > > Signed-off-by: Matej Genci > Suggested-by: Felipe Franciosi Stefan, Paolo, could you review this pls? > --- > drivers/scsi/virtio_scsi.c | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c > index bfec84aacd90..a4b9bc7b4b4a 100644 > --- a/drivers/scsi/virtio_scsi.c > +++ b/drivers/scsi/virtio_scsi.c > @@ -284,7 +284,12 @@ static void virtscsi_handle_transport_reset(struct > virtio_scsi *vscsi, > > switch (virtio32_to_cpu(vscsi->vdev, event->reason)) { > case VIRTIO_SCSI_EVT_RESET_RESCAN: > - scsi_add_device(shost, 0, target, lun); > + if (lun == 0) { > + scsi_scan_target(>shost_gendev, 0, target, > + SCAN_WILD_CARD, SCSI_SCAN_INITIAL); > + } else { > + scsi_add_device(shost, 0, target, lun); > + } > break; > case VIRTIO_SCSI_EVT_RESET_REMOVED: > sdev = scsi_device_lookup(shost, 0, target, lun); > -- > 2.20.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3] virtio-rng: return available data with O_NONBLOCK
On Mon, Aug 31, 2020 at 02:37:26PM +0200, Laurent Vivier wrote: > On 28/08/2020 23:34, Martin Wilck wrote: > > On Wed, 2020-08-26 at 08:26 -0400, Michael S. Tsirkin wrote: > >> On Tue, Aug 11, 2020 at 04:42:32PM +0200, Laurent Vivier wrote: > >>> On 11/08/2020 16:28, mwi...@suse.com wrote: > From: Martin Wilck > > If a program opens /dev/hwrng with O_NONBLOCK and uses poll() and > non-blocking read() to retrieve random data, it ends up in a > tight > loop with poll() always returning POLLIN and read() returning > EAGAIN. > This repeats forever until some process makes a blocking read() > call. > The reason is that virtio_read() always returns 0 in non-blocking > mode, > even if data is available. Worse, it fetches random data from the > hypervisor after every non-blocking call, without ever using this > data. > > The following test program illustrates the behavior and can be > used > for testing and experiments. The problem will only be seen if all > tasks use non-blocking access; otherwise the blocking reads will > "recharge" the random pool and cause other, non-blocking reads to > succeed at least sometimes. > > /* Whether to use non-blocking mode in a task, problem occurs if > CONDITION is 1 */ > //#define CONDITION (getpid() % 2 != 0) > > static volatile sig_atomic_t stop; > static void handler(int sig __attribute__((unused))) { stop = 1; > } > > static void loop(int fd, int sec) > { > struct pollfd pfd = { .fd = fd, .events = POLLIN, }; > unsigned long errors = 0, eagains = 0, bytes = 0, succ = 0; > int size, rc, rd; > > srandom(getpid()); > if (CONDITION && fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | > O_NONBLOCK) == -1) > perror("fcntl"); > size = MINBUFSIZ + random() % (MAXBUFSIZ - MINBUFSIZ + 1); > > for(;;) { > char buf[size]; > > if (stop) > break; > rc = poll(, 1, sec); > if (rc > 0) { > rd = read(fd, buf, sizeof(buf)); > if (rd == -1 && errno == EAGAIN) > eagains++; > else if (rd == -1) > errors++; > else { > succ++; > bytes += rd; > write(1, buf, sizeof(buf)); > } > } else if (rc == -1) { > if (errno != EINTR) > perror("poll"); > break; > } else > fprintf(stderr, "poll: timeout\n"); > } > fprintf(stderr, > "pid %d %sblocking, bufsize %d, %d seconds, %lu bytes > read, %lu success, %lu eagain, %lu errors\n", > getpid(), CONDITION ? "non-" : "", size, sec, bytes, > succ, eagains, errors); > } > > int main(void) > { > int fd; > > fork(); fork(); > fd = open("/dev/hwrng", O_RDONLY); > if (fd == -1) { > perror("open"); > return 1; > }; > signal(SIGALRM, handler); > alarm(SECONDS); > loop(fd, SECONDS); > close(fd); > wait(NULL); > return 0; > } > > void loop(int fd) > { > struct pollfd pfd0 = { .fd = fd, .events = POLLIN, }; > int rc; > unsigned int n; > > for (n = LOOPS; n > 0; n--) { > struct pollfd pfd = pfd0; > char buf[SIZE]; > > rc = poll(, 1, 1); > if (rc > 0) { > int rd = read(fd, buf, sizeof(buf)); > > if (rd == -1) > perror("read"); > else > printf("read %d bytes\n", rd); > } else if (rc == -1) > perror("poll"); > else > fprintf(stderr, "timeout\n"); > > } > } > > int main(void) > { > int fd; > > fd = open("/dev/hwrng", O_RDONLY|O_NONBLOCK); > if (fd == -1) { > perror("open"); > return 1; > }; > loop(fd); > close(fd); > return 0; > } > > This can be observed in the real word e.g. with nested qemu/KVM > virtual > machines, if both the "outer" and "inner" VMs have a virtio-rng > device. > If the "inner" VM requests random data, qemu running in the > "outer" VM > uses this device in a non-blocking
Re: [PATCH v11 2/2] s390: virtio: PV needs VIRTIO I/O device protection
Hi Pierre, I love your patch! Perhaps something to improve: [auto build test WARNING on s390/features] [also build test WARNING on linus/master v5.9-rc4 next-20200903] [cannot apply to linux/master kvms390/next] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Pierre-Morel/s390-virtio-let-arch-validate-VIRTIO-features/20200907-174101 base: https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features config: s390-randconfig-r006-20200908 (attached as .config) compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 5f5a0bb0872a9673bad08b38bc0b14c42263902a) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install s390 cross compiling tool for clang build # apt-get install binutils-s390x-linux-gnu # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=s390 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x)) ^ include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32' ___constant_swab32(x) : \ ^ include/uapi/linux/swab.h:20:12: note: expanded from macro '___constant_swab32' (((__u32)(x) & (__u32)0xff00UL) << 8) |\ ^ In file included from arch/s390/mm/init.c:20: In file included from include/linux/swap.h:9: In file included from include/linux/memcontrol.h:22: In file included from include/linux/writeback.h:14: In file included from include/linux/blk-cgroup.h:23: In file included from include/linux/blkdev.h:25: In file included from include/linux/scatterlist.h:9: In file included from arch/s390/include/asm/io.h:72: include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr)); ~~ ^ include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu' #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x)) ^ include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32' ___constant_swab32(x) : \ ^ include/uapi/linux/swab.h:21:12: note: expanded from macro '___constant_swab32' (((__u32)(x) & (__u32)0x00ffUL) >> 8) |\ ^ In file included from arch/s390/mm/init.c:20: In file included from include/linux/swap.h:9: In file included from include/linux/memcontrol.h:22: In file included from include/linux/writeback.h:14: In file included from include/linux/blk-cgroup.h:23: In file included from include/linux/blkdev.h:25: In file included from include/linux/scatterlist.h:9: In file included from arch/s390/include/asm/io.h:72: include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr)); ~~ ^ include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu' #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x)) ^ include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32' ___constant_swab32(x) : \ ^ include/uapi/linux/swab.h:22:12: note: expanded from macro '___constant_swab32' (((__u32)(x) & (__u32)0xff00UL) >> 24))) ^ In file included from arch/s390/mm/init.c:20: In file included from include/linux/swap.h:9: In file included from include/linux/memcontrol.h:22: In file included from include/linux/writeback.h:14: In file included from include/linux/blk-cgroup.h:23: In file included from include/linux/blkdev.h:25: In file included from include/linux/scatterlist.h:9: In file included from arch/s390/include/asm/io.h:72: include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] val = __le32_to_cpu((_
[PATCH v7.2 40/74] x86/sev-es: Setup GHCB based boot #VC handler
From: Joerg Roedel Add the infrastructure to handle #VC exceptions when the kernel runs on virtual addresses and has a GHCB mapped. This handler will be used until the runtime #VC handler takes over. Since the handler runs very early, disable instrumentation for sev-es.c. Signed-off-by: Joerg Roedel --- arch/x86/include/asm/realmode.h | 3 + arch/x86/include/asm/segment.h | 2 +- arch/x86/include/asm/sev-es.h | 2 + arch/x86/kernel/Makefile| 2 + arch/x86/kernel/head64.c| 8 +++ arch/x86/kernel/head_64.S | 36 ++ arch/x86/kernel/sev-es-shared.c | 14 ++-- arch/x86/kernel/sev-es.c| 116 arch/x86/mm/extable.c | 1 + 9 files changed, 176 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h index b35030eeec36..96118fb041b8 100644 --- a/arch/x86/include/asm/realmode.h +++ b/arch/x86/include/asm/realmode.h @@ -57,6 +57,9 @@ extern unsigned char real_mode_blob_end[]; extern unsigned long initial_code; extern unsigned long initial_gs; extern unsigned long initial_stack; +#ifdef CONFIG_AMD_MEM_ENCRYPT +extern unsigned long initial_vc_handler; +#endif extern unsigned char real_mode_blob[]; extern unsigned char real_mode_relocs[]; diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h index 517920928989..7fdd4facfce7 100644 --- a/arch/x86/include/asm/segment.h +++ b/arch/x86/include/asm/segment.h @@ -226,7 +226,7 @@ #define NUM_EXCEPTION_VECTORS 32 /* Bitmask of exception vectors which push an error code on the stack: */ -#define EXCEPTION_ERRCODE_MASK 0x00027d00 +#define EXCEPTION_ERRCODE_MASK 0x20027d00 #define GDT_SIZE (GDT_ENTRIES*8) #define GDT_ENTRY_TLS_ENTRIES 3 diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h index 7175d432ebfe..9fbeedaa66ee 100644 --- a/arch/x86/include/asm/sev-es.h +++ b/arch/x86/include/asm/sev-es.h @@ -75,5 +75,7 @@ static inline u64 lower_bits(u64 val, unsigned int bits) /* Early IDT entry points for #VC handler */ extern void vc_no_ghcb(void); +extern void vc_boot_ghcb(void); +extern bool handle_vc_boot_ghcb(struct pt_regs *regs); #endif diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 3bcdd8d2bbdd..04ceea8f4a89 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -20,6 +20,7 @@ CFLAGS_REMOVE_kvmclock.o = -pg CFLAGS_REMOVE_ftrace.o = -pg CFLAGS_REMOVE_early_printk.o = -pg CFLAGS_REMOVE_head64.o = -pg +CFLAGS_REMOVE_sev-es.o = -pg endif KASAN_SANITIZE_head$(BITS).o := n @@ -27,6 +28,7 @@ KASAN_SANITIZE_dumpstack.o:= n KASAN_SANITIZE_dumpstack_$(BITS).o := n KASAN_SANITIZE_stacktrace.o:= n KASAN_SANITIZE_paravirt.o := n +KASAN_SANITIZE_sev-es.o:= n # With some compiler versions the generated code results in boot hangs, caused # by several compilation units. To be safe, disable all instrumentation. diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index fc55cc9ccb0f..4199f25c0063 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -406,6 +406,10 @@ void __init do_early_exception(struct pt_regs *regs, int trapnr) early_make_pgtable(native_read_cr2())) return; + if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) && + trapnr == X86_TRAP_VC && handle_vc_boot_ghcb(regs)) + return; + early_fixup_exception(regs, trapnr); } @@ -575,6 +579,10 @@ static void startup_64_load_idt(unsigned long physbase) /* This is used when running on kernel addresses */ void early_setup_idt(void) { + /* VMM Communication Exception */ + if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) + set_bringup_idt_handler(bringup_idt_table, X86_TRAP_VC, vc_boot_ghcb); + bringup_idt_descr.address = (unsigned long)bringup_idt_table; native_load_idt(_idt_descr); } diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 6e68bca64ae4..1a71d0d4d575 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -279,6 +279,39 @@ SYM_CODE_START(start_cpu0) movqinitial_stack(%rip), %rsp jmp .Ljump_to_C_code SYM_CODE_END(start_cpu0) +#endif + +#ifdef CONFIG_AMD_MEM_ENCRYPT +/* + * VC Exception handler used during early boot when running on kernel + * addresses, but before the switch to the idt_table can be made. + * The early_idt_handler_array can't be used here because it calls into a lot + * of __init code and this handler is also used during CPU offlining/onlining. + * Therefore this handler ends up in the .text section so that it stays around + * when .init.text is freed. + */ +SYM_CODE_START_NOALIGN(vc_boot_ghcb) + UNWIND_HINT_IRET_REGS
[PATCH v7.2 39/74] x86/sev-es: Setup early #VC handler
From: Joerg Roedel Setup an early handler for #VC exceptions. There is no GHCB mapped yet, so just re-use the vc_no_ghcb_handler. It can only handle CPUID exit-codes, but that should be enough to get the kernel through verify_cpu() and __startup_64() until it runs on virtual addresses. Signed-off-by: Joerg Roedel --- arch/x86/include/asm/sev-es.h | 3 +++ arch/x86/kernel/head64.c | 25 - arch/x86/kernel/head_64.S | 30 ++ 3 files changed, 57 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h index 6dc52440c4b4..7175d432ebfe 100644 --- a/arch/x86/include/asm/sev-es.h +++ b/arch/x86/include/asm/sev-es.h @@ -73,4 +73,7 @@ static inline u64 lower_bits(u64 val, unsigned int bits) return (val & mask); } +/* Early IDT entry points for #VC handler */ +extern void vc_no_ghcb(void); + #endif diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index 4282dac694c3..fc55cc9ccb0f 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -40,6 +40,7 @@ #include #include #include +#include /* * Manage page tables very early on. @@ -540,12 +541,34 @@ static struct desc_ptr bringup_idt_descr = { .address= 0, /* Set at runtime */ }; +static void set_bringup_idt_handler(gate_desc *idt, int n, void *handler) +{ +#ifdef CONFIG_AMD_MEM_ENCRYPT + struct idt_data data; + gate_desc desc; + + init_idt_data(, n, handler); + idt_init_desc(, ); + native_write_idt_entry(idt, n, ); +#endif +} + /* This runs while still in the direct mapping */ static void startup_64_load_idt(unsigned long physbase) { struct desc_ptr *desc = fixup_pointer(_idt_descr, physbase); + gate_desc *idt = fixup_pointer(bringup_idt_table, physbase); + + + if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) { + void *handler; + + /* VMM Communication Exception */ + handler = fixup_pointer(vc_no_ghcb, physbase); + set_bringup_idt_handler(idt, X86_TRAP_VC, handler); + } - desc->address = (unsigned long)fixup_pointer(bringup_idt_table, physbase); + desc->address = (unsigned long)idt; native_load_idt(desc); } diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 3b40ec44a67d..6e68bca64ae4 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -348,6 +348,36 @@ SYM_CODE_START_LOCAL(early_idt_handler_common) jmp restore_regs_and_return_to_kernel SYM_CODE_END(early_idt_handler_common) +#ifdef CONFIG_AMD_MEM_ENCRYPT +/* + * VC Exception handler used during very early boot. The + * early_idt_handler_array can't be used because it returns via the + * paravirtualized INTERRUPT_RETURN and pv-ops don't work that early. + * + * This handler will end up in the .init.text section and not be + * available to boot secondary CPUs. + */ +SYM_CODE_START_NOALIGN(vc_no_ghcb) + UNWIND_HINT_IRET_REGS offset=8 + + /* Build pt_regs */ + PUSH_AND_CLEAR_REGS + + /* Call C handler */ + movq%rsp, %rdi + movqORIG_RAX(%rsp), %rsi + calldo_vc_no_ghcb + + /* Unwind pt_regs */ + POP_REGS + + /* Remove Error Code */ + addq$8, %rsp + + /* Pure iret required here - don't use INTERRUPT_RETURN */ + iretq +SYM_CODE_END(vc_no_ghcb) +#endif #define SYM_DATA_START_PAGE_ALIGNED(name) \ SYM_START(name, SYM_L_GLOBAL, .balign PAGE_SIZE) -- 2.28.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/2] vhost: remove mutex ops in vhost_set_backend_features
On Mon, Sep 07, 2020 at 06:52:19PM +0800, Zhu Lingshan wrote: > In vhost_vdpa ioctl SET_BACKEND_FEATURES path, currect code > would try to acquire vhost dev mutex twice > (first shown in vhost_vdpa_unlocked_ioctl), which can lead > to a dead lock issue. > This commit removed mutex operations in vhost_set_backend_features. > As a compensation for vhost_net, a followinig commit will add > needed mutex lock/unlock operations in a new function > vhost_net_set_backend_features() which is a wrap of > vhost_set_backend_features(). > > Signed-off-by: Zhu Lingshan I think you need to squash these two or reorder, we can't first make code racy then fix it up. > --- > drivers/vhost/vhost.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index b45519ca66a7..e03c9e6f058f 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -2591,14 +2591,12 @@ void vhost_set_backend_features(struct vhost_dev > *dev, u64 features) > struct vhost_virtqueue *vq; > int i; > > - mutex_lock(>mutex); > for (i = 0; i < dev->nvqs; ++i) { > vq = dev->vqs[i]; > mutex_lock(>mutex); > vq->acked_backend_features = features; > mutex_unlock(>mutex); > } > - mutex_unlock(>mutex); > } > EXPORT_SYMBOL_GPL(vhost_set_backend_features); > > -- > 2.18.4 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK
On Mon, Sep 07, 2020 at 02:43:51PM +0300, Eli Cohen wrote: > On Mon, Sep 07, 2020 at 07:34:00AM -0400, Michael S. Tsirkin wrote: > > On Mon, Sep 07, 2020 at 10:51:36AM +0300, Eli Cohen wrote: > > > If the memory map changes before the driver status is > > > VIRTIO_CONFIG_S_DRIVER_OK, don't attempt to create resources because it > > > may fail. For example, if the VQ is not ready there is no point in > > > creating resources. > > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 > > > devices") > > > Signed-off-by: Eli Cohen > > > > > > Could you add a bit more data about the problem to the log? > > To be more exact, what exactly happens right now? > > > > Sure I can. > > set_map() is used by mlx5 vdpa to create a memory region based on the > address map passed by the iotlb argument. If I get successive calls, I > will destroy the current memory region and build another one based on > the new address mapping. I also need to setup the hardware resources > since they depend on the memory region. > > If these calls happen before DRIVER_OK, It means it that driver VQs may > also not been setup and I may not create them yet. In this case I want > to avoid setting up the other resources and defer this till I get DRIVER > OK. > > Let me know if that answers your question so I can post another patch. it does, pls do. > > > --- > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +++ > > > 1 file changed, 3 insertions(+) > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > index 9df69d5efe8c..c89cd48a0aab 100644 > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c > > > @@ -1645,6 +1645,9 @@ static int mlx5_vdpa_change_map(struct > > > mlx5_vdpa_net *ndev, struct vhost_iotlb * > > > if (err) > > > goto err_mr; > > > > > > + if (!(ndev->mvdev.status & VIRTIO_CONFIG_S_DRIVER_OK)) > > > + return 0; > > > + > > > restore_channels_info(ndev); > > > err = setup_driver(ndev); > > > if (err) > > > -- > > > 2.26.0 > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/3] drm/qxl: use drmm_mode_config_init
On Tue, Sep 08, 2020 at 11:39:10AM +0200, Gerd Hoffmann wrote: > Signed-off-by: Gerd Hoffmann Btw going all in on devm_drm_dev_alloc and managed functions might be good cleanup for virtio. Reviewed-by: Daniel Vetter > --- > drivers/gpu/drm/qxl/qxl_display.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/qxl/qxl_display.c > b/drivers/gpu/drm/qxl/qxl_display.c > index fa79688013b7..4be04eaf7f37 100644 > --- a/drivers/gpu/drm/qxl/qxl_display.c > +++ b/drivers/gpu/drm/qxl/qxl_display.c > @@ -1190,7 +1190,9 @@ int qxl_modeset_init(struct qxl_device *qdev) > int i; > int ret; > > - drm_mode_config_init(>ddev); > + ret = drmm_mode_config_init(>ddev); > + if (ret) > + return ret; > > ret = qxl_create_monitors_object(qdev); > if (ret) > @@ -1223,5 +1225,4 @@ int qxl_modeset_init(struct qxl_device *qdev) > void qxl_modeset_fini(struct qxl_device *qdev) > { > qxl_destroy_monitors_object(qdev); > - drm_mode_config_cleanup(>ddev); > } > -- > 2.27.0 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v7 41/72] x86/sev-es: Setup per-cpu GHCBs for the runtime handler
On Mon, Sep 07, 2020 at 03:15:42PM +0200, Joerg Roedel wrote: > +void __init sev_es_init_vc_handling(void) > +{ > + int cpu; > + > + BUILD_BUG_ON((offsetof(struct sev_es_runtime_data, ghcb_page) % > PAGE_SIZE) != 0); Simplified that to: BUILD_BUG_ON(offsetof(struct sev_es_runtime_data, ghcb_page) % PAGE_SIZE); -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 0/6] mm / virtio-mem: support ZONE_MOVABLE
On 21.08.20 22:55, Andrew Morton wrote: > On Fri, 21 Aug 2020 10:45:33 +0200 David Hildenbrand wrote: > >> On 21.08.20 10:31, David Hildenbrand wrote: >>> On 16.08.20 14:53, David Hildenbrand wrote: For 5.10. Patch #1-#4,#6 have RBs or ACKs, patch #5 is virtio-mem stuff maintained by me. This should go via the -mm tree. >>> >>> @Andrew, can we give this a churn if there are no further comments? Thanks! >> >> ... I just spotted the patches in -next, strange I didn't get an email >> notification. Thanks :) > > https://lore.kernel.org/mm-commits/20200819025501.gjhzlolfc%25a...@linux-foundation.org/ > > akpm!=spam :) > I would never even dare to think that ... now I have to teach my spam filter some good manners ;) -- Thanks, David / dhildenb ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v1 5/5] hv_balloon: try to merge system ram resources
On 04.09.20 19:30, Wei Liu wrote: > On Fri, Aug 21, 2020 at 12:34:31PM +0200, David Hildenbrand wrote: >> Let's use the new mechanism to merge system ram resources below the >> root. We are the only one hotplugging system ram, e.g., DIMMs don't apply, >> so this is safe to be used. >> >> Cc: Andrew Morton >> Cc: Michal Hocko >> Cc: "K. Y. Srinivasan" >> Cc: Haiyang Zhang >> Cc: Stephen Hemminger >> Cc: Wei Liu >> Cc: Pankaj Gupta >> Cc: Baoquan He >> Cc: Wei Yang >> Signed-off-by: David Hildenbrand >> --- >> drivers/hv/hv_balloon.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c >> index 32e3bc0aa665a..49a6305f0fb73 100644 >> --- a/drivers/hv/hv_balloon.c >> +++ b/drivers/hv/hv_balloon.c >> @@ -745,6 +745,9 @@ static void hv_mem_hot_add(unsigned long start, unsigned >> long size, >> has->covered_end_pfn -= processed_pfn; >> spin_unlock_irqrestore(_device.ha_lock, flags); >> break; >> +} else { >> +/* Try to reduce the number of system ram resources. */ >> +merge_system_ram_resources(_resource); >> } > > You don't need to put the call under the "else" branch. It will have > broken out of the loop if ret is not zero. > Agreed, thanks! -- Thanks, David / dhildenb ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v1 2/5] kernel/resource: merge_system_ram_resources() to merge resources after hotplug
On 31.08.20 11:35, Pankaj Gupta wrote: >> Some add_memory*() users add memory in small, contiguous memory blocks. >> Examples include virtio-mem, hyper-v balloon, and the XEN balloon. >> >> This can quickly result in a lot of memory resources, whereby the actual >> resource boundaries are not of interest (e.g., it might be relevant for >> DIMMs, exposed via /proc/iomem to user space). We really want to merge >> added resources in this scenario where possible. >> >> Let's provide an interface to trigger merging of applicable child >> resources. It will be, for example, used by virtio-mem to trigger >> merging of system ram resources it added to its resource container, but >> also by XEN and Hyper-V to trigger merging of system ram resources in >> iomem_resource. >> >> Note: We really want to merge after the whole operation succeeded, not >> directly when adding a resource to the resource tree (it would break >> add_memory_resource() and require splitting resources again when the >> operation failed - e.g., due to -ENOMEM). >> >> Cc: Andrew Morton >> Cc: Michal Hocko >> Cc: Dan Williams >> Cc: Jason Gunthorpe >> Cc: Kees Cook >> Cc: Ard Biesheuvel >> Cc: Thomas Gleixner >> Cc: "K. Y. Srinivasan" >> Cc: Haiyang Zhang >> Cc: Stephen Hemminger >> Cc: Wei Liu >> Cc: Boris Ostrovsky >> Cc: Juergen Gross >> Cc: Stefano Stabellini >> Cc: Roger Pau Monné >> Cc: Julien Grall >> Cc: Pankaj Gupta >> Cc: Baoquan He >> Cc: Wei Yang >> Signed-off-by: David Hildenbrand >> --- >> include/linux/ioport.h | 3 +++ >> kernel/resource.c | 52 ++ >> 2 files changed, 55 insertions(+) >> >> diff --git a/include/linux/ioport.h b/include/linux/ioport.h >> index 52a91f5fa1a36..3bb0020cd6ddc 100644 >> --- a/include/linux/ioport.h >> +++ b/include/linux/ioport.h >> @@ -251,6 +251,9 @@ extern void __release_region(struct resource *, >> resource_size_t, >> extern void release_mem_region_adjustable(struct resource *, >> resource_size_t, >> resource_size_t); >> #endif >> +#ifdef CONFIG_MEMORY_HOTPLUG >> +extern void merge_system_ram_resources(struct resource *res); >> +#endif >> >> /* Wrappers for managed devices */ >> struct device; >> diff --git a/kernel/resource.c b/kernel/resource.c >> index 1dcef5d53d76e..b4e0963edadd2 100644 >> --- a/kernel/resource.c >> +++ b/kernel/resource.c >> @@ -1360,6 +1360,58 @@ void release_mem_region_adjustable(struct resource >> *parent, >> } >> #endif /* CONFIG_MEMORY_HOTREMOVE */ >> >> +#ifdef CONFIG_MEMORY_HOTPLUG >> +static bool system_ram_resources_mergeable(struct resource *r1, >> + struct resource *r2) >> +{ >> + return r1->flags == r2->flags && r1->end + 1 == r2->start && >> + r1->name == r2->name && r1->desc == r2->desc && >> + !r1->child && !r2->child; >> +} >> + >> +/* >> + * merge_system_ram_resources - try to merge contiguous system ram resources >> + * @parent: parent resource descriptor >> + * >> + * This interface is intended for memory hotplug, whereby lots of contiguous >> + * system ram resources are added (e.g., via add_memory*()) by a driver, and >> + * the actual resource boundaries are not of interest (e.g., it might be >> + * relevant for DIMMs). Only immediate child resources that are busy and >> + * don't have any children are considered. All applicable child resources >> + * must be immutable during the request. >> + * >> + * Note: >> + * - The caller has to make sure that no pointers to resources that might >> + * get merged are held anymore. Callers should only trigger merging of >> child >> + * resources when they are the only one adding system ram resources to the >> + * parent (besides during boot). >> + * - release_mem_region_adjustable() will split on demand on memory >> hotunplug >> + */ >> +void merge_system_ram_resources(struct resource *parent) >> +{ >> + const unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; >> + struct resource *cur, *next; >> + >> + write_lock(_lock); >> + >> + cur = parent->child; >> + while (cur && cur->sibling) { >> + next = cur->sibling; >> + if ((cur->flags & flags) == flags && > > Maybe this can be changed to: > !(cur->flags & ~flags) That would be different I think. (cur->flags & flags) == flags checks that all "flags" are set (additional ones might be set). !(cur->flags & ~flags) checks that no other flags besides "flags" are set (and "flags" are not required to be set). We use the same handling in find_next_iomem_res(), e.g., called via walk_system_ram_range also with IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY. Thanks for having a look! -- Thanks, David / dhildenb ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v7.1 40/72] x86/sev-es: Setup GHCB based boot #VC handler
From: Joerg Roedel Add the infrastructure to handle #VC exceptions when the kernel runs on virtual addresses and has a GHCB mapped. This handler will be used until the runtime #VC handler takes over. Since the handler runs very early, disable instrumentation for sev-es.c. Signed-off-by: Joerg Roedel --- arch/x86/include/asm/realmode.h | 3 + arch/x86/include/asm/segment.h | 2 +- arch/x86/include/asm/sev-es.h | 2 + arch/x86/kernel/Makefile| 2 + arch/x86/kernel/head64.c| 8 +++ arch/x86/kernel/head_64.S | 36 ++ arch/x86/kernel/sev-es-shared.c | 14 ++-- arch/x86/kernel/sev-es.c| 116 arch/x86/mm/extable.c | 1 + 9 files changed, 176 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h index b35030eeec36..96118fb041b8 100644 --- a/arch/x86/include/asm/realmode.h +++ b/arch/x86/include/asm/realmode.h @@ -57,6 +57,9 @@ extern unsigned char real_mode_blob_end[]; extern unsigned long initial_code; extern unsigned long initial_gs; extern unsigned long initial_stack; +#ifdef CONFIG_AMD_MEM_ENCRYPT +extern unsigned long initial_vc_handler; +#endif extern unsigned char real_mode_blob[]; extern unsigned char real_mode_relocs[]; diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h index 517920928989..7fdd4facfce7 100644 --- a/arch/x86/include/asm/segment.h +++ b/arch/x86/include/asm/segment.h @@ -226,7 +226,7 @@ #define NUM_EXCEPTION_VECTORS 32 /* Bitmask of exception vectors which push an error code on the stack: */ -#define EXCEPTION_ERRCODE_MASK 0x00027d00 +#define EXCEPTION_ERRCODE_MASK 0x20027d00 #define GDT_SIZE (GDT_ENTRIES*8) #define GDT_ENTRY_TLS_ENTRIES 3 diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h index 7175d432ebfe..9fbeedaa66ee 100644 --- a/arch/x86/include/asm/sev-es.h +++ b/arch/x86/include/asm/sev-es.h @@ -75,5 +75,7 @@ static inline u64 lower_bits(u64 val, unsigned int bits) /* Early IDT entry points for #VC handler */ extern void vc_no_ghcb(void); +extern void vc_boot_ghcb(void); +extern bool handle_vc_boot_ghcb(struct pt_regs *regs); #endif diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 3bcdd8d2bbdd..04ceea8f4a89 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -20,6 +20,7 @@ CFLAGS_REMOVE_kvmclock.o = -pg CFLAGS_REMOVE_ftrace.o = -pg CFLAGS_REMOVE_early_printk.o = -pg CFLAGS_REMOVE_head64.o = -pg +CFLAGS_REMOVE_sev-es.o = -pg endif KASAN_SANITIZE_head$(BITS).o := n @@ -27,6 +28,7 @@ KASAN_SANITIZE_dumpstack.o:= n KASAN_SANITIZE_dumpstack_$(BITS).o := n KASAN_SANITIZE_stacktrace.o:= n KASAN_SANITIZE_paravirt.o := n +KASAN_SANITIZE_sev-es.o:= n # With some compiler versions the generated code results in boot hangs, caused # by several compilation units. To be safe, disable all instrumentation. diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index 5683bbb555ef..530e055e231b 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -406,6 +406,10 @@ void __init do_early_exception(struct pt_regs *regs, int trapnr) early_make_pgtable(native_read_cr2())) return; + if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) && + trapnr == X86_TRAP_VC && handle_vc_boot_ghcb(regs)) + return; + early_fixup_exception(regs, trapnr); } @@ -573,6 +577,10 @@ static void startup_64_load_idt(unsigned long physbase) /* This is used when running on kernel addresses */ void early_setup_idt(void) { + /* VMM Communication Exception */ + if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) + set_bringup_idt_handler(X86_TRAP_VC, vc_boot_ghcb); + bringup_idt_descr.address = (unsigned long)bringup_idt_table; native_load_idt(_idt_descr); } diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 6e68bca64ae4..1a71d0d4d575 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -279,6 +279,39 @@ SYM_CODE_START(start_cpu0) movqinitial_stack(%rip), %rsp jmp .Ljump_to_C_code SYM_CODE_END(start_cpu0) +#endif + +#ifdef CONFIG_AMD_MEM_ENCRYPT +/* + * VC Exception handler used during early boot when running on kernel + * addresses, but before the switch to the idt_table can be made. + * The early_idt_handler_array can't be used here because it calls into a lot + * of __init code and this handler is also used during CPU offlining/onlining. + * Therefore this handler ends up in the .text section so that it stays around + * when .init.text is freed. + */ +SYM_CODE_START_NOALIGN(vc_boot_ghcb) + UNWIND_HINT_IRET_REGS offset=8 + + /*
[PATCH v7.1 39/72] x86/sev-es: Setup early #VC handler
From: Joerg Roedel Setup an early handler for #VC exceptions. There is no GHCB mapped yet, so just re-use the vc_no_ghcb_handler. It can only handle CPUID exit-codes, but that should be enough to get the kernel through verify_cpu() and __startup_64() until it runs on virtual addresses. Signed-off-by: Joerg Roedel --- arch/x86/include/asm/sev-es.h | 3 +++ arch/x86/kernel/head64.c | 21 + arch/x86/kernel/head_64.S | 30 ++ 3 files changed, 54 insertions(+) diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h index 6dc52440c4b4..7175d432ebfe 100644 --- a/arch/x86/include/asm/sev-es.h +++ b/arch/x86/include/asm/sev-es.h @@ -73,4 +73,7 @@ static inline u64 lower_bits(u64 val, unsigned int bits) return (val & mask); } +/* Early IDT entry points for #VC handler */ +extern void vc_no_ghcb(void); + #endif diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index 4282dac694c3..5683bbb555ef 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -40,6 +40,7 @@ #include #include #include +#include /* * Manage page tables very early on. @@ -540,11 +541,31 @@ static struct desc_ptr bringup_idt_descr = { .address= 0, /* Set at runtime */ }; +static void set_bringup_idt_handler(int n, void *handler) +{ +#ifdef CONFIG_AMD_MEM_ENCRYPT + struct idt_data data; + gate_desc desc; + + init_idt_data(, n, handler); + idt_init_desc(, ); + native_write_idt_entry(bringup_idt_table, n, ); +#endif +} + /* This runs while still in the direct mapping */ static void startup_64_load_idt(unsigned long physbase) { struct desc_ptr *desc = fixup_pointer(_idt_descr, physbase); + if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) { + void *handler; + + /* VMM Communication Exception */ + handler = fixup_pointer(vc_no_ghcb, physbase); + set_bringup_idt_handler(X86_TRAP_VC, handler); + } + desc->address = (unsigned long)fixup_pointer(bringup_idt_table, physbase); native_load_idt(desc); } diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 3b40ec44a67d..6e68bca64ae4 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -348,6 +348,36 @@ SYM_CODE_START_LOCAL(early_idt_handler_common) jmp restore_regs_and_return_to_kernel SYM_CODE_END(early_idt_handler_common) +#ifdef CONFIG_AMD_MEM_ENCRYPT +/* + * VC Exception handler used during very early boot. The + * early_idt_handler_array can't be used because it returns via the + * paravirtualized INTERRUPT_RETURN and pv-ops don't work that early. + * + * This handler will end up in the .init.text section and not be + * available to boot secondary CPUs. + */ +SYM_CODE_START_NOALIGN(vc_no_ghcb) + UNWIND_HINT_IRET_REGS offset=8 + + /* Build pt_regs */ + PUSH_AND_CLEAR_REGS + + /* Call C handler */ + movq%rsp, %rdi + movqORIG_RAX(%rsp), %rsi + calldo_vc_no_ghcb + + /* Unwind pt_regs */ + POP_REGS + + /* Remove Error Code */ + addq$8, %rsp + + /* Pure iret required here - don't use INTERRUPT_RETURN */ + iretq +SYM_CODE_END(vc_no_ghcb) +#endif #define SYM_DATA_START_PAGE_ALIGNED(name) \ SYM_START(name, SYM_L_GLOBAL, .balign PAGE_SIZE) -- 2.28.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] drm/virtio: drop quirks handling
On Tue, Sep 08, 2020 at 10:57:21AM +0200, Daniel Vetter wrote: > On Tue, Sep 08, 2020 at 08:47:41AM +0200, Gerd Hoffmann wrote: > > These days dma ops can be overridden per device, and the virtio core > > "can be overridden" or "are"? Didn't happen yet, so scratch this one for now ... take care, Gerd ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] drm/virtio: drop quirks handling
On Tue, Sep 08, 2020 at 08:47:41AM +0200, Gerd Hoffmann wrote: > These days dma ops can be overridden per device, and the virtio core "can be overridden" or "are"? The comment above vring_use_dma_api() suggests that's not yet done. If that's wrong then I think updating the comment would be really good. -Daniel > uses that to handle the dma quirks transparently for the rest of the > kernel. So we can drop the virtio_has_dma_quirk() checks, just use > the dma api unconditionally and depend on the virtio core having setup > dma_ops as needed. > > Signed-off-by: Gerd Hoffmann > --- > drivers/gpu/drm/virtio/virtgpu_object.c | 19 ++- > drivers/gpu/drm/virtio/virtgpu_vq.c | 16 ++-- > 2 files changed, 12 insertions(+), 23 deletions(-) > > diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c > b/drivers/gpu/drm/virtio/virtgpu_object.c > index 729f98ad7c02..9c35ce64ff9e 100644 > --- a/drivers/gpu/drm/virtio/virtgpu_object.c > +++ b/drivers/gpu/drm/virtio/virtgpu_object.c > @@ -141,7 +141,6 @@ static int virtio_gpu_object_shmem_init(struct > virtio_gpu_device *vgdev, > struct virtio_gpu_mem_entry **ents, > unsigned int *nents) > { > - bool use_dma_api = !virtio_has_dma_quirk(vgdev->vdev); > struct virtio_gpu_object_shmem *shmem = to_virtio_gpu_shmem(bo); > struct scatterlist *sg; > int si, ret; > @@ -162,15 +161,11 @@ static int virtio_gpu_object_shmem_init(struct > virtio_gpu_device *vgdev, > return -EINVAL; > } > > - if (use_dma_api) { > - shmem->mapped = dma_map_sg(vgdev->vdev->dev.parent, > -shmem->pages->sgl, > -shmem->pages->nents, > -DMA_TO_DEVICE); > - *nents = shmem->mapped; > - } else { > - *nents = shmem->pages->nents; > - } > + shmem->mapped = dma_map_sg(vgdev->vdev->dev.parent, > +shmem->pages->sgl, > +shmem->pages->nents, > +DMA_TO_DEVICE); > + *nents = shmem->mapped; > > *ents = kmalloc_array(*nents, sizeof(struct virtio_gpu_mem_entry), > GFP_KERNEL); > @@ -180,9 +175,7 @@ static int virtio_gpu_object_shmem_init(struct > virtio_gpu_device *vgdev, > } > > for_each_sg(shmem->pages->sgl, sg, *nents, si) { > - (*ents)[si].addr = cpu_to_le64(use_dma_api > -? sg_dma_address(sg) > -: sg_phys(sg)); > + (*ents)[si].addr = cpu_to_le64(sg_dma_address(sg)); > (*ents)[si].length = cpu_to_le32(sg->length); > (*ents)[si].padding = 0; > } > diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c > b/drivers/gpu/drm/virtio/virtgpu_vq.c > index c93c2db35aaf..1c1d2834547d 100644 > --- a/drivers/gpu/drm/virtio/virtgpu_vq.c > +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c > @@ -599,13 +599,11 @@ void virtio_gpu_cmd_transfer_to_host_2d(struct > virtio_gpu_device *vgdev, > struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(objs->objs[0]); > struct virtio_gpu_transfer_to_host_2d *cmd_p; > struct virtio_gpu_vbuffer *vbuf; > - bool use_dma_api = !virtio_has_dma_quirk(vgdev->vdev); > struct virtio_gpu_object_shmem *shmem = to_virtio_gpu_shmem(bo); > > - if (use_dma_api) > - dma_sync_sg_for_device(vgdev->vdev->dev.parent, > -shmem->pages->sgl, shmem->pages->nents, > -DMA_TO_DEVICE); > + dma_sync_sg_for_device(vgdev->vdev->dev.parent, > +shmem->pages->sgl, shmem->pages->nents, > +DMA_TO_DEVICE); > > cmd_p = virtio_gpu_alloc_cmd(vgdev, , sizeof(*cmd_p)); > memset(cmd_p, 0, sizeof(*cmd_p)); > @@ -1015,13 +1013,11 @@ void virtio_gpu_cmd_transfer_to_host_3d(struct > virtio_gpu_device *vgdev, > struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(objs->objs[0]); > struct virtio_gpu_transfer_host_3d *cmd_p; > struct virtio_gpu_vbuffer *vbuf; > - bool use_dma_api = !virtio_has_dma_quirk(vgdev->vdev); > struct virtio_gpu_object_shmem *shmem = to_virtio_gpu_shmem(bo); > > - if (use_dma_api) > - dma_sync_sg_for_device(vgdev->vdev->dev.parent, > -shmem->pages->sgl, shmem->pages->nents, > -DMA_TO_DEVICE); > + dma_sync_sg_for_device(vgdev->vdev->dev.parent, > +shmem->pages->sgl, shmem->pages->nents, > +DMA_TO_DEVICE); > > cmd_p = virtio_gpu_alloc_cmd(vgdev, , sizeof(*cmd_p)); > memset(cmd_p, 0, sizeof(*cmd_p)); > -- > 2.27.0 > -- Daniel Vetter
[PATCH 2/3] drm/qxl: release shadow on shutdown
In case we have a shadow surface on shutdown release it so it doesn't leak. Signed-off-by: Gerd Hoffmann --- drivers/gpu/drm/qxl/qxl_display.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c index 4be04eaf7f37..85dcb7fe56a9 100644 --- a/drivers/gpu/drm/qxl/qxl_display.c +++ b/drivers/gpu/drm/qxl/qxl_display.c @@ -1224,5 +1224,9 @@ int qxl_modeset_init(struct qxl_device *qdev) void qxl_modeset_fini(struct qxl_device *qdev) { + if (qdev->dumb_shadow_bo) { + drm_gem_object_put(>dumb_shadow_bo->tbo.base); + qdev->dumb_shadow_bo = NULL; + } qxl_destroy_monitors_object(qdev); } -- 2.27.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 1/3] drm/qxl: use drmm_mode_config_init
Signed-off-by: Gerd Hoffmann --- drivers/gpu/drm/qxl/qxl_display.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c index fa79688013b7..4be04eaf7f37 100644 --- a/drivers/gpu/drm/qxl/qxl_display.c +++ b/drivers/gpu/drm/qxl/qxl_display.c @@ -1190,7 +1190,9 @@ int qxl_modeset_init(struct qxl_device *qdev) int i; int ret; - drm_mode_config_init(>ddev); + ret = drmm_mode_config_init(>ddev); + if (ret) + return ret; ret = qxl_create_monitors_object(qdev); if (ret) @@ -1223,5 +1225,4 @@ int qxl_modeset_init(struct qxl_device *qdev) void qxl_modeset_fini(struct qxl_device *qdev) { qxl_destroy_monitors_object(qdev); - drm_mode_config_cleanup(>ddev); } -- 2.27.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 3/3] drm/qxl: handle shadow in primary destroy
qxl_primary_atomic_disable must check whenever the framebuffer bo has a shadow surface and in case it has check the shadow primary status. Signed-off-by: Gerd Hoffmann --- drivers/gpu/drm/qxl/qxl_display.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c index 85dcb7fe56a9..3dcbb359e0f5 100644 --- a/drivers/gpu/drm/qxl/qxl_display.c +++ b/drivers/gpu/drm/qxl/qxl_display.c @@ -560,6 +560,8 @@ static void qxl_primary_atomic_disable(struct drm_plane *plane, if (old_state->fb) { struct qxl_bo *bo = gem_to_qxl_bo(old_state->fb->obj[0]); + if (bo->shadow) + bo = bo->shadow; if (bo->is_primary) { qxl_io_destroy_primary(qdev); bo->is_primary = false; -- 2.27.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v11 0/2] s390: virtio: let arch validate VIRTIO features
On Tue, Sep 08, 2020 at 08:55:21AM +0200, Cornelia Huck wrote: > On Tue, 8 Sep 2020 00:39:51 +0200 > Halil Pasic wrote: > > > On Mon, 7 Sep 2020 11:39:05 +0200 > > Pierre Morel wrote: > > > > > Hi all, > > > > > > The goal of the series is to give a chance to the architecture > > > to validate VIRTIO device features. > > > > Michael, is this going in via your tree? > > > > I believe Michael's tree is the right place for this, but I can also > queue it if I get an ack on patch 1. I think Halil pointed out some minor issues, so a new version is in order. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v11 0/2] s390: virtio: let arch validate VIRTIO features
On Tue, Sep 08, 2020 at 12:39:51AM +0200, Halil Pasic wrote: > On Mon, 7 Sep 2020 11:39:05 +0200 > Pierre Morel wrote: > > > Hi all, > > > > The goal of the series is to give a chance to the architecture > > to validate VIRTIO device features. > > Michael, is this going in via your tree? I guess so. Still not really happy about second-guessing the hypervisor, but this got acks from others so maybe I'm wrong in this instance. Won't be the first time. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 1/3] drm/virtio: use drmm_mode_config_init
Signed-off-by: Gerd Hoffmann --- drivers/gpu/drm/virtio/virtgpu_drv.h | 2 +- drivers/gpu/drm/virtio/virtgpu_display.c | 11 +++ drivers/gpu/drm/virtio/virtgpu_kms.c | 6 +- 3 files changed, 13 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h index a52b7a39f286..55c34b4fc3e9 100644 --- a/drivers/gpu/drm/virtio/virtgpu_drv.h +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h @@ -352,7 +352,7 @@ virtio_gpu_cmd_resource_assign_uuid(struct virtio_gpu_device *vgdev, struct virtio_gpu_object_array *objs); /* virtgpu_display.c */ -void virtio_gpu_modeset_init(struct virtio_gpu_device *vgdev); +int virtio_gpu_modeset_init(struct virtio_gpu_device *vgdev); void virtio_gpu_modeset_fini(struct virtio_gpu_device *vgdev); /* virtgpu_plane.c */ diff --git a/drivers/gpu/drm/virtio/virtgpu_display.c b/drivers/gpu/drm/virtio/virtgpu_display.c index effea07abe62..f84b7e61311b 100644 --- a/drivers/gpu/drm/virtio/virtgpu_display.c +++ b/drivers/gpu/drm/virtio/virtgpu_display.c @@ -325,11 +325,14 @@ static const struct drm_mode_config_funcs virtio_gpu_mode_funcs = { .atomic_commit = drm_atomic_helper_commit, }; -void virtio_gpu_modeset_init(struct virtio_gpu_device *vgdev) +int virtio_gpu_modeset_init(struct virtio_gpu_device *vgdev) { - int i; + int i, ret; + + ret = drmm_mode_config_init(vgdev->ddev); + if (ret) + return ret; - drm_mode_config_init(vgdev->ddev); vgdev->ddev->mode_config.quirk_addfb_prefer_host_byte_order = true; vgdev->ddev->mode_config.funcs = _gpu_mode_funcs; @@ -343,6 +346,7 @@ void virtio_gpu_modeset_init(struct virtio_gpu_device *vgdev) vgdev_output_init(vgdev, i); drm_mode_config_reset(vgdev->ddev); + return 0; } void virtio_gpu_modeset_fini(struct virtio_gpu_device *vgdev) @@ -351,5 +355,4 @@ void virtio_gpu_modeset_fini(struct virtio_gpu_device *vgdev) for (i = 0 ; i < vgdev->num_scanouts; ++i) kfree(vgdev->outputs[i].edid); - drm_mode_config_cleanup(vgdev->ddev); } diff --git a/drivers/gpu/drm/virtio/virtgpu_kms.c b/drivers/gpu/drm/virtio/virtgpu_kms.c index 75d0dc2f6d28..6ea74a99d8ad 100644 --- a/drivers/gpu/drm/virtio/virtgpu_kms.c +++ b/drivers/gpu/drm/virtio/virtgpu_kms.c @@ -184,7 +184,11 @@ int virtio_gpu_init(struct drm_device *dev) num_capsets, _capsets); DRM_INFO("number of cap sets: %d\n", num_capsets); - virtio_gpu_modeset_init(vgdev); + ret = virtio_gpu_modeset_init(vgdev); + if (ret) { + DRM_ERROR("modeset init failed\n"); + goto err_scanouts; + } virtio_device_ready(vgdev->vdev); -- 2.27.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 2/3] drm/virtio: return virtio_gpu_queue errors
In case queuing virtio commands fails (can happen when the device got unplugged) pass up the error. Signed-off-by: Gerd Hoffmann --- drivers/gpu/drm/virtio/virtgpu_vq.c | 36 +++-- 1 file changed, 19 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c index c93c2db35aaf..b1884e6e242c 100644 --- a/drivers/gpu/drm/virtio/virtgpu_vq.c +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c @@ -320,13 +320,13 @@ static struct sg_table *vmalloc_to_sgt(char *data, uint32_t size, int *sg_ents) return sgt; } -static void virtio_gpu_queue_ctrl_sgs(struct virtio_gpu_device *vgdev, - struct virtio_gpu_vbuffer *vbuf, - struct virtio_gpu_fence *fence, - int elemcnt, - struct scatterlist **sgs, - int outcnt, - int incnt) +static int virtio_gpu_queue_ctrl_sgs(struct virtio_gpu_device *vgdev, +struct virtio_gpu_vbuffer *vbuf, +struct virtio_gpu_fence *fence, +int elemcnt, +struct scatterlist **sgs, +int outcnt, +int incnt) { struct virtqueue *vq = vgdev->ctrlq.vq; int ret, idx; @@ -335,7 +335,7 @@ static void virtio_gpu_queue_ctrl_sgs(struct virtio_gpu_device *vgdev, if (fence && vbuf->objs) virtio_gpu_array_unlock_resv(vbuf->objs); free_vbuf(vgdev, vbuf); - return; + return -1; } if (vgdev->has_indirect) @@ -373,15 +373,16 @@ static void virtio_gpu_queue_ctrl_sgs(struct virtio_gpu_device *vgdev, spin_unlock(>ctrlq.qlock); drm_dev_exit(idx); + return 0; } -static void virtio_gpu_queue_fenced_ctrl_buffer(struct virtio_gpu_device *vgdev, - struct virtio_gpu_vbuffer *vbuf, - struct virtio_gpu_fence *fence) +static int virtio_gpu_queue_fenced_ctrl_buffer(struct virtio_gpu_device *vgdev, + struct virtio_gpu_vbuffer *vbuf, + struct virtio_gpu_fence *fence) { struct scatterlist *sgs[3], vcmd, vout, vresp; struct sg_table *sgt = NULL; - int elemcnt = 0, outcnt = 0, incnt = 0; + int elemcnt = 0, outcnt = 0, incnt = 0, ret; /* set up vcmd */ sg_init_one(, vbuf->buf, vbuf->size); @@ -398,7 +399,7 @@ static void virtio_gpu_queue_fenced_ctrl_buffer(struct virtio_gpu_device *vgdev, if (!sgt) { if (fence && vbuf->objs) virtio_gpu_array_unlock_resv(vbuf->objs); - return; + return -1; } elemcnt += sg_ents; @@ -419,13 +420,14 @@ static void virtio_gpu_queue_fenced_ctrl_buffer(struct virtio_gpu_device *vgdev, incnt++; } - virtio_gpu_queue_ctrl_sgs(vgdev, vbuf, fence, elemcnt, sgs, outcnt, - incnt); + ret = virtio_gpu_queue_ctrl_sgs(vgdev, vbuf, fence, elemcnt, sgs, outcnt, + incnt); if (sgt) { sg_free_table(sgt); kfree(sgt); } + return ret; } void virtio_gpu_notify(struct virtio_gpu_device *vgdev) @@ -444,10 +446,10 @@ void virtio_gpu_notify(struct virtio_gpu_device *vgdev) virtqueue_notify(vgdev->ctrlq.vq); } -static void virtio_gpu_queue_ctrl_buffer(struct virtio_gpu_device *vgdev, +static int virtio_gpu_queue_ctrl_buffer(struct virtio_gpu_device *vgdev, struct virtio_gpu_vbuffer *vbuf) { - virtio_gpu_queue_fenced_ctrl_buffer(vgdev, vbuf, NULL); + return virtio_gpu_queue_fenced_ctrl_buffer(vgdev, vbuf, NULL); } static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev, -- 2.27.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 3/3] drm/virtio: add virtio_gpu_cmd_unref_resource error handling
Usually we wait for the host to complete the unref request, then cleanup the guest-side state of the object in the completion callback. When submitting the unref command failed the completion callback will not be called though, so cleanup right away. Fixes a WARN on stale mm entries on driver shutdown. Signed-off-by: Gerd Hoffmann --- drivers/gpu/drm/virtio/virtgpu_vq.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c index b1884e6e242c..4d2325bf4aed 100644 --- a/drivers/gpu/drm/virtio/virtgpu_vq.c +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c @@ -536,6 +536,7 @@ void virtio_gpu_cmd_unref_resource(struct virtio_gpu_device *vgdev, { struct virtio_gpu_resource_unref *cmd_p; struct virtio_gpu_vbuffer *vbuf; + int ret; cmd_p = virtio_gpu_alloc_cmd_cb(vgdev, , sizeof(*cmd_p), virtio_gpu_cmd_unref_cb); @@ -545,7 +546,9 @@ void virtio_gpu_cmd_unref_resource(struct virtio_gpu_device *vgdev, cmd_p->resource_id = cpu_to_le32(bo->hw_res_handle); vbuf->resp_cb_data = bo; - virtio_gpu_queue_ctrl_buffer(vgdev, vbuf); + ret = virtio_gpu_queue_ctrl_buffer(vgdev, vbuf); + if (ret < 0) + virtio_gpu_cleanup_object(bo); } void virtio_gpu_cmd_set_scanout(struct virtio_gpu_device *vgdev, -- 2.27.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v11 0/2] s390: virtio: let arch validate VIRTIO features
On Tue, 8 Sep 2020 00:39:51 +0200 Halil Pasic wrote: > On Mon, 7 Sep 2020 11:39:05 +0200 > Pierre Morel wrote: > > > Hi all, > > > > The goal of the series is to give a chance to the architecture > > to validate VIRTIO device features. > > Michael, is this going in via your tree? > I believe Michael's tree is the right place for this, but I can also queue it if I get an ack on patch 1. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH] drm/virtio: drop quirks handling
These days dma ops can be overridden per device, and the virtio core uses that to handle the dma quirks transparently for the rest of the kernel. So we can drop the virtio_has_dma_quirk() checks, just use the dma api unconditionally and depend on the virtio core having setup dma_ops as needed. Signed-off-by: Gerd Hoffmann --- drivers/gpu/drm/virtio/virtgpu_object.c | 19 ++- drivers/gpu/drm/virtio/virtgpu_vq.c | 16 ++-- 2 files changed, 12 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c b/drivers/gpu/drm/virtio/virtgpu_object.c index 729f98ad7c02..9c35ce64ff9e 100644 --- a/drivers/gpu/drm/virtio/virtgpu_object.c +++ b/drivers/gpu/drm/virtio/virtgpu_object.c @@ -141,7 +141,6 @@ static int virtio_gpu_object_shmem_init(struct virtio_gpu_device *vgdev, struct virtio_gpu_mem_entry **ents, unsigned int *nents) { - bool use_dma_api = !virtio_has_dma_quirk(vgdev->vdev); struct virtio_gpu_object_shmem *shmem = to_virtio_gpu_shmem(bo); struct scatterlist *sg; int si, ret; @@ -162,15 +161,11 @@ static int virtio_gpu_object_shmem_init(struct virtio_gpu_device *vgdev, return -EINVAL; } - if (use_dma_api) { - shmem->mapped = dma_map_sg(vgdev->vdev->dev.parent, - shmem->pages->sgl, - shmem->pages->nents, - DMA_TO_DEVICE); - *nents = shmem->mapped; - } else { - *nents = shmem->pages->nents; - } + shmem->mapped = dma_map_sg(vgdev->vdev->dev.parent, + shmem->pages->sgl, + shmem->pages->nents, + DMA_TO_DEVICE); + *nents = shmem->mapped; *ents = kmalloc_array(*nents, sizeof(struct virtio_gpu_mem_entry), GFP_KERNEL); @@ -180,9 +175,7 @@ static int virtio_gpu_object_shmem_init(struct virtio_gpu_device *vgdev, } for_each_sg(shmem->pages->sgl, sg, *nents, si) { - (*ents)[si].addr = cpu_to_le64(use_dma_api - ? sg_dma_address(sg) - : sg_phys(sg)); + (*ents)[si].addr = cpu_to_le64(sg_dma_address(sg)); (*ents)[si].length = cpu_to_le32(sg->length); (*ents)[si].padding = 0; } diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c index c93c2db35aaf..1c1d2834547d 100644 --- a/drivers/gpu/drm/virtio/virtgpu_vq.c +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c @@ -599,13 +599,11 @@ void virtio_gpu_cmd_transfer_to_host_2d(struct virtio_gpu_device *vgdev, struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(objs->objs[0]); struct virtio_gpu_transfer_to_host_2d *cmd_p; struct virtio_gpu_vbuffer *vbuf; - bool use_dma_api = !virtio_has_dma_quirk(vgdev->vdev); struct virtio_gpu_object_shmem *shmem = to_virtio_gpu_shmem(bo); - if (use_dma_api) - dma_sync_sg_for_device(vgdev->vdev->dev.parent, - shmem->pages->sgl, shmem->pages->nents, - DMA_TO_DEVICE); + dma_sync_sg_for_device(vgdev->vdev->dev.parent, + shmem->pages->sgl, shmem->pages->nents, + DMA_TO_DEVICE); cmd_p = virtio_gpu_alloc_cmd(vgdev, , sizeof(*cmd_p)); memset(cmd_p, 0, sizeof(*cmd_p)); @@ -1015,13 +1013,11 @@ void virtio_gpu_cmd_transfer_to_host_3d(struct virtio_gpu_device *vgdev, struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(objs->objs[0]); struct virtio_gpu_transfer_host_3d *cmd_p; struct virtio_gpu_vbuffer *vbuf; - bool use_dma_api = !virtio_has_dma_quirk(vgdev->vdev); struct virtio_gpu_object_shmem *shmem = to_virtio_gpu_shmem(bo); - if (use_dma_api) - dma_sync_sg_for_device(vgdev->vdev->dev.parent, - shmem->pages->sgl, shmem->pages->nents, - DMA_TO_DEVICE); + dma_sync_sg_for_device(vgdev->vdev->dev.parent, + shmem->pages->sgl, shmem->pages->nents, + DMA_TO_DEVICE); cmd_p = virtio_gpu_alloc_cmd(vgdev, , sizeof(*cmd_p)); memset(cmd_p, 0, sizeof(*cmd_p)); -- 2.27.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization