Re: [PATCH 5/5] virtio: Add bounce DMA ops
On Wed, Apr 29, 2020 at 01:42:13PM +0800, Lu Baolu wrote:
> On 2020/4/29 12:57, Michael S. Tsirkin wrote:
> > On Wed, Apr 29, 2020 at 10:22:32AM +0800, Lu Baolu wrote:
> > > On 2020/4/29 4:41, Michael S. Tsirkin wrote:
> > > > On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> > > > > * Michael S. Tsirkin [2020-04-28 12:17:57]:
> > > > >
> > > > > > Okay, but how is all this virtio specific? For example, why not allow
> > > > > > separate swiotlbs for any type of device?
> > > > > > For example, this might make sense if a given device is from a
> > > > > > different, less trusted vendor.
> > > > >
> > > > > Is swiotlb commonly used for multiple devices that may be on different
> > > > > trust boundaries (and not behind a hardware iommu)?
> > > >
> > > > Even a hardware iommu does not imply 100% security from malicious
> > > > hardware. First, lots of people use iommu=pt for performance reasons.
> > > > Second, even without pt, unmaps are often batched, and sub-page buffers
> > > > might be used for DMA, so we are not 100% protected at all times.
> > >
> > > For untrusted devices, IOMMU is forced on even when iommu=pt is used;
> >
> > I think you are talking about untrusted *drivers* like with VFIO.
>
> No. I am talking about untrusted devices like thunderbolt peripherals.
> We always trust drivers hosted in the kernel, and the DMA APIs are designed
> for them, right?
>
> Please refer to this series.
>
> https://lkml.org/lkml/2019/9/6/39
>
> Best regards,
> baolu

Oh, thanks for that! I didn't realize Linux is doing this.

So it seems that with modern Linux, all one needs to do on x86 is mark the
device as untrusted. It's already possible to do this with ACPI and with OF -
would that be sufficient for achieving what this patchset is trying to do?

Adding more ways to mark a device as untrusted, and adding support for more
platforms to use bounce buffers, sounds like a reasonable thing to do.
> > On the other hand, I am talking about things like thunderbolt
> > peripherals being less trusted than on-board ones.
> >
> > Or possibly even using swiotlb for specific use-cases where
> > speed is less of an issue.
> >
> > E.g. my wifi is pretty slow anyway, and that card is exposed to
> > malicious actors all the time; put just that behind swiotlb
> > for security, and leave my graphics card with pt since
> > I'm trusting it with secrets anyway.
>
> > > and iotlb flush is in strict mode (no batched flushes); ATS is also not
> > > allowed. Swiotlb is used to protect sub-page buffers since the IOMMU can
> > > only apply page-granularity protection. Swiotlb is now used for devices
> > > from different trust zones.
> > >
> > > Best regards,
> > > baolu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 3/4] iommu/vt-d: Add page request draining support
Hi Jacob,

On 2020/4/29 11:36, Jacob Pan wrote:
> On Wed, 22 Apr 2020 16:06:10 +0800 Lu Baolu wrote:
> > When a PASID is stopped or terminated, there can be pending PRQs
> > (requests that haven't received responses) in remapping hardware.
> > This adds the interface to drain page requests and call it when a
> > PASID is terminated.
> >
> > Signed-off-by: Jacob Pan
> > Signed-off-by: Liu Yi L
> > Signed-off-by: Lu Baolu
> > ---
> >  drivers/iommu/intel-svm.c   | 103 ++--
> >  include/linux/intel-iommu.h |   4 ++
> >  2 files changed, 102 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index 83dc4319f661..2534641ef707 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -23,6 +23,7 @@
> >  #include "intel-pasid.h"
> >
> >  static irqreturn_t prq_event_thread(int irq, void *d);
> > +static void intel_svm_drain_prq(struct device *dev, int pasid);
> >
> >  #define PRQ_ORDER 0
> >
> > @@ -66,6 +67,8 @@ int intel_svm_enable_prq(struct intel_iommu *iommu)
> >  	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
> >  	dmar_writeq(iommu->reg + DMAR_PQA_REG,
> >  		    virt_to_phys(iommu->prq) | PRQ_ORDER);
> > +	init_completion(&iommu->prq_complete);
> > +
> >  	return 0;
> >  }
> >
> > @@ -208,6 +211,7 @@ static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
> >  	rcu_read_lock();
> >  	list_for_each_entry_rcu(sdev, &svm->devs, list) {
> >  		intel_pasid_tear_down_entry(svm->iommu, sdev->dev, svm->pasid);
> > +		intel_svm_drain_prq(sdev->dev, svm->pasid);
>
> mmu_notifier release is called in atomic context; drain_prq needs to
> wait for completion. I tested the exit path when a process crashes. I got:
>
> [ +0.696214] BUG: sleeping function called from invalid context at kernel/sched/completion.c:101
> [ +0.68] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3235, name: dsatest
> [ +0.46] INFO: lockdep is turned off.
> [ +0.02] CPU: 1 PID: 3235 Comm: dsatest Not tainted 5.7.0-rc1-z-svmtest+ #1637
> [ +0.00] Hardware name: Intel Corporation Kabylake Client platform/Skylake Halo DDR4 RVP11, BIOS 04.1709050855 09/05/2017
> [ +0.01] Call Trace:
> [ +0.04]  dump_stack+0x68/0x9b
> [ +0.03]  ___might_sleep+0x229/0x250
> [ +0.03]  wait_for_completion_timeout+0x3c/0x110
> [ +0.03]  intel_svm_drain_prq+0x12f/0x210
> [ +0.03]  intel_mm_release+0x73/0x110
> [ +0.03]  __mmu_notifier_release+0x94/0x220
> [ +0.02]  ? do_munmap+0x10/0x10
> [ +0.02]  ? prepare_ftrace_return+0x5c/0x80
> [ +0.03]  exit_mmap+0x156/0x1a0
> [ +0.02]  ? mmput+0x44/0x120
> [ +0.03]  ? exit_mmap+0x5/0x1a0
> [ +0.02]  ? ftrace_graph_caller+0xa0/0xa0
> [ +0.01]  mmput+0x5e/0x120

Thanks a lot!

Actually, we can't drain page requests in this mmu_notifier code path, right?
The assumptions of page request draining are that 1) the device driver has
drained DMA requests on the device end; and 2) the pasid entry has been
marked as non-present. So we could only drain page requests in the unbind
path. Thoughts?

Best regards,
baolu
Re: [PATCH 5/5] virtio: Add bounce DMA ops
On 2020/4/29 12:57, Michael S. Tsirkin wrote:
> On Wed, Apr 29, 2020 at 10:22:32AM +0800, Lu Baolu wrote:
> > On 2020/4/29 4:41, Michael S. Tsirkin wrote:
> > > On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> > > > * Michael S. Tsirkin [2020-04-28 12:17:57]:
> > > >
> > > > > Okay, but how is all this virtio specific? For example, why not allow
> > > > > separate swiotlbs for any type of device?
> > > > > For example, this might make sense if a given device is from a
> > > > > different, less trusted vendor.
> > > >
> > > > Is swiotlb commonly used for multiple devices that may be on different
> > > > trust boundaries (and not behind a hardware iommu)?
> > >
> > > Even a hardware iommu does not imply 100% security from malicious
> > > hardware. First, lots of people use iommu=pt for performance reasons.
> > > Second, even without pt, unmaps are often batched, and sub-page buffers
> > > might be used for DMA, so we are not 100% protected at all times.
> >
> > For untrusted devices, IOMMU is forced on even when iommu=pt is used;
>
> I think you are talking about untrusted *drivers* like with VFIO.

No. I am talking about untrusted devices like thunderbolt peripherals.
We always trust drivers hosted in the kernel, and the DMA APIs are designed
for them, right?

Please refer to this series.

https://lkml.org/lkml/2019/9/6/39

Best regards,
baolu

> On the other hand, I am talking about things like thunderbolt
> peripherals being less trusted than on-board ones.
>
> Or possibly even using swiotlb for specific use-cases where
> speed is less of an issue.
>
> E.g. my wifi is pretty slow anyway, and that card is exposed to
> malicious actors all the time; put just that behind swiotlb
> for security, and leave my graphics card with pt since
> I'm trusting it with secrets anyway.
>
> > and iotlb flush is in strict mode (no batched flushes); ATS is also not
> > allowed. Swiotlb is used to protect sub-page buffers since the IOMMU can
> > only apply page-granularity protection. Swiotlb is now used for devices
> > from different trust zones.
> >
> > Best regards,
> > baolu
Re: [PATCH 5/5] virtio: Add bounce DMA ops
On Wed, Apr 29, 2020 at 10:22:32AM +0800, Lu Baolu wrote:
> On 2020/4/29 4:41, Michael S. Tsirkin wrote:
> > On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> > > * Michael S. Tsirkin [2020-04-28 12:17:57]:
> > >
> > > > Okay, but how is all this virtio specific? For example, why not allow
> > > > separate swiotlbs for any type of device?
> > > > For example, this might make sense if a given device is from a
> > > > different, less trusted vendor.
> > >
> > > Is swiotlb commonly used for multiple devices that may be on different
> > > trust boundaries (and not behind a hardware iommu)?
> >
> > Even a hardware iommu does not imply 100% security from malicious
> > hardware. First, lots of people use iommu=pt for performance reasons.
> > Second, even without pt, unmaps are often batched, and sub-page buffers
> > might be used for DMA, so we are not 100% protected at all times.
>
> For untrusted devices, IOMMU is forced on even when iommu=pt is used;

I think you are talking about untrusted *drivers* like with VFIO.

On the other hand, I am talking about things like thunderbolt
peripherals being less trusted than on-board ones.

Or possibly even using swiotlb for specific use-cases where
speed is less of an issue.

E.g. my wifi is pretty slow anyway, and that card is exposed to
malicious actors all the time; put just that behind swiotlb
for security, and leave my graphics card with pt since
I'm trusting it with secrets anyway.

> and iotlb flush is in strict mode (no batched flushes); ATS is also not
> allowed. Swiotlb is used to protect sub-page buffers since the IOMMU can
> only apply page-granularity protection. Swiotlb is now used for devices
> from different trust zones.
>
> Best regards,
> baolu
Re: iommu_iova slab eats too much memory
Hi Salil:

Thank you for your attention; here are my answers:

1. I don't really understand what you're saying. What's the difference
between a DMA buffer and a DMA mapping? Is it like a memory block pool
versus a memory block, or something like that?

2. Yes, TSO is enabled all the time, but it doesn't seem to help.

3. The CPU usage is pretty normal. What's the point of this question?
Is it relevant to the leaking problem?

FYI: I found an interesting phenomenon: only a small part of the running
hosts have this issue, even though they all have the same kernel,
configuration and hardware. I don't know if this really means something.

Salil Mehta wrote on Tue, Apr 28, 2020 at 5:17 PM:
> Hi Bin,
>
> A few questions:
>
> 1. If there is a leak of IOVAs due to dma_unmap_* not being called
> somewhere, then at a certain point the throughput will drastically fall
> and will almost become equal to zero, due to unavailability of the
> mappings. But in your case the VM is getting killed, so this could be an
> actual DMA buffer leak, not a DMA mapping leak. I doubt the VM would get
> killed due to exhaustion of DMA mappings in the IOMMU layer for a
> transient reason, or even due to a mapping/unmapping leak.
>
> 2. Could you check if you have TSO offload enabled on the Intel 82599?
> It will help in reducing the number of mappings and will take IOVA
> mapping pressure off the IOMMU/VT-d, though I am not sure it will help
> in reducing the amount of memory required for the buffers.
>
> 3. Also, have you checked the CPU usage while your experiment is going on?
>
> Thanks
> Salil.
>
> > -----Original Message-----
> > From: iommu [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of
> > Robin Murphy
> > Sent: Friday, April 24, 2020 5:31 PM
> > To: Bin
> > Cc: iommu@lists.linux-foundation.org
> > Subject: Re: iommu_iova slab eats too much memory
> >
> > On 2020-04-24 2:20 pm, Bin wrote:
> > > Dear Robin:
> > > Thank you for your explanation. Now I understand that this could be
> > > the NIC driver's fault, but how could I confirm it? Do I have to debug
> > > the driver myself?
> >
> > I'd start with CONFIG_DMA_API_DEBUG - of course it will chew through
> > memory about an order of magnitude faster than the IOVAs alone, but it
> > should shed some light on whether DMA API usage looks suspicious, and
> > dumping the mappings should help track down the responsible driver(s).
> > Although the debugfs code doesn't show the stacktrace of where each
> > mapping was made, I guess it would be fairly simple to tweak that for a
> > quick way to narrow down where to start looking in an offending driver.
> >
> > Robin.
> >
> > > Robin Murphy wrote on Fri, Apr 24, 2020 at 8:15 PM:
> > >
> > >> On 2020-04-24 1:06 pm, Bin wrote:
> > >>> I'm not familiar with the mmu stuff, so when you say "some driver
> > >>> leaking DMA mappings", is it possible that some other kernel module,
> > >>> like KVM or the NIC driver, leads to the leaking problem instead of
> > >>> the iommu module itself?
> > >>
> > >> Yes - I doubt that intel-iommu itself is failing to free IOVAs when it
> > >> should, since I'd expect a lot of people to have noticed that. It's far
> > >> more likely that some driver is failing to call dma_unmap_* when it's
> > >> finished with a buffer - with the IOMMU disabled that would be a no-op
> > >> on x86 with a modern 64-bit-capable device, so such a latent bug could
> > >> have been easily overlooked.
> > >>
> > >> Robin.
> > >>
> > >>> Bin wrote on Fri, Apr 24, 2020 at 8:00 PM:
> > >>>
> > >>>> Well, that's the problem! I'm assuming the iommu kernel module is
> > >>>> leaking memory. But I don't know why and how.
> > >>>>
> > >>>> Do you have any idea about it? Or is any further information needed?
> > >>>>
> > >>>> Robin Murphy wrote on Fri, Apr 24, 2020 at 7:20 PM:
> > >>>>
> > >>>>> On 2020-04-24 1:40 am, Bin wrote:
> > >>>>>> Hello? anyone there?
> > >>>>>>
> > >>>>>> Bin wrote on Thu, Apr 23, 2020 at 5:14 PM:
> > >>>>>>
> > >>>>>>> Forgot to mention, I've already disabled slab merging, so this is
> > >>>>>>> what it is.
> > >>>>>>>
> > >>>>>>> Bin wrote on Thu, Apr 23, 2020 at 5:11 PM:
> > >>>>>>>
> > >>>>>>>> Hey, guys:
> > >>>>>>>>
> > >>>>>>>> I'm running a batch of CoreOS boxes, the lsb_release is:
> > >>>>>>>>
> > >>>>>>>> ```
> > >>>>>>>> # cat /etc/lsb-release
> > >>>>>>>> DISTRIB_ID="Container Linux by CoreOS"
> > >>>>>>>> DISTRIB_RELEASE=2303.3.0
> > >>>>>>>> DISTRIB_CODENAME="Rhyolite"
> > >>>>>>>> DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0 (Rhyolite)"
> > >>>>>>>> ```
> > >>>>>>>>
> > >>>>>>>> ```
> > >>>>>>>> # uname -a
> > >>>>>>>> Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00 2019
> > >>>>>>>> x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel GNU/Linux
> > >>>>>>>> ```
> > >>>>>>>>
> > >>>>>>>> Recently, I found my VMs constantly being killed due to OOM, and
> > >>>>>>>> after digging into the problem, I finally realized that the
> > >>>>>>>> kernel is leaking
Re: [PATCH 5/5] virtio: Add bounce DMA ops
* Stefano Stabellini [2020-04-28 16:04:34]:

> > > Is swiotlb commonly used for multiple devices that may be on different
> > > trust boundaries (and not behind a hardware iommu)?
>
> The trust boundary is not a good way of describing the scenario and I
> think it leads to miscommunication.
>
> A better way to describe the scenario would be that the device can only
> DMA to/from a small reserved-memory region advertised on device tree.
>
> Do we have other instances of devices that can only DMA to/from very
> specific and non-configurable address ranges? If so, this series could
> follow their example.

AFAICT there is no such notion in the current DMA API.

static inline bool dma_capable(struct device *dev, dma_addr_t addr,
		size_t size, bool is_ram)
{
	dma_addr_t end = addr + size - 1;

	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
}

Only the max address a device can access is defined, not a range that we
seem to need here. I think we need to set the bus_dma_limit to 0 for virtio
devices, which will force the use of the swiotlb_map API. We should also
have a per-device swiotlb pool defined, so that swiotlb can use the pool
meant for the given device.

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH 5/5] virtio: Add bounce DMA ops
* Michael S. Tsirkin [2020-04-28 16:41:04]:

> > Won't we still need some changes to virtio to make use of its own pool (to
> > bounce buffers)? Something similar to its own DMA ops proposed in this
> > patch?
>
> If you are doing this for all devices, you need to either find a way
> to do this without changing DMA ops, or by doing some automatic change
> to all drivers.

Ok, thanks for this input. I will see how we can obfuscate this in the DMA
APIs itself.

Can you also comment on the virtio transport problem I cited? The hypervisor
we are dealing with does not support MMIO transport. It supports message
queue send/recv and also doorbell, which I think can be used if we can make
some change like this to virtio_mmio.c:

+static inline u32
+virtio_readl(struct virtio_mmio_device *vm_dev, u32 reg_offset)
+{
+	return vm_dev->mmio_ops->readl(vm_dev, reg_offset);
+}
+
+static inline void
+virtio_writel(struct virtio_mmio_device *vm_dev, u32 reg_offset, u32 data)
+{
+	vm_dev->mmio_ops->writel(vm_dev, reg_offset, data);
+}

	/* Check magic value */
-	magic = readl(vm_dev->base + VIRTIO_MMIO_MAGIC_VALUE);
+	magic = virtio_readl(vm_dev, VIRTIO_MMIO_MAGIC_VALUE);

mmio_ops->readl on most platforms can default to readl itself, while on a
platform like ours it can boil down to message_queue send/recv. Would such
a change be acceptable?

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH v3 3/4] iommu/vt-d: Add page request draining support
On Wed, 22 Apr 2020 16:06:10 +0800 Lu Baolu wrote:
> When a PASID is stopped or terminated, there can be pending PRQs
> (requests that haven't received responses) in remapping hardware.
> This adds the interface to drain page requests and call it when a
> PASID is terminated.
>
> Signed-off-by: Jacob Pan
> Signed-off-by: Liu Yi L
> Signed-off-by: Lu Baolu
> ---
>  drivers/iommu/intel-svm.c   | 103 ++--
>  include/linux/intel-iommu.h |   4 ++
>  2 files changed, 102 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 83dc4319f661..2534641ef707 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -23,6 +23,7 @@
>  #include "intel-pasid.h"
>  
>  static irqreturn_t prq_event_thread(int irq, void *d);
> +static void intel_svm_drain_prq(struct device *dev, int pasid);
>  
>  #define PRQ_ORDER 0
>  
> @@ -66,6 +67,8 @@ int intel_svm_enable_prq(struct intel_iommu *iommu)
>  	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
>  	dmar_writeq(iommu->reg + DMAR_PQA_REG,
>  		    virt_to_phys(iommu->prq) | PRQ_ORDER);
> +	init_completion(&iommu->prq_complete);
> +
>  	return 0;
>  }
>  
> @@ -208,6 +211,7 @@ static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(sdev, &svm->devs, list) {
>  		intel_pasid_tear_down_entry(svm->iommu, sdev->dev, svm->pasid);
> +		intel_svm_drain_prq(sdev->dev, svm->pasid);

mmu_notifier release is called in atomic context; drain_prq needs to
wait for completion. I tested the exit path when a process crashes. I got:

[ +0.696214] BUG: sleeping function called from invalid context at kernel/sched/completion.c:101
[ +0.68] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3235, name: dsatest
[ +0.46] INFO: lockdep is turned off.
[ +0.02] CPU: 1 PID: 3235 Comm: dsatest Not tainted 5.7.0-rc1-z-svmtest+ #1637
[ +0.00] Hardware name: Intel Corporation Kabylake Client platform/Skylake Halo DDR4 RVP11, BIOS 04.1709050855 09/05/2017
[ +0.01] Call Trace:
[ +0.04]  dump_stack+0x68/0x9b
[ +0.03]  ___might_sleep+0x229/0x250
[ +0.03]  wait_for_completion_timeout+0x3c/0x110
[ +0.03]  intel_svm_drain_prq+0x12f/0x210
[ +0.03]  intel_mm_release+0x73/0x110
[ +0.03]  __mmu_notifier_release+0x94/0x220
[ +0.02]  ? do_munmap+0x10/0x10
[ +0.02]  ? prepare_ftrace_return+0x5c/0x80
[ +0.03]  exit_mmap+0x156/0x1a0
[ +0.02]  ? mmput+0x44/0x120
[ +0.03]  ? exit_mmap+0x5/0x1a0
[ +0.02]  ? ftrace_graph_caller+0xa0/0xa0
[ +0.01]  mmput+0x5e/0x120

>  		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>  	}
>  	rcu_read_unlock();
> @@ -401,12 +405,8 @@ int intel_svm_unbind_gpasid(struct device *dev, int pasid)
>  	if (!sdev->users) {
>  		list_del_rcu(&sdev->list);
>  		intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> +		intel_svm_drain_prq(dev, svm->pasid);
>  		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
> -		/* TODO: Drain in flight PRQ for the PASID since it
> -		 * may get reused soon, we don't want to
> -		 * confuse with its previous life.
> -		 * intel_svm_drain_prq(dev, pasid);
> -		 */
>  		kfree_rcu(sdev, rcu);
>  
>  		if (list_empty(&svm->devs)) {
> @@ -644,6 +644,7 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
>  		 * large and has to be physically contiguous. So it's
>  		 * hard to be as defensive as we might like.
>  		 */
>  		intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> +		intel_svm_drain_prq(dev, svm->pasid);
>  		intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>  		kfree_rcu(sdev, rcu);
>  
> @@ -722,6 +723,92 @@ static bool is_canonical_address(u64 addr)
>  	return (((saddr << shift) >> shift) == saddr);
>  }
>  
> +/**
> + * intel_svm_drain_prq:
> + *
> + * Drain all pending page requests and responses related to a specific
> + * pasid in both software and hardware.
> + */
> +static void intel_svm_drain_prq(struct device *dev, int pasid)
> +{
> +	struct device_domain_info *info;
> +	struct dmar_domain *domain;
> +	struct intel_iommu *iommu;
> +	struct qi_desc desc[3];
> +	struct pci_dev *pdev;
> +	int head, tail;
> +	u16 sid, did;
> +	int qdep;
> +
> +	info = get_domain_info(dev);
> +	if (WARN_ON(!info || !dev_is_pci(dev)))
> +		return;
> +
> +	if (!info->ats_enabled)
> +		return;
> +
> +	iommu = info->iommu;
> +	domain = info->domain;
> +	pdev = to_pci_dev(dev);
> +	sid = PCI_DEVID(info->bus, info->devfn);
> +	did = domain->iommu_did[iommu->seq_id];
> +	qdep = pci_ats_
Re: [PATCH 5/5] virtio: Add bounce DMA ops
On 2020/4/29 4:41, Michael S. Tsirkin wrote:
> On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> > * Michael S. Tsirkin [2020-04-28 12:17:57]:
> >
> > > Okay, but how is all this virtio specific? For example, why not allow
> > > separate swiotlbs for any type of device?
> > > For example, this might make sense if a given device is from a
> > > different, less trusted vendor.
> >
> > Is swiotlb commonly used for multiple devices that may be on different
> > trust boundaries (and not behind a hardware iommu)?
>
> Even a hardware iommu does not imply 100% security from malicious
> hardware. First, lots of people use iommu=pt for performance reasons.
> Second, even without pt, unmaps are often batched, and sub-page buffers
> might be used for DMA, so we are not 100% protected at all times.

For untrusted devices, IOMMU is forced on even when iommu=pt is used; and
iotlb flush is in strict mode (no batched flushes); ATS is also not allowed.
Swiotlb is used to protect sub-page buffers since the IOMMU can only apply
page-granularity protection. Swiotlb is now used for devices from different
trust zones.

Best regards,
baolu
Re: [PATCH 1/5] swiotlb: Introduce concept of swiotlb_pool
Hi Srivatsa,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on vhost/linux-next]
[also build test ERROR on xen-tip/linux-next linus/master v5.7-rc3 next-20200428]
[cannot apply to swiotlb/linux-next]
[if your patch is applied to the wrong git tree, please drop us a note to
help improve the system. BTW, we also suggest to use '--base' option to
specify the base tree in git format-patch, please see
https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Srivatsa-Vaddagiri/virtio-on-Type-1-hypervisor/20200429-032334
base: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: x86_64-defconfig (attached as .config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot

All errors (new ones prefixed by >>):

   drivers/iommu/intel-iommu.c: In function 'bounce_map_single':
>> drivers/iommu/intel-iommu.c:3990:24: error: 'io_tlb_start' undeclared (first use in this function); did you mean 'swiotlb_start'?
        __phys_to_dma(dev, io_tlb_start),
                           ^~~~~~~~~~~~
                           swiotlb_start
   drivers/iommu/intel-iommu.c:3990:24: note: each undeclared identifier is reported only once for each function it appears in

vim +3990 drivers/iommu/intel-iommu.c

cfb94a372f2d4e Lu Baolu     2019-09-06  3941
cfb94a372f2d4e Lu Baolu     2019-09-06  3942  static dma_addr_t
cfb94a372f2d4e Lu Baolu     2019-09-06  3943  bounce_map_single(struct device *dev, phys_addr_t paddr, size_t size,
cfb94a372f2d4e Lu Baolu     2019-09-06  3944  		  enum dma_data_direction dir, unsigned long attrs,
cfb94a372f2d4e Lu Baolu     2019-09-06  3945  		  u64 dma_mask)
cfb94a372f2d4e Lu Baolu     2019-09-06  3946  {
cfb94a372f2d4e Lu Baolu     2019-09-06  3947  	size_t aligned_size = ALIGN(size, VTD_PAGE_SIZE);
cfb94a372f2d4e Lu Baolu     2019-09-06  3948  	struct dmar_domain *domain;
cfb94a372f2d4e Lu Baolu     2019-09-06  3949  	struct intel_iommu *iommu;
cfb94a372f2d4e Lu Baolu     2019-09-06  3950  	unsigned long iova_pfn;
cfb94a372f2d4e Lu Baolu     2019-09-06  3951  	unsigned long nrpages;
cfb94a372f2d4e Lu Baolu     2019-09-06  3952  	phys_addr_t tlb_addr;
cfb94a372f2d4e Lu Baolu     2019-09-06  3953  	int prot = 0;
cfb94a372f2d4e Lu Baolu     2019-09-06  3954  	int ret;
cfb94a372f2d4e Lu Baolu     2019-09-06  3955
a11bfde9c77df1 Joerg Roedel 2020-02-17  3956  	if (unlikely(attach_deferred(dev)))
a11bfde9c77df1 Joerg Roedel 2020-02-17  3957  		do_deferred_attach(dev);
a11bfde9c77df1 Joerg Roedel 2020-02-17  3958
96d170f3b1a607 Joerg Roedel 2020-02-17  3959  	domain = find_domain(dev);
a11bfde9c77df1 Joerg Roedel 2020-02-17  3960
cfb94a372f2d4e Lu Baolu     2019-09-06  3961  	if (WARN_ON(dir == DMA_NONE || !domain))
cfb94a372f2d4e Lu Baolu     2019-09-06  3962  		return DMA_MAPPING_ERROR;
cfb94a372f2d4e Lu Baolu     2019-09-06  3963
cfb94a372f2d4e Lu Baolu     2019-09-06  3964  	iommu = domain_get_iommu(domain);
cfb94a372f2d4e Lu Baolu     2019-09-06  3965  	if (WARN_ON(!iommu))
cfb94a372f2d4e Lu Baolu     2019-09-06  3966  		return DMA_MAPPING_ERROR;
cfb94a372f2d4e Lu Baolu     2019-09-06  3967
cfb94a372f2d4e Lu Baolu     2019-09-06  3968  	nrpages = aligned_nrpages(0, size);
cfb94a372f2d4e Lu Baolu     2019-09-06  3969  	iova_pfn = intel_alloc_iova(dev, domain,
cfb94a372f2d4e Lu Baolu     2019-09-06  3970  				    dma_to_mm_pfn(nrpages), dma_mask);
cfb94a372f2d4e Lu Baolu     2019-09-06  3971  	if (!iova_pfn)
cfb94a372f2d4e Lu Baolu     2019-09-06  3972  		return DMA_MAPPING_ERROR;
cfb94a372f2d4e Lu Baolu     2019-09-06  3973
cfb94a372f2d4e Lu Baolu     2019-09-06  3974  	/*
cfb94a372f2d4e Lu Baolu     2019-09-06  3975  	 * Check if DMAR supports zero-length reads on write only
cfb94a372f2d4e Lu Baolu     2019-09-06  3976  	 * mappings..
cfb94a372f2d4e Lu Baolu     2019-09-06  3977  	 */
cfb94a372f2d4e Lu Baolu     2019-09-06  3978  	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL ||
cfb94a372f2d4e Lu Baolu     2019-09-06  3979  	    !cap_zlr(iommu->cap))
cfb94a372f2d4e Lu Baolu     2019-09-06  3980  		prot |= DMA_PTE_READ;
cfb94a372f2d4e Lu Baolu     2019-09-06  3981  	if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
cfb94a372f2d4e Lu Baolu     2019-09-06  3982  		prot |= DMA_PTE_WRITE;
cfb94a372f2d4e Lu Baolu     2019-09-06  3983
cfb94a372f2d4e Lu Baolu     2019-09-06  3984  	/*
cfb94a372f2d4e Lu Baolu     2019-09-06  3985  	 * If both the physical buffer start address and size are
cfb94a372f2d4e Lu Baolu     2019-09-06  3986  	 * page aligned, we don't need to use a bounce page.
cfb94a372f2d4e Lu
Re: [PATCH 5/5] virtio: Add bounce DMA ops
On Tue, 28 Apr 2020, Michael S. Tsirkin wrote:
> On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> > * Michael S. Tsirkin [2020-04-28 12:17:57]:
> >
> > > Okay, but how is all this virtio specific? For example, why not allow
> > > separate swiotlbs for any type of device?
> > > For example, this might make sense if a given device is from a
> > > different, less trusted vendor.
> >
> > Is swiotlb commonly used for multiple devices that may be on different
> > trust boundaries (and not behind a hardware iommu)?

The trust boundary is not a good way of describing the scenario and I
think it leads to miscommunication.

A better way to describe the scenario would be that the device can only
DMA to/from a small reserved-memory region advertised on device tree.

Do we have other instances of devices that can only DMA to/from very
specific and non-configurable address ranges? If so, this series could
follow their example.

> Even a hardware iommu does not imply 100% security from malicious
> hardware. First, lots of people use iommu=pt for performance reasons.
> Second, even without pt, unmaps are often batched, and sub-page buffers
> might be used for DMA, so we are not 100% protected at all times.
Re: [PATCH 5/5] virtio: Add bounce DMA ops
On Tue, 28 Apr 2020, Srivatsa Vaddagiri wrote:
> For better security, it is desirable that a guest VM's memory is
> not accessible to any entity that executes outside the context of
> the guest VM. In case of virtio, backend drivers execute outside the
> context of the guest VM and in general will need access to complete
> guest VM memory. One option to restrict the access provided to
> backend drivers is to make use of a bounce buffer. The bounce
> buffer is accessible to both backend and frontend drivers. All IO
> buffers that are in the private space of the guest VM are bounced to
> be accessible to the backend.

[...]

> +static int __init virtio_bounce_setup(struct reserved_mem *rmem)
> +{
> +	unsigned long node = rmem->fdt_node;
> +
> +	if (!of_get_flat_dt_prop(node, "no-map", NULL))
> +		return -EINVAL;
> +
> +	return virtio_register_bounce_buffer(rmem->base, rmem->size);
> +}
> +
> +RESERVEDMEM_OF_DECLARE(virtio, "virtio_bounce_pool", virtio_bounce_setup);

Is this special reserved-memory region documented somewhere?
RE: [PATCH 5/7] x86/mmu: Allocate/free PASID
> There are two users of a PASID, mm and device driver (FD). If
> either one is not done with the PASID, it cannot be reclaimed. As you
> mentioned, it could take a long time for the driver to abort. If the
> abort ends *after* mmdrop, we are in trouble.
> If the driver drops its reference after the abort/drain of the PASID
> is done, then we are safe.

I don't think there should be an abort ... suppose the application requested
the DSA to copy some large block of important results from DDR4 to persistent
memory. The driver should wait for that copy operation to complete.

Note that for the operation to succeed, the kernel should still be processing
and fixing page faults for the "mm" (some parts of the data that the user
wanted to save to persistent memory may have been paged out).

The wait by the DSA driver needs to be synchronous ... the "mm" cannot be
freed until DSA says all the pending operations have completed. Even without
persistent memory, there are cases where you want the operations to complete
(mmap'd files, shared memory with other processes).

-Tony
Re: [PATCH 5/5] virtio: Add bounce DMA ops
Hi Srivatsa,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on vhost/linux-next]
[also build test ERROR on xen-tip/linux-next linus/master v5.7-rc3 next-20200428]
[cannot apply to swiotlb/linux-next]
[if your patch is applied to the wrong git tree, please drop us a note to
help improve the system. BTW, we also suggest to use '--base' option to
specify the base tree in git format-patch, please see
https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Srivatsa-Vaddagiri/virtio-on-Type-1-hypervisor/20200429-032334
base: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: s390-randconfig-a001-20200428 (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=s390

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot

All errors (new ones prefixed by >>):

   drivers/virtio/virtio_bounce.c: In function 'virtio_bounce_setup':
>> drivers/virtio/virtio_bounce.c:144:7: error: implicit declaration of function 'of_get_flat_dt_prop' [-Werror=implicit-function-declaration]
     144 |  if (!of_get_flat_dt_prop(node, "no-map", NULL))
         |       ^~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/of_get_flat_dt_prop +144 drivers/virtio/virtio_bounce.c

   139
   140	static int __init virtio_bounce_setup(struct reserved_mem *rmem)
   141	{
   142		unsigned long node = rmem->fdt_node;
   143
 > 144		if (!of_get_flat_dt_prop(node, "no-map", NULL))
   145			return -EINVAL;
   146
   147		return virtio_register_bounce_buffer(rmem->base, rmem->size);
   148	}
   149

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
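[Editor's note on the error above — my reading, not part of the robot's report: of_get_flat_dt_prop() is declared in include/linux/of_fdt.h and is only built when flattened-device-tree support (CONFIG_OF_EARLY_FLATTREE) is enabled, which s390 randconfigs do not select. A possible fix is sketched below; both the include and the guard are assumptions, not the patch author's actual change.]

```c
/* drivers/virtio/virtio_bounce.c: declare the flat-DT helper explicitly */
#include <linux/of_fdt.h>	/* of_get_flat_dt_prop() */

/*
 * Additionally, the FDT setup path likely needs to be guarded so that
 * !OF architectures such as s390 still build, e.g. (compat string here
 * is hypothetical):
 *
 *	#ifdef CONFIG_OF_EARLY_FLATTREE
 *	RESERVEDMEM_OF_DECLARE(virtio, "virtio_bounce_pool",
 *			       virtio_bounce_setup);
 *	#endif
 */
```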
Re: [PATCH 5/7] x86/mmu: Allocate/free PASID
On Tue, 28 Apr 2020 13:59:43 -0700 "Luck, Tony" wrote:
> >> So the driver needs to use flush/drain operations to make sure all
> >> the in-flight work has completed before releasing/re-using the
> >> PASID.
> > Are you suggesting we should let driver also hold a reference of the
> > PASID?
>
> The sequence for bare metal is:
>
> 	process is queuing requests to DSA
> 	process exits (either deliberately, or crashes, or is killed)
> 	kernel does exit processing
> 	DSA driver is called as part of tear down of "mm"
> 		issues drain/flush commands to ensure that all
> 		queued operations on the PASID for this mm have
> 		completed
> 	PASID can be freed
>
> There's a 1:1 map from "mm" to PASID ... so reference counting seems
> like overkill. Once the kernel is in the "exit" path, we know that no
> more work can be queued using this PASID.
>
There are two users of a PASID, the mm and the device driver (FD). If
either one is not done with the PASID, it cannot be reclaimed. As you
mentioned, it could take a long time for the driver to abort. If the
abort ends *after* mmdrop, we are in trouble. If the driver drops its
reference only after the abort/drain of the PASID is done, then we are
safe.

> -Tony
Re: [PATCH 5/5] virtio: Add bounce DMA ops
Hi Srivatsa,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on vhost/linux-next]
[also build test WARNING on xen-tip/linux-next linus/master v5.7-rc3 next-20200428]
[cannot apply to swiotlb/linux-next]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Srivatsa-Vaddagiri/virtio-on-Type-1-hypervisor/20200429-032334
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: sh-randconfig-a001-20200428 (attached as .config)
compiler: sh4-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=sh

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot

All warnings (new ones prefixed by >>):

   In file included from drivers/virtio/virtio_bounce.c:13:
   include/linux/swiotlb.h: In function 'swiotlb_alloc':
   include/linux/swiotlb.h:234:9: error: 'DMA_MAPPING_ERROR' undeclared (first use in this function)
     234 |  return DMA_MAPPING_ERROR;
         |         ^
   include/linux/swiotlb.h:234:9: note: each undeclared identifier is reported only once for each function it appears in
>> include/linux/swiotlb.h:235:1: warning: control reaches end of non-void function [-Wreturn-type]
     235 | }
         | ^

vim +235 include/linux/swiotlb.h

9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28  229
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28  230  static inline phys_addr_t swiotlb_alloc(struct swiotlb_pool *pool,
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28  231  		size_t alloc_size, unsigned long tbl_dma_addr,
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28  232  		unsigned long mask)
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28  233  {
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28  234  	return DMA_MAPPING_ERROR;
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28 @235  }
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28  236

:: The code at line 235 was first introduced by commit
:: 9ab4c39d9f929840cb884e589f6112770dbc2f63 swiotlb: Add alloc and free APIs
:: TO: Srivatsa Vaddagiri
:: CC: 0day robot

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
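[Editor's note on this report — my reading, not the robot's: DMA_MAPPING_ERROR comes from include/linux/dma-mapping.h, and the new swiotlb_alloc() stub uses it without that header necessarily being included first; the -Wreturn-type warning is just fallout from the undeclared identifier. A hedged sketch of the likely fix, not the author's actual change:]

```c
/* include/linux/swiotlb.h: make the stub self-contained */
#include <linux/dma-mapping.h>	/* DMA_MAPPING_ERROR */

static inline phys_addr_t swiotlb_alloc(struct swiotlb_pool *pool,
		size_t alloc_size, unsigned long tbl_dma_addr,
		unsigned long mask)
{
	return DMA_MAPPING_ERROR;	/* compiles once the macro is visible */
}
```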
RE: [PATCH 5/7] x86/mmu: Allocate/free PASID
>> So the driver needs to use flush/drain operations to make sure all
>> the in-flight work has completed before releasing/re-using the PASID.
>>
> Are you suggesting we should let driver also hold a reference of the
> PASID?

The sequence for bare metal is:

	process is queuing requests to DSA
	process exits (either deliberately, or crashes, or is killed)
	kernel does exit processing
	DSA driver is called as part of tear down of "mm"
		issues drain/flush commands to ensure that all
		queued operations on the PASID for this mm have
		completed
	PASID can be freed

There's a 1:1 map from "mm" to PASID ... so reference counting seems
like overkill. Once the kernel is in the "exit" path, we know that no
more work can be queued using this PASID.

-Tony
Re: [PATCH 5/7] x86/mmu: Allocate/free PASID
On Sun, Apr 26, 2020 at 04:55:25PM +0200, Thomas Gleixner wrote:
> Fenghua Yu writes:
> > diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
> > index bdeae9291e5c..137bf51f19e6 100644
> > --- a/arch/x86/include/asm/mmu.h
> > +++ b/arch/x86/include/asm/mmu.h
> > @@ -50,6 +50,10 @@ typedef struct {
> >  	u16 pkey_allocation_map;
> >  	s16 execute_only_pkey;
> >  #endif
> > +
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +	int pasid;
>
> int? It's a value which gets programmed into the MSR along with the
> valid bit (bit 31) set.

BTW, ARM is working on PASID as well. Christoph suggested that the
PASID should be defined in mm_struct instead of mm->context so that
both ARM and x86 can access it:

https://lore.kernel.org/linux-iommu/20200414170252.714402-1-jean-phili...@linaro.org/T/#mb57110ffe1aaa24750eeea4f93b611f0d1913911

So I will define "pasid" in mm_struct in a separate patch in the next
version.

Thanks.

-Fenghua
Re: [PATCH 5/7] x86/mmu: Allocate/free PASID
On Tue, 28 Apr 2020 12:07:25 -0700 "Luck, Tony" wrote:
> > If fd release cleans up then how should there be something in
> > flight at the final mmdrop?
>
> ENQCMD from the user is only synchronous in that it lets the user
> know their request has been added to a queue (or not). Execution of
> the request may happen later (if the device is busy working on
> requests for other users). The request will take some time to
> complete. Someone told me the theoretical worst case once, which I've
> since forgotten, but it can be a long time.
>
> So the driver needs to use flush/drain operations to make sure all
> the in-flight work has completed before releasing/re-using the PASID.
>
Are you suggesting we should let the driver also hold a reference on
the PASID?

> -Tony
Re: [PATCH 5/5] virtio: Add bounce DMA ops
On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> * Michael S. Tsirkin [2020-04-28 12:17:57]:
>
> > Okay, but how is all this virtio specific? For example, why not allow
> > separate swiotlbs for any type of device?
> > For example, this might make sense if a given device is from a
> > different, less trusted vendor.
>
> Is swiotlb commonly used for multiple devices that may be on different trust
> boundaries (and not behind a hardware iommu)?

Even a hardware iommu does not imply 100% security from malicious
hardware. First, lots of people use iommu=pt for performance reasons.
Second, even without pt, unmaps are often batched, and sub-page buffers
might be used for DMA, so we are not 100% protected at all times.

> If so, then yes it sounds like a
> good application of multiple swiotlb pools.
>
> > All this can then maybe be hidden behind the DMA API.
>
> Won't we still need some changes to virtio to make use of its own pool (to
> bounce buffers)? Something similar to its own DMA ops proposed in this patch?

If you are doing this for all devices, you need to either find a way to
do this without changing DMA ops, or do some automatic change to all
drivers.

> > > +void virtio_bounce_set_dma_ops(struct virtio_device *vdev)
> > > +{
> > > +	if (!bounce_buf_paddr)
> > > +		return;
> > > +
> > > +	set_dma_ops(vdev->dev.parent, &virtio_dma_ops);
> >
> > I don't think DMA API maintainers will be happy with new users
> > of set_dma_ops.
>
> Is there an alternate API that is more preferred?

All this is supposed to be part of the DMA API itself. New drivers
aren't supposed to have custom DMA ops.

> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH 5/7] x86/mmu: Allocate/free PASID
On Tue, 28 Apr 2020 20:54:01 +0200 Thomas Gleixner wrote:
> "Jacob Pan (Jun)" writes:
> > On Sun, 26 Apr 2020 16:55:25 +0200
> > Thomas Gleixner wrote:
> >> Fenghua Yu writes:
> >> > The PASID is freed when the process exits (so no need to keep
> >> > reference counts on how many SVM devices are sharing the
> >> > PASID).
> >>
> >> I'm not buying that. If there is an outstanding request with the
> >> PASID of a process then tearing down the process address space and
> >> freeing the PASID (which might be reused) is fundamentally broken.
> >>
> > Device driver unbind PASID is tied to FD release. So when a process
> > exits, FD close causes driver to do the following:
> >
> > 1. stops DMA
> > 2. unbind PASID (clears the PASID entry in IOMMU, flush all TLBs,
> >    drain in flight page requests)
>
> Fair enough. Explaining that somewhere might be helpful.
>
Will do. I plan to document this in a kernel doc for IOASID/PASID
lifecycle management.

> > For bare metal SVM, if the last mmdrop always happens after FD
> > release, we can ensure no outstanding requests at the point of
> > ioasid_free(). Perhaps this is a wrong assumption?
>
> If fd release cleans up then how should there be something in flight
> at the final mmdrop?
>
> > For guest SVM, there will be more users of a PASID. I am also
> > working on adding refcounting to ioasid. ioasid_free() will not
> > release the PASID back to the pool until all references are
> > dropped.
>
> What does more users mean?

For VT-d, a PASID can be used by VFIO, the IOMMU driver, KVM, and the
Virtual Device Composition Module (VDCM*) at the same time.

*https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification

There is HW context associated with the PASID in the IOMMU, KVM, and
VDCM, so before the lifetime of the PASID is over, clean-up must be
done in all of the above. The PASID cannot be reclaimed until the last
user drops its reference. Our plan is to do notification and
refcounting.

> >> > +	if (mm && mm->context.pasid && !(flags & SVM_FLAG_PRIVATE_PASID)) {
> >> > +		/*
> >> > +		 * Once a PASID is allocated for this mm, the PASID
> >> > +		 * stays with the mm until the mm is dropped. Reuse
> >> > +		 * the PASID which has been already allocated for the
> >> > +		 * mm instead of allocating a new one.
> >> > +		 */
> >> > +		ioasid_set_data(mm->context.pasid, svm);
> >>
> >> So if the PASID is reused several times for different SVMs then
> >> every time ioasid_data->private is set to a different SVM. How is
> >> that supposed to work?
> >>
> > For the lifetime of the mm, there is only one PASID.
> > svm_bind/unbind_mm could happen many times with different private
> > data: intel_svm. Multiple devices can bind to the same PASID as
> > well. But private data don't change within the first bind and last
> > unbind.
>
> Ok. I read through that spaghetti of intel_svm_bind_mm() again and
> now I start to get an idea how that is supposed to work. What a mess.
>
> That function really wants to be restructured in a way so it is
> understandable to mere mortals.
>
Agreed. We are adding many new features and converging with the generic
sva_bind_device. Things will get more clear after we have fewer moving
pieces.

> Thanks,
>
> tglx
RE: [PATCH 5/7] x86/mmu: Allocate/free PASID
> If fd release cleans up then how should there be something in flight at
> the final mmdrop?

ENQCMD from the user is only synchronous in that it lets the user know
their request has been added to a queue (or not). Execution of the
request may happen later (if the device is busy working on requests for
other users). The request will take some time to complete. Someone told
me the theoretical worst case once, which I've since forgotten, but it
can be a long time.

So the driver needs to use flush/drain operations to make sure all the
in-flight work has completed before releasing/re-using the PASID.

-Tony
Re: [PATCH 5/7] x86/mmu: Allocate/free PASID
"Jacob Pan (Jun)" writes:
> On Sun, 26 Apr 2020 16:55:25 +0200
> Thomas Gleixner wrote:
>> Fenghua Yu writes:
>> > The PASID is freed when the process exits (so no need to keep
>> > reference counts on how many SVM devices are sharing the PASID).
>>
>> I'm not buying that. If there is an outstanding request with the PASID
>> of a process then tearing down the process address space and freeing
>> the PASID (which might be reused) is fundamentally broken.
>>
> Device driver unbind PASID is tied to FD release. So when a process
> exits, FD close causes driver to do the following:
>
> 1. stops DMA
> 2. unbind PASID (clears the PASID entry in IOMMU, flush all TLBs, drain
>    in flight page requests)

Fair enough. Explaining that somewhere might be helpful.

> For bare metal SVM, if the last mmdrop always happens after FD release,
> we can ensure no outstanding requests at the point of ioasid_free().
> Perhaps this is a wrong assumption?

If fd release cleans up then how should there be something in flight at
the final mmdrop?

> For guest SVM, there will be more users of a PASID. I am also
> working on adding refcounting to ioasid. ioasid_free() will not release
> the PASID back to the pool until all references are dropped.

What does more users mean?

>> > +	if (mm && mm->context.pasid && !(flags & SVM_FLAG_PRIVATE_PASID)) {
>> > +		/*
>> > +		 * Once a PASID is allocated for this mm, the PASID
>> > +		 * stays with the mm until the mm is dropped. Reuse
>> > +		 * the PASID which has been already allocated for the
>> > +		 * mm instead of allocating a new one.
>> > +		 */
>> > +		ioasid_set_data(mm->context.pasid, svm);
>>
>> So if the PASID is reused several times for different SVMs then every
>> time ioasid_data->private is set to a different SVM. How is that
>> supposed to work?
>>
> For the lifetime of the mm, there is only one PASID. svm_bind/unbind_mm
> could happen many times with different private data: intel_svm.
> Multiple devices can bind to the same PASID as well. But private data
> don't change within the first bind and last unbind.

Ok. I read through that spaghetti of intel_svm_bind_mm() again and now
I start to get an idea how that is supposed to work. What a mess.

That function really wants to be restructured in a way so it is
understandable to mere mortals.

Thanks,

        tglx
Re: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region
On Tue, 28 Apr 2020, Jürgen Groß wrote:
> On 28.04.20 09:33, peng@nxp.com wrote:
> > From: Peng Fan
> >
> > When booting xen on i.MX8QM, met:
> > "
> > [3.602128] Unable to handle kernel paging request at virtual address 00272d40
> > [3.610804] Mem abort info:
> > [3.613905]   ESR = 0x9604
> > [3.617332]   EC = 0x25: DABT (current EL), IL = 32 bits
> > [3.623211]   SET = 0, FnV = 0
> > [3.626628]   EA = 0, S1PTW = 0
> > [3.630128] Data abort info:
> > [3.633362]   ISV = 0, ISS = 0x0004
> > [3.637630]   CM = 0, WnR = 0
> > [3.640955] [00272d40] user address but active_mm is swapper
> > [3.647983] Internal error: Oops: 9604 [#1] PREEMPT SMP
> > [3.654137] Modules linked in:
> > [3.677285] Hardware name: Freescale i.MX8QM MEK (DT)
> > [3.677302] Workqueue: events deferred_probe_work_func
> > [3.684253] imx6q-pcie 5f00.pcie: PCI host bridge to bus :00
> > [3.688297] pstate: 6005 (nZCv daif -PAN -UAO)
> > [3.688310] pc : xen_swiotlb_free_coherent+0x180/0x1c0
> > [3.693993] pci_bus :00: root bus resource [bus 00-ff]
> > [3.701002] lr : xen_swiotlb_free_coherent+0x44/0x1c0
> > "
> >
> > In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask) or
> > range_straddles_page_boundary(phys, size) are true, it will
> > create contiguous region. So when free, we need to free contiguous
> > region use upper check condition.
>
> No, this will break PV guests on x86.
>
> I think there is something wrong with your setup in combination with
> the ARM xen_create_contiguous_region() implementation.
>
> Stefano?

Let me start by asking Peng a couple of questions:

Peng, could you please add a printk to check which one of the two
conditions is True for you? Is it (dev_addr + size - 1 > dma_mask) or
range_straddles_page_boundary(phys, size)?

Is hwdev->coherent_dma_mask set for your DMA capable device?

Finally, is this patch supposed to fix the crash you are seeing? If
not, can you tell where is the crash exactly?

Juergen, keep in mind that xen_create_contiguous_region does nothing on
ARM because in dom0 guest_phys == phys. xen_create_contiguous_region
simply sets dma_handle to phys. Whatever condition caused the code to
take the xen_create_contiguous_region branch in
xen_swiotlb_alloc_coherent, it will also cause it to WARN in
xen_swiotlb_free_coherent.

range_straddles_page_boundary should never return True because
guest_phys == phys. That leaves us with the dma_mask check:

  dev_addr + size - 1 <= dma_mask

dev_addr is the dma_handle allocated by xen_alloc_coherent_pages.
dma_mask is either DMA_BIT_MASK(32) or hwdev->coherent_dma_mask.

The implementation of xen_alloc_coherent_pages has been recently
changed to use dma_direct_alloc. Christoph, does dma_direct_alloc
respect hwdev->coherent_dma_mask if present? Also, can it return
highmem pages?

> Juergen
>
> > Signed-off-by: Peng Fan
> > ---
> >  drivers/xen/swiotlb-xen.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index b6d27762c6f8..ab96e468584f 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
> >  	/* Convert the size to actually allocated. */
> >  	size = 1UL << (order + XEN_PAGE_SHIFT);
> >
> > -	if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
> > -		     range_straddles_page_boundary(phys, size)) &&
> > +	if (((dev_addr + size - 1 > dma_mask) ||
> > +	     range_straddles_page_boundary(phys, size)) &&
> >  	    TestClearPageXenRemapped(virt_to_page(vaddr)))
> >  		xen_destroy_contiguous_region(phys, order);
> >
Re: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region
On 4/28/20 10:25 AM, Konrad Rzeszutek Wilk wrote: > On Tue, Apr 28, 2020 at 12:19:41PM +0200, Jürgen Groß wrote: >> On 28.04.20 10:25, Peng Fan wrote: > > Adding Joe Jin. > > Joe, didn't you have some ideas on how this could be implemented? > Subject: Re: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region On 28.04.20 09:33, peng@nxp.com wrote: > From: Peng Fan > > When booting xen on i.MX8QM, met: > " > [3.602128] Unable to handle kernel paging request at virtual address 00272d40 > [3.610804] Mem abort info: > [3.613905] ESR = 0x9604 > [3.617332] EC = 0x25: DABT (current EL), IL = 32 bits > [3.623211] SET = 0, FnV = 0 > [3.626628] EA = 0, S1PTW = 0 > [3.630128] Data abort info: > [3.633362] ISV = 0, ISS = 0x0004 > [3.637630] CM = 0, WnR = 0 > [3.640955] [00272d40] user address but active_mm is swapper > [3.647983] Internal error: Oops: 9604 [#1] PREEMPT SMP > [3.654137] Modules linked in: > [3.677285] Hardware name: Freescale i.MX8QM MEK (DT) > [3.677302] Workqueue: events deferred_probe_work_func > [3.684253] imx6q-pcie 5f00.pcie: PCI host bridge to bus :00 > [3.688297] pstate: 6005 (nZCv daif -PAN -UAO) > [3.688310] pc : xen_swiotlb_free_coherent+0x180/0x1c0 > [3.693993] pci_bus :00: root bus resource [bus 00-ff] > [3.701002] lr : xen_swiotlb_free_coherent+0x44/0x1c0 > " > > In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask) > or range_straddles_page_boundary(phys, size) are true, it will create > contiguous region. So when free, we need to free contiguous region use > upper check condition. No, this will break PV guests on x86. >>> >>> Could you share more details why alloc and free not matching for the check? 
>> >> xen_create_contiguous_region() is needed only in case: >> >> - the bus address is not within dma_mask, or >> - the memory region is not physically contiguous (can happen only for >> PV guests) >> >> In any case it should arrange for the memory to be suitable for the >> DMA operation, so to be contiguous and within dma_mask afterwards. So >> xen_destroy_contiguous_region() should only ever called for areas >> which match above criteria, as otherwise we can be sure >> xen_create_contiguous_region() was not used for making the area DMA-able >> in the beginning. I agreed with Juergen's explanation, That is my understanding. Peng, if panic caused by (dev_addr + size - 1 > dma_mask), you should check how you get the addr, if memory created by xen_create_contiguous_region(), memory must be with in [0 - dma_mask]. Thanks, Joe >> >> And this is very important in the PV case, as in those guests the page >> tables are containing the host-PFNs, not the guest-PFNS, and >> xen_create_contiguous_region() will fiddle with host- vs. guest-PFN >> arrangements, and xen_destroy_contiguous_region() is reverting this >> fiddling. Any call of xen_destroy_contiguous_region() for an area it >> was not intended to be called for might swap physical pages beneath >> random virtual addresses, which was the reason for this test to be >> added by me. >> >> >> Juergen >> >>> >>> Thanks, >>> Peng. >>> I think there is something wrong with your setup in combination with the ARM xen_create_contiguous_region() implementation. Stefano? Juergen > > Signed-off-by: Peng Fan > --- >drivers/xen/swiotlb-xen.c | 4 ++-- >1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c > index b6d27762c6f8..ab96e468584f 100644 > --- a/drivers/xen/swiotlb-xen.c > +++ b/drivers/xen/swiotlb-xen.c > @@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr, > /* Convert the size to actually allocated. 
 */
> > > >  	size = 1UL << (order + XEN_PAGE_SHIFT);
> > > >
> > > > -	if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
> > > > -		     range_straddles_page_boundary(phys, size)) &&
> > > > +	if (((dev_addr + size - 1 > dma_mask) ||
> > > > +	     range_straddles_page_boundary(phys, size)) &&
> > > >  	    TestClearPageXenRemapped(virt_to_page(vaddr)))
> > > >  		xen_destroy_contiguous_region(phys, order);
> > > >
> > >
> >
Re: [PATCH 5/7] x86/mmu: Allocate/free PASID
On Sun, 26 Apr 2020 16:55:25 +0200 Thomas Gleixner wrote: > Fenghua Yu writes: > > > PASID is shared by all threads in a process. So the logical place > > to keep track of it is in the "mm". Add the field to the > > architecture specific mm_context_t structure. > > > > A PASID is allocated for an "mm" the first time any thread attaches > > to an SVM capable device. Later device atatches (whether to the > > same > > atatches? > > > device or another SVM device) will re-use the same PASID. > > > > The PASID is freed when the process exits (so no need to keep > > reference counts on how many SVM devices are sharing the PASID). > > I'm not buying that. If there is an outstanding request with the PASID > of a process then tearing down the process address space and freeing > the PASID (which might be reused) is fundamentally broken. > Device driver unbind PASID is tied to FD release. So when a process exits, FD close causes driver to do the following: 1. stops DMA 2. unbind PASID (clears the PASID entry in IOMMU, flush all TLBs, drain in flight page requests) For bare metal SVM, if the last mmdrop always happens after FD release, we can ensure no outstanding requests at the point of ioasid_free(). Perhaps this is a wrong assumption? For guest SVM, there will be more users of a PASID. I am also working on adding refcounting to ioasid. ioasid_free() will not release the PASID back to the pool until all references are dropped. > > +void __free_pasid(struct mm_struct *mm); > > + > > #endif /* _ASM_X86_IOMMU_H */ > > diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h > > index bdeae9291e5c..137bf51f19e6 100644 > > --- a/arch/x86/include/asm/mmu.h > > +++ b/arch/x86/include/asm/mmu.h > > @@ -50,6 +50,10 @@ typedef struct { > > u16 pkey_allocation_map; > > s16 execute_only_pkey; > > #endif > > + > > +#ifdef CONFIG_INTEL_IOMMU_SVM > > + int pasid; > > int? It's a value which gets programmed into the MSR along with the > valid bit (bit 31) set. 
> > > extern void switch_mm(struct mm_struct *prev, struct mm_struct > > *next, diff --git a/drivers/iommu/intel-svm.c > > b/drivers/iommu/intel-svm.c index d7f2a5358900..da718a49e91e 100644 > > --- a/drivers/iommu/intel-svm.c > > +++ b/drivers/iommu/intel-svm.c > > @@ -226,6 +226,45 @@ static LIST_HEAD(global_svm_list); > > list_for_each_entry((sdev), &(svm)->devs, list) \ > > if ((d) != (sdev)->dev) {} else > > > > +/* > > + * If this mm already has a PASID we can use it. Otherwise > > allocate a new one. > > + * Let the caller know if we did an allocation via 'new_pasid'. > > + */ > > +static int alloc_pasid(struct intel_svm *svm, struct mm_struct *mm, > > + int pasid_max, bool *new_pasid, int > > flags) > > Again, data types please. flags are generally unsigned and not plain > int. Also pasid_max is certainly not plain int either. > > > +{ > > + int pasid; > > + > > + /* > > +* Reuse the PASID if the mm already has a PASID and not a > > private > > +* PASID is requested. > > +*/ > > + if (mm && mm->context.pasid && !(flags & > > SVM_FLAG_PRIVATE_PASID)) { > > + /* > > +* Once a PASID is allocated for this mm, the PASID > > +* stays with the mm until the mm is dropped. Reuse > > +* the PASID which has been already allocated for > > the > > +* mm instead of allocating a new one. > > +*/ > > + ioasid_set_data(mm->context.pasid, svm); > > So if the PASID is reused several times for different SVMs then every > time ioasid_data->private is set to a different SVM. How is that > supposed to work? > For the lifetime of the mm, there is only one PASID. svm_bind/unbind_mm could happen many times with different private data: intel_svm. Multiple devices can bind to the same PASID as well. But private data don't change within the first bind and last unbind. > > + *new_pasid = false; > > + > > + return mm->context.pasid; > > + } > > + > > + /* > > +* Allocate a new pasid. Do not use PASID 0, reserved for > > RID to > > +* PASID. 
> > +*/ > > + pasid = ioasid_alloc(NULL, PASID_MIN, pasid_max - 1, > > svm); > > ioasid_alloc() uses ioasid_t which is > > typedef unsigned int ioasid_t; > > Can we please have consistent types and behaviour all over the place? > > > + if (pasid == INVALID_IOASID) > > + return -ENOSPC; > > + > > + *new_pasid = true; > > + > > + return pasid; > > +} > > + > > int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, > > struct svm_dev_ops *ops) { > > struct intel_iommu *iommu = intel_svm_device_to_iommu(dev); > > @@ -324,6 +363,8 @@ int intel_svm_bind_mm(struct device *dev, int > > *pasid, int flags, struct svm_dev_ init_rcu_head(&sdev->rcu); > > > > if (!svm) { > > + bool new_pasid; > > + > > svm = kzalloc(sizeof(*svm), GFP_KERNEL); > > if (!svm) { > >
Re: [PATCH 5/5] virtio: Add bounce DMA ops
* Michael S. Tsirkin [2020-04-28 12:17:57]:

> Okay, but how is all this virtio specific? For example, why not allow
> separate swiotlbs for any type of device?
> For example, this might make sense if a given device is from a
> different, less trusted vendor.

Is swiotlb commonly used for multiple devices that may be on different
trust boundaries (and not behind a hardware iommu)? If so, then yes it
sounds like a good application of multiple swiotlb pools.

> All this can then maybe be hidden behind the DMA API.

Won't we still need some changes to virtio to make use of its own pool
(to bounce buffers)? Something similar to its own DMA ops proposed in
this patch?

> > +void virtio_bounce_set_dma_ops(struct virtio_device *vdev)
> > +{
> > +	if (!bounce_buf_paddr)
> > +		return;
> > +
> > +	set_dma_ops(vdev->dev.parent, &virtio_dma_ops);
>
> I don't think DMA API maintainers will be happy with new users
> of set_dma_ops.

Is there an alternate API that is more preferred?

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region
On Tue, Apr 28, 2020 at 12:19:41PM +0200, Jürgen Groß wrote: > On 28.04.20 10:25, Peng Fan wrote: Adding Joe Jin. Joe, didn't you have some ideas on how this could be implemented? > > > Subject: Re: [PATCH] xen/swiotlb: correct the check for > > > xen_destroy_contiguous_region > > > > > > On 28.04.20 09:33, peng@nxp.com wrote: > > > > From: Peng Fan > > > > > > > > When booting xen on i.MX8QM, met: > > > > " > > > > [3.602128] Unable to handle kernel paging request at virtual address > > > 00272d40 > > > > [3.610804] Mem abort info: > > > > [3.613905] ESR = 0x9604 > > > > [3.617332] EC = 0x25: DABT (current EL), IL = 32 bits > > > > [3.623211] SET = 0, FnV = 0 > > > > [3.626628] EA = 0, S1PTW = 0 > > > > [3.630128] Data abort info: > > > > [3.633362] ISV = 0, ISS = 0x0004 > > > > [3.637630] CM = 0, WnR = 0 > > > > [3.640955] [00272d40] user address but active_mm is > > > swapper > > > > [3.647983] Internal error: Oops: 9604 [#1] PREEMPT SMP > > > > [3.654137] Modules linked in: > > > > [3.677285] Hardware name: Freescale i.MX8QM MEK (DT) > > > > [3.677302] Workqueue: events deferred_probe_work_func > > > > [3.684253] imx6q-pcie 5f00.pcie: PCI host bridge to bus :00 > > > > [3.688297] pstate: 6005 (nZCv daif -PAN -UAO) > > > > [3.688310] pc : xen_swiotlb_free_coherent+0x180/0x1c0 > > > > [3.693993] pci_bus :00: root bus resource [bus 00-ff] > > > > [3.701002] lr : xen_swiotlb_free_coherent+0x44/0x1c0 > > > > " > > > > > > > > In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask) > > > > or range_straddles_page_boundary(phys, size) are true, it will create > > > > contiguous region. So when free, we need to free contiguous region use > > > > upper check condition. > > > > > > No, this will break PV guests on x86. > > > > Could you share more details why alloc and free not matching for the check? 
> > xen_create_contiguous_region() is needed only in case: > > - the bus address is not within dma_mask, or > - the memory region is not physically contiguous (can happen only for > PV guests) > > In any case it should arrange for the memory to be suitable for the > DMA operation, so to be contiguous and within dma_mask afterwards. So > xen_destroy_contiguous_region() should only ever called for areas > which match above criteria, as otherwise we can be sure > xen_create_contiguous_region() was not used for making the area DMA-able > in the beginning. > > And this is very important in the PV case, as in those guests the page > tables are containing the host-PFNs, not the guest-PFNS, and > xen_create_contiguous_region() will fiddle with host- vs. guest-PFN > arrangements, and xen_destroy_contiguous_region() is reverting this > fiddling. Any call of xen_destroy_contiguous_region() for an area it > was not intended to be called for might swap physical pages beneath > random virtual addresses, which was the reason for this test to be > added by me. > > > Juergen > > > > > Thanks, > > Peng. > > > > > > > > I think there is something wrong with your setup in combination with the > > > ARM > > > xen_create_contiguous_region() implementation. > > > > > > Stefano? > > > > > > > > > Juergen > > > > > > > > > > > Signed-off-by: Peng Fan > > > > --- > > > >drivers/xen/swiotlb-xen.c | 4 ++-- > > > >1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c > > > > index b6d27762c6f8..ab96e468584f 100644 > > > > --- a/drivers/xen/swiotlb-xen.c > > > > +++ b/drivers/xen/swiotlb-xen.c > > > > @@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev, > > > size_t size, void *vaddr, > > > > /* Convert the size to actually allocated. 
*/ > > > > size = 1UL << (order + XEN_PAGE_SHIFT); > > > > > > > > - if (!WARN_ON((dev_addr + size - 1 > dma_mask) || > > > > -range_straddles_page_boundary(phys, size)) && > > > > + if (((dev_addr + size - 1 > dma_mask) || > > > > + range_straddles_page_boundary(phys, size)) && > > > > TestClearPageXenRemapped(virt_to_page(vaddr))) > > > > xen_destroy_contiguous_region(phys, order); > > > > > > > > > > > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
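The fix under discussion boils down to one invariant: the free path must recompute exactly the predicate the alloc path used when it decided to call xen_create_contiguous_region(). A minimal userspace sketch of that predicate (mock names, not the kernel implementation; range_straddles_page_boundary() is simplified to a plain page-boundary check, whereas the real function checks machine-frame contiguity):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define XEN_PAGE_SHIFT 12
#define XEN_PAGE_SIZE  (1UL << XEN_PAGE_SHIFT)

/* Simplified stand-in for range_straddles_page_boundary(): this mock
 * only checks whether the buffer crosses a page boundary. */
static bool mock_straddles(uint64_t phys, size_t size)
{
    return ((phys & (XEN_PAGE_SIZE - 1)) + size) > XEN_PAGE_SIZE;
}

/* The predicate xen_swiotlb_alloc_coherent() uses when deciding to call
 * xen_create_contiguous_region().  The fix under discussion makes the
 * free path evaluate this same expression before calling
 * xen_destroy_contiguous_region(), rather than the inverted
 * WARN_ON() form that was there before. */
static bool mock_needs_contig_region(uint64_t dev_addr, uint64_t dma_mask,
                                     uint64_t phys, size_t size)
{
    return !(dev_addr + size - 1 <= dma_mask) || mock_straddles(phys, size);
}
```

If alloc and free disagree on this predicate, free may call xen_destroy_contiguous_region() on an area that was never made contiguous, which is exactly the PFN-swapping hazard Juergen describes for PV guests.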
Re: [PATCH 5/5] virtio: Add bounce DMA ops
On Tue, Apr 28, 2020 at 05:09:18PM +0530, Srivatsa Vaddagiri wrote: > For better security, it's desirable that a guest VM's memory is > not accessible to any entity that executes outside the context of > guest VM. In case of virtio, backend drivers execute outside the > context of guest VM and in general will need access to complete > guest VM memory. One option to restrict the access provided to > backend driver is to make use of a bounce buffer. The bounce > buffer is accessible to both backend and frontend drivers. All IO > buffers that are in private space of guest VM are bounced to be > accessible to backend. > > This patch proposes a new memory pool to be used for this bounce > purpose, rather than the default swiotlb memory pool. That will > avoid any conflicts that may arise in situations where a VM needs > to use swiotlb pool for driving any pass-through devices (in > which case swiotlb memory need not be shared with another VM) as > well as virtio devices (which will require swiotlb memory to be > shared with backend VM). As a possible extension to this patch, > we can provide an option for virtio to make use of default > swiotlb memory pool itself, where no such conflicts may exist in > a given deployment. > > Signed-off-by: Srivatsa Vaddagiri Okay, but how is all this virtio specific? For example, why not allow separate swiotlbs for any type of device? For example, this might make sense if a given device is from a different, less trusted vendor. All this can then maybe be hidden behind the DMA API. 
> --- > drivers/virtio/Makefile| 2 +- > drivers/virtio/virtio.c| 2 + > drivers/virtio/virtio_bounce.c | 150 > + > include/linux/virtio.h | 4 ++ > 4 files changed, 157 insertions(+), 1 deletion(-) > create mode 100644 drivers/virtio/virtio_bounce.c > > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile > index 29a1386e..3fd3515 100644 > --- a/drivers/virtio/Makefile > +++ b/drivers/virtio/Makefile > @@ -1,5 +1,5 @@ > # SPDX-License-Identifier: GPL-2.0 > -obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o > +obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o virtio_bounce.o > obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o > obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o > virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c > index a977e32..bc2f779 100644 > --- a/drivers/virtio/virtio.c > +++ b/drivers/virtio/virtio.c > @@ -329,6 +329,7 @@ int register_virtio_device(struct virtio_device *dev) > > dev->index = err; > dev_set_name(&dev->dev, "virtio%u", dev->index); > + virtio_bounce_set_dma_ops(dev); > > spin_lock_init(&dev->config_lock); > dev->config_enabled = false; > @@ -431,6 +432,7 @@ EXPORT_SYMBOL_GPL(virtio_device_restore); > > static int virtio_init(void) > { > + virtio_map_bounce_buffer(); > if (bus_register(&virtio_bus) != 0) > panic("virtio bus registration failed"); > return 0; > diff --git a/drivers/virtio/virtio_bounce.c b/drivers/virtio/virtio_bounce.c > new file mode 100644 > index 000..3de8e0e > --- /dev/null > +++ b/drivers/virtio/virtio_bounce.c > @@ -0,0 +1,150 @@ > +// SPDX-License-Identifier: GPL-2.0-or-later > +/* > + * Virtio DMA ops to bounce buffers > + * > + * Copyright (c) 2020, The Linux Foundation. All rights reserved. > + * > + * This module allows bouncing of IO buffers to a region which will be > + * accessible to backend drivers. 
> + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +static phys_addr_t bounce_buf_paddr; > +static void *bounce_buf_vaddr; > +static size_t bounce_buf_size; > +struct swiotlb_pool *virtio_pool; > + > +#define VIRTIO_MAX_BOUNCE_SIZE (16*4096) > + > +static void *virtio_alloc_coherent(struct device *dev, size_t size, > + dma_addr_t *dma_handle, gfp_t gfp_flags, unsigned long attrs) > +{ > + phys_addr_t addr; > + > + if (!virtio_pool) > + return NULL; > + > + addr = swiotlb_alloc(virtio_pool, size, bounce_buf_paddr, ULONG_MAX); > + if (addr == DMA_MAPPING_ERROR) > + return NULL; > + > + *dma_handle = (addr - bounce_buf_paddr); > + > + return bounce_buf_vaddr + (addr - bounce_buf_paddr); > +} > + > +static void virtio_free_coherent(struct device *dev, size_t size, void > *vaddr, > + dma_addr_t dma_handle, unsigned long attrs) > +{ > + phys_addr_t addr = (dma_handle + bounce_buf_paddr); > + > + swiotlb_free(virtio_pool, addr, size); > +} > + > +static dma_addr_t virtio_map_page(struct device *dev, struct page *page, > + unsigned long offset, size_t size, > + enum dma_data_direction dir, unsigned long attrs) > +{ > + void *ptr = page_address(page) + offset; > + phys_addr_t paddr = virt_to_phys(ptr); > + dma_addr_t handle; > + > + if (!virtio_pool)
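The bounce idea itself is simple enough to model in a few lines. The sketch below is a userspace toy with invented names (nothing here is the patch's API): a small "shared" window stands in for the dedicated swiotlb pool, map copies a private buffer into the window and hands out an offset playing the role of the dma_addr_t handle, and unmap copies the backend-visible data back into private memory.

```c
#include <string.h>
#include <stddef.h>

#define POOL_SIZE 4096

/* The "shared" region both frontend and backend VMs can see. */
static unsigned char toy_pool[POOL_SIZE];
static size_t toy_used;

/* Bounce a private buffer into the shared window; returns the offset
 * used as the DMA handle, or -1 if the pool is exhausted.  (A bump
 * allocator for brevity; real swiotlb frees and reuses slots.) */
static long toy_bounce_map(const unsigned char *buf, size_t len)
{
    long off;

    if (toy_used + len > POOL_SIZE)
        return -1;
    off = (long)toy_used;
    memcpy(toy_pool + off, buf, len);
    toy_used += len;
    return off;
}

/* Copy possibly backend-modified data back into the private buffer. */
static void toy_bounce_unmap(unsigned char *buf, long off, size_t len)
{
    memcpy(buf, toy_pool + off, len);
}

/* Round trip: frontend maps, "backend" writes the shared copy,
 * frontend unmaps and sees the backend's update. */
static int toy_roundtrip(void)
{
    unsigned char priv[4] = { 'a', 'b', 'c', 0 };
    long off = toy_bounce_map(priv, sizeof(priv));

    if (off < 0)
        return 0;
    toy_pool[off] = 'X';    /* backend modifies the shared copy */
    toy_bounce_unmap(priv, off, sizeof(priv));
    return priv[0] == 'X' && priv[1] == 'b';
}
```

The point of the patch is only where this window lives (a pool private to virtio rather than the system swiotlb); the copy-in/copy-out mechanics are the same.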
Re: [RFC 00/17] DRM: fix struct sg_table nents vs. orig_nents misuse
On 2020-04-28 4:32 pm, Daniel Vetter wrote: On Tue, Apr 28, 2020 at 04:02:57PM +0200, Christoph Hellwig wrote: On Tue, Apr 28, 2020 at 03:19:48PM +0200, Marek Szyprowski wrote: 1. introduce a dma_{map,sync,unmap}_sgtable() wrappers, which will use a proper sg_table entries and call respective DMA-mapping functions and adapt current code to it That sounds reasonable to me. Those could be pretty trivial wrappers. 2. rename nents and orig_nents to nr_pages, nr_dmas to clearly state which one refers to which part of the scatterlist; I'm open for other names for those entries nr_cpu_ents and nr_dma_ents might be better names, but it still would be a whole lot of churn for little gain. I think just good wrappers like suggested above might be more helpful. I guess long-term we could aim for both? I.e. roll out better wrappers first, once that's soaked through the tree, rename the last offenders. Yes, that's what I was thinking too - most of these uses are just passing them in and out of the DMA APIs, and thus would be subsumed into the wrappers anyway, then in the relatively few remaining places where the table is actually iterated for one reason or the other, renaming would stand to help review and maintenance in terms of making it far more obvious when the implementation and the intent don't match. Robin.
Re: [RFC 00/17] DRM: fix struct sg_table nents vs. orig_nents misuse
On Tue, Apr 28, 2020 at 04:02:57PM +0200, Christoph Hellwig wrote: > On Tue, Apr 28, 2020 at 03:19:48PM +0200, Marek Szyprowski wrote: > > 1. introduce a dma_{map,sync,unmap}_sgtable() wrappers, which will use > >a proper sg_table entries and call respective DMA-mapping functions > >and adapt current code to it > > That sounds reasonable to me. Those could be pretty trivial wrappers. > > > > > > > 2. rename nents and orig_nents to nr_pages, nr_dmas to clearly state > >which one refers to which part of the scatterlist; I'm open for > >other names for those entries > > nr_cpu_ents and nr_dma_ents might be better names, but it still would be > a whole lot of churn for little gain. I think just good wrappers like > suggested above might be more helpful. I guess long-term we could aim for both? I.e. roll out better wrappers first, once that's soaked through the tree, rename the last offenders. Personally I like nr_cpu_ents and nr_dma_ents, that's about as clear as it gets. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [RFC 10/17] drm: radeon: fix sg_table nents vs. orig_nents misuse
Am 28.04.20 um 15:19 schrieb Marek Szyprowski: The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the numer of the created entries in the DMA address space. However the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents in turn holds the result of the dma_map_sg call as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski Reviewed-by: Christian König --- drivers/gpu/drm/radeon/radeon_ttm.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 5d50c9e..4770880 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -481,7 +481,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm) { struct radeon_device *rdev = radeon_get_rdev(ttm->bdev); struct radeon_ttm_tt *gtt = (void *)ttm; - unsigned pinned = 0, nents; + unsigned pinned = 0; int r; int write = !(gtt->userflags & RADEON_GEM_USERPTR_READONLY); @@ -522,8 +522,8 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm) goto release_sg; r = -ENOMEM; - nents = dma_map_sg(rdev->dev, ttm->sg->sgl, ttm->sg->nents, direction); - if (nents == 0) + ttm->sg->nents = dma_map_sg(rdev->dev, ttm->sg->sgl, + if (ttm->sg->nents == 0) goto release_sg; drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages, @@ -554,9 +554,9 @@ static void radeon_ttm_tt_unpin_userptr(struct ttm_tt *ttm) return; /* free the sg table and pages again */ - dma_unmap_sg(rdev->dev, ttm->sg->sgl, ttm->sg->nents, direction); + dma_unmap_sg(rdev->dev, ttm->sg->sgl, ttm->sg->orig_nents, direction); - for_each_sg_page(ttm->sg->sgl, &sg_iter, ttm->sg->nents, 0) { + for_each_sg_page(ttm->sg->sgl, &sg_iter, ttm->sg->orig_nents, 0) { struct page *page = sg_page_iter_page(&sg_iter); if (!(gtt->userflags & RADEON_GEM_USERPTR_READONLY)) 
set_page_dirty(page);
Re: [RFC 02/17] drm: amdgpu: fix sg_table nents vs. orig_nents misuse
Am 28.04.20 um 15:19 schrieb Marek Szyprowski: The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the numer of the created entries in the DMA address space. However the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents in turn holds the result of the dma_map_sg call as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 7 --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 8 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c index 43d8ed7..4df813e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c @@ -307,8 +307,9 @@ static struct sg_table *amdgpu_dma_buf_map(struct dma_buf_attachment *attach, if (IS_ERR(sgt)) return sgt; - if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir, - DMA_ATTR_SKIP_CPU_SYNC)) + sgt->nents = dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->orig_nents, + dir, DMA_ATTR_SKIP_CPU_SYNC); + if (!sgt->nents) goto error_free; break; @@ -349,7 +350,7 @@ static void amdgpu_dma_buf_unmap(struct dma_buf_attachment *attach, struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); if (sgt->sgl->page_link) { - dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir); + dma_unmap_sg(attach->dev, sgt->sgl, sgt->orig_nents, dir); sg_free_table(sgt); kfree(sgt); } else { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index d5543c2..5f31585 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -1043,7 +1043,6 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm) { struct amdgpu_device *adev = amdgpu_ttm_adev(ttm->bdev); struct amdgpu_ttm_tt *gtt = (void 
*)ttm; - unsigned nents; int r; int write = !(gtt->userflags & AMDGPU_GEM_USERPTR_READONLY); @@ -1059,8 +1058,9 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm) /* Map SG to device */ r = -ENOMEM; - nents = dma_map_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, direction); - if (nents == 0) + ttm->sg->nents = dma_map_sg(adev->dev, ttm->sg->sgl, + ttm->sg->orig_nents, direction); + if (ttm->sg->nents == 0) goto release_sg; /* convert SG to linear array of pages and dma addresses */ @@ -1091,7 +1091,7 @@ static void amdgpu_ttm_tt_unpin_userptr(struct ttm_tt *ttm) return; /* unmap the pages mapped to the device */ - dma_unmap_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, direction); + dma_unmap_sg(adev->dev, ttm->sg->sgl, ttm->sg->orig_nents, direction); sg_free_table(ttm->sg);
Re: [Intel-gfx] [RFC 06/17] drm: i915: fix sg_table nents vs. orig_nents misuse
On 28/04/2020 14:19, Marek Szyprowski wrote: The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the numer of the created entries in the DMA address space. However the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents in turn holds the result of the dma_map_sg call as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski --- drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 13 +++-- drivers/gpu/drm/i915/gem/i915_gem_internal.c | 4 ++-- drivers/gpu/drm/i915/gem/i915_gem_region.c | 4 ++-- drivers/gpu/drm/i915/gem/i915_gem_shmem.c| 5 +++-- drivers/gpu/drm/i915/gem/selftests/huge_pages.c | 10 +- drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c | 5 +++-- drivers/gpu/drm/i915/gt/intel_ggtt.c | 12 ++-- drivers/gpu/drm/i915/i915_gem_gtt.c | 12 +++- drivers/gpu/drm/i915/i915_scatterlist.c | 4 ++-- drivers/gpu/drm/i915/selftests/scatterlist.c | 8 10 files changed, 41 insertions(+), 36 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c index 7db5a79..d829852 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c @@ -36,21 +36,22 @@ static struct sg_table *i915_gem_map_dma_buf(struct dma_buf_attachment *attachme goto err_unpin_pages; } - ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL); + ret = sg_alloc_table(st, obj->mm.pages->orig_nents, GFP_KERNEL); if (ret) goto err_free; src = obj->mm.pages->sgl; dst = st->sgl; - for (i = 0; i < obj->mm.pages->nents; i++) { + for (i = 0; i < obj->mm.pages->orig_nents; i++) { sg_set_page(dst, sg_page(src), src->length, 0); dst = sg_next(dst); src = sg_next(src); } - if (!dma_map_sg_attrs(attachment->dev, - st->sgl, st->nents, dir, - DMA_ATTR_SKIP_CPU_SYNC)) { + st->nents = dma_map_sg_attrs(attachment->dev, +st->sgl, st->orig_nents, dir, 
+DMA_ATTR_SKIP_CPU_SYNC); + if (!st->nents) { ret = -ENOMEM; goto err_free_sg; } @@ -74,7 +75,7 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment *attachment, struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf); dma_unmap_sg_attrs(attachment->dev, - sg->sgl, sg->nents, dir, + sg->sgl, sg->orig_nents, dir, DMA_ATTR_SKIP_CPU_SYNC); sg_free_table(sg); kfree(sg); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c index cbbff81..a8ebfdd 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c @@ -73,7 +73,7 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj) } sg = st->sgl; - st->nents = 0; + st->nents = st->orig_nents = 0; sg_page_sizes = 0; do { @@ -94,7 +94,7 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj) sg_set_page(sg, page, PAGE_SIZE << order, 0); sg_page_sizes |= PAGE_SIZE << order; - st->nents++; + st->nents = st->orig_nents = st->nents + 1; npages -= 1 << order; if (!npages) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c index 1515384..58ca560 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_region.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c @@ -53,7 +53,7 @@ GEM_BUG_ON(list_empty(blocks)); sg = st->sgl; - st->nents = 0; + st->nents = st->orig_nents = 0; sg_page_sizes = 0; prev_end = (resource_size_t)-1; @@ -78,7 +78,7 @@ sg->length = block_size; - st->nents++; + st->nents = st->orig_nents = st->nents + 1; } else { sg->length += block_size; sg_dma_len(sg) += block_size; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index 5d5d7ee..851a732 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -80,7 +80,7 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj) noreclaim |= __GFP_NORETRY | 
__GFP_NOWARN; sg = st->sgl; - st->nents = 0; + st->nents = st->orig_nents = 0; sg_page_sizes = 0; for (i = 0; i < page_count; i++) {
Re: [RFC 00/17] DRM: fix struct sg_table nents vs. orig_nents misuse
On Tue, Apr 28, 2020 at 03:19:48PM +0200, Marek Szyprowski wrote: > 1. introduce a dma_{map,sync,unmap}_sgtable() wrappers, which will use >a proper sg_table entries and call respective DMA-mapping functions >and adapt current code to it That sounds reasonable to me. Those could be pretty trivial wrappers. > > > 2. rename nents and orig_nents to nr_pages, nr_dmas to clearly state >which one refers to which part of the scatterlist; I'm open for >other names for those entries nr_cpu_ents and nr_dma_ents might be better names, but it still would be a whole lot of churn for little gain. I think just good wrappers like suggested above might be more helpful.
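Idea 1 really can be a handful of lines. Below is a userspace sketch of what such wrappers might look like, against a mock sg_table (an assumed shape for illustration; real wrappers would take the device, direction, and attrs and live alongside dma_map_sg()): the map wrapper is the only place that ever writes nents, and the unmap wrapper always passes orig_nents, so drivers can no longer pick the wrong field.

```c
/* Mock of the two struct sg_table counters under discussion. */
struct mock_sg_table {
    int nents;       /* entries in the DMA address space (map result) */
    int orig_nents;  /* entries on the CPU side (set at allocation)   */
};

/* Stand-in for dma_map_sg(): pretend the IOMMU coalesces each pair of
 * adjacent entries, so the mapped count is smaller than the original. */
static int mock_dma_map_sg(int nents)
{
    return (nents + 1) / 2;
}

/* The proposed map wrapper: reads orig_nents, stores the result in
 * nents, and reports only success/failure to the caller. */
static int mock_dma_map_sgtable(struct mock_sg_table *sgt)
{
    int n = mock_dma_map_sg(sgt->orig_nents);

    if (n == 0)
        return -1;          /* -ENOMEM in the kernel */
    sgt->nents = n;
    return 0;
}

/* The matching unmap wrapper always uses the ORIGINAL entry count,
 * which is the count dma_unmap_sg() must be given. */
static int mock_dma_unmap_sgtable(const struct mock_sg_table *sgt)
{
    return sgt->orig_nents;
}

/* With the wrappers, the pattern the series fixes by hand in each
 * driver becomes impossible to get wrong. */
static int mock_demo(void)
{
    struct mock_sg_table sgt = { .nents = 0, .orig_nents = 4 };

    if (mock_dma_map_sgtable(&sgt))
        return 0;
    return sgt.nents == 2 && mock_dma_unmap_sgtable(&sgt) == 4;
}
```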
[RFC 08/17] drm: msm: fix sg_table nents vs. orig_nents misuse
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the numer of the created entries in the DMA address space. However the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents in turn holds the result of the dma_map_sg call as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski --- drivers/gpu/drm/msm/msm_gem.c | 8 drivers/gpu/drm/msm/msm_iommu.c | 3 ++- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c index 5a6a79f..54c3bbb 100644 --- a/drivers/gpu/drm/msm/msm_gem.c +++ b/drivers/gpu/drm/msm/msm_gem.c @@ -54,10 +54,10 @@ static void sync_for_device(struct msm_gem_object *msm_obj) if (get_dma_ops(dev) && IS_ENABLED(CONFIG_ARM64)) { dma_sync_sg_for_device(dev, msm_obj->sgt->sgl, - msm_obj->sgt->nents, DMA_BIDIRECTIONAL); + msm_obj->sgt->orig_nents, DMA_BIDIRECTIONAL); } else { dma_map_sg(dev, msm_obj->sgt->sgl, - msm_obj->sgt->nents, DMA_BIDIRECTIONAL); + msm_obj->sgt->orig_nents, DMA_BIDIRECTIONAL); } } @@ -67,10 +67,10 @@ static void sync_for_cpu(struct msm_gem_object *msm_obj) if (get_dma_ops(dev) && IS_ENABLED(CONFIG_ARM64)) { dma_sync_sg_for_cpu(dev, msm_obj->sgt->sgl, - msm_obj->sgt->nents, DMA_BIDIRECTIONAL); + msm_obj->sgt->orig_nents, DMA_BIDIRECTIONAL); } else { dma_unmap_sg(dev, msm_obj->sgt->sgl, - msm_obj->sgt->nents, DMA_BIDIRECTIONAL); + msm_obj->sgt->orig_nents, DMA_BIDIRECTIONAL); } } diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c index ad58cfe..b0ca084 100644 --- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -43,7 +43,8 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova, struct msm_iommu *iommu = to_msm_iommu(mmu); size_t ret; - ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot); + ret = 
iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->orig_nents, + prot); WARN_ON(!ret); return (ret == len) ? 0 : -EINVAL; -- 1.9.1
[RFC 00/17] DRM: fix struct sg_table nents vs. orig_nents misuse
Dear All, During the Exynos DRM GEM rework and fixing the issues in the drm_prime_sg_to_page_addr_arrays() function [1] I've noticed that most drivers in the DRM framework incorrectly use the nents and orig_nents entries of the struct sg_table. For most DMA-mapping implementations, exchanging those two entries or using nents for all loops on the scatterlist is harmless, because they both have the same value. There exist, however, DMA-mapping implementations for which such incorrect usage breaks things. The nents returned by dma_map_sg() might be lower than the nents passed as its parameter and this is perfectly fine. The DMA framework or IOMMU is allowed to join consecutive chunks while mapping if such an operation is supported by the underlying HW (bus, bridge, IOMMU, etc). An example of the case where dma_map_sg() might return 1 'DMA' chunk for the 4 'physical' pages is described here [2]. The DMA-mapping framework documentation [3] states that dma_map_sg() returns the number of the created entries in the DMA address space. However the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The common pattern in DRM drivers was to assign the dma_map_sg() return value to sg_table->nents and use that value for the subsequent calls to the dma_sync_sg_* or dma_unmap_sg functions. Also the code iterated nents times to access the pages stored in the processed scatterlist, while it should use orig_nents as the number of the page entries. I've tried to identify all such incorrect usage of sg_table->nents in the DRM subsystem and this is the result of my research. It looks like the incorrect pattern has been copied across many drivers. Too bad in most cases it even worked correctly if the system used a simple, linear DMA-mapping implementation, for which swapping nents and orig_nents doesn't make any difference.
I wonder what to do to avoid such misuse in the future, as this case clearly shows that the current sg_table structure is a bit hard to understand. I have the following ideas and I would like to know your opinion before I prepare more patches and check sg_table usage all over the kernel: 1. introduce a dma_{map,sync,unmap}_sgtable() wrappers, which will use a proper sg_table entries and call respective DMA-mapping functions and adapt current code to it 2. rename nents and orig_nents to nr_pages, nr_dmas to clearly state which one refers to which part of the scatterlist; I'm open for other names for those entries I will appreciate your comments. Patches are based on top of Linux next-20200428. I've reduced the recipients list mainly to mailing lists, the next version I will try to send to the maintainers of the respective drivers. Best regards, Marek Szyprowski References: [1] https://lkml.org/lkml/2020/3/27/555 [2] https://lkml.org/lkml/2020/3/29/65 [3] Documentation/DMA-API-HOWTO.txt Patch summary: Marek Szyprowski (17): drm: core: fix sg_table nents vs. orig_nents usage drm: amdgpu: fix sg_table nents vs. orig_nents usage drm: armada: fix sg_table nents vs. orig_nents usage drm: etnaviv: fix sg_table nents vs. orig_nents usage drm: exynos: fix sg_table nents vs. orig_nents usage drm: i915: fix sg_table nents vs. orig_nents usage drm: lima: fix sg_table nents vs. orig_nents usage drm: msm: fix sg_table nents vs. orig_nents usage drm: panfrost: fix sg_table nents vs. orig_nents usage drm: radeon: fix sg_table nents vs. orig_nents usage drm: rockchip: fix sg_table nents vs. orig_nents usage drm: tegra: fix sg_table nents vs. orig_nents usage drm: virtio: fix sg_table nents vs. orig_nents usage drm: vmwgfx: fix sg_table nents vs. orig_nents usage drm: xen: fix sg_table nents vs. orig_nents usage drm: host1x: fix sg_table nents vs. orig_nents usage dmabuf: fix sg_table nents vs. 
orig_nents usage drivers/dma-buf/heaps/heap-helpers.c | 7 --- drivers/dma-buf/udmabuf.c| 5 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 7 --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 8 drivers/gpu/drm/armada/armada_gem.c | 14 - drivers/gpu/drm/drm_cache.c | 2 +- drivers/gpu/drm/drm_gem_shmem_helper.c | 7 --- drivers/gpu/drm/drm_prime.c | 9 + drivers/gpu/drm/etnaviv/etnaviv_gem.c| 10 ++ drivers/gpu/drm/exynos/exynos_drm_g2d.c | 7 --- drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 13 ++-- drivers/gpu/drm/i915/gem/i915_gem_internal.c | 4 ++-- drivers/gpu/drm/i915/gem/i915_gem_region.c | 4 ++-- drivers/gpu/drm/i915/gem/i915_gem_shmem.c| 5 +++-- drivers/gpu/drm/i915/gem/selftests/huge_pages.c | 10 +- drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c | 5 +++-- drivers/gpu/drm/i915/gt/intel_gg
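The cover letter's 4-pages-to-1-chunk example is easy to make concrete. In the toy below (invented names, userspace, not kernel code) four physically adjacent one-page chunks are mapped behind an IOMMU that merges them; dma_map_sg() returns a single DMA entry, so a driver that then walks pages bounded by nents silently skips three of the four pages, while orig_nents visits them all.

```c
#define TOY_PAGES 4

/* Four physically contiguous one-page chunks on the CPU side. */
static const int toy_pages_per_ent[TOY_PAGES] = { 1, 1, 1, 1 };
static const int toy_orig_nents = TOY_PAGES;

/* Stand-in for dma_map_sg() behind an IOMMU that merges contiguous
 * chunks: all four pages collapse into one DMA segment, so the
 * returned count is 1, not 4 -- and that is perfectly legal. */
static int toy_map_sg(void)
{
    return 1;
}

/* How many pages a for_each_sg_page()-style loop visits when bounded
 * by a given entry count. */
static int toy_pages_visited(int nents_used)
{
    int i, pages = 0;

    for (i = 0; i < nents_used; i++)
        pages += toy_pages_per_ent[i];
    return pages;
}
```

Bounding the page walk by the mapped count (1) instead of the original count (4) is exactly the bug pattern the 17 patches fix, one driver at a time.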
[RFC 12/17] drm: tegra: fix sg_table nents vs. orig_nents misuse
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the numer of the created entries in the DMA address space. However the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents in turn holds the result of the dma_map_sg call as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski --- drivers/gpu/drm/tegra/gem.c | 25 + drivers/gpu/drm/tegra/plane.c | 13 +++-- 2 files changed, 20 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c index 6237681..5710ab4 100644 --- a/drivers/gpu/drm/tegra/gem.c +++ b/drivers/gpu/drm/tegra/gem.c @@ -98,8 +98,8 @@ static struct sg_table *tegra_bo_pin(struct device *dev, struct host1x_bo *bo, * the SG table needs to be copied to avoid overwriting any * other potential users of the original SG table. */ - err = sg_alloc_table_from_sg(sgt, obj->sgt->sgl, obj->sgt->nents, -GFP_KERNEL); + err = sg_alloc_table_from_sg(sgt, obj->sgt->sgl, +obj->sgt->orig_nents, GFP_KERNEL); if (err < 0) goto free; } else { @@ -197,7 +197,7 @@ static int tegra_bo_iommu_map(struct tegra_drm *tegra, struct tegra_bo *bo) bo->iova = bo->mm->start; bo->size = iommu_map_sg(tegra->domain, bo->iova, bo->sgt->sgl, - bo->sgt->nents, prot); + bo->sgt->orig_nents, prot); if (!bo->size) { dev_err(tegra->drm->dev, "failed to map buffer\n"); err = -ENOMEM; @@ -264,7 +264,7 @@ static struct tegra_bo *tegra_bo_alloc_object(struct drm_device *drm, static void tegra_bo_free(struct drm_device *drm, struct tegra_bo *bo) { if (bo->pages) { - dma_unmap_sg(drm->dev, bo->sgt->sgl, bo->sgt->nents, + dma_unmap_sg(drm->dev, bo->sgt->sgl, bo->sgt->orig_nents, DMA_FROM_DEVICE); drm_gem_put_pages(&bo->gem, bo->pages, true, true); sg_free_table(bo->sgt); @@ -290,9 +290,9 @@ static int tegra_bo_get_pages(struct drm_device *drm, struct tegra_bo *bo) goto put_pages; } - err = 
dma_map_sg(drm->dev, bo->sgt->sgl, bo->sgt->nents, -DMA_FROM_DEVICE); - if (err == 0) { + bo->sgt->nents = dma_map_sg(drm->dev, bo->sgt->sgl, bo->sgt->orig_nents, + DMA_FROM_DEVICE); + if (bo->sgt->nents == 0) { err = -EFAULT; goto free_sgt; } @@ -571,7 +571,8 @@ int tegra_drm_mmap(struct file *file, struct vm_area_struct *vma) goto free; } - if (dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir) == 0) + sgt->nents = dma_map_sg(attach->dev, sgt->sgl, sgt->orig_nents, dir); + if (sgt->nents == 0) goto free; return sgt; @@ -590,7 +591,7 @@ static void tegra_gem_prime_unmap_dma_buf(struct dma_buf_attachment *attach, struct tegra_bo *bo = to_tegra_bo(gem); if (bo->pages) - dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir); + dma_unmap_sg(attach->dev, sgt->sgl, sgt->orig_nents, dir); sg_free_table(sgt); kfree(sgt); @@ -609,7 +610,7 @@ static int tegra_gem_prime_begin_cpu_access(struct dma_buf *buf, struct drm_device *drm = gem->dev; if (bo->pages) - dma_sync_sg_for_cpu(drm->dev, bo->sgt->sgl, bo->sgt->nents, + dma_sync_sg_for_cpu(drm->dev, bo->sgt->sgl, bo->sgt->orig_nents, DMA_FROM_DEVICE); return 0; @@ -623,8 +624,8 @@ static int tegra_gem_prime_end_cpu_access(struct dma_buf *buf, struct drm_device *drm = gem->dev; if (bo->pages) - dma_sync_sg_for_device(drm->dev, bo->sgt->sgl, bo->sgt->nents, - DMA_TO_DEVICE); + dma_sync_sg_for_device(drm->dev, bo->sgt->sgl, + bo->sgt->orig_nents, DMA_TO_DEVICE); return 0; } diff --git a/drivers/gpu/drm/tegra/plane.c b/drivers/gpu/drm/tegra/plane.c index 9ccfb56..3982bf8 100644 --- a/drivers/gpu/drm/tegra/plane.c +++ b/drivers/gpu/drm/tegra/plane.c @@ -130,9 +130,10 @@ static int tegra_dc_pin(struct tegra_dc *dc, struct tegra_plane_state *state) } if (sgt) { - err = dma_map_sg(dc->dev, sgt->sgl, sgt->nents, -DMA_TO_DEVICE); - if (err == 0) { + sgt->nents = dma_map_sg(dc->dev, sgt->sgl, +
[RFC 17/17] dmabuf: fix sg_table nents vs. orig_nents misuse
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the numer of the created entries in the DMA address space. However the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents in turn holds the result of the dma_map_sg call as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski --- drivers/dma-buf/heaps/heap-helpers.c | 7 --- drivers/dma-buf/udmabuf.c| 5 +++-- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/dma-buf/heaps/heap-helpers.c b/drivers/dma-buf/heaps/heap-helpers.c index 9f964ca..b923863 100644 --- a/drivers/dma-buf/heaps/heap-helpers.c +++ b/drivers/dma-buf/heaps/heap-helpers.c @@ -144,8 +144,9 @@ struct sg_table *dma_heap_map_dma_buf(struct dma_buf_attachment *attachment, table = &a->table; - if (!dma_map_sg(attachment->dev, table->sgl, table->nents, - direction)) + table->nents = dma_map_sg(attachment->dev, table->sgl, + table->orig_nents, direction); + if (!table->nents) table = ERR_PTR(-ENOMEM); return table; } @@ -154,7 +155,7 @@ static void dma_heap_unmap_dma_buf(struct dma_buf_attachment *attachment, struct sg_table *table, enum dma_data_direction direction) { - dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction); + dma_unmap_sg(attachment->dev, table->sgl, table->orig_nents, direction); } static vm_fault_t dma_heap_vm_fault(struct vm_fault *vmf) diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c index acb26c6..ea0cf71 100644 --- a/drivers/dma-buf/udmabuf.c +++ b/drivers/dma-buf/udmabuf.c @@ -63,7 +63,8 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf, GFP_KERNEL); if (ret < 0) goto err; - if (!dma_map_sg(dev, sg->sgl, sg->nents, direction)) { + sg->nents = dma_map_sg(dev, sg->sgl, sg->orig_nents, direction); + if (!sg->nents) { ret = -EINVAL; goto err; } @@ -78,7 +79,7 @@ static struct 
sg_table *get_sg_table(struct device *dev, struct dma_buf *buf, static void put_sg_table(struct device *dev, struct sg_table *sg, enum dma_data_direction direction) { - dma_unmap_sg(dev, sg->sgl, sg->nents, direction); + dma_unmap_sg(dev, sg->sgl, sg->orig_nents, direction); sg_free_table(sg); kfree(sg); } -- 1.9.1
[RFC 03/17] drm: armada: fix sg_table nents vs. orig_nents misuse
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the number of the created entries in the DMA address space. However the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents in turn holds the result of the dma_map_sg call as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski --- drivers/gpu/drm/armada/armada_gem.c | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/armada/armada_gem.c b/drivers/gpu/drm/armada/armada_gem.c index 976685f..749647f 100644 --- a/drivers/gpu/drm/armada/armada_gem.c +++ b/drivers/gpu/drm/armada/armada_gem.c @@ -407,8 +407,10 @@ int armada_gem_pwrite_ioctl(struct drm_device *dev, void *data, sg_set_page(sg, page, PAGE_SIZE, 0); } - if (dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir) == 0) { - num = sgt->nents; + sgt->nents = dma_map_sg(attach->dev, sgt->sgl, sgt->orig_nents, + dir); + if (sgt->nents == 0) { + num = sgt->orig_nents; goto release; } } else if (dobj->page) { @@ -418,7 +420,9 @@ int armada_gem_pwrite_ioctl(struct drm_device *dev, void *data, sg_set_page(sgt->sgl, dobj->page, dobj->obj.size, 0); - if (dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir) == 0) + sgt->nents = dma_map_sg(attach->dev, sgt->sgl, sgt->orig_nents, + dir); + if (sgt->nents == 0) goto free_table; } else if (dobj->linear) { /* Single contiguous physical region - no struct page */ @@ -449,11 +453,11 @@ static void armada_gem_prime_unmap_dma_buf(struct dma_buf_attachment *attach, int i; if (!dobj->linear) - dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir); + dma_unmap_sg(attach->dev, sgt->sgl, sgt->orig_nents, dir); if (dobj->obj.filp) { struct scatterlist *sg; - for_each_sg(sgt->sgl, sg, sgt->nents, i) + for_each_sg(sgt->sgl, sg, sgt->orig_nents, i) put_page(sg_page(sg)); } -- 1.9.1
[RFC 13/17] drm: virtio: fix sg_table nents vs. orig_nents misuse
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the numer of the created entries in the DMA address space. However the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents in turn holds the result of the dma_map_sg call as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski --- drivers/gpu/drm/virtio/virtgpu_object.c | 11 ++- drivers/gpu/drm/virtio/virtgpu_vq.c | 8 2 files changed, 10 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c b/drivers/gpu/drm/virtio/virtgpu_object.c index 6ccbd01..12f6bee 100644 --- a/drivers/gpu/drm/virtio/virtgpu_object.c +++ b/drivers/gpu/drm/virtio/virtgpu_object.c @@ -73,7 +73,8 @@ void virtio_gpu_cleanup_object(struct virtio_gpu_object *bo) if (shmem->pages) { if (shmem->mapped) { dma_unmap_sg(vgdev->vdev->dev.parent, -shmem->pages->sgl, shmem->mapped, +shmem->pages->sgl, +shmem->pages->orig_nents, DMA_TO_DEVICE); shmem->mapped = 0; } @@ -157,13 +158,13 @@ static int virtio_gpu_object_shmem_init(struct virtio_gpu_device *vgdev, } if (use_dma_api) { - shmem->mapped = dma_map_sg(vgdev->vdev->dev.parent, + shmem->pages->nents = dma_map_sg(vgdev->vdev->dev.parent, shmem->pages->sgl, - shmem->pages->nents, + shmem->pages->orig_nents, DMA_TO_DEVICE); - *nents = shmem->mapped; + *nents = shmem->mapped = shmem->pages->nents; } else { - *nents = shmem->pages->nents; + *nents = shmem->pages->orig_nents; } *ents = kmalloc_array(*nents, sizeof(struct virtio_gpu_mem_entry), diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c index 9e663a5..661325b 100644 --- a/drivers/gpu/drm/virtio/virtgpu_vq.c +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c @@ -604,8 +604,8 @@ void virtio_gpu_cmd_transfer_to_host_2d(struct virtio_gpu_device *vgdev, if (use_dma_api) dma_sync_sg_for_device(vgdev->vdev->dev.parent, - 
shmem->pages->sgl, shmem->pages->nents, - DMA_TO_DEVICE); + shmem->pages->sgl, + shmem->pages->orig_nents, DMA_TO_DEVICE); cmd_p = virtio_gpu_alloc_cmd(vgdev, &vbuf, sizeof(*cmd_p)); memset(cmd_p, 0, sizeof(*cmd_p)); @@ -1020,8 +1020,8 @@ void virtio_gpu_cmd_transfer_to_host_3d(struct virtio_gpu_device *vgdev, if (use_dma_api) dma_sync_sg_for_device(vgdev->vdev->dev.parent, - shmem->pages->sgl, shmem->pages->nents, - DMA_TO_DEVICE); + shmem->pages->sgl, + shmem->pages->orig_nents, DMA_TO_DEVICE); cmd_p = virtio_gpu_alloc_cmd(vgdev, &vbuf, sizeof(*cmd_p)); memset(cmd_p, 0, sizeof(*cmd_p)); -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 14/17] drm: vmwgfx: fix sg_table nents vs. orig_nents misuse
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the numer of the created entries in the DMA address space. However the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents in turn holds the result of the dma_map_sg call as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski --- drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c index bf0bc46..a5fd128 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c @@ -362,7 +362,7 @@ static void vmw_ttm_unmap_from_dma(struct vmw_ttm_tt *vmw_tt) { struct device *dev = vmw_tt->dev_priv->dev->dev; - dma_unmap_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.nents, + dma_unmap_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.orig_nents, DMA_BIDIRECTIONAL); vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents; } @@ -449,10 +449,10 @@ static int vmw_ttm_map_dma(struct vmw_ttm_tt *vmw_tt) if (unlikely(ret != 0)) goto out_sg_alloc_fail; - if (vsgt->num_pages > vmw_tt->sgt.nents) { + if (vsgt->num_pages > vmw_tt->sgt.orig_nents) { uint64_t over_alloc = sgl_size * (vsgt->num_pages - - vmw_tt->sgt.nents); + vmw_tt->sgt.orig_nents); ttm_mem_global_free(glob, over_alloc); vmw_tt->sg_alloc_size -= over_alloc; -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 09/17] drm: panfrost: fix sg_table nents vs. orig_nents misuse
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the numer of the created entries in the DMA address space. However the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents in turn holds the result of the dma_map_sg call as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski --- drivers/gpu/drm/panfrost/panfrost_gem.c | 3 ++- drivers/gpu/drm/panfrost/panfrost_mmu.c | 4 +++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 17b654e..22fec7c 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -42,7 +42,8 @@ static void panfrost_gem_free_object(struct drm_gem_object *obj) for (i = 0; i < n_sgt; i++) { if (bo->sgts[i].sgl) { dma_unmap_sg(pfdev->dev, bo->sgts[i].sgl, -bo->sgts[i].nents, DMA_BIDIRECTIONAL); +bo->sgts[i].orig_nents, +DMA_BIDIRECTIONAL); sg_free_table(&bo->sgts[i]); } } diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index ed28aeb..2d9b1f9 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -517,7 +517,9 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, if (ret) goto err_pages; - if (!dma_map_sg(pfdev->dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL)) { + sgt->nents = dma_map_sg(pfdev->dev, sgt->sgl, sgt->orig_nents, + DMA_BIDIRECTIONAL); + if (!sgt->nents) { ret = -EINVAL; goto err_map; } -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 05/17] drm: exynos: fix sg_table nents vs. orig_nents misuse
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the number of entries created in the DMA address space. However, the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents field in turn holds the result of the dma_map_sg call, as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski --- drivers/gpu/drm/exynos/exynos_drm_g2d.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c index fcee33a..e27715c 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c @@ -396,7 +396,7 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d, out: dma_unmap_sg(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt->sgl, - g2d_userptr->sgt->nents, DMA_BIDIRECTIONAL); +g2d_userptr->sgt->orig_nents, DMA_BIDIRECTIONAL); pages = frame_vector_pages(g2d_userptr->vec); if (!IS_ERR(pages)) { @@ -511,8 +511,9 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d, g2d_userptr->sgt = sgt; - if (!dma_map_sg(to_dma_dev(g2d->drm_dev), sgt->sgl, sgt->nents, - DMA_BIDIRECTIONAL)) { + sgt->nents = dma_map_sg(to_dma_dev(g2d->drm_dev), sgt->sgl, + sgt->orig_nents, DMA_BIDIRECTIONAL); + if (!sgt->nents) { DRM_DEV_ERROR(g2d->dev, "failed to map sgt with dma region.\n"); ret = -ENOMEM; goto err_sg_free_table; -- 1.9.1
[RFC 04/17] drm: etnaviv: fix sg_table nents vs. orig_nents misuse
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the numer of the created entries in the DMA address space. However the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents in turn holds the result of the dma_map_sg call as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski --- drivers/gpu/drm/etnaviv/etnaviv_gem.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c index dc9ef30..a224a97 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c @@ -27,7 +27,8 @@ static void etnaviv_gem_scatter_map(struct etnaviv_gem_object *etnaviv_obj) * because display controller, GPU, etc. are not coherent. */ if (etnaviv_obj->flags & ETNA_BO_CACHE_MASK) - dma_map_sg(dev->dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL); + sgt->nents = dma_map_sg(dev->dev, sgt->sgl, sgt->orig_nents, + DMA_BIDIRECTIONAL); } static void etnaviv_gem_scatterlist_unmap(struct etnaviv_gem_object *etnaviv_obj) @@ -51,7 +52,8 @@ static void etnaviv_gem_scatterlist_unmap(struct etnaviv_gem_object *etnaviv_obj * discard those writes. 
*/ if (etnaviv_obj->flags & ETNA_BO_CACHE_MASK) - dma_unmap_sg(dev->dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL); + dma_unmap_sg(dev->dev, sgt->sgl, sgt->orig_nents, +DMA_BIDIRECTIONAL); } /* called with etnaviv_obj->lock held */ @@ -405,7 +407,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op, if (etnaviv_obj->flags & ETNA_BO_CACHED) { dma_sync_sg_for_cpu(dev->dev, etnaviv_obj->sgt->sgl, - etnaviv_obj->sgt->nents, + etnaviv_obj->sgt->orig_nents, etnaviv_op_to_dma_dir(op)); etnaviv_obj->last_cpu_prep_op = op; } @@ -422,7 +424,7 @@ int etnaviv_gem_cpu_fini(struct drm_gem_object *obj) /* fini without a prep is almost certainly a userspace error */ WARN_ON(etnaviv_obj->last_cpu_prep_op == 0); dma_sync_sg_for_device(dev->dev, etnaviv_obj->sgt->sgl, - etnaviv_obj->sgt->nents, + etnaviv_obj->sgt->orig_nents, etnaviv_op_to_dma_dir(etnaviv_obj->last_cpu_prep_op)); etnaviv_obj->last_cpu_prep_op = 0; } -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 15/17] drm: xen: fix sg_table nents vs. orig_nents misuse
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the number of entries created in the DMA address space. However, the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents field in turn holds the result of the dma_map_sg call, as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski --- drivers/gpu/drm/xen/xen_drm_front_gem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xen/xen_drm_front_gem.c b/drivers/gpu/drm/xen/xen_drm_front_gem.c index f0b85e0..ba4bdc5 100644 --- a/drivers/gpu/drm/xen/xen_drm_front_gem.c +++ b/drivers/gpu/drm/xen/xen_drm_front_gem.c @@ -215,7 +215,7 @@ struct drm_gem_object * return ERR_PTR(ret); DRM_DEBUG("Imported buffer of size %zu with nents %u\n", - size, sgt->nents); + size, sgt->orig_nents); return &xen_obj->base; } -- 1.9.1
[RFC 10/17] drm: radeon: fix sg_table nents vs. orig_nents misuse
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the number of entries created in the DMA address space. However, the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents field in turn holds the result of the dma_map_sg call, as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski --- drivers/gpu/drm/radeon/radeon_ttm.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 5d50c9e..4770880 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -481,7 +481,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm) { struct radeon_device *rdev = radeon_get_rdev(ttm->bdev); struct radeon_ttm_tt *gtt = (void *)ttm; - unsigned pinned = 0, nents; + unsigned pinned = 0; int r; int write = !(gtt->userflags & RADEON_GEM_USERPTR_READONLY); @@ -522,8 +522,8 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm) goto release_sg; r = -ENOMEM; - nents = dma_map_sg(rdev->dev, ttm->sg->sgl, ttm->sg->nents, direction); - if (nents == 0) + ttm->sg->nents = dma_map_sg(rdev->dev, ttm->sg->sgl, + ttm->sg->orig_nents, direction); + if (ttm->sg->nents == 0) goto release_sg; drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages, @@ -554,9 +554,9 @@ static void radeon_ttm_tt_unpin_userptr(struct ttm_tt *ttm) return; /* free the sg table and pages again */ - dma_unmap_sg(rdev->dev, ttm->sg->sgl, ttm->sg->nents, direction); + dma_unmap_sg(rdev->dev, ttm->sg->sgl, ttm->sg->orig_nents, direction); - for_each_sg_page(ttm->sg->sgl, &sg_iter, ttm->sg->nents, 0) { + for_each_sg_page(ttm->sg->sgl, &sg_iter, ttm->sg->orig_nents, 0) { struct page *page = sg_page_iter_page(&sg_iter); if (!(gtt->userflags & RADEON_GEM_USERPTR_READONLY)) set_page_dirty(page); -- 1.9.1
[RFC 16/17] drm: host1x: fix sg_table nents vs. orig_nents misuse
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the numer of the created entries in the DMA address space. However the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents in turn holds the result of the dma_map_sg call as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski --- drivers/gpu/host1x/job.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c index a10643a..3ea185e 100644 --- a/drivers/gpu/host1x/job.c +++ b/drivers/gpu/host1x/job.c @@ -166,8 +166,9 @@ static unsigned int pin_job(struct host1x *host, struct host1x_job *job) goto unpin; } - err = dma_map_sg(dev, sgt->sgl, sgt->nents, dir); - if (!err) { + sgt->nents = dma_map_sg(dev, sgt->sgl, sgt->orig_nents, + dir); + if (!sgt->nents) { err = -ENOMEM; goto unpin; } @@ -217,7 +218,7 @@ static unsigned int pin_job(struct host1x *host, struct host1x_job *job) } if (!IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL) && host->domain) { - for_each_sg(sgt->sgl, sg, sgt->nents, j) + for_each_sg(sgt->sgl, sg, sgt->orig_nents, j) gather_size += sg->length; gather_size = iova_align(&host->iova, gather_size); @@ -231,7 +232,7 @@ static unsigned int pin_job(struct host1x *host, struct host1x_job *job) err = iommu_map_sg(host->domain, iova_dma_addr(&host->iova, alloc), - sgt->sgl, sgt->nents, IOMMU_READ); + sgt->sgl, sgt->orig_nents, IOMMU_READ); if (err == 0) { __free_iova(&host->iova, alloc); err = -EINVAL; @@ -241,9 +242,9 @@ static unsigned int pin_job(struct host1x *host, struct host1x_job *job) job->unpins[job->num_unpins].size = gather_size; phys_addr = iova_dma_addr(&host->iova, alloc); } else if (sgt) { - err = dma_map_sg(host->dev, sgt->sgl, sgt->nents, -DMA_TO_DEVICE); - if (!err) { + sgt->nents = dma_map_sg(host->dev, sgt->sgl, + sgt->orig_nents, DMA_TO_DEVICE); + if 
(!sgt->nents) { err = -ENOMEM; goto unpin; } @@ -647,7 +648,7 @@ void host1x_job_unpin(struct host1x_job *job) } if (unpin->dev && sgt) - dma_unmap_sg(unpin->dev, sgt->sgl, sgt->nents, + dma_unmap_sg(unpin->dev, sgt->sgl, sgt->orig_nents, unpin->dir); host1x_bo_unpin(dev, unpin->bo, sgt); -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 01/17] drm: core: fix sg_table nents vs. orig_nents misuse
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the numer of the created entries in the DMA address space. However the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents in turn holds the result of the dma_map_sg call as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski --- drivers/gpu/drm/drm_cache.c| 2 +- drivers/gpu/drm/drm_gem_shmem_helper.c | 7 --- drivers/gpu/drm/drm_prime.c| 9 + 3 files changed, 10 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c index 03e01b0..63bd497 100644 --- a/drivers/gpu/drm/drm_cache.c +++ b/drivers/gpu/drm/drm_cache.c @@ -127,7 +127,7 @@ static void drm_cache_flush_clflush(struct page *pages[], struct sg_page_iter sg_iter; mb(); /*CLFLUSH is ordered only by using memory barriers*/ - for_each_sg_page(st->sgl, &sg_iter, st->nents, 0) + for_each_sg_page(st->sgl, &sg_iter, st->orig_nents, 0) drm_clflush_page(sg_page_iter_page(&sg_iter)); mb(); /*Make sure that all cache line entry is flushed*/ diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c index df31e57..f47caa7 100644 --- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -118,7 +118,7 @@ void drm_gem_shmem_free_object(struct drm_gem_object *obj) } else { if (shmem->sgt) { dma_unmap_sg(obj->dev->dev, shmem->sgt->sgl, -shmem->sgt->nents, DMA_BIDIRECTIONAL); +shmem->sgt->orig_nents, DMA_BIDIRECTIONAL); sg_free_table(shmem->sgt); kfree(shmem->sgt); } @@ -396,7 +396,7 @@ void drm_gem_shmem_purge_locked(struct drm_gem_object *obj) WARN_ON(!drm_gem_shmem_is_purgeable(shmem)); dma_unmap_sg(obj->dev->dev, shmem->sgt->sgl, -shmem->sgt->nents, DMA_BIDIRECTIONAL); +shmem->sgt->orig_nents, DMA_BIDIRECTIONAL); sg_free_table(shmem->sgt); kfree(shmem->sgt); shmem->sgt = NULL; @@ -623,7 
+623,8 @@ struct sg_table *drm_gem_shmem_get_pages_sgt(struct drm_gem_object *obj) goto err_put_pages; } /* Map the pages for use by the h/w. */ - dma_map_sg(obj->dev->dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL); + sgt->nents = dma_map_sg(obj->dev->dev, sgt->sgl, sgt->orig_nents, + DMA_BIDIRECTIONAL); shmem->sgt = sgt; diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c index 282774e..f3e2d2a 100644 --- a/drivers/gpu/drm/drm_prime.c +++ b/drivers/gpu/drm/drm_prime.c @@ -626,8 +626,9 @@ struct sg_table *drm_gem_map_dma_buf(struct dma_buf_attachment *attach, else sgt = obj->dev->driver->gem_prime_get_sg_table(obj); - if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir, - DMA_ATTR_SKIP_CPU_SYNC)) { + sgt->nents = dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->orig_nents, + dir, DMA_ATTR_SKIP_CPU_SYNC); + if (!sgt->nents) { sg_free_table(sgt); kfree(sgt); sgt = ERR_PTR(-ENOMEM); @@ -652,7 +653,7 @@ void drm_gem_unmap_dma_buf(struct dma_buf_attachment *attach, if (!sgt) return; - dma_unmap_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir, + dma_unmap_sg_attrs(attach->dev, sgt->sgl, sgt->orig_nents, dir, DMA_ATTR_SKIP_CPU_SYNC); sg_free_table(sgt); kfree(sgt); @@ -975,7 +976,7 @@ int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages, */ page_index = 0; dma_index = 0; - for_each_sg(sgt->sgl, sg, sgt->nents, count) { + for_each_sg(sgt->sgl, sg, sgt->orig_nents, count) { page_len = sg->length; page = sg_page(sg); dma_len = sg_dma_len(sg); -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 02/17] drm: amdgpu: fix sg_table nents vs. orig_nents misuse
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the numer of the created entries in the DMA address space. However the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents in turn holds the result of the dma_map_sg call as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski --- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 7 --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 8 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c index 43d8ed7..4df813e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c @@ -307,8 +307,9 @@ static struct sg_table *amdgpu_dma_buf_map(struct dma_buf_attachment *attach, if (IS_ERR(sgt)) return sgt; - if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir, - DMA_ATTR_SKIP_CPU_SYNC)) + sgt->nents = dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->orig_nents, + dir, DMA_ATTR_SKIP_CPU_SYNC); + if (!sgt->nents) goto error_free; break; @@ -349,7 +350,7 @@ static void amdgpu_dma_buf_unmap(struct dma_buf_attachment *attach, struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); if (sgt->sgl->page_link) { - dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir); + dma_unmap_sg(attach->dev, sgt->sgl, sgt->orig_nents, dir); sg_free_table(sgt); kfree(sgt); } else { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index d5543c2..5f31585 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -1043,7 +1043,6 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm) { struct amdgpu_device *adev = amdgpu_ttm_adev(ttm->bdev); struct amdgpu_ttm_tt *gtt = (void *)ttm; - unsigned nents; int r; int write = !(gtt->userflags & 
AMDGPU_GEM_USERPTR_READONLY); @@ -1059,8 +1058,9 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm) /* Map SG to device */ r = -ENOMEM; - nents = dma_map_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, direction); - if (nents == 0) + ttm->sg->nents = dma_map_sg(adev->dev, ttm->sg->sgl, + ttm->sg->orig_nents, direction); + if (ttm->sg->nents == 0) goto release_sg; /* convert SG to linear array of pages and dma addresses */ @@ -1091,7 +1091,7 @@ static void amdgpu_ttm_tt_unpin_userptr(struct ttm_tt *ttm) return; /* unmap the pages mapped to the device */ - dma_unmap_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, direction); + dma_unmap_sg(adev->dev, ttm->sg->sgl, ttm->sg->orig_nents, direction); sg_free_table(ttm->sg); -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 07/17] drm: lima: fix sg_table nents vs. orig_nents misuse
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the number of entries created in the DMA address space. However, the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents field in turn holds the result of the dma_map_sg call, as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski --- drivers/gpu/drm/lima/lima_gem.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c index 5404e0d..3edd2ff 100644 --- a/drivers/gpu/drm/lima/lima_gem.c +++ b/drivers/gpu/drm/lima/lima_gem.c @@ -70,7 +70,7 @@ int lima_heap_alloc(struct lima_bo *bo, struct lima_vm *vm) if (bo->base.sgt) { dma_unmap_sg(dev, bo->base.sgt->sgl, -bo->base.sgt->nents, DMA_BIDIRECTIONAL); +bo->base.sgt->orig_nents, DMA_BIDIRECTIONAL); sg_free_table(bo->base.sgt); } else { bo->base.sgt = kmalloc(sizeof(*bo->base.sgt), GFP_KERNEL); @@ -80,7 +80,7 @@ int lima_heap_alloc(struct lima_bo *bo, struct lima_vm *vm) } } - dma_map_sg(dev, sgt.sgl, sgt.nents, DMA_BIDIRECTIONAL); + sgt.nents = dma_map_sg(dev, sgt.sgl, sgt.orig_nents, DMA_BIDIRECTIONAL); *bo->base.sgt = sgt; -- 1.9.1
[RFC 11/17] drm: rockchip: fix sg_table nents vs. orig_nents misuse
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the numer of the created entries in the DMA address space. However the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents in turn holds the result of the dma_map_sg call as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski --- drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c index 0d18846..a024c71 100644 --- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c +++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c @@ -37,7 +37,7 @@ static int rockchip_gem_iommu_map(struct rockchip_gem_object *rk_obj) rk_obj->dma_addr = rk_obj->mm.start; ret = iommu_map_sg(private->domain, rk_obj->dma_addr, rk_obj->sgt->sgl, - rk_obj->sgt->nents, prot); + rk_obj->sgt->orig_nents, prot); if (ret < rk_obj->base.size) { DRM_ERROR("failed to map buffer: size=%zd request_size=%zd\n", ret, rk_obj->base.size); @@ -98,11 +98,11 @@ static int rockchip_gem_get_pages(struct rockchip_gem_object *rk_obj) * TODO: Replace this by drm_clflush_sg() once it can be implemented * without relying on symbols that are not exported. 
*/ - for_each_sg(rk_obj->sgt->sgl, s, rk_obj->sgt->nents, i) + for_each_sg(rk_obj->sgt->sgl, s, rk_obj->sgt->orig_nents, i) sg_dma_address(s) = sg_phys(s); - dma_sync_sg_for_device(drm->dev, rk_obj->sgt->sgl, rk_obj->sgt->nents, - DMA_TO_DEVICE); + dma_sync_sg_for_device(drm->dev, rk_obj->sgt->sgl, + rk_obj->sgt->orig_nents, DMA_TO_DEVICE); return 0; @@ -351,7 +351,8 @@ void rockchip_gem_free_object(struct drm_gem_object *obj) rockchip_gem_iommu_unmap(rk_obj); } else { dma_unmap_sg(drm->dev, rk_obj->sgt->sgl, -rk_obj->sgt->nents, DMA_BIDIRECTIONAL); +rk_obj->sgt->orig_nents, +DMA_BIDIRECTIONAL); } drm_prime_gem_destroy(obj, rk_obj->sgt); } else { @@ -493,14 +494,14 @@ static unsigned long rockchip_sg_get_contiguous_size(struct sg_table *sgt, struct sg_table *sg, struct rockchip_gem_object *rk_obj) { - int count = dma_map_sg(drm->dev, sg->sgl, sg->nents, + int count = dma_map_sg(drm->dev, sg->sgl, sg->orig_nents, DMA_BIDIRECTIONAL); if (!count) return -EINVAL; if (rockchip_sg_get_contiguous_size(sg, count) < attach->dmabuf->size) { DRM_ERROR("failed to map sg_table to contiguous linear address.\n"); - dma_unmap_sg(drm->dev, sg->sgl, sg->nents, + dma_unmap_sg(drm->dev, sg->sgl, sg->orig_nents, DMA_BIDIRECTIONAL); return -EINVAL; } -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 06/17] drm: i915: fix sg_table nents vs. orig_nents misuse
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the numer of the created entries in the DMA address space. However the subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be called with the original number of entries passed to dma_map_sg. The sg_table->nents in turn holds the result of the dma_map_sg call as stated in include/linux/scatterlist.h. Adapt the code to obey those rules. Signed-off-by: Marek Szyprowski --- drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 13 +++-- drivers/gpu/drm/i915/gem/i915_gem_internal.c | 4 ++-- drivers/gpu/drm/i915/gem/i915_gem_region.c | 4 ++-- drivers/gpu/drm/i915/gem/i915_gem_shmem.c| 5 +++-- drivers/gpu/drm/i915/gem/selftests/huge_pages.c | 10 +- drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c | 5 +++-- drivers/gpu/drm/i915/gt/intel_ggtt.c | 12 ++-- drivers/gpu/drm/i915/i915_gem_gtt.c | 12 +++- drivers/gpu/drm/i915/i915_scatterlist.c | 4 ++-- drivers/gpu/drm/i915/selftests/scatterlist.c | 8 10 files changed, 41 insertions(+), 36 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c index 7db5a79..d829852 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c @@ -36,21 +36,22 @@ static struct sg_table *i915_gem_map_dma_buf(struct dma_buf_attachment *attachme goto err_unpin_pages; } - ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL); + ret = sg_alloc_table(st, obj->mm.pages->orig_nents, GFP_KERNEL); if (ret) goto err_free; src = obj->mm.pages->sgl; dst = st->sgl; - for (i = 0; i < obj->mm.pages->nents; i++) { + for (i = 0; i < obj->mm.pages->orig_nents; i++) { sg_set_page(dst, sg_page(src), src->length, 0); dst = sg_next(dst); src = sg_next(src); } - if (!dma_map_sg_attrs(attachment->dev, - st->sgl, st->nents, dir, - DMA_ATTR_SKIP_CPU_SYNC)) { + st->nents = dma_map_sg_attrs(attachment->dev, +st->sgl, st->orig_nents, dir, +DMA_ATTR_SKIP_CPU_SYNC); + if (!st->nents) { ret = 
-ENOMEM; goto err_free_sg; } @@ -74,7 +75,7 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment *attachment, struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf); dma_unmap_sg_attrs(attachment->dev, - sg->sgl, sg->nents, dir, + sg->sgl, sg->orig_nents, dir, DMA_ATTR_SKIP_CPU_SYNC); sg_free_table(sg); kfree(sg); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c index cbbff81..a8ebfdd 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c @@ -73,7 +73,7 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj) } sg = st->sgl; - st->nents = 0; + st->nents = st->orig_nents = 0; sg_page_sizes = 0; do { @@ -94,7 +94,7 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj) sg_set_page(sg, page, PAGE_SIZE << order, 0); sg_page_sizes |= PAGE_SIZE << order; - st->nents++; + st->nents = st->orig_nents = st->nents + 1; npages -= 1 << order; if (!npages) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c b/drivers/gpu/drm/i915/gem/i915_gem_region.c index 1515384..58ca560 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_region.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c @@ -53,7 +53,7 @@ GEM_BUG_ON(list_empty(blocks)); sg = st->sgl; - st->nents = 0; + st->nents = st->orig_nents = 0; sg_page_sizes = 0; prev_end = (resource_size_t)-1; @@ -78,7 +78,7 @@ sg->length = block_size; - st->nents++; + st->nents = st->orig_nents = st->nents + 1; } else { sg->length += block_size; sg_dma_len(sg) += block_size; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index 5d5d7ee..851a732 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -80,7 +80,7 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj) noreclaim |= __GFP_NORETRY | __GFP_NOWARN; sg = st->sgl; - st->nents = 0; + st->nents = 
st->orig_nents = 0; sg_page_sizes = 0; for (i = 0; i <
[PATCH 5/5] virtio: Add bounce DMA ops
For better security, it is desirable that a guest VM's memory not be accessible to any entity that executes outside the context of the guest VM. In the case of virtio, backend drivers execute outside the context of the guest VM and in general need access to the complete guest VM memory. One option to restrict the access granted to a backend driver is to use a bounce buffer. The bounce buffer is accessible to both backend and frontend drivers; all IO buffers that live in the guest VM's private space are bounced through it to become accessible to the backend. This patch proposes a new memory pool for this bounce purpose, rather than the default swiotlb memory pool. That avoids conflicts that may arise when a VM needs the swiotlb pool both for driving pass-through devices (in which case the swiotlb memory need not be shared with another VM) and for virtio devices (which require the swiotlb memory to be shared with the backend VM). As a possible extension to this patch, an option could be provided for virtio to use the default swiotlb memory pool itself, in deployments where no such conflict exists.
Signed-off-by: Srivatsa Vaddagiri --- drivers/virtio/Makefile| 2 +- drivers/virtio/virtio.c| 2 + drivers/virtio/virtio_bounce.c | 150 + include/linux/virtio.h | 4 ++ 4 files changed, 157 insertions(+), 1 deletion(-) create mode 100644 drivers/virtio/virtio_bounce.c diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile index 29a1386e..3fd3515 100644 --- a/drivers/virtio/Makefile +++ b/drivers/virtio/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 -obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o +obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o virtio_bounce.o obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index a977e32..bc2f779 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -329,6 +329,7 @@ int register_virtio_device(struct virtio_device *dev) dev->index = err; dev_set_name(&dev->dev, "virtio%u", dev->index); + virtio_bounce_set_dma_ops(dev); spin_lock_init(&dev->config_lock); dev->config_enabled = false; @@ -431,6 +432,7 @@ EXPORT_SYMBOL_GPL(virtio_device_restore); static int virtio_init(void) { + virtio_map_bounce_buffer(); if (bus_register(&virtio_bus) != 0) panic("virtio bus registration failed"); return 0; diff --git a/drivers/virtio/virtio_bounce.c b/drivers/virtio/virtio_bounce.c new file mode 100644 index 000..3de8e0e --- /dev/null +++ b/drivers/virtio/virtio_bounce.c @@ -0,0 +1,150 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Virtio DMA ops to bounce buffers + * + * Copyright (c) 2020, The Linux Foundation. All rights reserved. + * + * This module allows bouncing of IO buffers to a region which will be + * accessible to backend drivers. 
+ */ + +#include +#include +#include +#include +#include +#include +#include + +static phys_addr_t bounce_buf_paddr; +static void *bounce_buf_vaddr; +static size_t bounce_buf_size; +struct swiotlb_pool *virtio_pool; + +#define VIRTIO_MAX_BOUNCE_SIZE (16*4096) + +static void *virtio_alloc_coherent(struct device *dev, size_t size, + dma_addr_t *dma_handle, gfp_t gfp_flags, unsigned long attrs) +{ + phys_addr_t addr; + + if (!virtio_pool) + return NULL; + + addr = swiotlb_alloc(virtio_pool, size, bounce_buf_paddr, ULONG_MAX); + if (addr == DMA_MAPPING_ERROR) + return NULL; + + *dma_handle = (addr - bounce_buf_paddr); + + return bounce_buf_vaddr + (addr - bounce_buf_paddr); +} + +static void virtio_free_coherent(struct device *dev, size_t size, void *vaddr, + dma_addr_t dma_handle, unsigned long attrs) +{ + phys_addr_t addr = (dma_handle + bounce_buf_paddr); + + swiotlb_free(virtio_pool, addr, size); +} + +static dma_addr_t virtio_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, + enum dma_data_direction dir, unsigned long attrs) +{ + void *ptr = page_address(page) + offset; + phys_addr_t paddr = virt_to_phys(ptr); + dma_addr_t handle; + + if (!virtio_pool) + return DMA_MAPPING_ERROR; + + handle = _swiotlb_tbl_map_single(virtio_pool, dev, bounce_buf_paddr, +paddr, size, size, dir, attrs); + if (handle == (phys_addr_t)DMA_MAPPING_ERROR) + return DMA_MAPPING_ERROR; + + return handle - bounce_buf_paddr; +} + +static void virtio_unmap_page(struct device *dev, dma_addr_t dev_addr, + size_t size, enum dma_data_direction dir, unsigned long attrs) +{ + phys_addr_t addr = dev_addr + bounce_buf_pad
[PATCH 3/5] swiotlb: Add alloc and free APIs
Move the memory allocation and free portion of swiotlb driver into independent routines. They will be useful for drivers that need swiotlb driver to just allocate/free memory chunks and not additionally bounce memory. Signed-off-by: Srivatsa Vaddagiri --- include/linux/swiotlb.h | 17 ++ kernel/dma/swiotlb.c| 151 2 files changed, 106 insertions(+), 62 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index c634b4d..957697e 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -186,6 +186,10 @@ void __init swiotlb_exit(void); unsigned int swiotlb_max_segment(void); size_t swiotlb_max_mapping_size(struct device *dev); bool is_swiotlb_active(void); +extern phys_addr_t swiotlb_alloc(struct swiotlb_pool *pool, size_t alloc_size, + unsigned long tbl_dma_addr, unsigned long mask); +extern void swiotlb_free(struct swiotlb_pool *pool, + phys_addr_t tlb_addr, size_t alloc_size); #else #define swiotlb_force SWIOTLB_NO_FORCE @@ -219,6 +223,19 @@ static inline bool is_swiotlb_active(void) { return false; } + +static inline phys_addr_t swiotlb_alloc(struct swiotlb_pool *pool, + size_t alloc_size, unsigned long tbl_dma_addr, + unsigned long mask) +{ + return DMA_MAPPING_ERROR; +} + +static inline void swiotlb_free(struct swiotlb_pool *pool, + phys_addr_t tlb_addr, size_t alloc_size) +{ +} + #endif /* CONFIG_SWIOTLB */ extern void swiotlb_print_info(void); diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 8cf0b57..7411ce5 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -444,37 +444,14 @@ static inline void *tlb_vaddr(struct swiotlb_pool *pool, phys_addr_t tlb_addr) return pool->io_tlb_vstart + (tlb_addr - pool->io_tlb_start); } -phys_addr_t _swiotlb_tbl_map_single(struct swiotlb_pool *pool, - struct device *hwdev, - dma_addr_t tbl_dma_addr, - phys_addr_t orig_addr, - size_t mapping_size, - size_t alloc_size, - enum dma_data_direction dir, - unsigned long attrs) +phys_addr_t swiotlb_alloc(struct swiotlb_pool 
*pool, size_t alloc_size, + unsigned long tbl_dma_addr, unsigned long mask) { unsigned long flags; phys_addr_t tlb_addr; - unsigned int nslots, stride, index, wrap; - int i; - unsigned long mask; + unsigned int i, nslots, stride, index, wrap; unsigned long offset_slots; unsigned long max_slots; - unsigned long tmp_io_tlb_used; - - if (pool->no_iotlb_memory) - panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer"); - - if (mem_encrypt_active()) - pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n"); - - if (mapping_size > alloc_size) { - dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)", - mapping_size, alloc_size); - return (phys_addr_t)DMA_MAPPING_ERROR; - } - - mask = dma_get_seg_boundary(hwdev); tbl_dma_addr &= mask; @@ -555,54 +532,23 @@ phys_addr_t _swiotlb_tbl_map_single(struct swiotlb_pool *pool, } while (index != wrap); not_found: - tmp_io_tlb_used = pool->io_tlb_used; - spin_unlock_irqrestore(&pool->io_tlb_lock, flags); - if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit()) - dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n", -alloc_size, pool->io_tlb_nslabs, tmp_io_tlb_used); return (phys_addr_t)DMA_MAPPING_ERROR; + found: pool->io_tlb_used += nslots; spin_unlock_irqrestore(&pool->io_tlb_lock, flags); - /* -* Save away the mapping from the original address to the DMA address. -* This is needed when we sync the memory. Then we sync the buffer if -* needed. -*/ - for (i = 0; i < nslots; i++) - pool->io_tlb_orig_addr[index+i] = orig_addr + - (i << IO_TLB_SHIFT); - if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && - (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) - swiotlb_bounce(orig_addr, tlb_vaddr(pool, tlb_addr), - mapping_size, DMA_TO_DEVICE); - return tlb_addr; } -/* - * tlb_addr is the physical address of the bounce buffer to unmap. 
- */ -void _swiotlb_tbl_unmap_single(struct swiotlb_pool *pool, - struct device *hwdev, phys_addr_t tlb_addr, - size_t mapping_s
[PATCH 1/5] swiotlb: Introduce concept of swiotlb_pool
Currently the swiotlb driver manages a global pool of memory which acts as bounce buffers for memory that is not accessible to some devices. The core functions provided by this driver to allocate/free/bounce memory chunks will be more useful if this driver can manage more than one pool. An immediate application of such an extension to the swiotlb driver is to bounce virtio buffers between private and shared space of a VM. This patch introduces the concept of a swiotlb memory pool and reorganizes the current driver to work with the default global pool. There is no functional change introduced by this patch. Subsequent patches allow the swiotlb driver to work with more than one pool of memory. Signed-off-by: Srivatsa Vaddagiri --- drivers/xen/swiotlb-xen.c | 4 +- include/linux/swiotlb.h | 129 - kernel/dma/swiotlb.c | 359 +++--- 3 files changed, 307 insertions(+), 185 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index b6d2776..c2dc9c8 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -190,8 +190,8 @@ int __ref xen_swiotlb_init(int verbose, bool early) /* * IO TLB memory already allocated. Just use it. */ - if (io_tlb_start != 0) { - xen_io_tlb_start = phys_to_virt(io_tlb_start); + if (swiotlb_start()) { + xen_io_tlb_start = phys_to_virt(swiotlb_start()); goto end; } diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 046bb94..8c7843f 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -44,7 +44,59 @@ enum dma_sync_target { SYNC_FOR_DEVICE = 1, }; -extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, +#define MAX_POOL_NAME_SIZE 16 + +struct swiotlb_pool { + char name[MAX_POOL_NAME_SIZE]; + bool no_iotlb_memory; + int late_alloc; + + spinlock_t io_tlb_lock; + + /* +* Used to do a quick range check in swiotlb_tbl_unmap_single and +* swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated +* by this API.
+*/ + + phys_addr_t io_tlb_start, io_tlb_end; + + /* +* The number of IO TLB blocks (in groups of 64) between io_tlb_start +* and io_tlb_end. This is command line adjustable via +* setup_io_tlb_npages. +*/ + unsigned long io_tlb_nslabs; + + /* +* The number of used IO TLB block +*/ + unsigned long io_tlb_used; + + /* +* This is a free list describing the number of free entries available +* from each index +*/ + unsigned int *io_tlb_list; + unsigned int io_tlb_index; + + /* +* We need to save away the original address corresponding to a mapped +* entry for the sync operations. +*/ + phys_addr_t *io_tlb_orig_addr; + + /* +* Max segment that we can provide which (if pages are contingous) will +* not be bounced (unless SWIOTLB_FORCE is set). +*/ + unsigned int max_segment; +}; + +extern struct swiotlb_pool default_swiotlb_pool; + +extern phys_addr_t _swiotlb_tbl_map_single(struct swiotlb_pool *pool, + struct device *hwdev, dma_addr_t tbl_dma_addr, phys_addr_t phys, size_t mapping_size, @@ -52,28 +104,80 @@ extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, enum dma_data_direction dir, unsigned long attrs); -extern void swiotlb_tbl_unmap_single(struct device *hwdev, +extern void _swiotlb_tbl_unmap_single(struct swiotlb_pool *pool, +struct device *hwdev, phys_addr_t tlb_addr, size_t mapping_size, size_t alloc_size, enum dma_data_direction dir, unsigned long attrs); -extern void swiotlb_tbl_sync_single(struct device *hwdev, +extern void _swiotlb_tbl_sync_single(struct swiotlb_pool *pool, + struct device *hwdev, phys_addr_t tlb_addr, size_t size, enum dma_data_direction dir, enum dma_sync_target target); -dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys, - size_t size, enum dma_data_direction dir, unsigned long attrs); +dma_addr_t _swiotlb_map(struct swiotlb_pool *pool, struct device *dev, + phys_addr_t phys, size_t size, enum dma_data_direction dir, + unsigned long attrs); + +static inline phys_addr_t swiotlb_tbl_map_s
[PATCH 4/5] swiotlb: Add API to register new pool
This patch adds an interface for the swiotlb driver to recognize a new memory pool. Upon successful initialization of the pool, swiotlb returns a handle, which needs to be passed as an argument for any future operations on the pool (map/unmap/alloc/free). Signed-off-by: Srivatsa Vaddagiri --- include/linux/swiotlb.h | 9 kernel/dma/swiotlb.c| 60 + 2 files changed, 69 insertions(+) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 957697e..97ac82a 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -182,6 +182,9 @@ static inline bool is_swiotlb_buffer(phys_addr_t paddr) return paddr >= swiotlb_start() && paddr < swiotlb_end(); } +extern struct swiotlb_pool *swiotlb_register_pool(char *name, + phys_addr_t start, void *vstart, size_t size); + void __init swiotlb_exit(void); unsigned int swiotlb_max_segment(void); size_t swiotlb_max_mapping_size(struct device *dev); @@ -236,6 +239,12 @@ static inline void swiotlb_free(struct swiotlb_pool *pool, { } +static struct swiotlb_pool *swiotlb_register_pool(char *name, + phys_addr_t start, void *vstart, size_t size) +{ + return NULL; +} + #endif /* CONFIG_SWIOTLB */ extern void swiotlb_print_info(void); diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 7411ce5..9883744 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -36,6 +36,7 @@ #include #include #include +#include #ifdef CONFIG_DEBUG_FS #include #endif @@ -736,6 +737,65 @@ size_t swiotlb_max_mapping_size(struct device *dev) return ((size_t)1 << IO_TLB_SHIFT) * IO_TLB_SEGSIZE; } +struct swiotlb_pool *swiotlb_register_pool(char *name, phys_addr_t start, + void *vstart, size_t size) +{ + struct swiotlb_pool *pool; + unsigned long i, bytes; + unsigned long nslabs; + + nslabs = size >> IO_TLB_SHIFT; + if (!nslabs) + return ERR_PTR(-EINVAL); + + pool = kzalloc(sizeof(*pool), GFP_KERNEL); + if (!pool) + return ERR_PTR(-ENOMEM); + + bytes = nslabs << IO_TLB_SHIFT; + + strncpy(pool->name, name, sizeof(pool->name)); + 
spin_lock_init(&pool->io_tlb_lock); + pool->late_alloc = 1; + pool->io_tlb_start = start; + pool->io_tlb_end = start + bytes; + pool->io_tlb_vstart = vstart; + pool->io_tlb_nslabs = nslabs; + pool->max_segment = rounddown(bytes, PAGE_SIZE); + + /* +* Allocate and initialize the free list array. This array is used +* to find contiguous free memory regions of size up to IO_TLB_SEGSIZE +* between io_tlb_start and io_tlb_end. +*/ + pool->io_tlb_list = (unsigned int *)__get_free_pages(GFP_KERNEL, + get_order(pool->io_tlb_nslabs * sizeof(int))); + if (!pool->io_tlb_list) + goto cleanup; + + pool->io_tlb_orig_addr = (phys_addr_t *) + __get_free_pages(GFP_KERNEL, +get_order(pool->io_tlb_nslabs * + sizeof(phys_addr_t))); + if (!pool->io_tlb_orig_addr) + goto cleanup; + + for (i = 0; i < pool->io_tlb_nslabs; i++) { + pool->io_tlb_list[i] = IO_TLB_SEGSIZE - + OFFSET(i, IO_TLB_SEGSIZE); + pool->io_tlb_orig_addr[i] = INVALID_PHYS_ADDR; + } + + return pool; + +cleanup: + kfree(pool->io_tlb_list); + kfree(pool->io_tlb_orig_addr); + kfree(pool); + + return ERR_PTR(-ENOMEM); +} + bool is_swiotlb_active(void) { /* -- 2.7.4 -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/5] virtio on Type-1 hypervisor
We ran into several problems in using virtio for IO paravirtualization on a Type-1 hypervisor with these characteristics: * By default, all of a guest's memory is private to it (no other guest can access its memory). * One of the VMs is considered primary and has access to most IO devices. This is similar to the dom0 VM in the case of Xen. All other VMs are considered secondary VMs. * virtio-backend drivers for all secondary VMs need to be hosted in the primary VM. * Since a secondary VM's memory is not accessible to the primary VM, to make the virtio backend driver work, an additional piece of memory is instead provisioned by the hypervisor that is shared between the primary and secondary VMs. This shared memory can be used, for example, to host virtio-ring structures and also to bounce IO buffers of the secondary VM. * Message-queue and doorbell interfaces are available in the hypervisor to support inter-VM communication. The message-queue API (send/recv) allows one VM to send short messages to another VM. The doorbell interface allows a VM to inject an interrupt into another VM. * No support for MMIO transport, i.e. the hypervisor does not support trapping MMIO config space accesses by the front-end driver and having them handled in backend drivers. A few problem statements arise out of this: 1) How can we make use of the shared memory region effectively to make virtio work in this scenario? What is proposed in the patch series for this problem is a virtio bounce driver that recognizes a shared memory region (shared between VMs) and makes use of swiotlb driver interfaces to bounce IO buffers between private and shared space. Some changes are proposed to the swiotlb driver in this regard, which let us reuse swiotlb functions to work with the shared memory pool. The swiotlb driver can now recognize more than one pool of memory and extends its functions (allocate/free/bounce memory chunks) to each pool. 2) What transport mechanism works best in this case?
I realize that the ivshmem2-virtio proposal has discussed the possibility of using shared memory + doorbell as a virtio transport option. We can consider using that as a transport if it is acceptable upstream. Another option we considered was to modify virtio_mmio.c itself to use the message_queue send/recv hypercall interface (in place of readl/writel). That could be abstracted via an 'mmio_ops' structure providing a suitable implementation of readl/writel. Another option suggested by Christopher Dall is to unmap the config space from the kernel address space and, as part of the fault handler, use hypervisor-specific APIs to achieve config space IO. 3) Which virtio backend drivers to leverage? We realized there are multiple implementations of virtio backend drivers, bundled as part of separate projects (Qemu, lkvm etc). We think it would be nice if we had some consolidation in that regard so that we can make use of backend drivers that are not tightly coupled with a VMM. In our case, we need to be able to run virtio backend drivers as standalone programs (not coupled with any VMM). Srivatsa Vaddagiri (5): swiotlb: Introduce concept of swiotlb_pool swiotlb: Allow for non-linear mapping between paddr and vaddr swiotlb: Add alloc and free APIs swiotlb: Add API to register new pool virtio: Add bounce DMA ops drivers/virtio/Makefile| 2 +- drivers/virtio/virtio.c| 2 + drivers/virtio/virtio_bounce.c | 150 +++ drivers/xen/swiotlb-xen.c | 4 +- include/linux/swiotlb.h| 157 +++- include/linux/virtio.h | 4 + kernel/dma/swiotlb.c | 556 - 7 files changed, 638 insertions(+), 237 deletions(-) create mode 100644 drivers/virtio/virtio_bounce.c -- 2.7.4 -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 2/5] swiotlb: Allow for non-linear mapping between paddr and vaddr
Some of the memory pool managed by swiotlb driver could fall outside the direct-mapped range, made accessible via memremap() routine. To facilitate easy conversion between virtual and physical address of such memory, store the virtual address of memory pool in addition to its physical address. Signed-off-by: Srivatsa Vaddagiri --- include/linux/swiotlb.h | 2 ++ kernel/dma/swiotlb.c| 20 ++-- 2 files changed, 16 insertions(+), 6 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 8c7843f..c634b4d 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -61,6 +61,8 @@ struct swiotlb_pool { phys_addr_t io_tlb_start, io_tlb_end; + void *io_tlb_vstart; + /* * The number of IO TLB blocks (in groups of 64) between io_tlb_start * and io_tlb_end. This is command line adjustable via diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 9c504d3..8cf0b57 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -178,6 +178,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose) default_swiotlb_pool.io_tlb_start = __pa(tlb); default_swiotlb_pool.io_tlb_end = default_swiotlb_pool.io_tlb_start + bytes; + default_swiotlb_pool.io_tlb_vstart = tlb; /* * Allocate and initialize the free list array. 
This array is used @@ -307,6 +308,7 @@ static void swiotlb_cleanup(void) default_swiotlb_pool.io_tlb_start = 0; default_swiotlb_pool.io_tlb_nslabs = 0; default_swiotlb_pool.max_segment = 0; + default_swiotlb_pool.io_tlb_vstart = NULL; } int @@ -320,6 +322,7 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs) default_swiotlb_pool.io_tlb_start = virt_to_phys(tlb); default_swiotlb_pool.io_tlb_end = default_swiotlb_pool.io_tlb_start + bytes; + default_swiotlb_pool.io_tlb_vstart = tlb; set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT); memset(tlb, 0, bytes); @@ -400,11 +403,10 @@ void __init swiotlb_exit(void) /* * Bounce: copy the swiotlb buffer from or back to the original dma location */ -static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, +static void swiotlb_bounce(phys_addr_t orig_addr, void *vaddr, size_t size, enum dma_data_direction dir) { unsigned long pfn = PFN_DOWN(orig_addr); - unsigned char *vaddr = phys_to_virt(tlb_addr); if (PageHighMem(pfn_to_page(pfn))) { /* The buffer does not have a mapping. 
Map it in and copy */ @@ -437,6 +439,11 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, } } +static inline void *tlb_vaddr(struct swiotlb_pool *pool, phys_addr_t tlb_addr) +{ + return pool->io_tlb_vstart + (tlb_addr - pool->io_tlb_start); +} + phys_addr_t _swiotlb_tbl_map_single(struct swiotlb_pool *pool, struct device *hwdev, dma_addr_t tbl_dma_addr, @@ -569,7 +576,7 @@ phys_addr_t _swiotlb_tbl_map_single(struct swiotlb_pool *pool, (i << IO_TLB_SHIFT); if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) - swiotlb_bounce(orig_addr, tlb_addr, + swiotlb_bounce(orig_addr, tlb_vaddr(pool, tlb_addr), mapping_size, DMA_TO_DEVICE); return tlb_addr; @@ -594,7 +601,8 @@ void _swiotlb_tbl_unmap_single(struct swiotlb_pool *pool, if (orig_addr != INVALID_PHYS_ADDR && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) && ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL))) - swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_FROM_DEVICE); + swiotlb_bounce(orig_addr, tlb_vaddr(pool, tlb_addr), + mapping_size, DMA_FROM_DEVICE); /* * Return the buffer to the free list by setting the corresponding @@ -643,14 +651,14 @@ void _swiotlb_tbl_sync_single(struct swiotlb_pool *pool, switch (target) { case SYNC_FOR_CPU: if (likely(dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)) - swiotlb_bounce(orig_addr, tlb_addr, + swiotlb_bounce(orig_addr, tlb_vaddr(pool, tlb_addr), size, DMA_FROM_DEVICE); else BUG_ON(dir != DMA_TO_DEVICE); break; case SYNC_FOR_DEVICE: if (likely(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) - swiotlb_bounce(orig_addr, tlb_addr, + swiotlb_bounce(orig_addr, tlb_vaddr(pool, tlb_addr), size, DMA_TO_DEVICE);
Re: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region
On 28.04.20 10:25, Peng Fan wrote: Subject: Re: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region On 28.04.20 09:33, peng@nxp.com wrote: From: Peng Fan When booting xen on i.MX8QM, met: " [3.602128] Unable to handle kernel paging request at virtual address 00272d40 [3.610804] Mem abort info: [3.613905] ESR = 0x9604 [3.617332] EC = 0x25: DABT (current EL), IL = 32 bits [3.623211] SET = 0, FnV = 0 [3.626628] EA = 0, S1PTW = 0 [3.630128] Data abort info: [3.633362] ISV = 0, ISS = 0x0004 [3.637630] CM = 0, WnR = 0 [3.640955] [00272d40] user address but active_mm is swapper [3.647983] Internal error: Oops: 9604 [#1] PREEMPT SMP [3.654137] Modules linked in: [3.677285] Hardware name: Freescale i.MX8QM MEK (DT) [3.677302] Workqueue: events deferred_probe_work_func [3.684253] imx6q-pcie 5f00.pcie: PCI host bridge to bus :00 [3.688297] pstate: 6005 (nZCv daif -PAN -UAO) [3.688310] pc : xen_swiotlb_free_coherent+0x180/0x1c0 [3.693993] pci_bus :00: root bus resource [bus 00-ff] [3.701002] lr : xen_swiotlb_free_coherent+0x44/0x1c0 " In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask) or range_straddles_page_boundary(phys, size) are true, it will create contiguous region. So when free, we need to free contiguous region use upper check condition. No, this will break PV guests on x86. Could you share more details why alloc and free not matching for the check? xen_create_contiguous_region() is needed only in case: - the bus address is not within dma_mask, or - the memory region is not physically contiguous (can happen only for PV guests) In any case it should arrange for the memory to be suitable for the DMA operation, so to be contiguous and within dma_mask afterwards. So xen_destroy_contiguous_region() should only ever called for areas which match above criteria, as otherwise we can be sure xen_create_contiguous_region() was not used for making the area DMA-able in the beginning. 
And this is very important in the PV case, as in those guests the page tables are containing the host-PFNs, not the guest-PFNS, and xen_create_contiguous_region() will fiddle with host- vs. guest-PFN arrangements, and xen_destroy_contiguous_region() is reverting this fiddling. Any call of xen_destroy_contiguous_region() for an area it was not intended to be called for might swap physical pages beneath random virtual addresses, which was the reason for this test to be added by me. Juergen Thanks, Peng. I think there is something wrong with your setup in combination with the ARM xen_create_contiguous_region() implementation. Stefano? Juergen Signed-off-by: Peng Fan --- drivers/xen/swiotlb-xen.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index b6d27762c6f8..ab96e468584f 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr, /* Convert the size to actually allocated. */ size = 1UL << (order + XEN_PAGE_SHIFT); - if (!WARN_ON((dev_addr + size - 1 > dma_mask) || -range_straddles_page_boundary(phys, size)) && + if (((dev_addr + size - 1 > dma_mask) || + range_straddles_page_boundary(phys, size)) && TestClearPageXenRemapped(virt_to_page(vaddr))) xen_destroy_contiguous_region(phys, order); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region
From: Peng Fan When booting xen on i.MX8QM, we hit: " [3.602128] Unable to handle kernel paging request at virtual address 00272d40 [3.610804] Mem abort info: [3.613905] ESR = 0x9604 [3.617332] EC = 0x25: DABT (current EL), IL = 32 bits [3.623211] SET = 0, FnV = 0 [3.626628] EA = 0, S1PTW = 0 [3.630128] Data abort info: [3.633362] ISV = 0, ISS = 0x0004 [3.637630] CM = 0, WnR = 0 [3.640955] [00272d40] user address but active_mm is swapper [3.647983] Internal error: Oops: 9604 [#1] PREEMPT SMP [3.654137] Modules linked in: [3.677285] Hardware name: Freescale i.MX8QM MEK (DT) [3.677302] Workqueue: events deferred_probe_work_func [3.684253] imx6q-pcie 5f00.pcie: PCI host bridge to bus :00 [3.688297] pstate: 6005 (nZCv daif -PAN -UAO) [3.688310] pc : xen_swiotlb_free_coherent+0x180/0x1c0 [3.693993] pci_bus :00: root bus resource [bus 00-ff] [3.701002] lr : xen_swiotlb_free_coherent+0x44/0x1c0 " In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask) or range_straddles_page_boundary(phys, size) is true, it will create a contiguous region. So on free, we need to free the contiguous region using the same check condition. Signed-off-by: Peng Fan --- drivers/xen/swiotlb-xen.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index b6d27762c6f8..ab96e468584f 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr, /* Convert the size to actually allocated. */ size = 1UL << (order + XEN_PAGE_SHIFT); - if (!WARN_ON((dev_addr + size - 1 > dma_mask) || -range_straddles_page_boundary(phys, size)) && + if (((dev_addr + size - 1 > dma_mask) || + range_straddles_page_boundary(phys, size)) && TestClearPageXenRemapped(virt_to_page(vaddr))) xen_destroy_contiguous_region(phys, order); -- 2.16.4
RE: iommu_iova slab eats too much memory
Hi Bin, A few questions: 1. If there is a leak of IOVAs due to dma_unmap_* not being called somewhere, then at a certain point the throughput will drastically fall and almost become equal to zero, because the mappings are no longer available. But in your case the VM is getting killed, so this could be an actual DMA buffer leak, not a DMA mapping leak. I doubt the VM would get killed due to exhaustion of the DMA mappings in the IOMMU layer for a transient reason or even due to a mapping/unmapping leak. 2. Could you check if you have TSO offload enabled on the Intel 82599? It will help in reducing the number of mappings and will take IOVA mapping pressure off the IOMMU/VT-d, though I am not sure it will help in reducing the amount of memory required for the buffers. 3. Also, have you checked the CPU usage while your experiment is going on? Thanks Salil. > -Original Message- > From: iommu [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of > Robin Murphy > Sent: Friday, April 24, 2020 5:31 PM > To: Bin > Cc: iommu@lists.linux-foundation.org > Subject: Re: iommu_iova slab eats too much memory > > On 2020-04-24 2:20 pm, Bin wrote: > > Dear Robin: > > Thank you for your explanation. Now, I understand that this could be > > NIC driver's fault, but how could I confirm it? Do I have to debug the > > driver myself? > > I'd start with CONFIG_DMA_API_DEBUG - of course it will chew through > memory about an order of magnitude faster than the IOVAs alone, but it > should shed some light on whether DMA API usage looks suspicious, and > dumping the mappings should help track down the responsible driver(s). > Although the debugfs code doesn't show the stacktrace of where each > mapping was made, I guess it would be fairly simple to tweak that for a > quick way to narrow down where to start looking in an offending driver. > > Robin.
> > > Robin Murphy wrote on Fri, Apr 24, 2020, 8:15 PM: > > > >> On 2020-04-24 1:06 pm, Bin wrote: > >>> I'm not familiar with the mmu stuff, so what you mean by "some driver > >>> leaking DMA mappings", is it possible that some other kernel module like > >>> KVM or NIC driver leads to the leaking problem instead of the iommu > >> module > >>> itself? > >> > >> Yes - I doubt that intel-iommu itself is failing to free IOVAs when it > >> should, since I'd expect a lot of people to have noticed that. It's far > >> more likely that some driver is failing to call dma_unmap_* when it's > >> finished with a buffer - with the IOMMU disabled that would be a no-op > >> on x86 with a modern 64-bit-capable device, so such a latent bug could > >> have been easily overlooked. > >> > >> Robin. > >> > >>> Bin wrote on Fri, Apr 24, 2020, 20:00: > >>> > Well, that's the problem! I'm assuming the iommu kernel module is > >> leaking > memory. But I don't know why and how. > > Do you have any idea about it? Or any further information is needed? > > Robin Murphy wrote on Fri, Apr 24, 2020, 19:20: > > > On 2020-04-24 1:40 am, Bin wrote: > >> Hello? anyone there? > >> > >> Bin wrote on Thu, Apr 23, 2020, 5:14 PM: > >> > >>> Forgot to mention, I've already disabled the slab merge, so this is > > what > >>> it is. > >>> > >>> Bin wrote on Thu, Apr 23, 2020, 5:11 PM: > >>> > Hey, guys: > > I'm running a batch of CoreOS boxes, the lsb_release is: > > ``` > # cat /etc/lsb-release > DISTRIB_ID="Container Linux by CoreOS" > DISTRIB_RELEASE=2303.3.0 > DISTRIB_CODENAME="Rhyolite" > DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0 (Rhyolite)" > ``` > > ``` > # uname -a > Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00 > > 2019 > x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel > > GNU/Linux > ``` > Recently, I found my VMs constantly being killed due to OOM, and > >> after > digging into the problem, I finally realized that the kernel is > > leaking > memory.
> > Here's my slabinfo:
> >
> > Active / Total Objects (% used)    : 83818306 / 84191607 (99.6%)
> > Active / Total Slabs (% used)      : 1336293 / 1336293 (100.0%)
> > Active / Total Caches (% used)     : 152 / 217 (70.0%)
> > Active / Total Size (% used)       : 5828768.08K / 5996848.72K (97.2%)
> > Minimum / Average / Maximum Object : 0.01K / 0.07K / 23.25K
> >
> >     OBJS   ACTIVE  USE OBJ SIZE   SLABS OBJ/SLAB CACHE SIZE NAME
> > 80253888 80253888 100%    0.06K 1253967       64   5015868K iommu_iova
>
> Do you really have a peak demand of ~80 million simultaneous DMA
> buffers, or is some driver leaking DMA mappings?
>
> Robin.
>
> >   489472   489123  99%    0.03K    3824      128     15296K kmalloc-32
> >
RE: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region
> Subject: Re: [PATCH] xen/swiotlb: correct the check for
> xen_destroy_contiguous_region
>
> On 28.04.20 09:33, peng@nxp.com wrote:
> > From: Peng Fan
> >
> > When booting xen on i.MX8QM, we hit:
> > "
> > [3.602128] Unable to handle kernel paging request at virtual address 00272d40
> > [3.610804] Mem abort info:
> > [3.613905] ESR = 0x9604
> > [3.617332] EC = 0x25: DABT (current EL), IL = 32 bits
> > [3.623211] SET = 0, FnV = 0
> > [3.626628] EA = 0, S1PTW = 0
> > [3.630128] Data abort info:
> > [3.633362] ISV = 0, ISS = 0x0004
> > [3.637630] CM = 0, WnR = 0
> > [3.640955] [00272d40] user address but active_mm is swapper
> > [3.647983] Internal error: Oops: 9604 [#1] PREEMPT SMP
> > [3.654137] Modules linked in:
> > [3.677285] Hardware name: Freescale i.MX8QM MEK (DT)
> > [3.677302] Workqueue: events deferred_probe_work_func
> > [3.684253] imx6q-pcie 5f00.pcie: PCI host bridge to bus :00
> > [3.688297] pstate: 6005 (nZCv daif -PAN -UAO)
> > [3.688310] pc : xen_swiotlb_free_coherent+0x180/0x1c0
> > [3.693993] pci_bus :00: root bus resource [bus 00-ff]
> > [3.701002] lr : xen_swiotlb_free_coherent+0x44/0x1c0
> > "
> >
> > In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask)
> > or range_straddles_page_boundary(phys, size) is true, it will create a
> > contiguous region. So when freeing, we need to free the contiguous
> > region using the same check condition.
>
> No, this will break PV guests on x86.

Could you share more details on why the alloc and free checks do not need to match?

Thanks,
Peng.

> I think there is something wrong with your setup in combination with the ARM
> xen_create_contiguous_region() implementation.
>
> Stefano?
> Juergen

> > Signed-off-by: Peng Fan
> > ---
> >  drivers/xen/swiotlb-xen.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index b6d27762c6f8..ab96e468584f 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
> >  	/* Convert the size to actually allocated. */
> >  	size = 1UL << (order + XEN_PAGE_SHIFT);
> >
> > -	if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
> > -		     range_straddles_page_boundary(phys, size)) &&
> > +	if (((dev_addr + size - 1 > dma_mask) ||
> > +	     range_straddles_page_boundary(phys, size)) &&
> >  	    TestClearPageXenRemapped(virt_to_page(vaddr)))
> >  		xen_destroy_contiguous_region(phys, order);
> >

___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region
> Subject: Re: [PATCH] xen/swiotlb: correct the check for
> xen_destroy_contiguous_region
>
> On Tue, Apr 28, 2020 at 03:33:45PM +0800, peng@nxp.com wrote:
> >
> > In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask)
> > or range_straddles_page_boundary(phys, size) is true, it will create a
> > contiguous region. So when freeing, we need to free the contiguous
> > region using the same check condition.
> >
> > Signed-off-by: Peng Fan
> > ---
> >  drivers/xen/swiotlb-xen.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index b6d27762c6f8..ab96e468584f 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
> >  	/* Convert the size to actually allocated. */
> >  	size = 1UL << (order + XEN_PAGE_SHIFT);
> >
> > -	if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
> > -		     range_straddles_page_boundary(phys, size)) &&
> > +	if (((dev_addr + size - 1 > dma_mask) ||
> > +	     range_straddles_page_boundary(phys, size)) &&
> >  	    TestClearPageXenRemapped(virt_to_page(vaddr)))
>
> No need for the inner braces.
>
> But more importantly please factor out a helper that can be used by alloc and
> free to make sure that they always stay in sync. Something like:
>
> static inline bool xen_swiotlb_need_contiguous_region(struct device *dev,
> 		phys_addr_t phys, size_t size)
> {
> 	return xen_phys_to_bus(phys) + size - 1 > dev->coherent_dma_mask ||
> 		range_straddles_page_boundary(phys, size);
> }

Thanks for reviewing. I'll take your suggestion in v2. Before that, I would
wait to see if there are more comments on this patch, because there are
several history commits touching this place.

Thanks,
Peng.
Re: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region
On 28.04.20 09:33, peng@nxp.com wrote:

From: Peng Fan

When booting xen on i.MX8QM, we hit:
"
[3.602128] Unable to handle kernel paging request at virtual address 00272d40
[3.610804] Mem abort info:
[3.613905] ESR = 0x9604
[3.617332] EC = 0x25: DABT (current EL), IL = 32 bits
[3.623211] SET = 0, FnV = 0
[3.626628] EA = 0, S1PTW = 0
[3.630128] Data abort info:
[3.633362] ISV = 0, ISS = 0x0004
[3.637630] CM = 0, WnR = 0
[3.640955] [00272d40] user address but active_mm is swapper
[3.647983] Internal error: Oops: 9604 [#1] PREEMPT SMP
[3.654137] Modules linked in:
[3.677285] Hardware name: Freescale i.MX8QM MEK (DT)
[3.677302] Workqueue: events deferred_probe_work_func
[3.684253] imx6q-pcie 5f00.pcie: PCI host bridge to bus :00
[3.688297] pstate: 6005 (nZCv daif -PAN -UAO)
[3.688310] pc : xen_swiotlb_free_coherent+0x180/0x1c0
[3.693993] pci_bus :00: root bus resource [bus 00-ff]
[3.701002] lr : xen_swiotlb_free_coherent+0x44/0x1c0
"

In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask) or
range_straddles_page_boundary(phys, size) is true, it will create a
contiguous region. So when freeing, we need to free the contiguous region
using the same check condition.

No, this will break PV guests on x86.

I think there is something wrong with your setup in combination with the ARM
xen_create_contiguous_region() implementation.

Stefano?

Juergen

Signed-off-by: Peng Fan
---
 drivers/xen/swiotlb-xen.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index b6d27762c6f8..ab96e468584f 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
 	/* Convert the size to actually allocated.
*/
 	size = 1UL << (order + XEN_PAGE_SHIFT);

-	if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
-		     range_straddles_page_boundary(phys, size)) &&
+	if (((dev_addr + size - 1 > dma_mask) ||
+	     range_straddles_page_boundary(phys, size)) &&
 	    TestClearPageXenRemapped(virt_to_page(vaddr)))
 		xen_destroy_contiguous_region(phys, order);
Re: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region
On Tue, Apr 28, 2020 at 03:33:45PM +0800, peng@nxp.com wrote:
>
> In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask) or
> range_straddles_page_boundary(phys, size) is true, it will
> create a contiguous region. So when freeing, we need to free the
> contiguous region using the same check condition.
>
> Signed-off-by: Peng Fan
> ---
>  drivers/xen/swiotlb-xen.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index b6d27762c6f8..ab96e468584f 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
>  	/* Convert the size to actually allocated. */
>  	size = 1UL << (order + XEN_PAGE_SHIFT);
>
> -	if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
> -		     range_straddles_page_boundary(phys, size)) &&
> +	if (((dev_addr + size - 1 > dma_mask) ||
> +	     range_straddles_page_boundary(phys, size)) &&
>  	    TestClearPageXenRemapped(virt_to_page(vaddr)))

No need for the inner braces.

But more importantly please factor out a helper that can be used by alloc and
free to make sure that they always stay in sync. Something like:

static inline bool xen_swiotlb_need_contiguous_region(struct device *dev,
		phys_addr_t phys, size_t size)
{
	return xen_phys_to_bus(phys) + size - 1 > dev->coherent_dma_mask ||
		range_straddles_page_boundary(phys, size);
}