Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread Lu Baolu

On 2020/4/29 12:57, Michael S. Tsirkin wrote:

On Wed, Apr 29, 2020 at 10:22:32AM +0800, Lu Baolu wrote:

On 2020/4/29 4:41, Michael S. Tsirkin wrote:

On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:

* Michael S. Tsirkin  [2020-04-28 12:17:57]:


Okay, but how is all this virtio specific?  For example, why not allow
separate swiotlbs for any type of device?
For example, this might make sense if a given device is from a
different, less trusted vendor.

Is swiotlb commonly used for multiple devices that may be on different trust
boundaries (and not behind a hardware iommu)?

Even a hardware iommu does not imply a 100% security from malicious
hardware. First lots of people use iommu=pt for performance reasons.
Second even without pt, unmaps are often batched, and sub-page buffers
might be used for DMA, so we are not 100% protected at all times.



For untrusted devices, IOMMU is forced on even iommu=pt is used;


I think you are talking about untrusted *drivers* like with VFIO.


No. I am talking about untrusted devices like thunderbolt peripherals.
We always trust drivers hosted in kernel and the DMA APIs are designed
for them, right?

Please refer to this series.

https://lkml.org/lkml/2019/9/6/39

Best regards,
baolu



On the other hand, I am talking about things like thunderbolt
peripherals being less trusted than on-board ones.






Or possibly even using swiotlb for specific use-cases where
speed is less of an issue.

E.g. my wifi is pretty slow anyway, and that card is exposed to
malicious actors all the time, put just that behind swiotlb
for security, and leave my graphics card with pt since
I'm trusting it with secrets anyway.



and
iotlb flush is in strict mode (no batched flushes); ATS is also not
allowed. Swiotlb is used to protect sub-page buffers since IOMMU can
only apply page granularity protection. Swiotlb is now used for devices
from different trust zone.

Best regards,
baolu



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread Michael S. Tsirkin
On Wed, Apr 29, 2020 at 10:22:32AM +0800, Lu Baolu wrote:
> On 2020/4/29 4:41, Michael S. Tsirkin wrote:
> > On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> > > * Michael S. Tsirkin  [2020-04-28 12:17:57]:
> > > 
> > > > Okay, but how is all this virtio specific?  For example, why not allow
> > > > separate swiotlbs for any type of device?
> > > > For example, this might make sense if a given device is from a
> > > > different, less trusted vendor.
> > > Is swiotlb commonly used for multiple devices that may be on different 
> > > trust
> > > boundaries (and not behind a hardware iommu)?
> > Even a hardware iommu does not imply a 100% security from malicious
> > hardware. First lots of people use iommu=pt for performance reasons.
> > Second even without pt, unmaps are often batched, and sub-page buffers
> > might be used for DMA, so we are not 100% protected at all times.
> > 
> 
> For untrusted devices, IOMMU is forced on even iommu=pt is used;

I think you are talking about untrusted *drivers* like with VFIO.

On the other hand, I am talking about things like thunderbolt
peripherals being less trusted than on-board ones.

Or possibly even using swiotlb for specific use-cases where
speed is less of an issue.

E.g. my wifi is pretty slow anyway, and that card is exposed to
malicious actors all the time, put just that behind swiotlb
for security, and leave my graphics card with pt since
I'm trusting it with secrets anyway.


> and
> iotlb flush is in strict mode (no batched flushes); ATS is also not
> allowed. Swiotlb is used to protect sub-page buffers since IOMMU can
> only apply page granularity protection. Swiotlb is now used for devices
> from different trust zone.
> 
> Best regards,
> baolu

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: iommu_iova slab eats too much memory

2020-04-28 Thread Bin
Hi Shlil:

Thank you for your attention, and these are my answers:

1. I don't really understand what you're saying. What's the difference
between DMA buffer and DMA mapping?
It's like a memory block pool and a memory block or something like that?
2. Yes, the TSO is enabled all the time, but it seems not helping.
3. The CPU usage is pretty normal, and what's the point of this question?
Is it relevant to the leaking problem?

FYI:
I found an interesting phenomenon that it's just a small part of the
running hosts has this issue, even though they all
have the same kernel, configuration and hardwares, I don't know if this
really mean something.


Salil Mehta  于2020年4月28日周二 下午5:17写道:

> Hi Bin,
>
> Few questions:
>
> 1. If there is a leak of IOVA due to dma_unmap_* not being called
> somewhere then
> at certain point the throughput will drastically fall and will almost
> become equal
> to zero. This should be due to unavailability of the mapping anymore. But
> in your
> case VM is getting killed so this could be actual DMA buffer leak not DMA
> mapping
> leak. I doubt VM will get killed due to exhaustion of the DMA mappings in
> the IOMMU
> Layer for a transient reason or even due to mapping/unmapping leak.
>
> 2. Could you check if you have TSO offload enabled on Intel 82599? It will
> help
> in reducing the number of mappings and will take off IOVA mapping pressure
> from
> the IOMMU/VT-d? Though I am not sure it will help in reducing the amount
> of memory
> required for the buffers.
>
> 3. Also, have you checked the cpu-usage while your experiment is going on?
>
> Thanks
> Salil.
>
> > -Original Message-
> > From: iommu [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf
> Of
> > Robin Murphy
> > Sent: Friday, April 24, 2020 5:31 PM
> > To: Bin 
> > Cc: iommu@lists.linux-foundation.org
> > Subject: Re: iommu_iova slab eats too much memory
> >
> > On 2020-04-24 2:20 pm, Bin wrote:
> > > Dear Robin:
> > >  Thank you for your explanation. Now, I understand that this could
> be
> > > NIC driver's fault, but how could I confirm it? Do I have to debug the
> > > driver myself?
> >
> > I'd start with CONFIG_DMA_API_DEBUG - of course it will chew through
> > memory about an order of magnitude faster than the IOVAs alone, but it
> > should shed some light on whether DMA API usage looks suspicious, and
> > dumping the mappings should help track down the responsible driver(s).
> > Although the debugfs code doesn't show the stacktrace of where each
> > mapping was made, I guess it would be fairly simple to tweak that for a
> > quick way to narrow down where to start looking in an offending driver.
> >
> > Robin.
> >
> > > Robin Murphy  于2020年4月24日周五 下午8:15写道:
> > >
> > >> On 2020-04-24 1:06 pm, Bin wrote:
> > >>> I'm not familiar with the mmu stuff, so what you mean by "some driver
> > >>> leaking DMA mappings", is it possible that some other kernel module
> like
> > >>> KVM or NIC driver leads to the leaking problem instead of the iommu
> > >> module
> > >>> itself?
> > >>
> > >> Yes - I doubt that intel-iommu itself is failing to free IOVAs when it
> > >> should, since I'd expect a lot of people to have noticed that. It's
> far
> > >> more likely that some driver is failing to call dma_unmap_* when it's
> > >> finished with a buffer - with the IOMMU disabled that would be a no-op
> > >> on x86 with a modern 64-bit-capable device, so such a latent bug could
> > >> have been easily overlooked.
> > >>
> > >> Robin.
> > >>
> > >>> Bin  于 2020年4月24日周五 20:00写道:
> > >>>
> >  Well, that's the problem! I'm assuming the iommu kernel module is
> > >> leaking
> >  memory. But I don't know why and how.
> > 
> >  Do you have any idea about it? Or any further information is needed?
> > 
> >  Robin Murphy  于 2020年4月24日周五 19:20写道:
> > 
> > > On 2020-04-24 1:40 am, Bin wrote:
> > >> Hello? anyone there?
> > >>
> > >> Bin  于2020年4月23日周四 下午5:14写道:
> > >>
> > >>> Forget to mention, I've already disabled the slab merge, so this
> is
> > > what
> > >>> it is.
> > >>>
> > >>> Bin  于2020年4月23日周四 下午5:11写道:
> > >>>
> >  Hey, guys:
> > 
> >  I'm running a batch of CoreOS boxes, the lsb_release is:
> > 
> >  ```
> >  # cat /etc/lsb-release
> >  DISTRIB_ID="Container Linux by CoreOS"
> >  DISTRIB_RELEASE=2303.3.0
> >  DISTRIB_CODENAME="Rhyolite"
> >  DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0
> (Rhyolite)"
> >  ```
> > 
> >  ```
> >  # uname -a
> >  Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38
> -00
> > > 2019
> >  x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel
> > > GNU/Linux
> >  ```
> >  Recently, I found my vms constently being killed due to OOM, and
> > >> after
> >  digging into the problem, I finally realized that the kernel is
> > > leaking
> 

Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread Srivatsa Vaddagiri
* Stefano Stabellini  [2020-04-28 16:04:34]:

> > > Is swiotlb commonly used for multiple devices that may be on different 
> > > trust
> > > boundaries (and not behind a hardware iommu)?
> 
> The trust boundary is not a good way of describing the scenario and I
> think it leads to miscommunication.
> 
> A better way to describe the scenario would be that the device can only
> DMA to/from a small reserved-memory region advertised on device tree.
> 
> Do we have other instances of devices that can only DMA to/from very
> specific and non-configurable address ranges? If so, this series could
> follow their example.

AFAICT there is no such notion in current DMA API.

static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size,
bool is_ram)
{
return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
}

Only the max address a device can access is defined and not a range that we seem
to need here. I think we need to set the bus_dma_limit to 0 for virtio devices
which will force the use of swiotlb_map API. We should also have a per-device
swiotlb pool defined, so that swiotlb can use the pool meant for the given
device.

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread Srivatsa Vaddagiri
* Michael S. Tsirkin  [2020-04-28 16:41:04]:

> > Won't we still need some changes to virtio to make use of its own pool (to
> > bounce buffers)? Something similar to its own DMA ops proposed in this 
> > patch?
> 
> If you are doing this for all devices, you need to either find a way
> to do this without chaning DMA ops, or by doing some automatic change
> to all drivers.

Ok thanks for this input. I will see how we can obfuscate this in DMA APIs
itself.

Can you also comment on the virtio transport problem I cited? The hypervisor we
are dealing with does not support MMIO transport. It supports message queue
send/recv and also doorbell, which I think can be used if we can make some
change like this to virtio_mmio.c:

+static inline u32
+virtio_readl(struct virtio_mmio_device *vm_dev, u32 reg_offset)
+{
+return vm_dev->mmio_ops->readl(vm_dev, reg_offset);
+}
+ 
+static inline void
+virtio_writel(struct virtio_mmio_device *vm_dev, u32 reg_offset, u32 data)
+{
+vm_dev->mmio_ops->writel(vm_dev, reg_offset, data);
+}


/* Check magic value */
-magic = readl(vm_dev->base + VIRTIO_MMIO_MAGIC_VALUE);
+magic = vrito_readl(vm_dev, VIRTIO_MMIO_MAGIC_VALUE);

mmio_ops->readl on most platforms can default to readl itself, while on a
platform like us, it can boil down to message_queue send/recv. Would such a
change be acceptable?

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 3/4] iommu/vt-d: Add page request draining support

2020-04-28 Thread Jacob Pan
On Wed, 22 Apr 2020 16:06:10 +0800
Lu Baolu  wrote:

> When a PASID is stopped or terminated, there can be pending PRQs
> (requests that haven't received responses) in remapping hardware.
> This adds the interface to drain page requests and call it when a
> PASID is terminated.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel-svm.c   | 103
> ++-- include/linux/intel-iommu.h |
> 4 ++ 2 files changed, 102 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 83dc4319f661..2534641ef707 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -23,6 +23,7 @@
>  #include "intel-pasid.h"
>  
>  static irqreturn_t prq_event_thread(int irq, void *d);
> +static void intel_svm_drain_prq(struct device *dev, int pasid);
>  
>  #define PRQ_ORDER 0
>  
> @@ -66,6 +67,8 @@ int intel_svm_enable_prq(struct intel_iommu *iommu)
>   dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
>   dmar_writeq(iommu->reg + DMAR_PQA_REG,
> virt_to_phys(iommu->prq) | PRQ_ORDER); 
> + init_completion(>prq_complete);
> +
>   return 0;
>  }
>  
> @@ -208,6 +211,7 @@ static void intel_mm_release(struct mmu_notifier
> *mn, struct mm_struct *mm) rcu_read_lock();
>   list_for_each_entry_rcu(sdev, >devs, list) {
>   intel_pasid_tear_down_entry(svm->iommu, sdev->dev,
> svm->pasid);
> + intel_svm_drain_prq(sdev->dev, svm->pasid);
mmu_notifier release is called in atomic context, drain_prq needs to
wait for completion. I tested exit path when a process crashes. I got

[  +0.696214] BUG: sleeping function called from invalid context at 
kernel/sched/completion.c:101
[  +0.68] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3235, 
name: dsatest
[  +0.46] INFO: lockdep is turned off.
[  +0.02] CPU: 1 PID: 3235 Comm: dsatest Not tainted 5.7.0-rc1-z-svmtest+ 
#1637
[  +0.00] Hardware name: Intel Corporation Kabylake Client platform/Skylake 
Halo DDR4 RVP11, BIOS 
04.1709050855 09/05/2017
[  +0.01] Call Trace:
[  +0.04]  dump_stack+0x68/0x9b
[  +0.03]  ___might_sleep+0x229/0x250
[  +0.03]  wait_for_completion_timeout+0x3c/0x110
[  +0.03]  intel_svm_drain_prq+0x12f/0x210
[  +0.03]  intel_mm_release+0x73/0x110
[  +0.03]  __mmu_notifier_release+0x94/0x220
[  +0.02]  ? do_munmap+0x10/0x10
[  +0.02]  ? prepare_ftrace_return+0x5c/0x80
[  +0.03]  exit_mmap+0x156/0x1a0
[  +0.02]  ? mmput+0x44/0x120
[  +0.03]  ? exit_mmap+0x5/0x1a0
[  +0.02]  ? ftrace_graph_caller+0xa0/0xa0
[  +0.01]  mmput+0x5e/0x120


>   intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
>   }
>   rcu_read_unlock();
> @@ -401,12 +405,8 @@ int intel_svm_unbind_gpasid(struct device *dev,
> int pasid) if (!sdev->users) {
>   list_del_rcu(>list);
>   intel_pasid_tear_down_entry(iommu, dev,
> svm->pasid);
> + intel_svm_drain_prq(dev, svm->pasid);
>   intel_flush_svm_range_dev(svm, sdev, 0, -1,
> 0);
> - /* TODO: Drain in flight PRQ for the PASID
> since it
> -  * may get reused soon, we don't want to
> -  * confuse with its previous life.
> -  * intel_svm_drain_prq(dev, pasid);
> -  */
>   kfree_rcu(sdev, rcu);
>  
>   if (list_empty(>devs)) {
> @@ -644,6 +644,7 @@ int intel_svm_unbind_mm(struct device *dev, int
> pasid)
>* large and has to be physically
> contiguous. So it's
>* hard to be as defensive as we might like.
> */ intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> + intel_svm_drain_prq(dev, svm->pasid);
>   intel_flush_svm_range_dev(svm, sdev, 0, -1,
> 0); kfree_rcu(sdev, rcu);
>  
> @@ -722,6 +723,92 @@ static bool is_canonical_address(u64 addr)
>   return (((saddr << shift) >> shift) == saddr);
>  }
>  
> +/**
> + * intel_svm_drain_prq:
> + *
> + * Drain all pending page requests and responses related to a
> specific
> + * pasid in both software and hardware.
> + */
> +static void intel_svm_drain_prq(struct device *dev, int pasid)
> +{
> + struct device_domain_info *info;
> + struct dmar_domain *domain;
> + struct intel_iommu *iommu;
> + struct qi_desc desc[3];
> + struct pci_dev *pdev;
> + int head, tail;
> + u16 sid, did;
> + int qdep;
> +
> + info = get_domain_info(dev);
> + if (WARN_ON(!info || !dev_is_pci(dev)))
> + return;
> +
> + if (!info->ats_enabled)
> + return;
> +
> + iommu = info->iommu;
> + domain = info->domain;
> + pdev = to_pci_dev(dev);
> + sid = PCI_DEVID(info->bus, info->devfn);
> + did = domain->iommu_did[iommu->seq_id];
> + qdep = pci_ats_queue_depth(pdev);
> +

Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread Lu Baolu

On 2020/4/29 4:41, Michael S. Tsirkin wrote:

On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:

* Michael S. Tsirkin  [2020-04-28 12:17:57]:


Okay, but how is all this virtio specific?  For example, why not allow
separate swiotlbs for any type of device?
For example, this might make sense if a given device is from a
different, less trusted vendor.

Is swiotlb commonly used for multiple devices that may be on different trust
boundaries (and not behind a hardware iommu)?

Even a hardware iommu does not imply a 100% security from malicious
hardware. First lots of people use iommu=pt for performance reasons.
Second even without pt, unmaps are often batched, and sub-page buffers
might be used for DMA, so we are not 100% protected at all times.



For untrusted devices, IOMMU is forced on even iommu=pt is used; and
iotlb flush is in strict mode (no batched flushes); ATS is also not
allowed. Swiotlb is used to protect sub-page buffers since IOMMU can
only apply page granularity protection. Swiotlb is now used for devices
from different trust zone.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/5] swiotlb: Introduce concept of swiotlb_pool

2020-04-28 Thread kbuild test robot
Hi Srivatsa,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on vhost/linux-next]
[also build test ERROR on xen-tip/linux-next linus/master v5.7-rc3 
next-20200428]
[cannot apply to swiotlb/linux-next]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Srivatsa-Vaddagiri/virtio-on-Type-1-hypervisor/20200429-032334
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: x86_64-defconfig (attached as .config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   drivers/iommu/intel-iommu.c: In function 'bounce_map_single':
>> drivers/iommu/intel-iommu.c:3990:24: error: 'io_tlb_start' undeclared (first 
>> use in this function); did you mean 'swiotlb_start'?
__phys_to_dma(dev, io_tlb_start),
   ^~~~
   swiotlb_start
   drivers/iommu/intel-iommu.c:3990:24: note: each undeclared identifier is 
reported only once for each function it appears in

vim +3990 drivers/iommu/intel-iommu.c

cfb94a372f2d4e Lu Baolu 2019-09-06  3941  
cfb94a372f2d4e Lu Baolu 2019-09-06  3942  static dma_addr_t
cfb94a372f2d4e Lu Baolu 2019-09-06  3943  bounce_map_single(struct device 
*dev, phys_addr_t paddr, size_t size,
cfb94a372f2d4e Lu Baolu 2019-09-06  3944  enum 
dma_data_direction dir, unsigned long attrs,
cfb94a372f2d4e Lu Baolu 2019-09-06  3945  u64 dma_mask)
cfb94a372f2d4e Lu Baolu 2019-09-06  3946  {
cfb94a372f2d4e Lu Baolu 2019-09-06  3947size_t aligned_size = 
ALIGN(size, VTD_PAGE_SIZE);
cfb94a372f2d4e Lu Baolu 2019-09-06  3948struct dmar_domain *domain;
cfb94a372f2d4e Lu Baolu 2019-09-06  3949struct intel_iommu *iommu;
cfb94a372f2d4e Lu Baolu 2019-09-06  3950unsigned long iova_pfn;
cfb94a372f2d4e Lu Baolu 2019-09-06  3951unsigned long nrpages;
cfb94a372f2d4e Lu Baolu 2019-09-06  3952phys_addr_t tlb_addr;
cfb94a372f2d4e Lu Baolu 2019-09-06  3953int prot = 0;
cfb94a372f2d4e Lu Baolu 2019-09-06  3954int ret;
cfb94a372f2d4e Lu Baolu 2019-09-06  3955  
a11bfde9c77df1 Joerg Roedel 2020-02-17  3956if 
(unlikely(attach_deferred(dev)))
a11bfde9c77df1 Joerg Roedel 2020-02-17  3957do_deferred_attach(dev);
a11bfde9c77df1 Joerg Roedel 2020-02-17  3958  
96d170f3b1a607 Joerg Roedel 2020-02-17  3959domain = find_domain(dev);
a11bfde9c77df1 Joerg Roedel 2020-02-17  3960  
cfb94a372f2d4e Lu Baolu 2019-09-06  3961if (WARN_ON(dir == DMA_NONE || 
!domain))
cfb94a372f2d4e Lu Baolu 2019-09-06  3962return 
DMA_MAPPING_ERROR;
cfb94a372f2d4e Lu Baolu 2019-09-06  3963  
cfb94a372f2d4e Lu Baolu 2019-09-06  3964iommu = 
domain_get_iommu(domain);
cfb94a372f2d4e Lu Baolu 2019-09-06  3965if (WARN_ON(!iommu))
cfb94a372f2d4e Lu Baolu 2019-09-06  3966return 
DMA_MAPPING_ERROR;
cfb94a372f2d4e Lu Baolu 2019-09-06  3967  
cfb94a372f2d4e Lu Baolu 2019-09-06  3968nrpages = aligned_nrpages(0, 
size);
cfb94a372f2d4e Lu Baolu 2019-09-06  3969iova_pfn = 
intel_alloc_iova(dev, domain,
cfb94a372f2d4e Lu Baolu 2019-09-06  3970
dma_to_mm_pfn(nrpages), dma_mask);
cfb94a372f2d4e Lu Baolu 2019-09-06  3971if (!iova_pfn)
cfb94a372f2d4e Lu Baolu 2019-09-06  3972return 
DMA_MAPPING_ERROR;
cfb94a372f2d4e Lu Baolu 2019-09-06  3973  
cfb94a372f2d4e Lu Baolu 2019-09-06  3974/*
cfb94a372f2d4e Lu Baolu 2019-09-06  3975 * Check if DMAR supports 
zero-length reads on write only
cfb94a372f2d4e Lu Baolu 2019-09-06  3976 * mappings..
cfb94a372f2d4e Lu Baolu 2019-09-06  3977 */
cfb94a372f2d4e Lu Baolu 2019-09-06  3978if (dir == DMA_TO_DEVICE || dir 
== DMA_BIDIRECTIONAL ||
cfb94a372f2d4e Lu Baolu 2019-09-06  3979
!cap_zlr(iommu->cap))
cfb94a372f2d4e Lu Baolu 2019-09-06  3980prot |= DMA_PTE_READ;
cfb94a372f2d4e Lu Baolu 2019-09-06  3981if (dir == DMA_FROM_DEVICE || 
dir == DMA_BIDIRECTIONAL)
cfb94a372f2d4e Lu Baolu 2019-09-06  3982prot |= DMA_PTE_WRITE;
cfb94a372f2d4e Lu Baolu 2019-09-06  3983  
cfb94a372f2d4e Lu Baolu 2019-09-06  3984/*
cfb94a372f2d4e Lu Baolu 2019-09-06  3985 * If both the physical buffer 
start address and size are
cfb94a372f2d4e Lu Baolu 2019-09-06  3986 * page aligned, we don't need 
to use a bounce page.
cfb94a372f2d4e Lu Baolu 2019-09-06  3987 */
cfb94a372f2

Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread Stefano Stabellini
On Tue, 28 Apr 2020, Michael S. Tsirkin wrote:
> On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> > * Michael S. Tsirkin  [2020-04-28 12:17:57]:
> > 
> > > Okay, but how is all this virtio specific?  For example, why not allow
> > > separate swiotlbs for any type of device?
> > > For example, this might make sense if a given device is from a
> > > different, less trusted vendor.
> > 
> > Is swiotlb commonly used for multiple devices that may be on different trust
> > boundaries (and not behind a hardware iommu)?

The trust boundary is not a good way of describing the scenario and I
think it leads to miscommunication.

A better way to describe the scenario would be that the device can only
DMA to/from a small reserved-memory region advertised on device tree.

Do we have other instances of devices that can only DMA to/from very
specific and non-configurable address ranges? If so, this series could
follow their example.


> Even a hardware iommu does not imply a 100% security from malicious
> hardware. First lots of people use iommu=pt for performance reasons.
> Second even without pt, unmaps are often batched, and sub-page buffers
> might be used for DMA, so we are not 100% protected at all times.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread Stefano Stabellini
On Tue, 28 Apr 2020, Srivatsa Vaddagiri wrote:
> For better security, its desirable that a guest VM's memory is
> not accessible to any entity that executes outside the context of
> guest VM. In case of virtio, backend drivers execute outside the
> context of guest VM and in general will need access to complete
> guest VM memory.  One option to restrict the access provided to
> backend driver is to make use of a bounce buffer. The bounce
> buffer is accessible to both backend and frontend drivers. All IO
> buffers that are in private space of guest VM are bounced to be
> accessible to backend.

[...]

> +static int __init virtio_bounce_setup(struct reserved_mem *rmem)
> +{
> + unsigned long node = rmem->fdt_node;
> +
> + if (!of_get_flat_dt_prop(node, "no-map", NULL))
> + return -EINVAL;
> +
> + return virtio_register_bounce_buffer(rmem->base, rmem->size);
> +}
> +
> +RESERVEDMEM_OF_DECLARE(virtio, "virtio_bounce_pool", virtio_bounce_setup);

Is this special reserved-memory region documented somewhere?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 5/7] x86/mmu: Allocate/free PASID

2020-04-28 Thread Luck, Tony
> There are two users of a PASID, mm and device driver(FD). If
> either one is not done with the PASID, it cannot be reclaimed. As you
> mentioned, it could take a long time for the driver to abort. If the
> abort ends *after* mmdrop, we are in trouble.
> If driver drops reference after abort/drain PASID is done, then we are
> safe.

I don't think there should be an abort ... suppose the application requested
the DSA to copy some large block of important results from DDR4 to
persistent memory.  Driver should wait for that copy operation to complete.

Note that for the operation to succeed, the kernel should still be processing
and fixing page faults for the "mm" (some parts of the data that the user wanted
to save to persistent memory may have been paged out).

The wait by the DSA diver needs to by synchronous ... the "mm" cannot be
freed until DSA says all the pending operations have completed.

Even without persistent memory, there are cases where you want the operations
to complete (mmap'd files, shared memory with other processes).

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread kbuild test robot
Hi Srivatsa,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on vhost/linux-next]
[also build test ERROR on xen-tip/linux-next linus/master v5.7-rc3 
next-20200428]
[cannot apply to swiotlb/linux-next]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Srivatsa-Vaddagiri/virtio-on-Type-1-hypervisor/20200429-032334
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: s390-randconfig-a001-20200428 (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=s390 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   drivers/virtio/virtio_bounce.c: In function 'virtio_bounce_setup':
>> drivers/virtio/virtio_bounce.c:144:7: error: implicit declaration of 
>> function 'of_get_flat_dt_prop' [-Werror=implicit-function-declaration]
 144 |  if (!of_get_flat_dt_prop(node, "no-map", NULL))
 |   ^~~
   cc1: some warnings being treated as errors

vim +/of_get_flat_dt_prop +144 drivers/virtio/virtio_bounce.c

   139  
   140  static int __init virtio_bounce_setup(struct reserved_mem *rmem)
   141  {
   142  unsigned long node = rmem->fdt_node;
   143  
 > 144  if (!of_get_flat_dt_prop(node, "no-map", NULL))
   145  return -EINVAL;
   146  
   147  return virtio_register_bounce_buffer(rmem->base, rmem->size);
   148  }
   149  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 5/7] x86/mmu: Allocate/free PASID

2020-04-28 Thread Jacob Pan (Jun)
On Tue, 28 Apr 2020 13:59:43 -0700
"Luck, Tony"  wrote:

> >> So the driver needs to use flush/drain operations to make sure all
> >> the in-flight work has completed before releasing/re-using the
> >> PASID. 
> > Are you suggesting we should let driver also hold a reference of the
> > PASID?  
> 
> The sequence for bare metal is:
> 
>   process is queuing requests to DSA
>   process exits (either deliberately, or crashes, or is killed)
>   kernel does exit processing
>   DSA driver is called as part of tear down of "mm"
>   issues drain/flush commands to ensure that all
>   queued operations on the PASID for this mm have
>   completed
>   PASID can be freed
> 
> There's a 1:1 map from "mm" to PASID ... so reference counting seems
> like overkill. Once the kernel is in the "exit" path, we know that no
> more work can be queued using this PASID.
> 
There are two users of a PASID, mm and device driver(FD). If
either one is not done with the PASID, it cannot be reclaimed. As you
mentioned, it could take a long time for the driver to abort. If the
abort ends *after* mmdrop, we are in trouble.
If driver drops reference after abort/drain PASID is done, then we are
safe.


> -Tony

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread kbuild test robot
Hi Srivatsa,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on vhost/linux-next]
[also build test WARNING on xen-tip/linux-next linus/master v5.7-rc3 
next-20200428]
[cannot apply to swiotlb/linux-next]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Srivatsa-Vaddagiri/virtio-on-Type-1-hypervisor/20200429-032334
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: sh-randconfig-a001-20200428 (attached as .config)
compiler: sh4-linux-gcc (GCC) 9.3.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=sh 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot 

All warnings (new ones prefixed by >>):

   In file included from drivers/virtio/virtio_bounce.c:13:
   include/linux/swiotlb.h: In function 'swiotlb_alloc':
   include/linux/swiotlb.h:234:9: error: 'DMA_MAPPING_ERROR' undeclared (first 
use in this function)
 234 |  return DMA_MAPPING_ERROR;
 | ^
   include/linux/swiotlb.h:234:9: note: each undeclared identifier is reported 
only once for each function it appears in
>> include/linux/swiotlb.h:235:1: warning: control reaches end of non-void 
>> function [-Wreturn-type]
 235 | }
 | ^

vim +235 include/linux/swiotlb.h

9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28  229  
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28  230  static inline phys_addr_t 
swiotlb_alloc(struct swiotlb_pool *pool,
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28  231   size_t 
alloc_size, unsigned long tbl_dma_addr,
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28  232   unsigned long 
mask)
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28  233  {
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28  234   return 
DMA_MAPPING_ERROR;
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28 @235  }
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28  236  

:: The code at line 235 was first introduced by commit
:: 9ab4c39d9f929840cb884e589f6112770dbc2f63 swiotlb: Add alloc and free APIs

:: TO: Srivatsa Vaddagiri 
:: CC: 0day robot 

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

RE: [PATCH 5/7] x86/mmu: Allocate/free PASID

2020-04-28 Thread Luck, Tony
>> So the driver needs to use flush/drain operations to make sure all
>> the in-flight work has completed before releasing/re-using the PASID.
>> 
> Are you suggesting we should let driver also hold a reference of the
> PASID?

The sequence for bare metal is:

process is queuing requests to DSA
process exits (either deliberately, or crashes, or is killed)
kernel does exit processing
DSA driver is called as part of tear down of "mm"
issues drain/flush commands to ensure that all
queued operations on the PASID for this mm have
completed
PASID can be freed

There's a 1:1 map from "mm" to PASID ... so reference counting seems
like overkill. Once the kernel is in the "exit" path, we know that no more
work can be queued using this PASID.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/7] x86/mmu: Allocate/free PASID

2020-04-28 Thread Fenghua Yu
On Sun, Apr 26, 2020 at 04:55:25PM +0200, Thomas Gleixner wrote:
> Fenghua Yu  writes:
> > diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
> > index bdeae9291e5c..137bf51f19e6 100644
> > --- a/arch/x86/include/asm/mmu.h
> > +++ b/arch/x86/include/asm/mmu.h
> > @@ -50,6 +50,10 @@ typedef struct {
> > u16 pkey_allocation_map;
> > s16 execute_only_pkey;
> >  #endif
> > +
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +   int pasid;
> 
> int? It's a value which gets programmed into the MSR along with the
> valid bit (bit 31) set. 

BTW, ARM is working on PASID as well. Christoph suggested that the PASID
should be defined in mm_struct instead of mm->context so that both ARM and X86
can access it:
https://lore.kernel.org/linux-iommu/20200414170252.714402-1-jean-phili...@linaro.org/T/#mb57110ffe1aaa24750eeea4f93b611f0d1913911

So I will define "pasid" to mm_struct in a separate patch in the next version.

Thanks.

-Fenghua

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/7] x86/mmu: Allocate/free PASID

2020-04-28 Thread Jacob Pan (Jun)
On Tue, 28 Apr 2020 12:07:25 -0700
"Luck, Tony"  wrote:

> > If fd release cleans up then how should there be something in
> > flight at the final mmdrop?  
> 
> ENQCMD from the user is only synchronous in that it lets the user
> know their request has been added to a queue (or not).  Execution of
> the request may happen later (if the device is busy working on
> requests for other users).  The request will take some time to
> complete. Someone told me the theoretical worst case once, which I've
> since forgotten, but it can be a long time.
> 
> So the driver needs to use flush/drain operations to make sure all
> the in-flight work has completed before releasing/re-using the PASID.
> 
Are you suggesting we should let driver also hold a reference of the
PASID?

> -Tony

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread Michael S. Tsirkin
On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> * Michael S. Tsirkin  [2020-04-28 12:17:57]:
> 
> > Okay, but how is all this virtio specific?  For example, why not allow
> > separate swiotlbs for any type of device?
> > For example, this might make sense if a given device is from a
> > different, less trusted vendor.
> 
> Is swiotlb commonly used for multiple devices that may be on different trust
> boundaries (and not behind a hardware iommu)?

Even a hardware iommu does not imply a 100% security from malicious
hardware. First lots of people use iommu=pt for performance reasons.
Second even without pt, unmaps are often batched, and sub-page buffers
might be used for DMA, so we are not 100% protected at all times.


> If so, then yes it sounds like a
> good application of multiple swiotlb pools.
> 
> > All this can then maybe be hidden behind the DMA API.
> 
> Won't we still need some changes to virtio to make use of its own pool (to
> bounce buffers)? Something similar to its own DMA ops proposed in this patch?

If you are doing this for all devices, you need to either find a way
to do this without chaning DMA ops, or by doing some automatic change
to all drivers.


> > > +void virtio_bounce_set_dma_ops(struct virtio_device *vdev)
> > > +{
> > > + if (!bounce_buf_paddr)
> > > + return;
> > > +
> > > + set_dma_ops(vdev->dev.parent, _dma_ops);
> > 
> > 
> > I don't think DMA API maintainers will be happy with new users
> > of set_dma_ops.
> 
> Is there an alternate API that is more preffered?

all this is supposed to be part of DMA API itself. new drivers aren't
supposed to have custom DMA ops.

> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/7] x86/mmu: Allocate/free PASID

2020-04-28 Thread Jacob Pan (Jun)
On Tue, 28 Apr 2020 20:54:01 +0200
Thomas Gleixner  wrote:

> "Jacob Pan (Jun)"  writes:
> > On Sun, 26 Apr 2020 16:55:25 +0200
> > Thomas Gleixner  wrote:  
> >> Fenghua Yu  writes:  
> >> > The PASID is freed when the process exits (so no need to keep
> >> > reference counts on how many SVM devices are sharing the
> >> > PASID).
> >> 
> >> I'm not buying that. If there is an outstanding request with the
> >> PASID of a process then tearing down the process address space and
> >> freeing the PASID (which might be reused) is fundamentally broken.
> >>   
> > Device driver unbind PASID is tied to FD release. So when a process
> > exits, FD close causes driver to do the following:
> >
> > 1. stops DMA
> > 2. unbind PASID (clears the PASID entry in IOMMU, flush all TLBs,
> > drain in flight page requests)  
> 
> Fair enough. Explaining that somewhere might be helpful.
> 
Will do. I plan to document this in a kernel doc for IOASID/PASID
lifecycle management.

> > For bare metal SVM, if the last mmdrop always happens after FD
> > release, we can ensure no outstanding requests at the point of
> > ioasid_free(). Perhaps this is a wrong assumption?  
> 
> If fd release cleans up then how should there be something in flight
> at the final mmdrop?
> 
> > For guest SVM, there will be more users of a PASID. I am also
> > working on adding refcounting to ioasid. ioasid_free() will not
> > release the PASID back to the pool until all references are
> > dropped.  
> 
> What does more users mean?
For VT-d, a PASID can be used by VFIO, IOMMU driver, KVM, and Virtual
Device Composition Module (VDCM*) at the same time.

*https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification

There are HW context associated with the PASID in IOMMU, KVM, and VDCM.
So before the lifetime of the PASID is over, clean up must be done in
all of the above. PASID cannot be reclaimed until the last user drops
its reference. Our plan is to do notification and refcouting.


> 
> >> > +if (mm && mm->context.pasid && !(flags &
> >> > SVM_FLAG_PRIVATE_PASID)) {
> >> > +/*
> >> > + * Once a PASID is allocated for this mm, the
> >> > PASID
> >> > + * stays with the mm until the mm is dropped.
> >> > Reuse
> >> > + * the PASID which has been already allocated
> >> > for the
> >> > + * mm instead of allocating a new one.
> >> > + */
> >> > +ioasid_set_data(mm->context.pasid, svm);
> >> 
> >> So if the PASID is reused several times for different SVMs then
> >> every time ioasid_data->private is set to a different SVM. How is
> >> that supposed to work?
> >>   
> > For the lifetime of the mm, there is only one PASID.
> > svm_bind/unbind_mm could happen many times with different private
> > data: intel_svm. Multiple devices can bind to the same PASID as
> > well. But private data don't change within the first bind and last
> > unbind.  
> 
> Ok. I read through that spaghetti of intel_svm_bind_mm() again and
> now I start to get an idea how that is supposed to work. What a mess.
> 
> That function really wants to be restructured in a way so it is
> understandable to mere mortals. 
> 

Agreed. We are adding many new features and converging with generic
sva_bind_device. Things will get more clear after we have fewer moving
pieces.


> Thanks,
> 
> tglx

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 5/7] x86/mmu: Allocate/free PASID

2020-04-28 Thread Luck, Tony
> If fd release cleans up then how should there be something in flight at
> the final mmdrop?

ENQCMD from the user is only synchronous in that it lets the user know their
request has been added to a queue (or not).  Execution of the request may happen
later (if the device is busy working on requests for other users).  The request 
will
take some time to complete. Someone told me the theoretical worst case once,
which I've since forgotten, but it can be a long time.

So the driver needs to use flush/drain operations to make sure all the in-flight
work has completed before releasing/re-using the PASID.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/7] x86/mmu: Allocate/free PASID

2020-04-28 Thread Thomas Gleixner
"Jacob Pan (Jun)"  writes:
> On Sun, 26 Apr 2020 16:55:25 +0200
> Thomas Gleixner  wrote:
>> Fenghua Yu  writes:
>> > The PASID is freed when the process exits (so no need to keep
>> > reference counts on how many SVM devices are sharing the PASID).  
>> 
>> I'm not buying that. If there is an outstanding request with the PASID
>> of a process then tearing down the process address space and freeing
>> the PASID (which might be reused) is fundamentally broken.
>> 
> Device driver unbind PASID is tied to FD release. So when a process
> exits, FD close causes driver to do the following:
>
> 1. stops DMA
> 2. unbind PASID (clears the PASID entry in IOMMU, flush all TLBs, drain
> in flight page requests)

Fair enough. Explaining that somewhere might be helpful.

> For bare metal SVM, if the last mmdrop always happens after FD release,
> we can ensure no outstanding requests at the point of ioasid_free().
> Perhaps this is a wrong assumption?

If fd release cleans up then how should there be something in flight at
the final mmdrop?

> For guest SVM, there will be more users of a PASID. I am also
> working on adding refcounting to ioasid. ioasid_free() will not release
> the PASID back to the pool until all references are dropped.

What does more users mean?

>> > +  if (mm && mm->context.pasid && !(flags &
>> > SVM_FLAG_PRIVATE_PASID)) {
>> > +  /*
>> > +   * Once a PASID is allocated for this mm, the PASID
>> > +   * stays with the mm until the mm is dropped. Reuse
>> > +   * the PASID which has been already allocated for
>> > the
>> > +   * mm instead of allocating a new one.
>> > +   */
>> > +  ioasid_set_data(mm->context.pasid, svm);  
>> 
>> So if the PASID is reused several times for different SVMs then every
>> time ioasid_data->private is set to a different SVM. How is that
>> supposed to work?
>> 
> For the lifetime of the mm, there is only one PASID. svm_bind/unbind_mm
> could happen many times with different private data: intel_svm.
> Multiple devices can bind to the same PASID as well. But private data
> don't change within the first bind and last unbind.

Ok. I read through that spaghetti of intel_svm_bind_mm() again and now I
start to get an idea how that is supposed to work. What a mess.

That function really wants to be restructured in a way so it is
understandable to mere mortals. 

Thanks,

tglx
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region

2020-04-28 Thread Stefano Stabellini
On Tue, 28 Apr 2020, Jürgen Groß wrote:
> On 28.04.20 09:33, peng@nxp.com wrote:
> > From: Peng Fan 
> > 
> > When booting xen on i.MX8QM, met:
> > "
> > [3.602128] Unable to handle kernel paging request at virtual address
> > 00272d40
> > [3.610804] Mem abort info:
> > [3.613905]   ESR = 0x9604
> > [3.617332]   EC = 0x25: DABT (current EL), IL = 32 bits
> > [3.623211]   SET = 0, FnV = 0
> > [3.626628]   EA = 0, S1PTW = 0
> > [3.630128] Data abort info:
> > [3.633362]   ISV = 0, ISS = 0x0004
> > [3.637630]   CM = 0, WnR = 0
> > [3.640955] [00272d40] user address but active_mm is swapper
> > [3.647983] Internal error: Oops: 9604 [#1] PREEMPT SMP
> > [3.654137] Modules linked in:
> > [3.677285] Hardware name: Freescale i.MX8QM MEK (DT)
> > [3.677302] Workqueue: events deferred_probe_work_func
> > [3.684253] imx6q-pcie 5f00.pcie: PCI host bridge to bus :00
> > [3.688297] pstate: 6005 (nZCv daif -PAN -UAO)
> > [3.688310] pc : xen_swiotlb_free_coherent+0x180/0x1c0
> > [3.693993] pci_bus :00: root bus resource [bus 00-ff]
> > [3.701002] lr : xen_swiotlb_free_coherent+0x44/0x1c0
> > "
> > 
> > In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask) or
> > range_straddles_page_boundary(phys, size) are true, it will
> > create contiguous region. So when free, we need to free contiguous
> > region use upper check condition.
> 
> No, this will break PV guests on x86.
> 
> I think there is something wrong with your setup in combination with
> the ARM xen_create_contiguous_region() implementation.
> 
> Stefano?

Let me start by asking Peng a couple of questions:


Peng, could you please add a printk to check which one of the two
conditions is True for you?  Is it (dev_addr + size - 1 > dma_mask) or
range_straddles_page_boundary(phys, size)?

Is hwdev->coherent_dma_mask set for your DMA capable device?

Finally, is this patch supposed to fix the crash you are seeing? If not,
can you tell where is the crash exactly?



Juergen, keep in mind that xen_create_contiguous_region does nothing on
ARM because in dom0 guest_phys == phys. xen_create_contiguous_region
simply sets dma_handle to phys. Whatever condition caused the code to
take the xen_create_contiguous_region branch in
xen_swiotlb_alloc_coherent, it will also cause it to WARN in
xen_swiotlb_free_coherent.


range_straddles_page_boundary should never return True because
guest_phys == phys. That leaves us with the dma_mask check:

  dev_addr + size - 1 <= dma_mask

dev_addr is the dma_handle allocated by xen_alloc_coherent_pages.
dma_mask is either DMA_BIT_MASK(32) or hwdev->coherent_dma_mask.

The implementation of xen_alloc_coherent_pages has been recently changed
to use dma_direct_alloc.


Christoff, does dma_direct_alloc respect hwdev->coherent_dma_mask if
present? Also, can it return highmem pages?



> Juergen
> 
> > 
> > Signed-off-by: Peng Fan 
> > ---
> >   drivers/xen/swiotlb-xen.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index b6d27762c6f8..ab96e468584f 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t
> > size, void *vaddr,
> > /* Convert the size to actually allocated. */
> > size = 1UL << (order + XEN_PAGE_SHIFT);
> >   - if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
> > -range_straddles_page_boundary(phys, size)) &&
> > +   if (((dev_addr + size - 1 > dma_mask) ||
> > +   range_straddles_page_boundary(phys, size)) &&
> > TestClearPageXenRemapped(virt_to_page(vaddr)))
> > xen_destroy_contiguous_region(phys, order);
> >   
> ___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region

2020-04-28 Thread Joe Jin
On 4/28/20 10:25 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Apr 28, 2020 at 12:19:41PM +0200, Jürgen Groß wrote:
>> On 28.04.20 10:25, Peng Fan wrote:
> 
> Adding Joe Jin.
> 
> Joe, didn't you have some ideas on how this could be implemented?
> 
 Subject: Re: [PATCH] xen/swiotlb: correct the check for
 xen_destroy_contiguous_region

 On 28.04.20 09:33, peng@nxp.com wrote:
> From: Peng Fan 
>
> When booting xen on i.MX8QM, met:
> "
> [3.602128] Unable to handle kernel paging request at virtual address
 00272d40
> [3.610804] Mem abort info:
> [3.613905]   ESR = 0x9604
> [3.617332]   EC = 0x25: DABT (current EL), IL = 32 bits
> [3.623211]   SET = 0, FnV = 0
> [3.626628]   EA = 0, S1PTW = 0
> [3.630128] Data abort info:
> [3.633362]   ISV = 0, ISS = 0x0004
> [3.637630]   CM = 0, WnR = 0
> [3.640955] [00272d40] user address but active_mm is
 swapper
> [3.647983] Internal error: Oops: 9604 [#1] PREEMPT SMP
> [3.654137] Modules linked in:
> [3.677285] Hardware name: Freescale i.MX8QM MEK (DT)
> [3.677302] Workqueue: events deferred_probe_work_func
> [3.684253] imx6q-pcie 5f00.pcie: PCI host bridge to bus :00
> [3.688297] pstate: 6005 (nZCv daif -PAN -UAO)
> [3.688310] pc : xen_swiotlb_free_coherent+0x180/0x1c0
> [3.693993] pci_bus :00: root bus resource [bus 00-ff]
> [3.701002] lr : xen_swiotlb_free_coherent+0x44/0x1c0
> "
>
> In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask)
> or range_straddles_page_boundary(phys, size) are true, it will create
> contiguous region. So when free, we need to free contiguous region use
> upper check condition.

 No, this will break PV guests on x86.
>>>
>>> Could you share more details why alloc and free not matching for the check?
>>
>> xen_create_contiguous_region() is needed only in case:
>>
>> - the bus address is not within dma_mask, or
>> - the memory region is not physically contiguous (can happen only for
>>   PV guests)
>>
>> In any case it should arrange for the memory to be suitable for the
>> DMA operation, so to be contiguous and within dma_mask afterwards. So
>> xen_destroy_contiguous_region() should only ever called for areas
>> which match above criteria, as otherwise we can be sure
>> xen_create_contiguous_region() was not used for making the area DMA-able
>> in the beginning.

I agreed with Juergen's explanation, That is my understanding.

Peng, if panic caused by (dev_addr + size - 1 > dma_mask), you should check
how you get the addr, if memory created by xen_create_contiguous_region(),
memory must be with in [0 - dma_mask].

Thanks,
Joe

>>
>> And this is very important in the PV case, as in those guests the page
>> tables are containing the host-PFNs, not the guest-PFNS, and
>> xen_create_contiguous_region() will fiddle with host- vs. guest-PFN
>> arrangements, and xen_destroy_contiguous_region() is reverting this
>> fiddling. Any call of xen_destroy_contiguous_region() for an area it
>> was not intended to be called for might swap physical pages beneath
>> random virtual addresses, which was the reason for this test to be
>> added by me.
>>
>>
>> Juergen
>>
>>>
>>> Thanks,
>>> Peng.
>>>

 I think there is something wrong with your setup in combination with the 
 ARM
 xen_create_contiguous_region() implementation.

 Stefano?


 Juergen

>
> Signed-off-by: Peng Fan 
> ---
>drivers/xen/swiotlb-xen.c | 4 ++--
>1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index b6d27762c6f8..ab96e468584f 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev,
 size_t size, void *vaddr,
>   /* Convert the size to actually allocated. */
>   size = 1UL << (order + XEN_PAGE_SHIFT);
>
> - if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
> -  range_straddles_page_boundary(phys, size)) &&
> + if (((dev_addr + size - 1 > dma_mask) ||
> + range_straddles_page_boundary(phys, size)) &&
>   TestClearPageXenRemapped(virt_to_page(vaddr)))
>   xen_destroy_contiguous_region(phys, order);
>
>
>>>
>>

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 5/7] x86/mmu: Allocate/free PASID

2020-04-28 Thread Jacob Pan (Jun)
On Sun, 26 Apr 2020 16:55:25 +0200
Thomas Gleixner  wrote:

> Fenghua Yu  writes:
> 
> > PASID is shared by all threads in a process. So the logical place
> > to keep track of it is in the "mm". Add the field to the
> > architecture specific mm_context_t structure.
> >
> > A PASID is allocated for an "mm" the first time any thread attaches
> > to an SVM capable device. Later device atatches (whether to the
> > same  
> 
> atatches?
> 
> > device or another SVM device) will re-use the same PASID.
> >
> > The PASID is freed when the process exits (so no need to keep
> > reference counts on how many SVM devices are sharing the PASID).  
> 
> I'm not buying that. If there is an outstanding request with the PASID
> of a process then tearing down the process address space and freeing
> the PASID (which might be reused) is fundamentally broken.
> 
Device driver unbind PASID is tied to FD release. So when a process
exits, FD close causes driver to do the following:
1. stops DMA
2. unbind PASID (clears the PASID entry in IOMMU, flush all TLBs, drain
in flight page requests)

For bare metal SVM, if the last mmdrop always happens after FD release,
we can ensure no outstanding requests at the point of ioasid_free().
Perhaps this is a wrong assumption?

For guest SVM, there will be more users of a PASID. I am also
working on adding refcounting to ioasid. ioasid_free() will not release
the PASID back to the pool until all references are dropped.

> > +void __free_pasid(struct mm_struct *mm);
> > +
> >  #endif /* _ASM_X86_IOMMU_H */
> > diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
> > index bdeae9291e5c..137bf51f19e6 100644
> > --- a/arch/x86/include/asm/mmu.h
> > +++ b/arch/x86/include/asm/mmu.h
> > @@ -50,6 +50,10 @@ typedef struct {
> > u16 pkey_allocation_map;
> > s16 execute_only_pkey;
> >  #endif
> > +
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +   int pasid;  
> 
> int? It's a value which gets programmed into the MSR along with the
> valid bit (bit 31) set. 
> 
> >  extern void switch_mm(struct mm_struct *prev, struct mm_struct
> > *next, diff --git a/drivers/iommu/intel-svm.c
> > b/drivers/iommu/intel-svm.c index d7f2a5358900..da718a49e91e 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -226,6 +226,45 @@ static LIST_HEAD(global_svm_list);
> > list_for_each_entry((sdev), &(svm)->devs, list) \
> > if ((d) != (sdev)->dev) {} else
> >  
> > +/*
> > + * If this mm already has a PASID we can use it. Otherwise
> > allocate a new one.
> > + * Let the caller know if we did an allocation via 'new_pasid'.
> > + */
> > +static int alloc_pasid(struct intel_svm *svm, struct mm_struct *mm,
> > +  int pasid_max,  bool *new_pasid, int
> > flags)  
> 
> Again, data types please. flags are generally unsigned and not plain
> int. Also pasid_max is certainly not plain int either.
> 
> > +{
> > +   int pasid;
> > +
> > +   /*
> > +* Reuse the PASID if the mm already has a PASID and not a
> > private
> > +* PASID is requested.
> > +*/
> > +   if (mm && mm->context.pasid && !(flags &
> > SVM_FLAG_PRIVATE_PASID)) {
> > +   /*
> > +* Once a PASID is allocated for this mm, the PASID
> > +* stays with the mm until the mm is dropped. Reuse
> > +* the PASID which has been already allocated for
> > the
> > +* mm instead of allocating a new one.
> > +*/
> > +   ioasid_set_data(mm->context.pasid, svm);  
> 
> So if the PASID is reused several times for different SVMs then every
> time ioasid_data->private is set to a different SVM. How is that
> supposed to work?
> 
For the lifetime of the mm, there is only one PASID. svm_bind/unbind_mm
could happen many times with different private data: intel_svm.
Multiple devices can bind to the same PASID as well. But private data
don't change within the first bind and last unbind.

> > +   *new_pasid = false;
> > +
> > +   return mm->context.pasid;
> > +   }
> > +
> > +   /*
> > +* Allocate a new pasid. Do not use PASID 0, reserved for
> > RID to
> > +* PASID.
> > +*/
> > +   pasid = ioasid_alloc(NULL, PASID_MIN, pasid_max - 1,
> > svm);  
> 
> ioasid_alloc() uses ioasid_t which is
> 
> typedef unsigned int ioasid_t;
> 
> Can we please have consistent types and behaviour all over the place?
> 
> > +   if (pasid == INVALID_IOASID)
> > +   return -ENOSPC;
> > +
> > +   *new_pasid = true;
> > +
> > +   return pasid;
> > +}
> > +
> >  int intel_svm_bind_mm(struct device *dev, int *pasid, int flags,
> > struct svm_dev_ops *ops) {
> > struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > @@ -324,6 +363,8 @@ int intel_svm_bind_mm(struct device *dev, int
> > *pasid, int flags, struct svm_dev_ init_rcu_head(>rcu);
> >  
> > if (!svm) {
> > +   bool new_pasid;
> > +
> > svm = kzalloc(sizeof(*svm), GFP_KERNEL);
> > if (!svm) {
> > 

Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread Srivatsa Vaddagiri
* Michael S. Tsirkin  [2020-04-28 12:17:57]:

> Okay, but how is all this virtio specific?  For example, why not allow
> separate swiotlbs for any type of device?
> For example, this might make sense if a given device is from a
> different, less trusted vendor.

Is swiotlb commonly used for multiple devices that may be on different trust
boundaries (and not behind a hardware iommu)? If so, then yes it sounds like a
good application of multiple swiotlb pools.

> All this can then maybe be hidden behind the DMA API.

Won't we still need some changes to virtio to make use of its own pool (to
bounce buffers)? Something similar to its own DMA ops proposed in this patch?

> > +void virtio_bounce_set_dma_ops(struct virtio_device *vdev)
> > +{
> > +   if (!bounce_buf_paddr)
> > +   return;
> > +
> > +   set_dma_ops(vdev->dev.parent, _dma_ops);
> 
> 
> I don't think DMA API maintainers will be happy with new users
> of set_dma_ops.

Is there an alternate API that is more preffered?

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region

2020-04-28 Thread Konrad Rzeszutek Wilk
On Tue, Apr 28, 2020 at 12:19:41PM +0200, Jürgen Groß wrote:
> On 28.04.20 10:25, Peng Fan wrote:

Adding Joe Jin.

Joe, didn't you have some ideas on how this could be implemented?

> > > Subject: Re: [PATCH] xen/swiotlb: correct the check for
> > > xen_destroy_contiguous_region
> > > 
> > > On 28.04.20 09:33, peng@nxp.com wrote:
> > > > From: Peng Fan 
> > > > 
> > > > When booting xen on i.MX8QM, met:
> > > > "
> > > > [3.602128] Unable to handle kernel paging request at virtual address
> > > 00272d40
> > > > [3.610804] Mem abort info:
> > > > [3.613905]   ESR = 0x9604
> > > > [3.617332]   EC = 0x25: DABT (current EL), IL = 32 bits
> > > > [3.623211]   SET = 0, FnV = 0
> > > > [3.626628]   EA = 0, S1PTW = 0
> > > > [3.630128] Data abort info:
> > > > [3.633362]   ISV = 0, ISS = 0x0004
> > > > [3.637630]   CM = 0, WnR = 0
> > > > [3.640955] [00272d40] user address but active_mm is
> > > swapper
> > > > [3.647983] Internal error: Oops: 9604 [#1] PREEMPT SMP
> > > > [3.654137] Modules linked in:
> > > > [3.677285] Hardware name: Freescale i.MX8QM MEK (DT)
> > > > [3.677302] Workqueue: events deferred_probe_work_func
> > > > [3.684253] imx6q-pcie 5f00.pcie: PCI host bridge to bus :00
> > > > [3.688297] pstate: 6005 (nZCv daif -PAN -UAO)
> > > > [3.688310] pc : xen_swiotlb_free_coherent+0x180/0x1c0
> > > > [3.693993] pci_bus :00: root bus resource [bus 00-ff]
> > > > [3.701002] lr : xen_swiotlb_free_coherent+0x44/0x1c0
> > > > "
> > > > 
> > > > In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask)
> > > > or range_straddles_page_boundary(phys, size) are true, it will create
> > > > contiguous region. So when free, we need to free contiguous region use
> > > > upper check condition.
> > > 
> > > No, this will break PV guests on x86.
> > 
> > Could you share more details why alloc and free not matching for the check?
> 
> xen_create_contiguous_region() is needed only in case:
> 
> - the bus address is not within dma_mask, or
> - the memory region is not physically contiguous (can happen only for
>   PV guests)
> 
> In any case it should arrange for the memory to be suitable for the
> DMA operation, so to be contiguous and within dma_mask afterwards. So
> xen_destroy_contiguous_region() should only ever called for areas
> which match above criteria, as otherwise we can be sure
> xen_create_contiguous_region() was not used for making the area DMA-able
> in the beginning.
> 
> And this is very important in the PV case, as in those guests the page
> tables are containing the host-PFNs, not the guest-PFNS, and
> xen_create_contiguous_region() will fiddle with host- vs. guest-PFN
> arrangements, and xen_destroy_contiguous_region() is reverting this
> fiddling. Any call of xen_destroy_contiguous_region() for an area it
> was not intended to be called for might swap physical pages beneath
> random virtual addresses, which was the reason for this test to be
> added by me.
> 
> 
> Juergen
> 
> > 
> > Thanks,
> > Peng.
> > 
> > > 
> > > I think there is something wrong with your setup in combination with the 
> > > ARM
> > > xen_create_contiguous_region() implementation.
> > > 
> > > Stefano?
> > > 
> > > 
> > > Juergen
> > > 
> > > > 
> > > > Signed-off-by: Peng Fan 
> > > > ---
> > > >drivers/xen/swiotlb-xen.c | 4 ++--
> > > >1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > > > index b6d27762c6f8..ab96e468584f 100644
> > > > --- a/drivers/xen/swiotlb-xen.c
> > > > +++ b/drivers/xen/swiotlb-xen.c
> > > > @@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev,
> > > size_t size, void *vaddr,
> > > > /* Convert the size to actually allocated. */
> > > > size = 1UL << (order + XEN_PAGE_SHIFT);
> > > > 
> > > > -   if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
> > > > -range_straddles_page_boundary(phys, size)) &&
> > > > +   if (((dev_addr + size - 1 > dma_mask) ||
> > > > +   range_straddles_page_boundary(phys, size)) &&
> > > > TestClearPageXenRemapped(virt_to_page(vaddr)))
> > > > xen_destroy_contiguous_region(phys, order);
> > > > 
> > > > 
> > 
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread Michael S. Tsirkin
On Tue, Apr 28, 2020 at 05:09:18PM +0530, Srivatsa Vaddagiri wrote:
> For better security, its desirable that a guest VM's memory is
> not accessible to any entity that executes outside the context of
> guest VM. In case of virtio, backend drivers execute outside the
> context of guest VM and in general will need access to complete
> guest VM memory.  One option to restrict the access provided to
> backend driver is to make use of a bounce buffer. The bounce
> buffer is accessible to both backend and frontend drivers. All IO
> buffers that are in private space of guest VM are bounced to be
> accessible to backend.
> 
> This patch proposes a new memory pool to be used for this bounce
> purpose, rather than the default swiotlb memory pool. That will
> avoid any conflicts that may arise in situations where a VM needs
> to use swiotlb pool for driving any pass-through devices (in
> which case swiotlb memory needs not be shared with another VM) as
> well as virtio devices (which will require swiotlb memory to be
> shared with backend VM). As a possible extension to this patch,
> we can provide an option for virtio to make use of default
> swiotlb memory pool itself, where no such conflicts may exist in
> a given deployment.
> 
> Signed-off-by: Srivatsa Vaddagiri 


Okay, but how is all this virtio specific?  For example, why not allow
separate swiotlbs for any type of device?
For example, this might make sense if a given device is from a
different, less trusted vendor.
All this can then maybe be hidden behind the DMA API.



> ---
>  drivers/virtio/Makefile|   2 +-
>  drivers/virtio/virtio.c|   2 +
>  drivers/virtio/virtio_bounce.c | 150 
> +
>  include/linux/virtio.h |   4 ++
>  4 files changed, 157 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/virtio/virtio_bounce.c
> 
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 29a1386e..3fd3515 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0
> -obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
> +obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o virtio_bounce.o
>  obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
>  obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
>  virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index a977e32..bc2f779 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -329,6 +329,7 @@ int register_virtio_device(struct virtio_device *dev)
>  
>   dev->index = err;
>   dev_set_name(>dev, "virtio%u", dev->index);
> + virtio_bounce_set_dma_ops(dev);
>  
>   spin_lock_init(>config_lock);
>   dev->config_enabled = false;
> @@ -431,6 +432,7 @@ EXPORT_SYMBOL_GPL(virtio_device_restore);
>  
>  static int virtio_init(void)
>  {
> + virtio_map_bounce_buffer();
>   if (bus_register(_bus) != 0)
>   panic("virtio bus registration failed");
>   return 0;
> diff --git a/drivers/virtio/virtio_bounce.c b/drivers/virtio/virtio_bounce.c
> new file mode 100644
> index 000..3de8e0e
> --- /dev/null
> +++ b/drivers/virtio/virtio_bounce.c
> @@ -0,0 +1,150 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Virtio DMA ops to bounce buffers
> + *
> + *  Copyright (c) 2020, The Linux Foundation. All rights reserved.
> + *
> + * This module allows bouncing of IO buffers to a region which will be
> + * accessible to backend drivers.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static phys_addr_t bounce_buf_paddr;
> +static void *bounce_buf_vaddr;
> +static size_t bounce_buf_size;
> +struct swiotlb_pool *virtio_pool;
> +
> +#define VIRTIO_MAX_BOUNCE_SIZE   (16*4096)
> +
> +static void *virtio_alloc_coherent(struct device *dev, size_t size,
> + dma_addr_t *dma_handle, gfp_t gfp_flags, unsigned long attrs)
> +{
> + phys_addr_t addr;
> +
> + if (!virtio_pool)
> + return NULL;
> +
> + addr = swiotlb_alloc(virtio_pool, size, bounce_buf_paddr, ULONG_MAX);
> + if (addr == DMA_MAPPING_ERROR)
> + return NULL;
> +
> + *dma_handle = (addr - bounce_buf_paddr);
> +
> + return bounce_buf_vaddr + (addr - bounce_buf_paddr);
> +}
> +
> +static void virtio_free_coherent(struct device *dev, size_t size, void 
> *vaddr,
> + dma_addr_t dma_handle, unsigned long attrs)
> +{
> + phys_addr_t addr = (dma_handle + bounce_buf_paddr);
> +
> + swiotlb_free(virtio_pool, addr, size);
> +}
> +
> +static dma_addr_t virtio_map_page(struct device *dev, struct page *page,
> + unsigned long offset, size_t size,
> + enum dma_data_direction dir, unsigned long attrs)
> +{
> + void *ptr = page_address(page) + offset;
> + phys_addr_t paddr = virt_to_phys(ptr);
> + dma_addr_t handle;
> +
> + if (!virtio_pool)
> + 

Re: [RFC 00/17] DRM: fix struct sg_table nents vs. orig_nents misuse

2020-04-28 Thread Robin Murphy

On 2020-04-28 4:32 pm, Daniel Vetter wrote:

On Tue, Apr 28, 2020 at 04:02:57PM +0200, Christoph Hellwig wrote:

On Tue, Apr 28, 2020 at 03:19:48PM +0200, Marek Szyprowski wrote:

1. introduce a dma_{map,sync,unmap}_sgtable() wrappers, which will use
a proper sg_table entries and call respective DMA-mapping functions
and adapt current code to it


That sounds reasonable to me.  Those could be pretty trivial wrappers.




2. rename nents and orig_nents to nr_pages, nr_dmas to clearly state
which one refers to which part of the scatterlist; I'm open for
other names for those entries


nr_cpu_ents and nr_dma_ents might be better names, but it still would be
a whole lot of churn for little gain.  I think just good wrappers like
suggested above might be more helpful.


I guess long-term we could aim for both? I.e. roll out better wrappers
first, once that's soaked through the tree, rename the last offenders.


Yes, that's what I was thinking too - most of these uses are just 
passing them in and out of the DMA APIs, and thus would be subsumed into 
the wrappers anyway, then in the relatively few remaining places where 
the table is actually iterated for one reason or the other, renaming 
would stand to help review and maintenance in terms of making it far 
more obvious when the implementation and the intent don't match.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 00/17] DRM: fix struct sg_table nents vs. orig_nents misuse

2020-04-28 Thread Daniel Vetter
On Tue, Apr 28, 2020 at 04:02:57PM +0200, Christoph Hellwig wrote:
> On Tue, Apr 28, 2020 at 03:19:48PM +0200, Marek Szyprowski wrote:
> > 1. introduce a dma_{map,sync,unmap}_sgtable() wrappers, which will use
> >a proper sg_table entries and call respective DMA-mapping functions
> >and adapt current code to it
> 
> That sounds reasonable to me.  Those could be pretty trivial wrappers.
> 
> >
> > 
> > 2. rename nents and orig_nents to nr_pages, nr_dmas to clearly state
> >which one refers to which part of the scatterlist; I'm open for
> >other names for those entries
> 
> nr_cpu_ents and nr_dma_ents might be better names, but it still would be
> a whole lot of churn for little gain.  I think just good wrappers like
> suggested above might be more helpful.

I guess long-term we could aim for both? I.e. roll out better wrappers
first, once that's soaked through the tree, rename the last offenders.

Personally I like nr_cpu_ents and nr_dma_ents, that's about as clear as it
gets.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 10/17] drm: radeon: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Christian König

Am 28.04.20 um 15:19 schrieb Marek Szyprowski:

The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/radeon/radeon_ttm.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 5d50c9e..4770880 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -481,7 +481,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
  {
struct radeon_device *rdev = radeon_get_rdev(ttm->bdev);
struct radeon_ttm_tt *gtt = (void *)ttm;
-   unsigned pinned = 0, nents;
+   unsigned pinned = 0;
int r;
  
  	int write = !(gtt->userflags & RADEON_GEM_USERPTR_READONLY);

@@ -522,8 +522,8 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
goto release_sg;
  
  	r = -ENOMEM;

-   nents = dma_map_sg(rdev->dev, ttm->sg->sgl, ttm->sg->nents, direction);
-   if (nents == 0)
+   ttm->sg->nents = dma_map_sg(rdev->dev, ttm->sg->sgl,
+   if (ttm->sg->nents == 0)
goto release_sg;
  
  	drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,

@@ -554,9 +554,9 @@ static void radeon_ttm_tt_unpin_userptr(struct ttm_tt *ttm)
return;
  
  	/* free the sg table and pages again */

-   dma_unmap_sg(rdev->dev, ttm->sg->sgl, ttm->sg->nents, direction);
+   dma_unmap_sg(rdev->dev, ttm->sg->sgl, ttm->sg->orig_nents, direction);
  
-	for_each_sg_page(ttm->sg->sgl, _iter, ttm->sg->nents, 0) {

+   for_each_sg_page(ttm->sg->sgl, _iter, ttm->sg->orig_nents, 0) {
struct page *page = sg_page_iter_page(_iter);
if (!(gtt->userflags & RADEON_GEM_USERPTR_READONLY))
set_page_dirty(page);


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC 02/17] drm: amdgpu: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Christian König

Am 28.04.20 um 15:19 schrieb Marek Szyprowski:

The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 7 ---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 8 
  2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index 43d8ed7..4df813e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -307,8 +307,9 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
dma_buf_attachment *attach,
if (IS_ERR(sgt))
return sgt;
  
-		if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,

- DMA_ATTR_SKIP_CPU_SYNC))
+   sgt->nents = dma_map_sg_attrs(attach->dev, sgt->sgl, 
sgt->orig_nents,
+ dir, DMA_ATTR_SKIP_CPU_SYNC);
+   if (!sgt->nents)
goto error_free;
break;
  
@@ -349,7 +350,7 @@ static void amdgpu_dma_buf_unmap(struct dma_buf_attachment *attach,

struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
  
  	if (sgt->sgl->page_link) {

-   dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir);
+   dma_unmap_sg(attach->dev, sgt->sgl, sgt->orig_nents, dir);
sg_free_table(sgt);
kfree(sgt);
} else {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index d5543c2..5f31585 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1043,7 +1043,6 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm)
  {
struct amdgpu_device *adev = amdgpu_ttm_adev(ttm->bdev);
struct amdgpu_ttm_tt *gtt = (void *)ttm;
-   unsigned nents;
int r;
  
  	int write = !(gtt->userflags & AMDGPU_GEM_USERPTR_READONLY);

@@ -1059,8 +1058,9 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm)
  
  	/* Map SG to device */

r = -ENOMEM;
-   nents = dma_map_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, direction);
-   if (nents == 0)
+   ttm->sg->nents = dma_map_sg(adev->dev, ttm->sg->sgl,
+   ttm->sg->orig_nents, direction);
+   if (ttm->sg->nents == 0)
goto release_sg;
  
  	/* convert SG to linear array of pages and dma addresses */

@@ -1091,7 +1091,7 @@ static void amdgpu_ttm_tt_unpin_userptr(struct ttm_tt 
*ttm)
return;
  
  	/* unmap the pages mapped to the device */

-   dma_unmap_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, direction);
+   dma_unmap_sg(adev->dev, ttm->sg->sgl, ttm->sg->orig_nents, direction);
  
  	sg_free_table(ttm->sg);
  


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [Intel-gfx] [RFC 06/17] drm: i915: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Tvrtko Ursulin



On 28/04/2020 14:19, Marek Szyprowski wrote:

The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 
---
  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c   | 13 +++--
  drivers/gpu/drm/i915/gem/i915_gem_internal.c |  4 ++--
  drivers/gpu/drm/i915/gem/i915_gem_region.c   |  4 ++--
  drivers/gpu/drm/i915/gem/i915_gem_shmem.c|  5 +++--
  drivers/gpu/drm/i915/gem/selftests/huge_pages.c  | 10 +-
  drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c |  5 +++--
  drivers/gpu/drm/i915/gt/intel_ggtt.c | 12 ++--
  drivers/gpu/drm/i915/i915_gem_gtt.c  | 12 +++-
  drivers/gpu/drm/i915/i915_scatterlist.c  |  4 ++--
  drivers/gpu/drm/i915/selftests/scatterlist.c |  8 
  10 files changed, 41 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 7db5a79..d829852 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -36,21 +36,22 @@ static struct sg_table *i915_gem_map_dma_buf(struct 
dma_buf_attachment *attachme
goto err_unpin_pages;
}
  
-	ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL);

+   ret = sg_alloc_table(st, obj->mm.pages->orig_nents, GFP_KERNEL);
if (ret)
goto err_free;
  
  	src = obj->mm.pages->sgl;

dst = st->sgl;
-   for (i = 0; i < obj->mm.pages->nents; i++) {
+   for (i = 0; i < obj->mm.pages->orig_nents; i++) {
sg_set_page(dst, sg_page(src), src->length, 0);
dst = sg_next(dst);
src = sg_next(src);
}
  
-	if (!dma_map_sg_attrs(attachment->dev,

- st->sgl, st->nents, dir,
- DMA_ATTR_SKIP_CPU_SYNC)) {
+   st->nents = dma_map_sg_attrs(attachment->dev,
+st->sgl, st->orig_nents, dir,
+DMA_ATTR_SKIP_CPU_SYNC);
+   if (!st->nents) {
ret = -ENOMEM;
goto err_free_sg;
}
@@ -74,7 +75,7 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment 
*attachment,
struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf);
  
  	dma_unmap_sg_attrs(attachment->dev,

-  sg->sgl, sg->nents, dir,
+  sg->sgl, sg->orig_nents, dir,
   DMA_ATTR_SKIP_CPU_SYNC);
sg_free_table(sg);
kfree(sg);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c 
b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index cbbff81..a8ebfdd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -73,7 +73,7 @@ static int i915_gem_object_get_pages_internal(struct 
drm_i915_gem_object *obj)
}
  
  	sg = st->sgl;

-   st->nents = 0;
+   st->nents = st->orig_nents = 0;
sg_page_sizes = 0;
  
  	do {

@@ -94,7 +94,7 @@ static int i915_gem_object_get_pages_internal(struct 
drm_i915_gem_object *obj)
  
  		sg_set_page(sg, page, PAGE_SIZE << order, 0);

sg_page_sizes |= PAGE_SIZE << order;
-   st->nents++;
+   st->nents = st->orig_nents = st->nents + 1;
  
  		npages -= 1 << order;

if (!npages) {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c 
b/drivers/gpu/drm/i915/gem/i915_gem_region.c
index 1515384..58ca560 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_region.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c
@@ -53,7 +53,7 @@
GEM_BUG_ON(list_empty(blocks));
  
  	sg = st->sgl;

-   st->nents = 0;
+   st->nents = st->orig_nents = 0;
sg_page_sizes = 0;
prev_end = (resource_size_t)-1;
  
@@ -78,7 +78,7 @@
  
  			sg->length = block_size;
  
-			st->nents++;

+   st->nents = st->orig_nents = st->nents + 1;
} else {
sg->length += block_size;
sg_dma_len(sg) += block_size;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c 
b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 5d5d7ee..851a732 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -80,7 +80,7 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj)
noreclaim |= __GFP_NORETRY | __GFP_NOWARN;
  
  	sg = st->sgl;

-   st->nents = 0;
+   st->nents = st->orig_nents = 0;
sg_page_sizes = 0;
for (i = 0; i < page_count; i++) {
 

Re: [RFC 00/17] DRM: fix struct sg_table nents vs. orig_nents misuse

2020-04-28 Thread Christoph Hellwig
On Tue, Apr 28, 2020 at 03:19:48PM +0200, Marek Szyprowski wrote:
> 1. introduce a dma_{map,sync,unmap}_sgtable() wrappers, which will use
>a proper sg_table entries and call respective DMA-mapping functions
>and adapt current code to it

That sounds reasonable to me.  Those could be pretty trivial wrappers.

>
> 
> 2. rename nents and orig_nents to nr_pages, nr_dmas to clearly state
>which one refers to which part of the scatterlist; I'm open for
>other names for those entries

nr_cpu_ents and nr_dma_ents might be better names, but it still would be
a whole lot of churn for little gain.  I think just good wrappers like
suggested above might be more helpful.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC 08/17] drm: msm: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/msm/msm_gem.c   | 8 
 drivers/gpu/drm/msm/msm_iommu.c | 3 ++-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 5a6a79f..54c3bbb 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -54,10 +54,10 @@ static void sync_for_device(struct msm_gem_object *msm_obj)
 
if (get_dma_ops(dev) && IS_ENABLED(CONFIG_ARM64)) {
dma_sync_sg_for_device(dev, msm_obj->sgt->sgl,
-   msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
+   msm_obj->sgt->orig_nents, DMA_BIDIRECTIONAL);
} else {
dma_map_sg(dev, msm_obj->sgt->sgl,
-   msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
+   msm_obj->sgt->orig_nents, DMA_BIDIRECTIONAL);
}
 }
 
@@ -67,10 +67,10 @@ static void sync_for_cpu(struct msm_gem_object *msm_obj)
 
if (get_dma_ops(dev) && IS_ENABLED(CONFIG_ARM64)) {
dma_sync_sg_for_cpu(dev, msm_obj->sgt->sgl,
-   msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
+   msm_obj->sgt->orig_nents, DMA_BIDIRECTIONAL);
} else {
dma_unmap_sg(dev, msm_obj->sgt->sgl,
-   msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
+   msm_obj->sgt->orig_nents, DMA_BIDIRECTIONAL);
}
 }
 
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index ad58cfe..b0ca084 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -43,7 +43,8 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
struct msm_iommu *iommu = to_msm_iommu(mmu);
size_t ret;
 
-   ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
+   ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->orig_nents,
+  prot);
WARN_ON(!ret);
 
return (ret == len) ? 0 : -EINVAL;
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC 00/17] DRM: fix struct sg_table nents vs. orig_nents misuse

2020-04-28 Thread Marek Szyprowski
Dear All,

During the Exynos DRM GEM rework and fixing the issues in the 
drm_prime_sg_to_page_addr_arrays() function [1] I've noticed that most
drivers in DRM framework incorrectly use nents and orig_nents entries of
the struct sg_table.

In case of the most DMA-mapping implementations exchanging those two
entries or using nents for all loops on the scatterlist is harmless,
because they both have the same value. There exists however a DMA-mapping
implementations, for which such incorrect usage breaks things. The nents
returned by dma_map_sg() might be lower than the nents passed as its
parameter and this is perfectly fine. DMA framework or IOMMU is allowed
to join consecutive chunks while mapping if such operation is supported
by the underlying HW (bus, bridge, IOMMU, etc). Example of the case
where dma_map_sg() might return 1 'DMA' chunk for the 4 'physical' pages
is described here [2]

The DMA-mapping framework documentation [3] states that dma_map_sg()
returns the numer of the created entries in the DMA address space.
However the subsequent calls to dma_sync_sg_for_{device,cpu} and
dma_unmap_sg must be called with the original number of entries passed to
dma_map_sg. The common pattern in DRM drivers were to assign the
dma_map_sg() return value to sg_table->nents and use that value for
the subsequent calls to dma_sync_sg_* or dma_unmap_sg functions. Also
the code iterated over nents times to access the pages stored in the
processed scatterlist, while it should use orig_nents as the numer of
the page entries.

I've tried to identify all such incorrect usage of sg_table->nents in the
DRM subsystem and this is a result of my research. It looks that the
incorrect pattern has been copied over the many drivers. Too bad in most
cases it even worked correctly if the system used simple, linear
DMA-mapping implementation, for which swapping nents and orig_nents
doesn't make any difference.

I wonder what to do to avoid such misuse in the future, as this case
clearly shows that the current sg_table structure is a bit hard to
understand. I have the following ideas and I would like to know your
opinion before I prepare more patches and check sg_table usage all over
the kernel:

1. introduce a dma_{map,sync,unmap}_sgtable() wrappers, which will use
   a proper sg_table entries and call respective DMA-mapping functions
   and adapt current code to it

2. rename nents and orig_nents to nr_pages, nr_dmas to clearly state
   which one refers to which part of the scatterlist; I'm open for
   other names for those entries

I will appreciate your comments.

Patches are based on top of Linux next-20200428. I've reduced the
recipients list mainly to mailing lists, the next version I will try to
send to the maintainers of the respective drivers.

Best regards,
Marek Szyprowski


References:

[1] https://lkml.org/lkml/2020/3/27/555 
[2] https://lkml.org/lkml/2020/3/29/65
[3] Documentation/DMA-API-HOWTO.txt


Patch summary:

Marek Szyprowski (17):
  drm: core: fix sg_table nents vs. orig_nents usage
  drm: amdgpu: fix sg_table nents vs. orig_nents usage
  drm: armada: fix sg_table nents vs. orig_nents usage
  drm: etnaviv: fix sg_table nents vs. orig_nents usage
  drm: exynos: fix sg_table nents vs. orig_nents usage
  drm: i915: fix sg_table nents vs. orig_nents usage
  drm: lima: fix sg_table nents vs. orig_nents usage
  drm: msm: fix sg_table nents vs. orig_nents usage
  drm: panfrost: fix sg_table nents vs. orig_nents usage
  drm: radeon: fix sg_table nents vs. orig_nents usage
  drm: rockchip: fix sg_table nents vs. orig_nents usage
  drm: tegra: fix sg_table nents vs. orig_nents usage
  drm: virtio: fix sg_table nents vs. orig_nents usage
  drm: vmwgfx: fix sg_table nents vs. orig_nents usage
  drm: xen: fix sg_table nents vs. orig_nents usage
  drm: host1x: fix sg_table nents vs. orig_nents usage
  dmabuf: fix sg_table nents vs. orig_nents usage

 drivers/dma-buf/heaps/heap-helpers.c |  7 ---
 drivers/dma-buf/udmabuf.c|  5 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c  |  7 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c  |  8 
 drivers/gpu/drm/armada/armada_gem.c  | 14 -
 drivers/gpu/drm/drm_cache.c  |  2 +-
 drivers/gpu/drm/drm_gem_shmem_helper.c   |  7 ---
 drivers/gpu/drm/drm_prime.c  |  9 +
 drivers/gpu/drm/etnaviv/etnaviv_gem.c| 10 ++
 drivers/gpu/drm/exynos/exynos_drm_g2d.c  |  7 ---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c   | 13 ++--
 drivers/gpu/drm/i915/gem/i915_gem_internal.c |  4 ++--
 drivers/gpu/drm/i915/gem/i915_gem_region.c   |  4 ++--
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c|  5 +++--
 drivers/gpu/drm/i915/gem/selftests/huge_pages.c  | 10 +-
 drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c |  5 +++--
 drivers/gpu/drm/i915/gt/intel_ggtt.c | 12 ++--
 drivers/g

[RFC 12/17] drm: tegra: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/tegra/gem.c   | 25 +
 drivers/gpu/drm/tegra/plane.c | 13 +++--
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index 6237681..5710ab4 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -98,8 +98,8 @@ static struct sg_table *tegra_bo_pin(struct device *dev, 
struct host1x_bo *bo,
 * the SG table needs to be copied to avoid overwriting any
 * other potential users of the original SG table.
 */
-   err = sg_alloc_table_from_sg(sgt, obj->sgt->sgl, 
obj->sgt->nents,
-GFP_KERNEL);
+   err = sg_alloc_table_from_sg(sgt, obj->sgt->sgl,
+obj->sgt->orig_nents, GFP_KERNEL);
if (err < 0)
goto free;
} else {
@@ -197,7 +197,7 @@ static int tegra_bo_iommu_map(struct tegra_drm *tegra, 
struct tegra_bo *bo)
bo->iova = bo->mm->start;
 
bo->size = iommu_map_sg(tegra->domain, bo->iova, bo->sgt->sgl,
-   bo->sgt->nents, prot);
+   bo->sgt->orig_nents, prot);
if (!bo->size) {
dev_err(tegra->drm->dev, "failed to map buffer\n");
err = -ENOMEM;
@@ -264,7 +264,7 @@ static struct tegra_bo *tegra_bo_alloc_object(struct 
drm_device *drm,
 static void tegra_bo_free(struct drm_device *drm, struct tegra_bo *bo)
 {
if (bo->pages) {
-   dma_unmap_sg(drm->dev, bo->sgt->sgl, bo->sgt->nents,
+   dma_unmap_sg(drm->dev, bo->sgt->sgl, bo->sgt->orig_nents,
 DMA_FROM_DEVICE);
drm_gem_put_pages(>gem, bo->pages, true, true);
sg_free_table(bo->sgt);
@@ -290,9 +290,9 @@ static int tegra_bo_get_pages(struct drm_device *drm, 
struct tegra_bo *bo)
goto put_pages;
}
 
-   err = dma_map_sg(drm->dev, bo->sgt->sgl, bo->sgt->nents,
-DMA_FROM_DEVICE);
-   if (err == 0) {
+   bo->sgt->nents = dma_map_sg(drm->dev, bo->sgt->sgl, bo->sgt->orig_nents,
+   DMA_FROM_DEVICE);
+   if (bo->sgt->nents == 0) {
err = -EFAULT;
goto free_sgt;
}
@@ -571,7 +571,8 @@ int tegra_drm_mmap(struct file *file, struct vm_area_struct 
*vma)
goto free;
}
 
-   if (dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir) == 0)
+   sgt->nents = dma_map_sg(attach->dev, sgt->sgl, sgt->orig_nents, dir);
+   if (sgt->nents == 0)
goto free;
 
return sgt;
@@ -590,7 +591,7 @@ static void tegra_gem_prime_unmap_dma_buf(struct 
dma_buf_attachment *attach,
struct tegra_bo *bo = to_tegra_bo(gem);
 
if (bo->pages)
-   dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir);
+   dma_unmap_sg(attach->dev, sgt->sgl, sgt->orig_nents, dir);
 
sg_free_table(sgt);
kfree(sgt);
@@ -609,7 +610,7 @@ static int tegra_gem_prime_begin_cpu_access(struct dma_buf 
*buf,
struct drm_device *drm = gem->dev;
 
if (bo->pages)
-   dma_sync_sg_for_cpu(drm->dev, bo->sgt->sgl, bo->sgt->nents,
+   dma_sync_sg_for_cpu(drm->dev, bo->sgt->sgl, bo->sgt->orig_nents,
DMA_FROM_DEVICE);
 
return 0;
@@ -623,8 +624,8 @@ static int tegra_gem_prime_end_cpu_access(struct dma_buf 
*buf,
struct drm_device *drm = gem->dev;
 
if (bo->pages)
-   dma_sync_sg_for_device(drm->dev, bo->sgt->sgl, bo->sgt->nents,
-  DMA_TO_DEVICE);
+   dma_sync_sg_for_device(drm->dev, bo->sgt->sgl,
+  bo->sgt->orig_nents, DMA_TO_DEVICE);
 
return 0;
 }
diff --git a/drivers/gpu/drm/tegra/plane.c b/drivers/gpu/drm/tegra/plane.c
index 9ccfb56..3982bf8 100644
--- a/drivers/gpu/drm/tegra/plane.c
+++ b/drivers/gpu/drm/tegra/plane.c
@@ -130,9 +130,10 @@ static int tegra_dc_pin(struct tegra_dc *dc, struct 
tegra_plane_state *state)
}
 
if (sgt) {
-   err = dma_map_sg(dc->dev, sgt->sgl, sgt->nents,
-DMA_TO_DEVICE);
-   if (err == 0) {
+   sgt->nents = dma_map_sg(dc->dev, sgt->sgl,
+   

[RFC 17/17] dmabuf: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 
---
 drivers/dma-buf/heaps/heap-helpers.c | 7 ---
 drivers/dma-buf/udmabuf.c| 5 +++--
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/dma-buf/heaps/heap-helpers.c 
b/drivers/dma-buf/heaps/heap-helpers.c
index 9f964ca..b923863 100644
--- a/drivers/dma-buf/heaps/heap-helpers.c
+++ b/drivers/dma-buf/heaps/heap-helpers.c
@@ -144,8 +144,9 @@ struct sg_table *dma_heap_map_dma_buf(struct 
dma_buf_attachment *attachment,
 
table = >table;
 
-   if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
-   direction))
+   table->nents = dma_map_sg(attachment->dev, table->sgl,
+ table->orig_nents, direction);
+   if (!table->nents)
table = ERR_PTR(-ENOMEM);
return table;
 }
@@ -154,7 +155,7 @@ static void dma_heap_unmap_dma_buf(struct 
dma_buf_attachment *attachment,
   struct sg_table *table,
   enum dma_data_direction direction)
 {
-   dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
+   dma_unmap_sg(attachment->dev, table->sgl, table->orig_nents, direction);
 }
 
 static vm_fault_t dma_heap_vm_fault(struct vm_fault *vmf)
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index acb26c6..ea0cf71 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -63,7 +63,8 @@ static struct sg_table *get_sg_table(struct device *dev, 
struct dma_buf *buf,
GFP_KERNEL);
if (ret < 0)
goto err;
-   if (!dma_map_sg(dev, sg->sgl, sg->nents, direction)) {
+   sg->nents = dma_map_sg(dev, sg->sgl, sg->orig_nents, direction);
+   if (!sg->nents) {
ret = -EINVAL;
goto err;
}
@@ -78,7 +79,7 @@ static struct sg_table *get_sg_table(struct device *dev, 
struct dma_buf *buf,
 static void put_sg_table(struct device *dev, struct sg_table *sg,
 enum dma_data_direction direction)
 {
-   dma_unmap_sg(dev, sg->sgl, sg->nents, direction);
+   dma_unmap_sg(dev, sg->sgl, sg->orig_nents, direction);
sg_free_table(sg);
kfree(sg);
 }
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC 03/17] drm: armada: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/armada/armada_gem.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/armada/armada_gem.c 
b/drivers/gpu/drm/armada/armada_gem.c
index 976685f..749647f 100644
--- a/drivers/gpu/drm/armada/armada_gem.c
+++ b/drivers/gpu/drm/armada/armada_gem.c
@@ -407,8 +407,10 @@ int armada_gem_pwrite_ioctl(struct drm_device *dev, void 
*data,
sg_set_page(sg, page, PAGE_SIZE, 0);
}
 
-   if (dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir) == 0) {
-   num = sgt->nents;
+   sgt->nents = dma_map_sg(attach->dev, sgt->sgl, sgt->orig_nents,
+   dir);
+   if (sgt->nents == 0) {
+   num = sgt->orig_nents;
goto release;
}
} else if (dobj->page) {
@@ -418,7 +420,9 @@ int armada_gem_pwrite_ioctl(struct drm_device *dev, void 
*data,
 
sg_set_page(sgt->sgl, dobj->page, dobj->obj.size, 0);
 
-   if (dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir) == 0)
+   sgt->nents = dma_map_sg(attach->dev, sgt->sgl, sgt->orig_nents,
+   dir);
+   if (sgt->nents == 0)
goto free_table;
} else if (dobj->linear) {
/* Single contiguous physical region - no struct page */
@@ -449,11 +453,11 @@ static void armada_gem_prime_unmap_dma_buf(struct 
dma_buf_attachment *attach,
int i;
 
if (!dobj->linear)
-   dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir);
+   dma_unmap_sg(attach->dev, sgt->sgl, sgt->orig_nents, dir);
 
if (dobj->obj.filp) {
struct scatterlist *sg;
-   for_each_sg(sgt->sgl, sg, sgt->nents, i)
+   for_each_sg(sgt->sgl, sg, sgt->orig_nents, i)
put_page(sg_page(sg));
}
 
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC 13/17] drm: virtio: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/virtio/virtgpu_object.c | 11 ++-
 drivers/gpu/drm/virtio/virtgpu_vq.c |  8 
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c 
b/drivers/gpu/drm/virtio/virtgpu_object.c
index 6ccbd01..12f6bee 100644
--- a/drivers/gpu/drm/virtio/virtgpu_object.c
+++ b/drivers/gpu/drm/virtio/virtgpu_object.c
@@ -73,7 +73,8 @@ void virtio_gpu_cleanup_object(struct virtio_gpu_object *bo)
if (shmem->pages) {
if (shmem->mapped) {
dma_unmap_sg(vgdev->vdev->dev.parent,
-shmem->pages->sgl, shmem->mapped,
+shmem->pages->sgl,
+shmem->pages->orig_nents,
 DMA_TO_DEVICE);
shmem->mapped = 0;
}
@@ -157,13 +158,13 @@ static int virtio_gpu_object_shmem_init(struct 
virtio_gpu_device *vgdev,
}
 
if (use_dma_api) {
-   shmem->mapped = dma_map_sg(vgdev->vdev->dev.parent,
+   shmem->pages->nents = dma_map_sg(vgdev->vdev->dev.parent,
   shmem->pages->sgl,
-  shmem->pages->nents,
+  shmem->pages->orig_nents,
   DMA_TO_DEVICE);
-   *nents = shmem->mapped;
+   *nents = shmem->mapped = shmem->pages->nents;
} else {
-   *nents = shmem->pages->nents;
+   *nents = shmem->pages->orig_nents;
}
 
*ents = kmalloc_array(*nents, sizeof(struct virtio_gpu_mem_entry),
diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c 
b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 9e663a5..661325b 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -604,8 +604,8 @@ void virtio_gpu_cmd_transfer_to_host_2d(struct 
virtio_gpu_device *vgdev,
 
if (use_dma_api)
dma_sync_sg_for_device(vgdev->vdev->dev.parent,
-  shmem->pages->sgl, shmem->pages->nents,
-  DMA_TO_DEVICE);
+  shmem->pages->sgl,
+  shmem->pages->orig_nents, DMA_TO_DEVICE);
 
cmd_p = virtio_gpu_alloc_cmd(vgdev, , sizeof(*cmd_p));
memset(cmd_p, 0, sizeof(*cmd_p));
@@ -1020,8 +1020,8 @@ void virtio_gpu_cmd_transfer_to_host_3d(struct 
virtio_gpu_device *vgdev,
 
if (use_dma_api)
dma_sync_sg_for_device(vgdev->vdev->dev.parent,
-  shmem->pages->sgl, shmem->pages->nents,
-  DMA_TO_DEVICE);
+  shmem->pages->sgl,
+  shmem->pages->orig_nents, DMA_TO_DEVICE);
 
cmd_p = virtio_gpu_alloc_cmd(vgdev, , sizeof(*cmd_p));
memset(cmd_p, 0, sizeof(*cmd_p));
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC 14/17] drm: vmwgfx: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
index bf0bc46..a5fd128 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
@@ -362,7 +362,7 @@ static void vmw_ttm_unmap_from_dma(struct vmw_ttm_tt 
*vmw_tt)
 {
struct device *dev = vmw_tt->dev_priv->dev->dev;
 
-   dma_unmap_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.nents,
+   dma_unmap_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.orig_nents,
DMA_BIDIRECTIONAL);
vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents;
 }
@@ -449,10 +449,10 @@ static int vmw_ttm_map_dma(struct vmw_ttm_tt *vmw_tt)
if (unlikely(ret != 0))
goto out_sg_alloc_fail;
 
-   if (vsgt->num_pages > vmw_tt->sgt.nents) {
+   if (vsgt->num_pages > vmw_tt->sgt.orig_nents) {
uint64_t over_alloc =
sgl_size * (vsgt->num_pages -
-   vmw_tt->sgt.nents);
+   vmw_tt->sgt.orig_nents);
 
ttm_mem_global_free(glob, over_alloc);
vmw_tt->sg_alloc_size -= over_alloc;
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC 09/17] drm: panfrost: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/panfrost/panfrost_gem.c | 3 ++-
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 4 +++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c 
b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 17b654e..22fec7c 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -42,7 +42,8 @@ static void panfrost_gem_free_object(struct drm_gem_object 
*obj)
for (i = 0; i < n_sgt; i++) {
if (bo->sgts[i].sgl) {
dma_unmap_sg(pfdev->dev, bo->sgts[i].sgl,
-bo->sgts[i].nents, 
DMA_BIDIRECTIONAL);
+bo->sgts[i].orig_nents,
+DMA_BIDIRECTIONAL);
sg_free_table(>sgts[i]);
}
}
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index ed28aeb..2d9b1f9 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -517,7 +517,9 @@ static int panfrost_mmu_map_fault_addr(struct 
panfrost_device *pfdev, int as,
if (ret)
goto err_pages;
 
-   if (!dma_map_sg(pfdev->dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL)) {
+   sgt->nents = dma_map_sg(pfdev->dev, sgt->sgl, sgt->orig_nents,
+   DMA_BIDIRECTIONAL);
+   if (!sgt->nents) {
ret = -EINVAL;
goto err_map;
}
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC 05/17] drm: exynos: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/exynos/exynos_drm_g2d.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c 
b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index fcee33a..e27715c 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -396,7 +396,7 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
 
 out:
dma_unmap_sg(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt->sgl,
-   g2d_userptr->sgt->nents, DMA_BIDIRECTIONAL);
+g2d_userptr->sgt->orig_nents, DMA_BIDIRECTIONAL);
 
pages = frame_vector_pages(g2d_userptr->vec);
if (!IS_ERR(pages)) {
@@ -511,8 +511,9 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data 
*g2d,
 
g2d_userptr->sgt = sgt;
 
-   if (!dma_map_sg(to_dma_dev(g2d->drm_dev), sgt->sgl, sgt->nents,
-   DMA_BIDIRECTIONAL)) {
+   sgt->nents = dma_map_sg(to_dma_dev(g2d->drm_dev), sgt->sgl,
+   sgt->orig_nents, DMA_BIDIRECTIONAL)
+   if (!sgt->nents) {
DRM_DEV_ERROR(g2d->dev, "failed to map sgt with dma region.\n");
ret = -ENOMEM;
goto err_sg_free_table;
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC 04/17] drm: etnaviv: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/etnaviv/etnaviv_gem.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index dc9ef30..a224a97 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -27,7 +27,8 @@ static void etnaviv_gem_scatter_map(struct etnaviv_gem_object 
*etnaviv_obj)
 * because display controller, GPU, etc. are not coherent.
 */
if (etnaviv_obj->flags & ETNA_BO_CACHE_MASK)
-   dma_map_sg(dev->dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
+   sgt->nents = dma_map_sg(dev->dev, sgt->sgl, sgt->orig_nents,
+   DMA_BIDIRECTIONAL);
 }
 
 static void etnaviv_gem_scatterlist_unmap(struct etnaviv_gem_object 
*etnaviv_obj)
@@ -51,7 +52,8 @@ static void etnaviv_gem_scatterlist_unmap(struct 
etnaviv_gem_object *etnaviv_obj
 * discard those writes.
 */
if (etnaviv_obj->flags & ETNA_BO_CACHE_MASK)
-   dma_unmap_sg(dev->dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
+   dma_unmap_sg(dev->dev, sgt->sgl, sgt->orig_nents,
+DMA_BIDIRECTIONAL);
 }
 
 /* called with etnaviv_obj->lock held */
@@ -405,7 +407,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
 
if (etnaviv_obj->flags & ETNA_BO_CACHED) {
dma_sync_sg_for_cpu(dev->dev, etnaviv_obj->sgt->sgl,
-   etnaviv_obj->sgt->nents,
+   etnaviv_obj->sgt->orig_nents,
etnaviv_op_to_dma_dir(op));
etnaviv_obj->last_cpu_prep_op = op;
}
@@ -422,7 +424,7 @@ int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
/* fini without a prep is almost certainly a userspace error */
WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
dma_sync_sg_for_device(dev->dev, etnaviv_obj->sgt->sgl,
-   etnaviv_obj->sgt->nents,
+   etnaviv_obj->sgt->orig_nents,
etnaviv_op_to_dma_dir(etnaviv_obj->last_cpu_prep_op));
etnaviv_obj->last_cpu_prep_op = 0;
}
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC 15/17] drm: xen: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/xen/xen_drm_front_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xen/xen_drm_front_gem.c 
b/drivers/gpu/drm/xen/xen_drm_front_gem.c
index f0b85e0..ba4bdc5 100644
--- a/drivers/gpu/drm/xen/xen_drm_front_gem.c
+++ b/drivers/gpu/drm/xen/xen_drm_front_gem.c
@@ -215,7 +215,7 @@ struct drm_gem_object *
return ERR_PTR(ret);
 
DRM_DEBUG("Imported buffer of size %zu with nents %u\n",
- size, sgt->nents);
+ size, sgt->orig_nents);
 
return _obj->base;
 }
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC 10/17] drm: radeon: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 5d50c9e..4770880 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -481,7 +481,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
 {
struct radeon_device *rdev = radeon_get_rdev(ttm->bdev);
struct radeon_ttm_tt *gtt = (void *)ttm;
-   unsigned pinned = 0, nents;
+   unsigned pinned = 0;
int r;
 
int write = !(gtt->userflags & RADEON_GEM_USERPTR_READONLY);
@@ -522,8 +522,8 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
goto release_sg;
 
r = -ENOMEM;
-   nents = dma_map_sg(rdev->dev, ttm->sg->sgl, ttm->sg->nents, direction);
-   if (nents == 0)
+   ttm->sg->nents = dma_map_sg(rdev->dev, ttm->sg->sgl,
+   if (ttm->sg->nents == 0)
goto release_sg;
 
drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
@@ -554,9 +554,9 @@ static void radeon_ttm_tt_unpin_userptr(struct ttm_tt *ttm)
return;
 
/* free the sg table and pages again */
-   dma_unmap_sg(rdev->dev, ttm->sg->sgl, ttm->sg->nents, direction);
+   dma_unmap_sg(rdev->dev, ttm->sg->sgl, ttm->sg->orig_nents, direction);
 
-   for_each_sg_page(ttm->sg->sgl, _iter, ttm->sg->nents, 0) {
+   for_each_sg_page(ttm->sg->sgl, _iter, ttm->sg->orig_nents, 0) {
struct page *page = sg_page_iter_page(_iter);
if (!(gtt->userflags & RADEON_GEM_USERPTR_READONLY))
set_page_dirty(page);
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC 16/17] drm: host1x: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/host1x/job.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index a10643a..3ea185e 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -166,8 +166,9 @@ static unsigned int pin_job(struct host1x *host, struct 
host1x_job *job)
goto unpin;
}
 
-   err = dma_map_sg(dev, sgt->sgl, sgt->nents, dir);
-   if (!err) {
+   sgt->nents = dma_map_sg(dev, sgt->sgl, sgt->orig_nents,
+   dir);
+   if (!sgt->nents) {
err = -ENOMEM;
goto unpin;
}
@@ -217,7 +218,7 @@ static unsigned int pin_job(struct host1x *host, struct 
host1x_job *job)
}
 
if (!IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL) && host->domain) {
-   for_each_sg(sgt->sgl, sg, sgt->nents, j)
+   for_each_sg(sgt->sgl, sg, sgt->orig_nents, j)
gather_size += sg->length;
gather_size = iova_align(>iova, gather_size);
 
@@ -231,7 +232,7 @@ static unsigned int pin_job(struct host1x *host, struct 
host1x_job *job)
 
err = iommu_map_sg(host->domain,
iova_dma_addr(>iova, alloc),
-   sgt->sgl, sgt->nents, IOMMU_READ);
+   sgt->sgl, sgt->orig_nents, IOMMU_READ);
if (err == 0) {
__free_iova(>iova, alloc);
err = -EINVAL;
@@ -241,9 +242,9 @@ static unsigned int pin_job(struct host1x *host, struct 
host1x_job *job)
job->unpins[job->num_unpins].size = gather_size;
phys_addr = iova_dma_addr(>iova, alloc);
} else if (sgt) {
-   err = dma_map_sg(host->dev, sgt->sgl, sgt->nents,
-DMA_TO_DEVICE);
-   if (!err) {
+   sgt->nents = dma_map_sg(host->dev, sgt->sgl,
+   sgt->orig_nents, DMA_TO_DEVICE);
+   if (!sgt->nents) {
err = -ENOMEM;
goto unpin;
}
@@ -647,7 +648,7 @@ void host1x_job_unpin(struct host1x_job *job)
}
 
if (unpin->dev && sgt)
-   dma_unmap_sg(unpin->dev, sgt->sgl, sgt->nents,
+   dma_unmap_sg(unpin->dev, sgt->sgl, sgt->orig_nents,
 unpin->dir);
 
host1x_bo_unpin(dev, unpin->bo, sgt);
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC 01/17] drm: core: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/drm_cache.c| 2 +-
 drivers/gpu/drm/drm_gem_shmem_helper.c | 7 ---
 drivers/gpu/drm/drm_prime.c| 9 +
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 03e01b0..63bd497 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -127,7 +127,7 @@ static void drm_cache_flush_clflush(struct page *pages[],
struct sg_page_iter sg_iter;
 
mb(); /*CLFLUSH is ordered only by using memory barriers*/
-   for_each_sg_page(st->sgl, _iter, st->nents, 0)
+   for_each_sg_page(st->sgl, _iter, st->orig_nents, 0)
drm_clflush_page(sg_page_iter_page(_iter));
mb(); /*Make sure that all cache line entry is flushed*/
 
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
b/drivers/gpu/drm/drm_gem_shmem_helper.c
index df31e57..f47caa7 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -118,7 +118,7 @@ void drm_gem_shmem_free_object(struct drm_gem_object *obj)
} else {
if (shmem->sgt) {
dma_unmap_sg(obj->dev->dev, shmem->sgt->sgl,
-shmem->sgt->nents, DMA_BIDIRECTIONAL);
+shmem->sgt->orig_nents, DMA_BIDIRECTIONAL);
sg_free_table(shmem->sgt);
kfree(shmem->sgt);
}
@@ -396,7 +396,7 @@ void drm_gem_shmem_purge_locked(struct drm_gem_object *obj)
WARN_ON(!drm_gem_shmem_is_purgeable(shmem));
 
dma_unmap_sg(obj->dev->dev, shmem->sgt->sgl,
-shmem->sgt->nents, DMA_BIDIRECTIONAL);
+shmem->sgt->orig_nents, DMA_BIDIRECTIONAL);
sg_free_table(shmem->sgt);
kfree(shmem->sgt);
shmem->sgt = NULL;
@@ -623,7 +623,8 @@ struct sg_table *drm_gem_shmem_get_pages_sgt(struct 
drm_gem_object *obj)
goto err_put_pages;
}
/* Map the pages for use by the h/w. */
-   dma_map_sg(obj->dev->dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
+   sgt->nents = dma_map_sg(obj->dev->dev, sgt->sgl, sgt->orig_nents,
+   DMA_BIDIRECTIONAL);
 
shmem->sgt = sgt;
 
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 282774e..f3e2d2a 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -626,8 +626,9 @@ struct sg_table *drm_gem_map_dma_buf(struct 
dma_buf_attachment *attach,
else
sgt = obj->dev->driver->gem_prime_get_sg_table(obj);
 
-   if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
- DMA_ATTR_SKIP_CPU_SYNC)) {
+   sgt->nents = dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->orig_nents,
+ dir, DMA_ATTR_SKIP_CPU_SYNC);
+   if (!sgt->nents) {
sg_free_table(sgt);
kfree(sgt);
sgt = ERR_PTR(-ENOMEM);
@@ -652,7 +653,7 @@ void drm_gem_unmap_dma_buf(struct dma_buf_attachment 
*attach,
if (!sgt)
return;
 
-   dma_unmap_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
+   dma_unmap_sg_attrs(attach->dev, sgt->sgl, sgt->orig_nents, dir,
   DMA_ATTR_SKIP_CPU_SYNC);
sg_free_table(sgt);
kfree(sgt);
@@ -975,7 +976,7 @@ int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, 
struct page **pages,
 */
page_index = 0;
dma_index = 0;
-   for_each_sg(sgt->sgl, sg, sgt->nents, count) {
+   for_each_sg(sgt->sgl, sg, sgt->orig_nents, count) {
page_len = sg->length;
page = sg_page(sg);
dma_len = sg_dma_len(sg);
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC 02/17] drm: amdgpu: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 7 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 8 
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index 43d8ed7..4df813e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -307,8 +307,9 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
dma_buf_attachment *attach,
if (IS_ERR(sgt))
return sgt;
 
-   if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
- DMA_ATTR_SKIP_CPU_SYNC))
+   sgt->nents = dma_map_sg_attrs(attach->dev, sgt->sgl, 
sgt->orig_nents,
+ dir, DMA_ATTR_SKIP_CPU_SYNC);
+   if (!sgt->nents)
goto error_free;
break;
 
@@ -349,7 +350,7 @@ static void amdgpu_dma_buf_unmap(struct dma_buf_attachment 
*attach,
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
 
if (sgt->sgl->page_link) {
-   dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir);
+   dma_unmap_sg(attach->dev, sgt->sgl, sgt->orig_nents, dir);
sg_free_table(sgt);
kfree(sgt);
} else {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index d5543c2..5f31585 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1043,7 +1043,6 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm)
 {
struct amdgpu_device *adev = amdgpu_ttm_adev(ttm->bdev);
struct amdgpu_ttm_tt *gtt = (void *)ttm;
-   unsigned nents;
int r;
 
int write = !(gtt->userflags & AMDGPU_GEM_USERPTR_READONLY);
@@ -1059,8 +1058,9 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm)
 
/* Map SG to device */
r = -ENOMEM;
-   nents = dma_map_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, direction);
-   if (nents == 0)
+   ttm->sg->nents = dma_map_sg(adev->dev, ttm->sg->sgl,
+   ttm->sg->orig_nents, direction);
+   if (ttm->sg->nents == 0)
goto release_sg;
 
/* convert SG to linear array of pages and dma addresses */
@@ -1091,7 +1091,7 @@ static void amdgpu_ttm_tt_unpin_userptr(struct ttm_tt 
*ttm)
return;
 
/* unmap the pages mapped to the device */
-   dma_unmap_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, direction);
+   dma_unmap_sg(adev->dev, ttm->sg->sgl, ttm->sg->orig_nents, direction);
 
sg_free_table(ttm->sg);
 
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC 07/17] drm: lima: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/lima/lima_gem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c
index 5404e0d..3edd2ff 100644
--- a/drivers/gpu/drm/lima/lima_gem.c
+++ b/drivers/gpu/drm/lima/lima_gem.c
@@ -70,7 +70,7 @@ int lima_heap_alloc(struct lima_bo *bo, struct lima_vm *vm)
 
if (bo->base.sgt) {
dma_unmap_sg(dev, bo->base.sgt->sgl,
-bo->base.sgt->nents, DMA_BIDIRECTIONAL);
+bo->base.sgt->orig_nents, DMA_BIDIRECTIONAL);
sg_free_table(bo->base.sgt);
} else {
bo->base.sgt = kmalloc(sizeof(*bo->base.sgt), GFP_KERNEL);
@@ -80,7 +80,7 @@ int lima_heap_alloc(struct lima_bo *bo, struct lima_vm *vm)
}
}
 
-   dma_map_sg(dev, sgt.sgl, sgt.nents, DMA_BIDIRECTIONAL);
+   sgt.nents = dma_map_sg(dev, sgt.sgl, sgt.orig_nents, DMA_BIDIRECTIONAL);
 
*bo->base.sgt = sgt;
 
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC 11/17] drm: rockchip: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
index 0d18846..a024c71 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -37,7 +37,7 @@ static int rockchip_gem_iommu_map(struct rockchip_gem_object 
*rk_obj)
rk_obj->dma_addr = rk_obj->mm.start;
 
ret = iommu_map_sg(private->domain, rk_obj->dma_addr, rk_obj->sgt->sgl,
-  rk_obj->sgt->nents, prot);
+  rk_obj->sgt->orig_nents, prot);
if (ret < rk_obj->base.size) {
DRM_ERROR("failed to map buffer: size=%zd request_size=%zd\n",
  ret, rk_obj->base.size);
@@ -98,11 +98,11 @@ static int rockchip_gem_get_pages(struct 
rockchip_gem_object *rk_obj)
 * TODO: Replace this by drm_clflush_sg() once it can be implemented
 * without relying on symbols that are not exported.
 */
-   for_each_sg(rk_obj->sgt->sgl, s, rk_obj->sgt->nents, i)
+   for_each_sg(rk_obj->sgt->sgl, s, rk_obj->sgt->orig_nents, i)
sg_dma_address(s) = sg_phys(s);
 
-   dma_sync_sg_for_device(drm->dev, rk_obj->sgt->sgl, rk_obj->sgt->nents,
-  DMA_TO_DEVICE);
+   dma_sync_sg_for_device(drm->dev, rk_obj->sgt->sgl,
+  rk_obj->sgt->orig_nents, DMA_TO_DEVICE);
 
return 0;
 
@@ -351,7 +351,8 @@ void rockchip_gem_free_object(struct drm_gem_object *obj)
rockchip_gem_iommu_unmap(rk_obj);
} else {
dma_unmap_sg(drm->dev, rk_obj->sgt->sgl,
-rk_obj->sgt->nents, DMA_BIDIRECTIONAL);
+rk_obj->sgt->orig_nents,
+DMA_BIDIRECTIONAL);
}
drm_prime_gem_destroy(obj, rk_obj->sgt);
} else {
@@ -493,14 +494,14 @@ static unsigned long 
rockchip_sg_get_contiguous_size(struct sg_table *sgt,
struct sg_table *sg,
struct rockchip_gem_object *rk_obj)
 {
-   int count = dma_map_sg(drm->dev, sg->sgl, sg->nents,
+   int count = dma_map_sg(drm->dev, sg->sgl, sg->orig_nents,
   DMA_BIDIRECTIONAL);
if (!count)
return -EINVAL;
 
if (rockchip_sg_get_contiguous_size(sg, count) < attach->dmabuf->size) {
DRM_ERROR("failed to map sg_table to contiguous linear 
address.\n");
-   dma_unmap_sg(drm->dev, sg->sgl, sg->nents,
+   dma_unmap_sg(drm->dev, sg->sgl, sg->orig_nents,
 DMA_BIDIRECTIONAL);
return -EINVAL;
}
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC 06/17] drm: i915: fix sg_table nents vs. orig_nents misuse

2020-04-28 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that dma_map_sg returns the
numer of the created entries in the DMA address space. However the
subsequent calls to dma_sync_sg_for_{device,cpu} and dma_unmap_sg must be
called with the original number of entries passed to dma_map_sg. The
sg_table->nents in turn holds the result of the dma_map_sg call as stated
in include/linux/scatterlist.h. Adapt the code to obey those rules.

Signed-off-by: Marek Szyprowski 
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c   | 13 +++--
 drivers/gpu/drm/i915/gem/i915_gem_internal.c |  4 ++--
 drivers/gpu/drm/i915/gem/i915_gem_region.c   |  4 ++--
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c|  5 +++--
 drivers/gpu/drm/i915/gem/selftests/huge_pages.c  | 10 +-
 drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c |  5 +++--
 drivers/gpu/drm/i915/gt/intel_ggtt.c | 12 ++--
 drivers/gpu/drm/i915/i915_gem_gtt.c  | 12 +++-
 drivers/gpu/drm/i915/i915_scatterlist.c  |  4 ++--
 drivers/gpu/drm/i915/selftests/scatterlist.c |  8 
 10 files changed, 41 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 7db5a79..d829852 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -36,21 +36,22 @@ static struct sg_table *i915_gem_map_dma_buf(struct 
dma_buf_attachment *attachme
goto err_unpin_pages;
}
 
-   ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL);
+   ret = sg_alloc_table(st, obj->mm.pages->orig_nents, GFP_KERNEL);
if (ret)
goto err_free;
 
src = obj->mm.pages->sgl;
dst = st->sgl;
-   for (i = 0; i < obj->mm.pages->nents; i++) {
+   for (i = 0; i < obj->mm.pages->orig_nents; i++) {
sg_set_page(dst, sg_page(src), src->length, 0);
dst = sg_next(dst);
src = sg_next(src);
}
 
-   if (!dma_map_sg_attrs(attachment->dev,
- st->sgl, st->nents, dir,
- DMA_ATTR_SKIP_CPU_SYNC)) {
+   st->nents = dma_map_sg_attrs(attachment->dev,
+st->sgl, st->orig_nents, dir,
+DMA_ATTR_SKIP_CPU_SYNC);
+   if (!st->nents) {
ret = -ENOMEM;
goto err_free_sg;
}
@@ -74,7 +75,7 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment 
*attachment,
struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf);
 
dma_unmap_sg_attrs(attachment->dev,
-  sg->sgl, sg->nents, dir,
+  sg->sgl, sg->orig_nents, dir,
   DMA_ATTR_SKIP_CPU_SYNC);
sg_free_table(sg);
kfree(sg);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c 
b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index cbbff81..a8ebfdd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -73,7 +73,7 @@ static int i915_gem_object_get_pages_internal(struct 
drm_i915_gem_object *obj)
}
 
sg = st->sgl;
-   st->nents = 0;
+   st->nents = st->orig_nents = 0;
sg_page_sizes = 0;
 
do {
@@ -94,7 +94,7 @@ static int i915_gem_object_get_pages_internal(struct 
drm_i915_gem_object *obj)
 
sg_set_page(sg, page, PAGE_SIZE << order, 0);
sg_page_sizes |= PAGE_SIZE << order;
-   st->nents++;
+   st->nents = st->orig_nents = st->nents + 1;
 
npages -= 1 << order;
if (!npages) {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c 
b/drivers/gpu/drm/i915/gem/i915_gem_region.c
index 1515384..58ca560 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_region.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c
@@ -53,7 +53,7 @@
GEM_BUG_ON(list_empty(blocks));
 
sg = st->sgl;
-   st->nents = 0;
+   st->nents = st->orig_nents = 0;
sg_page_sizes = 0;
prev_end = (resource_size_t)-1;
 
@@ -78,7 +78,7 @@
 
sg->length = block_size;
 
-   st->nents++;
+   st->nents = st->orig_nents = st->nents + 1;
} else {
sg->length += block_size;
sg_dma_len(sg) += block_size;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c 
b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 5d5d7ee..851a732 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -80,7 +80,7 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj)
noreclaim |= __GFP_NORETRY | __GFP_NOWARN;
 
sg = st->sgl;
-   st->nents = 0;
+   st->nents = st->orig_nents = 0;
sg_page_sizes = 0;
for (i = 0; i 

[PATCH 5/5] virtio: Add bounce DMA ops

2020-04-28 Thread Srivatsa Vaddagiri
For better security, its desirable that a guest VM's memory is
not accessible to any entity that executes outside the context of
guest VM. In case of virtio, backend drivers execute outside the
context of guest VM and in general will need access to complete
guest VM memory.  One option to restrict the access provided to
backend driver is to make use of a bounce buffer. The bounce
buffer is accessible to both backend and frontend drivers. All IO
buffers that are in private space of guest VM are bounced to be
accessible to backend.

This patch proposes a new memory pool to be used for this bounce
purpose, rather than the default swiotlb memory pool. That will
avoid any conflicts that may arise in situations where a VM needs
to use swiotlb pool for driving any pass-through devices (in
which case swiotlb memory needs not be shared with another VM) as
well as virtio devices (which will require swiotlb memory to be
shared with backend VM). As a possible extension to this patch,
we can provide an option for virtio to make use of default
swiotlb memory pool itself, where no such conflicts may exist in
a given deployment.

Signed-off-by: Srivatsa Vaddagiri 
---
 drivers/virtio/Makefile|   2 +-
 drivers/virtio/virtio.c|   2 +
 drivers/virtio/virtio_bounce.c | 150 +
 include/linux/virtio.h |   4 ++
 4 files changed, 157 insertions(+), 1 deletion(-)
 create mode 100644 drivers/virtio/virtio_bounce.c

diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 29a1386e..3fd3515 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
+obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o virtio_bounce.o
 obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
 obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
 virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index a977e32..bc2f779 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -329,6 +329,7 @@ int register_virtio_device(struct virtio_device *dev)
 
dev->index = err;
dev_set_name(>dev, "virtio%u", dev->index);
+   virtio_bounce_set_dma_ops(dev);
 
spin_lock_init(>config_lock);
dev->config_enabled = false;
@@ -431,6 +432,7 @@ EXPORT_SYMBOL_GPL(virtio_device_restore);
 
 static int virtio_init(void)
 {
+   virtio_map_bounce_buffer();
if (bus_register(_bus) != 0)
panic("virtio bus registration failed");
return 0;
diff --git a/drivers/virtio/virtio_bounce.c b/drivers/virtio/virtio_bounce.c
new file mode 100644
index 000..3de8e0e
--- /dev/null
+++ b/drivers/virtio/virtio_bounce.c
@@ -0,0 +1,150 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtio DMA ops to bounce buffers
+ *
+ *  Copyright (c) 2020, The Linux Foundation. All rights reserved.
+ *
+ * This module allows bouncing of IO buffers to a region which will be
+ * accessible to backend drivers.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static phys_addr_t bounce_buf_paddr;
+static void *bounce_buf_vaddr;
+static size_t bounce_buf_size;
+struct swiotlb_pool *virtio_pool;
+
+#define VIRTIO_MAX_BOUNCE_SIZE (16*4096)
+
+static void *virtio_alloc_coherent(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, gfp_t gfp_flags, unsigned long attrs)
+{
+   phys_addr_t addr;
+
+   if (!virtio_pool)
+   return NULL;
+
+   addr = swiotlb_alloc(virtio_pool, size, bounce_buf_paddr, ULONG_MAX);
+   if (addr == DMA_MAPPING_ERROR)
+   return NULL;
+
+   *dma_handle = (addr - bounce_buf_paddr);
+
+   return bounce_buf_vaddr + (addr - bounce_buf_paddr);
+}
+
+static void virtio_free_coherent(struct device *dev, size_t size, void *vaddr,
+   dma_addr_t dma_handle, unsigned long attrs)
+{
+   phys_addr_t addr = (dma_handle + bounce_buf_paddr);
+
+   swiotlb_free(virtio_pool, addr, size);
+}
+
+static dma_addr_t virtio_map_page(struct device *dev, struct page *page,
+   unsigned long offset, size_t size,
+   enum dma_data_direction dir, unsigned long attrs)
+{
+   void *ptr = page_address(page) + offset;
+   phys_addr_t paddr = virt_to_phys(ptr);
+   dma_addr_t handle;
+
+   if (!virtio_pool)
+   return DMA_MAPPING_ERROR;
+
+   handle = _swiotlb_tbl_map_single(virtio_pool, dev, bounce_buf_paddr,
+paddr, size, size, dir, attrs);
+   if (handle == (phys_addr_t)DMA_MAPPING_ERROR)
+   return DMA_MAPPING_ERROR;
+
+   return handle - bounce_buf_paddr;
+}
+
+static void virtio_unmap_page(struct device *dev, dma_addr_t dev_addr,
+   size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+   phys_addr_t addr = dev_addr + bounce_buf_paddr;
+
+   

[PATCH 3/5] swiotlb: Add alloc and free APIs

2020-04-28 Thread Srivatsa Vaddagiri
Move the memory allocation and free portion of swiotlb driver
into independent routines. They will be useful for drivers that
need swiotlb driver to just allocate/free memory chunks and not
additionally bounce memory.

Signed-off-by: Srivatsa Vaddagiri 
---
 include/linux/swiotlb.h |  17 ++
 kernel/dma/swiotlb.c| 151 
 2 files changed, 106 insertions(+), 62 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index c634b4d..957697e 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -186,6 +186,10 @@ void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(void);
+extern phys_addr_t swiotlb_alloc(struct swiotlb_pool *pool, size_t alloc_size,
+   unsigned long tbl_dma_addr, unsigned long mask);
+extern void swiotlb_free(struct swiotlb_pool *pool,
+   phys_addr_t tlb_addr, size_t alloc_size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
 
@@ -219,6 +223,19 @@ static inline bool is_swiotlb_active(void)
 {
return false;
 }
+
+static inline phys_addr_t swiotlb_alloc(struct swiotlb_pool *pool,
+   size_t alloc_size, unsigned long tbl_dma_addr,
+   unsigned long mask)
+{
+   return DMA_MAPPING_ERROR;
+}
+
+static inline void swiotlb_free(struct swiotlb_pool *pool,
+   phys_addr_t tlb_addr, size_t alloc_size)
+{
+}
+
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8cf0b57..7411ce5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -444,37 +444,14 @@ static inline void *tlb_vaddr(struct swiotlb_pool *pool, 
phys_addr_t tlb_addr)
return pool->io_tlb_vstart + (tlb_addr - pool->io_tlb_start);
 }
 
-phys_addr_t _swiotlb_tbl_map_single(struct swiotlb_pool *pool,
-  struct device *hwdev,
-  dma_addr_t tbl_dma_addr,
-  phys_addr_t orig_addr,
-  size_t mapping_size,
-  size_t alloc_size,
-  enum dma_data_direction dir,
-  unsigned long attrs)
+phys_addr_t swiotlb_alloc(struct swiotlb_pool *pool, size_t alloc_size,
+   unsigned long tbl_dma_addr, unsigned long mask)
 {
unsigned long flags;
phys_addr_t tlb_addr;
-   unsigned int nslots, stride, index, wrap;
-   int i;
-   unsigned long mask;
+   unsigned int i, nslots, stride, index, wrap;
unsigned long offset_slots;
unsigned long max_slots;
-   unsigned long tmp_io_tlb_used;
-
-   if (pool->no_iotlb_memory)
-   panic("Can not allocate SWIOTLB buffer earlier and can't now 
provide you with the DMA bounce buffer");
-
-   if (mem_encrypt_active())
-   pr_warn_once("Memory encryption is active and system is using 
DMA bounce buffers\n");
-
-   if (mapping_size > alloc_size) {
-   dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: 
%zd bytes)",
- mapping_size, alloc_size);
-   return (phys_addr_t)DMA_MAPPING_ERROR;
-   }
-
-   mask = dma_get_seg_boundary(hwdev);
 
tbl_dma_addr &= mask;
 
@@ -555,54 +532,23 @@ phys_addr_t _swiotlb_tbl_map_single(struct swiotlb_pool 
*pool,
} while (index != wrap);
 
 not_found:
-   tmp_io_tlb_used = pool->io_tlb_used;
-
spin_unlock_irqrestore(>io_tlb_lock, flags);
-   if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
-   dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes), total 
%lu (slots), used %lu (slots)\n",
-alloc_size, pool->io_tlb_nslabs, tmp_io_tlb_used);
return (phys_addr_t)DMA_MAPPING_ERROR;
+
 found:
pool->io_tlb_used += nslots;
spin_unlock_irqrestore(>io_tlb_lock, flags);
 
-   /*
-* Save away the mapping from the original address to the DMA address.
-* This is needed when we sync the memory.  Then we sync the buffer if
-* needed.
-*/
-   for (i = 0; i < nslots; i++)
-   pool->io_tlb_orig_addr[index+i] = orig_addr +
-   (i << IO_TLB_SHIFT);
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-   (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
-   swiotlb_bounce(orig_addr, tlb_vaddr(pool, tlb_addr),
-   mapping_size, DMA_TO_DEVICE);
-
return tlb_addr;
 }
 
-/*
- * tlb_addr is the physical address of the bounce buffer to unmap.
- */
-void _swiotlb_tbl_unmap_single(struct swiotlb_pool *pool,
-   struct device *hwdev, phys_addr_t tlb_addr,
-   size_t mapping_size, size_t 

[PATCH 1/5] swiotlb: Introduce concept of swiotlb_pool

2020-04-28 Thread Srivatsa Vaddagiri
Currently swiotlb driver manages a global pool of memory which
acts as bounce buffers for memory that is not accessible to some
devices. The core functions provides by this driver to
allocate/free/bounce memory chunks will be more
useful if this driver can manage more than one pool. An immediate
application of such extension to swiotlb driver is to bounce
virtio buffers between private and shared space of a VM.

This patch introduces the concept of a swiotlb memory pool and
reorganizes current driver to work with the default global pool.
There is no functional change introduced by this patch.
Subsequent patches allow the swiotlb driver to work with more
than one pool of memory.

Signed-off-by: Srivatsa Vaddagiri 
---
 drivers/xen/swiotlb-xen.c |   4 +-
 include/linux/swiotlb.h   | 129 -
 kernel/dma/swiotlb.c  | 359 +++---
 3 files changed, 307 insertions(+), 185 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index b6d2776..c2dc9c8 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -190,8 +190,8 @@ int __ref xen_swiotlb_init(int verbose, bool early)
/*
 * IO TLB memory already allocated. Just use it.
 */
-   if (io_tlb_start != 0) {
-   xen_io_tlb_start = phys_to_virt(io_tlb_start);
+   if (swiotlb_start()) {
+   xen_io_tlb_start = phys_to_virt(swiotlb_start());
goto end;
}
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 046bb94..8c7843f 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -44,7 +44,59 @@ enum dma_sync_target {
SYNC_FOR_DEVICE = 1,
 };
 
-extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
+#define MAX_POOL_NAME_SIZE 16
+
+struct swiotlb_pool {
+   char name[MAX_POOL_NAME_SIZE];
+   bool no_iotlb_memory;
+   int late_alloc;
+
+   spinlock_t io_tlb_lock;
+
+   /*
+* Used to do a quick range check in swiotlb_tbl_unmap_single and
+* swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated
+* by this API.
+*/
+
+   phys_addr_t io_tlb_start, io_tlb_end;
+
+   /*
+* The number of IO TLB blocks (in groups of 64) between io_tlb_start
+* and io_tlb_end.  This is command line adjustable via
+* setup_io_tlb_npages.
+*/
+   unsigned long io_tlb_nslabs;
+
+   /*
+* The number of used IO TLB block
+*/
+   unsigned long io_tlb_used;
+
+   /*
+* This is a free list describing the number of free entries available
+* from each index
+*/
+   unsigned int *io_tlb_list;
+   unsigned int io_tlb_index;
+
+   /*
+* We need to save away the original address corresponding to a mapped
+* entry for the sync operations.
+*/
+   phys_addr_t *io_tlb_orig_addr;
+
+   /*
+* Max segment that we can provide which (if pages are contingous) will
+* not be bounced (unless SWIOTLB_FORCE is set).
+*/
+   unsigned int max_segment;
+};
+
+extern struct swiotlb_pool default_swiotlb_pool;
+
+extern phys_addr_t _swiotlb_tbl_map_single(struct swiotlb_pool *pool,
+ struct device *hwdev,
  dma_addr_t tbl_dma_addr,
  phys_addr_t phys,
  size_t mapping_size,
@@ -52,28 +104,80 @@ extern phys_addr_t swiotlb_tbl_map_single(struct device 
*hwdev,
  enum dma_data_direction dir,
  unsigned long attrs);
 
-extern void swiotlb_tbl_unmap_single(struct device *hwdev,
+extern void _swiotlb_tbl_unmap_single(struct swiotlb_pool *pool,
+struct device *hwdev,
 phys_addr_t tlb_addr,
 size_t mapping_size,
 size_t alloc_size,
 enum dma_data_direction dir,
 unsigned long attrs);
 
-extern void swiotlb_tbl_sync_single(struct device *hwdev,
+extern void _swiotlb_tbl_sync_single(struct swiotlb_pool *pool,
+   struct device *hwdev,
phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir,
enum dma_sync_target target);
 
-dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
-   size_t size, enum dma_data_direction dir, unsigned long attrs);
+dma_addr_t _swiotlb_map(struct swiotlb_pool *pool, struct device *dev,
+   phys_addr_t phys, size_t size, enum dma_data_direction dir,
+   unsigned long attrs);
+
+static inline phys_addr_t 

[PATCH 4/5] swiotlb: Add API to register new pool

2020-04-28 Thread Srivatsa Vaddagiri
This patch adds an interface for the swiotlb driver to recognize
a new memory pool. Upon successful initialization of the pool,
swiotlb returns a handle, which needs to be passed as an argument
for any future operations on the pool (map/unmap/alloc/free).

Signed-off-by: Srivatsa Vaddagiri 
---
 include/linux/swiotlb.h |  9 
 kernel/dma/swiotlb.c| 60 +
 2 files changed, 69 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 957697e..97ac82a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -182,6 +182,9 @@ static inline bool is_swiotlb_buffer(phys_addr_t paddr)
return paddr >= swiotlb_start() && paddr < swiotlb_end();
 }
 
+extern struct swiotlb_pool *swiotlb_register_pool(char *name,
+   phys_addr_t start, void *vstart, size_t size);
+
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
@@ -236,6 +239,12 @@ static inline void swiotlb_free(struct swiotlb_pool *pool,
 {
 }
 
+static struct swiotlb_pool *swiotlb_register_pool(char *name,
+   phys_addr_t start, void *vstart, size_t size)
+{
+   return NULL;
+}
+
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7411ce5..9883744 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifdef CONFIG_DEBUG_FS
 #include 
 #endif
@@ -736,6 +737,65 @@ size_t swiotlb_max_mapping_size(struct device *dev)
return ((size_t)1 << IO_TLB_SHIFT) * IO_TLB_SEGSIZE;
 }
 
+struct swiotlb_pool *swiotlb_register_pool(char *name, phys_addr_t start,
+   void *vstart, size_t size)
+{
+   struct swiotlb_pool *pool;
+   unsigned long i, bytes;
+   unsigned long nslabs;
+
+   nslabs = size >> IO_TLB_SHIFT;
+   if (!nslabs)
+   return ERR_PTR(-EINVAL);
+
+   pool = kzalloc(sizeof(*pool), GFP_KERNEL);
+   if (!pool)
+   return ERR_PTR(-ENOMEM);
+
+   bytes = nslabs << IO_TLB_SHIFT;
+
+   strncpy(pool->name, name, sizeof(pool->name));
+   spin_lock_init(>io_tlb_lock);
+   pool->late_alloc = 1;
+   pool->io_tlb_start = start;
+   pool->io_tlb_end = start + bytes;
+   pool->io_tlb_vstart = vstart;
+   pool->io_tlb_nslabs = nslabs;
+   pool->max_segment = rounddown(bytes, PAGE_SIZE);
+
+   /*
+* Allocate and initialize the free list array.  This array is used
+* to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
+* between io_tlb_start and io_tlb_end.
+*/
+   pool->io_tlb_list = (unsigned int *)__get_free_pages(GFP_KERNEL,
+   get_order(pool->io_tlb_nslabs * sizeof(int)));
+   if (!pool->io_tlb_list)
+   goto cleanup;
+
+   pool->io_tlb_orig_addr = (phys_addr_t *)
+   __get_free_pages(GFP_KERNEL,
+get_order(pool->io_tlb_nslabs *
+  sizeof(phys_addr_t)));
+   if (!pool->io_tlb_orig_addr)
+   goto cleanup;
+
+   for (i = 0; i < pool->io_tlb_nslabs; i++) {
+   pool->io_tlb_list[i] = IO_TLB_SEGSIZE -
+   OFFSET(i, IO_TLB_SEGSIZE);
+   pool->io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
+   }
+
+   return pool;
+
+cleanup:
+   kfree(pool->io_tlb_list);
+   kfree(pool->io_tlb_orig_addr);
+   kfree(pool);
+
+   return ERR_PTR(-ENOMEM);
+}
+
 bool is_swiotlb_active(void)
 {
/*
-- 
2.7.4

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 0/5] virtio on Type-1 hypervisor

2020-04-28 Thread Srivatsa Vaddagiri
We ran into several problems in using virtio for IO paravirtualization on a
Type-1 hypervisor with these characteristics:

* By default, all of a guests's memory is private to it (no other guest can
  access its memory).

* One of the VM is considered as primary and has access to most IO devices. This
  is similar to dom0 VM in case of Xen. All other VMs are considered as
  secondary VMs

* virtio-backend drivers for all secondary VMs need to be hosted in primary VM

* Since secondary VM's memory is not accessible to primary VM, to make virtio
  backend driver work, instead an additional piece of memory is provisioned 
  by the hypervisor that is shared between primary and secondary VMs. This
  shared memory can be used, for example, to host virtio-ring structures
  and also to bounce IO buffers of secondary VM.

* Message-queue and doorbell interfaces available in hypervisor to support
  inter-VM communication. Messge-queue API (send/recv) allows one VM to send
  short messages to another VM. Doorbell interface allows a VM to inject
  an interrupt into another VM.

* No support for MMIO transport i.e hypervisor does not support trapping MMIO
  config space access by front-end driver and having it handled in backend
  drivers.

Few problem statements arising out of this:

1) How can we make use of the shared memory region effectively to make virtio
work in this scenario?

What is proposed in the patch series for this problem is a virtio bounce driver
that recognizes a shared memory region (shared between VMs) and makes use of
swiotlb driver interfaces to bounce IO buffers between private and shared space.
Some changes are proposed to swiotlb driver in this regard, that can let us
reuse swiotlb functions to work with the shared memory pool. swiotlb driver can
now recognize more than one pool of memory and extend its functions
(allocate/free/bounce memory chunks) for each pool.

2) What transport mechanism works best in this case? 

I realize that ivshmem2-virtio proposal has discussed the possibility of using
shared memory + doorbell as a virtio transport option. We can consider using
that as a transport in case it will be acceptable upstream. Other option we had
considered was to modify virtio_mmio.c itself to use message_queue send/recv
hypercall interface (in place of readl/writel). That could be abstracted via
'mmio_ops' structure providing suitable implementation of readl/writel. Another
option suggested by Christopher Dall is to unmap the config space from kernel
address space and as part of the fault handler, use hypervisor specific APIs to
achieve config space IO.

3) Which virtio backend drivers to leverage?

We realized there are multiple implementations of virtio backend drivers,
bundled as part of separate projects (Qemu, lkvm etc). We think it would be nice
if we had some consolidation in that regard so that we can make use of the
backend drivers that are not tightly coupled with a VMM. In our case, we need to
be able to run virtio backend drivers as standalone programs (and not coupled
with any VMM).


Srivatsa Vaddagiri (5):
  swiotlb: Introduce concept of swiotlb_pool
  swiotlb: Allow for non-linear mapping between paddr and vaddr
  swiotlb: Add alloc and free APIs
  swiotlb: Add API to register new pool
  virtio: Add bounce DMA ops

 drivers/virtio/Makefile|   2 +-
 drivers/virtio/virtio.c|   2 +
 drivers/virtio/virtio_bounce.c | 150 +++
 drivers/xen/swiotlb-xen.c  |   4 +-
 include/linux/swiotlb.h| 157 +++-
 include/linux/virtio.h |   4 +
 kernel/dma/swiotlb.c   | 556 -
 7 files changed, 638 insertions(+), 237 deletions(-)
 create mode 100644 drivers/virtio/virtio_bounce.c

-- 
2.7.4

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/5] swiotlb: Allow for non-linear mapping between paddr and vaddr

2020-04-28 Thread Srivatsa Vaddagiri
Some of the memory pool managed by swiotlb driver could fall
outside the direct-mapped range, made accessible via memremap()
routine. To facilitate easy conversion between virtual and
physical address of such memory, store the virtual address of
memory pool in addition to its physical address.

Signed-off-by: Srivatsa Vaddagiri 
---
 include/linux/swiotlb.h |  2 ++
 kernel/dma/swiotlb.c| 20 ++--
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 8c7843f..c634b4d 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -61,6 +61,8 @@ struct swiotlb_pool {
 
phys_addr_t io_tlb_start, io_tlb_end;
 
+   void *io_tlb_vstart;
+
/*
 * The number of IO TLB blocks (in groups of 64) between io_tlb_start
 * and io_tlb_end.  This is command line adjustable via
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 9c504d3..8cf0b57 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -178,6 +178,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long 
nslabs, int verbose)
default_swiotlb_pool.io_tlb_start = __pa(tlb);
default_swiotlb_pool.io_tlb_end =
default_swiotlb_pool.io_tlb_start + bytes;
+   default_swiotlb_pool.io_tlb_vstart = tlb;
 
/*
 * Allocate and initialize the free list array.  This array is used
@@ -307,6 +308,7 @@ static void swiotlb_cleanup(void)
default_swiotlb_pool.io_tlb_start = 0;
default_swiotlb_pool.io_tlb_nslabs = 0;
default_swiotlb_pool.max_segment = 0;
+   default_swiotlb_pool.io_tlb_vstart = NULL;
 }
 
 int
@@ -320,6 +322,7 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
default_swiotlb_pool.io_tlb_start = virt_to_phys(tlb);
default_swiotlb_pool.io_tlb_end =
default_swiotlb_pool.io_tlb_start + bytes;
+   default_swiotlb_pool.io_tlb_vstart = tlb;
 
set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
memset(tlb, 0, bytes);
@@ -400,11 +403,10 @@ void __init swiotlb_exit(void)
 /*
  * Bounce: copy the swiotlb buffer from or back to the original dma location
  */
-static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
+static void swiotlb_bounce(phys_addr_t orig_addr, void *vaddr,
   size_t size, enum dma_data_direction dir)
 {
unsigned long pfn = PFN_DOWN(orig_addr);
-   unsigned char *vaddr = phys_to_virt(tlb_addr);
 
if (PageHighMem(pfn_to_page(pfn))) {
/* The buffer does not have a mapping.  Map it in and copy */
@@ -437,6 +439,11 @@ static void swiotlb_bounce(phys_addr_t orig_addr, 
phys_addr_t tlb_addr,
}
 }
 
+static inline void *tlb_vaddr(struct swiotlb_pool *pool, phys_addr_t tlb_addr)
+{
+   return pool->io_tlb_vstart + (tlb_addr - pool->io_tlb_start);
+}
+
 phys_addr_t _swiotlb_tbl_map_single(struct swiotlb_pool *pool,
   struct device *hwdev,
   dma_addr_t tbl_dma_addr,
@@ -569,7 +576,7 @@ phys_addr_t _swiotlb_tbl_map_single(struct swiotlb_pool 
*pool,
(i << IO_TLB_SHIFT);
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
-   swiotlb_bounce(orig_addr, tlb_addr,
+   swiotlb_bounce(orig_addr, tlb_vaddr(pool, tlb_addr),
mapping_size, DMA_TO_DEVICE);
 
return tlb_addr;
@@ -594,7 +601,8 @@ void _swiotlb_tbl_unmap_single(struct swiotlb_pool *pool,
if (orig_addr != INVALID_PHYS_ADDR &&
!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
-   swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
DMA_FROM_DEVICE);
+   swiotlb_bounce(orig_addr, tlb_vaddr(pool, tlb_addr),
+   mapping_size, DMA_FROM_DEVICE);
 
/*
 * Return the buffer to the free list by setting the corresponding
@@ -643,14 +651,14 @@ void _swiotlb_tbl_sync_single(struct swiotlb_pool *pool,
switch (target) {
case SYNC_FOR_CPU:
if (likely(dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
-   swiotlb_bounce(orig_addr, tlb_addr,
+   swiotlb_bounce(orig_addr, tlb_vaddr(pool, tlb_addr),
   size, DMA_FROM_DEVICE);
else
BUG_ON(dir != DMA_TO_DEVICE);
break;
case SYNC_FOR_DEVICE:
if (likely(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
-   swiotlb_bounce(orig_addr, tlb_addr,
+   swiotlb_bounce(orig_addr, tlb_vaddr(pool, tlb_addr),
   size, DMA_TO_DEVICE);

Re: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region

2020-04-28 Thread Jürgen Groß

On 28.04.20 10:25, Peng Fan wrote:

Subject: Re: [PATCH] xen/swiotlb: correct the check for
xen_destroy_contiguous_region

On 28.04.20 09:33, peng@nxp.com wrote:

From: Peng Fan 

When booting xen on i.MX8QM, met:
"
[3.602128] Unable to handle kernel paging request at virtual address

00272d40

[3.610804] Mem abort info:
[3.613905]   ESR = 0x9604
[3.617332]   EC = 0x25: DABT (current EL), IL = 32 bits
[3.623211]   SET = 0, FnV = 0
[3.626628]   EA = 0, S1PTW = 0
[3.630128] Data abort info:
[3.633362]   ISV = 0, ISS = 0x0004
[3.637630]   CM = 0, WnR = 0
[3.640955] [00272d40] user address but active_mm is

swapper

[3.647983] Internal error: Oops: 9604 [#1] PREEMPT SMP
[3.654137] Modules linked in:
[3.677285] Hardware name: Freescale i.MX8QM MEK (DT)
[3.677302] Workqueue: events deferred_probe_work_func
[3.684253] imx6q-pcie 5f00.pcie: PCI host bridge to bus :00
[3.688297] pstate: 6005 (nZCv daif -PAN -UAO)
[3.688310] pc : xen_swiotlb_free_coherent+0x180/0x1c0
[3.693993] pci_bus :00: root bus resource [bus 00-ff]
[3.701002] lr : xen_swiotlb_free_coherent+0x44/0x1c0
"

In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask)
or range_straddles_page_boundary(phys, size) are true, it will create
contiguous region. So when free, we need to free contiguous region use
upper check condition.


No, this will break PV guests on x86.


Could you share more details why alloc and free not matching for the check?


xen_create_contiguous_region() is needed only in case:

- the bus address is not within dma_mask, or
- the memory region is not physically contiguous (can happen only for
  PV guests)

In any case it should arrange for the memory to be suitable for the
DMA operation, so to be contiguous and within dma_mask afterwards. So
xen_destroy_contiguous_region() should only ever called for areas
which match above criteria, as otherwise we can be sure
xen_create_contiguous_region() was not used for making the area DMA-able
in the beginning.

And this is very important in the PV case, as in those guests the page
tables are containing the host-PFNs, not the guest-PFNS, and
xen_create_contiguous_region() will fiddle with host- vs. guest-PFN
arrangements, and xen_destroy_contiguous_region() is reverting this
fiddling. Any call of xen_destroy_contiguous_region() for an area it
was not intended to be called for might swap physical pages beneath
random virtual addresses, which was the reason for this test to be
added by me.


Juergen



Thanks,
Peng.



I think there is something wrong with your setup in combination with the ARM
xen_create_contiguous_region() implementation.

Stefano?


Juergen



Signed-off-by: Peng Fan 
---
   drivers/xen/swiotlb-xen.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index b6d27762c6f8..ab96e468584f 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev,

size_t size, void *vaddr,

/* Convert the size to actually allocated. */
size = 1UL << (order + XEN_PAGE_SHIFT);

-   if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
-range_straddles_page_boundary(phys, size)) &&
+   if (((dev_addr + size - 1 > dma_mask) ||
+   range_straddles_page_boundary(phys, size)) &&
TestClearPageXenRemapped(virt_to_page(vaddr)))
xen_destroy_contiguous_region(phys, order);






___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region

2020-04-28 Thread peng . fan
From: Peng Fan 

When booting xen on i.MX8QM, met:
"
[3.602128] Unable to handle kernel paging request at virtual address 
00272d40
[3.610804] Mem abort info:
[3.613905]   ESR = 0x9604
[3.617332]   EC = 0x25: DABT (current EL), IL = 32 bits
[3.623211]   SET = 0, FnV = 0
[3.626628]   EA = 0, S1PTW = 0
[3.630128] Data abort info:
[3.633362]   ISV = 0, ISS = 0x0004
[3.637630]   CM = 0, WnR = 0
[3.640955] [00272d40] user address but active_mm is swapper
[3.647983] Internal error: Oops: 9604 [#1] PREEMPT SMP
[3.654137] Modules linked in:
[3.677285] Hardware name: Freescale i.MX8QM MEK (DT)
[3.677302] Workqueue: events deferred_probe_work_func
[3.684253] imx6q-pcie 5f00.pcie: PCI host bridge to bus :00
[3.688297] pstate: 6005 (nZCv daif -PAN -UAO)
[3.688310] pc : xen_swiotlb_free_coherent+0x180/0x1c0
[3.693993] pci_bus :00: root bus resource [bus 00-ff]
[3.701002] lr : xen_swiotlb_free_coherent+0x44/0x1c0
"

In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask) or
range_straddles_page_boundary(phys, size) are true, it will
create contiguous region. So when free, we need to free contiguous
region use upper check condition.

Signed-off-by: Peng Fan 
---
 drivers/xen/swiotlb-xen.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index b6d27762c6f8..ab96e468584f 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t 
size, void *vaddr,
/* Convert the size to actually allocated. */
size = 1UL << (order + XEN_PAGE_SHIFT);
 
-   if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
-range_straddles_page_boundary(phys, size)) &&
+   if (((dev_addr + size - 1 > dma_mask) ||
+   range_straddles_page_boundary(phys, size)) &&
TestClearPageXenRemapped(virt_to_page(vaddr)))
xen_destroy_contiguous_region(phys, order);
 
-- 
2.16.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: iommu_iova slab eats too much memory

2020-04-28 Thread Salil Mehta
Hi Bin,

Few questions:

1. If there is a leak of IOVA due to dma_unmap_* not being called somewhere then
at certain point the throughput will drastically fall and will almost become 
equal
to zero. This should be due to unavailability of the mapping anymore. But in 
your
case VM is getting killed so this could be actual DMA buffer leak not DMA 
mapping
leak. I doubt VM will get killed due to exhaustion of the DMA mappings in the 
IOMMU
Layer for a transient reason or even due to mapping/unmapping leak.

2. Could you check if you have TSO offload enabled on Intel 82599? It will help
in reducing the number of mappings and will take off IOVA mapping pressure from
the IOMMU/VT-d? Though I am not sure it will help in reducing the amount of 
memory
required for the buffers.

3. Also, have you checked the cpu-usage while your experiment is going on?

Thanks
Salil.

> -Original Message-
> From: iommu [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of
> Robin Murphy
> Sent: Friday, April 24, 2020 5:31 PM
> To: Bin 
> Cc: iommu@lists.linux-foundation.org
> Subject: Re: iommu_iova slab eats too much memory
> 
> On 2020-04-24 2:20 pm, Bin wrote:
> > Dear Robin:
> >  Thank you for your explanation. Now, I understand that this could be
> > NIC driver's fault, but how could I confirm it? Do I have to debug the
> > driver myself?
> 
> I'd start with CONFIG_DMA_API_DEBUG - of course it will chew through
> memory about an order of magnitude faster than the IOVAs alone, but it
> should shed some light on whether DMA API usage looks suspicious, and
> dumping the mappings should help track down the responsible driver(s).
> Although the debugfs code doesn't show the stacktrace of where each
> mapping was made, I guess it would be fairly simple to tweak that for a
> quick way to narrow down where to start looking in an offending driver.
> 
> Robin.
> 
> > Robin Murphy  于2020年4月24日周五 下午8:15写道:
> >
> >> On 2020-04-24 1:06 pm, Bin wrote:
> >>> I'm not familiar with the mmu stuff, so what you mean by "some driver
> >>> leaking DMA mappings", is it possible that some other kernel module like
> >>> KVM or NIC driver leads to the leaking problem instead of the iommu
> >> module
> >>> itself?
> >>
> >> Yes - I doubt that intel-iommu itself is failing to free IOVAs when it
> >> should, since I'd expect a lot of people to have noticed that. It's far
> >> more likely that some driver is failing to call dma_unmap_* when it's
> >> finished with a buffer - with the IOMMU disabled that would be a no-op
> >> on x86 with a modern 64-bit-capable device, so such a latent bug could
> >> have been easily overlooked.
> >>
> >> Robin.
> >>
> >>> Bin  于 2020年4月24日周五 20:00写道:
> >>>
>  Well, that's the problem! I'm assuming the iommu kernel module is
> >> leaking
>  memory. But I don't know why and how.
> 
>  Do you have any idea about it? Or any further information is needed?
> 
>  Robin Murphy  于 2020年4月24日周五 19:20写道:
> 
> > On 2020-04-24 1:40 am, Bin wrote:
> >> Hello? anyone there?
> >>
> >> Bin  于2020年4月23日周四 下午5:14写道:
> >>
> >>> Forget to mention, I've already disabled the slab merge, so this is
> > what
> >>> it is.
> >>>
> >>> Bin  于2020年4月23日周四 下午5:11写道:
> >>>
>  Hey, guys:
> 
>  I'm running a batch of CoreOS boxes, the lsb_release is:
> 
>  ```
>  # cat /etc/lsb-release
>  DISTRIB_ID="Container Linux by CoreOS"
>  DISTRIB_RELEASE=2303.3.0
>  DISTRIB_CODENAME="Rhyolite"
>  DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0 (Rhyolite)"
>  ```
> 
>  ```
>  # uname -a
>  Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00
> > 2019
>  x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel
> > GNU/Linux
>  ```
>  Recently, I found my vms constently being killed due to OOM, and
> >> after
>  digging into the problem, I finally realized that the kernel is
> > leaking
>  memory.
> 
>  Here's my slabinfo:
> 
>  Active / Total Objects (% used): 83818306 / 84191607 (99.6%)
>  Active / Total Slabs (% used)  : 1336293 / 1336293 (100.0%)
>  Active / Total Caches (% used) : 152 / 217 (70.0%)
>  Active / Total Size (% used)   : 5828768.08K / 5996848.72K
> > (97.2%)
>  Minimum / Average / Maximum Object : 0.01K / 0.07K / 23.25K
> 
>   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> 
>  80253888 80253888 100%0.06K 1253967   64   5015868K
> >> iommu_iova
> >
> > Do you really have a peak demand of ~80 million simultaneous DMA
> > buffers, or is some driver leaking DMA mappings?
> >
> > Robin.
> >
>  489472 489123  99%0.03K   3824  128 15296K kmalloc-32
> 
> 

RE: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region

2020-04-28 Thread Peng Fan
> Subject: Re: [PATCH] xen/swiotlb: correct the check for
> xen_destroy_contiguous_region
> 
> On 28.04.20 09:33, peng@nxp.com wrote:
> > From: Peng Fan 
> >
> > When booting xen on i.MX8QM, met:
> > "
> > [3.602128] Unable to handle kernel paging request at virtual address
> 00272d40
> > [3.610804] Mem abort info:
> > [3.613905]   ESR = 0x9604
> > [3.617332]   EC = 0x25: DABT (current EL), IL = 32 bits
> > [3.623211]   SET = 0, FnV = 0
> > [3.626628]   EA = 0, S1PTW = 0
> > [3.630128] Data abort info:
> > [3.633362]   ISV = 0, ISS = 0x0004
> > [3.637630]   CM = 0, WnR = 0
> > [3.640955] [00272d40] user address but active_mm is
> swapper
> > [3.647983] Internal error: Oops: 9604 [#1] PREEMPT SMP
> > [3.654137] Modules linked in:
> > [3.677285] Hardware name: Freescale i.MX8QM MEK (DT)
> > [3.677302] Workqueue: events deferred_probe_work_func
> > [3.684253] imx6q-pcie 5f00.pcie: PCI host bridge to bus :00
> > [3.688297] pstate: 6005 (nZCv daif -PAN -UAO)
> > [3.688310] pc : xen_swiotlb_free_coherent+0x180/0x1c0
> > [3.693993] pci_bus :00: root bus resource [bus 00-ff]
> > [3.701002] lr : xen_swiotlb_free_coherent+0x44/0x1c0
> > "
> >
> > In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask)
> > or range_straddles_page_boundary(phys, size) are true, it will create
> > contiguous region. So when free, we need to free contiguous region use
> > upper check condition.
> 
> No, this will break PV guests on x86.

Could you share more details why alloc and free not matching for the check?

Thanks,
Peng.

> 
> I think there is something wrong with your setup in combination with the ARM
> xen_create_contiguous_region() implementation.
> 
> Stefano?
> 
> 
> Juergen
> 
> >
> > Signed-off-by: Peng Fan 
> > ---
> >   drivers/xen/swiotlb-xen.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index b6d27762c6f8..ab96e468584f 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev,
> size_t size, void *vaddr,
> > /* Convert the size to actually allocated. */
> > size = 1UL << (order + XEN_PAGE_SHIFT);
> >
> > -   if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
> > -range_straddles_page_boundary(phys, size)) &&
> > +   if (((dev_addr + size - 1 > dma_mask) ||
> > +   range_straddles_page_boundary(phys, size)) &&
> > TestClearPageXenRemapped(virt_to_page(vaddr)))
> > xen_destroy_contiguous_region(phys, order);
> >
> >

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region

2020-04-28 Thread Peng Fan
> Subject: Re: [PATCH] xen/swiotlb: correct the check for
> xen_destroy_contiguous_region
> 
> On Tue, Apr 28, 2020 at 03:33:45PM +0800, peng@nxp.com wrote:
> >
> > In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask)
> > or range_straddles_page_boundary(phys, size) are true, it will create
> > contiguous region. So when free, we need to free contiguous region use
> > upper check condition.
> >
> > Signed-off-by: Peng Fan 
> > ---
> >  drivers/xen/swiotlb-xen.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index b6d27762c6f8..ab96e468584f 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev,
> size_t size, void *vaddr,
> > /* Convert the size to actually allocated. */
> > size = 1UL << (order + XEN_PAGE_SHIFT);
> >
> > -   if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
> > -range_straddles_page_boundary(phys, size)) &&
> > +   if (((dev_addr + size - 1 > dma_mask) ||
> > +   range_straddles_page_boundary(phys, size)) &&
> > TestClearPageXenRemapped(virt_to_page(vaddr)))
> 
> No need for the inner braces.
> 
> But more importantly please factor our a helper that can be used by alloc and
> free to make sure that they always stay in sync.  Something

Thanks for reviewing. I'll take your suggestion in v2. Before that,
I would wait to see if there are more comments in this patch,
because there are several history commits touching this place.

Thanks,
Peng.

> like:
> 
> static inline bool xen_swiotlb_need_contiguous_region(struct device *dev,
>   phys_addr_t phys, size_t size)
> {
> 
>   return xen_phys_to_bus(phys) + size - 1 > dev->coherent_dma_mask ||
>   range_straddles_page_boundary(phys, size)) }

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region

2020-04-28 Thread Jürgen Groß

On 28.04.20 09:33, peng@nxp.com wrote:

From: Peng Fan 

When booting xen on i.MX8QM, met:
"
[3.602128] Unable to handle kernel paging request at virtual address 
00272d40
[3.610804] Mem abort info:
[3.613905]   ESR = 0x9604
[3.617332]   EC = 0x25: DABT (current EL), IL = 32 bits
[3.623211]   SET = 0, FnV = 0
[3.626628]   EA = 0, S1PTW = 0
[3.630128] Data abort info:
[3.633362]   ISV = 0, ISS = 0x0004
[3.637630]   CM = 0, WnR = 0
[3.640955] [00272d40] user address but active_mm is swapper
[3.647983] Internal error: Oops: 9604 [#1] PREEMPT SMP
[3.654137] Modules linked in:
[3.677285] Hardware name: Freescale i.MX8QM MEK (DT)
[3.677302] Workqueue: events deferred_probe_work_func
[3.684253] imx6q-pcie 5f00.pcie: PCI host bridge to bus :00
[3.688297] pstate: 6005 (nZCv daif -PAN -UAO)
[3.688310] pc : xen_swiotlb_free_coherent+0x180/0x1c0
[3.693993] pci_bus :00: root bus resource [bus 00-ff]
[3.701002] lr : xen_swiotlb_free_coherent+0x44/0x1c0
"

In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask) or
range_straddles_page_boundary(phys, size) are true, it will
create contiguous region. So when free, we need to free contiguous
region use upper check condition.


No, this will break PV guests on x86.

I think there is something wrong with your setup in combination with
the ARM xen_create_contiguous_region() implementation.

Stefano?


Juergen



Signed-off-by: Peng Fan 
---
  drivers/xen/swiotlb-xen.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index b6d27762c6f8..ab96e468584f 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t 
size, void *vaddr,
/* Convert the size to actually allocated. */
size = 1UL << (order + XEN_PAGE_SHIFT);
  
-	if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||

-range_straddles_page_boundary(phys, size)) &&
+   if (((dev_addr + size - 1 > dma_mask) ||
+   range_straddles_page_boundary(phys, size)) &&
TestClearPageXenRemapped(virt_to_page(vaddr)))
xen_destroy_contiguous_region(phys, order);
  



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] xen/swiotlb: correct the check for xen_destroy_contiguous_region

2020-04-28 Thread Christoph Hellwig
On Tue, Apr 28, 2020 at 03:33:45PM +0800, peng@nxp.com wrote:
> 
> In xen_swiotlb_alloc_coherent, if !(dev_addr + size - 1 <= dma_mask) or
> range_straddles_page_boundary(phys, size) are true, it will
> create contiguous region. So when free, we need to free contiguous
> region use upper check condition.
> 
> Signed-off-by: Peng Fan 
> ---
>  drivers/xen/swiotlb-xen.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index b6d27762c6f8..ab96e468584f 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -346,8 +346,8 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t 
> size, void *vaddr,
>   /* Convert the size to actually allocated. */
>   size = 1UL << (order + XEN_PAGE_SHIFT);
>  
> - if (!WARN_ON((dev_addr + size - 1 > dma_mask) ||
> -  range_straddles_page_boundary(phys, size)) &&
> + if (((dev_addr + size - 1 > dma_mask) ||
> + range_straddles_page_boundary(phys, size)) &&
>   TestClearPageXenRemapped(virt_to_page(vaddr)))

No need for the inner braces.

But more importantly please factor our a helper that can be used by
alloc and free to make sure that they always stay in sync.  Something
like:

static inline bool xen_swiotlb_need_contiguous_region(struct device *dev,
phys_addr_t phys, size_t size)
{

return xen_phys_to_bus(phys) + size - 1 > dev->coherent_dma_mask ||
range_straddles_page_boundary(phys, size))
}

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu