Re: [PATCH v5 4/8] iommu/vt-d: Aux-domain specific domain attach/detach

2019-01-14 Thread Lu Baolu

Hi,

On 1/14/19 8:26 PM, Jonathan Cameron wrote:

On Thu, 10 Jan 2019 11:00:23 +0800
Lu Baolu  wrote:


When multiple domains per device have been enabled by the
device driver, the device will tag the default PASID for
the domain onto all DMA traffic out of the corresponding
subset of the device, and the IOMMU should translate the
DMA requests at PASID granularity.

This adds the intel_iommu_aux_attach/detach_device() ops
to support managing PASID granular translation structures
when the device driver has enabled multiple domains per
device.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Signed-off-by: Sanjay Kumar 
Signed-off-by: Liu Yi L 
Signed-off-by: Lu Baolu 


The following is probably a rather naive review given I don't know
the driver or hardware well at all.  Still, it seems like things
are a lot less balanced than I'd expect, and it isn't totally obvious
to me why that is.


Thank you!




---
  drivers/iommu/intel-iommu.c | 152 
  include/linux/intel-iommu.h |  10 +++
  2 files changed, 162 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index e9119d45a29d..b8fb6a4bd447 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2482,6 +2482,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
info->iommu = iommu;
info->pasid_table = NULL;
info->auxd_enabled = 0;
+   INIT_LIST_HEAD(&info->auxiliary_domains);
  
  	if (dev && dev_is_pci(dev)) {

struct pci_dev *pdev = to_pci_dev(info->dev);
@@ -5058,6 +5059,131 @@ static void intel_iommu_domain_free(struct iommu_domain *domain)
domain_exit(to_dmar_domain(domain));
  }
  
+/*

+ * Check whether a @domain could be attached to the @dev through the
+ * aux-domain attach/detach APIs.
+ */
+static inline bool
+is_aux_domain(struct device *dev, struct iommu_domain *domain)


I'm finding the distinction between an aux domain capability on
a given device and whether one is actually in use to be obscured
slightly in the function naming.

This one for example is actually checking if we have a domain
that is capable of being enabled for aux domain use, but not
yet actually in that mode?

Mind you I'm not sure I have a better answer for the naming.
can_aux_domain_be_enabled?  is_unattached_aux_domain?




device aux mode vs. normal mode
===

When we talk about the auxiliary mode (simply aux-mode), we are
describing whether the device works in aux-mode or in normal mode.
"Normal mode" means that the device (and its corresponding IOMMU)
supports only RID (PCI Request ID) based DMA translation, while
aux-mode means that the device (and its IOMMU) supports fine-grained
DMA translation, such as PASID-based DMA translation with Intel VT-d
scalable mode.

We are adding the APIs below to switch a device between these two modes:

int iommu_dev_enable/disable_feature(dev, IOMMU_DEV_FEAT_AUX)

And this API (still under discussion) to check which mode the device is
working in:

bool iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)

aux-domain
==

If a device is working in aux-mode and we are going to attach a domain
to this device, we say "this domain will be attached to the device in
aux mode", or simply call it an "aux domain". So a domain is "normal"
when it is going to be attached to a device in normal mode, and is an
"aux-domain" when it is going to be attached to a device in aux mode.
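
As a rough usage sketch of the mode switch described above (illustration
only; example_enable_aux_mode() is a made-up helper and the error
handling is hypothetical):

/*
 * Sketch: put a device into aux-mode before any aux-domain is attached.
 * Assumes <linux/iommu.h> and that the caller owns @dev.
 */
static int example_enable_aux_mode(struct device *dev)
{
	/* Do the IOMMU and the device support aux-domains at all? */
	if (!iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX))
		return -ENODEV;

	/* Switch the device from normal mode to aux-mode. */
	return iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX);
}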




+{
+   struct device_domain_info *info = dev->archdata.iommu;
+
+   return info && info->auxd_enabled &&
+   domain->type == IOMMU_DOMAIN_UNMANAGED;
+}
+
+static void auxiliary_link_device(struct dmar_domain *domain,
+ struct device *dev)
+{
+   struct device_domain_info *info = dev->archdata.iommu;
+
+   assert_spin_locked(&device_domain_lock);
+   if (WARN_ON(!info))
+   return;
+
+   domain->auxd_refcnt++;
+   list_add(&domain->auxd, &info->auxiliary_domains);
+}
+
+static void auxiliary_unlink_device(struct dmar_domain *domain,
+   struct device *dev)
+{
+   struct device_domain_info *info = dev->archdata.iommu;
+
+   assert_spin_locked(&device_domain_lock);
+   if (WARN_ON(!info))
+   return;
+
+   list_del(&domain->auxd);
+   domain->auxd_refcnt--;
+
+   if (!domain->auxd_refcnt && domain->default_pasid > 0)
+   intel_pasid_free_id(domain->default_pasid);


This seems unbalanced with respect to what is happening in
auxiliary_link_device.  If this is necessary then it would be good to
have comments saying why.  To my uninformed eye, it looks like we could
do this at the end of aux_domain_remove_dev, except that we need to
hold the lock.  As such perhaps it makes sense to do the pasid
allocation under that lock in the first place?

I'm not 100% sure, but is there a race if you get the final
remove running against a new add currently?


Yes. This is unbalanced.

I will move pasid free out here and it should be done 

Re: [PATCH v5 1/8] iommu: Add APIs for multiple domains per device

2019-01-14 Thread Lu Baolu

Hi,

On 1/14/19 7:22 PM, Jonathan Cameron wrote:

On Thu, 10 Jan 2019 11:00:20 +0800
Lu Baolu  wrote:


Sharing a physical PCI device in a finer-granularity way
is becoming a consensus in the industry. IOMMU vendors
are also engaging efforts to support such sharing as well
as possible. Among the efforts, the capability of supporting
finer-granularity DMA isolation is a common requirement
due to security considerations. With finer-granularity
DMA isolation, all DMA requests out of or to a subset of
a physical PCI device can be protected by the IOMMU. As a
result, there is a request in software to attach multiple
domains to a physical PCI device. One example of such a
usage model is Intel Scalable IOV [1] [2]. The Intel VT-d
3.0 spec [3] introduces the scalable mode which enables
PASID-granularity DMA isolation.

This adds the APIs to support multiple domains per device.
In order to ease the discussions, we call it 'a domain in
auxiliary mode' or simply 'auxiliary domain' when multiple
domains are attached to a physical device.

The APIs include:

* iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)
   - Check whether both IOMMU and device support IOMMU aux
 domain feature. Below aux-domain specific interfaces
 are available only after this returns true.

* iommu_dev_enable/disable_feature(dev, IOMMU_DEV_FEAT_AUX)
   - Enable/disable device specific aux-domain feature.

* iommu_aux_attach_device(domain, dev)
   - Attaches @domain to @dev in the auxiliary mode. Multiple
 domains could be attached to a single device in the
 auxiliary mode with each domain representing an isolated
 address space for an assignable subset of the device.

* iommu_aux_detach_device(domain, dev)
   - Detach @domain which has been attached to @dev in the
 auxiliary mode.

* iommu_aux_get_pasid(domain, dev)
   - Return ID used for finer-granularity DMA translation.
 For the Intel Scalable IOV usage model, this will be
 a PASID. The device which supports Scalable IOV needs
 to write this ID to the device register so that DMA
 requests could be tagged with a right PASID prefix.

This has been updated with the latest proposal from Joerg
posted here [5].
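
For illustration only (not part of the patch), a hypothetical parent
driver could use the interfaces above roughly as follows;
example_aux_attach() and example_program_pasid() are made-up names, the
latter standing in for the device-specific PASID register write:

/* Sketch: attach an unmanaged domain to @dev in aux mode and program
 * the returned PASID into the device.
 */
static int example_aux_attach(struct device *dev)
{
	struct iommu_domain *domain;
	int pasid, ret;

	domain = iommu_domain_alloc(dev->bus);
	if (!domain)
		return -ENOMEM;

	ret = iommu_aux_attach_device(domain, dev);
	if (ret)
		goto free_domain;

	pasid = iommu_aux_get_pasid(domain, dev);
	if (pasid < 0) {
		ret = pasid;
		goto detach;
	}

	example_program_pasid(dev, pasid);	/* hypothetical step */
	return 0;

detach:
	iommu_aux_detach_device(domain, dev);
free_domain:
	iommu_domain_free(domain);
	return ret;
}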

Many people were involved in the discussions of this design.

Kevin Tian 
Liu Yi L 
Ashok Raj 
Sanjay Kumar 
Jacob Pan 
Alex Williamson 
Jean-Philippe Brucker 
Joerg Roedel 

and some discussions can be found here [4] [5].

[1] 
https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[2] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
[3] 
https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
[4] https://lkml.org/lkml/2018/7/26/4
[5] https://www.spinics.net/lists/iommu/msg31874.html

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Suggested-by: Kevin Tian 
Suggested-by: Jean-Philippe Brucker 
Suggested-by: Joerg Roedel 
Signed-off-by: Lu Baolu 


One trivial comment inline.


Thank you!




---
  drivers/iommu/iommu.c | 80 +++
  include/linux/iommu.h | 61 +
  2 files changed, 141 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3ed4db334341..9166b6145409 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2033,3 +2033,83 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
return 0;
  }
  EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
+
+/*
+ * Per device IOMMU features.
+ */
+bool iommu_dev_has_feature(struct device *dev, enum iommu_dev_features feat)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+   if (ops && ops->dev_has_feat)
+   return ops->dev_has_feat(dev, feat);
+
+   return false;
+}
+EXPORT_SYMBOL_GPL(iommu_dev_has_feature);
+
+int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features feat)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+   if (ops && ops->dev_enable_feat)
+   return ops->dev_enable_feat(dev, feat);
+
+   return -ENODEV;
+}
+EXPORT_SYMBOL_GPL(iommu_dev_enable_feature);
+
+int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+   if (ops && ops->dev_disable_feat)
+   return ops->dev_disable_feat(dev, feat);
+
+   return -ENODEV;
+}
+EXPORT_SYMBOL_GPL(iommu_dev_disable_feature);
+
+/*
+ * Aux-domain specific attach/detach.
+ *
+ * Only works if iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX) returns true.
+ * Also, as long as domains are attached to a device through this interface,
+ * any tries to call iommu_attach_device() should fail (iommu_detach_device()
+ * can't fail, so we fail on the tryint to re-attach). This should make us safe


when trying to re-attach. (perhaps?)


Yes. I will fix it.

Best regards,
Lu Baolu




+ * against a device being attached to a guest as 

Re: [RFC v3 18/21] vfio-pci: Add a new VFIO_REGION_TYPE_NESTED region type

2019-01-14 Thread Alex Williamson
On Mon, 14 Jan 2019 21:48:06 +0100
Auger Eric  wrote:

> Hi Alex,
> 
> On 1/12/19 12:58 AM, Alex Williamson wrote:
> > On Tue,  8 Jan 2019 11:26:30 +0100
> > Eric Auger  wrote:
> >   
> >> This patch adds a new 64kB region aiming to report nested mode
> >> translation faults.
> >>
> >> The region contains a header with the size of the queue,
> >> the producer and consumer indices and then the actual
> >> fault queue data. The producer is updated by the kernel while
> >> the consumer is updated by the userspace.
> >>
> >> Signed-off-by: Eric Auger 
> >>
> >> ---
> >> ---
> >>  drivers/vfio/pci/vfio_pci.c | 102 +++-
> >>  drivers/vfio/pci/vfio_pci_private.h |   2 +
> >>  include/uapi/linux/vfio.h   |  15 
> >>  3 files changed, 118 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> >> index ff60bd1ea587..2ba181ab2edd 100644
> >> --- a/drivers/vfio/pci/vfio_pci.c
> >> +++ b/drivers/vfio/pci/vfio_pci.c
> >> @@ -56,6 +56,11 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
> >>  MODULE_PARM_DESC(disable_idle_d3,
> >> "Disable using the PCI D3 low power state for idle, unused 
> >> devices");
> >>  
> >> +#define VFIO_FAULT_REGION_SIZE 0x10000
> > 
> > Why 64K?  
> For the region to be mmappable with 64kB page size.

Isn't hard coding 64K just as bad as hard coding 4K?  The kernel knows
what PAGE_SIZE is after all.  Is there some target number of queue
entries here that we could round up to a multiple of PAGE_SIZE?
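
For example, something along these lines could derive the region size
from a target queue depth instead of hard coding 64K (a sketch only;
the 128-entry target is arbitrary):

/* Sketch: round the fault region up to PAGE_SIZE so it stays
 * mmappable regardless of the host page size.
 */
#define VFIO_FAULT_QUEUE_ENTRIES	128
#define VFIO_FAULT_REGION_SIZE						\
	PAGE_ALIGN(sizeof(struct vfio_fault_region_header) +		\
		   VFIO_FAULT_QUEUE_ENTRIES * sizeof(struct iommu_fault))
#define VFIO_FAULT_QUEUE_SIZE						\
	((VFIO_FAULT_REGION_SIZE -					\
	  sizeof(struct vfio_fault_region_header)) /			\
	 sizeof(struct iommu_fault))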
 
> >> +#define VFIO_FAULT_QUEUE_SIZE \
> >> +  ((VFIO_FAULT_REGION_SIZE - sizeof(struct vfio_fault_region_header)) / \
> >> +  sizeof(struct iommu_fault))
> >> +
> >>  static inline bool vfio_vga_disabled(void)
> >>  {
> >>  #ifdef CONFIG_VFIO_PCI_VGA
> >> @@ -1226,6 +1231,100 @@ static const struct vfio_device_ops vfio_pci_ops = 
> >> {
> >>  static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
> >>  static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
> >>  
> >> +static size_t
> >> +vfio_pci_dma_fault_rw(struct vfio_pci_device *vdev, char __user *buf,
> >> +size_t count, loff_t *ppos, bool iswrite)
> >> +{
> >> +  unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
> >> +  void *base = vdev->region[i].data;
> >> +  loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> >> +
> >> +  if (pos >= vdev->region[i].size)
> >> +  return -EINVAL;
> >> +
> >> +  count = min(count, (size_t)(vdev->region[i].size - pos));
> >> +
> >> +  if (copy_to_user(buf, base + pos, count))
> >> +  return -EFAULT;
> >> +
> >> +  *ppos += count;
> >> +
> >> +  return count;
> >> +}
> >> +
> >> +static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
> >> + struct vfio_pci_region *region,
> >> + struct vm_area_struct *vma)
> >> +{
> >> +  u64 phys_len, req_len, pgoff, req_start;
> >> +  unsigned long long addr;
> >> +  unsigned int index;
> >> +
> >> +  index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
> >> +
> >> +  if (vma->vm_end < vma->vm_start)
> >> +  return -EINVAL;
> >> +  if ((vma->vm_flags & VM_SHARED) == 0)
> >> +  return -EINVAL;
> >> +
> >> +  phys_len = VFIO_FAULT_REGION_SIZE;
> >> +
> >> +  req_len = vma->vm_end - vma->vm_start;
> >> +  pgoff = vma->vm_pgoff &
> >> +  ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
> >> +  req_start = pgoff << PAGE_SHIFT;
> >> +
> >> +  if (req_start + req_len > phys_len)
> >> +  return -EINVAL;
> >> +
> >> +  addr = virt_to_phys(vdev->fault_region);
> >> +  vma->vm_private_data = vdev;
> >> +  vma->vm_pgoff = (addr >> PAGE_SHIFT) + pgoff;
> >> +
> >> +  return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
> >> + req_len, vma->vm_page_prot);
> >> +}
> >> +
> >> +void vfio_pci_dma_fault_release(struct vfio_pci_device *vdev,
> >> +  struct vfio_pci_region *region)
> >> +{
> >> +}
> >> +
> >> +static const struct vfio_pci_regops vfio_pci_dma_fault_regops = {
> >> +  .rw = vfio_pci_dma_fault_rw,
> >> +  .mmap   = vfio_pci_dma_fault_mmap,
> >> +  .release= vfio_pci_dma_fault_release,
> >> +};
> >> +
> >> +static int vfio_pci_init_dma_fault_region(struct vfio_pci_device *vdev)
> >> +{
> >> +  u32 flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE |
> >> +  VFIO_REGION_INFO_FLAG_MMAP;
> >> +  int ret;
> >> +
> >> +  spin_lock_init(&vdev->fault_queue_lock);
> >> +
> >> +  vdev->fault_region = kmalloc(VFIO_FAULT_REGION_SIZE, GFP_KERNEL);
> >> +  if (!vdev->fault_region)
> >> +  return -ENOMEM;
> >> +
> >> +  ret = vfio_pci_register_dev_region(vdev,
> >> +  VFIO_REGION_TYPE_NESTED,
> >> +  VFIO_REGION_SUBTYPE_NESTED_FAULT_REGION,
> >> +  &vfio_pci_dma_fault_regops, VFIO_FAULT_REGION_SIZE,
> >> +  flags, vdev->fault_region);
> >> +  if (ret) {
> 

Re: [PATCH 1/3] swiotlb: Export maximum allocation size

2019-01-14 Thread Konrad Rzeszutek Wilk
On Fri, Jan 11, 2019 at 10:12:31AM +0100, Joerg Roedel wrote:
> On Thu, Jan 10, 2019 at 12:02:05PM -0500, Konrad Rzeszutek Wilk wrote:
> > Why not use swiotlb_nr_tbl ? That is how drivers/gpu/drm use to figure if 
> > they
> > need to limit the size of pages.
> 
> That function just exports the overall size of the swiotlb aperture, no?
> What I need here is the maximum size for a single mapping.

Yes. The other drivers just assumed that if there is SWIOTLB they would use
the smaller size by default (as in they knew the limitation).

But I agree it would be better to have something smarter - and also convert the
DRM drivers to piggy back on this.

Or alternatively we could make SWIOTLB handle bigger sizes..
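
For what it's worth, a single swiotlb mapping is limited to
IO_TLB_SEGSIZE contiguous slabs, so exporting the per-mapping limit
could be as simple as the sketch below (the name and the exact
interface are not settled):

/* Sketch: largest contiguous allocation the swiotlb can bounce,
 * i.e. IO_TLB_SEGSIZE slabs of (1 << IO_TLB_SHIFT) bytes each,
 * 256kB with the current defaults.
 */
size_t swiotlb_max_mapping_size(struct device *dev)
{
	return ((size_t)1 << IO_TLB_SHIFT) * IO_TLB_SEGSIZE;
}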
> 
> Regards,
> 
>   Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB

2019-01-14 Thread Michael S. Tsirkin
On Mon, Jan 14, 2019 at 09:19:35PM +0100, Christoph Hellwig wrote:
> > Christoph is saying that !IOMMU_PLATFORM devices should hide the
> > compatibility code in a special per-device DMA API implementation.
> > Which would be fine especially if we can manage not to introduce a bunch
> > of indirect calls all over the place and hurt performance.  It's just
> > that the benefit is unlikely to be big (e.g. we can't also get rid of
> > the virtio specific memory barriers) so no one was motivated enough to
> > work on it.
> 
> No.  

Oh ok then.

> The problem is that we still haven't fully specified what
> IOMMU_PLATFORM and !IOMMU_PLATFORM actually mean.  Your
> "ACCESS_PLATFORM/ORDER_PLATFORM" commit in the virtio-spec repo
> improves it a little bit, but it is still far from enough.
> 
> As a start VIRTIO_F_ORDER_PLATFORM and VIRTIO_F_ACCESS_PLATFORM
> absolutely MUST be set for hardware implementations.  Otherwise said
> hardware has no chance of working on anything but the most x86-like
> systems.

I think you are exaggerating a little here.  Limited access with e.g.
need to flush caches is common on lower end but less common on higher
end systems used for virtualization.  As you point out that changes with
e.g. SEV but then SEV is a pretty niche thing so far.

So I would make that a SHOULD probably.

> Second software implementations SHOULD set VIRTIO_F_ACCESS_PLATFORM,
> because otherwise we can't add proper handling for things like SEV or
> the IBM "secure hypervisor" thing.

Yes. If host does not have access to all of memory it SHOULD set
VIRTIO_F_ACCESS_PLATFORM.

It seems rather straight-forward to add conformance clauses
for this, thanks for reminding me.


> Last but not least a lot of wording outside the area describing these
> flags really needs some major updates in terms of DMA access.

Thanks for the reminder. Yes generally spec tends to still say e.g.
"physical address" where it really means a "DMA address" in Linux terms.
Whether changing that will make things much clearer I'm not sure. E.g.
hardware designers don't much care I guess.

If you want to list the most problematic cases, e.g. in a github issue,
we might get to clarifying them sooner.

Thanks!

-- 
MST
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC v3 18/21] vfio-pci: Add a new VFIO_REGION_TYPE_NESTED region type

2019-01-14 Thread Auger Eric
Hi Alex,

On 1/12/19 12:58 AM, Alex Williamson wrote:
> On Tue,  8 Jan 2019 11:26:30 +0100
> Eric Auger  wrote:
> 
>> This patch adds a new 64kB region aiming to report nested mode
>> translation faults.
>>
>> The region contains a header with the size of the queue,
>> the producer and consumer indices and then the actual
>> fault queue data. The producer is updated by the kernel while
>> the consumer is updated by the userspace.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> ---
>>  drivers/vfio/pci/vfio_pci.c | 102 +++-
>>  drivers/vfio/pci/vfio_pci_private.h |   2 +
>>  include/uapi/linux/vfio.h   |  15 
>>  3 files changed, 118 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index ff60bd1ea587..2ba181ab2edd 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -56,6 +56,11 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
>>  MODULE_PARM_DESC(disable_idle_d3,
>>   "Disable using the PCI D3 low power state for idle, unused 
>> devices");
>>  
>> +#define VFIO_FAULT_REGION_SIZE 0x10000
> 
> Why 64K?
For the region to be mmappable with 64kB page size.
> 
>> +#define VFIO_FAULT_QUEUE_SIZE   \
>> +((VFIO_FAULT_REGION_SIZE - sizeof(struct vfio_fault_region_header)) / \
>> +sizeof(struct iommu_fault))
>> +
>>  static inline bool vfio_vga_disabled(void)
>>  {
>>  #ifdef CONFIG_VFIO_PCI_VGA
>> @@ -1226,6 +1231,100 @@ static const struct vfio_device_ops vfio_pci_ops = {
>>  static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
>>  static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
>>  
>> +static size_t
>> +vfio_pci_dma_fault_rw(struct vfio_pci_device *vdev, char __user *buf,
>> +  size_t count, loff_t *ppos, bool iswrite)
>> +{
>> +unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
>> +void *base = vdev->region[i].data;
>> +loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>> +
>> +if (pos >= vdev->region[i].size)
>> +return -EINVAL;
>> +
>> +count = min(count, (size_t)(vdev->region[i].size - pos));
>> +
>> +if (copy_to_user(buf, base + pos, count))
>> +return -EFAULT;
>> +
>> +*ppos += count;
>> +
>> +return count;
>> +}
>> +
>> +static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
>> +   struct vfio_pci_region *region,
>> +   struct vm_area_struct *vma)
>> +{
>> +u64 phys_len, req_len, pgoff, req_start;
>> +unsigned long long addr;
>> +unsigned int index;
>> +
>> +index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
>> +
>> +if (vma->vm_end < vma->vm_start)
>> +return -EINVAL;
>> +if ((vma->vm_flags & VM_SHARED) == 0)
>> +return -EINVAL;
>> +
>> +phys_len = VFIO_FAULT_REGION_SIZE;
>> +
>> +req_len = vma->vm_end - vma->vm_start;
>> +pgoff = vma->vm_pgoff &
>> +((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
>> +req_start = pgoff << PAGE_SHIFT;
>> +
>> +if (req_start + req_len > phys_len)
>> +return -EINVAL;
>> +
>> +addr = virt_to_phys(vdev->fault_region);
>> +vma->vm_private_data = vdev;
>> +vma->vm_pgoff = (addr >> PAGE_SHIFT) + pgoff;
>> +
>> +return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
>> +   req_len, vma->vm_page_prot);
>> +}
>> +
>> +void vfio_pci_dma_fault_release(struct vfio_pci_device *vdev,
>> +struct vfio_pci_region *region)
>> +{
>> +}
>> +
>> +static const struct vfio_pci_regops vfio_pci_dma_fault_regops = {
>> +.rw = vfio_pci_dma_fault_rw,
>> +.mmap   = vfio_pci_dma_fault_mmap,
>> +.release= vfio_pci_dma_fault_release,
>> +};
>> +
>> +static int vfio_pci_init_dma_fault_region(struct vfio_pci_device *vdev)
>> +{
>> +u32 flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE |
>> +VFIO_REGION_INFO_FLAG_MMAP;
>> +int ret;
>> +
>> +spin_lock_init(&vdev->fault_queue_lock);
>> +
>> +vdev->fault_region = kmalloc(VFIO_FAULT_REGION_SIZE, GFP_KERNEL);
>> +if (!vdev->fault_region)
>> +return -ENOMEM;
>> +
>> +ret = vfio_pci_register_dev_region(vdev,
>> +VFIO_REGION_TYPE_NESTED,
>> +VFIO_REGION_SUBTYPE_NESTED_FAULT_REGION,
>> +&vfio_pci_dma_fault_regops, VFIO_FAULT_REGION_SIZE,
>> +flags, vdev->fault_region);
>> +if (ret) {
>> +kfree(vdev->fault_region);
>> +return ret;
>> +}
>> +
>> +vdev->fault_region->header.prod = 0;
>> +vdev->fault_region->header.cons = 0;
>> +vdev->fault_region->header.reserved = 0;
> 
> Use kzalloc above or else we're leaking kernel memory to userspace
> anyway.
sure
> 
>> +vdev->fault_region->header.size = VFIO_FAULT_QUEUE_SIZE;
>> +return 0;
>> +}
>> +
>>  static int 

Re: [PATCH] swiotlb: clear io_tlb_start and io_tlb_end in swiotlb_exit

2019-01-14 Thread Konrad Rzeszutek Wilk
On Mon, Jan 14, 2019 at 09:14:08PM +0100, Christoph Hellwig wrote:
> Otherwise is_swiotlb_buffer will return false positives when
> we first initialize a swiotlb buffer, but then free it because
> we have an IOMMU available.
> 
> Fixes: 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct 
> code")
> Reported-by: Sibren Vasse 
> Signed-off-by: Christoph Hellwig 
> Tested-by: Sibren Vasse 

Reviewed-by: Konrad Rzeszutek Wilk 

Thank you!
> ---
>  kernel/dma/swiotlb.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index d6361776dc5c..1fb6fd68b9c7 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -378,6 +378,8 @@ void __init swiotlb_exit(void)
>   memblock_free_late(io_tlb_start,
>  PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
>   }
> + io_tlb_start = 0;
> + io_tlb_end = 0;
>   io_tlb_nslabs = 0;
>   max_segment = 0;
>  }
> -- 
> 2.20.1
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB

2019-01-14 Thread Michael S. Tsirkin
On Mon, Jan 14, 2019 at 07:12:08PM +, Robin Murphy wrote:
> On 14/01/2019 18:20, Michael S. Tsirkin wrote:
> > On Mon, Jan 14, 2019 at 08:41:37PM +0800, Jason Wang wrote:
> > > 
> > > On 2019/1/14 下午5:50, Christoph Hellwig wrote:
> > > > On Mon, Jan 14, 2019 at 05:41:56PM +0800, Jason Wang wrote:
> > > > > On 2019/1/11 下午5:15, Joerg Roedel wrote:
> > > > > > On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote:
> > > > > > > Just wonder if my understanding is correct IOMMU_PLATFORM must be 
> > > > > > > set for
> > > > > > > all virtio devices under AMD-SEV guests?
> > > > > > Yes, that is correct. Emulated DMA can only happen on the SWIOTLB
> > > > > > aperture, because that memory is not encrypted. The guest bounces 
> > > > > > the
> > > > > > data then to its encrypted memory.
> > > > > > 
> > > > > > Regards,
> > > > > > 
> > > > > > Joerg
> > > > > 
> > > > > Thanks, have you tested vhost-net in this case. I suspect it may not 
> > > > > work
> > > > Which brings me back to my pet peeve that we need to take actions
> > > > that virtio uses the proper dma mapping API by default with quirks
> > > > for legacy cases.  The magic bypass it uses is just causing problems
> > > > over problems.
> > > 
> > > 
> > > Yes, I fully agree with you. This is probably an exact example of such
> > > problem.
> > > 
> > > Thanks
> > 
> > I don't think so - the issue is really that DMA API does not yet handle
> > the SEV case 100% correctly. I suspect passthrough devices would have
> > the same issue.
> 
> Huh? Regardless of which virtio devices use it or not, the DMA API is
> handling the SEV case as correctly as it possibly can, by forcing everything
> through the unencrypted bounce buffer. If the segments being mapped are too
> big for that bounce buffer in the first place, there's nothing it can
> possibly do except fail, gracefully or otherwise.

It seems reasonable to be able to ask it what its capabilities are
though.

> Now, in theory, yes, the real issue at hand is not unique to virtio-blk nor
> SEV - any driver whose device has a sufficiently large DMA segment size and
> who manages to get sufficient physically-contiguous memory could technically
> generate a scatterlist segment longer than SWIOTLB can handle. However, in
> practice that basically never happens, not least because very few drivers
> ever override the default 64K DMA segment limit. AFAICS nothing in
> drivers/virtio is calling dma_set_max_seg_size() or otherwise assigning any
> dma_parms to replace the defaults either, so the really interesting question
> here is how are these apparently-out-of-spec 256K segments getting generated
> at all?
> 
> Robin.

I guess this is what you are looking for:

/* Host can optionally specify maximum segment size and number of
 * segments. */
err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SIZE_MAX,
   struct virtio_blk_config, size_max, &v);
if (!err)
blk_queue_max_segment_size(q, v);
else
blk_queue_max_segment_size(q, -1U);

virtio isn't the only caller with a value >64K:

$ git grep -A1 blk_queue_max_segment_size
Documentation/block/biodoc.txt: blk_queue_max_segment_size(q, max_seg_size)
Documentation/block/biodoc.txt- Maximum size of a clustered segment, 
64kB default.
--
block/blk-settings.c: * blk_queue_max_segment_size - set max segment size for 
blk_rq_map_sg
block/blk-settings.c- * @q:  the request queue for the device
--
block/blk-settings.c:void blk_queue_max_segment_size(struct request_queue *q, 
unsigned int max_size)
block/blk-settings.c-{
--
block/blk-settings.c:EXPORT_SYMBOL(blk_queue_max_segment_size);
block/blk-settings.c-
--
drivers/block/mtip32xx/mtip32xx.c:  blk_queue_max_segment_size(dd->queue, 
0x40);
drivers/block/mtip32xx/mtip32xx.c-  blk_queue_io_min(dd->queue, 4096);
--
drivers/block/nbd.c:blk_queue_max_segment_size(disk->queue, UINT_MAX);
drivers/block/nbd.c-blk_queue_max_segments(disk->queue, USHRT_MAX);
--
drivers/block/ps3disk.c:blk_queue_max_segment_size(queue, 
dev->bounce_size);
drivers/block/ps3disk.c-
--
drivers/block/ps3vram.c:blk_queue_max_segment_size(queue, 
BLK_MAX_SEGMENT_SIZE);
drivers/block/ps3vram.c-blk_queue_max_hw_sectors(queue, 
BLK_SAFE_MAX_SECTORS);
--
drivers/block/rbd.c:blk_queue_max_segment_size(q, UINT_MAX);
drivers/block/rbd.c-blk_queue_io_min(q, objset_bytes);
--
drivers/block/sunvdc.c: blk_queue_max_segment_size(q, PAGE_SIZE);
drivers/block/sunvdc.c-
--
drivers/block/virtio_blk.c: blk_queue_max_segment_size(q, v);
drivers/block/virtio_blk.c- else
drivers/block/virtio_blk.c: blk_queue_max_segment_size(q, -1U);
drivers/block/virtio_blk.c-
--
drivers/block/xen-blkfront.c:   blk_queue_max_segment_size(rq, PAGE_SIZE);
drivers/block/xen-blkfront.c-
--
drivers/cdrom/gdrom.c:  blk_queue_max_segment_size(gd.gdrom_rq, 0x4);
drivers/cdrom/gdrom.c-  

Re: [PATCH] swiotlb: clear io_tlb_start and io_tlb_end in swiotlb_exit

2019-01-14 Thread Christoph Hellwig
On Mon, Jan 14, 2019 at 03:25:13PM -0500, Konrad Rzeszutek Wilk wrote:
> On Mon, Jan 14, 2019 at 09:14:08PM +0100, Christoph Hellwig wrote:
> > Otherwise is_swiotlb_buffer will return false positives when
> > we first initialize a swiotlb buffer, but then free it because
> > we have an IOMMU available.
> > 
> > Fixes: 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct 
> > code")
> > Reported-by: Sibren Vasse 
> > Signed-off-by: Christoph Hellwig 
> > Tested-by: Sibren Vasse 
> 
> Reviewed-by: Konrad Rzeszutek Wilk 

Do you want to send this to Linus, or do you want me to pick it up
through the dma-mapping tree?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] Revert "ARM: dma-mapping: Set proper DMA ops in arm_iommu_detach_device()"

2019-01-14 Thread Tobias Jakobi
Hello everyone,

first of all thanks to Marek for looking into this.

@Robin: This works for me. The drivers now probe and bind correctly again.

With best wishes,
Tobias

Robin Murphy wrote:
> On 14/01/2019 16:09, Thierry Reding wrote:
>> On Mon, Jan 14, 2019 at 02:22:40PM +0100, Marek Szyprowski wrote:
>>> This reverts commit 1874619a7df4b14b23b14877e705ae15325773e3.
>>>
>>> That patch broke IOMMU support for devices which fail to probe the
>>> first time and use the deferred probe approach. When non-NULL dma_ops
>>> is set in arm_iommu_detach_device(), the given device is later ignored
>>> by arch_setup_dma_ops() and stays with non-IOMMU dma_ops.
>>>
>>> Reported-by: Tobias Jakobi 
>>> Fixes: 1874619a7df4 "ARM: dma-mapping: Set proper DMA ops in
>>> arm_iommu_detach_device()"
>>> Signed-off-by: Marek Szyprowski 
>>> ---
>>>   arch/arm/mm/dma-mapping.c | 12 ++--
>>>   1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> Can you point out exactly what drivers break because of this change? We
>> need to find a solution that works for everyone. Reverting is only
>> marginally useful because somebody will just end up wanting to revert
>> the revert because a different driver is now broken.
> 
> At first glance, it sounds like what we really want is for
> arch_teardown_iommu_ops() to completely clear any ops that 
> arch_setup_dma_ops()
> installed - does the below suffice?
> 
> Robin.
> 
> ->8-
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index f1e2922e447c..1e3e08a1c456 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -2390,4 +2390,6 @@ void arch_teardown_dma_ops(struct device *dev)
>  return;
> 
>  arm_teardown_iommu_dma_ops(dev);
> +    /* Let arch_setup_dma_ops() start again from scratch upon re-probe */
> +    set_dma_ops(dev, NULL);
>  }

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB

2019-01-14 Thread Christoph Hellwig
On Mon, Jan 14, 2019 at 07:12:08PM +, Robin Murphy wrote:
> Now, in theory, yes, the real issue at hand is not unique to virtio-blk nor 
> SEV - any driver whose device has a sufficiently large DMA segment size and 
> who manages to get sufficient physically-contiguous memory could 
> technically generate a scatterlist segment longer than SWIOTLB can handle. 
> However, in practice that basically never happens, not least because very 
> few drivers ever override the default 64K DMA segment limit. AFAICS nothing 
> in drivers/virtio is calling dma_set_max_seg_size() or otherwise assigning 
> any dma_parms to replace the defaults either, so the really interesting 
> question here is how are these apparently-out-of-spec 256K segments getting 
> generated at all?

drivers/block/virtio_blk.c:virtblk_probe():

/* Host can optionally specify maximum segment size and number of
 * segments. */
err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SIZE_MAX,
   struct virtio_blk_config, size_max, &v);
if (!err)
blk_queue_max_segment_size(q, v);
else
blk_queue_max_segment_size(q, -1U);

So it really is virtio_blk that is special here.  The host could
set VIRTIO_BLK_F_SIZE_MAX to paper over the problem, but I really
think we need a dma_max_segment_size API that is wired up through
the dma_map_ops to sort this out for real.
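
A sketch of one possible shape for such an API (illustrative only; the
max_mapping_size callback and dma_direct_max_mapping_size() are
assumptions, nothing like this exists in the tree yet):

/* Sketch: expose the largest single mapping through dma_map_ops,
 * falling back to the swiotlb-aware direct-mapping limit.
 */
size_t dma_max_mapping_size(struct device *dev)
{
	const struct dma_map_ops *ops = get_dma_ops(dev);

	if (dma_is_direct(ops))
		return dma_direct_max_mapping_size(dev);
	if (ops && ops->max_mapping_size)
		return ops->max_mapping_size(dev);

	return SIZE_MAX;	/* no driver-visible limit */
}

virtio_blk could then clamp the value it passes to
blk_queue_max_segment_size() with this instead of using -1U.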
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB

2019-01-14 Thread Christoph Hellwig
On Mon, Jan 14, 2019 at 01:20:45PM -0500, Michael S. Tsirkin wrote:
> I don't think so - the issue is really that DMA API does not yet handle
> the SEV case 100% correctly. I suspect passthrough devices would have
> the same issue.

The DMA API handles the SEV case perfectly.  It's just that virtio_blk
supports huge segments that virtio does not generally support, but that
is not related to SEV in any way.

> In fact whoever sets IOMMU_PLATFORM is completely unaffected by
> Christoph's pet peeve.

No, the above happens only when we set IOMMU_PLATFORM.

> Christoph is saying that !IOMMU_PLATFORM devices should hide the
> compatibility code in a special per-device DMA API implementation.
> Which would be fine especially if we can manage not to introduce a bunch
> of indirect calls all over the place and hurt performance.  It's just
> that the benefit is unlikely to be big (e.g. we can't also get rid of
> the virtio specific memory barriers) so no one was motivated enough to
> work on it.

No.  The problem is that we still haven't fully specified what
IOMMU_PLATFORM and !IOMMU_PLATFORM actually mean.  Your
"ACCESS_PLATFORM/ORDER_PLATFORM" commit in the virtio-spec repo
improves it a little bit, but it is still far from enough.

As a start VIRTIO_F_ORDER_PLATFORM and VIRTIO_F_ACCESS_PLATFORM
absolutely MUST be set for hardware implementations.  Otherwise said
hardware has no chance of working on anything but the most x86-like
systems.

Second software implementations SHOULD set VIRTIO_F_ACCESS_PLATFORM,
because otherwise we can't add proper handling for things like SEV or
the IBM "secure hypervisor" thing.

Last but not least a lot of wording outside the area describing these
flags really needs some major updates in terms of DMA access.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB

2019-01-14 Thread Singh, Brijesh


On 1/14/19 12:20 PM, Michael S. Tsirkin wrote:
> On Mon, Jan 14, 2019 at 08:41:37PM +0800, Jason Wang wrote:
>>
>> On 2019/1/14 下午5:50, Christoph Hellwig wrote:
>>> On Mon, Jan 14, 2019 at 05:41:56PM +0800, Jason Wang wrote:
 On 2019/1/11 下午5:15, Joerg Roedel wrote:
> On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote:
>> Just wonder if my understanding is correct IOMMU_PLATFORM must be set for
>> all virtio devices under AMD-SEV guests?
> Yes, that is correct. Emulated DMA can only happen on the SWIOTLB
> aperture, because that memory is not encrypted. The guest bounces the
> data then to its encrypted memory.
>
> Regards,
>
>   Joerg

 Thanks, have you tested vhost-net in this case. I suspect it may not work
>>> Which brings me back to my pet peeve that we need to take actions
>>> that virtio uses the proper dma mapping API by default with quirks
>>> for legacy cases.  The magic bypass it uses is just causing problems
>>> over problems.
>>
>>
>> Yes, I fully agree with you. This is probably an exact example of such
>> problem.
>>
>> Thanks
> 
> I don't think so - the issue is really that DMA API does not yet handle
> the SEV case 100% correctly. I suspect passthrough devices would have
> the same issue.
> 


In the case of SEV, emulated DMA is performed through the SWIOTLB
(which bounces the encrypted buffers). The issue reported here will
happen on any platform which is making use of SWIOTLB. We could
easily reproduce the virtio-blk failure if we configure
swiotlb=force in a non-SEV guest. Unfortunately, in the case of SEV
the SWIOTLB is a must. As Joerg highlighted, the main issue is a
limitation of the SWIOTLB: it does not support allocations/mappings
larger than 256kB.


> In fact whoever sets IOMMU_PLATFORM is completely unaffected by
> Christoph's pet peeve.
> 
> Christoph is saying that !IOMMU_PLATFORM devices should hide the
> compatibility code in a special per-device DMA API implementation.
> Which would be fine especially if we can manage not to introduce a bunch
> of indirect calls all over the place and hurt performance.  It's just
> that the benefit is unlikely to be big (e.g. we can't also get rid of
> the virtio specific memory barriers) so no one was motivated enough to
> work on it.
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-14 Thread Sibren Vasse
On Mon, 14 Jan 2019 at 19:10, Christoph Hellwig  wrote:
>
> On Thu, Jan 10, 2019 at 06:52:26PM +0100, Sibren Vasse wrote:
> > On Thu, 10 Jan 2019 at 15:48, Christoph Hellwig  wrote:
> > >
> > > On Thu, Jan 10, 2019 at 03:00:31PM +0100, Christian König wrote:
> > > >>  From the trace it looks like we hit the case where swiotlb tries
> > > >> to copy back data from a bounce buffer, but hits a dangling or NULL
> > > >> pointer.  So a couple questions for the submitter:
> > > >>
> > > >>   - does the system have more than 4GB memory and thus use swiotlb?
> > > >> (check /proc/meminfo, and if something SWIOTLB appears in dmesg)
> > > >>   - does the device this happens on have a DMA mask smaller than
> > > >> the available memory, that is should swiotlb be used here to start
> > > >> with?
> > > >
> > > > Rather unlikely. The device is an AMD GPU, so we can address memory up 
> > > > to
> > > > 1TB.
> > >
> > > So we probably somehow got a false positive.
> > >
> > > For now I'd like the reporter to confirm that the dma_direct_unmap_page+0x92
> > > backtrace really is in the swiotlb code (I can't think of anything else,
> > > but I'd rather be sure).
> > I'm not sure what you want me to confirm. Could you elaborate?
>
> Please open the vmlinux file for which this happened in gdb,
> then send the output from this command
>
> l *(dma_direct_unmap_page+0x92)
>
> to this thread.
My call trace contained:
Jan 10 16:34:51  kernel:  dma_direct_unmap_page+0x7a/0x80

(gdb) list *(dma_direct_unmap_page+0x7a)
0x810fa28a is in dma_direct_unmap_page (kernel/dma/direct.c:291).
286 size_t size, enum dma_data_direction dir,
unsigned long attrs)
287 {
288 phys_addr_t phys = dma_to_phys(dev, addr);
289
290 if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
291 dma_direct_sync_single_for_cpu(dev, addr, size, dir);
292
293 if (unlikely(is_swiotlb_buffer(phys)))
294 swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
295 }
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB

2019-01-14 Thread Michael S. Tsirkin
On Mon, Jan 14, 2019 at 08:41:37PM +0800, Jason Wang wrote:
> 
> On 2019/1/14 下午5:50, Christoph Hellwig wrote:
> > On Mon, Jan 14, 2019 at 05:41:56PM +0800, Jason Wang wrote:
> > > On 2019/1/11 下午5:15, Joerg Roedel wrote:
> > > > On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote:
> > > > > Just wonder if my understanding is correct IOMMU_PLATFORM must be set 
> > > > > for
> > > > > all virtio devices under AMD-SEV guests?
> > > > Yes, that is correct. Emulated DMA can only happen on the SWIOTLB
> > > > aperture, because that memory is not encrypted. The guest bounces the
> > > > data then to its encrypted memory.
> > > > 
> > > > Regards,
> > > > 
> > > > Joerg
> > > 
> > > Thanks, have you tested vhost-net in this case. I suspect it may not work
> > Which brings me back to my pet peeve that we need to take actions
> > that virtio uses the proper dma mapping API by default with quirks
> > for legacy cases.  The magic bypass it uses is just causing problems
> > over problems.
> 
> 
> Yes, I fully agree with you. This is probably an exact example of such
> problem.
> 
> Thanks

I don't think so - the issue is really that DMA API does not yet handle
the SEV case 100% correctly. I suspect passthrough devices would have
the same issue.

In fact whoever sets IOMMU_PLATFORM is completely unaffected by
Christoph's pet peeve.

Christoph is saying that !IOMMU_PLATFORM devices should hide the
compatibility code in a special per-device DMA API implementation.
Which would be fine especially if we can manage not to introduce a bunch
of indirect calls all over the place and hurt performance.  It's just
that the benefit is unlikely to be big (e.g. we can't also get rid of
the virtio specific memory barriers) so no one was motivated enough to
work on it.

-- 
MST
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-14 Thread Christoph Hellwig
Hmm, I wonder if we are not actually using swiotlb in the end,
can you check if your dmesg contains this line or not?

PCI-DMA: Using software bounce buffering for IO (SWIOTLB)

If not I guess we found a bug in swiotlb exit vs is_swiotlb_buffer,
and you can try this patch:

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d6361776dc5c..1fb6fd68b9c7 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -378,6 +378,8 @@ void __init swiotlb_exit(void)
memblock_free_late(io_tlb_start,
   PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
}
+   io_tlb_start = 0;
+   io_tlb_end = 0;
io_tlb_nslabs = 0;
max_segment = 0;
 }
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-14 Thread Christoph Hellwig
On Thu, Jan 10, 2019 at 06:52:26PM +0100, Sibren Vasse wrote:
> On Thu, 10 Jan 2019 at 15:48, Christoph Hellwig  wrote:
> >
> > On Thu, Jan 10, 2019 at 03:00:31PM +0100, Christian König wrote:
> > >>  From the trace it looks like we hit the case where swiotlb tries
> > >> to copy back data from a bounce buffer, but hits a dangling or NULL
> > >> pointer.  So a couple questions for the submitter:
> > >>
> > >>   - does the system have more than 4GB memory and thus use swiotlb?
> > >> (check /proc/meminfo, and if something SWIOTLB appears in dmesg)
> > >>   - does the device this happens on have a DMA mask smaller than
> > >> the available memory, that is should swiotlb be used here to start
> > >> with?
> > >
> > > Rather unlikely. The device is an AMD GPU, so we can address memory up to
> > > 1TB.
> >
> > So we probably somehow got a false positive.
> >
> > For now I'd like the reporter to confirm that the dma_direct_unmap_page+0x92
> > backtrace really is in the swiotlb code (I can't think of anything else,
> > but I'd rather be sure).
> I'm not sure what you want me to confirm. Could you elaborate?

Please open the vmlinux file for which this happened in gdb,
then send the output from this command

l *(dma_direct_unmap_page+0x92)

to this thread.

> > Second it would be great to print what the contents of io_tlb_start
> > and io_tlb_end are, e.g. by doing a printk_once in is_swiotlb_buffer,
> > maybe that gives a clue why we are hitting the swiotlb code here.
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 7c007ed7505f..042246dbae00 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -69,6 +69,7 @@ extern phys_addr_t io_tlb_start, io_tlb_end;
> 
>  static inline bool is_swiotlb_buffer(phys_addr_t paddr)
>  {
> +printk_once(KERN_INFO "io_tlb_start: %llu, io_tlb_end: %llu",
> io_tlb_start, io_tlb_end);
>  return paddr >= io_tlb_start && paddr < io_tlb_end;
>  }
> 
> Result on boot:
> [   11.405558] io_tlb_start: 3782983680, io_tlb_end: 3850092544

So this is a normal swiotlb location, and it does definitely exist.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-14 Thread Michel Dänzer
On 2019-01-10 3:48 p.m., Christoph Hellwig wrote:
> On Thu, Jan 10, 2019 at 03:00:31PM +0100, Christian König wrote:
>>>  From the trace it looks like we hit the case where swiotlb tries
>>> to copy back data from a bounce buffer, but hits a dangling or NULL
>>> pointer.  So a couple questions for the submitter:
>>>
>>>   - does the system have more than 4GB memory and thus use swiotlb?
>>> (check /proc/meminfo, and if something SWIOTLB appears in dmesg)
>>>   - does the device this happens on have a DMA mask smaller than
>>> the available memory, that is should swiotlb be used here to start
>>> with?
>>
>> Rather unlikely. The device is an AMD GPU, so we can address memory up to 
>> 1TB.
> 
> So we probably somehow got a false positive.
> 
> For now I'd like the reporter to confirm that the dma_direct_unmap_page+0x92
> backtrace really is in the swiotlb code (I can't think of anything else,
> but I'd rather be sure).
> 
> Second it would be great to print what the contents of io_tlb_start
> and io_tlb_end are, e.g. by doing a printk_once in is_swiotlb_buffer,
> maybe that gives a clue why we are hitting the swiotlb code here.

Any progress? https://bugzilla.kernel.org/show_bug.cgi?id=202261 was
also filed about this.

I hope everyone's clear that this needs to be resolved one way or
another by 5.0 final (though the sooner, the better :).


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/3] dma-mapping: remove the default map_resource implementation

2019-01-14 Thread Christoph Hellwig
On Mon, Jan 14, 2019 at 01:12:33PM +, Robin Murphy wrote:
> Ignoring the offset was kind of intentional there, because at the time I 
> was primarily thinking about it in terms of the Keystone 2 platform where 
> the peripherals are all in the same place (0-2GB) in both the bus and CPU 
> physical address maps, and only the view of RAM differs between the two 
> (2-4GB vs. 32-34GB). However, on something like BCM283x, the peripherals 
> region is also offset from its bus address in the CPU view, but at a 
> *different* offset relative to that of RAM.

I was more thinking of the PCIe P2P case, where we need to apply a
consistent offset to translate between the CPU and the bus view.

But this isn't really used for PCIe P2P, so I guess keeping the original
semantics might be a better idea.  That being said the videobuf code seems
to rely on these offsets, so we might be between a rock and a hard place.

> Fortunately, I'm not aware of any platform which has a DMA engine behind an 
> IOMMU (and thus *needs* to use dma_map_resource() to avoid said IOMMU 
> blocking the slave device register reads/writes) and also has any nonzero 
> offsets, and AFAIK the IOMMU-less platforms above aren't using 
> dma_map_resource() at all, so this change shouldn't actually break 
> anything, but I guess we have a bit of a problem making it truly generic 
> and robust :(

Note that we don't actually use the code in this patch for ARM/ARM64
platforms with IOMMUs, as both the ARM and the ARM64 iommu code have
their own implementations of ->map_resource that actually program
the iommu (which at least for the PCIe P2P case would be wrong).

> Is this perhaps another shove in the direction of overhauling 
> dma_pfn_offset into an arbitrary "DMA ranges" lookup table?

It is long overdue anyway.

>>  addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
>
> Might it be reasonable to do:
>
>   if (!dma_is_direct(ops) && ops->map_resource)
>   addr = ops->map_resource(...);
>   else
>   addr = dma_direct_map_resource(...);
>
> and avoid having to explicitly wire up the dma_direct callback elsewhere?

No, I absolutely _want_ the callback explicitly wired up.  That is the
only way to ensure we actually intentionally support it and don't just
get a default that often won't work.  Same issue for ->mmap and
->get_sgtable.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove block layer bounce buffering for MMC

2019-01-14 Thread Christoph Hellwig
On Mon, Jan 14, 2019 at 04:52:40PM +, Robin Murphy wrote:
> One general point for the kmap() conversions - it's not obvious (to me at 
> least) whether or how that would work for a segment where sg->length > 
> PAGE_SIZE. Or is there some cast-iron guarantee from the MMC mid-layer that 
> it will never let the block layer generate such things in the first place?

None of this will work with such segments.  But yes, I guess the old
case could have worked as long as any physically contiguous ranges are
also virtually contiguous.  So we might have to throw in a page size segment
boundary here.
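
If we go that way, a sketch of what the queue limits might look like so
that no segment spans more than one page ("mq->queue" is only a
placeholder for the MMC request queue):

/* Sketch: keep every scatterlist segment within a single page so a
 * plain kmap() of sg_page(sg) covers the whole segment.
 */
blk_queue_max_segment_size(mq->queue, PAGE_SIZE);
blk_queue_segment_boundary(mq->queue, PAGE_SIZE - 1);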
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove block layer bounce buffering for MMC

2019-01-14 Thread Robin Murphy

On 14/01/2019 09:57, Christoph Hellwig wrote:

Hi everyone,

this series converts the remaining MMC host drivers to properly kmap the
scatterlist entries it does PIO operations on, and then goes on to
remove the usage of block layer bounce buffering (which I plan to remove
eventually) from the MMC layer.

As a bonus I've converted various drivers to the proper scatterlist
helpers so that at least in theory they are ready for chained
scatterlists.

All the changes are compile tested only as I don't have any of the
hardware, so a careful review would be appreciated.


One general point for the kmap() conversions - it's not obvious (to me
at least) whether or how that would work for a segment where
sg->length > PAGE_SIZE. Or is there some cast-iron guarantee from the
MMC mid-layer that it will never let the block layer generate such
things in the first place?


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] Revert "ARM: dma-mapping: Set proper DMA ops in arm_iommu_detach_device()"

2019-01-14 Thread Robin Murphy

On 14/01/2019 16:09, Thierry Reding wrote:

On Mon, Jan 14, 2019 at 02:22:40PM +0100, Marek Szyprowski wrote:

This reverts commit 1874619a7df4b14b23b14877e705ae15325773e3.

That patch broke IOMMU support for devices which fail to probe the
first time and use the deferred probe approach. When non-NULL dma_ops
is set in arm_iommu_detach_device(), the given device is later ignored
by arch_setup_dma_ops() and stays with non-IOMMU dma_ops.

Reported-by: Tobias Jakobi 
Fixes: 1874619a7df4 "ARM: dma-mapping: Set proper DMA ops in 
arm_iommu_detach_device()"
Signed-off-by: Marek Szyprowski 
---
  arch/arm/mm/dma-mapping.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)


Can you point out exactly what drivers break because of this change? We
need to find a solution that works for everyone. Reverting is only
marginally useful because somebody will just end up wanting to revert
the revert because a different driver is now broken.


At first glance, it sounds like what we really want is for 
arch_teardown_iommu_ops() to completely clear any ops that 
arch_setup_dma_ops() installed - does the below suffice?


Robin.

->8-
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index f1e2922e447c..1e3e08a1c456 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -2390,4 +2390,6 @@ void arch_teardown_dma_ops(struct device *dev)
return;

arm_teardown_iommu_dma_ops(dev);
+   /* Let arch_setup_dma_ops() start again from scratch upon re-probe */
+   set_dma_ops(dev, NULL);
 }
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] Revert "ARM: dma-mapping: Set proper DMA ops in arm_iommu_detach_device()"

2019-01-14 Thread Thierry Reding
On Mon, Jan 14, 2019 at 02:22:40PM +0100, Marek Szyprowski wrote:
> This reverts commit 1874619a7df4b14b23b14877e705ae15325773e3.
> 
> That patch broke IOMMU support for devices which fail to probe the
> first time and use the deferred probe approach. When non-NULL dma_ops
> is set in arm_iommu_detach_device(), the given device is later ignored
> by arch_setup_dma_ops() and stays with non-IOMMU dma_ops.
> 
> Reported-by: Tobias Jakobi 
> Fixes: 1874619a7df4 "ARM: dma-mapping: Set proper DMA ops in 
> arm_iommu_detach_device()"
> Signed-off-by: Marek Szyprowski 
> ---
>  arch/arm/mm/dma-mapping.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)

Can you point out exactly what drivers break because of this change? We
need to find a solution that works for everyone. Reverting is only
marginally useful because somebody will just end up wanting to revert
the revert because a different driver is now broken.

Thierry


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/2] ACPI/IORT: Handle potential overflow in iort_dma_setup

2019-01-14 Thread Auger Eric
Hi Robin,

On 1/14/19 12:10 PM, Robin Murphy wrote:
> On 10/01/2019 10:44, Auger Eric wrote:
>> Hi Robin, Drew,
>>
>> On 12/19/18 2:18 PM, Andrew Jones wrote:
>>> On Wed, Dec 19, 2018 at 12:21:35PM +, Robin Murphy wrote:
 On 18/12/2018 18:48, Andrew Jones wrote:
> The sum of dmaaddr and size may overflow, particularly considering
> there are cases where size will be U64_MAX.

 Only if the firmware is broken in the first place, though. It would
 be weird
 to describe an explicit _DMA range of base=0 and size=U64_MAX,
 because it's
 effectively the same as just not having one at all, but it's not
 strictly
 illegal. However, since the ACPI System Memory address space is at most
 64-bit, anything that would actually overflow here is already
 describing an
 impossibility - really, we should probably scream even louder about a
 firmware bug and reject it entirely, rather than quietly hiding it.
>>>
>>> Ah, looking again I see the paths. Either acpi_dma_get_range() returns
>>> success, in which case base and size are fine, or it returns an EINVAL,
>>> in which case base=size=0, or it returns ENODEV in which case base is
>>> zero, so size may be set to U64_MAX by rc_dma_get_range() with no
>>> problem.
>>> The !dev_is_pci(dev) path is also fine since base=0.
>>
>> So practically putting an explicit memory_address_limit=64 is harmless
>> as dmaaddr always is 0, right?
>>
>> In QEMU I intended to update the ACPI code to comply with the rev D
spec. In that case the RC table revision is 1 (rev D) and the
>> memory_address_limit needs to be filled. If we don't want to restrict
>> the limit, isn't it the right choice to set 64 here?
> 
> Indeed, the Memory Address Size Limit doesn't cater for offsets so can't
> run into this kind of overflow in the first place. For a fully-emulated
> PCI hierarchy I'd say 64 is not just harmless but in fact entirely
> correct - you're going to have more fun with VFIO passthrough if the
> host tables have more restrictive limits, but I guess that's a problem
> for the future ;)
Yes but that's a good point to notice!

Thanks

Eric
> 
> Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/3] dma-mapping: don't BUG when calling dma_map_resource on RAM

2019-01-14 Thread Robin Murphy

On 11/01/2019 18:17, Christoph Hellwig wrote:

Use WARN_ON_ONCE to print a stack trace and return a proper error
code instead.


I was racking my brain to remember the reasoning behind BUG_ON() being 
the only viable way to prevent errors getting through unhandled, but of 
course that was before we had a standardised DMA_MAPPING_ERROR that 
would work across all implementations.



Signed-off-by: Christoph Hellwig 
---
  include/linux/dma-mapping.h | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index d3087829a6df..91add0751aa5 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -353,7 +353,8 @@ static inline dma_addr_t dma_map_resource(struct device *dev,
BUG_ON(!valid_dma_direction(dir));
  
  	/* Don't allow RAM to be mapped */


Ugh, I'm pretty sure that that "pfn_valid means RAM" misunderstanding 
originally came from me - it might be nice to have a less-misleading 
comment here, but off-hand I can't think of a succinct way to say "only 
for 'DMA' access to MMIO registers/SRAMs/etc. and not for anything the 
kernel knows as actual system/device memory" to better explain the WARN...
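For what it's worth, one purely illustrative way to spell that out (the helper name is made up; this is not the actual patch):

#include <linux/mm.h>
#include <linux/pfn.h>

/*
 * dma_map_resource() is only for "DMA" to MMIO registers, on-device SRAMs
 * and similar ranges; anything the kernel manages as memory (i.e. backed by
 * a struct page) must go through dma_map_page()/dma_map_sg() instead.
 */
static inline bool example_target_is_kernel_memory(phys_addr_t phys_addr)
{
	return pfn_valid(PHYS_PFN(phys_addr));
}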


Either way, though,

Reviewed-by: Robin Murphy 


-   BUG_ON(pfn_valid(PHYS_PFN(phys_addr)));
+   if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
+   return DMA_MAPPING_ERROR;
  
  	if (dma_is_direct(ops))

addr = dma_direct_map_resource(dev, phys_addr, size, dir, 
attrs);




[PATCH] Revert "ARM: dma-mapping: Set proper DMA ops in arm_iommu_detach_device()"

2019-01-14 Thread Marek Szyprowski
This reverts commit 1874619a7df4b14b23b14877e705ae15325773e3.

That patch broke IOMMU support for devices which fail to probe on the
first try and use the deferred probe approach. When non-NULL dma_ops are set
in arm_iommu_detach_device(), the given device is later ignored by
arch_setup_dma_ops() and stays with non-IOMMU dma_ops.

Reported-by: Tobias Jakobi 
Fixes: 1874619a7df4 "ARM: dma-mapping: Set proper DMA ops in 
arm_iommu_detach_device()"
Signed-off-by: Marek Szyprowski 
---
 arch/arm/mm/dma-mapping.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 3c8534904209..5827a89b7de1 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1145,11 +1145,6 @@ int arm_dma_supported(struct device *dev, u64 mask)
return __dma_supported(dev, mask, false);
 }
 
-static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
-{
-   return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
-}
-
 #ifdef CONFIG_ARM_DMA_USE_IOMMU
 
 static int __dma_info_to_prot(enum dma_data_direction dir, unsigned long attrs)
@@ -2294,7 +2289,7 @@ void arm_iommu_detach_device(struct device *dev)
iommu_detach_device(mapping->domain, dev);
	kref_put(&mapping->kref, release_iommu_mapping);
to_dma_iommu_mapping(dev) = NULL;
-   set_dma_ops(dev, arm_get_dma_map_ops(dev->archdata.dma_coherent));
+   set_dma_ops(dev, NULL);
 
pr_debug("Detached IOMMU controller from %s device.\n", dev_name(dev));
 }
@@ -2355,6 +2350,11 @@ static void arm_teardown_iommu_dma_ops(struct device 
*dev) { }
 
 #endif /* CONFIG_ARM_DMA_USE_IOMMU */
 
+static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
+{
+   return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
+}
+
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu, bool coherent)
 {
-- 
2.17.1



Re: [PATCH 1/3] dma-mapping: remove the default map_resource implementation

2019-01-14 Thread Robin Murphy

On 11/01/2019 18:17, Christoph Hellwig wrote:

Just returning the physical address when no map_resource method is
present is highly dangerous as it doesn't take any offset in the
direct mapping into account and does the completely wrong thing for
IOMMUs.  Instead provide a proper implementation in the direct mapping
code, and also wire it up for arm and powerpc.


Ignoring the offset was kind of intentional there, because at the time I 
was primarily thinking about it in terms of the Keystone 2 platform 
where the peripherals are all in the same place (0-2GB) in both the bus 
and CPU physical address maps, and only the view of RAM differs between 
the two (2-4GB vs. 32-34GB). However, on something like BCM283x, the 
peripherals region is also offset from its bus address in the CPU view, 
but at a *different* offset relative to that of RAM.


Fortunately, I'm not aware of any platform which has a DMA engine behind 
an IOMMU (and thus *needs* to use dma_map_resource() to avoid said IOMMU 
blocking the slave device register reads/writes) and also has any 
nonzero offsets, and AFAIK the IOMMU-less platforms above aren't using 
dma_map_resource() at all, so this change shouldn't actually break 
anything, but I guess we have a bit of a problem making it truly generic 
and robust :(


Is this perhaps another shove in the direction of overhauling 
dma_pfn_offset into an arbitrary "DMA ranges" lookup table?
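To illustrate why a single dma_pfn_offset cannot describe both layouts, a toy per-range lookup (illustrative only, loosely modelled on the Keystone 2 numbers above) could look like this:

#include <linux/kernel.h>
#include <linux/sizes.h>
#include <linux/types.h>

/* Illustrative only: one entry per CPU-physical -> bus-address window. */
struct example_dma_range {
	phys_addr_t	cpu_start;
	u64		bus_start;
	u64		size;
};

static const struct example_dma_range example_ranges[] = {
	{ .cpu_start = 0x0,            .bus_start = 0x0,        .size = SZ_2G }, /* peripherals */
	{ .cpu_start = 0x800000000ULL, .bus_start = 0x80000000, .size = SZ_2G }, /* RAM */
};

static u64 example_phys_to_bus(phys_addr_t paddr)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(example_ranges); i++) {
		const struct example_dma_range *r = &example_ranges[i];

		if (paddr >= r->cpu_start && paddr - r->cpu_start < r->size)
			return r->bus_start + (paddr - r->cpu_start);
	}
	return paddr;	/* no translation known: assume a 1:1 mapping */
}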



Signed-off-by: Christoph Hellwig 
---
  arch/arm/mm/dma-mapping.c |  2 ++
  arch/powerpc/kernel/dma-swiotlb.c |  1 +
  arch/powerpc/kernel/dma.c |  1 +
  include/linux/dma-mapping.h   | 12 +++-
  kernel/dma/direct.c   | 14 ++
  5 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index f1e2922e447c..3c8534904209 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -188,6 +188,7 @@ const struct dma_map_ops arm_dma_ops = {
.unmap_page = arm_dma_unmap_page,
.map_sg = arm_dma_map_sg,
.unmap_sg   = arm_dma_unmap_sg,
+   .map_resource   = dma_direct_map_resource,
.sync_single_for_cpu= arm_dma_sync_single_for_cpu,
.sync_single_for_device = arm_dma_sync_single_for_device,
.sync_sg_for_cpu= arm_dma_sync_sg_for_cpu,
@@ -211,6 +212,7 @@ const struct dma_map_ops arm_coherent_dma_ops = {
.get_sgtable= arm_dma_get_sgtable,
.map_page   = arm_coherent_dma_map_page,
.map_sg = arm_dma_map_sg,
+   .map_resource   = dma_direct_map_resource,
.dma_supported  = arm_dma_supported,
  };
  EXPORT_SYMBOL(arm_coherent_dma_ops);
diff --git a/arch/powerpc/kernel/dma-swiotlb.c 
b/arch/powerpc/kernel/dma-swiotlb.c
index 7d5fc9751622..fbb2506a414e 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -55,6 +55,7 @@ const struct dma_map_ops powerpc_swiotlb_dma_ops = {
.dma_supported = swiotlb_dma_supported,
.map_page = dma_direct_map_page,
.unmap_page = dma_direct_unmap_page,
+   .map_resource = dma_direct_map_resource,
.sync_single_for_cpu = dma_direct_sync_single_for_cpu,
.sync_single_for_device = dma_direct_sync_single_for_device,
.sync_sg_for_cpu = dma_direct_sync_sg_for_cpu,
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index b1903ebb2e9c..258b9e8ebb99 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -273,6 +273,7 @@ const struct dma_map_ops dma_nommu_ops = {
.dma_supported  = dma_nommu_dma_supported,
.map_page   = dma_nommu_map_page,
.unmap_page = dma_nommu_unmap_page,
+   .map_resource   = dma_direct_map_resource,
.get_required_mask  = dma_nommu_get_required_mask,
  #ifdef CONFIG_NOT_COHERENT_CACHE
.sync_single_for_cpu= dma_nommu_sync_single,
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index cef2127e1d70..d3087829a6df 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -208,6 +208,8 @@ dma_addr_t dma_direct_map_page(struct device *dev, struct 
page *page,
unsigned long attrs);
  int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
enum dma_data_direction dir, unsigned long attrs);
+dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
+   size_t size, enum dma_data_direction dir, unsigned long attrs);
  
  #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \

  defined(CONFIG_SWIOTLB)
@@ -346,19 +348,19 @@ static inline dma_addr_t dma_map_resource(struct device 
*dev,
  unsigned long attrs)
  {
const struct dma_map_ops *ops = 

Re: fix a layering violation in videobuf2 and improve dma_map_resource

2019-01-14 Thread Marek Szyprowski
Hi Christoph,

On 2019-01-11 19:17, Christoph Hellwig wrote:
> Hi all,
>
> this series fixes a rather gross layering violation in videobuf2, which
> pokes into arm DMA mapping internals to get a DMA address for memory that
> does not have a page structure, and to do so fixes up the dma_map_resource
> implementation to be practically useful.

Thanks for rewriting this 'temporary code'! It predates
dma_map_resource(), and at that time this was the only way to get it working
somehow. It is good that it can now be implemented in a clean way
without any unwritten assumptions about the DMA mapping internals. Feel
free to add my:

Reviewed-by: Marek Szyprowski 

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH v5 3/8] iommu/vt-d: Move common code out of iommu_attch_device()

2019-01-14 Thread Jonathan Cameron
On Thu, 10 Jan 2019 11:00:22 +0800
Lu Baolu  wrote:

> This part of the code could be used by both normal and aux
> domain specific attach entries. Hence move it into a
> common function to avoid duplication.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Signed-off-by: Lu Baolu 
Another trivial one (it's going to be one of those days).
Typo in the patch title.

> ---
>  drivers/iommu/intel-iommu.c | 60 ++---
>  1 file changed, 36 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index ee8832d26f7e..e9119d45a29d 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5058,35 +5058,14 @@ static void intel_iommu_domain_free(struct 
> iommu_domain *domain)
>   domain_exit(to_dmar_domain(domain));
>  }
>  
> -static int intel_iommu_attach_device(struct iommu_domain *domain,
> -  struct device *dev)
> +static int prepare_domain_attach_device(struct iommu_domain *domain,
> + struct device *dev)
>  {
>   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>   struct intel_iommu *iommu;
>   int addr_width;
>   u8 bus, devfn;
>  
> - if (device_is_rmrr_locked(dev)) {
> - dev_warn(dev, "Device is ineligible for IOMMU domain attach due 
> to platform RMRR requirement.  Contact your platform vendor.\n");
> - return -EPERM;
> - }
> -
> - /* normally dev is not mapped */
> - if (unlikely(domain_context_mapped(dev))) {
> - struct dmar_domain *old_domain;
> -
> - old_domain = find_domain(dev);
> - if (old_domain) {
> - rcu_read_lock();
> - dmar_remove_one_dev_info(old_domain, dev);
> - rcu_read_unlock();
> -
> - if (!domain_type_is_vm_or_si(old_domain) &&
> -  list_empty(&old_domain->devices))
> - domain_exit(old_domain);
> - }
> - }
> -
>   iommu = device_to_iommu(dev, &bus, &devfn);
>   if (!iommu)
>   return -ENODEV;
> @@ -5119,7 +5098,40 @@ static int intel_iommu_attach_device(struct 
> iommu_domain *domain,
>   dmar_domain->agaw--;
>   }
>  
> - return domain_add_dev_info(dmar_domain, dev);
> + return 0;
> +}
> +
> +static int intel_iommu_attach_device(struct iommu_domain *domain,
> +  struct device *dev)
> +{
> + int ret;
> +
> + if (device_is_rmrr_locked(dev)) {
> + dev_warn(dev, "Device is ineligible for IOMMU domain attach due 
> to platform RMRR requirement.  Contact your platform vendor.\n");
> + return -EPERM;
> + }
> +
> + /* normally dev is not mapped */
> + if (unlikely(domain_context_mapped(dev))) {
> + struct dmar_domain *old_domain;
> +
> + old_domain = find_domain(dev);
> + if (old_domain) {
> + rcu_read_lock();
> + dmar_remove_one_dev_info(old_domain, dev);
> + rcu_read_unlock();
> +
> + if (!domain_type_is_vm_or_si(old_domain) &&
> + list_empty(&old_domain->devices))
> + domain_exit(old_domain);
> + }
> + }
> +
> + ret = prepare_domain_attach_device(domain, dev);
> + if (ret)
> + return ret;
> +
> + return domain_add_dev_info(to_dmar_domain(domain), dev);
>  }
>  
>  static void intel_iommu_detach_device(struct iommu_domain *domain,




Re: [PATCH v5 1/8] iommu: Add APIs for multiple domains per device

2019-01-14 Thread Jonathan Cameron
On Thu, 10 Jan 2019 11:00:20 +0800
Lu Baolu  wrote:

> Sharing a physical PCI device in a finer-granularity way
> is becoming a consensus in the industry. IOMMU vendors
> are also engaging efforts to support such sharing as well
> as possible. Among the efforts, the capability of supporting
> finer-granularity DMA isolation is a common requirement
> due to security considerations. With finer-granularity
> DMA isolation, all DMA requests out of or to a subset of
> a physical PCI device can be protected by the IOMMU. As a
> result, there is a request in software to attach multiple
> domains to a physical PCI device. One example of such use
> model is the Intel Scalable IOV [1] [2]. The Intel vt-d
> 3.0 spec [3] introduces the scalable mode which enables
> PASID granularity DMA isolation.
> 
> This adds the APIs to support multiple domains per device.
> In order to ease the discussions, we call it 'a domain in
> auxiliary mode' or simply 'auxiliary domain' when multiple
> domains are attached to a physical device.
> 
> The APIs include:
> 
> * iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)
>   - Check whether both IOMMU and device support IOMMU aux
> domain feature. Below aux-domain specific interfaces
> are available only after this returns true.
> 
> * iommu_dev_enable/disable_feature(dev, IOMMU_DEV_FEAT_AUX)
>   - Enable/disable device specific aux-domain feature.
> 
> * iommu_aux_attach_device(domain, dev)
>   - Attaches @domain to @dev in the auxiliary mode. Multiple
> domains could be attached to a single device in the
> auxiliary mode with each domain representing an isolated
> address space for an assignable subset of the device.
> 
> * iommu_aux_detach_device(domain, dev)
>   - Detach @domain which has been attached to @dev in the
> auxiliary mode.
> 
> * iommu_aux_get_pasid(domain, dev)
>   - Return ID used for finer-granularity DMA translation.
> For the Intel Scalable IOV usage model, this will be
> a PASID. The device which supports Scalable IOV needs
> to write this ID to the device register so that DMA
> requests could be tagged with a right PASID prefix.
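A minimal sketch of how a device driver might consume these interfaces, assuming the names above (the domain allocation and the PASID-programming callback are placeholders for device-specific code):

#include <linux/errno.h>
#include <linux/iommu.h>

static int example_assign_subdevice(struct device *dev,
				    void (*program_pasid)(struct device *, int))
{
	struct iommu_domain *domain;
	int pasid, ret;

	if (!iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX))
		return -ENODEV;

	ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX);
	if (ret)
		return ret;

	domain = iommu_domain_alloc(dev->bus);
	if (!domain)
		return -ENOMEM;

	ret = iommu_aux_attach_device(domain, dev);
	if (ret)
		goto out_free;

	pasid = iommu_aux_get_pasid(domain, dev);
	if (pasid < 0) {
		iommu_aux_detach_device(domain, dev);
		ret = pasid;
		goto out_free;
	}

	/* Device-specific step: tag the subdevice's DMA with the PASID. */
	program_pasid(dev, pasid);
	return 0;

out_free:
	iommu_domain_free(domain);
	return ret;
}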
> 
> This has been updated with the latest proposal from Joerg
> posted here [5].
> 
> Many people were involved in discussions of this design.
> 
> Kevin Tian 
> Liu Yi L 
> Ashok Raj 
> Sanjay Kumar 
> Jacob Pan 
> Alex Williamson 
> Jean-Philippe Brucker 
> Joerg Roedel 
> 
> and some discussions can be found here [4] [5].
> 
> [1] 
> https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
> [2] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> [3] 
> https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
> [4] https://lkml.org/lkml/2018/7/26/4
> [5] https://www.spinics.net/lists/iommu/msg31874.html
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Suggested-by: Kevin Tian 
> Suggested-by: Jean-Philippe Brucker 
> Suggested-by: Joerg Roedel 
> Signed-off-by: Lu Baolu 

One trivial comment inline.

> ---
>  drivers/iommu/iommu.c | 80 +++
>  include/linux/iommu.h | 61 +
>  2 files changed, 141 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3ed4db334341..9166b6145409 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2033,3 +2033,83 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, 
> int num_ids)
>   return 0;
>  }
>  EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
> +
> +/*
> + * Per device IOMMU features.
> + */
> +bool iommu_dev_has_feature(struct device *dev, enum iommu_dev_features feat)
> +{
> + const struct iommu_ops *ops = dev->bus->iommu_ops;
> +
> + if (ops && ops->dev_has_feat)
> + return ops->dev_has_feat(dev, feat);
> +
> + return false;
> +}
> +EXPORT_SYMBOL_GPL(iommu_dev_has_feature);
> +
> +int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features 
> feat)
> +{
> + const struct iommu_ops *ops = dev->bus->iommu_ops;
> +
> + if (ops && ops->dev_enable_feat)
> + return ops->dev_enable_feat(dev, feat);
> +
> + return -ENODEV;
> +}
> +EXPORT_SYMBOL_GPL(iommu_dev_enable_feature);
> +
> +int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features 
> feat)
> +{
> + const struct iommu_ops *ops = dev->bus->iommu_ops;
> +
> + if (ops && ops->dev_disable_feat)
> + return ops->dev_disable_feat(dev, feat);
> +
> + return -ENODEV;
> +}
> +EXPORT_SYMBOL_GPL(iommu_dev_disable_feature);
> +
> +/*
> + * Aux-domain specific attach/detach.
> + *
> + * Only works if iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX) returns true.
> + * Also, as long as domains are attached to a device through this interface,
> + * any tries to call iommu_attach_device() should fail (iommu_detach_device()
> + * can't fail, so we fail when trying to re-attach). 

Re: [PATCH 1/2] ACPI/IORT: Handle potential overflow in iort_dma_setup

2019-01-14 Thread Robin Murphy

On 10/01/2019 10:44, Auger Eric wrote:

Hi Robin, Drew,

On 12/19/18 2:18 PM, Andrew Jones wrote:

On Wed, Dec 19, 2018 at 12:21:35PM +, Robin Murphy wrote:

On 18/12/2018 18:48, Andrew Jones wrote:

The sum of dmaaddr and size may overflow, particularly considering
there are cases where size will be U64_MAX.


Only if the firmware is broken in the first place, though. It would be weird
to describe an explicit _DMA range of base=0 and size=U64_MAX, because it's
effectively the same as just not having one at all, but it's not strictly
illegal. However, since the ACPI System Memory address space is at most
64-bit, anything that would actually overflow here is already describing an
impossibility - really, we should probably scream even louder about a
firmware bug and reject it entirely, rather than quietly hiding it.
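To make the potential overflow concrete, an illustrative helper (not part of the patch) that computes the end of such a window without wrapping:

#include <linux/kernel.h>
#include <linux/overflow.h>
#include <linux/types.h>

/*
 * Illustrative only: the inclusive end of a _DMA window of "size" bytes
 * (size != 0) starting at "base", saturating instead of wrapping around,
 * e.g. for base != 0 and size = U64_MAX.
 */
static u64 example_dma_window_end(u64 base, u64 size)
{
	u64 end;

	if (check_add_overflow(base, size - 1, &end))
		return U64_MAX;
	return end;
}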


Ah, looking again I see the paths. Either acpi_dma_get_range() returns
success, in which case base and size are fine, or it returns an EINVAL,
in which case base=size=0, or it returns ENODEV in which case base is
zero, so size may be set to U64_MAX by rc_dma_get_range() with no problem.
The !dev_is_pci(dev) path is also fine since base=0.


So practically putting an explicit memory_address_limit=64 is harmless
as dmaaddr always is 0, right?

In QEMU I intended to update the ACPI code to comply with the rev D
spec. in that case the RC table revision is 1 (rev D) and the
memory_address_limit needs to be filled. If we don't want to restrict
the limit, isn't it the right choice to set 64 here?


Indeed, the Memory Address Size Limit doesn't cater for offsets so can't 
run into this kind of overflow in the first place. For a fully-emulated 
PCI hierarchy I'd say 64 is not just harmless but in fact entirely 
correct - you're going to have more fun with VFIO passthrough if the 
host tables have more restrictive limits, but I guess that's a problem 
for the future ;)


Robin.


Re: [PATCH 3/3] videobuf2: replace a layering violation with dma_map_resource

2019-01-14 Thread Mauro Carvalho Chehab
Em Mon, 14 Jan 2019 11:31:39 +0100
Christoph Hellwig  escreveu:

> On Fri, Jan 11, 2019 at 05:54:16PM -0200, Mauro Carvalho Chehab wrote:
> > Em Fri, 11 Jan 2019 19:17:31 +0100
> > Christoph Hellwig  escreveu:
> >   
> > > vb2_dc_get_userptr pokes into arm direct mapping details to get the
> > > resemblance of a dma address for a physical address that is
> > > not backed by a page struct.  Not only is this not portable to other
> > > architectures with dma direct mapping offsets, but also not to uses
> > > of IOMMUs of any kind.  Switch to the proper dma_map_resource /
> > > dma_unmap_resource interface instead.  
> > 
> > Makes sense to me. I'm assuming that you'll be pushing it together
> > with other mm patches, so:  
> 
> Not really mm, but rather DMA mapping, but yes, I'd love to take it
> all together.

Ah, OK! Anyway, feel free to place it altogether. 

It would be good if you could later send us a stable branch where
you merged, in order to allow us to run some tests with the new
DMA mapping patchset and be sure that it won't cause regressions
to videobuf2.

Thank you!

Mauro


Re: remove block layer bounce buffering for MMC

2019-01-14 Thread Ulf Hansson
+Linus Walleij (recently made a cleanup of the mmc bounce buffering code).

On Mon, 14 Jan 2019 at 10:58, Christoph Hellwig  wrote:
>
> Hi everyone,
>
> this series converts the remaining MMC host drivers to properly kmap the
> scatterlist entries they do PIO operations on, and then goes on to
> remove the usage of block layer bounce buffering (which I plan to remove
> eventually) from the MMC layer.
>
> As a bonus I've converted various drivers to the proper scatterlist
> helpers so that at least in theory they are ready for chained
> scatterlists.
>
> All the changes are compile tested only as I don't have any of the
> hardware, so a careful review would be appreciated.

Thanks for posting this. I will have a look as soon as I can, but
first needs to catch up with things since the holidays.

Kind regards
Uffe


Re: [PATCH 3/3] videobuf2: replace a layering violation with dma_map_resource

2019-01-14 Thread Christoph Hellwig
On Fri, Jan 11, 2019 at 05:54:16PM -0200, Mauro Carvalho Chehab wrote:
> Em Fri, 11 Jan 2019 19:17:31 +0100
> Christoph Hellwig  escreveu:
> 
> > vb2_dc_get_userptr pokes into arm direct mapping details to get the
> > resemblance of a dma address for a physical address that is
> > not backed by a page struct.  Not only is this not portable to other
> > architectures with dma direct mapping offsets, but also not to uses
> > of IOMMUs of any kind.  Switch to the proper dma_map_resource /
> > dma_unmap_resource interface instead.
> 
> Makes sense to me. I'm assuming that you'll be pushing it together
> with other mm patches, so:

Not really mm, but rather DMA mapping, but yes, I'd love to take it
all together.

Thanks!


[PATCH 10/11] mmc: core: don't use block layer bounce buffers

2019-01-14 Thread Christoph Hellwig
All MMC and SD host drivers are highmem safe now, and bounce buffering
for addressing limitations is handled in the DMA layer now.

Signed-off-by: Christoph Hellwig 
---
 drivers/mmc/core/queue.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 35cc138b096d..26a18b851397 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -354,17 +354,12 @@ static const struct blk_mq_ops mmc_mq_ops = {
 static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
 {
struct mmc_host *host = card->host;
-   u64 limit = BLK_BOUNCE_HIGH;
-
-   if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
-   limit = (u64)dma_max_pfn(mmc_dev(host)) << PAGE_SHIFT;
 
blk_queue_flag_set(QUEUE_FLAG_NONROT, mq->queue);
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, mq->queue);
if (mmc_can_erase(card))
mmc_queue_setup_discard(mq->queue, card);
 
-   blk_queue_bounce_limit(mq->queue, limit);
blk_queue_max_hw_sectors(mq->queue,
min(host->max_blk_count, host->max_req_size / 512));
blk_queue_max_segments(mq->queue, host->max_segs);
-- 
2.20.1



[PATCH 05/11] mmc: s3cmci: handle highmem pages

2019-01-14 Thread Christoph Hellwig
Instead of setting up a kernel pointer to track the current PIO address,
track the offset in the current page, and do an atomic kmap for the page
while doing the actual PIO operations.

Signed-off-by: Christoph Hellwig 
---
 drivers/mmc/host/s3cmci.c | 107 +++---
 drivers/mmc/host/s3cmci.h |   3 +-
 2 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/drivers/mmc/host/s3cmci.c b/drivers/mmc/host/s3cmci.c
index 10f5219b3b40..1be84426c817 100644
--- a/drivers/mmc/host/s3cmci.c
+++ b/drivers/mmc/host/s3cmci.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -317,26 +318,17 @@ static void s3cmci_check_sdio_irq(struct s3cmci_host 
*host)
}
 }
 
-static inline int get_data_buffer(struct s3cmci_host *host,
- u32 *bytes, u32 **pointer)
+static inline int get_data_buffer(struct s3cmci_host *host)
 {
-   struct scatterlist *sg;
-
-   if (host->pio_active == XFER_NONE)
-   return -EINVAL;
-
-   if ((!host->mrq) || (!host->mrq->data))
-   return -EINVAL;
-
if (host->pio_sgptr >= host->mrq->data->sg_len) {
dbg(host, dbg_debug, "no more buffers (%i/%i)\n",
  host->pio_sgptr, host->mrq->data->sg_len);
return -EBUSY;
}
-   sg = &host->mrq->data->sg[host->pio_sgptr];
+   host->cur_sg = &host->mrq->data->sg[host->pio_sgptr];
 
-   *bytes = sg->length;
-   *pointer = sg_virt(sg);
+   host->pio_bytes = host->cur_sg->length;
+   host->pio_offset = host->cur_sg->offset;
 
host->pio_sgptr++;
 
@@ -422,11 +414,16 @@ static void s3cmci_disable_irq(struct s3cmci_host *host, 
bool transfer)
 
 static void do_pio_read(struct s3cmci_host *host)
 {
-   int res;
u32 fifo;
u32 *ptr;
u32 fifo_words;
void __iomem *from_ptr;
+   void *buf;
+
+   if (host->pio_active == XFER_NONE)
+   goto done;
+   if (!host->mrq || !host->mrq->data)
+   goto done;
 
/* write real prescaler to host, it might be set slow to fix */
writel(host->prescaler, host->base + S3C2410_SDIPRE);
@@ -435,20 +432,12 @@ static void do_pio_read(struct s3cmci_host *host)
 
while ((fifo = fifo_count(host))) {
if (!host->pio_bytes) {
-   res = get_data_buffer(host, &host->pio_bytes,
- &host->pio_ptr);
-   if (res) {
-   host->pio_active = XFER_NONE;
-   host->complete_what = COMPLETION_FINALIZE;
-
-   dbg(host, dbg_pio, "pio_read(): "
-   "complete (no more data).\n");
-   return;
-   }
+   if (get_data_buffer(host) < 0)
+   goto done;
 
dbg(host, dbg_pio,
-   "pio_read(): new target: [%i]@[%p]\n",
-   host->pio_bytes, host->pio_ptr);
+   "pio_read(): new target: [%i]@[%zu]\n",
+   host->pio_bytes, host->pio_offset);
}
 
dbg(host, dbg_pio,
@@ -470,63 +459,65 @@ static void do_pio_read(struct s3cmci_host *host)
host->pio_count += fifo;
 
fifo_words = fifo >> 2;
-   ptr = host->pio_ptr;
-   while (fifo_words--)
+   
+   buf = (kmap_atomic(sg_page(host->cur_sg)) + host->pio_offset);
+   ptr = buf;
+   while (fifo_words--) {
*ptr++ = readl(from_ptr);
-   host->pio_ptr = ptr;
+   host->pio_offset += 4;
+   }
 
if (fifo & 3) {
u32 n = fifo & 3;
u32 data = readl(from_ptr);
-   u8 *p = (u8 *)host->pio_ptr;
+   u8 *p = (u8 *)ptr;
 
while (n--) {
*p++ = data;
data >>= 8;
+   host->pio_offset++;
}
}
+   kunmap_atomic(buf);
}
 
if (!host->pio_bytes) {
-   res = get_data_buffer(host, &host->pio_bytes, &host->pio_ptr);
-   if (res) {
-   dbg(host, dbg_pio,
-   "pio_read(): complete (no more buffers).\n");
-   host->pio_active = XFER_NONE;
-   host->complete_what = COMPLETION_FINALIZE;
-
-   return;
-   }
+   if (get_data_buffer(host) < 0)
+   goto done;
}
 
enable_imask(host,
 S3C2410_SDIIMSK_RXFIFOHALF | S3C2410_SDIIMSK_RXFIFOLAST);
+   return;
+
+done:
+  

[PATCH 02/11] mmc: moxart: handle highmem pages

2019-01-14 Thread Christoph Hellwig
Instead of setting up a kernel pointer to track the current PIO address,
track the offset in the current page, and do a kmap for the page while
doing the actual PIO operations.

Signed-off-by: Christoph Hellwig 
---
 drivers/mmc/host/moxart-mmc.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/mmc/host/moxart-mmc.c b/drivers/mmc/host/moxart-mmc.c
index a0670e9cd012..7142bca425c8 100644
--- a/drivers/mmc/host/moxart-mmc.c
+++ b/drivers/mmc/host/moxart-mmc.c
@@ -311,7 +311,7 @@ static void moxart_transfer_pio(struct moxart_host *host)
if (host->data_len == data->bytes_xfered)
return;
 
-   sgp = sg_virt(host->cur_sg);
+   sgp = kmap(sg_page(host->cur_sg)) + host->cur_sg->offset;
remain = host->data_remain;
 
if (data->flags & MMC_DATA_WRITE) {
@@ -319,8 +319,7 @@ static void moxart_transfer_pio(struct moxart_host *host)
if (moxart_wait_for_status(host, FIFO_URUN, )
 == -ETIMEDOUT) {
data->error = -ETIMEDOUT;
-   complete(&host->pio_complete);
-   return;
+   goto done;
}
for (len = 0; len < remain && len < host->fifo_width;) {
iowrite32(*sgp, host->base + REG_DATA_WINDOW);
@@ -335,8 +334,7 @@ static void moxart_transfer_pio(struct moxart_host *host)
if (moxart_wait_for_status(host, FIFO_ORUN, )
== -ETIMEDOUT) {
data->error = -ETIMEDOUT;
-   complete(&host->pio_complete);
-   return;
+   goto done;
}
for (len = 0; len < remain && len < host->fifo_width;) {
/* SCR data must be read in big endian. */
@@ -356,10 +354,15 @@ static void moxart_transfer_pio(struct moxart_host *host)
data->bytes_xfered += host->data_remain - remain;
host->data_remain = remain;
 
-   if (host->data_len != data->bytes_xfered)
+   if (host->data_len != data->bytes_xfered) {
+   kunmap(sg_page(host->cur_sg));
moxart_next_sg(host);
-   else
-   complete(&host->pio_complete);
+   return;
+   }
+
+done:
+   kunmap(sg_page(host->cur_sg));
+   complete(&host->pio_complete);
 }
 
 static void moxart_prepare_data(struct moxart_host *host)
-- 
2.20.1



[PATCH 11/11] dma-mapping: remove dma_max_pfn

2019-01-14 Thread Christoph Hellwig
These days the DMA mapping code must bounce buffer for any unsupported
address, and if the driver needs to optimize for natively supported
ranges it should use dma_get_required_mask.
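As an illustration of the suggested alternative (hypothetical driver code, not part of this patch):

#include <linux/dma-mapping.h>

/*
 * Hypothetical probe-time snippet: instead of clamping the block layer with
 * dma_max_pfn(), size the DMA mask from what the platform can address
 * natively and let the DMA layer bounce anything beyond that.
 */
static int example_setup_dma(struct device *dev)
{
	return dma_set_mask_and_coherent(dev, dma_get_required_mask(dev));
}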

Signed-off-by: Christoph Hellwig 
---
 arch/arm/include/asm/dma-mapping.h | 7 ---
 include/linux/dma-mapping.h| 7 ---
 2 files changed, 14 deletions(-)

diff --git a/arch/arm/include/asm/dma-mapping.h 
b/arch/arm/include/asm/dma-mapping.h
index 31d3b96f0f4b..496b36b9a7ff 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -89,13 +89,6 @@ static inline dma_addr_t virt_to_dma(struct device *dev, 
void *addr)
 }
 #endif
 
-/* The ARM override for dma_max_pfn() */
-static inline unsigned long dma_max_pfn(struct device *dev)
-{
-   return dma_to_pfn(dev, *dev->dma_mask);
-}
-#define dma_max_pfn(dev) dma_max_pfn(dev)
-
 #define arch_setup_dma_ops arch_setup_dma_ops
 extern void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
   const struct iommu_ops *iommu, bool coherent);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f6ded992c183..c6dbc287e466 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -710,13 +710,6 @@ static inline int dma_set_seg_boundary(struct device *dev, 
unsigned long mask)
return -EIO;
 }
 
-#ifndef dma_max_pfn
-static inline unsigned long dma_max_pfn(struct device *dev)
-{
-   return (*dev->dma_mask >> PAGE_SHIFT) + dev->dma_pfn_offset;
-}
-#endif
-
 static inline int dma_get_cache_alignment(void)
 {
 #ifdef ARCH_DMA_MINALIGN
-- 
2.20.1



[PATCH 09/11] mmc: sh_mmcif: handle chained sglists

2019-01-14 Thread Christoph Hellwig
Use the proper sg_next() helper to move to the next scatterlist element
to support chained scatterlists.

Signed-off-by: Christoph Hellwig 
---
 drivers/mmc/host/sh_mmcif.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/mmc/host/sh_mmcif.c b/drivers/mmc/host/sh_mmcif.c
index 0d2bbb8943f5..a4116be5ebc7 100644
--- a/drivers/mmc/host/sh_mmcif.c
+++ b/drivers/mmc/host/sh_mmcif.c
@@ -234,7 +234,6 @@ struct sh_mmcif_host {
enum sh_mmcif_wait_for wait_for;
struct delayed_work timeout_work;
size_t blocksize;
-   int sg_idx;
int sg_blkidx;
bool power;
bool ccs_enable;/* Command Completion Signal support */
@@ -606,13 +605,13 @@ static bool sh_mmcif_next_block(struct sh_mmcif_host 
*host)
 
if (host->sg_blkidx == data->sg->length) {
host->sg_blkidx = 0;
-   if (++host->sg_idx < data->sg_len) {
-   data->sg++;
-   host->pio_offset = data->sg->offset / 4;
-   }
+   data->sg = sg_next(data->sg);
+   if (!data->sg)
+   return false;
+   host->pio_offset = data->sg->offset / 4;
}
 
-   return host->sg_idx != data->sg_len;
+   return true;
 }
 
 static void sh_mmcif_single_read(struct sh_mmcif_host *host,
@@ -665,7 +664,6 @@ static void sh_mmcif_multi_read(struct sh_mmcif_host *host,
BLOCK_SIZE_MASK;
 
host->wait_for = MMCIF_WAIT_FOR_MREAD;
-   host->sg_idx = 0;
host->sg_blkidx = 0;
host->pio_offset = data->sg->offset / 4;
 
@@ -752,7 +750,6 @@ static void sh_mmcif_multi_write(struct sh_mmcif_host *host,
BLOCK_SIZE_MASK;
 
host->wait_for = MMCIF_WAIT_FOR_MWRITE;
-   host->sg_idx = 0;
host->sg_blkidx = 0;
host->pio_offset = data->sg->offset / 4;
 
-- 
2.20.1



[PATCH 07/11] mmc: mvsdio: handle highmem pages

2019-01-14 Thread Christoph Hellwig
Instead of setting up a kernel pointer to track the current PIO address,
track the offset in the current page, and do an atomic kmap for the page
while doing the actual PIO operations.

Signed-off-by: Christoph Hellwig 
---
 drivers/mmc/host/mvsdio.c | 48 +++
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/drivers/mmc/host/mvsdio.c b/drivers/mmc/host/mvsdio.c
index e22bbff89c8d..545e370d6dae 100644
--- a/drivers/mmc/host/mvsdio.c
+++ b/drivers/mmc/host/mvsdio.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -42,7 +43,8 @@ struct mvsd_host {
unsigned int intr_en;
unsigned int ctrl;
unsigned int pio_size;
-   void *pio_ptr;
+   struct scatterlist *pio_sg;
+   unsigned int pio_offset; /* offset in words into the segment */
unsigned int sg_frags;
unsigned int ns_per_clk;
unsigned int clock;
@@ -96,9 +98,9 @@ static int mvsd_setup_data(struct mvsd_host *host, struct 
mmc_data *data)
if (tmout_index > MVSD_HOST_CTRL_TMOUT_MAX)
tmout_index = MVSD_HOST_CTRL_TMOUT_MAX;
 
-   dev_dbg(host->dev, "data %s at 0x%08x: blocks=%d blksz=%d tmout=%u 
(%d)\n",
+   dev_dbg(host->dev, "data %s at 0x%08llx: blocks=%d blksz=%d tmout=%u 
(%d)\n",
(data->flags & MMC_DATA_READ) ? "read" : "write",
-   (u32)sg_virt(data->sg), data->blocks, data->blksz,
+   (u64)sg_phys(data->sg), data->blocks, data->blksz,
tmout, tmout_index);
 
host->ctrl &= ~MVSD_HOST_CTRL_TMOUT_MASK;
@@ -118,10 +120,11 @@ static int mvsd_setup_data(struct mvsd_host *host, struct 
mmc_data *data)
 * boundary.
 */
host->pio_size = data->blocks * data->blksz;
-   host->pio_ptr = sg_virt(data->sg);
+   host->pio_sg = data->sg;
+   host->pio_offset = data->sg->offset / 2;
if (!nodma)
-   dev_dbg(host->dev, "fallback to PIO for data at 0x%p 
size %d\n",
-   host->pio_ptr, host->pio_size);
+   dev_dbg(host->dev, "fallback to PIO for data at 0x%x 
size %d\n",
+   host->pio_offset, host->pio_size);
return 1;
} else {
dma_addr_t phys_addr;
@@ -291,8 +294,9 @@ static u32 mvsd_finish_data(struct mvsd_host *host, struct 
mmc_data *data,
 {
void __iomem *iobase = host->base;
 
-   if (host->pio_ptr) {
-   host->pio_ptr = NULL;
+   if (host->pio_sg) {
+   host->pio_sg = NULL;
+   host->pio_offset = 0;
host->pio_size = 0;
} else {
dma_unmap_sg(mmc_dev(host->mmc), data->sg, host->sg_frags,
@@ -376,11 +380,12 @@ static irqreturn_t mvsd_irq(int irq, void *dev)
if (host->pio_size &&
(intr_status & host->intr_en &
 (MVSD_NOR_RX_READY | MVSD_NOR_RX_FIFO_8W))) {
-   u16 *p = host->pio_ptr;
+   u16 *p = kmap_atomic(sg_page(host->pio_sg));
+   unsigned int o = host->pio_offset;
int s = host->pio_size;
while (s >= 32 && (intr_status & MVSD_NOR_RX_FIFO_8W)) {
-   readsw(iobase + MVSD_FIFO, p, 16);
-   p += 16;
+   readsw(iobase + MVSD_FIFO, p + o, 16);
+   o += 16;
s -= 32;
intr_status = mvsd_read(MVSD_NOR_INTR_STATUS);
}
@@ -391,8 +396,10 @@ static irqreturn_t mvsd_irq(int irq, void *dev)
 */
if (s <= 32) {
while (s >= 4 && (intr_status & MVSD_NOR_RX_READY)) {
-   put_unaligned(mvsd_read(MVSD_FIFO), p++);
-   put_unaligned(mvsd_read(MVSD_FIFO), p++);
+   put_unaligned(mvsd_read(MVSD_FIFO), p + o);
+   o++;
+   put_unaligned(mvsd_read(MVSD_FIFO), p + o);
+   o++;
s -= 4;
intr_status = mvsd_read(MVSD_NOR_INTR_STATUS);
}
@@ -400,7 +407,7 @@ static irqreturn_t mvsd_irq(int irq, void *dev)
u16 val[2] = {0, 0};
val[0] = mvsd_read(MVSD_FIFO);
val[1] = mvsd_read(MVSD_FIFO);
-   memcpy(p, ((void *)&val) + 4 - s, s);
+   memcpy(p + o, ((void *)&val) + 4 - s, s);
s = 0;
intr_status = mvsd_read(MVSD_NOR_INTR_STATUS);
}
@@ -416,13 +423,14 @@ static irqreturn_t mvsd_irq(int irq, void *dev)
}
dev_dbg(host->dev, "pio %d 

[PATCH 03/11] mmc: omap: handle highmem pages

2019-01-14 Thread Christoph Hellwig
Instead of setting up a kernel pointer to track the current PIO address,
track the offset in the current page, and do an atomic kmap for the page
while doing the actual PIO operations.

Signed-off-by: Christoph Hellwig 
---
 drivers/mmc/host/omap.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/mmc/host/omap.c b/drivers/mmc/host/omap.c
index c60a7625b1fa..0377ac6194a0 100644
--- a/drivers/mmc/host/omap.c
+++ b/drivers/mmc/host/omap.c
@@ -13,6 +13,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -148,7 +149,7 @@ struct mmc_omap_host {
 
unsigned intsg_len;
int sg_idx;
-   u16 *   buffer;
+   u32 buffer_offset;
u32 buffer_bytes_left;
u32 total_bytes_left;
 
@@ -649,7 +650,7 @@ mmc_omap_sg_to_buf(struct mmc_omap_host *host)
 
sg = host->data->sg + host->sg_idx;
host->buffer_bytes_left = sg->length;
-   host->buffer = sg_virt(sg);
+   host->buffer_offset = sg->offset;
if (host->buffer_bytes_left > host->total_bytes_left)
host->buffer_bytes_left = host->total_bytes_left;
 }
@@ -666,7 +667,9 @@ mmc_omap_clk_timer(struct timer_list *t)
 static void
 mmc_omap_xfer_data(struct mmc_omap_host *host, int write)
 {
+   struct scatterlist *sg = host->data->sg + host->sg_idx;
int n, nwords;
+   void *p;
 
if (host->buffer_bytes_left == 0) {
host->sg_idx++;
@@ -684,15 +687,17 @@ mmc_omap_xfer_data(struct mmc_omap_host *host, int write)
host->total_bytes_left -= n;
host->data->bytes_xfered += n;
 
+   p = kmap_atomic(sg_page(sg));
if (write) {
__raw_writesw(host->virt_base + OMAP_MMC_REG(host, DATA),
- host->buffer, nwords);
+ p + host->buffer_offset, nwords);
} else {
__raw_readsw(host->virt_base + OMAP_MMC_REG(host, DATA),
-host->buffer, nwords);
+p + host->buffer_offset, nwords);
}
+   kunmap_atomic(p);
 
-   host->buffer += nwords;
+   host->buffer_offset += nwords;
 }
 
 #ifdef CONFIG_MMC_DEBUG
-- 
2.20.1



[PATCH 08/11] mmc: sh_mmcif: handle highmem pages

2019-01-14 Thread Christoph Hellwig
Instead of setting up a kernel pointer to track the current PIO address,
track the offset in the current page, and do an atomic kmap for the page
while doing the actual PIO operations.

Signed-off-by: Christoph Hellwig 
---
 drivers/mmc/host/sh_mmcif.c | 58 +++--
 1 file changed, 36 insertions(+), 22 deletions(-)

diff --git a/drivers/mmc/host/sh_mmcif.c b/drivers/mmc/host/sh_mmcif.c
index 81bd9afb0980..0d2bbb8943f5 100644
--- a/drivers/mmc/host/sh_mmcif.c
+++ b/drivers/mmc/host/sh_mmcif.c
@@ -228,7 +228,7 @@ struct sh_mmcif_host {
bool dying;
long timeout;
void __iomem *addr;
-   u32 *pio_ptr;
+   u32 pio_offset;
spinlock_t lock;/* protect sh_mmcif_host::state */
enum sh_mmcif_state state;
enum sh_mmcif_wait_for wait_for;
@@ -595,7 +595,7 @@ static int sh_mmcif_error_manage(struct sh_mmcif_host *host)
return ret;
 }
 
-static bool sh_mmcif_next_block(struct sh_mmcif_host *host, u32 *p)
+static bool sh_mmcif_next_block(struct sh_mmcif_host *host)
 {
struct mmc_data *data = host->mrq->data;
 
@@ -606,10 +606,10 @@ static bool sh_mmcif_next_block(struct sh_mmcif_host 
*host, u32 *p)
 
if (host->sg_blkidx == data->sg->length) {
host->sg_blkidx = 0;
-   if (++host->sg_idx < data->sg_len)
-   host->pio_ptr = sg_virt(++data->sg);
-   } else {
-   host->pio_ptr = p;
+   if (++host->sg_idx < data->sg_len) {
+   data->sg++;
+   host->pio_offset = data->sg->offset / 4;
+   }
}
 
return host->sg_idx != data->sg_len;
@@ -631,8 +631,8 @@ static bool sh_mmcif_read_block(struct sh_mmcif_host *host)
 {
struct device *dev = sh_mmcif_host_to_dev(host);
struct mmc_data *data = host->mrq->data;
-   u32 *p = sg_virt(data->sg);
-   int i;
+   u32 *p;
+   int off, i;
 
if (host->sd_error) {
data->error = sh_mmcif_error_manage(host);
@@ -640,8 +640,11 @@ static bool sh_mmcif_read_block(struct sh_mmcif_host *host)
return false;
}
 
+   p = kmap_atomic(sg_page(data->sg));
+   off = data->sg->offset / 4;
for (i = 0; i < host->blocksize / 4; i++)
-   *p++ = sh_mmcif_readl(host->addr, MMCIF_CE_DATA);
+   p[off++] = sh_mmcif_readl(host->addr, MMCIF_CE_DATA);
+   kunmap_atomic(p);
 
/* buffer read end */
sh_mmcif_bitset(host, MMCIF_CE_INT_MASK, MASK_MBUFRE);
@@ -664,7 +667,7 @@ static void sh_mmcif_multi_read(struct sh_mmcif_host *host,
host->wait_for = MMCIF_WAIT_FOR_MREAD;
host->sg_idx = 0;
host->sg_blkidx = 0;
-   host->pio_ptr = sg_virt(data->sg);
+   host->pio_offset = data->sg->offset / 4;
 
sh_mmcif_bitset(host, MMCIF_CE_INT_MASK, MASK_MBUFREN);
 }
@@ -673,7 +676,7 @@ static bool sh_mmcif_mread_block(struct sh_mmcif_host *host)
 {
struct device *dev = sh_mmcif_host_to_dev(host);
struct mmc_data *data = host->mrq->data;
-   u32 *p = host->pio_ptr;
+   u32 *p;
int i;
 
if (host->sd_error) {
@@ -684,10 +687,14 @@ static bool sh_mmcif_mread_block(struct sh_mmcif_host 
*host)
 
BUG_ON(!data->sg->length);
 
-   for (i = 0; i < host->blocksize / 4; i++)
-   *p++ = sh_mmcif_readl(host->addr, MMCIF_CE_DATA);
+   p = kmap_atomic(sg_page(data->sg));
+   for (i = 0; i < host->blocksize / 4; i++) {
+   p[host->pio_offset++] =
+   sh_mmcif_readl(host->addr, MMCIF_CE_DATA);
+   }
+   kunmap_atomic(p);
 
-   if (!sh_mmcif_next_block(host, p))
+   if (!sh_mmcif_next_block(host))
return false;
 
sh_mmcif_bitset(host, MMCIF_CE_INT_MASK, MASK_MBUFREN);
@@ -711,8 +718,8 @@ static bool sh_mmcif_write_block(struct sh_mmcif_host *host)
 {
struct device *dev = sh_mmcif_host_to_dev(host);
struct mmc_data *data = host->mrq->data;
-   u32 *p = sg_virt(data->sg);
-   int i;
+   u32 *p;
+   int off, i;
 
if (host->sd_error) {
data->error = sh_mmcif_error_manage(host);
@@ -720,8 +727,11 @@ static bool sh_mmcif_write_block(struct sh_mmcif_host 
*host)
return false;
}
 
+   p = kmap_atomic(sg_page(data->sg));
+   off = data->sg->offset / 4;
for (i = 0; i < host->blocksize / 4; i++)
-   sh_mmcif_writel(host->addr, MMCIF_CE_DATA, *p++);
+   sh_mmcif_writel(host->addr, MMCIF_CE_DATA, p[off++]);
+   kunmap_atomic(p);
 
/* buffer write end */
sh_mmcif_bitset(host, MMCIF_CE_INT_MASK, MASK_MDTRANE);
@@ -744,7 +754,7 @@ static void sh_mmcif_multi_write(struct sh_mmcif_host *host,
host->wait_for = MMCIF_WAIT_FOR_MWRITE;
host->sg_idx = 0;
host->sg_blkidx = 0;
-   host->pio_ptr = sg_virt(data->sg);
+   host->pio_offset 

[PATCH 06/11] mmc: s3cmci: handle chained sglists

2019-01-14 Thread Christoph Hellwig
Use the proper sg_next() helper to move to the next scatterlist element
to support chained scatterlists.

Signed-off-by: Christoph Hellwig 
---
 drivers/mmc/host/s3cmci.c | 19 +--
 drivers/mmc/host/s3cmci.h |  2 +-
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/mmc/host/s3cmci.c b/drivers/mmc/host/s3cmci.c
index 1be84426c817..2660fb61d1d0 100644
--- a/drivers/mmc/host/s3cmci.c
+++ b/drivers/mmc/host/s3cmci.c
@@ -320,20 +320,19 @@ static void s3cmci_check_sdio_irq(struct s3cmci_host 
*host)
 
 static inline int get_data_buffer(struct s3cmci_host *host)
 {
-   if (host->pio_sgptr >= host->mrq->data->sg_len) {
-   dbg(host, dbg_debug, "no more buffers (%i/%i)\n",
- host->pio_sgptr, host->mrq->data->sg_len);
+   if (!host->next_sg) {
+   dbg(host, dbg_debug, "no more buffers (%i)\n",
+ host->mrq->data->sg_len);
return -EBUSY;
}
-   host->cur_sg = &host->mrq->data->sg[host->pio_sgptr];
+   host->cur_sg = host->next_sg;
+   host->next_sg = sg_next(host->next_sg);
 
host->pio_bytes = host->cur_sg->length;
host->pio_offset = host->cur_sg->offset;
 
-   host->pio_sgptr++;
-
-   dbg(host, dbg_sg, "new buffer (%i/%i)\n",
-   host->pio_sgptr, host->mrq->data->sg_len);
+   dbg(host, dbg_sg, "new buffer (%i)\n",
+   host->mrq->data->sg_len);
 
return 0;
 }
@@ -1052,8 +1051,8 @@ static int s3cmci_prepare_pio(struct s3cmci_host *host, 
struct mmc_data *data)
 
BUG_ON((data->flags & BOTH_DIR) == BOTH_DIR);
 
-   host->pio_sgptr = 0;
-   host->cur_sg = &host->mrq->data->sg[host->pio_sgptr];
+   host->cur_sg = host->mrq->data->sg;
+   host->next_sg = sg_next(host->cur_sg);
host->pio_bytes = 0;
host->pio_count = 0;
host->pio_active = rw ? XFER_WRITE : XFER_READ;
diff --git a/drivers/mmc/host/s3cmci.h b/drivers/mmc/host/s3cmci.h
index 4320f7d832dc..caf1078d07d1 100644
--- a/drivers/mmc/host/s3cmci.h
+++ b/drivers/mmc/host/s3cmci.h
@@ -51,7 +51,7 @@ struct s3cmci_host {
int dma_complete;
 
struct scatterlist  *cur_sg;
-   u32 pio_sgptr;
+   struct scatterlist  *next_sg;
u32 pio_bytes;
u32 pio_count;
u32 pio_offset;
-- 
2.20.1



[PATCH 04/11] mmc: omap: handle chained sglists

2019-01-14 Thread Christoph Hellwig
Use the proper sg_next() helper to move to the next scatterlist element
to support chained scatterlists.

Signed-off-by: Christoph Hellwig 
---
 drivers/mmc/host/omap.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/mmc/host/omap.c b/drivers/mmc/host/omap.c
index 0377ac6194a0..248707aba2df 100644
--- a/drivers/mmc/host/omap.c
+++ b/drivers/mmc/host/omap.c
@@ -148,7 +148,7 @@ struct mmc_omap_host {
struct mmc_data *stop_data;
 
unsigned intsg_len;
-   int sg_idx;
+   struct scatterlist  *cur_sg;
u32 buffer_offset;
u32 buffer_bytes_left;
u32 total_bytes_left;
@@ -646,11 +646,8 @@ mmc_omap_cmd_timer(struct timer_list *t)
 static void
 mmc_omap_sg_to_buf(struct mmc_omap_host *host)
 {
-   struct scatterlist *sg;
-
-   sg = host->data->sg + host->sg_idx;
-   host->buffer_bytes_left = sg->length;
-   host->buffer_offset = sg->offset;
+   host->buffer_bytes_left = host->cur_sg->length;
+   host->buffer_offset = host->cur_sg->offset;
if (host->buffer_bytes_left > host->total_bytes_left)
host->buffer_bytes_left = host->total_bytes_left;
 }
@@ -667,13 +664,12 @@ mmc_omap_clk_timer(struct timer_list *t)
 static void
 mmc_omap_xfer_data(struct mmc_omap_host *host, int write)
 {
-   struct scatterlist *sg = host->data->sg + host->sg_idx;
+   struct scatterlist *sg = host->cur_sg;
int n, nwords;
void *p;
 
if (host->buffer_bytes_left == 0) {
-   host->sg_idx++;
-   BUG_ON(host->sg_idx == host->sg_len);
+   host->cur_sg = sg_next(host->cur_sg);
mmc_omap_sg_to_buf(host);
}
n = 64;
@@ -985,7 +981,7 @@ mmc_omap_prepare_data(struct mmc_omap_host *host, struct 
mmc_request *req)
}
}
 
-   host->sg_idx = 0;
+   host->cur_sg = host->data->sg;
if (use_dma) {
enum dma_data_direction dma_data_dir;
struct dma_async_tx_descriptor *tx;
-- 
2.20.1



[PATCH 01/11] mmc: davinci: handle highmem pages

2019-01-14 Thread Christoph Hellwig
Instead of setting up a kernel pointer to track the current PIO address,
track the offset in the current page, and do an atomic kmap for the page
while doing the actual PIO operations.

Signed-off-by: Christoph Hellwig 
---
 drivers/mmc/host/davinci_mmc.c | 43 --
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/drivers/mmc/host/davinci_mmc.c b/drivers/mmc/host/davinci_mmc.c
index 9e68c3645e22..9c500c52b63d 100644
--- a/drivers/mmc/host/davinci_mmc.c
+++ b/drivers/mmc/host/davinci_mmc.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -194,11 +195,12 @@ struct mmc_davinci_host {
 #define DAVINCI_MMC_DATADIR_WRITE  2
unsigned char data_dir;
 
-   /* buffer is used during PIO of one scatterlist segment, and
-* is updated along with buffer_bytes_left.  bytes_left applies
-* to all N blocks of the PIO transfer.
+   /*
+* buffer_offset is used during PIO of one scatterlist segment, and is
+* updated along with buffer_bytes_left.  bytes_left applies to all N
+* blocks of the PIO transfer.
 */
-   u8 *buffer;
+   u32 buffer_offset;
u32 buffer_bytes_left;
u32 bytes_left;
 
@@ -229,8 +231,8 @@ static irqreturn_t mmc_davinci_irq(int irq, void *dev_id);
 /* PIO only */
 static void mmc_davinci_sg_to_buf(struct mmc_davinci_host *host)
 {
+   host->buffer_offset = host->sg->offset;
host->buffer_bytes_left = sg_dma_len(host->sg);
-   host->buffer = sg_virt(host->sg);
if (host->buffer_bytes_left > host->bytes_left)
host->buffer_bytes_left = host->bytes_left;
 }
@@ -238,7 +240,7 @@ static void mmc_davinci_sg_to_buf(struct mmc_davinci_host 
*host)
 static void davinci_fifo_data_trans(struct mmc_davinci_host *host,
unsigned int n)
 {
-   u8 *p;
+   void *p;
unsigned int i;
 
if (host->buffer_bytes_left == 0) {
@@ -246,7 +248,7 @@ static void davinci_fifo_data_trans(struct mmc_davinci_host 
*host,
mmc_davinci_sg_to_buf(host);
}
 
-   p = host->buffer;
+   p = kmap_atomic(sg_page(host->sg));
if (n > host->buffer_bytes_left)
n = host->buffer_bytes_left;
host->buffer_bytes_left -= n;
@@ -258,24 +260,31 @@ static void davinci_fifo_data_trans(struct 
mmc_davinci_host *host,
 */
if (host->data_dir == DAVINCI_MMC_DATADIR_WRITE) {
for (i = 0; i < (n >> 2); i++) {
-   writel(*((u32 *)p), host->base + DAVINCI_MMCDXR);
-   p = p + 4;
+   u32 *val = p + host->buffer_offset;
+
+   writel(*val, host->base + DAVINCI_MMCDXR);
+   host->buffer_offset += 4;
}
if (n & 3) {
-   iowrite8_rep(host->base + DAVINCI_MMCDXR, p, (n & 3));
-   p = p + (n & 3);
+   iowrite8_rep(host->base + DAVINCI_MMCDXR,
+   p + host->buffer_offset, n & 3);
+   host->buffer_offset += (n & 3);
}
} else {
for (i = 0; i < (n >> 2); i++) {
-   *((u32 *)p) = readl(host->base + DAVINCI_MMCDRR);
-   p  = p + 4;
+   u32 *val = p + host->buffer_offset;
+
+   *val = readl(host->base + DAVINCI_MMCDRR);
+   host->buffer_offset += 4;
}
if (n & 3) {
-   ioread8_rep(host->base + DAVINCI_MMCDRR, p, (n & 3));
-   p = p + (n & 3);
+   ioread8_rep(host->base + DAVINCI_MMCDRR,
+   p + host->buffer_offset, n & 3);
+   host->buffer_offset += (n & 3);
}
}
-   host->buffer = p;
+
+   kunmap_atomic(p);
 }
 
 static void mmc_davinci_start_command(struct mmc_davinci_host *host,
@@ -572,7 +581,7 @@ mmc_davinci_prepare_data(struct mmc_davinci_host *host, 
struct mmc_request *req)
host->base + DAVINCI_MMCFIFOCTL);
}
 
-   host->buffer = NULL;
+   host->buffer_offset = 0;
host->bytes_left = data->blocks * data->blksz;
 
/* For now we try to use DMA whenever we won't need partial FIFO
-- 
2.20.1



remove block layer bounce buffering for MMC

2019-01-14 Thread Christoph Hellwig
Hi everyone,

this series converts the remaining MMC host drivers to properly kmap the
scatterlist entries they do PIO operations on, and then goes on to
remove the usage of block layer bounce buffering (which I plan to remove
eventually) from the MMC layer.

As a bonus I've converted various drivers to the proper scatterlist
helpers so that at least in theory they are ready for chained
scatterlists.

All the changes are compile tested only as I don't have any of the
hardware, so a careful review would be appreciated.
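For reference, the common pattern the per-driver patches follow looks roughly like this simplified sketch (the helper and the FIFO pointer are made up):

#include <linux/highmem.h>
#include <linux/io.h>
#include <linux/scatterlist.h>

/*
 * Instead of caching a sg_virt() pointer (not valid for highmem pages),
 * kmap the page backing the current segment around each burst of PIO and
 * only track an offset into it.
 */
static void example_pio_read(void __iomem *fifo, struct scatterlist *sg,
			     unsigned int word_offset, unsigned int nwords)
{
	void *page_addr = kmap_atomic(sg_page(sg));
	u32 *buf = page_addr + sg->offset;

	while (nwords--)
		buf[word_offset++] = readl(fifo);

	kunmap_atomic(page_addr);
}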


Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB

2019-01-14 Thread Christoph Hellwig
On Mon, Jan 14, 2019 at 05:41:56PM +0800, Jason Wang wrote:
>
> On 2019/1/11 5:15 PM, Joerg Roedel wrote:
>> On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote:
>>> Just wonder if my understanding is correct IOMMU_PLATFORM must be set for
>>> all virtio devices under AMD-SEV guests?
>> Yes, that is correct. Emulated DMA can only happen on the SWIOTLB
>> aperture, because that memory is not encrypted. The guest bounces the
>> data then to its encrypted memory.
>>
>> Regards,
>>
>>  Joerg
>
>
> Thanks, have you tested vhost-net in this case. I suspect it may not work

Which brings me back to my pet peeve that we need to take action
so that virtio uses the proper dma mapping API by default, with quirks
for legacy cases.  The magic bypass it uses is just causing problem
after problem.

[PATCH 19/19] arm64: trim includes in dma-mapping.c

2019-01-14 Thread Christoph Hellwig
With most of the previous functionality now elsewhere a lot of the
headers included in this file are not needed.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/mm/dma-mapping.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index bdfb4e985a69..b6e910d1533b 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -5,20 +5,9 @@
  */
 
 #include 
-#include 
-#include 
 #include 
-#include 
-#include 
-#include 
-#include 
 #include 
-#include 
 #include 
-#include 
-#include 
-#include 
-
 #include 
 
 pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
-- 
2.20.1



[PATCH 17/19] dma-iommu: switch copyright boilerplace to SPDX

2019-01-14 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 13 +
 include/linux/dma-iommu.h | 13 +
 2 files changed, 2 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index e27909771d55..1b76121df94e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0
 /*
  * A fairly generic DMA-API to IOMMU-API glue layer.
  *
@@ -5,18 +6,6 @@
  *
  * based in part on arch/arm/mm/dma-mapping.c:
  * Copyright (C) 2000-2004 Russell King
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see .
  */
 
 #include 
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 5277aa8782bf..bfe9f19b1171 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -1,17 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0 */
 /*
  * Copyright (C) 2014-2015 ARM Ltd.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see .
  */
 #ifndef __DMA_IOMMU_H
 #define __DMA_IOMMU_H
-- 
2.20.1



[PATCH 16/19] dma-iommu: don't depend on CONFIG_DMA_DIRECT_REMAP

2019-01-14 Thread Christoph Hellwig
For entirely dma coherent architectures there is no good reason to ever
remap dma coherent allocations.  Move all the remap and pool code under
CONFIG_DMA_DIRECT_REMAP ifdefs, and drop the Kconfig dependency.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/Kconfig |  1 -
 drivers/iommu/dma-iommu.c | 10 ++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 8b13fb7d0263..d9a25715650e 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -94,7 +94,6 @@ config IOMMU_DMA
select IOMMU_API
select IOMMU_IOVA
select NEED_SG_DMA_LENGTH
-   depends on DMA_DIRECT_REMAP
 
 config FSL_PAMU
bool "Freescale IOMMU support"
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index fd25c995bde4..e27909771d55 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -502,6 +502,7 @@ static void *iommu_dma_alloc_contiguous(struct device *dev, 
size_t size,
return page_address(page);
 }
 
+#ifdef CONFIG_DMA_DIRECT_REMAP
 static void __iommu_dma_free_pages(struct page **pages, int count)
 {
while (count--)
@@ -775,6 +776,7 @@ static void *iommu_dma_alloc_noncoherent(struct device *dev, size_t size,
gfp, attrs);
return iommu_dma_alloc_remap(dev, size, dma_handle, gfp, attrs);
 }
+#endif /* CONFIG_DMA_DIRECT_REMAP */
 
 static void iommu_dma_sync_single_for_cpu(struct device *dev,
dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
@@ -1057,6 +1059,7 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
 */
gfp |= __GFP_ZERO;
 
+#ifdef CONFIG_DMA_DIRECT_REMAP
if (!dev_is_dma_coherent(dev))
return iommu_dma_alloc_noncoherent(dev, size, dma_handle, gfp,
attrs);
@@ -1064,6 +1067,7 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
if (gfpflags_allow_blocking(gfp) &&
!(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
return iommu_dma_alloc_remap(dev, size, dma_handle, gfp, attrs);
+#endif
 
return iommu_dma_alloc_contiguous(dev, size, dma_handle, gfp, attrs);
 }
@@ -1083,6 +1087,7 @@ static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
 *
 * Hence how dodgy the below logic looks...
 */
+#ifdef CONFIG_DMA_DIRECT_REMAP
if (dma_in_atomic_pool(cpu_addr, PAGE_ALIGN(size))) {
iommu_dma_free_pool(dev, size, cpu_addr, dma_handle);
return;
@@ -1096,6 +1101,7 @@ static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
page = vmalloc_to_page(cpu_addr);
dma_common_free_remap(cpu_addr, PAGE_ALIGN(size), VM_USERMAP);
} else
+#endif
page = virt_to_page(cpu_addr);
 
iommu_dma_free_contiguous(dev, size, page, dma_handle);
@@ -1119,11 +1125,13 @@ static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma,
if (off >= count || user_count > count - off)
return -ENXIO;
 
+#ifdef CONFIG_DMA_DIRECT_REMAP
if (is_vmalloc_addr(cpu_addr)) {
if (!(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
return iommu_dma_mmap_remap(cpu_addr, size, vma);
pfn = vmalloc_to_pfn(cpu_addr);
} else
+#endif
pfn = page_to_pfn(virt_to_page(cpu_addr));
 
return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
@@ -1137,11 +1145,13 @@ static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
struct page *page;
int ret;
 
+#ifdef CONFIG_DMA_DIRECT_REMAP
if (is_vmalloc_addr(cpu_addr)) {
if (!(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
return iommu_dma_get_sgtable_remap(sgt, cpu_addr, size);
page = vmalloc_to_page(cpu_addr);
} else
+#endif
page = virt_to_page(cpu_addr);
 
ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
-- 
2.20.1



[PATCH 13/19] dma-iommu: don't remap contiguous allocations for coherent devices

2019-01-14 Thread Christoph Hellwig
There is no need to remap for pte attributes, or for a virtually
contiguous address, so just don't do it.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c9788e0c1d5d..710814a370b9 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1041,10 +1041,10 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
 
addr = iommu_dma_alloc_contiguous(dev, iosize, handle, gfp,
attrs);
-   if (!addr)
-   return NULL;
-   page = virt_to_page(addr);
+   if (coherent || !addr)
+   return addr;
 
+   page = virt_to_page(addr);
addr = dma_common_contiguous_remap(page, size, VM_USERMAP, prot,
__builtin_return_address(0));
if (!addr) {
@@ -1052,8 +1052,7 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
return NULL;
}
 
-   if (!coherent)
-   arch_dma_prep_coherent(page, iosize);
+   arch_dma_prep_coherent(page, iosize);
} else {
addr = iommu_dma_alloc_remap(dev, iosize, handle, gfp, attrs);
}
-- 
2.20.1



[PATCH 08/19] dma-iommu: move __iommu_dma_map

2019-01-14 Thread Christoph Hellwig
Moving this function up to its unmap counterpart helps to keep related
code together for the following changes.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 46 +++
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 8f3dc6ab3da1..0727c109bcab 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -437,6 +437,29 @@ static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr,
iommu_dma_free_iova(cookie, dma_addr, size);
 }
 
+static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
+   size_t size, int prot, struct iommu_domain *domain)
+{
+   struct iommu_dma_cookie *cookie = domain->iova_cookie;
+   size_t iova_off = 0;
+   dma_addr_t iova;
+
+   if (cookie->type == IOMMU_DMA_IOVA_COOKIE) {
+   iova_off = iova_offset(&cookie->iovad, phys);
+   size = iova_align(&cookie->iovad, size + iova_off);
+   }
+
+   iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);
+   if (!iova)
+   return DMA_MAPPING_ERROR;
+
+   if (iommu_map(domain, iova, phys - iova_off, size, prot)) {
+   iommu_dma_free_iova(cookie, iova, size);
+   return DMA_MAPPING_ERROR;
+   }
+   return iova + iova_off;
+}
+
 static void __iommu_dma_free_pages(struct page **pages, int count)
 {
while (count--)
@@ -684,29 +707,6 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
arch_sync_dma_for_device(dev, sg_phys(sg), sg->length, dir);
 }
 
-static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
-   size_t size, int prot, struct iommu_domain *domain)
-{
-   struct iommu_dma_cookie *cookie = domain->iova_cookie;
-   size_t iova_off = 0;
-   dma_addr_t iova;
-
-   if (cookie->type == IOMMU_DMA_IOVA_COOKIE) {
-   iova_off = iova_offset(&cookie->iovad, phys);
-   size = iova_align(&cookie->iovad, size + iova_off);
-   }
-
-   iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);
-   if (!iova)
-   return DMA_MAPPING_ERROR;
-
-   if (iommu_map(domain, iova, phys - iova_off, size, prot)) {
-   iommu_dma_free_iova(cookie, iova, size);
-   return DMA_MAPPING_ERROR;
-   }
-   return iova + iova_off;
-}
-
 static dma_addr_t __iommu_dma_map_page(struct device *dev, struct page *page,
unsigned long offset, size_t size, int prot)
 {
-- 
2.20.1



[PATCH 06/19] dma-iommu: fix and refactor iommu_dma_mmap

2019-01-14 Thread Christoph Hellwig
The current iommu_dma_mmap code does not properly handle memory from the
page allocator that hasn't been remapped, which can happen in the rare
case of allocations for a coherent device that aren't allowed to block.

Fix this by replacing iommu_dma_mmap with a slightly tweaked copy of
dma_common_mmap with special handling for the remapped array of
pages allocated from __iommu_dma_alloc.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 59 +++
 1 file changed, 23 insertions(+), 36 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index e0ffe22775ac..26f479d49103 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -592,23 +592,27 @@ static struct page **__iommu_dma_alloc(struct device *dev, size_t size,
 }
 
 /**
- * __iommu_dma_mmap - Map a buffer into provided user VMA
- * @pages: Array representing buffer from __iommu_dma_alloc()
+ * iommu_dma_mmap_remap - Map a remapped page array into provided user VMA
+ * @cpu_addr: virtual address of the memory to be remapped
  * @size: Size of buffer in bytes
  * @vma: VMA describing requested userspace mapping
  *
- * Maps the pages of the buffer in @pages into @vma. The caller is responsible
+ * Maps the pages pointed to by @cpu_addr into @vma. The caller is responsible
  * for verifying the correct size and protection of @vma beforehand.
  */
-static int __iommu_dma_mmap(struct page **pages, size_t size,
+static int iommu_dma_mmap_remap(void *cpu_addr, size_t size,
struct vm_area_struct *vma)
 {
+   struct vm_struct *area = find_vm_area(cpu_addr);
unsigned long uaddr = vma->vm_start;
unsigned int i, count = PAGE_ALIGN(size) >> PAGE_SHIFT;
int ret = -ENXIO;
 
+   if (WARN_ON(!area || !area->pages))
+   return -ENXIO;
+
for (i = vma->vm_pgoff; i < count && uaddr < vma->vm_end; i++) {
-   ret = vm_insert_page(vma, uaddr, pages[i]);
+   ret = vm_insert_page(vma, uaddr, area->pages[i]);
if (ret)
break;
uaddr += PAGE_SIZE;
@@ -1047,29 +1051,14 @@ static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
}
 }
 
-static int __iommu_dma_mmap_pfn(struct vm_area_struct *vma,
- unsigned long pfn, size_t size)
-{
-   int ret = -ENXIO;
-   unsigned long nr_vma_pages = vma_pages(vma);
-   unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   unsigned long off = vma->vm_pgoff;
-
-   if (off < nr_pages && nr_vma_pages <= (nr_pages - off)) {
-   ret = remap_pfn_range(vma, vma->vm_start,
- pfn + off,
- vma->vm_end - vma->vm_start,
- vma->vm_page_prot);
-   }
-
-   return ret;
-}
-
 static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs)
 {
-   struct vm_struct *area;
+   unsigned long user_count = vma_pages(vma);
+   unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+   unsigned long off = vma->vm_pgoff;
+   unsigned long pfn;
int ret;
 
vma->vm_page_prot = arch_dma_mmap_pgprot(dev, vma->vm_page_prot, attrs);
@@ -1077,20 +1066,18 @@ static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma,
if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret))
return ret;
 
-   if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-   /*
-* DMA_ATTR_FORCE_CONTIGUOUS allocations are always remapped,
-* hence in the vmalloc space.
-*/
-   unsigned long pfn = vmalloc_to_pfn(cpu_addr);
-   return __iommu_dma_mmap_pfn(vma, pfn, size);
-   }
-
-   area = find_vm_area(cpu_addr);
-   if (WARN_ON(!area || !area->pages))
+   if (off >= count || user_count > count - off)
return -ENXIO;
 
-   return __iommu_dma_mmap(area->pages, size, vma);
+   if (is_vmalloc_addr(cpu_addr)) {
+   if (!(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
+   return iommu_dma_mmap_remap(cpu_addr, size, vma);
+   pfn = vmalloc_to_pfn(cpu_addr);
+   } else
+   pfn = page_to_pfn(virt_to_page(cpu_addr));
+
+   return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
+   user_count << PAGE_SHIFT, vma->vm_page_prot);
 }
 
 static int __iommu_dma_get_sgtable_page(struct sg_table *sgt, struct page *page,
-- 
2.20.1



[PATCH 14/19] dma-iommu: factor contiguous remapped allocations into helpers

2019-01-14 Thread Christoph Hellwig
This moves the last remaining non-dispatch code out of iommu_dma_alloc,
preparing to refactor the allocation method selection.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 48 +++
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 710814a370b9..956cb218c6ba 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -667,6 +667,29 @@ static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
return NULL;
 }
 
+static void *iommu_dma_alloc_contiguous_remap(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+{
+   pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
+   struct page *page;
+   void *addr;
+
+   addr = iommu_dma_alloc_contiguous(dev, size, dma_handle, gfp, attrs);
+   if (!addr)
+   return NULL;
+
+   page = virt_to_page(addr);
+   addr = dma_common_contiguous_remap(page, PAGE_ALIGN(size), VM_USERMAP,
+   prot, __builtin_return_address(0));
+   if (!addr)
+   goto out_free;
+   arch_dma_prep_coherent(page, size);
+   return addr;
+out_free:
+   iommu_dma_free_contiguous(dev, size, page, *dma_handle);
+   return NULL;
+}
+
 /**
  * iommu_dma_mmap_remap - Map a remapped page array into provided user VMA
  * @cpu_addr: virtual address of the memory to be remapped
@@ -1016,8 +1039,6 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
size_t iosize = size;
void *addr;
 
-   size = PAGE_ALIGN(size);
-
/*
 * Some drivers rely on this, and we probably don't want the
 * possibility of stale kernel data being read by devices anyway.
@@ -1036,23 +1057,12 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
return iommu_dma_alloc_contiguous(dev, iosize, handle, gfp,
attrs);
} else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-   pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
-   struct page *page;
-
-   addr = iommu_dma_alloc_contiguous(dev, iosize, handle, gfp,
-   attrs);
-   if (coherent || !addr)
-   return addr;
-
-   page = virt_to_page(addr);
-   addr = dma_common_contiguous_remap(page, size, VM_USERMAP, prot,
-   __builtin_return_address(0));
-   if (!addr) {
-   iommu_dma_free_contiguous(dev, iosize, page, *handle);
-   return NULL;
-   }
-
-   arch_dma_prep_coherent(page, iosize);
+   if (coherent)
+   addr = iommu_dma_alloc_contiguous(dev, iosize, handle,
+   gfp, attrs);
+   else
+   addr = iommu_dma_alloc_contiguous_remap(dev, iosize,
+   handle, gfp, attrs);
} else {
addr = iommu_dma_alloc_remap(dev, iosize, handle, gfp, attrs);
}
-- 
2.20.1



[PATCH 10/19] dma-iommu: factor atomic pool allocations into helpers

2019-01-14 Thread Christoph Hellwig
This keeps the code together and will simplify compiling the code
out on architectures that are always dma coherent.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 51 +--
 1 file changed, 38 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 95d30b96e5bd..fdd283f45656 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -666,6 +666,35 @@ static int iommu_dma_get_sgtable_remap(struct sg_table *sgt, void *cpu_addr,
GFP_KERNEL);
 }
 
+static void iommu_dma_free_pool(struct device *dev, size_t size,
+   void *vaddr, dma_addr_t dma_handle)
+{
+   __iommu_dma_unmap(iommu_get_domain_for_dev(dev), dma_handle, size);
+   dma_free_from_pool(vaddr, PAGE_ALIGN(size));
+}
+
+static void *iommu_dma_alloc_pool(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+{
+   bool coherent = dev_is_dma_coherent(dev);
+   struct page *page;
+   void *vaddr;
+
+   vaddr = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp);
+   if (!vaddr)
+   return NULL;
+
+   *dma_handle = __iommu_dma_map(dev, page_to_phys(page), size,
+   dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs),
+   iommu_get_domain_for_dev(dev));
+   if (*dma_handle == DMA_MAPPING_ERROR) {
+   dma_free_from_pool(vaddr, PAGE_ALIGN(size));
+   return NULL;
+   }
+
+   return vaddr;
+}
+
 static void iommu_dma_sync_single_for_cpu(struct device *dev,
dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
 {
@@ -974,21 +1003,18 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
 * get the virtually contiguous buffer we need by way of a
 * physically contiguous allocation.
 */
-   if (coherent) {
-   page = alloc_pages(gfp, get_order(size));
-   addr = page ? page_address(page) : NULL;
-   } else {
-   addr = dma_alloc_from_pool(size, &page, gfp);
-   }
-   if (!addr)
+   if (!coherent)
+   return iommu_dma_alloc_pool(dev, iosize, handle, gfp,
+   attrs);
+
+   page = alloc_pages(gfp, get_order(size));
+   if (!page)
return NULL;
 
+   addr = page_address(page);
*handle = __iommu_dma_map_page(dev, page, 0, iosize, ioprot);
if (*handle == DMA_MAPPING_ERROR) {
-   if (coherent)
-   __free_pages(page, get_order(size));
-   else
-   dma_free_from_pool(addr, size);
+   __free_pages(page, get_order(size));
addr = NULL;
}
} else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
@@ -1042,8 +1068,7 @@ static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
 * Hence how dodgy the below logic looks...
 */
if (dma_in_atomic_pool(cpu_addr, size)) {
-   __iommu_dma_unmap_page(dev, handle, iosize, 0, 0);
-   dma_free_from_pool(cpu_addr, size);
+   iommu_dma_free_pool(dev, size, cpu_addr, handle);
} else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
struct page *page = vmalloc_to_page(cpu_addr);
 
-- 
2.20.1



[PATCH 12/19] dma-iommu: refactor iommu_dma_free

2019-01-14 Thread Christoph Hellwig
Reorder the checks a bit so that a non-remapped allocation is the
fallthrough case, as this will ease making remapping conditional.
Also get rid of the confusing game with the size and iosize variables
and rename the handle argument to the more standard dma_handle.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 46 ---
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 73f76226ff5e..c9788e0c1d5d 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1061,34 +1061,36 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
 }
 
 static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
-   dma_addr_t handle, unsigned long attrs)
+   dma_addr_t dma_handle, unsigned long attrs)
 {
-   size_t iosize = size;
+   struct page *page;
 
-   size = PAGE_ALIGN(size);
/*
-* @cpu_addr will be one of 4 things depending on how it was allocated:
-* - A remapped array of pages for contiguous allocations.
-* - A remapped array of pages from iommu_dma_alloc_remap(), for all
-*   non-atomic allocations.
-* - A non-cacheable alias from the atomic pool, for atomic
-*   allocations by non-coherent devices.
-* - A normal lowmem address, for atomic allocations by
-*   coherent devices.
+* cpu_addr can be one of 4 things depending on how it was allocated:
+*
+*  (1) A non-cacheable alias from the atomic pool.
+*  (2) A remapped array of pages from iommu_dma_alloc_remap().
+*  (3) A remapped contiguous lowmem allocation.
+*  (4) A normal lowmem address.
+*
 * Hence how dodgy the below logic looks...
 */
-   if (dma_in_atomic_pool(cpu_addr, size)) {
-   iommu_dma_free_pool(dev, size, cpu_addr, handle);
-   } else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-   iommu_dma_free_contiguous(dev, iosize,
-   vmalloc_to_page(cpu_addr), handle);
-   dma_common_free_remap(cpu_addr, size, VM_USERMAP);
-   } else if (is_vmalloc_addr(cpu_addr)){
-   iommu_dma_free_remap(dev, iosize, cpu_addr, handle);
-   } else {
-   iommu_dma_free_contiguous(dev, iosize, virt_to_page(cpu_addr),
-   handle);
+   if (dma_in_atomic_pool(cpu_addr, PAGE_ALIGN(size))) {
+   iommu_dma_free_pool(dev, size, cpu_addr, dma_handle);
+   return;
}
+
+   if (is_vmalloc_addr(cpu_addr)) {
+   if (!(attrs & DMA_ATTR_FORCE_CONTIGUOUS)) {
+   iommu_dma_free_remap(dev, size, cpu_addr, dma_handle);
+   return;
+   }
+   page = vmalloc_to_page(cpu_addr);
+   dma_common_free_remap(cpu_addr, PAGE_ALIGN(size), VM_USERMAP);
+   } else
+   page = virt_to_page(cpu_addr);
+
+   iommu_dma_free_contiguous(dev, size, page, dma_handle);
 }
 
 static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma,
-- 
2.20.1



[PATCH 11/19] dma-iommu: factor contiguous allocations into helpers

2019-01-14 Thread Christoph Hellwig
This keeps the code together and will simplify using it in different
ways.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 110 --
 1 file changed, 59 insertions(+), 51 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index fdd283f45656..73f76226ff5e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -460,6 +460,48 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
return iova + iova_off;
 }
 
+static void iommu_dma_free_contiguous(struct device *dev, size_t size,
+   struct page *page, dma_addr_t dma_handle)
+{
+   unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+   __iommu_dma_unmap(iommu_get_domain_for_dev(dev), dma_handle, size);
+   if (!dma_release_from_contiguous(dev, page, count))
+   __free_pages(page, get_order(size));
+}
+
+
+static void *iommu_dma_alloc_contiguous(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+{
+   bool coherent = dev_is_dma_coherent(dev);
+   int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
+   unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+   unsigned int page_order = get_order(size);
+   struct page *page = NULL;
+
+   if (gfpflags_allow_blocking(gfp))
+   page = dma_alloc_from_contiguous(dev, count, page_order,
+gfp & __GFP_NOWARN);
+
+   if (page)
+   memset(page_address(page), 0, PAGE_ALIGN(size));
+   else
+   page = alloc_pages(gfp, page_order);
+   if (!page)
+   return NULL;
+
+   *dma_handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot,
+   iommu_get_dma_domain(dev));
+   if (*dma_handle == DMA_MAPPING_ERROR) {
+   if (!dma_release_from_contiguous(dev, page, count))
+   __free_pages(page, page_order);
+   return NULL;
+   }
+
+   return page_address(page);
+}
+
 static void __iommu_dma_free_pages(struct page **pages, int count)
 {
while (count--)
@@ -747,19 +789,6 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
arch_sync_dma_for_device(dev, sg_phys(sg), sg->length, dir);
 }
 
-static dma_addr_t __iommu_dma_map_page(struct device *dev, struct page *page,
-   unsigned long offset, size_t size, int prot)
-{
-   return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot,
-   iommu_get_dma_domain(dev));
-}
-
-static void __iommu_dma_unmap_page(struct device *dev, dma_addr_t handle,
-   size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
-   __iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
-}
-
 static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
unsigned long offset, size_t size, enum dma_data_direction dir,
unsigned long attrs)
@@ -984,7 +1013,6 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
 {
bool coherent = dev_is_dma_coherent(dev);
-   int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
size_t iosize = size;
void *addr;
 
@@ -997,7 +1025,6 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
gfp |= __GFP_ZERO;
 
if (!gfpflags_allow_blocking(gfp)) {
-   struct page *page;
/*
 * In atomic context we can't remap anything, so we'll only
 * get the virtually contiguous buffer we need by way of a
@@ -1006,44 +1033,27 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
if (!coherent)
return iommu_dma_alloc_pool(dev, iosize, handle, gfp,
attrs);
-
-   page = alloc_pages(gfp, get_order(size));
-   if (!page)
-   return NULL;
-
-   addr = page_address(page);
-   *handle = __iommu_dma_map_page(dev, page, 0, iosize, ioprot);
-   if (*handle == DMA_MAPPING_ERROR) {
-   __free_pages(page, get_order(size));
-   addr = NULL;
-   }
+   return iommu_dma_alloc_contiguous(dev, iosize, handle, gfp,
+   attrs);
} else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
struct page *page;
 
-   page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
-   get_order(size), gfp & __GFP_NOWARN);
-   if (!page)
+   addr = iommu_dma_alloc_contiguous(dev, iosize, handle, gfp,
+  

[PATCH 07/19] dma-iommu: fix and refactor iommu_dma_get_sgtable

2019-01-14 Thread Christoph Hellwig
The current iommu_dma_get_sgtable code does not properly handle memory
from the page allocator that hasn't been remapped, which can happen in
the rare case of allocations for a coherent device that aren't allowed
to block.

Fix this by replacing iommu_dma_get_sgtable with a slightly tweaked copy
of dma_common_get_sgtable with special handling for the remapped array
of pages allocated from __iommu_dma_alloc.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 49 +++
 1 file changed, 24 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 26f479d49103..8f3dc6ab3da1 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -620,6 +620,18 @@ static int iommu_dma_mmap_remap(void *cpu_addr, size_t size,
return ret;
 }
 
+static int iommu_dma_get_sgtable_remap(struct sg_table *sgt, void *cpu_addr,
+   size_t size)
+{
+   unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+   struct vm_struct *area = find_vm_area(cpu_addr);
+
+   if (WARN_ON(!area || !area->pages))
+   return -ENXIO;
+   return sg_alloc_table_from_pages(sgt, area->pages, count, 0, size,
+   GFP_KERNEL);
+}
+
 static void iommu_dma_sync_single_for_cpu(struct device *dev,
dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
 {
@@ -1080,37 +1092,24 @@ static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma,
user_count << PAGE_SHIFT, vma->vm_page_prot);
 }
 
-static int __iommu_dma_get_sgtable_page(struct sg_table *sgt, struct page *page,
-   size_t size)
-{
-   int ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
-
-   if (!ret)
-   sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
-   return ret;
-}
-
 static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs)
 {
-   unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   struct vm_struct *area = find_vm_area(cpu_addr);
-
-   if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-   /*
-* DMA_ATTR_FORCE_CONTIGUOUS allocations are always remapped,
-* hence in the vmalloc space.
-*/
-   struct page *page = vmalloc_to_page(cpu_addr);
-   return __iommu_dma_get_sgtable_page(sgt, page, size);
-   }
+   struct page *page;
+   int ret;
 
-   if (WARN_ON(!area || !area->pages))
-   return -ENXIO;
+   if (is_vmalloc_addr(cpu_addr)) {
+   if (!(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
+   return iommu_dma_get_sgtable_remap(sgt, cpu_addr, size);
+   page = vmalloc_to_page(cpu_addr);
+   } else
+   page = virt_to_page(cpu_addr);
 
-   return sg_alloc_table_from_pages(sgt, area->pages, count, 0, size,
-GFP_KERNEL);
+   ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
+   if (!ret)
+   sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
+   return ret;
 }
 
 static const struct dma_map_ops iommu_dma_ops = {
-- 
2.20.1



[PATCH 03/19] dma-iommu: don't use a scatterlist in iommu_dma_alloc

2019-01-14 Thread Christoph Hellwig
Directly iterating over the pages makes the code a bit simpler and
prepares for the following changes.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 40 +--
 1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d19f3d6b43c1..4f5546a103d8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 struct iommu_dma_msi_page {
@@ -549,9 +550,9 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
struct page **pages;
-   struct sg_table sgt;
dma_addr_t iova;
-   unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap;
+   unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap, i;
+   size_t mapped = 0;
 
*handle = DMA_MAPPING_ERROR;
 
@@ -576,32 +577,25 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
if (!iova)
goto out_free_pages;
 
-   if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
-   goto out_free_iova;
+   for (i = 0; i < count; i++) {
+   phys_addr_t phys = page_to_phys(pages[i]);
 
-   if (!(prot & IOMMU_CACHE)) {
-   struct sg_mapping_iter miter;
-   /*
-* The CPU-centric flushing implied by SG_MITER_TO_SG isn't
-* sufficient here, so skip it by using the "wrong" direction.
-*/
-   sg_miter_start(&miter, sgt.sgl, sgt.orig_nents, SG_MITER_FROM_SG);
-   while (sg_miter_next(&miter))
-   flush_page(dev, miter.addr, page_to_phys(miter.page));
-   sg_miter_stop(&miter);
-   }
+   if (!(prot & IOMMU_CACHE)) {
+   void *vaddr = kmap_atomic(pages[i]);
 
-   if (iommu_map_sg(domain, iova, sgt.sgl, sgt.orig_nents, prot)
-   < size)
-   goto out_free_sg;
+   flush_page(dev, vaddr, phys);
+   kunmap_atomic(vaddr);
+   }
+
+   if (iommu_map(domain, iova + mapped, phys, PAGE_SIZE, prot))
+   goto out_unmap;
+   mapped += PAGE_SIZE;
+   }
 
*handle = iova;
-   sg_free_table(&sgt);
return pages;
-
-out_free_sg:
-   sg_free_table(&sgt);
-out_free_iova:
+out_unmap:
+   iommu_unmap(domain, iova, mapped);
iommu_dma_free_iova(cookie, iova, size);
 out_free_pages:
__iommu_dma_free_pages(pages, count);
-- 
2.20.1



[PATCH 05/19] dma-iommu: move the arm64 wrappers to common code

2019-01-14 Thread Christoph Hellwig
There is nothing really arm64 specific in the iommu_dma_ops
implementation, so move it to dma-iommu.c and keep a lot of symbols
self-contained.  Note the implementation does depend on the
DMA_DIRECT_REMAP infrastructure for now, so we'll have to make the
DMA_IOMMU support depend on it, but this will be relaxed soon.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/mm/dma-mapping.c | 384 +---
 drivers/iommu/Kconfig   |   1 +
 drivers/iommu/dma-iommu.c   | 378 ---
 include/linux/dma-iommu.h   |  43 +---
 4 files changed, 359 insertions(+), 447 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 75fe7273a1e4..fffba9426ee4 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -58,37 +59,6 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
__dma_flush_area(page_address(page), size);
 }
 
-#ifdef CONFIG_IOMMU_DMA
-static int __swiotlb_get_sgtable_page(struct sg_table *sgt,
- struct page *page, size_t size)
-{
-   int ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
-
-   if (!ret)
-   sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
-
-   return ret;
-}
-
-static int __swiotlb_mmap_pfn(struct vm_area_struct *vma,
- unsigned long pfn, size_t size)
-{
-   int ret = -ENXIO;
-   unsigned long nr_vma_pages = vma_pages(vma);
-   unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   unsigned long off = vma->vm_pgoff;
-
-   if (off < nr_pages && nr_vma_pages <= (nr_pages - off)) {
-   ret = remap_pfn_range(vma, vma->vm_start,
- pfn + off,
- vma->vm_end - vma->vm_start,
- vma->vm_page_prot);
-   }
-
-   return ret;
-}
-#endif /* CONFIG_IOMMU_DMA */
-
 static int __init arm64_dma_init(void)
 {
WARN_TAINT(ARCH_DMA_MINALIGN < cache_line_size(),
@@ -100,364 +70,18 @@ static int __init arm64_dma_init(void)
 arch_initcall(arm64_dma_init);
 
 #ifdef CONFIG_IOMMU_DMA
-#include 
-#include 
-#include 
-
-static void *__iommu_alloc_attrs(struct device *dev, size_t size,
-dma_addr_t *handle, gfp_t gfp,
-unsigned long attrs)
-{
-   bool coherent = dev_is_dma_coherent(dev);
-   int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
-   size_t iosize = size;
-   void *addr;
-
-   if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
-   return NULL;
-
-   size = PAGE_ALIGN(size);
-
-   /*
-* Some drivers rely on this, and we probably don't want the
-* possibility of stale kernel data being read by devices anyway.
-*/
-   gfp |= __GFP_ZERO;
-
-   if (!gfpflags_allow_blocking(gfp)) {
-   struct page *page;
-   /*
-* In atomic context we can't remap anything, so we'll only
-* get the virtually contiguous buffer we need by way of a
-* physically contiguous allocation.
-*/
-   if (coherent) {
-   page = alloc_pages(gfp, get_order(size));
-   addr = page ? page_address(page) : NULL;
-   } else {
-   addr = dma_alloc_from_pool(size, &page, gfp);
-   }
-   if (!addr)
-   return NULL;
-
-   *handle = iommu_dma_map_page(dev, page, 0, iosize, ioprot);
-   if (*handle == DMA_MAPPING_ERROR) {
-   if (coherent)
-   __free_pages(page, get_order(size));
-   else
-   dma_free_from_pool(addr, size);
-   addr = NULL;
-   }
-   } else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-   pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
-   struct page *page;
-
-   page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
-   get_order(size), gfp & __GFP_NOWARN);
-   if (!page)
-   return NULL;
-
-   *handle = iommu_dma_map_page(dev, page, 0, iosize, ioprot);
-   if (*handle == DMA_MAPPING_ERROR) {
-   dma_release_from_contiguous(dev, page,
-   size >> PAGE_SHIFT);
-   return NULL;
-   }
-   addr = dma_common_contiguous_remap(page, size, VM_USERMAP,
-  prot,
-  __builtin_return_address(0));
-   if (addr) {

implement generic dma_map_ops for IOMMUs

2019-01-14 Thread Christoph Hellwig
Hi Robin,

please take a look at this series, which implements a completely generic
set of dma_map_ops for IOMMU drivers.  This is done by taking the
existing arm64 code, moving it to drivers/iommu and then massaging it
so that it can also work for architectures with DMA remapping.  This
should help future ports to support IOMMUs more easily, and also allow
removing various custom IOMMU dma_map_ops implementations, like Tom
was planning to do for the AMD one.

A git tree is also available at:

git://git.infradead.org/users/hch/misc.git dma-iommu-ops

Gitweb:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-iommu-ops
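
For readers new to the DMA-API side of this, the end result is that an
architecture no longer has to carry its own IOMMU-backed dma_map_ops; it
just points a device at the shared implementation once the IOMMU core has
set up a DMA domain for it.  Below is a minimal sketch of that hookup.  It
is illustrative only: the helper name arch_attach_iommu_dma_ops() is made
up, and iommu_dma_ops is actually kept static inside dma-iommu.c in this
series, so the real wiring goes through whatever setup function that file
exports.

/*
 * Illustrative sketch only (not part of the series): how an architecture
 * could hand a device over to a shared IOMMU dma_map_ops implementation
 * once a DMA-API-managed default domain exists.  The function name and
 * the extern declaration below are assumptions made for this example.
 */
#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/iommu.h>

extern const struct dma_map_ops iommu_dma_ops;	/* assumed to be exported */

static void arch_attach_iommu_dma_ops(struct device *dev)
{
	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

	/* Only devices behind a DMA-API managed IOMMU domain qualify. */
	if (domain && domain->type == IOMMU_DOMAIN_DMA)
		set_dma_ops(dev, &iommu_dma_ops);
	/* otherwise the device stays on dma-direct (and swiotlb if needed) */
}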


Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB

2019-01-14 Thread Jason Wang


On 2019/1/11 5:15 PM, Joerg Roedel wrote:

On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote:

Just wondering if my understanding is correct: IOMMU_PLATFORM must be set for
all virtio devices under AMD-SEV guests?

Yes, that is correct. Emulated DMA can only happen on the SWIOTLB
aperture, because that memory is not encrypted. The guest then bounces
the data to its encrypted memory.

Regards,

Joerg



Thanks. Have you tested vhost-net in this case? I suspect it may not work.
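
To make the bouncing Joerg describes concrete: the emulated device is only
ever given addresses inside the shared (unencrypted) swiotlb aperture, and
the guest copies data between that aperture and its private encrypted
buffers around each transfer.  The sketch below is a deliberately
simplified, userspace-style illustration of that idea, not the kernel's
swiotlb code; all names and sizes here are made up.

#include <stddef.h>
#include <string.h>

#define APERTURE_SIZE 4096

/* The only memory the device emulator may touch (unencrypted under SEV). */
static unsigned char aperture[APERTURE_SIZE];

/* "Map" a private (encrypted) buffer for device access: bounce it out. */
static int bounce_map(const void *priv, size_t len, size_t *dev_off)
{
	if (len > APERTURE_SIZE)
		return -1;
	memcpy(aperture, priv, len);	/* copy into the shared aperture */
	*dev_off = 0;			/* device-visible offset */
	return 0;
}

/* "Unmap" after the device wrote its result: bounce it back in. */
static void bounce_unmap(void *priv, size_t dev_off, size_t len)
{
	memcpy(priv, aperture + dev_off, len);
}

int main(void)
{
	char req[16] = "virtio request", resp[16];
	size_t off;

	if (bounce_map(req, sizeof(req), &off) == 0) {
		/* ... emulated device reads/writes aperture[] here ... */
		bounce_unmap(resp, off, sizeof(resp));
	}
	return 0;
}

Without IOMMU_PLATFORM the guest driver would instead hand the device
guest-physical addresses of its encrypted pages directly, which the
host-side emulation can only see as ciphertext; hence the requirement.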




[PATCH 02/19] dma-iommu: cleanup dma-iommu.h

2019-01-14 Thread Christoph Hellwig
No need for a __KERNEL__ guard outside uapi; make sure we pull in the
includes unconditionally so users can rely on them, and add a missing
comment describing the #else cpp statement.  Last but not least, include
 instead of the asm version, which is frowned upon.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-iommu.h | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index e760dc5d1fa8..65aa888c2768 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -16,15 +16,13 @@
 #ifndef __DMA_IOMMU_H
 #define __DMA_IOMMU_H
 
-#ifdef __KERNEL__
-#include 
-#include 
-
-#ifdef CONFIG_IOMMU_DMA
+#include 
 #include 
 #include 
 #include 
+#include 
 
+#ifdef CONFIG_IOMMU_DMA
 int iommu_dma_init(void);
 
 /* Domain management interface for IOMMU drivers */
@@ -74,11 +72,7 @@ void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
 void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
 void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list);
 
-#else
-
-struct iommu_domain;
-struct msi_msg;
-struct device;
+#else /* CONFIG_IOMMU_DMA */
 
 static inline int iommu_dma_init(void)
 {
@@ -108,5 +102,4 @@ static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_he
 }
 
 #endif /* CONFIG_IOMMU_DMA */
-#endif /* __KERNEL__ */
 #endif /* __DMA_IOMMU_H */
-- 
2.20.1
