Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Christoph Hellwig
On Tue, Jan 29, 2019 at 08:58:35PM +, Jason Gunthorpe wrote:
> On Tue, Jan 29, 2019 at 01:39:49PM -0700, Logan Gunthorpe wrote:
> 
> > implement the mapping. And I don't think we should have 'special' vma's
> > for this (though we may need something to ensure we don't get mapping
> > requests mixed with different types of pages...).
> 
> I think Jerome explained the point here is to have a 'special vma'
> rather than a 'special struct page' as, really, we don't need a
> struct page at all to make this work.
> 
> If I recall your earlier attempts at adding struct page for BAR
> memory, it ran aground on issues related to O_DIRECT/sgls, etc, etc.

Struct page is what makes O_DIRECT work, and what makes using sgls or
biovecs, etc. on it work.  Without struct page none of the above can work
at all.  That is why we use struct page for backing BARs in the existing
P2P code.  Not that I'm a particular fan of creating struct page for this
device memory, but without major invasive surgery to large parts of the
kernel it is the only way to make it work.
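
To make the dependency concrete, here is a minimal sketch (simplified, with
error handling and page release omitted; not taken from any of the patches)
of the classic GUP -> SGL -> DMA flow that direct I/O and MR registration
build on.  Every step consumes struct page:

  /* Hypothetical sketch: pin a user buffer and DMA-map it, one page per entry. */
  static int example_map_user_buffer(struct device *dev, unsigned long uaddr,
                                     int npages, struct scatterlist *sgl)
  {
          struct page **pages;
          int i, pinned, nents;

          pages = kcalloc(npages, sizeof(*pages), GFP_KERNEL);
          if (!pages)
                  return -ENOMEM;

          /* get_user_pages() only works for memory backed by struct page */
          pinned = get_user_pages(uaddr, npages, FOLL_WRITE, pages, NULL);
          if (pinned <= 0)
                  return pinned ? pinned : -EFAULT;

          sg_init_table(sgl, pinned);
          for (i = 0; i < pinned; i++)
                  sg_set_page(&sgl[i], pages[i], PAGE_SIZE, 0);

          /* dma_map_sg() calls sg_page() on every entry */
          nents = dma_map_sg(dev, sgl, pinned, DMA_BIDIRECTIONAL);
          return nents ? nents : -EIO;
  }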


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Christoph Hellwig
On Wed, Jan 30, 2019 at 04:18:48AM +, Jason Gunthorpe wrote:
> Every attempt to give BAR memory to struct page has run into major
> trouble, IMHO, so I like that this approach avoids that.

Way fewer problems than not having struct page for doing anything
non-trivial.  If you map the BAR to userspace with remap_pfn_range
and friends the mapping is indeed very simple.  But any operation
that expects a page structure, which is at least everything using
get_user_pages, won't work.

So you can't do direct I/O to your remapped BAR, you can't create MRs
on it, etc, etc.
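
For contrast, a bare-bones mmap handler along these lines (a hypothetical
driver sketch with a made-up BAR address, not code from the series) is all
it takes to hand a BAR to userspace.  But the resulting PTEs are pfn-only
(VM_PFNMAP), so get_user_pages() and everything layered on it fails on that
range:

  /* Hypothetical sketch: expose a BAR to userspace without struct page. */
  static int example_bar_mmap(struct file *file, struct vm_area_struct *vma)
  {
          unsigned long bar_phys = 0xf0000000UL;  /* made-up BAR address */
          unsigned long size = vma->vm_end - vma->vm_start;

          vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

          /* creates pfn-only PTEs: no struct page behind the mapping */
          return remap_pfn_range(vma, vma->vm_start, bar_phys >> PAGE_SHIFT,
                                 size, vma->vm_page_prot);
  }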


Re: [PATCH 05/11] mmc: s3cmci: handle highmem pages

2019-01-30 Thread Christoph Hellwig
On Wed, Jan 30, 2019 at 08:57:47AM +0100, Ulf Hansson wrote:
> On Mon, 14 Jan 2019 at 10:58, Christoph Hellwig  wrote:
> >
> > Instead of setting up a kernel pointer to track the current PIO address,
> > track the offset in the current page, and do an atomic kmap for the page
> > while doing the actual PIO operations.
> >
> > Signed-off-by: Christoph Hellwig 
> 
> Nitpick: This one has some trailing whitespace errors, which git
> complains about when I apply it.
> 
> No need to re-spin due to this, I can fix it up.

I'll need to respin to limit the segment size to a page anyway, so I
will take this into account as well for v2.


Re: [PATCH 05/11] mmc: s3cmci: handle highmem pages

2019-01-30 Thread Ulf Hansson
On Wed, 30 Jan 2019 at 09:04, Christoph Hellwig  wrote:
>
> On Wed, Jan 30, 2019 at 08:57:47AM +0100, Ulf Hansson wrote:
> > On Mon, 14 Jan 2019 at 10:58, Christoph Hellwig  wrote:
> > >
> > > Instead of setting up a kernel pointer to track the current PIO address,
> > > track the offset in the current page, and do an atomic kmap for the page
> > > while doing the actual PIO operations.
> > >
> > > Signed-off-by: Christoph Hellwig 
> >
> > Nitpick: This one has some trailing whitespace errors, which git
> > complains about when I apply it.
> >
> > No need to re-spin due to this, I can fix it up.
>
> I'll need to respin to limit the segment size to a page anyway, so I
> will take this into account as well for v2.

Okay!

FYI, I have no further comments on the series; it looks okay to me!

Kind regards
Uffe


Re: [RFC v3 02/21] iommu: Introduce cache_invalidate API

2019-01-30 Thread Auger Eric
Hi Alex,

On 1/30/19 12:16 AM, Alex Williamson wrote:
> On Fri, 25 Jan 2019 17:49:20 +0100
> Auger Eric  wrote:
> 
>> Hi Alex,
>>
>> On 1/11/19 10:30 PM, Alex Williamson wrote:
>>> On Tue,  8 Jan 2019 11:26:14 +0100
>>> Eric Auger  wrote:
>>>   
 From: "Liu, Yi L" 

 In any virtualization use case, when the first translation stage
 is "owned" by the guest OS, the host IOMMU driver has no knowledge
 of caching structure updates unless the guest invalidation activities
 are trapped by the virtualizer and passed down to the host.

 Since the invalidation data are obtained from user space and will be
 written into physical IOMMU, we must allow security check at various
 layers. Therefore, generic invalidation data format are proposed here,
 model specific IOMMU drivers need to convert them into their own format.

 Signed-off-by: Liu, Yi L 
 Signed-off-by: Jean-Philippe Brucker 
 Signed-off-by: Jacob Pan 
 Signed-off-by: Ashok Raj 
 Signed-off-by: Eric Auger 

 ---
 v1 -> v2:
 - add arch_id field
 - renamed tlb_invalidate into cache_invalidate as this API allows
   to invalidate context caches on top of IOTLBs

 v1:
 renamed sva_invalidate into tlb_invalidate and add iommu_ prefix in
 header. Commit message reworded.
 ---
  drivers/iommu/iommu.c  | 14 ++
  include/linux/iommu.h  | 14 ++
  include/uapi/linux/iommu.h | 95 ++
  3 files changed, 123 insertions(+)

 diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
 index 0f2b7f1fc7c8..b2e248770508 100644
 --- a/drivers/iommu/iommu.c
 +++ b/drivers/iommu/iommu.c
 @@ -1403,6 +1403,20 @@ int iommu_set_pasid_table(struct iommu_domain 
 *domain,
  }
  EXPORT_SYMBOL_GPL(iommu_set_pasid_table);
  
 +int iommu_cache_invalidate(struct iommu_domain *domain, struct device 
 *dev,
 + struct iommu_cache_invalidate_info *inv_info)
 +{
 +  int ret = 0;
 +
 +  if (unlikely(!domain->ops->cache_invalidate))
 +  return -ENODEV;
 +
 +  ret = domain->ops->cache_invalidate(domain, dev, inv_info);
 +
 +  return ret;
 +}
 +EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
 +
  static void __iommu_detach_device(struct iommu_domain *domain,
  struct device *dev)
  {
 diff --git a/include/linux/iommu.h b/include/linux/iommu.h
 index 1da2a2357ea4..96d59886f230 100644
 --- a/include/linux/iommu.h
 +++ b/include/linux/iommu.h
 @@ -186,6 +186,7 @@ struct iommu_resv_region {
   * @of_xlate: add OF master IDs to iommu grouping
   * @pgsize_bitmap: bitmap of all possible supported page sizes
   * @set_pasid_table: set pasid table
 + * @cache_invalidate: invalidate translation caches
   */
  struct iommu_ops {
bool (*capable)(enum iommu_cap);
 @@ -231,6 +232,9 @@ struct iommu_ops {
int (*set_pasid_table)(struct iommu_domain *domain,
   struct iommu_pasid_table_config *cfg);
  
 +  int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
 +  struct iommu_cache_invalidate_info *inv_info);
 +
unsigned long pgsize_bitmap;
  };
  
 @@ -294,6 +298,9 @@ extern void iommu_detach_device(struct iommu_domain 
 *domain,
struct device *dev);
  extern int iommu_set_pasid_table(struct iommu_domain *domain,
 struct iommu_pasid_table_config *cfg);
 +extern int iommu_cache_invalidate(struct iommu_domain *domain,
 +  struct device *dev,
 +  struct iommu_cache_invalidate_info *inv_info);
  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 @@ -709,6 +716,13 @@ int iommu_set_pasid_table(struct iommu_domain *domain,
  {
return -ENODEV;
  }
 +static inline int
 +iommu_cache_invalidate(struct iommu_domain *domain,
 + struct device *dev,
 + struct iommu_cache_invalidate_info *inv_info)
 +{
 +  return -ENODEV;
 +}
  
  #endif /* CONFIG_IOMMU_API */
  
 diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
 index 7a7cf7a3de7c..4605f5cfac84 100644
 --- a/include/uapi/linux/iommu.h
 +++ b/include/uapi/linux/iommu.h
 @@ -47,4 +47,99 @@ struct iommu_pasid_table_config {
};
  };
  
 +/**
 + * enum iommu_inv_granularity - Generic invalidation granularity
 + * @IOMMU_INV_GRANU_DOMAIN_ALL_PASID: TLB entries or PASID caches of 
 all
 + *

Re: [PATCH 2/2] iommu/amd: Remove clear_flush_young notifier

2019-01-30 Thread Jean-Philippe Brucker
Hi Peter,

On 30/01/2019 05:57, Peter Xu wrote:
> AMD IOMMU driver is using the clear_flush_young() to do cache flushing
> but that's actually already covered by invalidate_range().  Remove the
> extra notifier and the chunks.

Aren't uses of clear_flush_young() and invalidate_range() orthogonal? If
I understood correctly, when doing reclaim the kernel clears the young
bit from the PTE. This requires flushing secondary TLBs (ATCs), so that
new accesses from devices will go through the IOMMU, set the young bit
again and the kernel can accurately track page use. I didn't see
invalidate_range() being called by rmap or vmscan in this case, but
might just be missing it.

Two MMU notifiers are used for the young bit, clear_flush_young() and
clear_young(). I believe the former should invalidate ATCs so that DMA
accesses participate in PTE aging. Otherwise the kernel can't know that
the device is still using entries marked 'old'. The latter,
clear_young() is exempted from invalidating the secondary TLBs by
mmu_notifier.h and the IOMMU driver doesn't need to implement it.
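
To make the distinction concrete, here is an illustrative sketch
(hypothetical helper names, not the actual AMD driver) of how the notifiers
relate for an IOMMU that caches CPU page tables:

  static void example_flush_dev_tlb(struct mm_struct *mm,
                                    unsigned long start, unsigned long end)
  {
          /* would issue ATC/IOTLB invalidations for [start, end) here */
  }

  static int example_clear_flush_young(struct mmu_notifier *mn,
                                       struct mm_struct *mm,
                                       unsigned long start, unsigned long end)
  {
          /*
           * Young bit cleared and CPU TLBs flushed: flush the ATC as well so
           * that device accesses go back through the IOMMU and re-set the bit.
           */
          example_flush_dev_tlb(mm, start, end);
          return 0;
  }

  static void example_invalidate_range(struct mmu_notifier *mn,
                                       struct mm_struct *mm,
                                       unsigned long start, unsigned long end)
  {
          /* mappings changed or went away: stale ATC entries must go */
          example_flush_dev_tlb(mm, start, end);
  }

  static const struct mmu_notifier_ops example_mn_ops = {
          .clear_flush_young = example_clear_flush_young,
          /* .clear_young left unset: mmu_notifier.h exempts it from
             secondary TLB invalidation */
          .invalidate_range  = example_invalidate_range,
  };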

Thanks,
Jean

> 
> Signed-off-by: Peter Xu 
> ---
>  drivers/iommu/amd_iommu_v2.c | 24 
>  1 file changed, 24 deletions(-)
> 
> diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
> index 23dae9348ace..5d7ef750e4a0 100644
> --- a/drivers/iommu/amd_iommu_v2.c
> +++ b/drivers/iommu/amd_iommu_v2.c
> @@ -370,29 +370,6 @@ static struct pasid_state *mn_to_state(struct 
> mmu_notifier *mn)
>   return container_of(mn, struct pasid_state, mn);
>  }
>  
> -static void __mn_flush_page(struct mmu_notifier *mn,
> - unsigned long address)
> -{
> - struct pasid_state *pasid_state;
> - struct device_state *dev_state;
> -
> - pasid_state = mn_to_state(mn);
> - dev_state   = pasid_state->device_state;
> -
> - amd_iommu_flush_page(dev_state->domain, pasid_state->pasid, address);
> -}
> -
> -static int mn_clear_flush_young(struct mmu_notifier *mn,
> - struct mm_struct *mm,
> - unsigned long start,
> - unsigned long end)
> -{
> - for (; start < end; start += PAGE_SIZE)
> - __mn_flush_page(mn, start);
> -
> - return 0;
> -}
> -
>  static void mn_invalidate_range(struct mmu_notifier *mn,
>   struct mm_struct *mm,
>   unsigned long start, unsigned long end)
> @@ -430,7 +407,6 @@ static void mn_release(struct mmu_notifier *mn, struct 
> mm_struct *mm)
>  
>  static const struct mmu_notifier_ops iommu_mn = {
>   .release= mn_release,
> - .clear_flush_young  = mn_clear_flush_young,
>   .invalidate_range   = mn_invalidate_range,
>  };
>  
> 



Re: [PATCH 4/5] virtio: Introduce virtio_max_dma_size()

2019-01-30 Thread Lendacky, Thomas
On 1/29/19 2:43 AM, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> This function returns the maximum segment size for a single
> dma transaction of a virtio device. The possible limit comes
> from the SWIOTLB implementation in the Linux kernel, that
> has an upper limit of (currently) 256kb of contiguous
> memory it can map. Other DMA-API implementations might also
> have limits.
> 
> Use the new dma_max_mapping_size() function to determine the
> maximum mapping size when DMA-API is in use for virtio.
> 
> Reviewed-by: Konrad Rzeszutek Wilk 
> Signed-off-by: Joerg Roedel 
> ---
>  drivers/virtio/virtio_ring.c | 10 ++
>  include/linux/virtio.h   |  2 ++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index cd7e755484e3..9ca3fe6af9fa 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -266,6 +266,16 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
>   return false;
>  }
>  
> +size_t virtio_max_dma_size(struct virtio_device *vdev)
> +{
> + size_t max_segment_size = SIZE_MAX;
> +
> + if (vring_use_dma_api(vdev))
> + max_segment_size = dma_max_mapping_size(&vdev->dev);
> +
> + return max_segment_size;
> +}

When I build with these patches and with the virtio devices as modules I
get a build failure. Looks like this needs an EXPORT_SYMBOL_GPL().

Thanks,
Tom

> +
>  static void *vring_alloc_queue(struct virtio_device *vdev, size_t size,
> dma_addr_t *dma_handle, gfp_t flag)
>  {
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index fa1b5da2804e..673fe3ef3607 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -157,6 +157,8 @@ int virtio_device_freeze(struct virtio_device *dev);
>  int virtio_device_restore(struct virtio_device *dev);
>  #endif
>  
> +size_t virtio_max_dma_size(struct virtio_device *vdev);
> +
>  #define virtio_device_for_each_vq(vdev, vq) \
>   list_for_each_entry(vq, &vdev->vqs, list)
>  
> 
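
For reference, the fix is presumably just the usual export after the
definition (Joerg confirms later in the thread that the missing EXPORTs will
be added for the next version):

  size_t virtio_max_dma_size(struct virtio_device *vdev)
  {
          size_t max_segment_size = SIZE_MAX;

          if (vring_use_dma_api(vdev))
                  max_segment_size = dma_max_mapping_size(&vdev->dev);

          return max_segment_size;
  }
  EXPORT_SYMBOL_GPL(virtio_max_dma_size);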


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jerome Glisse
On Wed, Jan 30, 2019 at 09:00:06AM +0100, Christoph Hellwig wrote:
> On Wed, Jan 30, 2019 at 04:18:48AM +, Jason Gunthorpe wrote:
> > Every attempt to give BAR memory to struct page has run into major
> > trouble, IMHO, so I like that this approach avoids that.
> 
> Way fewer problems than not having struct page for doing anything
> non-trivial.  If you map the BAR to userspace with remap_pfn_range
> and friends the mapping is indeed very simple.  But any operation
> that expects a page structure, which is at least everything using
> get_user_pages, won't work.
> 
> So you can't do direct I/O to your remapped BAR, you can't create MRs
> on it, etc, etc.

We do not want direct I/O; in fact, at least for GPUs we want to seldom
allow access to an object's vma, so the fewer things that can access it the
happier we are :) All the GPU userspace driver APIs (OpenGL, OpenCL,
Vulkan, ...) that expose any such mapping to the application are very clear
on the limitation, which is often worded: the only valid thing is direct
CPU access (no syscall can be used with those pointers).

So application developers already have low expectations about what is
valid and allowed.

Cheers,
Jérôme


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Logan Gunthorpe



On 2019-01-30 10:26 a.m., Christoph Hellwig wrote:
> On Wed, Jan 30, 2019 at 10:55:43AM -0500, Jerome Glisse wrote:
>> Even outside GPU drivers, device drivers like RDMA just want to share their
>> doorbells with other devices, and they do not want to see those doorbell pages
>> used in direct I/O or anything similar AFAICT.
> 
> At least Mellanox HCAs support an inline data feature where you
> can copy data directly into the BAR.  For something like a userspace
> NVMe target it might be very useful to do direct I/O straight into
> the BAR for that.

Yup, these are things we definitely want to be able to do, and have done
with hacky garbage code: Direct I/O from NVMe to P2P BAR, then we could
Direct I/O to another drive or map it as an MR and send it over an RNIC.

We'd definitely like to move in that direction. And a world where such
userspace mappings are gimped by the fact that they are only some
special feature of userspace VMAs that can only be used in specialized
userspace interfaces is not useful to us.

Logan


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jerome Glisse
On Wed, Jan 30, 2019 at 11:13:11AM -0700, Logan Gunthorpe wrote:
> 
> 
> On 2019-01-30 10:44 a.m., Jason Gunthorpe wrote:
> > I don't see why a special case with a VMA is really that different.
> 
> Well one *really* big difference is the VMA changes necessarily expose
> specialized new functionality to userspace which has to be supported
> forever and may be difficult to change. The p2pdma code is largely
> in-kernel and we can rework and change the interfaces all we want as we
> improve our struct page infrastructure.

I do not see how VMA changes are any different from using struct page
with respect to userspace exposure. Those vma callbacks do not need to be
set by everyone; in fact the expectation is that only a handful of drivers
will set them.

How can we do p2p between RDMA and GPU, for instance, without exposure
to userspace? At some point you need to tell userspace: hey, this kernel
does allow you to do that :)

RDMA works on vmas, and a GPU driver can easily set up a vma for an
object, hence why the vma sounds like a logical place. In fact a vma (mmap
of a device file) is a very common device driver pattern.

In the model I am proposing, the exporting device is in control of
policy, i.e. whether or not to allow the peer-to-peer mapping. So each
device driver can define a proper device-specific API to enable and
expose that feature to userspace.

If they do, the only thing we have to preserve is the end result for
the user. Userspace does not care one bit whether we achieve this in
the kernel with a set of new callbacks within the vm_operations struct
or in some other way. Only the end result matters.
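
To make that concrete, a rough exporter-side sketch of the shape being
discussed; the callback names come from the RFC, but the signature, the
object layout and the policy flag below are made up for illustration and
are not taken from the actual patches:

  struct example_gpu_object {
          bool p2p_allowed;       /* set through a driver-specific ioctl */
          /* ... BAR placement, pinning state, etc. ... */
  };

  /* Hypothetical callback; the real RFC defines its own prototype. */
  static int example_gpu_p2p_map(struct vm_area_struct *vma,
                                 struct device *importer,
                                 struct sg_table *sgt)
  {
          struct example_gpu_object *obj = vma->vm_private_data;

          /* exporter is in control of policy: only objects userspace
             explicitly tagged as shareable may be mapped by a peer */
          if (!obj->p2p_allowed)
                  return -EPERM;

          /* here the driver would pin the object into BAR space and fill
             @sgt with the addresses the importer should DMA to */
          return 0;
  }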

So the question is: do we want to allow RDMA to access GPU driver
objects? I believe we do; there are people using non-upstream solutions
with open source drivers to do just that, which is testimony that there
are users for this. More use cases have been proposed too.

> 
> I'd also argue that p2pdma isn't nearly as specialized as this VMA thing
> and can be used pretty generically to do other things. Though, the other
> ideas we've talked about doing are pretty far off and may have other
> challenges.

I believe p2p is highly specialized on non-cache-coherent interconnect
platforms like x86 with PCIe. So I do not think that using struct page
for this is a good idea; it is not warranted/needed, and it can only be
problematic if some random kernel code gets hold of those struct pages
without understanding they are not regular memory.

I believe the vma callbacks are the simplest solution with the minimum
burden for the device driver and for the kernel. If any better solution
emerges, there is nothing that would block us from removing this and
replacing it with the other solution.

Cheers,
Jérôme


Re: [PATCH 4/5] virtio: Introduce virtio_max_dma_size()

2019-01-30 Thread Joerg Roedel
Hi Tom,

On Wed, Jan 30, 2019 at 03:10:29PM +, Lendacky, Thomas wrote:
> On 1/29/19 2:43 AM, Joerg Roedel wrote:

> > +size_t virtio_max_dma_size(struct virtio_device *vdev)
> > +{
> > +   size_t max_segment_size = SIZE_MAX;
> > +
> > +   if (vring_use_dma_api(vdev))
> > +   max_segment_size = dma_max_mapping_size(&vdev->dev);
> > +
> > +   return max_segment_size;
> > +}
> 
> When I build with these patches and with the virtio devices as modules I
> get a build failure. Looks like this needs an EXPORT_SYMBOL_GPL().

Thanks for pointing that out, I added the missing EXPORTs and will send
a new version shortly.

Regards,

Joerg


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jason Gunthorpe
On Wed, Jan 30, 2019 at 06:26:53PM +0100, Christoph Hellwig wrote:
> On Wed, Jan 30, 2019 at 10:55:43AM -0500, Jerome Glisse wrote:
> > Even outside GPU drivers, device drivers like RDMA just want to share their
> > doorbells with other devices, and they do not want to see those doorbell pages
> > used in direct I/O or anything similar AFAICT.
> 
> At least Mellanox HCAs support an inline data feature where you
> can copy data directly into the BAR.  For something like a userspace
> NVMe target it might be very useful to do direct I/O straight into
> the BAR for that.

It doesn't really work like that. 

The PCI-E TLP sequence to trigger this feature is very precise, and
the data requires the right headers/etc. Mixing that with O_DIRECT
seems very unlikely.

Jason


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jerome Glisse
On Wed, Jan 30, 2019 at 06:26:53PM +0100, Christoph Hellwig wrote:
> On Wed, Jan 30, 2019 at 10:55:43AM -0500, Jerome Glisse wrote:
> > Even outside GPU drivers, device drivers like RDMA just want to share their
> > doorbells with other devices, and they do not want to see those doorbell pages
> > used in direct I/O or anything similar AFAICT.
> 
> At least Mellanox HCAs support an inline data feature where you
> can copy data directly into the BAR.  For something like a userspace
> NVMe target it might be very useful to do direct I/O straight into
> the BAR for that.

And what I am proposing is not exclusive of that. If the exporting device
wants to have struct page for its BAR then it can do so. What I do not
want is to impose that burden on everyone, as many devices do not want
or do not care for that. Moreover, having struct page and allowing that
struct page to trickle down into obscure corners of the kernel means that
exporters that want it will also have the burden of checking that what
they are doing does not end up in something terribly bad.

While I would like one API that fits all, I do not think we can sanely
do that for P2P. There are too many differences between how different
devices expose and manage their BARs to make any such attempt reasonably
sane.

Maybe things will evolve organically, but for now I do not see a way
outside the API I am proposing (again, this is not exclusive of the struct
page API that is upstream; both can co-exist and a device can use both
or just one).

Cheers,
Jérôme


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Logan Gunthorpe



On 2019-01-30 10:44 a.m., Jason Gunthorpe wrote:
> I don't see why a special case with a VMA is really that different.

Well one *really* big difference is the VMA changes necessarily expose
specialized new functionality to userspace which has to be supported
forever and may be difficult to change. The p2pdma code is largely
in-kernel and we can rework and change the interfaces all we want as we
improve our struct page infrastructure.

I'd also argue that p2pdma isn't nearly as specialized as this VMA thing
and can be used pretty generically to do other things. Though, the other
ideas we've talked about doing are pretty far off and may have other
challenges.

Logan


Re: [PATCH 0/5 v5] Fix virtio-blk issue with SWIOTLB

2019-01-30 Thread Lendacky, Thomas
On 1/30/19 10:40 AM, Joerg Roedel wrote:
> Hi,
> 
> here is the next version of this patch-set. Previous
> versions can be found here:
> 
>   V1: https://lore.kernel.org/lkml/20190110134433.15672-1-j...@8bytes.org/
> 
>   V2: https://lore.kernel.org/lkml/20190115132257.6426-1-j...@8bytes.org/
> 
>   V3: https://lore.kernel.org/lkml/20190123163049.24863-1-j...@8bytes.org/
> 
>   V4: https://lore.kernel.org/lkml/20190129084342.26030-1-j...@8bytes.org/
> 
> The problem solved here is a limitation of the SWIOTLB implementation,
> which does not support allocations larger than 256kb.  When the
> virtio-blk driver tries to read/write a block larger than that, the
> allocation of the dma-handle fails and an IO error is reported.
> 
> Changes to v4 are:
> 
>   - Added Reviewed-by tags from Christoph
> 
>   - Added missing EXPORT_SYMBOL(_GPL) lines
> 
> Please review.

Looks good. Booted and tested using an SEV guest without any issues.

Tested-by: Tom Lendacky 

Thanks,
Tom

> 
> Thanks,
> 
>   Joerg
> Joerg Roedel (5):
>   swiotlb: Introduce swiotlb_max_mapping_size()
>   swiotlb: Add is_swiotlb_active() function
>   dma: Introduce dma_max_mapping_size()
>   virtio: Introduce virtio_max_dma_size()
>   virtio-blk: Consider virtio_max_dma_size() for maximum segment size
> 
>  Documentation/DMA-API.txt|  8 
>  drivers/block/virtio_blk.c   | 10 ++
>  drivers/virtio/virtio_ring.c | 11 +++
>  include/linux/dma-mapping.h  | 16 
>  include/linux/swiotlb.h  | 11 +++
>  include/linux/virtio.h   |  2 ++
>  kernel/dma/direct.c  | 12 
>  kernel/dma/swiotlb.c | 14 ++
>  8 files changed, 80 insertions(+), 4 deletions(-)
> 


Re: [PATCH] iommu/amd: print reason for iommu_map_page failure in map_sg

2019-01-30 Thread Joerg Roedel
On Mon, Jan 28, 2019 at 05:59:37PM -0700, Jerry Snitselaar wrote:
> Since there are multiple possible failures in iommu_map_page
> it would be useful to know which case is being hit when the
> error message is printed in map_sg. While here, fix up checkpatch
> complaint about using function name in a string instead of
> __func__.
> 
> Cc: Joerg Roedel 
> Signed-off-by: Jerry Snitselaar 

Applied, thanks.



Re: [PATCH 2/2] iommu/amd: Remove clear_flush_young notifier

2019-01-30 Thread Joerg Roedel
On Wed, Jan 30, 2019 at 01:57:58PM +0800, Peter Xu wrote:
> AMD IOMMU driver is using the clear_flush_young() to do cache flushing
> but that's actually already covered by invalidate_range().  Remove the
> extra notifier and the chunks.
> 
> Signed-off-by: Peter Xu 

Applied to x86/amd, thanks.



Re: [PATCH 1/2] iommu/vt-d: Remove change_pte notifier

2019-01-30 Thread Joerg Roedel
On Wed, Jan 30, 2019 at 01:57:57PM +0800, Peter Xu wrote:
> The change_pte() interface is tailored for PFN updates, while the
> other notifier invalidate_range() should be enough for Intel IOMMU
> cache flushing.  Actually we've done similar thing for AMD IOMMU
> already in 8301da53fbc1 ("iommu/amd: Remove change_pte mmu_notifier
> call-back", 2014-07-30) but the Intel IOMMU driver still have it.
> 
> Signed-off-by: Peter Xu 

Applied to x86/vt-d, thanks.



Re: [git pull] IOMMU Fixes for Linux v5.0-rc4

2019-01-30 Thread pr-tracker-bot
The pull request you sent on Wed, 30 Jan 2019 16:06:23 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
> tags/iommu-fixes-v5.0-rc4

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/1c0490ce902206f4685f812fa81304fd1adf4e35

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jason Gunthorpe
On Wed, Jan 30, 2019 at 09:02:08AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 29, 2019 at 08:58:35PM +, Jason Gunthorpe wrote:
> > On Tue, Jan 29, 2019 at 01:39:49PM -0700, Logan Gunthorpe wrote:
> > 
> > > implement the mapping. And I don't think we should have 'special' vma's
> > > for this (though we may need something to ensure we don't get mapping
> > > requests mixed with different types of pages...).
> > 
> > I think Jerome explained the point here is to have a 'special vma'
> > rather than a 'special struct page' as, really, we don't need a
> > struct page at all to make this work.
> > 
> > If I recall your earlier attempts at adding struct page for BAR
> > memory, it ran aground on issues related to O_DIRECT/sgls, etc, etc.
> 
> Struct page is what makes O_DIRECT work, using sgls or biovecs, etc on
> it work.  Without struct page none of the above can work at all.  That
> is why we use struct page for backing BARs in the existing P2P code.
> Not that I'm a particular fan of creating struct page for this device
> memory, but without major invasive surgery to large parts of the kernel
> it is the only way to make it work.

I don't think anyone is interested in O_DIRECT/etc for RDMA doorbell
pages.

.. and again, I recall Logan already attempted to mix non-CPU memory
into sgls and it was a disaster. You pointed out that one cannot just
put iomem addresses into an SGL without auditing basically the entire
block stack to prove that nothing uses iomem without an iomem
accessor.

I recall that proposal veered into a direction where the block layer
would just fail very early if there was iomem in the sgl, so generally
no O_DIRECT support anyhow.

We already accepted the P2P stuff from Logan as essentially a giant
special case - it only works with RDMA and only because RDMA MR was
hacked up with a special p2p callback.

I don't see why a special case with a VMA is really that different.

If someone figures out the struct page path down the road it can
obviously be harmonized with this VMA approach pretty easily.

Jason


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Logan Gunthorpe



On 2019-01-29 9:18 p.m., Jason Gunthorpe wrote:
> Every attempt to give BAR memory to struct page has run into major
> trouble, IMHO, so I like that this approach avoids that.
> 
> And if you don't have struct page then the only kernel object left to
> hang meta data off is the VMA itself.
> 
> It seems very similar to the existing P2P work between in-kernel
> consumers, just that VMA is now mediating a general user space driven
> discovery process instead of being hard wired into a driver.

But the kernel now has P2P bars backed by struct pages and it works
well. And that's what we are doing in-kernel. We even have a hacky
out-of-tree module which exposes these pages and it also works (but
would need Jerome's solution for denying those pages in GUP, etc). So
why do something completely different in userspace so they can't share
any of the DMA map infrastructure?

Logan


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Christoph Hellwig
On Wed, Jan 30, 2019 at 10:55:43AM -0500, Jerome Glisse wrote:
> Even outside GPU drivers, device drivers like RDMA just want to share their
> doorbells with other devices, and they do not want to see those doorbell pages
> used in direct I/O or anything similar AFAICT.

At least Mellanox HCAs support an inline data feature where you
can copy data directly into the BAR.  For something like a userspace
NVMe target it might be very useful to do direct I/O straight into
the BAR for that.


Re: [PATCH 0/5 v5] Fix virtio-blk issue with SWIOTLB

2019-01-30 Thread Konrad Rzeszutek Wilk
On Wed, Jan 30, 2019 at 05:40:02PM +0100, Joerg Roedel wrote:
> Hi,
> 
> here is the next version of this patch-set. Previous
> versions can be found here:
> 
>   V1: https://lore.kernel.org/lkml/20190110134433.15672-1-j...@8bytes.org/
> 
>   V2: https://lore.kernel.org/lkml/20190115132257.6426-1-j...@8bytes.org/
> 
>   V3: https://lore.kernel.org/lkml/20190123163049.24863-1-j...@8bytes.org/
> 
>   V4: https://lore.kernel.org/lkml/20190129084342.26030-1-j...@8bytes.org/
> 
> The problem solved here is a limitation of the SWIOTLB implementation,
> which does not support allocations larger than 256kb.  When the
> virtio-blk driver tries to read/write a block larger than that, the
> allocation of the dma-handle fails and an IO error is reported.
> 
> Changes to v4 are:
> 
>   - Added Reviewed-by tags from Christoph
> 
>   - Added missing EXPORT_SYMBOL(_GPL) lines
> 
> Please review.

I can put it in my tree and send it to Linus .. unless folks want
to do it through a different tree?


> 
> Thanks,
> 
>   Joerg
> Joerg Roedel (5):
>   swiotlb: Introduce swiotlb_max_mapping_size()
>   swiotlb: Add is_swiotlb_active() function
>   dma: Introduce dma_max_mapping_size()
>   virtio: Introduce virtio_max_dma_size()
>   virtio-blk: Consider virtio_max_dma_size() for maximum segment size
> 
>  Documentation/DMA-API.txt|  8 
>  drivers/block/virtio_blk.c   | 10 ++
>  drivers/virtio/virtio_ring.c | 11 +++
>  include/linux/dma-mapping.h  | 16 
>  include/linux/swiotlb.h  | 11 +++
>  include/linux/virtio.h   |  2 ++
>  kernel/dma/direct.c  | 12 
>  kernel/dma/swiotlb.c | 14 ++
>  8 files changed, 80 insertions(+), 4 deletions(-)
> 
> -- 
> 2.17.1
> 


Re: [PATCH] iommu/io-pgtable-arm-v7s: only kmemleak_ignore L2 tables

2019-01-30 Thread Will Deacon
On Mon, Jan 28, 2019 at 05:43:01PM +0800, Nicolas Boichat wrote:
> L1 tables are allocated with __get_dma_pages, and therefore already
> ignored by kmemleak.
> 
> Without this, the kernel would print this error message on boot,
> when the first L1 table is allocated:
> 
> [2.810533] kmemleak: Trying to color unknown object at 0xffd652388000 
> as Black
> [2.818190] CPU: 5 PID: 39 Comm: kworker/5:0 Tainted: G S
> 4.19.16 #8
> [2.831227] Workqueue: events deferred_probe_work_func
> [2.836353] Call trace:
> ...
> [2.852532]  paint_ptr+0xa0/0xa8
> [2.855750]  kmemleak_ignore+0x38/0x6c
> [2.859490]  __arm_v7s_alloc_table+0x168/0x1f4
> [2.863922]  arm_v7s_alloc_pgtable+0x114/0x17c
> [2.868354]  alloc_io_pgtable_ops+0x3c/0x78
> ...
> 
> Fixes: e5fc9753b1a8314 ("iommu/io-pgtable: Add ARMv7 short descriptor 
> support")
> Signed-off-by: Nicolas Boichat 
> ---
> 
> I only tested this on top of my other series
> (https://patchwork.kernel.org/patch/10720495/), but I think the same fix
> applies. I'm still a bit confused as to why this only shows up now, as IIUC,
> the kmemleak_ignore call was always wrong with L1 tables.

Yes, I managed to reproduce this on top of -rc4 (see below). I suspect you
/are/ the intersection of people using v7s w/ kmemleak, so this has just
lingered and never been hit until now.

For the patch (assuming this is going via Joerg):

Acked-by: Will Deacon 

Will

--->8

[0.124473] kmemleak: Trying to color unknown object at 0x842d8000 
as Black
[0.125312] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
5.0.0-rc4-00012-g40b114779944 #1
[0.126181] Hardware name: linux,dummy-virt (DT)
[0.126680] Call trace:
[0.126950]  dump_backtrace+0x0/0x140
[0.127346]  show_stack+0x14/0x20
[0.127706]  dump_stack+0x90/0xb4
[0.128066]  paint_ptr+0x94/0xa8
[0.128417]  kmemleak_ignore+0x54/0x60
[0.128991]  __arm_v7s_alloc_table+0x6c/0x240
[0.129661]  arm_v7s_alloc_pgtable+0x10c/0x188
[0.130359]  alloc_io_pgtable_ops+0x44/0xb0
[0.131006]  arm_v7s_do_selftests+0x84/0x4bc
[0.131663]  do_one_initcall+0x74/0x178
[0.132253]  kernel_init_freeable+0x188/0x220
[0.132923]  kernel_init+0x10/0x100
[0.133460]  ret_from_fork+0x10/0x18
[0.142102] arm-v7s io-pgtable: self test ok


[git pull] IOMMU Fixes for Linux v5.0-rc4

2019-01-30 Thread Joerg Roedel
Hi Linus,

The following changes since commit e8e683ae9a736407a20135df7809090a446db707:

  iommu/of: Fix probe-deferral (2019-01-11 12:28:24 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
tags/iommu-fixes-v5.0-rc4

for you to fetch changes up to 9825bd94e3a2baae1f4874767ae3a7d4c049720e:

  iommu/amd: Fix IOMMU page flush when detach device from a domain (2019-01-24 
15:24:49 +0100)


IOMMU Fixes for Linux v5.0-rc4

A few more fixes this time:

- Two patches to fix the error path of the map_sg implementation
  of the AMD IOMMU driver.

- Also a missing IOTLB flush is fixed in the AMD IOMMU driver.

- Memory leak fix for the Intel IOMMU driver.

- Fix a regression in the Mediatek IOMMU driver which caused
  device initialization to fail (seen as broken HDMI output).


Gerald Schaefer (1):
  iommu/vt-d: Fix memory leak in intel_iommu_put_resv_regions()

Jerry Snitselaar (2):
  iommu/amd: Call free_iova_fast with pfn in map_sg
  iommu/amd: Unmap all mapped pages in error path of map_sg

Joerg Roedel (1):
  iommu/mediatek: Use correct fwspec in mtk_iommu_add_device()

Suravee Suthikulpanit (1):
  iommu/amd: Fix IOMMU page flush when detach device from a domain

 drivers/iommu/amd_iommu.c| 19 +--
 drivers/iommu/intel-iommu.c  |  2 +-
 drivers/iommu/mtk_iommu_v1.c |  4 
 3 files changed, 18 insertions(+), 7 deletions(-)

Please pull.

Thanks,

Joerg



Re: [PATCH 1/1] iommu/vt-d: Leave scalable mode default off

2019-01-30 Thread Joerg Roedel
On Thu, Jan 24, 2019 at 10:31:32AM +0800, Lu Baolu wrote:
> Commit 765b6a98c1de3 ("iommu/vt-d: Enumerate the scalable
> mode capability") enables VT-d scalable mode if hardware
> advertises the capability. As we will bring up different
> features and use cases to upstream in different patch
> series, it will leave some intermediate kernel versions
> which support partial features. Hence, end user might run
> into problems when they use such kernels on bare metals
> or virtualization environments.
> 
> This leaves scalable mode default off and end users could
> turn it on with "intel-iommu=sm_on" only when they have
> clear ideas about which scalable features are supported
> in the kernel.
> 
> Cc: Liu Yi L 
> Cc: Jacob Pan 
> Suggested-by: Ashok Raj 
> Suggested-by: Kevin Tian 
> Signed-off-by: Lu Baolu 

Applied to iommu/fixes, thanks.


Re: [PATCH v2] iommu/vt-d: Implement dma_[un]map_resource()

2019-01-30 Thread Joerg Roedel
On Tue, Jan 22, 2019 at 02:30:45PM -0700, Logan Gunthorpe wrote:
> Currently the Intel IOMMU uses the default dma_[un]map_resource()
> implementation, which does nothing and simply returns the physical address
> unmodified.
> 
> However, this doesn't create the IOVA entries necessary for addresses
> mapped this way to work when the IOMMU is enabled. Thus, when the
> IOMMU is enabled, drivers relying on dma_map_resource() will trigger
> DMAR errors. We see this when running ntb_transport with the IOMMU
> enabled, DMA, and switchtec hardware.
> 
> The implementation for intel_map_resource() is nearly identical to
> intel_map_page(), we just have to re-create __intel_map_single().
> dma_unmap_resource() uses intel_unmap_page() directly as the
> functions are identical.
> 
> Signed-off-by: Logan Gunthorpe 
> Cc: David Woodhouse 
> Cc: Joerg Roedel 

Applied, thanks.
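
For context, dma_map_resource() is the streaming-DMA entry point for raw
physical/MMIO addresses rather than pages.  A minimal caller sketch of the
path this patch makes work under the IOMMU (hypothetical driver, parameters
assumed):

  static int example_map_peer_bar(struct device *dev, phys_addr_t bar_phys,
                                  size_t len, dma_addr_t *out)
  {
          dma_addr_t dma;

          dma = dma_map_resource(dev, bar_phys, len, DMA_BIDIRECTIONAL, 0);
          if (dma_mapping_error(dev, dma))
                  return -ENOMEM;

          *out = dma;     /* bus address to program into the DMA engine */
          return 0;
  }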


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jerome Glisse
On Wed, Jan 30, 2019 at 04:30:27AM +, Jason Gunthorpe wrote:
> On Tue, Jan 29, 2019 at 07:08:06PM -0500, Jerome Glisse wrote:
> > On Tue, Jan 29, 2019 at 11:02:25PM +, Jason Gunthorpe wrote:
> > > On Tue, Jan 29, 2019 at 03:44:00PM -0500, Jerome Glisse wrote:
> > > 
> > > > > But this API doesn't seem to offer any control - I thought that
> > > > > control was all coming from the mm/hmm notifiers triggering 
> > > > > p2p_unmaps?
> > > > 
> > > > The control is within the driver implementation of those callbacks. 
> > > 
> > > Seems like what you mean by control is 'the exporter gets to choose
> > > the physical address at the instant of map' - which seems reasonable
> > > for GPU.
> > > 
> > > 
> > > > will only allow p2p map to succeed for objects that have been tagged by 
> > > > the
> > > > userspace in some way ie the userspace application is in control of what
> > > > can be map to peer device.
> > > 
> > > I would have thought this means the VMA for the object is created
> > > without the map/unmap ops? Or are GPU objects and VMAs unrelated?
> > 
> > GPU object and VMA are unrelated in all open source GPU driver i am
> > somewhat familiar with (AMD, Intel, NVidia). You can create a GPU
> > object and never map it (and thus never have it associated with a
> > vma) and in fact this is very common. For graphic you usualy only
> > have hand full of the hundreds of GPU object your application have
> > mapped.
> 
> I mean the other way does every VMA with a p2p_map/unmap point to
> exactly one GPU object?
> 
> ie I'm surprised you say that p2p_map needs to have policy, I would
> have though the policy is applied when the VMA is created (ie objects
> that are not for p2p do not have p2p_map set), and even for GPU
> p2p_map should really only have to do with window allocation and pure
> 'can I even do p2p' type functionality.

All the userspace APIs to enable p2p happen after object creation, and in
some cases they are mutable, i.e. you can decide to no longer share the
object (a userspace application decision). The BAR address space is a
resource from the GPU driver's point of view and thus from userspace's
point of view. As such, decisions that affect how it is used and what
objects can use it can change over the application's lifetime.

This is why I would like to allow the kernel driver to apply any such
access policy, decided by the application on its objects (on top of
which the kernel GPU driver can apply its own policy for GPU resource
sharing by forcing some objects to main memory).


> 
> > Idea is that we can only ask exporter to be predictable and still allow
> > them to fail if things are really going bad.
> 
> I think hot unplug / PCI error recovery is one of the 'really going
> bad' cases..

A GPU can hang and all data becomes _undefined_; it can also be suspended
to save power (think of a laptop with a discrete GPU, for instance). GPU
threads can be killed ... So there are a few cases I can think of where
you either want to kill the p2p mapping and make sure the importer is
aware and might have a chance to report back through its own userspace
API, or at the very least fall back to dummy pages. In some of the above
cases, for instance suspend, you just want to move things around to allow
shutting down device memory.


> > I think i put it in the comment above the ops but in any cases i should
> > write something in documentation with example and thorough guideline.
> > Note that there won't be any mmu notifier to mmap of a device file
> > unless the device driver calls for it or there is a syscall like munmap
> > or mremap or mprotect well any syscall that work on vma.
> 
> This is something we might need to explore, does calling
> zap_vma_ptes() invoke enough notifiers that a MMU notifiers or HMM
> mirror consumer will release any p2p maps on that VMA?

Yes it does.

> 
> > If we ever want to support full pin then we might have to add a
> > flag so that GPU driver can refuse an importer that wants things
> > pin forever.
> 
> This would become interesting for VFIO and RDMA at least - I don't
> think VFIO has anything like SVA so it would want to import a p2p_map
> and indicate that it will not respond to MMU notifiers.
> 
> GPU can refuse, but maybe RDMA would allow it...

OK, I will add a flag field in the next post. GPUs could allow pinning,
but they would most likely use main memory for any such object, hence it
is no longer really p2p, but at least both devices look at the same data.

Cheers,
Jérôme


[PATCH 1/5] swiotlb: Introduce swiotlb_max_mapping_size()

2019-01-30 Thread Joerg Roedel
From: Joerg Roedel 

The function returns the maximum size that can be remapped
by the SWIOTLB implementation. This function will be later
exposed to users through the DMA-API.

Reviewed-by: Konrad Rzeszutek Wilk 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Joerg Roedel 
---
 include/linux/swiotlb.h | 5 +
 kernel/dma/swiotlb.c| 5 +
 2 files changed, 10 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7c007ed7505f..1c22d96e1742 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -62,6 +62,7 @@ extern void swiotlb_tbl_sync_single(struct device *hwdev,
 
 extern int
 swiotlb_dma_supported(struct device *hwdev, u64 mask);
+size_t swiotlb_max_mapping_size(struct device *dev);
 
 #ifdef CONFIG_SWIOTLB
 extern enum swiotlb_force swiotlb_force;
@@ -95,6 +96,10 @@ static inline unsigned int swiotlb_max_segment(void)
 {
return 0;
 }
+static inline size_t swiotlb_max_mapping_size(struct device *dev)
+{
+   return SIZE_MAX;
+}
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1fb6fd68b9c7..9cb21259cb0b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -662,3 +662,8 @@ swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
return __phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
 }
+
+size_t swiotlb_max_mapping_size(struct device *dev)
+{
+   return ((size_t)1 << IO_TLB_SHIFT) * IO_TLB_SEGSIZE;
+}
-- 
2.17.1
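
For reference, assuming the usual values IO_TLB_SHIFT = 11 and
IO_TLB_SEGSIZE = 128, this evaluates to 2 KiB * 128 = 256 KiB, matching the
limit described in the cover letter.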



[PATCH 0/5 v5] Fix virtio-blk issue with SWIOTLB

2019-01-30 Thread Joerg Roedel
Hi,

here is the next version of this patch-set. Previous
versions can be found here:

V1: https://lore.kernel.org/lkml/20190110134433.15672-1-j...@8bytes.org/

V2: https://lore.kernel.org/lkml/20190115132257.6426-1-j...@8bytes.org/

V3: https://lore.kernel.org/lkml/20190123163049.24863-1-j...@8bytes.org/

V4: https://lore.kernel.org/lkml/20190129084342.26030-1-j...@8bytes.org/

The problem solved here is a limitation of the SWIOTLB implementation,
which does not support allocations larger than 256kb.  When the
virtio-blk driver tries to read/write a block larger than that, the
allocation of the dma-handle fails and an IO error is reported.

Changes to v4 are:

- Added Reviewed-by tags from Christoph

- Added missing EXPORT_SYMBOL(_GPL) lines

Please review.

Thanks,

Joerg
Joerg Roedel (5):
  swiotlb: Introduce swiotlb_max_mapping_size()
  swiotlb: Add is_swiotlb_active() function
  dma: Introduce dma_max_mapping_size()
  virtio: Introduce virtio_max_dma_size()
  virtio-blk: Consider virtio_max_dma_size() for maximum segment size

 Documentation/DMA-API.txt|  8 
 drivers/block/virtio_blk.c   | 10 ++
 drivers/virtio/virtio_ring.c | 11 +++
 include/linux/dma-mapping.h  | 16 
 include/linux/swiotlb.h  | 11 +++
 include/linux/virtio.h   |  2 ++
 kernel/dma/direct.c  | 12 
 kernel/dma/swiotlb.c | 14 ++
 8 files changed, 80 insertions(+), 4 deletions(-)

-- 
2.17.1



[PATCH 2/5] swiotlb: Add is_swiotlb_active() function

2019-01-30 Thread Joerg Roedel
From: Joerg Roedel 

This function will be used from dma_direct code to determine
the maximum segment size of a dma mapping.

Reviewed-by: Konrad Rzeszutek Wilk 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Joerg Roedel 
---
 include/linux/swiotlb.h | 6 ++
 kernel/dma/swiotlb.c| 9 +
 2 files changed, 15 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 1c22d96e1742..e9e786b4b598 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -63,6 +63,7 @@ extern void swiotlb_tbl_sync_single(struct device *hwdev,
 extern int
 swiotlb_dma_supported(struct device *hwdev, u64 mask);
 size_t swiotlb_max_mapping_size(struct device *dev);
+bool is_swiotlb_active(void);
 
 #ifdef CONFIG_SWIOTLB
 extern enum swiotlb_force swiotlb_force;
@@ -100,6 +101,11 @@ static inline size_t swiotlb_max_mapping_size(struct 
device *dev)
 {
return SIZE_MAX;
 }
+
+static inline bool is_swiotlb_active(void)
+{
+   return false;
+}
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 9cb21259cb0b..c873f9cc2146 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -667,3 +667,12 @@ size_t swiotlb_max_mapping_size(struct device *dev)
 {
return ((size_t)1 << IO_TLB_SHIFT) * IO_TLB_SEGSIZE;
 }
+
+bool is_swiotlb_active(void)
+{
+   /*
+* When SWIOTLB is initialized, even if io_tlb_start points to physical
+* address zero, io_tlb_end surely doesn't.
+*/
+   return io_tlb_end != 0;
+}
-- 
2.17.1



[PATCH 3/5] dma: Introduce dma_max_mapping_size()

2019-01-30 Thread Joerg Roedel
From: Joerg Roedel 

The function returns the maximum size that can be mapped
using DMA-API functions. The patch also adds the
implementation for direct DMA and a new dma_map_ops pointer
so that other implementations can expose their limit.

Reviewed-by: Konrad Rzeszutek Wilk 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Joerg Roedel 
---
 Documentation/DMA-API.txt   |  8 
 include/linux/dma-mapping.h | 16 
 kernel/dma/direct.c | 12 
 3 files changed, 36 insertions(+)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index e133ccd60228..acfe3d0f78d1 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -195,6 +195,14 @@ Requesting the required mask does not alter the current 
mask.  If you
 wish to take advantage of it, you should issue a dma_set_mask()
 call to set the mask to the value returned.
 
+::
+
+   size_t
+   dma_direct_max_mapping_size(struct device *dev);
+
+Returns the maximum size of a mapping for the device. The size parameter
+of the mapping functions like dma_map_single(), dma_map_page() and
+others should not be larger than the returned value.
 
 Part Id - Streaming DMA mappings
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f6ded992c183..a3ca8a71a704 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -130,6 +130,7 @@ struct dma_map_ops {
enum dma_data_direction direction);
int (*dma_supported)(struct device *dev, u64 mask);
u64 (*get_required_mask)(struct device *dev);
+   size_t (*max_mapping_size)(struct device *dev);
 };
 
 #define DMA_MAPPING_ERROR  (~(dma_addr_t)0)
@@ -257,6 +258,8 @@ static inline void dma_direct_sync_sg_for_cpu(struct device 
*dev,
 }
 #endif
 
+size_t dma_direct_max_mapping_size(struct device *dev);
+
 #ifdef CONFIG_HAS_DMA
 #include 
 
@@ -440,6 +443,19 @@ static inline int dma_mapping_error(struct device *dev, 
dma_addr_t dma_addr)
return 0;
 }
 
+static inline size_t dma_max_mapping_size(struct device *dev)
+{
+   const struct dma_map_ops *ops = get_dma_ops(dev);
+   size_t size = SIZE_MAX;
+
+   if (dma_is_direct(ops))
+   size = dma_direct_max_mapping_size(dev);
+   else if (ops && ops->max_mapping_size)
+   size = ops->max_mapping_size(dev);
+
+   return size;
+}
+
 void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t flag, unsigned long attrs);
 void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 355d16acee6d..81ca8170b928 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -380,3 +380,15 @@ int dma_direct_supported(struct device *dev, u64 mask)
 */
return mask >= __phys_to_dma(dev, min_mask);
 }
+
+size_t dma_direct_max_mapping_size(struct device *dev)
+{
+   size_t size = SIZE_MAX;
+
+   /* If SWIOTLB is active, use its maximum mapping size */
+   if (is_swiotlb_active())
+   size = swiotlb_max_mapping_size(dev);
+
+   return size;
+}
+EXPORT_SYMBOL(dma_direct_max_mapping_size);
-- 
2.17.1



Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jason Gunthorpe
On Wed, Jan 30, 2019 at 02:01:35PM -0700, Logan Gunthorpe wrote:

> And I feel the GUP->SGL->DMA flow should still be what we are aiming
> for. Even if we need a special GUP for special pages, and a special DMA
> map; and the SGL still has to be homogenous

*shrug* so what if the special GUP called a VMA op instead of
traversing the VMA PTEs today? Why does it really matter? It could
easily change to a struct page flow tomorrow..

> > So, I see Jerome solving the GUP problem by replacing GUP entirely
> > using an API that is more suited to what these sorts of drivers
> > actually need.
> 
> Yes, this is what I'm expecting and what I want. Not bypassing the whole
> thing by doing special things with VMAs.

IMHO struct page is a big pain for this application, and if we can
build flows that don't actually need it then we shouldn't require it
just because the old flows needed it.

HMM mirror is a new flow that doesn't need struct page.

Would you feel better if this also came along with a:

  struct dma_sg_table *sgl_dma_map_user(struct device *dma_device, 
 void __user *ptr, size_t len)

flow which returns a *DMA MAPPED* sgl that does not have struct page
pointers as another interface?

We can certainly call an API like this from RDMA for non-ODP MRs.

Eliminating the page pointers also eliminates the __iomem
problem. However this sgl object is not copyable or accessible from
the CPU, so the caller must be sure it doesn't need CPU access when
using this API. 

For RDMA I'd include some flag in the struct ib_device if the driver
requires CPU accessible SGLs and call the right API. Maybe the block
layer could do the same trick for O_DIRECT?

This would also directly solve the P2P problem with hfi1/qib/rxe, as
I'd likely also say that pci_p2pdma_map_sg() returns the same DMA only
sgl thing.
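
A rough caller-side sketch of that idea (the helper does not exist today;
the name, the return type and the variables below are just the proposal
above, not an actual API):

  struct dma_sg_table *sgl;

  /* dma_dev, user_ptr and user_len are assumed to come from the MR request */
  sgl = sgl_dma_map_user(dma_dev, user_ptr, user_len);
  if (IS_ERR(sgl))
          return PTR_ERR(sgl);
  /* program the HCA translation tables from the DMA addresses in sgl;
     no struct page and no CPU access to the memory is ever needed */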

Jason


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Logan Gunthorpe



On 2019-01-30 2:50 p.m., Jason Gunthorpe wrote:
> On Wed, Jan 30, 2019 at 02:01:35PM -0700, Logan Gunthorpe wrote:
> 
>> And I feel the GUP->SGL->DMA flow should still be what we are aiming
>> for. Even if we need a special GUP for special pages, and a special DMA
>> map; and the SGL still has to be homogenous
> 
> *shrug* so what if the special GUP called a VMA op instead of
> traversing the VMA PTEs today? Why does it really matter? It could
> easily change to a struct page flow tomorrow..

Well it's so that it's composable. We want the SGL->DMA side to work for
APIs from kernel space and not have to run a completely different flow
for kernel drivers than from userspace memory.

For GUP to do a special VMA traversal it would now need to return
something besides struct pages which means no SGL and it means a
completely different DMA mapping call.
> Would you feel better if this also came along with a:
> 
>   struct dma_sg_table *sgl_dma_map_user(struct device *dma_device, 
>  void __user *ptr, size_t len)

That seems like a nice API. But certainly the implementation would need
to use existing dma_map or pci_p2pdma_map calls, or whatever as part of
it...

> flow which returns a *DMA MAPPED* sgl that does not have struct page
> pointers as another interface?
> 
> We can certainly call an API like this from RDMA for non-ODP MRs.
> 
> Eliminating the page pointers also eliminates the __iomem
> problem. However this sgl object is not copyable or accessible from
> the CPU, so the caller must be sure it doesn't need CPU access when
> using this API. 

We actually stopped caring about the __iomem problem. We are working
under the assumption that pages returned by devm_memremap_pages() can be
accessed as normal RAM and do not need the __iomem designation. The
main problem now is that code paths need to know to use pci_p2pdma_map
or not. And in theory this could be pushed into regular dma_map
implementations but we'd have to get it into all of them which is a pain.
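
Concretely, the split described above means every affected mapping path
ends up with a branch along these lines (sketch only; the wrapper name is
made up):

  static int example_map(struct device *dev, struct scatterlist *sgl, int nents)
  {
          /* P2P BAR pages need the pci_p2pdma mapping path, not dma_map_sg() */
          if (is_pci_p2pdma_page(sg_page(sgl)))
                  return pci_p2pdma_map_sg(dev, sgl, nents, DMA_BIDIRECTIONAL);

          return dma_map_sg(dev, sgl, nents, DMA_BIDIRECTIONAL);
  }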

Logan


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jason Gunthorpe
On Wed, Jan 30, 2019 at 10:17:27AM -0700, Logan Gunthorpe wrote:
> 
> 
> On 2019-01-29 9:18 p.m., Jason Gunthorpe wrote:
> > Every attempt to give BAR memory to struct page has run into major
> > trouble, IMHO, so I like that this approach avoids that.
> > 
> > And if you don't have struct page then the only kernel object left to
> > hang meta data off is the VMA itself.
> > 
> > It seems very similar to the existing P2P work between in-kernel
> > consumers, just that VMA is now mediating a general user space driven
> > discovery process instead of being hard wired into a driver.
> 
> But the kernel now has P2P bars backed by struct pages and it works
> well. 

I don't think it works that well..

We ended up with an 'sgl' that is not really an sgl, and doesn't work
with many of the common SGL patterns. sg_copy_buffer doesn't work,
dma_map doesn't work, sg_page doesn't work quite right, etc.

Only nvme and rdma got the special hacks to make them understand these
p2p-sgls, and I'm still not convinced some of the RDMA drivers that
want access to CPU addresses from the SGL (rxe, usnic, hfi, qib) don't
break in this scenario.

Since the SGLs become broken, it pretty much means there is no path to
make GUP work generically, we have to go through and make everything
safe to use with p2p-sgls before allowing GUP. Which, frankly, sounds
impossible with all the competing objections.

But GPU seems to have a problem unrelated to this - what Jerome wants
is to have two faulting domains for VMA's - visible-to-cpu and
visible-to-dma. The new op is essentially faulting the pages into the
visible-to-dma category and leaving them invisible-to-cpu.

So that duality would still have to exists, and I think p2p_map/unmap
is a much simpler implementation than trying to create some kind of
special PTE in the VMA..

At least for RDMA, struct page or not doesn't really matter. 

We can make struct pages for the BAR the same way NVMe does.  GPU is
probably the same, just with more memory at stake?

And maybe this should be the first implementation. The p2p_map VMA
operation should return an SGL and the caller should do the existing
pci_p2pdma_map_sg() flow.

Worry about optimizing away the struct page overhead later?
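
To make that concrete, a rough sketch of the exporting side. The p2p_map
hook name comes from this RFC, but the signature, struct exporter and
bar_pages below are all guesses for illustration only:

  #include <linux/mm.h>
  #include <linux/scatterlist.h>

  struct exporter {                       /* hypothetical exporting driver state */
          struct page **bar_pages;        /* ZONE_DEVICE pages covering the BAR,
                                             created the same way NVMe does */
  };

  static int exporter_p2p_map(struct vm_area_struct *vma, unsigned long pgoff,
                              unsigned long npages, struct sg_table *sgt)
  {
          struct exporter *ex = vma->vm_private_data;
          struct scatterlist *sg;
          unsigned long i;
          int rc;

          rc = sg_alloc_table(sgt, npages, GFP_KERNEL);
          if (rc)
                  return rc;

          /* Hand the BAR struct pages back as a perfectly ordinary SGL... */
          for_each_sg(sgt->sgl, sg, npages, i)
                  sg_set_page(sg, ex->bar_pages[pgoff + i], PAGE_SIZE, 0);

          return 0;
  }

  /* ...so the importer can keep the existing flow:
   *      pci_p2pdma_map_sg(importer_dev, sgt->sgl, sgt->nents, dir);
   */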

Jason


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jason Gunthorpe
On Wed, Jan 30, 2019 at 09:00:06AM +0100, Christoph Hellwig wrote:
> On Wed, Jan 30, 2019 at 04:18:48AM +, Jason Gunthorpe wrote:
> > Every attempt to give BAR memory to struct page has run into major
> > trouble, IMHO, so I like that this approach avoids that.
> 
> Way less problems than not having struct page for doing anything
> non-trivial.  If you map the BAR to userspace with remap_pfn_range
> and friends the mapping is indeed very simple.  But any operation
> that expects a page structure, which is at least everything using
> get_user_pages won't work.

GUP doesn't work anyhow today, and won't work with BAR struct pages in
the forseeable future (Logan has sent attempts on this before).

So nothing seems lost..

> So you can't do direct I/O to your remapped BAR, you can't create MRs
> on it, etc, etc.

Jerome made the HMM mirror API use this flow, so afer his patch to
switch the ODP MR to use HMM, and to switch GPU drivers, it will work
for those cases. Which is more than the zero cases than we have today
:)

Jason


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jerome Glisse
On Wed, Jan 30, 2019 at 06:56:59PM +, Jason Gunthorpe wrote:
> On Wed, Jan 30, 2019 at 10:17:27AM -0700, Logan Gunthorpe wrote:
> > 
> > 
> > On 2019-01-29 9:18 p.m., Jason Gunthorpe wrote:
> > > Every attempt to give BAR memory to struct page has run into major
> > > trouble, IMHO, so I like that this approach avoids that.
> > > 
> > > And if you don't have struct page then the only kernel object left to
> > > hang meta data off is the VMA itself.
> > > 
> > > It seems very similar to the existing P2P work between in-kernel
> > > consumers, just that VMA is now mediating a general user space driven
> > > discovery process instead of being hard wired into a driver.
> > 
> > But the kernel now has P2P bars backed by struct pages and it works
> > well. 
> 
> I don't think it works that well..
> 
> We ended up with a 'sgl' that is not really a sgl, and doesn't work
> with many of the common SGL patterns. sg_copy_buffer doesn't work,
> dma_map, doesn't work, sg_page doesn't work quite right, etc.
> 
> Only nvme and rdma got the special hacks to make them understand these
> p2p-sgls, and I'm still not convinced some of the RDMA drivers that
> want access to CPU addresses from the SGL (rxe, usnic, hfi, qib) don't
> break in this scenario.
> 
> Since the SGLs become broken, it pretty much means there is no path to
> make GUP work generically, we have to go through and make everything
> safe to use with p2p-sgls before allowing GUP. Which, frankly, sounds
> impossible with all the competing objections.
> 
> But GPU seems to have a problem unrelated to this - what Jerome wants
> is to have two faulting domains for VMA's - visible-to-cpu and
> visible-to-dma. The new op is essentially faulting the pages into the
> visible-to-dma category and leaving them invisible-to-cpu.
> 
> So that duality would still have to exists, and I think p2p_map/unmap
> is a much simpler implementation than trying to create some kind of
> special PTE in the VMA..
> 
> At least for RDMA, struct page or not doesn't really matter. 
> 
> We can make struct pages for the BAR the same way NVMe does.  GPU is
> probably the same, just with more mememory at stake?  
> 
> And maybe this should be the first implementation. The p2p_map VMA
> operation should return a SGL and the caller should do the existing
> pci_p2pdma_map_sg() flow.. 

For GPU it would not work: the GPU might want to use main memory (because
it is running out of BAR space), so it is a lot easier if the p2p_map
callback calls the right dma map function (for page or io) rather than
having to define some format that would pass the information down.

> 
> Worry about optimizing away the struct page overhead later?

Struct page does not fit well for GPU as the BAR address can be reprogrammed
to point to any page inside the device memory (think 256M BAR versus
16GB device memory). Forcing struct page on the GPU driver would require
major surgery to the GPU driver's inner workings, and there is no benefit
to be had from the struct page. So it is hard to justify this.

Cheers,
Jérôme


Re: [PATCH v5 20/20] iommu/mediatek: Switch to SPDX license identifier

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 8:00 PM Yong Wu  wrote:
>
> Switch to SPDX license identifier for MediaTek iommu/smi and their
> header files.
>
> Signed-off-by: Yong Wu 
> Reviewed-by: Rob Herring 

Reviewed-by: Evan Green 


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Logan Gunthorpe



On 2019-01-30 12:38 p.m., Jason Gunthorpe wrote:
> On Wed, Jan 30, 2019 at 02:22:34PM -0500, Jerome Glisse wrote:
> 
>> For GPU it would not work, GPU might want to use main memory (because
>> it is running out of BAR space) it is a lot easier if the p2p_map
>> callback calls the right dma map function (for page or io) rather than
>> having to define some format that would pass down the information.
> 
> This is already sort of built into the sgl, you are supposed to use
> is_pci_p2pdma_page() and pci_p2pdma_map_sg() and somehow it is supposed
> to work out - but I think this is also fairly incomplete.


> ie the current APIs seem to assume the SGL is homogeneous :(

We never changed SGLs. We still use them to pass p2pdma pages, only we
need to be a bit careful where we send the entire SGL. I see no reason
why we can't continue to be careful once they're in userspace if there's
something in GUP to deny them.

It would be nice to have heterogeneous SGLs and it is something we
should work toward but in practice they aren't really necessary at the
moment.

>>> Worry about optimizing away the struct page overhead later?
>>
>> Struct page do not fit well for GPU as the BAR address can be reprogram
>> to point to any page inside the device memory (think 256M BAR versus
>> 16GB device memory).
> 
> The struct page only points to the BAR - it is not related to the
> actual GPU memory in any way. The struct page is just an alternative
> way to specify the physical address of the BAR page.

That doesn't even necessarily need to be the case. For HMM, I
understand, struct pages may not point to any accessible memory and the
memory that backs them (or not) may change over their lifetime. So
they don't have to be strictly tied to BAR addresses. p2pdma pages are
strictly tied to BAR addresses though.

Logan


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jerome Glisse
On Wed, Jan 30, 2019 at 08:50:00PM +, Jason Gunthorpe wrote:
> On Wed, Jan 30, 2019 at 03:43:32PM -0500, Jerome Glisse wrote:
> > On Wed, Jan 30, 2019 at 08:11:19PM +, Jason Gunthorpe wrote:
> > > On Wed, Jan 30, 2019 at 01:00:02PM -0700, Logan Gunthorpe wrote:
> > > 
> > > > We never changed SGLs. We still use them to pass p2pdma pages, only we
> > > > need to be a bit careful where we send the entire SGL. I see no reason
> > > > why we can't continue to be careful once their in userspace if there's
> > > > something in GUP to deny them.
> > > > 
> > > > It would be nice to have heterogeneous SGLs and it is something we
> > > > should work toward but in practice they aren't really necessary at the
> > > > moment.
> > > 
> > > RDMA generally cannot cope well with an API that requires homogeneous
> > > SGLs.. User space can construct complex MRs (particularly with the
> > > proposed SGL MR flow) and we must marshal that into a single SGL or
> > > the drivers fall apart.
> > > 
> > > Jerome explained that GPU is worse, a single VMA may have a random mix
> > > of CPU or device pages..
> > > 
> > > This is a pretty big blocker that would have to somehow be fixed.
> > 
> > Note that HMM takes care of that RDMA ODP with my ODP to HMM patch,
> > so what you get for an ODP umem is just a list of dma address you
> > can program your device to. The aim is to avoid the driver to care
> > about that. The access policy when the UMEM object is created by
> > userspace through verbs API should however ascertain that for mmap
> > of device file it is only creating a UMEM that is fully covered by
> > one and only one vma. GPU device driver will have one vma per logical
> > GPU object. I expect other kind of device do that same so that they
> > can match a vma to a unique object in their driver.
> 
> A one VMA rule is not really workable.
> 
> With ODP VMA boundaries can move around across the lifetime of the MR
> and we have no obvious way to fail anything if userpace puts a VMA
> boundary in the middle of an existing ODP MR address range.

This is true only for vmas that are not mmaps of a device file. This is
what I was trying to get across. An mmap of a file is never merged,
so it can only get split/butchered by munmap/mremap, but when that happens
you also need to reflect the virtual address space change to the
device, ie any access to a now invalid range must trigger an error.

> 
> I think the HMM mirror API really needs to deal with this for the
> driver somehow.

Yes, HMM does deal with this for you, you do not have to worry about
it. Sorry if that was not clear. I just wanted to stress that vmas that
are mmaps of a file do not behave like other vmas, hence when you create
the UMEM you can check for those if you feel the need.

Cheers,
Jérôme


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jerome Glisse
On Wed, Jan 30, 2019 at 09:56:07PM +, Jason Gunthorpe wrote:
> On Wed, Jan 30, 2019 at 04:45:25PM -0500, Jerome Glisse wrote:
> > On Wed, Jan 30, 2019 at 08:50:00PM +, Jason Gunthorpe wrote:
> > > On Wed, Jan 30, 2019 at 03:43:32PM -0500, Jerome Glisse wrote:
> > > > On Wed, Jan 30, 2019 at 08:11:19PM +, Jason Gunthorpe wrote:
> > > > > On Wed, Jan 30, 2019 at 01:00:02PM -0700, Logan Gunthorpe wrote:
> > > > > 
> > > > > > We never changed SGLs. We still use them to pass p2pdma pages, only we
> > > > > > need to be a bit careful where we send the entire SGL. I see no reason
> > > > > > why we can't continue to be careful once their in userspace if there's
> > > > > > something in GUP to deny them.
> > > > > >
> > > > > > It would be nice to have heterogeneous SGLs and it is something we
> > > > > > should work toward but in practice they aren't really necessary at the
> > > > > > moment.
> > > > > 
> > > > > RDMA generally cannot cope well with an API that requires homogeneous
> > > > > SGLs.. User space can construct complex MRs (particularly with the
> > > > > proposed SGL MR flow) and we must marshal that into a single SGL or
> > > > > the drivers fall apart.
> > > > > 
> > > > > Jerome explained that GPU is worse, a single VMA may have a random mix
> > > > > of CPU or device pages..
> > > > > 
> > > > > This is a pretty big blocker that would have to somehow be fixed.
> > > > 
> > > > Note that HMM takes care of that RDMA ODP with my ODP to HMM patch,
> > > > so what you get for an ODP umem is just a list of dma address you
> > > > can program your device to. The aim is to avoid the driver to care
> > > > about that. The access policy when the UMEM object is created by
> > > > userspace through verbs API should however ascertain that for mmap
> > > > of device file it is only creating a UMEM that is fully covered by
> > > > one and only one vma. GPU device driver will have one vma per logical
> > > > GPU object. I expect other kind of device do that same so that they
> > > > can match a vma to a unique object in their driver.
> > > 
> > > A one VMA rule is not really workable.
> > > 
> > > With ODP VMA boundaries can move around across the lifetime of the MR
> > > and we have no obvious way to fail anything if userpace puts a VMA
> > > boundary in the middle of an existing ODP MR address range.
> > 
> > This is true only for vma that are not mmap of a device file. This is
> > what i was trying to get accross. An mmap of a file is never merge
> > so it can only get split/butcher by munmap/mremap but when that happen
> > you also need to reflect the virtual address space change to the
> > device ie any access to a now invalid range must trigger error.
> 
> Why is it invalid? The address range still has valid process memory?

If you do munmap(A, size) then all addresses in the range [A, A+size]
are invalid. This is what I am referring to here. Same for mremap.

> 
> What is the problem in the HMM mirror that it needs this restriction?

No restriction at all here. I think i just wasn't understood.

> There is also the situation where we create an ODP MR that spans 0 ->
> U64_MAX in the process address space. In this case there are lots of
> different VMAs it covers and we expect it to fully track all changes
> to all VMAs.

Yes, and that works; however, any memory access above TASK_SIZE will
return -EFAULT as that is kernel address space, so you can only access
something that is a valid process virtual address.

> 
> So we have to spin up dedicated umem_odps that carefully span single
> VMAs, and somehow track changes to VMA ?

No you do not.

> 
> mlx5 odp does some of this already.. But yikes, this needs some pretty
> careful testing in all these situations.

Sorry if I confused you even more than the first time. Everything works;
you have nothing to worry about :)

> 
> > > I think the HMM mirror API really needs to deal with this for the
> > > driver somehow.
> > 
> > Yes the HMM does deal with this for you, you do not have to worry about
> > it. Sorry if that was not clear. I just wanted to stress that vma that
> > are mmap of a file do not behave like other vma hence when you create
> > the UMEM you can check for those if you feel the need.
> 
> What properties do we get from HMM mirror? Will it tell us when to
> create more umems to cover VMA seams or will it just cause undesired
> no-mapped failures in some cases?

You do not get anything from HMM mirror; I might add a flag so that
HMM can report this special condition if the driver wants to know. If
you want to know, you have to look at the vma by yourself. GPU drivers
will definitely want to know when importing, so I might add a flag so
that they do not have to look up the vma themselves to know.

Again if you do not care then just ignore everything here, it is
handled by HMM and you do not have to worry one bit. If it worked
with GUP it will work with HMM and with those p2p patches if it
will even 

Re: [PATCH v5 15/20] memory: mtk-smi: Invoke pm runtime_callback to enable clocks

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 7:59 PM Yong Wu  wrote:
>
> This patch only move the clk_prepare_enable and config_port into the
> runtime suspend/resume callback. It doesn't change the code content
> and sequence.
>
> This is a preparing patch for adjusting SMI_BUS_SEL for mt8183.
> (SMI_BUS_SEL need to be restored after smi-common resume every time.)
> Also it gives a chance to get rid of mtk_smi_larb_get/put which could
> be a next topic.
>
> CC: Matthias Brugger 
> Signed-off-by: Yong Wu 

I believe this refactoring is a no-op as described, because the order is still:
1) mtk_smi_clk_enable(common)
2) mtk_smi_clk_enable(larb)
3) larb_gen->config_port()

And teardown still happens in the opposite order, except for
config_port, which they seem not to do in suspend.
So, looks good to me.

Reviewed-by: Evan Green 

> ---
>  drivers/memory/mtk-smi.c | 113 ++-
>  1 file changed, 72 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
> index a430721..9790801 100644
> --- a/drivers/memory/mtk-smi.c
> +++ b/drivers/memory/mtk-smi.c
> @@ -86,17 +86,13 @@ struct mtk_smi_larb { /* larb: local arbiter */
> u32 *mmu;
>  };
>
> -static int mtk_smi_enable(const struct mtk_smi *smi)
> +static int mtk_smi_clk_enable(const struct mtk_smi *smi)
>  {
> int ret;
>
> -   ret = pm_runtime_get_sync(smi->dev);
> -   if (ret < 0)
> -   return ret;
> -
> ret = clk_prepare_enable(smi->clk_apb);
> if (ret)
> -   goto err_put_pm;
> +   return ret;
>
> ret = clk_prepare_enable(smi->clk_smi);
> if (ret)
> @@ -118,59 +114,28 @@ static int mtk_smi_enable(const struct mtk_smi *smi)
> clk_disable_unprepare(smi->clk_smi);
>  err_disable_apb:
> clk_disable_unprepare(smi->clk_apb);
> -err_put_pm:
> -   pm_runtime_put_sync(smi->dev);
> return ret;
>  }
>
> -static void mtk_smi_disable(const struct mtk_smi *smi)
> +static void mtk_smi_clk_disable(const struct mtk_smi *smi)
>  {
> clk_disable_unprepare(smi->clk_gals1);
> clk_disable_unprepare(smi->clk_gals0);
> clk_disable_unprepare(smi->clk_smi);
> clk_disable_unprepare(smi->clk_apb);
> -   pm_runtime_put_sync(smi->dev);
>  }
>
>  int mtk_smi_larb_get(struct device *larbdev)
>  {
> -   struct mtk_smi_larb *larb = dev_get_drvdata(larbdev);
> -   const struct mtk_smi_larb_gen *larb_gen = larb->larb_gen;
> -   struct mtk_smi *common = dev_get_drvdata(larb->smi_common_dev);
> -   int ret;
> +   int ret = pm_runtime_get_sync(larbdev);
>
> -   /* Enable the smi-common's power and clocks */
> -   ret = mtk_smi_enable(common);
> -   if (ret)
> -   return ret;
> -
> -   /* Enable the larb's power and clocks */
> -   ret = mtk_smi_enable(>smi);
> -   if (ret) {
> -   mtk_smi_disable(common);
> -   return ret;
> -   }
> -
> -   /* Configure the iommu info for this larb */
> -   larb_gen->config_port(larbdev);
> -
> -   return 0;
> +   return (ret < 0) ? ret : 0;
>  }
>  EXPORT_SYMBOL_GPL(mtk_smi_larb_get);
>
>  void mtk_smi_larb_put(struct device *larbdev)
>  {
> -   struct mtk_smi_larb *larb = dev_get_drvdata(larbdev);
> -   struct mtk_smi *common = dev_get_drvdata(larb->smi_common_dev);
> -
> -   /*
> -* Don't de-configure the iommu info for this larb since there may be
> -* several modules in this larb.
> -* The iommu info will be reset after power off.
> -*/
> -
> -   mtk_smi_disable(>smi);
> -   mtk_smi_disable(common);
> +   pm_runtime_put_sync(larbdev);
>  }
>  EXPORT_SYMBOL_GPL(mtk_smi_larb_put);
>
> @@ -385,12 +350,52 @@ static int mtk_smi_larb_remove(struct platform_device *pdev)
> return 0;
>  }
>
> +static int __maybe_unused mtk_smi_larb_resume(struct device *dev)
> +{
> +   struct mtk_smi_larb *larb = dev_get_drvdata(dev);
> +   const struct mtk_smi_larb_gen *larb_gen = larb->larb_gen;
> +   int ret;
> +
> +   /* Power on smi-common. */
> +   ret = pm_runtime_get_sync(larb->smi_common_dev);
> +   if (ret < 0) {
> +   dev_err(dev, "Failed to pm get for smi-common(%d).\n", ret);
> +   return ret;
> +   }
> +
> +   ret = mtk_smi_clk_enable(>smi);
> +   if (ret < 0) {
> +   dev_err(dev, "Failed to enable clock(%d).\n", ret);
> +   pm_runtime_put_sync(larb->smi_common_dev);
> +   return ret;
> +   }
> +
> +   /* Configure the basic setting for this larb */
> +   larb_gen->config_port(dev);
> +
> +   return 0;
> +}
> +
> +static int __maybe_unused mtk_smi_larb_suspend(struct device *dev)
> +{
> +   struct mtk_smi_larb *larb = dev_get_drvdata(dev);
> +
> +   mtk_smi_clk_disable(>smi);
> +   pm_runtime_put_sync(larb->smi_common_dev);
> +   

Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jason Gunthorpe
On Wed, Jan 30, 2019 at 11:13:11AM -0700, Logan Gunthorpe wrote:
> 
> 
> On 2019-01-30 10:44 a.m., Jason Gunthorpe wrote:
> > I don't see why a special case with a VMA is really that different.
> 
> Well one *really* big difference is the VMA changes necessarily expose
> specialized new functionality to userspace which has to be supported
> forever and may be difficult to change. 

The only user change here is that more things will succeed when
creating RDMA MRs (and vice versa to GPU). I don't think this
restricts the kernel implementation at all, unless we intend to
remove P2P entirely..

Jason


Re: [PATCH v5 18/20] iommu/mediatek: Fix VLD_PA_RANGE register backup when suspend

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 8:00 PM Yong Wu  wrote:
>
> The register VLD_PA_RNG(0x118) was forgot to backup while adding 4GB
> mode support for mt2712. this patch add it.
>
> Fixes: 30e2fccf9512 ("iommu/mediatek: Enlarge the validate PA range
> for 4GB mode")
> Signed-off-by: Yong Wu 

Reviewed-by: Evan Green 


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Logan Gunthorpe



On 2019-01-30 12:19 p.m., Jason Gunthorpe wrote:
> On Wed, Jan 30, 2019 at 11:13:11AM -0700, Logan Gunthorpe wrote:
>>
>>
>> On 2019-01-30 10:44 a.m., Jason Gunthorpe wrote:
>>> I don't see why a special case with a VMA is really that different.
>>
>> Well one *really* big difference is the VMA changes necessarily expose
>> specialized new functionality to userspace which has to be supported
>> forever and may be difficult to change. 
> 
> The only user change here is that more things will succeed when
> creating RDMA MRs (and vice versa to GPU). I don't think this
> restricts the kernel implementation at all, unless we intend to
> remove P2P entirely..

Well, for MRs I'd expect you are using struct pages to track the memory
somehow. VMAs that aren't backed by pages and use this special
interface must therefore be creating new special interfaces that can
call p2p_[un]map...

I'd much rather see special cases around struct page so we can find ways
to generalize it in the future than create special cases tied to random
userspace interfaces.

Logan


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jason Gunthorpe
On Wed, Jan 30, 2019 at 03:43:32PM -0500, Jerome Glisse wrote:
> On Wed, Jan 30, 2019 at 08:11:19PM +, Jason Gunthorpe wrote:
> > On Wed, Jan 30, 2019 at 01:00:02PM -0700, Logan Gunthorpe wrote:
> > 
> > > We never changed SGLs. We still use them to pass p2pdma pages, only we
> > > need to be a bit careful where we send the entire SGL. I see no reason
> > > why we can't continue to be careful once their in userspace if there's
> > > something in GUP to deny them.
> > > 
> > > It would be nice to have heterogeneous SGLs and it is something we
> > > should work toward but in practice they aren't really necessary at the
> > > moment.
> > 
> > RDMA generally cannot cope well with an API that requires homogeneous
> > SGLs.. User space can construct complex MRs (particularly with the
> > proposed SGL MR flow) and we must marshal that into a single SGL or
> > the drivers fall apart.
> > 
> > Jerome explained that GPU is worse, a single VMA may have a random mix
> > of CPU or device pages..
> > 
> > This is a pretty big blocker that would have to somehow be fixed.
> 
> Note that HMM takes care of that RDMA ODP with my ODP to HMM patch,
> so what you get for an ODP umem is just a list of dma address you
> can program your device to. The aim is to avoid the driver to care
> about that. The access policy when the UMEM object is created by
> userspace through verbs API should however ascertain that for mmap
> of device file it is only creating a UMEM that is fully covered by
> one and only one vma. GPU device driver will have one vma per logical
> GPU object. I expect other kind of device do that same so that they
> can match a vma to a unique object in their driver.

A one VMA rule is not really workable.

With ODP VMA boundaries can move around across the lifetime of the MR
and we have no obvious way to fail anything if userpace puts a VMA
boundary in the middle of an existing ODP MR address range.

I think the HMM mirror API really needs to deal with this for the
driver somehow.

Jason


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jerome Glisse
On Wed, Jan 30, 2019 at 10:33:04PM +, Jason Gunthorpe wrote:
> On Wed, Jan 30, 2019 at 05:30:27PM -0500, Jerome Glisse wrote:
> 
> > > What is the problem in the HMM mirror that it needs this restriction?
> > 
> > No restriction at all here. I think i just wasn't understood.
> 
> Are you are talking about from the exporting side - where the thing
> creating the VMA can really only put one distinct object into it?

The message I was trying to get across is that HMM mirror will
always succeed for everything* except for special vmas, ie mmaps of
a device file. For those it can only succeed if a p2p_map() call
succeeds.

So any user of HMM mirror might want to know why the mirroring failed, ie
was it because something exceptional is happening? Or is it because
I was trying to map a special vma which can be forbidden?

Hence why I assume that you might want to know about such a p2p_map
failure at the time you create the umem odp object, as it might be
a failure you want to report differently and handle
differently. If you do not care about differentiating OOM or
exceptional failure from p2p_map failure, then you have nothing to
worry about; you will get the same error from HMM for both.

Cheers,
Jérôme

* Everything except when there are exceptional conditions like OOM or
  poisoned memory.


Re: [PATCH v5 16/20] memory: mtk-smi: Add bus_sel for mt8183

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 7:59 PM Yong Wu  wrote:
>
> There are 2 mmu cells in a M4U HW. we could adjust some larbs entering
> mmu0 or mmu1 to balance the bandwidth via the smi-common register
> SMI_BUS_SEL(0x220)(Each larb occupy 2 bits).
>
> In mt8183, For better performance, we switch larb1/2/5/7 to enter
> mmu1 while the others still keep enter mmu0.
>
> In mt8173 and mt2712, we don't get the performance issue,
> Keep its default value(0x0), that means all the larbs enter mmu0.
>
> Note: smi gen1(mt2701/mt7623) don't have this bus_sel.
>
> CC: Matthias Brugger 
> Signed-off-by: Yong Wu 
> ---
>  drivers/memory/mtk-smi.c | 22 --
>  1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
> index 9790801..08cf40d 100644
> --- a/drivers/memory/mtk-smi.c
> +++ b/drivers/memory/mtk-smi.c
> @@ -49,6 +49,12 @@
>  #define SMI_LARB_NONSEC_CON(id)(0x380 + ((id) * 4))
>  #define F_MMU_EN   BIT(0)
>
> +/* SMI COMMON */
> +#define SMI_BUS_SEL0x220
> +#define SMI_BUS_LARB_SHIFT(larbid) ((larbid) << 1)
> +/* All are MMU0 defaultly. Only specialize mmu1 here. */
> +#define F_MMU1_LARB(larbid)(0x1 << SMI_BUS_LARB_SHIFT(larbid))
> +
>  enum mtk_smi_gen {
> MTK_SMI_GEN1,
> MTK_SMI_GEN2
> @@ -57,6 +63,7 @@ enum mtk_smi_gen {
>  struct mtk_smi_common_plat {
> enum mtk_smi_gen gen;
> bool has_gals;
> +   u32  bus_sel; /* Balance some larbs to enter mmu0 or mmu1 */
>  };
>
>  struct mtk_smi_larb_gen {
> @@ -72,8 +79,8 @@ struct mtk_smi {
> struct clk  *clk_apb, *clk_smi;
> struct clk  *clk_gals0, *clk_gals1;
> struct clk  *clk_async; /*only needed by mt2701*/
> -   void __iomem*smi_ao_base;
> -
> +   void __iomem*smi_ao_base; /* only for gen1 */
> +   void __iomem*base;/* only for gen2 */
> const struct mtk_smi_common_plat *plat;
>  };
>
> @@ -410,6 +417,8 @@ static int __maybe_unused mtk_smi_larb_suspend(struct device *dev)
>  static const struct mtk_smi_common_plat mtk_smi_common_mt8183 = {
> .gen  = MTK_SMI_GEN2,
> .has_gals = true,
> +   .bus_sel  = F_MMU1_LARB(1) | F_MMU1_LARB(2) | F_MMU1_LARB(5) |
> +   F_MMU1_LARB(7),
>  };
>
>  static const struct of_device_id mtk_smi_common_of_ids[] = {
> @@ -482,6 +491,11 @@ static int mtk_smi_common_probe(struct platform_device *pdev)
> ret = clk_prepare_enable(common->clk_async);
> if (ret)
> return ret;
> +   } else {
> +   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +   common->base = devm_ioremap_resource(dev, res);
> +   if (IS_ERR(common->base))
> +   return PTR_ERR(common->base);

So you split base and smi_ao_base because they're completely different
register regions, or because ->base is no longer "always on"? It's
tempting to recombine them because they appear to be mutually
exclusive, but if they're truly different register regions then I
understand.


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jason Gunthorpe
On Wed, Jan 30, 2019 at 01:00:02PM -0700, Logan Gunthorpe wrote:

> We never changed SGLs. We still use them to pass p2pdma pages, only we
> need to be a bit careful where we send the entire SGL. I see no reason
> why we can't continue to be careful once their in userspace if there's
> something in GUP to deny them.
> 
> It would be nice to have heterogeneous SGLs and it is something we
> should work toward but in practice they aren't really necessary at the
> moment.

RDMA generally cannot cope well with an API that requires homogeneous
SGLs.. User space can construct complex MRs (particularly with the
proposed SGL MR flow) and we must marshal that into a single SGL or
the drivers fall apart.

Jerome explained that GPU is worse, a single VMA may have a random mix
of CPU or device pages..

This is a pretty big blocker that would have to somehow be fixed.

> That doesn't even necessarily need to be the case. For HMM, I
> understand, struct pages may not point to any accessible memory and the
> memory that backs it (or not) may change over the life time of it. So
> they don't have to be strictly tied to BARs addresses. p2pdma pages are
> strictly tied to BAR addresses though.

No idea, but at least for this case I don't think we need magic HMM
pages to make simple VMA ops p2p_map/umap work..

Jason


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jason Gunthorpe
On Wed, Jan 30, 2019 at 12:48:33PM -0700, Logan Gunthorpe wrote:
> 
> 
> On 2019-01-30 12:19 p.m., Jason Gunthorpe wrote:
> > On Wed, Jan 30, 2019 at 11:13:11AM -0700, Logan Gunthorpe wrote:
> >>
> >>
> >> On 2019-01-30 10:44 a.m., Jason Gunthorpe wrote:
> >>> I don't see why a special case with a VMA is really that different.
> >>
> >> Well one *really* big difference is the VMA changes necessarily expose
> >> specialized new functionality to userspace which has to be supported
> >> forever and may be difficult to change. 
> > 
> > The only user change here is that more things will succeed when
> > creating RDMA MRs (and vice versa to GPU). I don't think this
> > restricts the kernel implementation at all, unless we intend to
> > remove P2P entirely..
> 
> Well for MRs I'd expect you are using struct pages to track the memory
> some how 

Not really, for MRs most drivers care about DMA addresses only. The
only reason struct page ever gets involved is because it is part of
the GUP, SGL and dma_map family of APIs.

> VMAs that aren't backed by pages and use this special interface must
> therefore be creating new special interfaces that can call
> p2p_[un]map...

Well, those are kernel internal interfaces, so they can be changed

No matter what we do, code that wants to DMA to user BAR pages must
take *some kind* of special care - either it needs to use a special
GUP and SGL flow, or a mixed GUP, SGL and p2p_map flow. 

I don't really see why one is better than the other at this point, or
why doing one means we can't do the other some day later. They are
fairly similar.

O_DIRECT seems to be the justification for struct page, but nobody is
signing up to make O_DIRECT have the required special GUP/SGL/P2P flow
that would be needed to *actually* make that work - so it really isn't
a justification today.

Jason


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jerome Glisse
On Wed, Jan 30, 2019 at 08:11:19PM +, Jason Gunthorpe wrote:
> On Wed, Jan 30, 2019 at 01:00:02PM -0700, Logan Gunthorpe wrote:
> 
> > We never changed SGLs. We still use them to pass p2pdma pages, only we
> > need to be a bit careful where we send the entire SGL. I see no reason
> > why we can't continue to be careful once their in userspace if there's
> > something in GUP to deny them.
> > 
> > It would be nice to have heterogeneous SGLs and it is something we
> > should work toward but in practice they aren't really necessary at the
> > moment.
> 
> RDMA generally cannot cope well with an API that requires homogeneous
> SGLs.. User space can construct complex MRs (particularly with the
> proposed SGL MR flow) and we must marshal that into a single SGL or
> the drivers fall apart.
> 
> Jerome explained that GPU is worse, a single VMA may have a random mix
> of CPU or device pages..
> 
> This is a pretty big blocker that would have to somehow be fixed.

Note that HMM takes care of that for RDMA ODP with my ODP to HMM patch,
so what you get for an ODP umem is just a list of dma addresses you
can program your device to. The aim is to avoid the driver having to care
about that. The access policy when the UMEM object is created by
userspace through the verbs API should however ascertain that for an mmap
of a device file it is only creating a UMEM that is fully covered by
one and only one vma. GPU device drivers will have one vma per logical
GPU object. I expect other kinds of devices to do the same so that they
can match a vma to a unique object in their driver.

> 
> > That doesn't even necessarily need to be the case. For HMM, I
> > understand, struct pages may not point to any accessible memory and the
> > memory that backs it (or not) may change over the life time of it. So
> > they don't have to be strictly tied to BARs addresses. p2pdma pages are
> > strictly tied to BAR addresses though.
> 
> No idea, but at least for this case I don't think we need magic HMM
> pages to make simple VMA ops p2p_map/umap work..

Yes, you do not need pages for a simple driver; if we start creating struct
pages for all PCIe BARs we are going to waste a lot of memory and resources
for no good reason. I doubt all of the PCIe BARs of a device enabling p2p
will ever be mapped as p2p. So simple drivers do not need struct page, and
GPU drivers that do not use HMM (all GPUs that are more than 2 years old) do
not need struct page. Struct page is a burden here more than anything
else. I have not seen one good thing the struct page gives you.

Cheers,
Jérôme


Re: [RFC PATCH 1/5] iommu: Add APIs for IOMMU PASID management

2019-01-30 Thread Jacob Pan
On Mon, 12 Nov 2018 14:44:57 +0800
Lu Baolu  wrote:

> This adds APIs for IOMMU drivers and device drivers to manage
> the PASIDs used for DMA transfer and translation. It bases on
> I/O ASID allocator for PASID namespace management and relies
> on vendor specific IOMMU drivers for paravirtual PASIDs.
> 
Trying to integrate this with the VT-d SVM code, I had some thoughts.
First of all, it seems to me the problem we are trying to solve here is
to allow a custom PASID/IOASID allocator.
IOASID, as in drivers/base, is a common infrastructure in its own right.
So it is OK to let device drivers such as the VT-d driver directly
communicate with IOASID w/o going through the common IOMMU layer. Therefore
I see little value in adding this wrapper to the ioasid code; instead I
feel it might be less work to enhance ioasid with the following:

1. allow the ioasid code to register a system wide custom asid allocator,
which takes precedence over the idr allocator, e.g.
typedef ioasid_t (*ioasid_alloc_fn_t)(ioasid_t min, ioasid_t max, void *data);
void ioasid_set_allocator(ioasid_alloc_fn_t fn)
{}
We can still use idr for tracking ioasid and private data lookup; since
the ioasid idr is exclusively owned by ioasid_alloc, we can rest assured
there is no conflict with other callers. See code comments below and the
rough sketch after point 3.

2. support setting asid private data _after_ the asid is allocated. The use
case is that PASID allocation and mm bind can be split into separate
stages. Private data (mm related) is not available at the time of
allocation, only at bind time.
Since IDR needs the data pointer at allocation time, we can still
create an empty ioasid_data for ioasid tracking at alloc time. i.e.
struct ioasid_data {
void *ptr; /* private data */
};


3. allow a NULL ioasid_set
I also had a hard time understanding the use of ioasid_set; since there
is already private data associated with each ioasid, is it not enough
to group ioasids based on their private data?
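
A rough sketch of how points 1 and 2 could fit together inside
ioasid_alloc(). Everything below is illustrative only; the names are just
the ones used in this mail, and ioasid_t is assumed to be a plain int here
for brevity:

/* Sketch only, not existing code */
#include <linux/idr.h>
#include <linux/slab.h>

typedef int ioasid_t;
typedef ioasid_t (*ioasid_alloc_fn_t)(ioasid_t min, ioasid_t max, void *data);

static ioasid_alloc_fn_t custom_allocator;  /* e.g. VT-d vcmd when in a guest */
static DEFINE_IDR(ioasid_idr);

struct ioasid_data {
	void *ptr;	/* private data, may be filled in later at bind time */
};

void ioasid_set_allocator(ioasid_alloc_fn_t fn)
{
	custom_allocator = fn;		/* registered once, system wide */
}

ioasid_t ioasid_alloc(ioasid_t min, ioasid_t max, void *private)
{
	struct ioasid_data *data = kzalloc(sizeof(*data), GFP_KERNEL);
	ioasid_t id;

	if (!data)
		return -ENOMEM;
	data->ptr = private;		/* may be NULL for the guest bind case */

	if (custom_allocator) {
		id = custom_allocator(min, max, data);
		if (id >= 0)		/* still track it in the idr for lookup */
			idr_alloc(&ioasid_idr, data, id, id + 1, GFP_KERNEL);
	} else {
		id = idr_alloc(&ioasid_idr, data, min, max, GFP_KERNEL);
	}

	if (id < 0)
		kfree(data);
	return id;
}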

So the usage scenarios can be:
1. during boot, the vt-d driver registers a custom ioasid allocator (vcmd) if
it detects it is running as a guest.

2. running as a guest, all pasid allocations via ioasid_alloc() will go
to this custom allocator and be tracked

3. for the native case, there is no custom allocator, ioasid just uses IDR
alloc

4. for native bind mm, the pasid allocation already has private data filled
in when calling ioasid_alloc

5. for guest bind, the pasid is allocated with empty private data (called
by the VFIO layer), but private data is filled in later by bind guest
pasid inside the vt-d driver.

So my thinking is that we can avoid having another layer of APIs as
below and keep everything within ioasid. It also allows private data to be
managed at a lower level as compared to VFIO.


Thanks,

Jacob

> Below APIs are added:
> 
> * iommu_pasid_init(pasid)
>   - Initialize a PASID consumer. The vendor specific IOMMU
> drivers are able to set the PASID range imposed by IOMMU
> hardware through a callback in iommu_ops.
> 
> * iommu_pasid_exit(pasid)
>   - The PASID consumer stops consuming any PASID.
> 
> * iommu_pasid_alloc(pasid, min, max, private, *ioasid)
>   - Allocate a PASID and associate a @private data with this
> PASID. The PASID value is stored in @ioaisd if returning
> success.
> 
> * iommu_pasid_free(pasid, ioasid)
>   - Free a PASID to the pool so that it could be consumed by
> others.
> 
> This also adds below helpers to lookup or iterate PASID items
> associated with a consumer.
> 
> * iommu_pasid_for_each(pasid, func, data)
>   - Iterate PASID items of the consumer identified by @pasid,
> and call @func() against each item. An error returned from
> @func() will break the iteration.
> 
> * iommu_pasid_find(pasid, ioasid)
>   - Retrieve the private data associated with @ioasid.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Jean-Philippe Brucker 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/Kconfig |  1 +
>  drivers/iommu/iommu.c | 89 +++
>  include/linux/iommu.h | 73 +++
>  3 files changed, 163 insertions(+)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index d9a25715650e..39f2bb76c7b8 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -1,6 +1,7 @@
>  # IOMMU_API always gets selected by whoever wants it.
>  config IOMMU_API
>   bool
> + select IOASID
>  
>  menuconfig IOMMU_SUPPORT
>   bool "IOMMU Hardware Support"
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 0b7c96d1425e..570b244897bb 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2082,3 +2082,92 @@ void iommu_detach_device_aux(struct
> iommu_domain *domain, struct device *dev) }
>  }
>  EXPORT_SYMBOL_GPL(iommu_detach_device_aux);
> +
> +/*
> + * APIs for PASID used by IOMMU and the device drivers which depend
> + * on IOMMU.
> + */
> +struct iommu_pasid *iommu_pasid_init(struct bus_type *bus)
> +{
> + struct iommu_pasid *pasid;
> + int 

Re: [PATCH v5 19/20] iommu/mediatek: Add shutdown callback

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 8:00 PM Yong Wu  wrote:
>
> In the reboot burning test, if some Multimedia HW has something wrong,
> It may keep send the invalid request to IOMMU. In order to avoid
> affect the reboot flow, we add the shutdown callback to disable
> M4U HW when shutdown.

Sounds unpleasant. Hopefully the reboot flow still continues properly
even in that case, since this shutdown code may not run during some
rougher resets.

Reviewed-by: Evan Green 


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jason Gunthorpe
On Wed, Jan 30, 2019 at 02:22:34PM -0500, Jerome Glisse wrote:

> For GPU it would not work, GPU might want to use main memory (because
> it is running out of BAR space) it is a lot easier if the p2p_map
> callback calls the right dma map function (for page or io) rather than
> having to define some format that would pass down the information.

This is already sort of built into the sgl, you are supposed to use
is_pci_p2pdma_page() and pci_p2pdma_map_sg() and somehow it is supposed
to work out - but I think this is also fairly incomplete.

ie the current APIs seem to assume the SGL is homogeneous :(

> > Worry about optimizing away the struct page overhead later?
> 
> Struct page do not fit well for GPU as the BAR address can be reprogram
> to point to any page inside the device memory (think 256M BAR versus
> 16GB device memory).

The struct page only points to the BAR - it is not related to the
actual GPU memory in any way. The struct page is just an alternative
way to specify the physical address of the BAR page.

I think this boils down to one call to set up the entire BAR, like nvme
does, and then using the struct pages in the p2p_map SGL?
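
For reference, roughly the one-time setup nvme already does for its CMB;
export_bar_for_p2p() is just an illustrative wrapper around the existing
p2pdma calls:

  #include <linux/pci.h>
  #include <linux/pci-p2pdma.h>

  static int export_bar_for_p2p(struct pci_dev *pdev, int bar)
  {
          int rc;

          /* Creates ZONE_DEVICE struct pages covering the whole BAR, once */
          rc = pci_p2pdma_add_resource(pdev, bar,
                                       pci_resource_len(pdev, bar), 0);
          if (rc)
                  return rc;

          /* Let peers find this memory; the pages then flow through ordinary
           * SGLs and the pci_p2pdma_map_sg() path */
          pci_p2pmem_publish(pdev, true);
          return 0;
  }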

Jason


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jason Gunthorpe
On Wed, Jan 30, 2019 at 12:45:46PM -0700, Logan Gunthorpe wrote:
> 
> 
> On 2019-01-30 12:06 p.m., Jason Gunthorpe wrote:
> >> Way less problems than not having struct page for doing anything
> >> non-trivial.  If you map the BAR to userspace with remap_pfn_range
> >> and friends the mapping is indeed very simple.  But any operation
> >> that expects a page structure, which is at least everything using
> >> get_user_pages won't work.
> > 
> > GUP doesn't work anyhow today, and won't work with BAR struct pages in
> > the forseeable future (Logan has sent attempts on this before).
> 
I don't recall ever attempting that... But patching GUP for special
pages or VMAs, or working around by not calling it in some cases, seems
like the thing that's going to need to be done one way or another.

Remember the long discussion we had about how to get the IOMEM
annotation into the SGL? That is a necessary pre-condition to doing
anything with GUP in DMA-using drivers, as GUP -> SGL -> DMA map is
pretty much the standard flow.
 
> > Jerome made the HMM mirror API use this flow, so afer his patch to
> > switch the ODP MR to use HMM, and to switch GPU drivers, it will work
> > for those cases. Which is more than the zero cases than we have today
> > :)
> 
> But we're getting the same bait and switch here... If you are using HMM
> you are using struct pages, but we're told we need this special VMA hack
> for cases that don't use HMM and thus don't have struct pages...

Well, I don't know much about HMM, but the HMM mirror API looks like a
version of MMU notifiers that offloads a bunch of dreck to the HMM
code core instead of drivers. The RDMA code got hundreds of lines
shorter by using it.

Some of that dreck is obtaining a DMA address for the user VMAs,
including using multiple paths to get them. A driver using HMM mirror
doesn't seem to call GUP at all, HMM mirror handles that, along with
various special cases, including calling out to these new VMA ops.

I don't really know how mirror relates to other parts of HMM, like the
bits that use struct pages. Maybe it also knows about more special
cases created by other parts of HMM?

So, I see Jerome solving the GUP problem by replacing GUP entirely
using an API that is more suited to what these sorts of drivers
actually need.

Jason


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Logan Gunthorpe



On 2019-01-30 12:59 p.m., Jason Gunthorpe wrote:
> On Wed, Jan 30, 2019 at 12:45:46PM -0700, Logan Gunthorpe wrote:
>>
>>
>> On 2019-01-30 12:06 p.m., Jason Gunthorpe wrote:
 Way less problems than not having struct page for doing anything
 non-trivial.  If you map the BAR to userspace with remap_pfn_range
 and friends the mapping is indeed very simple.  But any operation
 that expects a page structure, which is at least everything using
 get_user_pages won't work.
>>>
>>> GUP doesn't work anyhow today, and won't work with BAR struct pages in
>>> the forseeable future (Logan has sent attempts on this before).
>>
>> I don't recall ever attempting that... But patching GUP for special
>> pages or VMAS; or working around by not calling it in some cases seems
>> like the thing that's going to need to be done one way or another.
> 
> Remember, the long discussion we had about how to get the IOMEM
> annotation into SGL? That is a necessary pre-condition to doing
> anything with GUP in DMA using drivers as GUP -> SGL -> DMA map is
> pretty much the standard flow.

Yes, but that was unrelated to GUP, even if that might have been the
eventual direction.

And I feel the GUP->SGL->DMA flow should still be what we are aiming
for. Even if we need a special GUP for special pages and a special DMA
map, the SGL still has to be homogeneous.
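
The flow in question, stripped to its bones (error unwinding omitted, and
the p2p special cases are exactly what is being argued about here):

  #include <linux/dma-mapping.h>
  #include <linux/mm.h>
  #include <linux/scatterlist.h>

  static int user_range_to_dma(struct device *dev, unsigned long start,
                               int nr_pages, struct page **pages,
                               struct sg_table *sgt,
                               enum dma_data_direction dir)
  {
          int rc;

          rc = get_user_pages_fast(start, nr_pages, 1, pages);        /* GUP */
          if (rc != nr_pages)
                  return -EFAULT;

          rc = sg_alloc_table_from_pages(sgt, pages, nr_pages, 0,     /* SGL */
                                         (unsigned long)nr_pages << PAGE_SHIFT,
                                         GFP_KERNEL);
          if (rc)
                  return rc;

          /* DMA map */
          return dma_map_sg(dev, sgt->sgl, sgt->orig_nents, dir) ? 0 : -EIO;
  }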

> So, I see Jerome solving the GUP problem by replacing GUP entirely
> using an API that is more suited to what these sorts of drivers
> actually need.

Yes, this is what I'm expecting and what I want. Not bypassing the whole
thing by doing special things with VMAs.

Logan


Re: [PATCH v5 02/20] iommu/mediatek: Use a struct as the platform data

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 7:56 PM Yong Wu  wrote:
>
> Use a struct as the platform special data instead of the enumeration.
> This is a prepare patch for adding mt8183 iommu support.
>
> Signed-off-by: Yong Wu 
> Reviewed-by: Matthias Brugger 

Reviewed-by: Evan Green 


Re: [PATCH v5 03/20] memory: mtk-smi: Use a general config_port interface

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 7:57 PM Yong Wu  wrote:
>
> The config_port of mt2712 and mt8183 are the same. Use a general
> config_port interface instead.
>
> In addition, in mt2712, larb8 and larb9 are the bdpsys larbs which
> are not the normal larb, their register space are different from the
> normal one. thus, we can not call the general config_port. In mt8183,
> IPU0/1 and CCU connect with smi-common directly, they also are not
> the normal larb. Hence, we add a "larb_direct_to_common_mask" for these
> larbs which connect to smi-commmon directly.
>
> This is also a preparing patch for adding mt8183 SMI support.
>
> Signed-off-by: Yong Wu 
> Reviewed-by: Matthias Brugger 

Reviewed-by: Evan Green 


Re: [PATCH v5 01/20] dt-bindings: mediatek: Add binding for mt8183 IOMMU and SMI

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 7:56 PM Yong Wu  wrote:
>
> This patch adds decriptions for mt8183 IOMMU and SMI.
>
> mt8183 has only one M4U like mt8173 and is also MTK IOMMU gen2 which
> uses ARM Short-Descriptor translation table format.
>
> The mt8183 M4U-SMI HW diagram is as below:
>
>   EMI
>|
>   M4U
>|
>--
>||
>gals0-rx   gals1-rx
>||
>||
>gals0-tx   gals1-tx
>||
>   
>SMI Common
>   
>|
>   +-+-++-+-+---+---+
>   | | || | |   |   |
>   | |  gals-rx  gals-rx  |   gals-rx gals-rx gals-rx
>   | | || | |   |   |
>   | | || | |   |   |
>   | |  gals-tx  gals-tx  |   gals-tx gals-tx gals-tx
>   | | || | |   |   |
> larb0 larb1  IPU0IPU1  larb4  larb5  larb6CCU
> disp  vdec   img camvenc   imgcam

It might be cool to put the gals in the picture in the bindings. Not a
big deal though.

>
> All the connections are HW fixed, SW can NOT adjust it.
>
> Compared with mt8173, we add a GALS(Global Async Local Sync) module
> between SMI-common and M4U, and additional GALS between larb2/3/5/6
> and SMI-common. GALS can help synchronize for the modules in different
> clock frequency, it can be seen as a "asynchronous fifo".
>
> GALS can only help transfer the command/data while it doesn't have
> the configuring register, thus it has the special "smi" clock and it
> doesn't have the "apb" clock. From the diagram above, we add "gals0"
> and "gals1" clocks for smi-common and add a "gals" clock for smi-larb.
>
> From the diagram above, IPU0/IPU1(Image Processor Unit) and CCU(Camera
> Control Unit) is connected with smi-common directly, we can take them
> as "larb2", "larb3" and "larb7", and their register spaces are
> different with the normal larb.
>
> Signed-off-by: Yong Wu 
> Reviewed-by: Rob Herring 
> ---
>  .../devicetree/bindings/iommu/mediatek,iommu.txt   |  15 ++-
>  .../memory-controllers/mediatek,smi-common.txt |  11 +-
>  .../memory-controllers/mediatek,smi-larb.txt   |   3 +
>  include/dt-bindings/memory/mt8183-larb-port.h  | 130 +
>  4 files changed, 153 insertions(+), 6 deletions(-)
>  create mode 100644 include/dt-bindings/memory/mt8183-larb-port.h
>
> diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt b/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
> index 6922db5..6e758996 100644
> --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
> +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
> @@ -36,6 +36,10 @@ each local arbiter.
>  like display, video decode, and camera. And there are different ports
>  in each larb. Take a example, There are many ports like MC, PP, VLD in the
>  video decode local arbiter, all these ports are according to the video HW.
> +  In some SoCs, there may be a GALS(Global Async Local Sync) module between
> +smi-common and m4u, and additional GALS module between smi-larb and
> +smi-common. GALS can been seen as a "asynchronous fifo" which could help
> +synchronize for the modules in different clock frequency.
>
>  Required properties:
>  - compatible : must be one of the following string:
> @@ -44,18 +48,23 @@ Required properties:
> "mediatek,mt7623-m4u", "mediatek,mt2701-m4u" for mt7623 which uses
>  generation one m4u HW.
> "mediatek,mt8173-m4u" for mt8173 which uses generation two m4u HW.
> +   "mediatek,mt8183-m4u" for mt8183 which uses generation two m4u HW.
>  - reg : m4u register base and size.
>  - interrupts : the interrupt of m4u.
>  - clocks : must contain one entry for each clock-names.
> -- clock-names : must be "bclk", It is the block clock of m4u.
> +- clock-names : Only 1 optional clock:
> +  - "bclk": the block clock of m4u.
> +  Note that m4u use the EMI clock which always has been enabled before kernel
> +  if there is no this "bclk".

Ideally bclk could be specified a little more crisply, as this is
actually required for some SoCs and not used at all on others (as in
patch 7).

>  - mediatek,larbs : List of phandle to the local arbiters in the current Socs.
> Refer to bindings/memory-controllers/mediatek,smi-larb.txt. It must sort
> according to the local arbiter index, like larb0, larb1, larb2...
>  - iommu-cells : must be 1. This is the mtk_m4u_id according to the HW.
> Specifies the mtk_m4u_id as defined in
> dt-binding/memory/mt2701-larb-port.h for mt2701, mt7623
> 

Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jason Gunthorpe
On Wed, Jan 30, 2019 at 04:45:25PM -0500, Jerome Glisse wrote:
> On Wed, Jan 30, 2019 at 08:50:00PM +, Jason Gunthorpe wrote:
> > On Wed, Jan 30, 2019 at 03:43:32PM -0500, Jerome Glisse wrote:
> > > On Wed, Jan 30, 2019 at 08:11:19PM +, Jason Gunthorpe wrote:
> > > > On Wed, Jan 30, 2019 at 01:00:02PM -0700, Logan Gunthorpe wrote:
> > > > 
> > > > > We never changed SGLs. We still use them to pass p2pdma pages, only we
> > > > > need to be a bit careful where we send the entire SGL. I see no reason
> > > > > why we can't continue to be careful once their in userspace if there's
> > > > > something in GUP to deny them.
> > > > > 
> > > > > It would be nice to have heterogeneous SGLs and it is something we
> > > > > should work toward but in practice they aren't really necessary at the
> > > > > moment.
> > > > 
> > > > RDMA generally cannot cope well with an API that requires homogeneous
> > > > SGLs.. User space can construct complex MRs (particularly with the
> > > > proposed SGL MR flow) and we must marshal that into a single SGL or
> > > > the drivers fall apart.
> > > > 
> > > > Jerome explained that GPU is worse, a single VMA may have a random mix
> > > > of CPU or device pages..
> > > > 
> > > > This is a pretty big blocker that would have to somehow be fixed.
> > > 
> > > Note that HMM takes care of that RDMA ODP with my ODP to HMM patch,
> > > so what you get for an ODP umem is just a list of dma address you
> > > can program your device to. The aim is to avoid the driver to care
> > > about that. The access policy when the UMEM object is created by
> > > userspace through verbs API should however ascertain that for mmap
> > > of device file it is only creating a UMEM that is fully covered by
> > > one and only one vma. GPU device driver will have one vma per logical
> > > GPU object. I expect other kind of device do that same so that they
> > > can match a vma to a unique object in their driver.
> > 
> > A one VMA rule is not really workable.
> > 
> > With ODP VMA boundaries can move around across the lifetime of the MR
> > and we have no obvious way to fail anything if userpace puts a VMA
> > boundary in the middle of an existing ODP MR address range.
> 
> This is true only for vmas that are not mmaps of a device file. This is
> what i was trying to get across. An mmap of a file is never merged,
> so it can only get split/butchered by munmap/mremap, but when that happens
> you also need to reflect the virtual address space change to the
> device, i.e. any access to a now-invalid range must trigger an error.

Why is it invalid? The address range still has valid process memory?

What is the problem in the HMM mirror that it needs this restriction?

There is also the situation where we create an ODP MR that spans 0 ->
U64_MAX in the process address space. In this case there are lots of
different VMAs it covers and we expect it to fully track all changes
to all VMAs.

So we have to spin up dedicated umem_odps that carefully span single
VMAs, and somehow track changes to VMAs?

mlx5 odp does some of this already.. But yikes, this needs some pretty
careful testing in all these situations.

> > I think the HMM mirror API really needs to deal with this for the
> > driver somehow.
> 
> Yes, HMM does deal with this for you; you do not have to worry about
> it. Sorry if that was not clear. I just wanted to stress that vmas that
> are mmaps of a file do not behave like other vmas, hence when you create
> the UMEM you can check for those if you feel the need.

What properties do we get from HMM mirror? Will it tell us when to
create more umems to cover VMA seams or will it just cause undesired
no-mapped failures in some cases?

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jason Gunthorpe
On Wed, Jan 30, 2019 at 05:30:27PM -0500, Jerome Glisse wrote:

> > What is the problem in the HMM mirror that it needs this restriction?
> 
> No restriction at all here. I think i just wasn't understood.

Are you talking about this from the exporting side - where the thing
creating the VMA can really only put one distinct object into it?

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 07/20] iommu/mediatek: Add bclk can be supported optionally

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 7:57 PM Yong Wu  wrote:
>
> In some SoCs, the M4U doesn't have its own "bclk"; it uses the EMI
> clock instead, which is always enabled before the kernel starts.
>
> This also is a preparing patch for mt8183.
>
> Signed-off-by: Yong Wu 

Reviewed-by: Evan Green 
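As a rough sketch (assumed, not quoted from the patch), the per-SoC flag can
simply gate the clock lookup in probe, so SoCs without "bclk" skip it and keep
relying on the always-on EMI clock:

	if (data->plat_data->has_bclk) {
		data->bclk = devm_clk_get(dev, "bclk");
		if (IS_ERR(data->bclk))
			return PTR_ERR(data->bclk);
	}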
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 04/20] memory: mtk-smi: Use a struct for the platform data for smi-common

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 7:57 PM Yong Wu  wrote:
>
> Use a struct for the platform-specific data instead of the enumeration.
>
> Also there is a minor change: the definition of "enum mtk_smi_gen" is
> moved earlier, because "struct mtk_smi_common_plat" must be defined
> before it is referred to.
>
> This is a preparing patch for mt8183.
>
> Signed-off-by: Yong Wu 
> Reviewed-by: Matthias Brugger 

Reviewed-by: Evan Green 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 13/20] iommu/mediatek: Add mt8183 IOMMU support

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 7:59 PM Yong Wu  wrote:
>
> The M4U IP block in mt8183 is MediaTek's generation2 M4U, which uses
> the ARM Short-descriptor like mt8173, and most of the HW registers
> are the same.
>
> Here are the main differences between mt8183 and mt8173/mt2712:
> 1) mt8183 has only one M4U HW like mt8173, while mt2712 has two.
> 2) mt8183 doesn't have the "bclk" clock; it uses the EMI clock instead.
> 3) mt8183 can support dram over 4GB, but it doesn't call this "4GB
> mode".
> 4) the mt8183 pgtable base register(0x0) extends bit[1:0], which represent
> bit[33:32] of the physical address of the pgtable base. But the
> standard ttbr0[1] is the S bit, which is enabled by default; hence
> we add a mask.
> 5) the mt8183 HW has GALS modules, so SMI should enable "has_gals" support.
> 6) mt8183 needs reset_axi like mt8173.
> 7) the larb-id in smi-common is remapped. M4U should add its larbid_remap.
>
> Signed-off-by: Yong Wu 
> ---
>  drivers/iommu/mtk_iommu.c | 15 ---
>  drivers/iommu/mtk_iommu.h |  1 +
>  drivers/memory/mtk-smi.c  | 20 
>  3 files changed, 33 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 2913ddb..66e3615 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -36,6 +36,7 @@
>  #include "mtk_iommu.h"
>
>  #define REG_MMU_PT_BASE_ADDR   0x000
> +#define MMU_PT_ADDR_MASK   GENMASK(31, 7)
>
>  #define REG_MMU_INVALIDATE 0x020
>  #define F_ALL_INVLD0x2
> @@ -342,7 +343,7 @@ static int mtk_iommu_attach_device(struct iommu_domain 
> *domain,
> /* Update the pgtable base address register of the M4U HW */
> if (!data->m4u_dom) {
> data->m4u_dom = dom;
> -   writel(dom->cfg.arm_v7s_cfg.ttbr[0],
> +   writel(dom->cfg.arm_v7s_cfg.ttbr[0] & MMU_PT_ADDR_MASK,

So there aren't any other bits down below 7 that you need, like the
shareable bits?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 06/20] iommu/io-pgtable-arm-v7s: Extend MediaTek 4GB Mode

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 7:57 PM Yong Wu  wrote:
>
> MediaTek extend the arm v7s descriptor to support the dram over 4GB.
>
> In the mt2712 and mt8173, it's called "4GB mode": the physical address
> is from 0x4000_0000 to 0x1_3fff_ffff, but from the EMI point of view it
> is remapped to the high addresses from 0x1_0000_0000 to 0x1_ffff_ffff, so
> bit32 is always enabled. Thus, in the M4U, we always enable bit9
> for all PTEs, which means to enable bit32 of the physical address.

I got a little lost here. I get that you're trying to explain why you
always used to set bit32 of the physical address. But I don't totally
get the part about physical addresses being from 0x4000_0000 -
0x1_3fff_ffff, but also from 0x1_0000_0000-0x1_ffff_ffff. Are you
saying that the physical addresses from the iommu's perspective were
always >0x1_0000_0000? But then from whose perspective is it
0x4000_0000? ... oh, or you're saying there was some sort of remapping
facility that moved the physical addresses around?

>
> but in mt8183, the M4U supports dram from 0x4000_0000 to 0x3_ffff_ffff,
> which isn't remapped. We extend the PTEs: bit9 represents bit32 of the
> PA and bit4 represents bit33 of the PA. Meanwhile the iova is still
> 32 bits.
>
> In order to unify code, in the "4GB mode", we add the bit32 for the
> physical address manually in our driver.
>
> Correspondingly, adding bit32 and bit33 for the PA in iova_to_phys
> has to be moved into v7s.
>
> Regarding whether the pagetable address could be over 4GB, mt8183
> supports it while the previous mt8173 doesn't; thus keep it as is.
>
> Signed-off-by: Yong Wu 
> Reviewed-by: Robin Murphy 
> ---
>  drivers/iommu/io-pgtable-arm-v7s.c | 31 ---
>  drivers/iommu/io-pgtable.h |  7 +++
>  drivers/iommu/mtk_iommu.c  | 14 --
>  drivers/iommu/mtk_iommu.h  |  1 +
>  4 files changed, 36 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
> b/drivers/iommu/io-pgtable-arm-v7s.c
> index 11d8505..8803a35 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -124,7 +124,9 @@
>  #define ARM_V7S_TEX_MASK   0x7
>  #define ARM_V7S_ATTR_TEX(val)  (((val) & ARM_V7S_TEX_MASK) << 
> ARM_V7S_TEX_SHIFT)
>
> -#define ARM_V7S_ATTR_MTK_4GB   BIT(9) /* MTK extend it for 4GB mode 
> */
> +/* MediaTek extend the two bits below for over 4GB mode */
> +#define ARM_V7S_ATTR_MTK_PA_BIT32  BIT(9)
> +#define ARM_V7S_ATTR_MTK_PA_BIT33  BIT(4)

If other vendors start doing stuff like this we'll need a more generic
way to handle this... but I guess until we see a pattern this is okay.

>
>  /* *well, except for TEX on level 2 large pages, of course :( */
>  #define ARM_V7S_CONT_PAGE_TEX_SHIFT6
> @@ -183,13 +185,22 @@ static dma_addr_t __arm_v7s_dma_addr(void *pages)
>  static arm_v7s_iopte paddr_to_iopte(phys_addr_t paddr, int lvl,
> struct io_pgtable_cfg *cfg)
>  {
> -   return paddr & ARM_V7S_LVL_MASK(lvl);
> +   arm_v7s_iopte pte = paddr & ARM_V7S_LVL_MASK(lvl);
> +
> +   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_4GB) {
> +   if (paddr & BIT_ULL(32))
> +   pte |= ARM_V7S_ATTR_MTK_PA_BIT32;
> +   if (paddr & BIT_ULL(33))
> +   pte |= ARM_V7S_ATTR_MTK_PA_BIT33;
> +   }
> +   return pte;
>  }
>
>  static phys_addr_t iopte_to_paddr(arm_v7s_iopte pte, int lvl,
>   struct io_pgtable_cfg *cfg)
>  {
> arm_v7s_iopte mask;
> +   phys_addr_t paddr;
>
> if (ARM_V7S_PTE_IS_TABLE(pte, lvl))
> mask = ARM_V7S_TABLE_MASK;
> @@ -198,7 +209,14 @@ static phys_addr_t iopte_to_paddr(arm_v7s_iopte pte, int 
> lvl,
> else
> mask = ARM_V7S_LVL_MASK(lvl);
>
> -   return pte & mask;
> +   paddr = pte & mask;
> +   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_4GB) {
> +   if (pte & ARM_V7S_ATTR_MTK_PA_BIT32)
> +   paddr |= BIT_ULL(32);
> +   if (pte & ARM_V7S_ATTR_MTK_PA_BIT33)
> +   paddr |= BIT_ULL(33);
> +   }
> +   return paddr;
>  }
>
>  static arm_v7s_iopte *iopte_deref(arm_v7s_iopte pte, int lvl,
> @@ -315,9 +333,6 @@ static arm_v7s_iopte arm_v7s_prot_to_pte(int prot, int 
> lvl,
> if (lvl == 1 && (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS))
> pte |= ARM_V7S_ATTR_NS_SECTION;
>
> -   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_4GB)
> -   pte |= ARM_V7S_ATTR_MTK_4GB;
> -

So despite getting lost in the details, I guess the reason it's okay
that this goes from unconditional to conditional on bit32 is that
before, with the older chips, all physical addresses were above 4GB,
so we'll always see PA's above 4GB?

> return pte;
>  }
>
> @@ -504,7 +519,9 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, 
> unsigned long iova,
> 

Re: [PATCH v5 08/20] iommu/mediatek: Add larb-id remapped support

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 7:58 PM Yong Wu  wrote:
>
> The larb-id may be remapped in the smi-common; this means the
> larb-id reported in mtk_iommu_isr isn't the real larb-id.
>
>    Take mt8183 as an example:
>M4U
> |
> -
> |   SMI common  |
> -0-7-5-6-1-2--3-4- <- Id remapped
>  | | | | | |  | |
> larb0 larb1 IPU0  IPU1 larb4 larb5  larb6  CCU
> disp  vdec  img   cam   venc  imgcam
> As above, larb0 connects with the id 0 in smi-common.
>   larb1 connects with the id 7 in smi-common.
>   ...
> If the larb-id reported in the isr is 7, actually it's larb1(vdec).
> In order to output the right larb-id in the isr, we add a larb-id
> remapping relationship in this patch.
>
> If there is no such larb-id remapping in an SoC, use the linear
> mapping array instead.
>
> This also is a preparing patch for mt8183.
>
> Signed-off-by: Yong Wu 
> ---
>  drivers/iommu/mtk_iommu.c | 4 
>  drivers/iommu/mtk_iommu.h | 2 ++
>  2 files changed, 6 insertions(+)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 847082c..eca1536 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -220,6 +220,8 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
> fault_larb = F_MMU0_INT_ID_LARB_ID(regval);
> fault_port = F_MMU0_INT_ID_PORT_ID(regval);
>
> +   fault_larb = data->plat_data->larbid_remap[fault_larb];
> +
> if (report_iommu_fault(>domain, data->dev, fault_iova,
>write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) 
> {
> dev_err_ratelimited(
> @@ -742,12 +744,14 @@ static int __maybe_unused mtk_iommu_resume(struct 
> device *dev)
> .m4u_plat = M4U_MT2712,
> .has_4gb_mode = true,
> .has_bclk = true,
> +   .larbid_remap = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
>  };
>
>  static const struct mtk_iommu_plat_data mt8173_data = {
> .m4u_plat = M4U_MT8173,
> .has_4gb_mode = true,
> .has_bclk = true,
> +   .larbid_remap = {0, 1, 2, 3, 4, 5}, /* Linear mapping. */

I briefly considered bikeshedding about how to define these arrays in
a way that might save memory for linear-map devices, but then decided
this is fine.

Reviewed-by: Evan Green 
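For illustration, the remap table implied by the diagram above would be the
following (a sketch inferred from the diagram, not copied from the series),
indexed by the id reported by smi-common:

	static const struct mtk_iommu_plat_data mt8183_data = {
		.m4u_plat     = M4U_MT8183,
		/* e.g. remap[7] == 1, so a reported id 7 maps back to larb1 */
		.larbid_remap = {0, 4, 5, 6, 7, 2, 3, 1},
	};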
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 10/20] iommu/mediatek: Move reset_axi into plat_data

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 7:58 PM Yong Wu  wrote:
>
> In mt8173 and mt8183, 0x48 is REG_MMU_STANDARD_AXI_MODE, while in
> the other SoCs it is extended to REG_MMU_CTRL, which contains
> _STANDARD_AXI_MODE. I move this property to plat_data since both
> mt8173 and mt8183 use it.
>
> It is a preparing patch for mt8183.
>
> Signed-off-by: Yong Wu 
> ---
>  drivers/iommu/mtk_iommu.c | 4 ++--
>  drivers/iommu/mtk_iommu.h | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 35a1263..8d8ab21 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -558,8 +558,7 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data 
> *data)
> }
> writel_relaxed(0, data->base + REG_MMU_DCM_DIS);
>
> -   /* It's MISC control register whose default value is ok except 
> mt8173.*/
> -   if (data->plat_data->m4u_plat == M4U_MT8173)
> +   if (data->plat_data->reset_axi)
> writel_relaxed(0, data->base + REG_MMU_STANDARD_AXI_MODE);

The commit description makes it sound like the overall format of the
register is the same, but the "other SoCs" have some extra bits they'd
like to leave alone. Would it be easier to do a read-modify-write to
always clear some bits in the register, instead of doing something
based on the SoC? Or do the bits mean completely different things in
the different versions (in which case what you've got makes sense to
me)?
-Evan
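For reference, the read-modify-write variant suggested here would look roughly
like this (a sketch; F_MMU_STANDARD_AXI_MODE_MASK is a made-up name, and it
assumes the AXI-mode bits sit in the same position on every SoC, which the
reply in this thread says is not the case):

	regval = readl_relaxed(data->base + REG_MMU_STANDARD_AXI_MODE);
	regval &= ~F_MMU_STANDARD_AXI_MODE_MASK;	/* hypothetical mask */
	writel_relaxed(regval, data->base + REG_MMU_STANDARD_AXI_MODE);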
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 11/20] iommu/mediatek: Move vld_pa_rng into plat_data

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 7:58 PM Yong Wu  wrote:
>
> Both mt8173 and mt8183 don't have this vld_pa_rng (valid physical address
> range) register, while mt2712 does. Move it into the plat_data.
>
> Signed-off-by: Yong Wu 
> ---
>  drivers/iommu/mtk_iommu.c | 3 ++-
>  drivers/iommu/mtk_iommu.h | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 8d8ab21..2913ddb 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -548,7 +548,7 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data 
> *data)
>  upper_32_bits(data->protect_base);
> writel_relaxed(regval, data->base + REG_MMU_IVRP_PADDR);
>
> -   if (data->enable_4GB && data->plat_data->m4u_plat != M4U_MT8173) {
> +   if (data->enable_4GB && data->plat_data->vld_pa_rng) {
> /*
>  * If 4GB mode is enabled, the validate PA range is from
>  * 0x1_0000_0000 to 0x1_ffff_ffff. here record bit[32:30].
> @@ -741,6 +741,7 @@ static int __maybe_unused mtk_iommu_resume(struct device 
> *dev)
> .m4u_plat = M4U_MT2712,
> .has_4gb_mode = true,
> .has_bclk = true,
> +   .vld_pa_rng   = true,
> .larbid_remap = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
>  };
>
> diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
> index b46aeaa..a8c5d1e 100644
> --- a/drivers/iommu/mtk_iommu.h
> +++ b/drivers/iommu/mtk_iommu.h
> @@ -48,6 +48,7 @@ struct mtk_iommu_plat_data {
> /* HW will use the EMI clock if there isn't the "bclk". */
> boolhas_bclk;
> boolreset_axi;
> +   boolvld_pa_rng;

I agree with Nicolas that valid_pa_range would be much clearer...
although, even now that I know what it's supposed to mean, I don't get
what it represents. What is this saying?

-Evan
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 05/20] iommu/io-pgtable-arm-v7s: Add paddr_to_iopte and iopte_to_paddr helpers

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 7:57 PM Yong Wu  wrote:
>
> Add two helper functions: paddr_to_iopte and iopte_to_paddr.
>
> Signed-off-by: Yong Wu 
> Reviewed-by: Robin Murphy 

Reviewed-by: Evan Green 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 12/20] memory: mtk-smi: Add gals support

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 7:58 PM Yong Wu  wrote:
>
> In some SoCs like mt8183, SMI adds a GALS (Global Async Local Sync) module
> which helps synchronize modules running at different clock frequencies.
> It can be seen as an "asynchronous fifo". This is an example diagram:
>
> M4U
>  |
>  --
>  ||
>  gals0-rx   gals1-rx
>  ||
>  ||
>  gals0-tx   gals1-tx
>  ||
> 
>  SMI Common
> 
>  |
>   +-++-+- ...
>   | || |
>   |  gals-rx  gals-rx  |
>   | || |
>   | || |
>   |  gals-tx  gals-tx  |
>   | || |
> larb1 larb2   larb3  larb4
>
> GALS only helps transfer the command/data and doesn't have any
> configuration registers, thus it has the special "smi" clock and doesn't
> have the "apb" clock. From the diagram above, we add "gals0" and
> "gals1" clocks for smi-common and add a "gals" clock for smi-larb.
>
> This patch adds gals clock support in the SMI. Note that some larbs
> may still not have the "gals" clock, like larb1 and larb4 above.
>
> This is also a preparing patch for mt8183 which has GALS.
>
> Signed-off-by: Yong Wu 

So really from a software perspective the gals are just a couple of
extra clocks that need to be turned on for certain larbs. Seems fine
to me.

Reviewed-by: Evan Green 
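As a minimal sketch (assumed, not quoted from the patch), the extra GALS
clocks are only looked up when the platform data says they exist, e.g. for
smi-common:

	if (common->plat->has_gals) {
		common->clk_gals0 = devm_clk_get(dev, "gals0");
		if (IS_ERR(common->clk_gals0))
			return PTR_ERR(common->clk_gals0);
		common->clk_gals1 = devm_clk_get(dev, "gals1");
		if (IS_ERR(common->clk_gals1))
			return PTR_ERR(common->clk_gals1);
	}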
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 09/20] iommu/mediatek: Refine protect memory definition

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 7:58 PM Yong Wu  wrote:
>
> The protect memory setting is a little different in the different SoCs.
> In the register REG_MMU_CTRL_REG(0x110), the TF_PROT (translation fault
> protect) field is normally at shift 4, while it is at shift 5 only on
> mt8173. This patch deletes the complex MACRO and uses a common if-else
> instead.
>
> Also, use "F_MMU_TF_PROT_TO_PROGRAM_ADDR" instead of the hard-coded (2),
> which means the M4U will output the dirty data to the programmed
> address that we allocated dynamically when a translation fault occurs.
>
> Signed-off-by: Yong Wu 
> ---
> @Nicolas, I don't put it in the plat_data since only the older mt8173
> uses shift 5. As far as I know, the latest SoCs always use the new setting,
> like mt2712 and mt8183. Thus, I think it is unnecessary to put it in
> plat_data and have all the latest SoCs set it. Hence, I still keep
> "== mt8173" for this, like the reg REG_MMU_CTRL_REG.
> ---
>  drivers/iommu/mtk_iommu.c | 12 +---
>  1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index eca1536..35a1263 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -53,11 +53,7 @@
>
>  #define REG_MMU_CTRL_REG   0x110
>  #define F_MMU_PREFETCH_RT_REPLACE_MOD  BIT(4)
> -#define F_MMU_TF_PROTECT_SEL_SHIFT(data) \
> -   ((data)->plat_data->m4u_plat == M4U_MT2712 ? 4 : 5)
> -/* It's named by F_MMU_TF_PROT_SEL in mt2712. */
> -#define F_MMU_TF_PROTECT_SEL(prot, data) \
> -   (((prot) & 0x3) << F_MMU_TF_PROTECT_SEL_SHIFT(data))
> +#define F_MMU_TF_PROT_TO_PROGRAM_ADDR  2
>
>  #define REG_MMU_IVRP_PADDR 0x114
>
> @@ -521,9 +517,11 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data 
> *data)
> return ret;
> }
>
> -   regval = F_MMU_TF_PROTECT_SEL(2, data);
> if (data->plat_data->m4u_plat == M4U_MT8173)
> -   regval |= F_MMU_PREFETCH_RT_REPLACE_MOD;
> +   regval = F_MMU_PREFETCH_RT_REPLACE_MOD |
> + (F_MMU_TF_PROT_TO_PROGRAM_ADDR << 5);
> +   else
> +   regval = F_MMU_TF_PROT_TO_PROGRAM_ADDR << 4;

I agree with Nicolas with regard to not having the random 4 and 5
sprinkled in the function.
-Evan
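One possible shape (a sketch, not taken from the series) that keeps the bare
4/5 shifts out of the function body is to fold the shift into the definitions:

	#define F_MMU_TF_PROT_TO_PROGRAM_ADDR		(2 << 4)
	#define F_MMU_TF_PROT_TO_PROGRAM_ADDR_MT8173	(2 << 5)

	if (data->plat_data->m4u_plat == M4U_MT8173)
		regval = F_MMU_PREFETCH_RT_REPLACE_MOD |
			 F_MMU_TF_PROT_TO_PROGRAM_ADDR_MT8173;
	else
		regval = F_MMU_TF_PROT_TO_PROGRAM_ADDR;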
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 14/20] iommu/mediatek: Add mmu1 support

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 7:59 PM Yong Wu  wrote:
>
> Normally the M4U HW connects the EMI with the smi. The diagram is like below:
>   EMI
>|
>   M4U
>|
> smi-common
>|
>-
>||| |...
> larb0 larb1  larb2 larb3
>
> Actually there are 2 mmu cells in the M4U HW, like this diagram:
>
>   EMI
>-
> | |
>mmu0  mmu1 <- M4U
> | |
>-
>|
> smi-common
>|
>-
>||| |...
> larb0 larb1  larb2 larb3
>
> This patch adds support for mmu1. In order to get better performance,
> we could adjust some larbs to go to mmu1 while the others still go to
> mmu0. This is controlled by an SMI COMMON register, SMI_BUS_SEL(0x220).
>
> The mt2712, mt8173 and mt8183 M4U HW all have 2 mmu cells. The default
> value of that register is 0, which means all the larbs go to mmu0
> by default.
>
> This is a preparing patch for adjusting SMI_BUS_SEL for mt8183.
>
> Signed-off-by: Yong Wu 
> ---
>  drivers/iommu/mtk_iommu.c | 47 
> +--
>  1 file changed, 29 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 66e3615..7fcef19 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -70,27 +70,32 @@
>  #define F_MISS_FIFO_ERR_INT_EN BIT(6)
>  #define F_INT_CLR_BIT  BIT(12)
>
> -#define REG_MMU_INT_MAIN_CONTROL   0x124
> -#define F_INT_TRANSLATION_FAULTBIT(0)
> -#define F_INT_MAIN_MULTI_HIT_FAULT BIT(1)
> -#define F_INT_INVALID_PA_FAULT BIT(2)
> -#define F_INT_ENTRY_REPLACEMENT_FAULT  BIT(3)
> -#define F_INT_TLB_MISS_FAULT   BIT(4)
> -#define F_INT_MISS_TRANSACTION_FIFO_FAULT  BIT(5)
> -#define F_INT_PRETETCH_TRANSATION_FIFO_FAULT   BIT(6)
> +#define REG_MMU_INT_MAIN_CONTROL   0x124 /* mmu0 | mmu1 */

The comment being on that line is kind of weird, since the comment
really applies to the lines below it. Maybe the comment should be on
its own line, or on the TRANSLATION_FAULT line.

Other than that,
Reviewed-by: Evan Green 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 17/20] memory: mtk-smi: Get rid of need_larbid

2019-01-30 Thread Evan Green
On Mon, Dec 31, 2018 at 7:59 PM Yong Wu  wrote:
>
> The "mediatek,larb-id" has already been parsed in MTK IOMMU driver.
> It's no need to parse it again in SMI driver. Only clean some codes.
> This patch is fit for all the current mt2701, mt2712, mt7623, mt8173
> and mt8183.
>
> After this patch, the "mediatek,larb-id" only be needed for mt2712
> which have 2 M4Us. In the other SoCs, we can get the larb-id from M4U
> in which the larbs in the "mediatek,larbs" always are ordered.
>
> CC: Matthias Brugger 
> Signed-off-by: Yong Wu 
> ---
>  drivers/memory/mtk-smi.c | 26 ++
>  1 file changed, 2 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
> index 08cf40d..10e6493 100644
> --- a/drivers/memory/mtk-smi.c
> +++ b/drivers/memory/mtk-smi.c
> @@ -67,7 +67,6 @@ struct mtk_smi_common_plat {
>  };
>
>  struct mtk_smi_larb_gen {
> -   bool need_larbid;
> int port_in_larb[MTK_LARB_NR_MAX + 1];
> void (*config_port)(struct device *);
> unsigned int larb_direct_to_common_mask;
> @@ -153,18 +152,9 @@ void mtk_smi_larb_put(struct device *larbdev)
> struct mtk_smi_iommu *smi_iommu = data;
> unsigned int i;
>
> -   if (larb->larb_gen->need_larbid) {
> -   larb->mmu = _iommu->larb_imu[larb->larbid].mmu;
> -   return 0;
> -   }
> -
> -   /*
> -* If there is no larbid property, Loop to find the corresponding
> -* iommu information.
> -*/
> -   for (i = 0; i < smi_iommu->larb_nr; i++) {
> +   for (i = 0; i < MTK_LARB_NR_MAX; i++) {

Looks like this was the only use of mtk_smi_iommu.larb_nr. Should we
remove that now?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jerome Glisse
On Wed, Jan 30, 2019 at 12:52:44PM -0700, Logan Gunthorpe wrote:
> 
> 
> On 2019-01-30 12:22 p.m., Jerome Glisse wrote:
> > On Wed, Jan 30, 2019 at 06:56:59PM +, Jason Gunthorpe wrote:
> >> On Wed, Jan 30, 2019 at 10:17:27AM -0700, Logan Gunthorpe wrote:
> >>>
> >>>
> >>> On 2019-01-29 9:18 p.m., Jason Gunthorpe wrote:
>  Every attempt to give BAR memory to struct page has run into major
>  trouble, IMHO, so I like that this approach avoids that.
> 
>  And if you don't have struct page then the only kernel object left to
>  hang meta data off is the VMA itself.
> 
>  It seems very similar to the existing P2P work between in-kernel
>  consumers, just that VMA is now mediating a general user space driven
>  discovery process instead of being hard wired into a driver.
> >>>
> >>> But the kernel now has P2P bars backed by struct pages and it works
> >>> well. 
> >>
> >> I don't think it works that well..
> >>
> >> We ended up with a 'sgl' that is not really a sgl, and doesn't work
> >> with many of the common SGL patterns. sg_copy_buffer doesn't work,
> >> dma_map, doesn't work, sg_page doesn't work quite right, etc.
> >>
> >> Only nvme and rdma got the special hacks to make them understand these
> >> p2p-sgls, and I'm still not convinced some of the RDMA drivers that
> >> want access to CPU addresses from the SGL (rxe, usnic, hfi, qib) don't
> >> break in this scenario.
> >>
> >> Since the SGLs become broken, it pretty much means there is no path to
> >> make GUP work generically, we have to go through and make everything
> >> safe to use with p2p-sgls before allowing GUP. Which, frankly, sounds
> >> impossible with all the competing objections.
> >>
> >> But GPU seems to have a problem unrelated to this - what Jerome wants
> >> is to have two faulting domains for VMA's - visible-to-cpu and
> >> visible-to-dma. The new op is essentially faulting the pages into the
> >> visible-to-dma category and leaving them invisible-to-cpu.
> >>
> >> So that duality would still have to exists, and I think p2p_map/unmap
> >> is a much simpler implementation than trying to create some kind of
> >> special PTE in the VMA..
> >>
> >> At least for RDMA, struct page or not doesn't really matter. 
> >>
> >> We can make struct pages for the BAR the same way NVMe does.  GPU is
> >> probably the same, just with more mememory at stake?  
> >>
> >> And maybe this should be the first implementation. The p2p_map VMA
> >> operation should return a SGL and the caller should do the existing
> >> pci_p2pdma_map_sg() flow.. 
> > 
> > For GPU it would not work, GPU might want to use main memory (because
> > it is running out of BAR space) it is a lot easier if the p2p_map
> > callback calls the right dma map function (for page or io) rather than
> > having to define some format that would pass down the information.
> 
> >>
> >> Worry about optimizing away the struct page overhead later?
> > 
> > Struct page do not fit well for GPU as the BAR address can be reprogram
> > to point to any page inside the device memory (think 256M BAR versus
> > 16GB device memory). Forcing struct page on GPU driver would require
> > major surgery to the GPU driver inner working and there is no benefit
> > to have from the struct page. So it is hard to justify this.
> 
> I think we have to consider the struct pages to track the address space,
> not what backs it (essentially what HMM is doing). If we need to add
> operations for the driver to map the address space/struct pages back to
> physical memory then do that. Creating a whole new idea that's tied to
> userspace VMAs still seems wrong to me.

The VMA is the object RDMA works on, and GPU drivers have been working with
VMAs too, where a VMA is tied to only one specific GPU object. So the most
disruptive approach here is using struct page. It was never used and
will not be used in many drivers. Updating those to struct page is too
risky and too many changes. The vma callback is something you can
remove at any time if you have something better that does not need major
surgery to the GPU drivers.

Cheers,
Jérôme
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/5 v5] Fix virtio-blk issue with SWIOTLB

2019-01-30 Thread Michael S. Tsirkin
On Wed, Jan 30, 2019 at 01:35:13PM -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 30, 2019 at 05:40:02PM +0100, Joerg Roedel wrote:
> > Hi,
> > 
> > here is the next version of this patch-set. Previous
> > versions can be found here:
> > 
> > V1: https://lore.kernel.org/lkml/20190110134433.15672-1-j...@8bytes.org/
> > 
> > V2: https://lore.kernel.org/lkml/20190115132257.6426-1-j...@8bytes.org/
> > 
> > V3: https://lore.kernel.org/lkml/20190123163049.24863-1-j...@8bytes.org/
> > 
> > V4: https://lore.kernel.org/lkml/20190129084342.26030-1-j...@8bytes.org/
> > 
> > The problem solved here is a limitation of the SWIOTLB implementation,
> > which does not support allocations larger than 256kb.  When the
> > virtio-blk driver tries to read/write a block larger than that, the
> > allocation of the dma-handle fails and an IO error is reported.
> > 
> > Changes to v4 are:
> > 
> > - Added Reviewed-by tags from Christoph
> > 
> > - Added missing EXPORT_SYMBOL(_GPL) lines
> > 
> > Please review.
> 
> I can put it in my tree and send it to Linus .. unless folks want
> to do it through a different tree?

I queued it in my tree as it seems virtio specific.

> 
> > 
> > Thanks,
> > 
> > Joerg
> > Joerg Roedel (5):
> >   swiotlb: Introduce swiotlb_max_mapping_size()
> >   swiotlb: Add is_swiotlb_active() function
> >   dma: Introduce dma_max_mapping_size()
> >   virtio: Introduce virtio_max_dma_size()
> >   virtio-blk: Consider virtio_max_dma_size() for maximum segment size
> > 
> >  Documentation/DMA-API.txt|  8 
> >  drivers/block/virtio_blk.c   | 10 ++
> >  drivers/virtio/virtio_ring.c | 11 +++
> >  include/linux/dma-mapping.h  | 16 
> >  include/linux/swiotlb.h  | 11 +++
> >  include/linux/virtio.h   |  2 ++
> >  kernel/dma/direct.c  | 12 
> >  kernel/dma/swiotlb.c | 14 ++
> >  8 files changed, 80 insertions(+), 4 deletions(-)
> > 
> > -- 
> > 2.17.1
> > 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jerome Glisse
On Wed, Jan 30, 2019 at 10:51:55PM +, Jason Gunthorpe wrote:
> On Wed, Jan 30, 2019 at 05:47:05PM -0500, Jerome Glisse wrote:
> > On Wed, Jan 30, 2019 at 10:33:04PM +, Jason Gunthorpe wrote:
> > > On Wed, Jan 30, 2019 at 05:30:27PM -0500, Jerome Glisse wrote:
> > > 
> > > > > What is the problem in the HMM mirror that it needs this restriction?
> > > > 
> > > > No restriction at all here. I think i just wasn't understood.
> > > 
> > > Are you are talking about from the exporting side - where the thing
> > > creating the VMA can really only put one distinct object into it?
> > 
> > The message i was trying to get accross is that HMM mirror will
> > always succeed for everything* except for special vma ie mmap of
> > device file. For those it can only succeed if a p2p_map() call
> > succeed.
> > 
> > So any user of HMM mirror might to know why the mirroring fail ie
> > was it because something exceptional is happening ? Or is it because
> > i was trying to map a special vma which can be forbiden.
> > 
> > Hence why i assume that you might want to know about such p2p_map
> > failure at the time you create the umem odp object as it might be
> > some failure you might want to report differently and handle
> > differently. If you do not care about differentiating OOM or
> > exceptional failure from p2p_map failure than you have nothing to
> > worry about you will get the same error from HMM for both.
> 
> I think my hope here was that we could have some kind of 'trial'
> interface where very early users can call
> 'hmm_mirror_is_maybe_supported(dev, user_ptr, len)' and get a failure
> indication.
> 
> We probably wouldn't call this on the full address space though

Yes, we can do a special wrapper around the general case that allows the
caller to differentiate failures. So at creation you call the special
flavor and get a proper distinction between errors. Afterward, during
normal operation, any failure is treated in the same way no matter what
the reason is (munmap, mremap, mprotect, ...).


> Beyond that it is just inevitable there can be problems faulting if
> the memory map is messed with after MR is created.
> 
> And here again, I don't want to worry about any particular VMA
> boundaries..

You do not have to worry about boundaries: HMM will return -EFAULT
if there is no valid vma behind the address you are trying to map
(or if the vma prot does not allow you to access it). So then you
can handle that failure just like you do now, and my ODP HMM
patch preserves that.

Cheers,
Jérôme
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Jason Gunthorpe
On Wed, Jan 30, 2019 at 03:52:13PM -0700, Logan Gunthorpe wrote:
> 
> 
> On 2019-01-30 2:50 p.m., Jason Gunthorpe wrote:
> > On Wed, Jan 30, 2019 at 02:01:35PM -0700, Logan Gunthorpe wrote:
> > 
> >> And I feel the GUP->SGL->DMA flow should still be what we are aiming
> >> for. Even if we need a special GUP for special pages, and a special DMA
> >> map; and the SGL still has to be homogenous
> > 
> > *shrug* so what if the special GUP called a VMA op instead of
> > traversing the VMA PTEs today? Why does it really matter? It could
> > easily change to a struct page flow tomorrow..
> 
> Well it's so that it's composable. We want the SGL->DMA side to work for
> APIs from kernel space and not have to run a completely different flow
> for kernel drivers than from userspace memory.

If we want to have these DMA-only SGLs anyhow, then the kernel flow
can use them too.

In the kernel it easier because the 'exporter' already knows it is
working with BAR memory, so it can just do something like this:

struct dma_sg_table *sgl_dma_map_pci_bar(struct pci_device *from,
 struct device *to,
 unsigned long bar_ptr,
 size_t length)

And then it falls down the same DMA-SGL-only kind of flow that would
exist to support the user side. ie it is the kernel version of the API
I described below.
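A hypothetical use of that sketch, just to make the flow concrete (every name
here is assumed; nothing like it exists today):

	struct dma_sg_table *sgt;

	sgt = sgl_dma_map_pci_bar(p2p_pdev, dma_dev, bar_vaddr, SZ_1M);
	if (IS_ERR(sgt))
		return PTR_ERR(sgt);
	/* program dma_dev's engine from sg_dma_address()/sg_dma_len(),
	 * never touching the entries with the CPU */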

> For GUP to do a special VMA traversal it would now need to return
> something besides struct pages which means no SGL and it means a
> completely different DMA mapping call.

GUP cannot support BAR memory because it must only return CPU memory -
I think too many callers make this assumption for it to be possible to
change it.. (see below)

A new-GUP can return DMA addresses - so it doesn't have this problem.

> > Would you feel better if this also came along with a:
> > 
> >   struct dma_sg_table *sgl_dma_map_user(struct device *dma_device, 
> >  void __user *prt, size_t len)
> 
> That seems like a nice API. But certainly the implementation would need
> to use existing dma_map or pci_p2pdma_map calls, or whatever as part of
> it...

I wonder how Jerome worked the translation, I haven't looked yet..

> We actually stopped caring about the __iomem problem. We are working
> under the assumption that pages returned by devm_memremap_pages() can be
> accessed as normal RAM and does not need the __iomem designation.

As far as CPU mapped uncached BAR memory goes, this is broadly not
true.

s390x for instance requires dedicated CPU instructions to access BAR
memory.

x86 will fault if you attempt to use a SSE algorithm on uncached BAR
memory. The kernel has many SSE accelerated algorithms so you can
never pass these special p2p SGL's through to them either. (think
parity or encryption transformations through the block stack)

Many platforms have limitations on alignment and access size for BAR
memory - you can't just blindly call memcpy and expect it to work.
(TPM is actually struggling with this now, confusingly different
versions of memcpy are giving different results on some x86 io memory)

Other platforms might fault or corrupt if an unaligned access is
attempted to BAR memory.

In short - I don't see a way to avoid knowing about __iomem in the
sgl. There are too many real use cases that require this knowledge,
and too many places that touch the SGL pages with the CPU.

I think we must have 'DMA only' SGLs and code paths that are known
DMA-only clean to make it work properly with BAR memory.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 10/20] iommu/mediatek: Move reset_axi into plat_data

2019-01-30 Thread Yong Wu
On Wed, 2019-01-30 at 10:30 -0800, Evan Green wrote:
> On Mon, Dec 31, 2018 at 7:58 PM Yong Wu  wrote:
> >
> > In mt8173 and mt8183, 0x48 is REG_MMU_STANDARD_AXI_MODE while
> > it is extended to REG_MMU_CTRL which contains _STANDARD_AXI_MODE in
> > the other SoCs. I move this property to plat_data since both mt8173
> > and mt8183 use this property.
> >
> > It is a preparing patch for mt8183.
> >
> > Signed-off-by: Yong Wu 
> > ---
> >  drivers/iommu/mtk_iommu.c | 4 ++--
> >  drivers/iommu/mtk_iommu.h | 2 +-
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index 35a1263..8d8ab21 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -558,8 +558,7 @@ static int mtk_iommu_hw_init(const struct 
> > mtk_iommu_data *data)
> > }
> > writel_relaxed(0, data->base + REG_MMU_DCM_DIS);
> >
> > -   /* It's MISC control register whose default value is ok except 
> > mt8173.*/
> > -   if (data->plat_data->m4u_plat == M4U_MT8173)
> > +   if (data->plat_data->reset_axi)
> > writel_relaxed(0, data->base + REG_MMU_STANDARD_AXI_MODE);
> 
> The commit description makes it sound like the overall format of the
> register is the same, but the "other SoCs" have some extra bits they'd
> like to leave alone. Would it be easier to do a read-modify-write to
> always clear some bits in the register, instead of doing something
> based on the SoC? Or do the bits mean completely different things in
> the different versions (in which case what you've got makes sense to
> me)?

The bits' meanings are completely different. (The AXI bit position is also
different. I will add this to the comment of this patch.)

> -Evan


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 17/20] memory: mtk-smi: Get rid of need_larbid

2019-01-30 Thread Yong Wu
On Wed, 2019-01-30 at 11:11 -0800, Evan Green wrote:
> On Mon, Dec 31, 2018 at 7:59 PM Yong Wu  wrote:
> >
> > The "mediatek,larb-id" has already been parsed in MTK IOMMU driver.
> > It's no need to parse it again in SMI driver. Only clean some codes.
> > This patch is fit for all the current mt2701, mt2712, mt7623, mt8173
> > and mt8183.
> >
> > After this patch, the "mediatek,larb-id" only be needed for mt2712
> > which have 2 M4Us. In the other SoCs, we can get the larb-id from M4U
> > in which the larbs in the "mediatek,larbs" always are ordered.
> >
> > CC: Matthias Brugger 
> > Signed-off-by: Yong Wu 
> > ---
> >  drivers/memory/mtk-smi.c | 26 ++
> >  1 file changed, 2 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
> > index 08cf40d..10e6493 100644
> > --- a/drivers/memory/mtk-smi.c
> > +++ b/drivers/memory/mtk-smi.c
> > @@ -67,7 +67,6 @@ struct mtk_smi_common_plat {
> >  };
> >
> >  struct mtk_smi_larb_gen {
> > -   bool need_larbid;
> > int port_in_larb[MTK_LARB_NR_MAX + 1];
> > void (*config_port)(struct device *);
> > unsigned int larb_direct_to_common_mask;
> > @@ -153,18 +152,9 @@ void mtk_smi_larb_put(struct device *larbdev)
> > struct mtk_smi_iommu *smi_iommu = data;
> > unsigned int i;
> >
> > -   if (larb->larb_gen->need_larbid) {
> > -   larb->mmu = _iommu->larb_imu[larb->larbid].mmu;
> > -   return 0;
> > -   }
> > -
> > -   /*
> > -* If there is no larbid property, Loop to find the corresponding
> > -* iommu information.
> > -*/
> > -   for (i = 0; i < smi_iommu->larb_nr; i++) {
> > +   for (i = 0; i < MTK_LARB_NR_MAX; i++) {
> 
> Looks like this was the only use of mtk_smi_iommu.larb_nr. Should we
> remove that now?

This is necessary because of mt2712, which has two M4U HWs.

From its dtsi[1], taking iommu1 as an example, its larb_nr is only 3, but
we need to scan all the larbs.

[1]
http://lists.infradead.org/pipermail/linux-mediatek/2018-December/016119.html

> 
> ___
> Linux-mediatek mailing list
> linux-media...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-mediatek


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 14/20] iommu/mediatek: Add mmu1 support

2019-01-30 Thread Yong Wu
On Wed, 2019-01-30 at 10:55 -0800, Evan Green wrote:
> On Mon, Dec 31, 2018 at 7:59 PM Yong Wu  wrote:
> >
> > Normally the M4U HW connect EMI with smi. the diagram is like below:
> >   EMI
> >|
> >   M4U
> >|
> > smi-common
> >|
> >-
> >||| |...
> > larb0 larb1  larb2 larb3
> >
> > Actually there are 2 mmu cells in the M4U HW, like this diagram:
> >
> >   EMI
> >-
> > | |
> >mmu0  mmu1 <- M4U
> > | |
> >-
> >|
> > smi-common
> >|
> >-
> >||| |...
> > larb0 larb1  larb2 larb3
> >
> > This patch add support for mmu1. In order to get better performance,
> > we could adjust some larbs go to mmu1 while the others still go to
> > mmu0. This is controlled by a SMI COMMON register SMI_BUS_SEL(0x220).
> >
> > mt2712, mt8173 and mt8183 M4U HW all have 2 mmu cells. the default
> > value of that register is 0 which means all the larbs go to mmu0
> > defaultly.
> >
> > This is a preparing patch for adjusting SMI_BUS_SEL for mt8183.
> >
> > Signed-off-by: Yong Wu 
> > ---
> >  drivers/iommu/mtk_iommu.c | 47 
> > +--
> >  1 file changed, 29 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index 66e3615..7fcef19 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -70,27 +70,32 @@
> >  #define F_MISS_FIFO_ERR_INT_EN BIT(6)
> >  #define F_INT_CLR_BIT  BIT(12)
> >
> > -#define REG_MMU_INT_MAIN_CONTROL   0x124
> > -#define F_INT_TRANSLATION_FAULTBIT(0)
> > -#define F_INT_MAIN_MULTI_HIT_FAULT BIT(1)
> > -#define F_INT_INVALID_PA_FAULT BIT(2)
> > -#define F_INT_ENTRY_REPLACEMENT_FAULT  BIT(3)
> > -#define F_INT_TLB_MISS_FAULT   BIT(4)
> > -#define F_INT_MISS_TRANSACTION_FIFO_FAULT  BIT(5)
> > -#define F_INT_PRETETCH_TRANSATION_FIFO_FAULT   BIT(6)
> > +#define REG_MMU_INT_MAIN_CONTROL   0x124 /* mmu0 | mmu1 */
> 
> The comment being on that line is kind of weird, since the comment
> really applies to the lines below it. Maybe the comment should be on
> its own line, or on the TRANSLATION_FAULT line.

Sharp eye. You are right, this comment applies to the lines below.
But if I move it below, then the next line will be over 80 chars.

How about I add a "below:" like this:
> +#define REG_MMU_INT_MAIN_CONTROL   0x124 /* below: mmu0 | mmu1 */

> 
> Other than that,
> Reviewed-by: Evan Green 


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 06/20] iommu/io-pgtable-arm-v7s: Extend MediaTek 4GB Mode

2019-01-30 Thread Yong Wu
On Wed, 2019-01-30 at 10:28 -0800, Evan Green wrote:
> On Mon, Dec 31, 2018 at 7:57 PM Yong Wu  wrote:
> >
> > MediaTek extend the arm v7s descriptor to support the dram over 4GB.
> >
> > In the mt2712 and mt8173, it's called "4GB mode", the physical address
> > is from 0x4000_ to 0x1_3fff_, but from EMI point of view, it
> > is remapped to high address from 0x1__ to 0x1__, the
> > bit32 is always enabled. thus, in the M4U, we always enable the bit9
> > for all PTEs which means to enable bit32 of physical address.
> 
> I got a little lost here. I get that you're trying to explain why you
> always used to set bit32 of the physical address. But I don't totally
> get the part about physical addresses being from 0x4000_ -
> 0x1_3fff_, but also from 0x1__-0x1__. Are you
> saying that the physical addresses from the iommu's perspective were
> always >0x1__? 

Yes. From the IOMMU's perspective, the physical address is from
0x1_0000_0000 to 0x1_ffff_ffff.

> But then from whose perspective is it 0x4000_? ... 

I guess from the SW point of view it is from 0x4000_0000 to 0x1_3fff_ffff.

If 4GB mode is enabled, the memory property in the dts looks like this:

memory@40000000 {
device_type = "memory";
reg = <0 0x40000000 0x0001 0x00000000>;
};

> oh, or you're saying there was some sort of remapping
> facility that moved the physical addresses around?
> 
> >
> > but in mt8183, M4U support the dram from 0x4000_ to 0x3__
> > which isn't remaped. We extend the PTEs: the bit9 represent bit32 of
> > PA and the bit4 represent bit33 of PA. Meanwhile the iova still is
> > 32bits.
> >
> > In order to unify code, in the "4GB mode", we add the bit32 for the
> > physical address manually in our driver.
> >
> > Correspondingly, Adding bit32 and bit33 for the PA in the iova_to_phys
> > has to been moved into v7s.
> >
> > Regarding whether the pagetable address could be over 4GB, the mt8183
> > support it while the previous mt8173 don't. thus keep it as is.
> >
> > Signed-off-by: Yong Wu 
> > Reviewed-by: Robin Murphy 
> > ---
> >  drivers/iommu/io-pgtable-arm-v7s.c | 31 ---
> >  drivers/iommu/io-pgtable.h |  7 +++
> >  drivers/iommu/mtk_iommu.c  | 14 --
> >  drivers/iommu/mtk_iommu.h  |  1 +
> >  4 files changed, 36 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
> > b/drivers/iommu/io-pgtable-arm-v7s.c
> > index 11d8505..8803a35 100644
> > --- a/drivers/iommu/io-pgtable-arm-v7s.c
> > +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> > @@ -124,7 +124,9 @@
> >  #define ARM_V7S_TEX_MASK   0x7
> >  #define ARM_V7S_ATTR_TEX(val)  (((val) & ARM_V7S_TEX_MASK) << 
> > ARM_V7S_TEX_SHIFT)
> >
> > -#define ARM_V7S_ATTR_MTK_4GB   BIT(9) /* MTK extend it for 4GB 
> > mode */
> > +/* MediaTek extend the two bits below for over 4GB mode */
> > +#define ARM_V7S_ATTR_MTK_PA_BIT32  BIT(9)
> > +#define ARM_V7S_ATTR_MTK_PA_BIT33  BIT(4)
> 
> If other vendors start doing stuff like this we'll need a more generic
> way to handle this... but I guess until we see a pattern this is okay.
> 
> >
> >  /* *well, except for TEX on level 2 large pages, of course :( */
> >  #define ARM_V7S_CONT_PAGE_TEX_SHIFT6
> > @@ -183,13 +185,22 @@ static dma_addr_t __arm_v7s_dma_addr(void *pages)
> >  static arm_v7s_iopte paddr_to_iopte(phys_addr_t paddr, int lvl,
> > struct io_pgtable_cfg *cfg)
> >  {
> > -   return paddr & ARM_V7S_LVL_MASK(lvl);
> > +   arm_v7s_iopte pte = paddr & ARM_V7S_LVL_MASK(lvl);
> > +
> > +   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_4GB) {
> > +   if (paddr & BIT_ULL(32))
> > +   pte |= ARM_V7S_ATTR_MTK_PA_BIT32;
> > +   if (paddr & BIT_ULL(33))
> > +   pte |= ARM_V7S_ATTR_MTK_PA_BIT33;
> > +   }
> > +   return pte;
> >  }
> >
> >  static phys_addr_t iopte_to_paddr(arm_v7s_iopte pte, int lvl,
> >   struct io_pgtable_cfg *cfg)
> >  {
> > arm_v7s_iopte mask;
> > +   phys_addr_t paddr;
> >
> > if (ARM_V7S_PTE_IS_TABLE(pte, lvl))
> > mask = ARM_V7S_TABLE_MASK;
> > @@ -198,7 +209,14 @@ static phys_addr_t iopte_to_paddr(arm_v7s_iopte pte, 
> > int lvl,
> > else
> > mask = ARM_V7S_LVL_MASK(lvl);
> >
> > -   return pte & mask;
> > +   paddr = pte & mask;
> > +   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_4GB) {
> > +   if (pte & ARM_V7S_ATTR_MTK_PA_BIT32)
> > +   paddr |= BIT_ULL(32);
> > +   if (pte & ARM_V7S_ATTR_MTK_PA_BIT33)
> > +   paddr |= BIT_ULL(33);
> > +   }
> > +   return paddr;
> >  }
> >
> >  static arm_v7s_iopte *iopte_deref(arm_v7s_iopte pte, int lvl,
> > @@ -315,9 +333,6 @@ static arm_v7s_iopte 

[PATCHv2 0/9] Use vm_insert_range and vm_insert_range_buggy

2019-01-30 Thread Souptick Joarder
Previously, drivers had their own way of mapping a range of
kernel pages/memory into a user vma, and this was done by
invoking vm_insert_page() within a loop.

As this pattern is common across different drivers, it can
be generalized by creating new functions and using them across
the drivers.

vm_insert_range() is the API which can be used to map
kernel memory/pages in drivers which have considered vm_pgoff.

vm_insert_range_buggy() is the API which can be used to map a
range of kernel memory/pages in drivers which have not considered
vm_pgoff; vm_pgoff is passed as 0 by default for those drivers.

We _could_ then at a later point "fix" these drivers which are using
vm_insert_range_buggy() to behave according to the normal vm_pgoff
offsetting, simply by removing the _buggy suffix on the function
name, and if that causes regressions, it gives us an easy way to revert.
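As a usage sketch (a hypothetical driver, not one of the conversions in this
series), a typical mmap handler that used to loop over vm_insert_page()
becomes:

static int my_drv_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct my_drv_buf *buf = file->private_data;	/* hypothetical */

	if (vma->vm_pgoff + vma_pages(vma) > buf->page_count)
		return -ENXIO;

	/* buf->pages is the driver's struct page * array; vm_pgoff is
	 * honoured inside vm_insert_range() itself. */
	return vm_insert_range(vma, buf->pages, buf->page_count);
}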

v1 -> v2:
Few Reviewed-by.

Updated the change log in [8/9]

In [7/9], vm_pgoff is treated in the V4L2 API as a 'cookie'
to select a buffer, not as an in-buffer offset, by design,
and it always wants to mmap a whole buffer from its beginning.
Added additional changes after discussing with Marek, so that
vm_insert_range can be used instead of vm_insert_range_buggy.

Souptick Joarder (9):
  mm: Introduce new vm_insert_range and vm_insert_range_buggy API
  arch/arm/mm/dma-mapping.c: Convert to use vm_insert_range
  drivers/firewire/core-iso.c: Convert to use vm_insert_range_buggy
  drm/rockchip/rockchip_drm_gem.c: Convert to use vm_insert_range
  drm/xen/xen_drm_front_gem.c: Convert to use vm_insert_range
  iommu/dma-iommu.c: Convert to use vm_insert_range
  videobuf2/videobuf2-dma-sg.c: Convert to use vm_insert_range
  xen/gntdev.c: Convert to use vm_insert_range
  xen/privcmd-buf.c: Convert to use vm_insert_range_buggy

 arch/arm/mm/dma-mapping.c  | 22 ++
 drivers/firewire/core-iso.c| 15 +---
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c| 17 +
 drivers/gpu/drm/xen/xen_drm_front_gem.c| 18 ++---
 drivers/iommu/dma-iommu.c  | 12 +---
 drivers/media/common/videobuf2/videobuf2-core.c|  7 ++
 .../media/common/videobuf2/videobuf2-dma-contig.c  |  6 --
 drivers/media/common/videobuf2/videobuf2-dma-sg.c  | 22 ++
 drivers/xen/gntdev.c   | 16 ++---
 drivers/xen/privcmd-buf.c  |  8 +--
 include/linux/mm.h |  4 ++
 mm/memory.c| 81 ++
 mm/nommu.c | 14 
 13 files changed, 136 insertions(+), 106 deletions(-)

-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 15/20] memory: mtk-smi: Invoke pm runtime_callback to enable clocks

2019-01-30 Thread Yong Wu
On Wed, 2019-01-30 at 11:05 -0800, Evan Green wrote:
> On Mon, Dec 31, 2018 at 7:59 PM Yong Wu  wrote:
> >
> > This patch only move the clk_prepare_enable and config_port into the
> > runtime suspend/resume callback. It doesn't change the code content
> > and sequence.
> >
> > This is a preparing patch for adjusting SMI_BUS_SEL for mt8183.
> > (SMI_BUS_SEL need to be restored after smi-common resume every time.)
> > Also it gives a chance to get rid of mtk_smi_larb_get/put which could
> > be a next topic.
> >
> > CC: Matthias Brugger 
> > Signed-off-by: Yong Wu 
> 
> I believe this refactoring is a no-op as described, because the order is 
> still:
> 1) mtk_smi_clk_enable(common)
> 2) mtk_smi_clk_enable(larb)
> 3) larb_gen->config_port()
> 
> And teardown still happens in the opposite order, except for

Thanks for your confirmation.

> config_port, which they seem not to do in suspend.

After suspend, the HW engines should not be working, so config_port is not
needed; at that time, the HW default value is used.

> So, looks good to me.
> 
> Reviewed-by: Evan Green 

Thanks.

> 
> > ---
> >  drivers/memory/mtk-smi.c | 113 
> > ++-
> >  1 file changed, 72 insertions(+), 41 deletions(-)
> >
> > diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
> > index a430721..9790801 100644
> > --- a/drivers/memory/mtk-smi.c
> > +++ b/drivers/memory/mtk-smi.c
> > @@ -86,17 +86,13 @@ struct mtk_smi_larb { /* larb: local arbiter */
> > u32 *mmu;
> >  };
> >
> > -static int mtk_smi_enable(const struct mtk_smi *smi)
> > +static int mtk_smi_clk_enable(const struct mtk_smi *smi)
> >  {
> > int ret;
> >
> > -   ret = pm_runtime_get_sync(smi->dev);
> > -   if (ret < 0)
> > -   return ret;
> > -
> > ret = clk_prepare_enable(smi->clk_apb);
> > if (ret)
> > -   goto err_put_pm;
> > +   return ret;
> >
> > ret = clk_prepare_enable(smi->clk_smi);
> > if (ret)
> > @@ -118,59 +114,28 @@ static int mtk_smi_enable(const struct mtk_smi *smi)
> > clk_disable_unprepare(smi->clk_smi);
> >  err_disable_apb:
> > clk_disable_unprepare(smi->clk_apb);
> > -err_put_pm:
> > -   pm_runtime_put_sync(smi->dev);
> > return ret;
> >  }
> >
> > -static void mtk_smi_disable(const struct mtk_smi *smi)
> > +static void mtk_smi_clk_disable(const struct mtk_smi *smi)
> >  {
> > clk_disable_unprepare(smi->clk_gals1);
> > clk_disable_unprepare(smi->clk_gals0);
> > clk_disable_unprepare(smi->clk_smi);
> > clk_disable_unprepare(smi->clk_apb);
> > -   pm_runtime_put_sync(smi->dev);
> >  }
> >
> >  int mtk_smi_larb_get(struct device *larbdev)
> >  {
> > -   struct mtk_smi_larb *larb = dev_get_drvdata(larbdev);
> > -   const struct mtk_smi_larb_gen *larb_gen = larb->larb_gen;
> > -   struct mtk_smi *common = dev_get_drvdata(larb->smi_common_dev);
> > -   int ret;
> > +   int ret = pm_runtime_get_sync(larbdev);
> >
> > -   /* Enable the smi-common's power and clocks */
> > -   ret = mtk_smi_enable(common);
> > -   if (ret)
> > -   return ret;
> > -
> > -   /* Enable the larb's power and clocks */
> > -   ret = mtk_smi_enable(>smi);
> > -   if (ret) {
> > -   mtk_smi_disable(common);
> > -   return ret;
> > -   }
> > -
> > -   /* Configure the iommu info for this larb */
> > -   larb_gen->config_port(larbdev);
> > -
> > -   return 0;
> > +   return (ret < 0) ? ret : 0;
> >  }
> >  EXPORT_SYMBOL_GPL(mtk_smi_larb_get);
> >
> >  void mtk_smi_larb_put(struct device *larbdev)
> >  {
> > -   struct mtk_smi_larb *larb = dev_get_drvdata(larbdev);
> > -   struct mtk_smi *common = dev_get_drvdata(larb->smi_common_dev);
> > -
> > -   /*
> > -* Don't de-configure the iommu info for this larb since there may 
> > be
> > -* several modules in this larb.
> > -* The iommu info will be reset after power off.
> > -*/
> > -
> > -   mtk_smi_disable(>smi);
> > -   mtk_smi_disable(common);
> > +   pm_runtime_put_sync(larbdev);
> >  }
> >  EXPORT_SYMBOL_GPL(mtk_smi_larb_put);
> >
> > @@ -385,12 +350,52 @@ static int mtk_smi_larb_remove(struct platform_device 
> > *pdev)
> > return 0;
> >  }
> >
> > +static int __maybe_unused mtk_smi_larb_resume(struct device *dev)
> > +{
> > +   struct mtk_smi_larb *larb = dev_get_drvdata(dev);
> > +   const struct mtk_smi_larb_gen *larb_gen = larb->larb_gen;
> > +   int ret;
> > +
> > +   /* Power on smi-common. */
> > +   ret = pm_runtime_get_sync(larb->smi_common_dev);
> > +   if (ret < 0) {
> > +   dev_err(dev, "Failed to pm get for smi-common(%d).\n", ret);
> > +   return ret;
> > +   }
> > +
> > +   ret = mtk_smi_clk_enable(>smi);
> > +   if (ret < 0) {
> > +   dev_err(dev, "Failed to enable 

[PATCHv2 6/9] iommu/dma-iommu.c: Convert to use vm_insert_range

2019-01-30 Thread Souptick Joarder
Convert to use vm_insert_range() to map a range of kernel
memory to a user vma.

Signed-off-by: Souptick Joarder 
---
 drivers/iommu/dma-iommu.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d19f3d6..bdf14b87 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -620,17 +620,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t 
size, gfp_t gfp,
 
 int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct 
*vma)
 {
-   unsigned long uaddr = vma->vm_start;
-   unsigned int i, count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   int ret = -ENXIO;
-
-   for (i = vma->vm_pgoff; i < count && uaddr < vma->vm_end; i++) {
-   ret = vm_insert_page(vma, uaddr, pages[i]);
-   if (ret)
-   break;
-   uaddr += PAGE_SIZE;
-   }
-   return ret;
+   return vm_insert_range(vma, pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
 }
 
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
-- 
1.9.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 13/20] iommu/mediatek: Add mt8183 IOMMU support

2019-01-30 Thread Yong Wu
On Wed, 2019-01-30 at 10:31 -0800, Evan Green wrote:
> On Mon, Dec 31, 2018 at 7:59 PM Yong Wu  wrote:
> >
> > The M4U IP blocks in mt8183 is MediaTek's generation2 M4U which use
> > the ARM Short-descriptor like mt8173, and most of the HW registers
> > are the same.
> >
> > Here list main differences between mt8183 and mt8173/mt2712:
> > 1) mt8183 has only one M4U HW like mt8173 while mt2712 has two.
> > 2) mt8183 don't have the "bclk" clock, it use the EMI clock instead.
> > 3) mt8183 can support the dram over 4GB, but it doesn't call this "4GB
> > mode".
> > 4) mt8183 pgtable base register(0x0) extend bit[1:0] which represent
> > the bit[33:32] in the physical address of the pgtable base, But the
> > standard ttbr0[1] means the S bit which is enabled defaultly, Hence,
> > we add a mask.
> > 5) mt8183 HW has a GALS modules, SMI should enable "has_gals" support.
> > 6) mt8183 need reset_axi like mt8173.
> > 7) the larb-id in smi-common is remapped. M4U should add its larbid_remap.
> >
> > Signed-off-by: Yong Wu 
> > ---
> >  drivers/iommu/mtk_iommu.c | 15 ---
> >  drivers/iommu/mtk_iommu.h |  1 +
> >  drivers/memory/mtk-smi.c  | 20 
> >  3 files changed, 33 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index 2913ddb..66e3615 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -36,6 +36,7 @@
> >  #include "mtk_iommu.h"
> >
> >  #define REG_MMU_PT_BASE_ADDR   0x000
> > +#define MMU_PT_ADDR_MASK   GENMASK(31, 7)
> >
> >  #define REG_MMU_INVALIDATE 0x020
> >  #define F_ALL_INVLD0x2
> > @@ -342,7 +343,7 @@ static int mtk_iommu_attach_device(struct iommu_domain 
> > *domain,
> > /* Update the pgtable base address register of the M4U HW */
> > if (!data->m4u_dom) {
> > data->m4u_dom = dom;
> > -   writel(dom->cfg.arm_v7s_cfg.ttbr[0],
> > +   writel(dom->cfg.arm_v7s_cfg.ttbr[0] & MMU_PT_ADDR_MASK,
> 
> So there aren't any other bits down below 7 that you need, like the
> shareable bits?

Yes, we don't need any of the bits below bit 7. As described in comment 4)
above, we add the mask just because of the S bit.




Re: [PATCH v5 16/20] memory: mtk-smi: Add bus_sel for mt8183

2019-01-30 Thread Yong Wu
On Wed, 2019-01-30 at 11:07 -0800, Evan Green wrote:
> On Mon, Dec 31, 2018 at 7:59 PM Yong Wu  wrote:
> >
> > There are 2 mmu cells in a M4U HW. we could adjust some larbs entering
> > mmu0 or mmu1 to balance the bandwidth via the smi-common register
> > SMI_BUS_SEL(0x220)(Each larb occupy 2 bits).
> >
> > In mt8183, For better performance, we switch larb1/2/5/7 to enter
> > mmu1 while the others still keep enter mmu0.
> >
> > In mt8173 and mt2712, we don't get the performance issue,
> > Keep its default value(0x0), that means all the larbs enter mmu0.
> >
> > Note: smi gen1(mt2701/mt7623) don't have this bus_sel.
> >
> > CC: Matthias Brugger 
> > Signed-off-by: Yong Wu 
> > ---
> >  drivers/memory/mtk-smi.c | 22 --
> >  1 file changed, 20 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
> > index 9790801..08cf40d 100644
> > --- a/drivers/memory/mtk-smi.c
> > +++ b/drivers/memory/mtk-smi.c
> > @@ -49,6 +49,12 @@
> >  #define SMI_LARB_NONSEC_CON(id)(0x380 + ((id) * 4))
> >  #define F_MMU_EN   BIT(0)
> >
> > +/* SMI COMMON */
> > +#define SMI_BUS_SEL0x220
> > +#define SMI_BUS_LARB_SHIFT(larbid) ((larbid) << 1)
> > +/* All are MMU0 defaultly. Only specialize mmu1 here. */
> > +#define F_MMU1_LARB(larbid)(0x1 << SMI_BUS_LARB_SHIFT(larbid))
> > +
> >  enum mtk_smi_gen {
> > MTK_SMI_GEN1,
> > MTK_SMI_GEN2
> > @@ -57,6 +63,7 @@ enum mtk_smi_gen {
> >  struct mtk_smi_common_plat {
> > enum mtk_smi_gen gen;
> > bool has_gals;
> > +   u32  bus_sel; /* Balance some larbs to enter mmu0 or 
> > mmu1 */
> >  };
> >
> >  struct mtk_smi_larb_gen {
> > @@ -72,8 +79,8 @@ struct mtk_smi {
> > struct clk  *clk_apb, *clk_smi;
> > struct clk  *clk_gals0, *clk_gals1;
> > struct clk  *clk_async; /*only needed by 
> > mt2701*/
> > -   void __iomem*smi_ao_base;
> > -
> > +   void __iomem*smi_ao_base; /* only for gen1 */
> > +   void __iomem*base;/* only for gen2 */
> > const struct mtk_smi_common_plat *plat;
> >  };
> >
> > @@ -410,6 +417,8 @@ static int __maybe_unused mtk_smi_larb_suspend(struct 
> > device *dev)
> >  static const struct mtk_smi_common_plat mtk_smi_common_mt8183 = {
> > .gen  = MTK_SMI_GEN2,
> > .has_gals = true,
> > +   .bus_sel  = F_MMU1_LARB(1) | F_MMU1_LARB(2) | F_MMU1_LARB(5) |
> > +   F_MMU1_LARB(7),
> >  };
> >
> >  static const struct of_device_id mtk_smi_common_of_ids[] = {
> > @@ -482,6 +491,11 @@ static int mtk_smi_common_probe(struct platform_device 
> > *pdev)
> > ret = clk_prepare_enable(common->clk_async);
> > if (ret)
> > return ret;
> > +   } else {
> > +   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > +   common->base = devm_ioremap_resource(dev, res);
> > +   if (IS_ERR(common->base))
> > +   return PTR_ERR(common->base);
> 
> So you split base and smi_ao_base because they're completely different
> register regions, or because ->base is no longer "always on"? It's
> tempting to recombine them because they appear to be mutually
> exclusive, but if they're truly different register regions then I
> understand.

They are completely different. The common->base is the normal smi-common
register base, while common->smi_ao_base only exists in smi-gen1.
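
For reference, plugging the mt8183 larbs into the macros quoted above (a
quick check, not from the original mail):

	F_MMU1_LARB(1) = 0x1 << 2  = 0x0004
	F_MMU1_LARB(2) = 0x1 << 4  = 0x0010
	F_MMU1_LARB(5) = 0x1 << 10 = 0x0400
	F_MMU1_LARB(7) = 0x1 << 14 = 0x4000

so the mt8183 .bus_sel value works out to 0x4414: larbs 1/2/5/7 are routed to
mmu1 and all remaining larbs keep the default mmu0.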




Re: [PATCH v5 01/20] dt-bindings: mediatek: Add binding for mt8183 IOMMU and SMI

2019-01-30 Thread Yong Wu
Hi Evan,

Thanks very much for reviewing this patchset.

On Wed, 2019-01-30 at 10:27 -0800, Evan Green wrote:
> On Mon, Dec 31, 2018 at 7:56 PM Yong Wu  wrote:
> >
> > This patch adds decriptions for mt8183 IOMMU and SMI.
> >
> > mt8183 has only one M4U like mt8173 and is also MTK IOMMU gen2 which
> > uses ARM Short-Descriptor translation table format.
> >
> > The mt8183 M4U-SMI HW diagram is as below:
> >
> >   EMI
> >|
> >   M4U
> >|
> >--
> >||
> >gals0-rx   gals1-rx
> >||
> >||
> >gals0-tx   gals1-tx
> >||
> >   
> >SMI Common
> >   
> >|
> >   +-+-++-+-+---+---+
> >   | | || | |   |   |
> >   | |  gals-rx  gals-rx  |   gals-rx gals-rx gals-rx
> >   | | || | |   |   |
> >   | | || | |   |   |
> >   | |  gals-tx  gals-tx  |   gals-tx gals-tx gals-tx
> >   | | || | |   |   |
> > larb0 larb1  IPU0IPU1  larb4  larb5  larb6CCU
> > disp  vdec   img camvenc   imgcam
> 
> It might be cool to put the gals in the picture in the bindings. Not a
> big deal though.

OK. The picture in the binding should be more generic; I will try to add the
GALS in the binding picture in the next version.

> 
> >
> > All the connections are HW fixed, SW can NOT adjust it.
> >
> > Compared with mt8173, we add a GALS(Global Async Local Sync) module
> > between SMI-common and M4U, and additional GALS between larb2/3/5/6
> > and SMI-common. GALS can help synchronize for the modules in different
> > clock frequency, it can be seen as a "asynchronous fifo".
> >
> > GALS can only help transfer the command/data while it doesn't have
> > the configuring register, thus it has the special "smi" clock and it
> > doesn't have the "apb" clock. From the diagram above, we add "gals0"
> > and "gals1" clocks for smi-common and add a "gals" clock for smi-larb.
> >
> > From the diagram above, IPU0/IPU1(Image Processor Unit) and CCU(Camera
> > Control Unit) is connected with smi-common directly, we can take them
> > as "larb2", "larb3" and "larb7", and their register spaces are
> > different with the normal larb.
> >
> > Signed-off-by: Yong Wu 
> > Reviewed-by: Rob Herring 
> > ---
> >  .../devicetree/bindings/iommu/mediatek,iommu.txt   |  15 ++-
> >  .../memory-controllers/mediatek,smi-common.txt |  11 +-
> >  .../memory-controllers/mediatek,smi-larb.txt   |   3 +
> >  include/dt-bindings/memory/mt8183-larb-port.h  | 130 
> > +
> >  4 files changed, 153 insertions(+), 6 deletions(-)
> >  create mode 100644 include/dt-bindings/memory/mt8183-larb-port.h
> >
> > diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt 
> > b/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
> > index 6922db5..6e758996 100644
> > --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
> > +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
> > @@ -36,6 +36,10 @@ each local arbiter.
> >  like display, video decode, and camera. And there are different ports
> >  in each larb. Take a example, There are many ports like MC, PP, VLD in the
> >  video decode local arbiter, all these ports are according to the video HW.
> > +  In some SoCs, there may be a GALS(Global Async Local Sync) module between
> > +smi-common and m4u, and additional GALS module between smi-larb and
> > +smi-common. GALS can been seen as a "asynchronous fifo" which could help
> > +synchronize for the modules in different clock frequency.
> >
> >  Required properties:
> >  - compatible : must be one of the following string:
> > @@ -44,18 +48,23 @@ Required properties:
> > "mediatek,mt7623-m4u", "mediatek,mt2701-m4u" for mt7623 which uses
> >  generation one m4u HW.
> > "mediatek,mt8173-m4u" for mt8173 which uses generation two m4u HW.
> > +   "mediatek,mt8183-m4u" for mt8183 which uses generation two m4u HW.
> >  - reg : m4u register base and size.
> >  - interrupts : the interrupt of m4u.
> >  - clocks : must contain one entry for each clock-names.
> > -- clock-names : must be "bclk", It is the block clock of m4u.
> > +- clock-names : Only 1 optional clock:
> > +  - "bclk": the block clock of m4u.
> > +  Note that m4u use the EMI clock which always has been enabled before 
> > kernel
> > +  if there is no this "bclk".
> 
> Ideally bclk could be specified a little more crisply, as this is
> actually required for some SoCs and not used at all on others (as in
> patch 7).

OK. 

Re: [PATCH v5 19/20] iommu/mediatek: Add shutdown callback

2019-01-30 Thread Yong Wu
On Wed, 2019-01-30 at 11:12 -0800, Evan Green wrote:
> On Mon, Dec 31, 2018 at 8:00 PM Yong Wu  wrote:
> >
> > In the reboot burning test, if some Multimedia HW has something wrong,
> > It may keep send the invalid request to IOMMU. In order to avoid
> > affect the reboot flow, we add the shutdown callback to disable
> > M4U HW when shutdown.
> 
> Sounds unpleasant. Hopefully the reboot flow still continues properly
> even in that case, since this shutdown code may not run during some
> rougher resets.

Thanks for this hint. I will reword the comment to avoid mentioning reboot.

> 
> Reviewed-by: Evan Green 

Thanks.




Re: [PATCH v5 11/20] iommu/mediatek: Move vld_pa_rng into plat_data

2019-01-30 Thread Yong Wu
On Wed, 2019-01-30 at 10:30 -0800, Evan Green wrote:
> On Mon, Dec 31, 2018 at 7:58 PM Yong Wu  wrote:
> >
> > Both mt8173 and mt8183 don't have this vld_pa_rng(valid physical address
> > range) register while mt2712 have. Move it into the plat_data.
> >
> > Signed-off-by: Yong Wu 
> > ---
> >  drivers/iommu/mtk_iommu.c | 3 ++-
> >  drivers/iommu/mtk_iommu.h | 1 +
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index 8d8ab21..2913ddb 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -548,7 +548,7 @@ static int mtk_iommu_hw_init(const struct 
> > mtk_iommu_data *data)
> >  upper_32_bits(data->protect_base);
> > writel_relaxed(regval, data->base + REG_MMU_IVRP_PADDR);
> >
> > -   if (data->enable_4GB && data->plat_data->m4u_plat != M4U_MT8173) {
> > +   if (data->enable_4GB && data->plat_data->vld_pa_rng) {
> > /*
> >  * If 4GB mode is enabled, the validate PA range is from
> >  * 0x1_0000_0000 to 0x1_ffff_ffff. here record bit[32:30].
> > @@ -741,6 +741,7 @@ static int __maybe_unused mtk_iommu_resume(struct 
> > device *dev)
> > .m4u_plat = M4U_MT2712,
> > .has_4gb_mode = true,
> > .has_bclk = true,
> > +   .vld_pa_rng   = true,
> > .larbid_remap = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
> >  };
> >
> > diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
> > index b46aeaa..a8c5d1e 100644
> > --- a/drivers/iommu/mtk_iommu.h
> > +++ b/drivers/iommu/mtk_iommu.h
> > @@ -48,6 +48,7 @@ struct mtk_iommu_plat_data {
> > /* HW will use the EMI clock if there isn't the "bclk". */
> > boolhas_bclk;
> > boolreset_axi;
> > +   boolvld_pa_rng;
> 
> I agree with Nicolas that valid_pa_range would be much clearer...
> although, even now that I know what it's supposed to mean, I don't get
> what it represents. What is this saying?

This register in the coda is called "vld_pa_rng".

How about I change it to "has_vld_pa_rng"? In the comment above, I have
explained the meaning (valid physical address range).

> 
> -Evan




[PATCHv2 1/9] mm: Introduce new vm_insert_range and vm_insert_range_buggy API

2019-01-30 Thread Souptick Joarder
Previously, drivers had their own way of mapping a range of kernel
pages/memory into a user vma, and this was done by invoking
vm_insert_page() within a loop.

As this pattern is common across different drivers, it can be
generalized by creating new functions and using them across the
drivers.

vm_insert_range() is the API which can be used to map a range of
kernel memory/pages in drivers which have considered vm_pgoff.

vm_insert_range_buggy() is the API which can be used to map a range of
kernel memory/pages in drivers which have not considered vm_pgoff;
vm_pgoff is passed as 0 by default for those drivers.

We _could_ then, at a later point, "fix" the drivers which are using
vm_insert_range_buggy() to behave according to the normal vm_pgoff
offsetting simply by removing the _buggy suffix from the function
name, and if that causes regressions, it gives us an easy way to revert.
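
As a hedged illustration of the intended call pattern (the mydrv_* names are
made up), a typical mmap handler collapses to a single call:

	/* hypothetical driver mmap handler */
	static int mydrv_mmap(struct file *file, struct vm_area_struct *vma)
	{
		struct mydrv_buf *buf = file->private_data;	/* holds the page array */

		/*
		 * vm_insert_range() honours vma->vm_pgoff and returns -ENXIO
		 * if the requested offset/size exceeds the buf->nr_pages object.
		 */
		return vm_insert_range(vma, buf->pages, buf->nr_pages);
	}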

Signed-off-by: Souptick Joarder 
Suggested-by: Russell King 
Suggested-by: Matthew Wilcox 
---
 include/linux/mm.h |  4 +++
 mm/memory.c| 81 ++
 mm/nommu.c | 14 ++
 3 files changed, 99 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80bb640..25752b0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2565,6 +2565,10 @@ unsigned long change_prot_numa(struct vm_area_struct 
*vma,
 int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t);
 int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
+int vm_insert_range(struct vm_area_struct *vma, struct page **pages,
+   unsigned long num);
+int vm_insert_range_buggy(struct vm_area_struct *vma, struct page **pages,
+   unsigned long num);
 vm_fault_t vmf_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn);
 vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index e11ca9d..0a4bf57 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1520,6 +1520,87 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned 
long addr,
 }
 EXPORT_SYMBOL(vm_insert_page);
 
+/**
+ * __vm_insert_range - insert range of kernel pages into user vma
+ * @vma: user vma to map to
+ * @pages: pointer to array of source kernel pages
+ * @num: number of pages in page array
+ * @offset: user's requested vm_pgoff
+ *
+ * This allows drivers to insert range of kernel pages they've allocated
+ * into a user vma.
+ *
+ * If we fail to insert any page into the vma, the function will return
+ * immediately leaving any previously inserted pages present.  Callers
+ * from the mmap handler may immediately return the error as their caller
+ * will destroy the vma, removing any successfully inserted pages. Other
+ * callers should make their own arrangements for calling unmap_region().
+ *
+ * Context: Process context.
+ * Return: 0 on success and error code otherwise.
+ */
+static int __vm_insert_range(struct vm_area_struct *vma, struct page **pages,
+   unsigned long num, unsigned long offset)
+{
+   unsigned long count = vma_pages(vma);
+   unsigned long uaddr = vma->vm_start;
+   int ret, i;
+
+   /* Fail if the user requested offset is beyond the end of the object */
+   if (offset > num)
+   return -ENXIO;
+
+   /* Fail if the user requested size exceeds available object size */
+   if (count > num - offset)
+   return -ENXIO;
+
+   for (i = 0; i < count; i++) {
+   ret = vm_insert_page(vma, uaddr, pages[offset + i]);
+   if (ret < 0)
+   return ret;
+   uaddr += PAGE_SIZE;
+   }
+
+   return 0;
+}
+
+/**
+ * vm_insert_range - insert range of kernel pages, starting at a non-zero offset
+ * @vma: user vma to map to
+ * @pages: pointer to array of source kernel pages
+ * @num: number of pages in page array
+ *
+ * Maps an object consisting of `num' `pages', catering for the user's
+ * requested vm_pgoff
+ *
+ * Context: Process context. Called by mmap handlers.
+ * Return: 0 on success and error code otherwise.
+ */
+int vm_insert_range(struct vm_area_struct *vma, struct page **pages,
+   unsigned long num)
+{
+   return __vm_insert_range(vma, pages, num, vma->vm_pgoff);
+}
+EXPORT_SYMBOL(vm_insert_range);
+
+/**
+ * vm_insert_range_buggy - insert range of kernel pages, starting at a zero offset
+ * @vma: user vma to map to
+ * @pages: pointer to array of source kernel pages
+ * @num: number of pages in page array
+ *
+ * Maps a set of pages, always starting at page[0]
+ *
+ * Context: Process context. Called by mmap handlers.
+ * Return: 0 on success and error code otherwise.
+ */
+int vm_insert_range_buggy(struct vm_area_struct *vma, struct page **pages,
+   unsigned long num)

Re: [RFC PATCH 1/5] pci/p2p: add a function to test peer to peer capability

2019-01-30 Thread Christian König

On 29.01.19 at 21:24, Logan Gunthorpe wrote:


On 2019-01-29 12:56 p.m., Alex Deucher wrote:

On Tue, Jan 29, 2019 at 12:47 PM  wrote:

From: Jérôme Glisse 

device_test_p2p() return true if two devices can peer to peer to
each other. We add a generic function as different inter-connect
can support peer to peer and we want to genericaly test this no
matter what the inter-connect might be. However this version only
support PCIE for now.


What about something like these patches:
https://cgit.freedesktop.org/~deathsimple/linux/commit/?h=p2p=4fab9ff69cb968183f717551441b475fabce6c1c
https://cgit.freedesktop.org/~deathsimple/linux/commit/?h=p2p=f90b12d41c277335d08c9dab62433f27c0fadbe5
They are a bit more thorough.

Those new functions seem to have a lot of overlap with the code that is
already upstream in p2pdma. Perhaps you should be improving the
p2pdma functions, if they aren't already suitable for what you want,
instead of creating new ones.


Yeah, well that's what I was suggesting from the very beginning :)

But I completely agree that the existing functions should be improved instead
of adding new ones.

Christian.



Logan



Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

2019-01-30 Thread Koenig, Christian
On 30.01.19 at 09:02, Christoph Hellwig wrote:
> On Tue, Jan 29, 2019 at 08:58:35PM +, Jason Gunthorpe wrote:
>> On Tue, Jan 29, 2019 at 01:39:49PM -0700, Logan Gunthorpe wrote:
>>
>>> implement the mapping. And I don't think we should have 'special' vma's
>>> for this (though we may need something to ensure we don't get mapping
>>> requests mixed with different types of pages...).
>> I think Jerome explained the point here is to have a 'special vma'
>> rather than a 'special struct page' as, really, we don't need a
>> struct page at all to make this work.
>>
>> If I recall your earlier attempts at adding struct page for BAR
>> memory, it ran aground on issues related to O_DIRECT/sgls, etc, etc.
> Struct page is what makes O_DIRECT work, using sgls or biovecs, etc on
> it work.  Without struct page none of the above can work at all.  That
> is why we use struct page for backing BARs in the existing P2P code.
> Not that I'm a particular fan of creating struct page for this device
> memory, but without major invasive surgery to large parts of the kernel
> it is the only way to make it work.

The problem seems to be that struct page does two things:

1. Memory management for system memory.
2. The object to work with in the I/O layer.

This was done because a good part of that stuff overlaps, like reference 
counting how often a page is used.  The problem now is that this doesn't 
work very well for device memory in some cases.

For example on GPUs you usually have a large amount of memory which is 
not even accessible by the CPU. In other words you can't easily create a 
struct page for it because you can't reference it with a physical CPU 
address.

Maybe struct page should be split up into smaller structures? I mean 
it's really overloaded with data.

Christian.
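
A minimal sketch of the struct-page dependency described above, assuming the
v5.0-era GUP signature get_user_pages_fast(start, nr_pages, write, pages)
(purely illustrative; user_addr is an assumed user virtual address):

	struct page *pages[16];
	int pinned;

	pinned = get_user_pages_fast(user_addr, 16, 1, pages);
	if (pinned < 0)
		return pinned;	/* VM_IO/VM_PFNMAP mappings fail here: no struct page behind them */

	/* O_DIRECT then builds biovecs/scatterlists from these struct pages */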

