Re: [PATCH 0/9] Use vm_insert_range

2018-11-20 Thread Souptick Joarder
On Thu, Nov 15, 2018 at 9:09 PM Souptick Joarder  wrote:
>
> Previously, drivers had their own way of mapping a range of
> kernel pages/memory into a user vma, done by invoking
> vm_insert_page() within a loop.
>
> As this pattern is common across different drivers, it can
> be generalized by creating a new function and using it across
> the drivers.
>
> vm_insert_range is the new API which will be used to map a
> range of kernel memory/pages to a user vma.
>
> All the applicable places are converted to use the new
> vm_insert_range in this patch series.
>
> Souptick Joarder (9):
>   mm: Introduce new vm_insert_range API
>   arch/arm/mm/dma-mapping.c: Convert to use vm_insert_range
>   drivers/firewire/core-iso.c: Convert to use vm_insert_range
>   drm/rockchip/rockchip_drm_gem.c: Convert to use vm_insert_range
>   drm/xen/xen_drm_front_gem.c: Convert to use vm_insert_range
>   iommu/dma-iommu.c: Convert to use vm_insert_range
>   videobuf2/videobuf2-dma-sg.c: Convert to use vm_insert_range
>   xen/gntdev.c: Convert to use vm_insert_range
>   xen/privcmd-buf.c: Convert to use vm_insert_range

Any further comments on the driver changes?
>
>  arch/arm/mm/dma-mapping.c | 21 ++---
>  drivers/firewire/core-iso.c   | 15 ++--
>  drivers/gpu/drm/rockchip/rockchip_drm_gem.c   | 20 ++--
>  drivers/gpu/drm/xen/xen_drm_front_gem.c   | 20 +---
>  drivers/iommu/dma-iommu.c | 12 ++
>  drivers/media/common/videobuf2/videobuf2-dma-sg.c | 23 ++-
>  drivers/xen/gntdev.c  | 11 -
>  drivers/xen/privcmd-buf.c |  8 ++-
>  include/linux/mm_types.h  |  3 +++
>  mm/memory.c   | 28 +++
>  mm/nommu.c|  7 ++
>  11 files changed, 70 insertions(+), 98 deletions(-)
>
> --
> 1.9.1
>
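
For illustration, a minimal sketch of the conversion this series performs
(the vm_insert_range() signature shown is assumed from the cover letter,
not copied from the patch):

	/* Before: each driver open-codes the loop (generic sketch, not from
	 * any specific driver). */
	for (i = 0; i < page_count; i++) {
		ret = vm_insert_page(vma, uaddr, pages[i]);
		if (ret)
			return ret;
		uaddr += PAGE_SIZE;
	}

	/* After: a single call to the new helper (assumed signature). */
	ret = vm_insert_range(vma, vma->vm_start, pages, page_count);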


Re: [RFC PATCH 0/6] Auxiliary IOMMU domains and Arm SMMUv3

2018-11-20 Thread Lu Baolu

Hi Joerg,

On 11/8/18 12:43 AM, j...@8bytes.org wrote:

Hi,

On Wed, Nov 07, 2018 at 11:40:30AM +0800, Lu Baolu wrote:

Hi Joerg,

On 11/7/18 12:25 AM, j...@8bytes.org wrote:
Nowadays, we can find PASID granular DMA translation on both ARM and x86
platforms. With PASID granular DMA translation supported in the system iommu,
if a device divides itself into multiple subsets and tags the DMA
transfers of each subset with a unique PASID, each subset becomes
assignable. We call such an assignable subset an ADI (Assignable Device
Interface). As a result, each ADI could be attached to a domain.


Yeah, I know the background. The point is, the IOMMU-API as of today
implements a strict one-to-one relationship between domains and devices:
every device can only have one domain assigned at a given time. If we
assign a new domain to a device, the old one gets unassigned.

If we allow attaching multiple domains to a single device, we
fundamentally break that semantic.


Yes. In the latest v4 submission, we use the aux-domain specific APIs to
attach the domain to, and detach it from, a device. Do you see other APIs
that could possibly break this semantic?



Therefore I think it is better to support the ADI devices with
subdomains and a new set of functions in the API to handle only these
sub-domains.


Can you please elaborate a bit more on the concept of subdomains?
From my point of view, an aux-domain is a normal unmanaged domain which
has a PASID and could be attached to any ADI through the aux-domain
specific attach/detach APIs. It could also be attached to a device with
the existing domain_attach/detach_device() APIs at the same time, hence
mdev and PCI devices in a VFIO container could share a domain.
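
As a sketch of that flow (all function names below are taken from the
proposed aux-domain API in this series and may change before merging):

	struct iommu_domain *domain = iommu_domain_alloc(dev->bus);
	int pasid;

	/* All of the calls below are the proposed aux-domain API, not
	 * mainline at this point. */
	iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX);
	iommu_aux_attach_device(domain, dev);
	pasid = iommu_aux_get_pasid(domain, dev); /* tag this ADI's DMA with it */

	/* ... use the domain for the ADI ... */

	iommu_aux_detach_device(domain, dev);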

Best regards,
Lu Baolu




Furthermore, a single domain might be attached to an ADI of device A,
while attached to another legacy device B which doesn't support PASID
features. In this case, we say "Domain 4" is attached to ADI(PASID#x) in
aux mode and attached to device B in primary mode.


This will of course not work with subdomains, but is that really
necessary? VFIO already allocates a separate domain for each device
anyway (iirc), so it is unlikely that it uses the same domain for a
legacy and an ADI device.


Regards,

Joerg




Re: [PATCH RFC] dma-direct: do not allocate a single page from CMA area

2018-11-20 Thread Nicolin Chen
On Tue, Nov 20, 2018 at 10:20:10AM +0100, Christoph Hellwig wrote:
> On Mon, Nov 05, 2018 at 02:40:51PM -0800, Nicolin Chen wrote:
> > > > In general, this seems to make sense to me. It does represent a 
> > > > theoretical 
> > > > change in behaviour for devices which have their own CMA area somewhere 
> > > > other than kernel memory, and only ever make non-atomic allocations, 
> > > > but 
> > > > I'm not sure whether that's a realistic or common enough case to really 
> > > > worry about.
> > > 
> > > Yes, I think we should make the decision in dma_alloc_from_contiguous
> > > based on having a per-dev CMA area or not.  There is a lot of cruft in
> > 
> > It seems that cma_alloc() already has a CMA area check? Wouldn't it
> > be duplication to have a similar one in dma_alloc_from_contiguous?
> 
> It isn't duplicate if it serves a different purpose.
> 
> > > this area that should be cleaned up while we're at it, like always
> > > falling back to the normal page allocator if there is no CMA area or
> > > nothing suitable found in dma_alloc_from_contiguous instead of
> > > having to duplicate all that in the caller.
> > 
> > Am I supposed to clean up the things mentioned above by moving
> > the fallback allocator into dma_alloc_from_contiguous, or to just
> > move my change (the count check) into dma_alloc_from_contiguous?
> > 
> > I understand it'd be great to have a cleanup, yet I feel it could
> > be done separately, as this patch isn't really a cleanup change.
> 
> I can take care of any cleanups.  I've been trying to dust up that
> area anyway.

Thanks for the reply. It looks like it'd be better for me to wait
for the cleanup to be done? I feel odd merely adding a size check
in dma_alloc_from_contiguous().
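
For illustration, a minimal sketch of the decision being discussed,
assuming it stays in the caller (dma_alloc_from_contiguous(), dev->cma_area
and the allocator fallback are existing interfaces of this era; the exact
placement and condition are what is still open):

	struct page *page = NULL;

	/* Only bother with CMA for multi-page allocations, unless the
	 * device has its own per-device CMA area. */
	if (count > 1 || (dev && dev->cma_area))
		page = dma_alloc_from_contiguous(dev, count, page_order,
						 gfp & __GFP_NOWARN);
	if (!page)	/* fall back to the normal page allocator */
		page = alloc_pages_node(dev_to_node(dev), gfp, page_order);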


Re: [PATCH 1/4] PCI / ACPI: Identify external PCI devices

2018-11-20 Thread Rafael J. Wysocki
On Friday, November 16, 2018 11:57:38 AM CET Lorenzo Pieralisi wrote:
> On Thu, Nov 15, 2018 at 07:33:54PM +, mario.limoncie...@dell.com wrote:
> > 
> > 
> > > -Original Message-
> > > From: Mika Westerberg 
> > > Sent: Thursday, November 15, 2018 1:01 PM
> > > To: Lorenzo Pieralisi
> > > Cc: Lukas Wunner; iommu@lists.linux-foundation.org; Joerg Roedel; David
> > > Woodhouse; Lu Baolu; Ashok Raj; Bjorn Helgaas; Rafael J. Wysocki; Jacob 
> > > jun Pan;
> > > Andreas Noever; Michael Jamet; Yehezkel Bernat; Christian Kellner; 
> > > Limonciello,
> > > Mario; Anthony Wong; linux-a...@vger.kernel.org; 
> > > linux-...@vger.kernel.org; linux-
> > > ker...@vger.kernel.org
> > > Subject: Re: [PATCH 1/4] PCI / ACPI: Identify external PCI devices
> > > 
> > > On Thu, Nov 15, 2018 at 05:46:08PM +, Lorenzo Pieralisi wrote:
> > > > Do you really need to parse it if the dev->is_thunderbolt check is
> > > > enough?
> > > 
> > > Yes, we need to parse it one way or another. dev->is_thunderbolt is
> > > based on heuristics which do not apply anymore when the thing gets
> > > integrated into the SoC.
> > > 
> > > The _DSD is there already (on existing systems) and is being used by
> > > Windows, so I don't understand why we cannot take advantage of it. Every
> > > new system with Thunderbolt ports will have it.
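
For reference, a hedged sketch of how the kernel side might consume such a
_DSD property ("ExternalFacingPort" is the property name this series and
Windows use; marking the device as untrusted is this series' proposal, not
yet mainline at this point):

	u8 ext;

	/* device_property_read_u8() reads _DSD properties via the ACPI
	 * companion; the 'untrusted' flag is introduced by this series. */
	if (!device_property_read_u8(&pdev->dev, "ExternalFacingPort", &ext) &&
	    ext)
		pdev->untrusted = 1;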
> 
> We have different opinions on this, and there is no point in me reiterating
> them over and over. I am against the approach taken to solve this problem:
> first, defining the bindings outside the ACPI specifications, and
> second, acquiescing to what has been done so that it will be done
> over and over again.

Arguably, however, we are on the receiving end of things here and even if
we don't use this binding, that won't buy us anything (but it will introduce
a fair amount of disappointment among both users and OEMs).

If Windows uses it, then systems will ship with it regardless of what Linux
does with it, so your "acquiescing to what has been done" argument leads to
nowhere in this particular case.  It's just a matter of whether or not
Linux will provide the same level of functionality as Windows with respect
to this and IMO it would be impractical to refuse to do that for purely
formal reasons.

> I will raise the point in the appropriate forum, it is up to Bjorn
> and Rafael to decide on this patch.

For the record, my opinion is that there's little choice on whether or not
to use this extra information that firmware is willing to give us.  It could
be defined in a better way and so on, but since it is in use anyway, we really
have nothing to gain by refusing to use it.

Now, where the handling of it belongs is a separate matter that should be
decided on its own.

Thanks,
Rafael




Re: [PATCH v4 6/8] vfio/mdev: Add iommu place holders in mdev_device

2018-11-20 Thread Kirti Wankhede



On 11/16/2018 2:27 PM, Christoph Hellwig wrote:
> On Fri, Nov 16, 2018 at 09:20:48AM +0800, Lu Baolu wrote:
>>> Please keep symbols mdev_set/get_iommu_device(dev, iommu_device) non-GPL
>>> same as other exported symbols from mdev_core module.
>>
>> Yes. It will be fixed in the next version.
> 
> No.  mdev shall not be used to circumvent the exports in the generic
> vfio code.
> 

It is about how the mdev framework can be used by existing drivers. These
symbols don't use any other exported symbols.
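
For context, the two helpers under discussion (prototypes as posted in the
v4 series; shown here only as a sketch):

	/* Associate/retrieve the IOMMU-capable parent device of an mdev so
	 * that vfio can program the IOMMU on its behalf. */
	int mdev_set_iommu_device(struct device *dev, struct device *iommu_device);
	struct device *mdev_get_iommu_device(struct device *dev);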

Thanks,
Kirti


Re: [PATCH 0/2] SWIOTLB fixes for 4.20

2018-11-20 Thread John Stultz
On Tue, Nov 20, 2018 at 6:10 AM Robin Murphy  wrote:
>
> This is what I have so far, which at least resolves the most obvious
> problems. I still haven't got very far with the USB corruption issue
> I see on Juno with -rc1, but I'm yet to confirm whether that's actually
> attributable to the SWIOTLB changes or something else entirely.
>
> Robin.
>
> Robin Murphy (2):
>   swiotlb: Make DIRECT_MAPPING_ERROR viable
>   swiotlb: Skip cache maintenance on map error
>
>  include/linux/dma-direct.h | 2 +-
>  kernel/dma/swiotlb.c   | 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)

Thanks so much for chasing this down!

Unfortunately AOSP is giving me grief this week, so I've not been able
to test the full environment, but I don't seem to be hitting the I/O
hangs I was seeing earlier with this patch set.

For both:
  Tested-by: John Stultz 

thanks
-john


Re: [virtio-dev] Re: [PATCH v4 5/7] iommu: Add virtio-iommu driver

2018-11-20 Thread Jean-Philippe Brucker
On 16/11/2018 18:46, Jean-Philippe Brucker wrote:
>>> +/*
>>> + * __viommu_sync_req - Complete all in-flight requests
>>> + *
>>> + * Wait for all added requests to complete. When this function returns, all
>>> + * requests that were in-flight at the time of the call have completed.
>>> + */
>>> +static int __viommu_sync_req(struct viommu_dev *viommu)
>>> +{
>>> +   int ret = 0;
>>> +   unsigned int len;
>>> +   size_t write_len;
>>> +   struct viommu_request *req;
>>> +   struct virtqueue *vq = viommu->vqs[VIOMMU_REQUEST_VQ];
>>> +
>>> +   assert_spin_locked(&viommu->request_lock);
>>> +
>>> +   virtqueue_kick(vq);
>>> +
>>> +   while (!list_empty(&viommu->requests)) {
>>> +   len = 0;
>>> +   req = virtqueue_get_buf(vq, &len);
>>> +   if (!req)
>>> +   continue;
>>> +
>>> +   if (!len)
>>> +   viommu_set_req_status(req->buf, req->len,
>>> + VIRTIO_IOMMU_S_IOERR);
>>> +
>>> +   write_len = req->len - req->write_offset;
>>> +   if (req->writeback && len >= write_len)
>> I don't get "len >= write_len". Is it valid for the device to write more
>> than write_len? If it writes less than write_len, the status is not set,
>> is it?

Actually, len could well be three bytes smaller than write_len. The spec
doesn't require the device to write the three padding bytes in
virtio_iommu_req_tail, after the status byte.

Here the driver just assumes that the device wrote the reserved field.
The QEMU device seems to write uninitialized data in there...

Any objection to changing the spec to require the device to initialize
those bytes to zero? I think it makes things nicer overall and shouldn't
have any performance impact.
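
For reference, the tail structure in question (layout as in the
virtio-iommu spec draft):

	struct virtio_iommu_req_tail {
		__u8	status;
		__u8	reserved[3];	/* padding the device may currently skip */
	};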

[...]
>>> +static struct iommu_domain *viommu_domain_alloc(unsigned type)
>>> +{
>>> +   struct viommu_domain *vdomain;
>>> +
>>> +   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
>> smmu drivers also support the IDENTITY type. Don't we want to support it
>> as well?
> 
> Eventually yes. The IDENTITY domain is useful when an IOMMU has been
> forced upon you and gets in the way of performance, in which case you
> get rid of it with CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y or
> iommu.passthrough=y. For virtio-iommu though, you could simply not
> instantiate the device.
> 
> I don't think it's difficult to add: if the device supports
> VIRTIO_IOMMU_F_BYPASS, then we simply don't send an ATTACH request.
> Otherwise after ATTACH we send a MAP for the whole input range. If the
> change isn't bigger than this, I'll add it to v5.

Not as straightforward as I hoped, when the device doesn't support
VIRTIO_IOMMU_F_BYPASS:

* We can't simply create an ID map for the whole input range, we need to
carve out the resv_mem regions.

* For a VFIO device, the host has no way of forwarding the identity
mapping. For example the guest will attempt to map [0; 0xffffffffffffffff]
-> [0; 0xffffffffffffffff], but VFIO only allows mapping RAM and cannot
take such a request. One solution is to only create the ID mapping for RAM,
and register a notifier for memory hotplug, like intel-iommu does for IOMMUs
that don't support HW pass-through.
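
A sketch of that RAM-only identity mapping, borrowing the approach
intel-iommu uses for its static identity domain (the memblock iterator and
iommu_map() are existing kernel APIs; wiring this into virtio-iommu is the
hypothetical part):

	unsigned long start_pfn, end_pfn;
	int i;

	/* Identity-map each RAM range; reserved regions stay unmapped. */
	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL)
		iommu_map(domain, PFN_PHYS(start_pfn), PFN_PHYS(start_pfn),
			  PFN_PHYS(end_pfn - start_pfn),
			  IOMMU_READ | IOMMU_WRITE);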

Since it's not completely trivial and - I think - not urgent, I'll leave
this for later.

Thanks,
Jean


Re: [PATCH 1/2] swiotlb: Make DIRECT_MAPPING_ERROR viable

2018-11-20 Thread Konrad Rzeszutek Wilk
On Tue, Nov 20, 2018 at 03:01:33PM +, Robin Murphy wrote:
> On 20/11/2018 14:49, Konrad Rzeszutek Wilk wrote:
> > On Tue, Nov 20, 2018 at 02:09:52PM +, Robin Murphy wrote:
> > > With the overflow buffer removed, we no longer have a unique address
> > > which is guaranteed not to be a valid DMA target to use as an error
> > > token. The DIRECT_MAPPING_ERROR value of 0 tries to at least represent
> > > an unlikely DMA target, but unfortunately there are already SWIOTLB
> > > users with DMA-able memory at physical address 0 which now gets falsely
> > > treated as a mapping failure and leads to all manner of misbehaviour.
> > > 
> > > The best we can do to mitigate that is flip DIRECT_MAPPING_ERROR to the
> > > commonly-used all-bits-set value, since the last single byte of memory
> > > is by far the least-likely-valid DMA target.
> > 
> > Are all the callers checking for DIRECT_MAPPING_ERROR, or is it more of
> > a comparison (as in if (!ret))?
> 
> dma_direct_map_page() and dma_direct_mapping_error() were already doing the
> right thing, and external callers must rely on the latter via
> dma_mapping_error() rather than trying to inspect the actual value
> themselves, since that varies between implementations anyway. AFAICS all the
> new return paths from swiotlb_map_page() are also robust in referencing the
> macro explicitly, so I think we're good.
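
In other words, a minimal sketch of the caller pattern being described
(dma_map_page() and dma_mapping_error() are the standard DMA-API entry
points):

	dma_addr_t addr = dma_map_page(dev, page, 0, size, DMA_TO_DEVICE);

	if (dma_mapping_error(dev, addr))	/* never compare the raw value */
		return -ENOMEM;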

Cool! Thank you for checking.

Acked-by: Konrad Rzeszutek Wilk 

Thank you!
> 
> Thanks,
> Robin.
> 
> > > Fixes: dff8d6c1ed58 ("swiotlb: remove the overflow buffer")
> > > Reported-by: John Stultz 
> > > Signed-off-by: Robin Murphy 
> > > ---
> > >   include/linux/dma-direct.h | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> > > index bd73e7a91410..9de9c7ab39d6 100644
> > > --- a/include/linux/dma-direct.h
> > > +++ b/include/linux/dma-direct.h
> > > @@ -5,7 +5,7 @@
> > >   #include 
> > >   #include 
> > > -#define DIRECT_MAPPING_ERROR 0
> > > +#define DIRECT_MAPPING_ERROR ~(dma_addr_t)0
> > >   #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
> > >   #include 
> > > -- 
> > > 2.19.1.dirty
> > > 


Re: [PATCH 0/2] SWIOTLB fixes for 4.20

2018-11-20 Thread Konrad Rzeszutek Wilk
On Tue, Nov 20, 2018 at 05:08:18PM +0100, Christoph Hellwig wrote:
> On Tue, Nov 20, 2018 at 02:09:51PM +, Robin Murphy wrote:
> > This is what I have so far, which at least resolves the most ovbious
> > problems. I still haven't got very far with the USB corruption issue
> > I see on Juno with -rc1, but I'm yet to confirm whether that's actually
> > attributable to the SWIOTLB changes or something else entirely.
> 
> This looks good modulo the minor nitpicks.
> 
> Konrad, are you ok with me picking up both through the dma-mapping
> tree?

Yes, albeit I would want Stefano to take a peek at patch #2 just in case.



Re: [PATCH 1/2] swiotlb: Make DIRECT_MAPPING_ERROR viable

2018-11-20 Thread Christoph Hellwig
The subject line should say dma-direct.  This isn't really swiotlb
specific.

> +#define DIRECT_MAPPING_ERROR ~(dma_addr_t)0

Please add braces around the value like the other *_MAPPING_ERROR
definitions so that it can be safely used in any context.
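
That is, presumably something like the following (matching the
parenthesized form posted in the map_sg thread):

	#define DIRECT_MAPPING_ERROR	(~(dma_addr_t)0)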

Otherwise looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH 0/2] SWIOTLB fixes for 4.20

2018-11-20 Thread Christoph Hellwig
On Tue, Nov 20, 2018 at 02:09:51PM +, Robin Murphy wrote:
> This is what I have so far, which at least resolves the most obvious
> problems. I still haven't got very far with the USB corruption issue
> I see on Juno with -rc1, but I'm yet to confirm whether that's actually
> attributable to the SWIOTLB changes or something else entirely.

This looks good modulo the minor nitpicks.

Konrad, are you ok with me picking up both through the dma-mapping
tree?


Re: [PATCH 2/2] swiotlb: Skip cache maintenance on map error

2018-11-20 Thread Christoph Hellwig
On Tue, Nov 20, 2018 at 02:09:53PM +, Robin Murphy wrote:
> If swiotlb_bounce_page() failed, calling arch_sync_dma_for_device() may
> lead to such delights as performing cache maintenance on whatever
> address phys_to_virt(SWIOTLB_MAP_ERROR) looks like, which is typically
> outside the kernel memory map and goes about as well as expected.
> 
> Don't do that.
> 
> Fixes: a4a4330db46a ("swiotlb: add support for non-coherent DMA")
> Signed-off-by: Robin Murphy 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v2] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-20 Thread John Garry

On 20/11/2018 14:20, Robin Murphy wrote:

On 20/11/2018 13:42, John Garry wrote:

From: Ganapatrao Kulkarni 

Change function __iommu_dma_alloc_pages() to allocate memory/pages
for DMA from respective device NUMA node.

Signed-off-by: Ganapatrao Kulkarni 
[JPG: Modified to use kvzalloc() and fixed indentation]
Signed-off-by: John Garry 
---
Difference v1->v2:
- Add Ganapatrao's tag and change author

This patch was originally posted by Ganapatrao in [1].

However, after initial review, it was never reposted (due to lack of
cycles, I think). In addition, the functionality in its sibling patches
was merged through other patches, as mentioned in [2]; this also refers to a
discussion on device-local allocations vs CPU-local allocations for the DMA
pool, and which is better [3].

However, as mentioned in [3], dma_alloc_coherent() uses the locality
information from the device - as in direct DMA - so this patch is just
applying this same policy.

[1] https://lore.kernel.org/patchwork/patch/833004/
[2] https://lkml.org/lkml/2018/8/22/391
[3]
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html


diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d1b0475..ada00bc 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -449,20 +449,17 @@ static void __iommu_dma_free_pages(struct page **pages, int count)
 	kvfree(pages);
 }
 
-static struct page **__iommu_dma_alloc_pages(unsigned int count,
-		unsigned long order_mask, gfp_t gfp)
+static struct page **__iommu_dma_alloc_pages(struct device *dev,
+		unsigned int count, unsigned long order_mask, gfp_t gfp)
 {
 	struct page **pages;
-	unsigned int i = 0, array_size = count * sizeof(*pages);
+	unsigned int i = 0, nid = dev_to_node(dev);
 
 	order_mask &= (2U << MAX_ORDER) - 1;
 	if (!order_mask)
 		return NULL;
 
-	if (array_size <= PAGE_SIZE)
-		pages = kzalloc(array_size, GFP_KERNEL);
-	else
-		pages = vzalloc(array_size);
+	pages = kvzalloc_node(count * sizeof(*pages), GFP_KERNEL, nid);


The pages array is only accessed by the CPU servicing the
iommu_dma_alloc() call, and is usually freed again before that call even
returns. It's certainly never touched by the device, so forcing it to a
potentially non-local node doesn't make a great deal of sense.


Right, it seems sensible not to make this allocation include the
device-locality requirement, so we can leave it as is. However, modifying
it to use kvzalloc() would seem OK.





 	if (!pages)
 		return NULL;
 
@@ -483,8 +480,10 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
 		unsigned int order = __fls(order_mask);
 
 		order_size = 1U << order;
-		page = alloc_pages((order_mask - order_size) ?
-				   gfp | __GFP_NORETRY : gfp, order);
+		page = alloc_pages_node(nid,
+				(order_mask - order_size) ?
+				gfp | __GFP_NORETRY : gfp,
+				order);


If we're touching this, can we sort out that horrendous ternary? FWIW I
found I have a local version of the original patch which I tweaked at
the time, and apparently I reworked this hunk as below, which does seem
somewhat nicer for the same diffstat.

Robin.


@@ -446,10 +443,12 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
 	for (order_mask &= (2U << __fls(count)) - 1;
 	     order_mask; order_mask &= ~order_size) {
 		unsigned int order = __fls(order_mask);
+		gfp_t alloc_flags = gfp;
 
 		order_size = 1U << order;
-		page = alloc_pages((order_mask - order_size) ?
-				   gfp | __GFP_NORETRY : gfp, order);
+		if (order_size < order_mask)
+			alloc_flags |= __GFP_NORETRY;

Sure, this can be included.

+		page = alloc_pages_node(nid, alloc_flags, order);
 		if (!page)
 			continue;
 		if (!order)

Cheers,
John



Re: [PATCH 1/2] swiotlb: Make DIRECT_MAPPING_ERROR viable

2018-11-20 Thread Robin Murphy

On 20/11/2018 14:49, Konrad Rzeszutek Wilk wrote:

On Tue, Nov 20, 2018 at 02:09:52PM +, Robin Murphy wrote:

With the overflow buffer removed, we no longer have a unique address
which is guaranteed not to be a valid DMA target to use as an error
token. The DIRECT_MAPPING_ERROR value of 0 tries to at least represent
an unlikely DMA target, but unfortunately there are already SWIOTLB
users with DMA-able memory at physical address 0 which now gets falsely
treated as a mapping failure and leads to all manner of misbehaviour.

The best we can do to mitigate that is flip DIRECT_MAPPING_ERROR to the
commonly-used all-bits-set value, since the last single byte of memory
is by far the least-likely-valid DMA target.


Are all the callers checking for DIRECT_MAPPING_ERROR, or is it more of
a comparison (as in if (!ret))?


dma_direct_map_page() and dma_direct_mapping_error() were already doing 
the right thing, and external callers must rely on the latter via 
dma_mapping_error() rather than trying to inspect the actual value 
themselves, since that varies between implementations anyway. AFAICS all 
the new return paths from swiotlb_map_page() are also robust in 
referencing the macro explicitly, so I think we're good.


Thanks,
Robin.


Fixes: dff8d6c1ed58 ("swiotlb: remove the overflow buffer")
Reported-by: John Stultz 
Signed-off-by: Robin Murphy 
---
  include/linux/dma-direct.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index bd73e7a91410..9de9c7ab39d6 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -5,7 +5,7 @@
  #include 
  #include 
  
-#define DIRECT_MAPPING_ERROR		0
+#define DIRECT_MAPPING_ERROR		~(dma_addr_t)0
 
 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
 #include 
--
2.19.1.dirty




Re: [PATCH 2/2] swiotlb: Skip cache maintenance on map error

2018-11-20 Thread Konrad Rzeszutek Wilk
On Tue, Nov 20, 2018 at 02:09:53PM +, Robin Murphy wrote:
> If swiotlb_bounce_page() failed, calling arch_sync_dma_for_device() may
> lead to such delights as performing cache maintenance on whatever
> address phys_to_virt(SWIOTLB_MAP_ERROR) looks like, which is typically
> outside the kernel memory map and goes about as well as expected.
> 
> Don't do that.

+Stefano
> 
> Fixes: a4a4330db46a ("swiotlb: add support for non-coherent DMA")
> Signed-off-by: Robin Murphy 
> ---
>  kernel/dma/swiotlb.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 5731daa09a32..045930e32c0e 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -679,7 +679,8 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
>   }
>  
>   if (!dev_is_dma_coherent(dev) &&
> - (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
> + (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0 &&
> + dev_addr != DIRECT_MAPPING_ERROR)
>   arch_sync_dma_for_device(dev, phys, size, dir);
>  
>   return dev_addr;
> -- 
> 2.19.1.dirty
> 


Re: [PATCH 1/2] swiotlb: Make DIRECT_MAPPING_ERROR viable

2018-11-20 Thread Konrad Rzeszutek Wilk
On Tue, Nov 20, 2018 at 02:09:52PM +, Robin Murphy wrote:
> With the overflow buffer removed, we no longer have a unique address
> which is guaranteed not to be a valid DMA target to use as an error
> token. The DIRECT_MAPPING_ERROR value of 0 tries to at least represent
> an unlikely DMA target, but unfortunately there are already SWIOTLB
> users with DMA-able memory at physical address 0 which now gets falsely
> treated as a mapping failure and leads to all manner of misbehaviour.
> 
> The best we can do to mitigate that is flip DIRECT_MAPPING_ERROR to the
> commonly-used all-bits-set value, since the last single byte of memory
> is by far the least-likely-valid DMA target.

Are all the callers checking for DIRECT_MAPPING_ERROR, or is it more of
a comparison (as in if (!ret))?


> 
> Fixes: dff8d6c1ed58 ("swiotlb: remove the overflow buffer")
> Reported-by: John Stultz 
> Signed-off-by: Robin Murphy 
> ---
>  include/linux/dma-direct.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> index bd73e7a91410..9de9c7ab39d6 100644
> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -5,7 +5,7 @@
>  #include 
>  #include 
>  
> -#define DIRECT_MAPPING_ERROR 0
> +#define DIRECT_MAPPING_ERROR ~(dma_addr_t)0
>  
>  #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
>  #include 
> -- 
> 2.19.1.dirty
> 


Re: [PATCH v2] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-20 Thread Robin Murphy

On 20/11/2018 13:42, John Garry wrote:

From: Ganapatrao Kulkarni 

Change function __iommu_dma_alloc_pages() to allocate memory/pages
for DMA from respective device NUMA node.

Signed-off-by: Ganapatrao Kulkarni 
[JPG: Modified to use kvzalloc() and fixed indentation]
Signed-off-by: John Garry 
---
Difference v1->v2:
- Add Ganapatrao's tag and change author

This patch was originally posted by Ganapatrao in [1].

However, after initial review, it was never reposted (due to lack of
cycles, I think). In addition, the functionality in its sibling patches
was merged through other patches, as mentioned in [2]; this also refers to a
discussion on device-local allocations vs CPU-local allocations for the DMA
pool, and which is better [3].

However, as mentioned in [3], dma_alloc_coherent() uses the locality
information from the device - as in direct DMA - so this patch is just
applying this same policy.

[1] https://lore.kernel.org/patchwork/patch/833004/
[2] https://lkml.org/lkml/2018/8/22/391
[3] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html


diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d1b0475..ada00bc 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -449,20 +449,17 @@ static void __iommu_dma_free_pages(struct page **pages, int count)
 	kvfree(pages);
 }
 
-static struct page **__iommu_dma_alloc_pages(unsigned int count,
-		unsigned long order_mask, gfp_t gfp)
+static struct page **__iommu_dma_alloc_pages(struct device *dev,
+		unsigned int count, unsigned long order_mask, gfp_t gfp)
 {
 	struct page **pages;
-	unsigned int i = 0, array_size = count * sizeof(*pages);
+	unsigned int i = 0, nid = dev_to_node(dev);
 
 	order_mask &= (2U << MAX_ORDER) - 1;
 	if (!order_mask)
 		return NULL;
 
-	if (array_size <= PAGE_SIZE)
-		pages = kzalloc(array_size, GFP_KERNEL);
-	else
-		pages = vzalloc(array_size);
+	pages = kvzalloc_node(count * sizeof(*pages), GFP_KERNEL, nid);


The pages array is only accessed by the CPU servicing the 
iommu_dma_alloc() call, and is usually freed again before that call even 
returns. It's certainly never touched by the device, so forcing it to a 
potentially non-local node doesn't make a great deal of sense.



 	if (!pages)
 		return NULL;
 
@@ -483,8 +480,10 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
 		unsigned int order = __fls(order_mask);
 
 		order_size = 1U << order;
-		page = alloc_pages((order_mask - order_size) ?
-				   gfp | __GFP_NORETRY : gfp, order);
+		page = alloc_pages_node(nid,
+				(order_mask - order_size) ?
+				gfp | __GFP_NORETRY : gfp,
+				order);


If we're touching this, can we sort out that horrendous ternary? FWIW I 
found I have a local version of the original patch which I tweaked at 
the time, and apparently I reworked this hunk as below, which does seem 
somewhat nicer for the same diffstat.


Robin.


@@ -446,10 +443,12 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
 	for (order_mask &= (2U << __fls(count)) - 1;
 	     order_mask; order_mask &= ~order_size) {
 		unsigned int order = __fls(order_mask);
+		gfp_t alloc_flags = gfp;
 
 		order_size = 1U << order;
-		page = alloc_pages((order_mask - order_size) ?
-				   gfp | __GFP_NORETRY : gfp, order);
+		if (order_size < order_mask)
+			alloc_flags |= __GFP_NORETRY;
+		page = alloc_pages_node(nid, alloc_flags, order);
 		if (!page)
 			continue;
 		if (!order)


[PATCH 1/2] swiotlb: Make DIRECT_MAPPING_ERROR viable

2018-11-20 Thread Robin Murphy
With the overflow buffer removed, we no longer have a unique address
which is guaranteed not to be a valid DMA target to use as an error
token. The DIRECT_MAPPING_ERROR value of 0 tries to at least represent
an unlikely DMA target, but unfortunately there are already SWIOTLB
users with DMA-able memory at physical address 0 which now gets falsely
treated as a mapping failure and leads to all manner of misbehaviour.

The best we can do to mitigate that is flip DIRECT_MAPPING_ERROR to the
commonly-used all-bits-set value, since the last single byte of memory
is by far the least-likely-valid DMA target.

Fixes: dff8d6c1ed58 ("swiotlb: remove the overflow buffer")
Reported-by: John Stultz 
Signed-off-by: Robin Murphy 
---
 include/linux/dma-direct.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index bd73e7a91410..9de9c7ab39d6 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -5,7 +5,7 @@
 #include 
 #include 
 
-#define DIRECT_MAPPING_ERROR   0
+#define DIRECT_MAPPING_ERROR   ~(dma_addr_t)0
 
 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
 #include 
-- 
2.19.1.dirty



[PATCH 2/2] swiotlb: Skip cache maintenance on map error

2018-11-20 Thread Robin Murphy
If swiotlb_bounce_page() failed, calling arch_sync_dma_for_device() may
lead to such delights as performing cache maintenance on whatever
address phys_to_virt(SWIOTLB_MAP_ERROR) looks like, which is typically
outside the kernel memory map and goes about as well as expected.

Don't do that.

Fixes: a4a4330db46a ("swiotlb: add support for non-coherent DMA")
Signed-off-by: Robin Murphy 
---
 kernel/dma/swiotlb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 5731daa09a32..045930e32c0e 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -679,7 +679,8 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
}
 
if (!dev_is_dma_coherent(dev) &&
-   (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
+   (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0 &&
+   dev_addr != DIRECT_MAPPING_ERROR)
arch_sync_dma_for_device(dev, phys, size, dir);
 
return dev_addr;
-- 
2.19.1.dirty



[PATCH 0/2] SWIOTLB fixes for 4.20

2018-11-20 Thread Robin Murphy
This is what I have so far, which at least resolves the most obvious
problems. I still haven't got very far with the USB corruption issue
I see on Juno with -rc1, but I'm yet to confirm whether that's actually
attributable to the SWIOTLB changes or something else entirely.

Robin.

Robin Murphy (2):
  swiotlb: Make DIRECT_MAPPING_ERROR viable
  swiotlb: Skip cache maintenance on map error

 include/linux/dma-direct.h | 2 +-
 kernel/dma/swiotlb.c   | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

-- 
2.19.1.dirty



[PATCH v2] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-20 Thread John Garry
From: Ganapatrao Kulkarni 

Change function __iommu_dma_alloc_pages() to allocate memory/pages
for DMA from respective device NUMA node.

Signed-off-by: Ganapatrao Kulkarni 
[JPG: Modified to use kvzalloc() and fixed indentation]
Signed-off-by: John Garry 
---
Difference v1->v2:
- Add Ganapatrao's tag and change author

This patch was originally posted by Ganapatrao in [1].

However, after initial review, it was never reposted (due to lack of
cycles, I think). In addition, the functionality in its sibling patches
was merged through other patches, as mentioned in [2]; this also refers to a
discussion on device-local allocations vs CPU-local allocations for the DMA
pool, and which is better [3].

However, as mentioned in [3], dma_alloc_coherent() uses the locality
information from the device - as in direct DMA - so this patch is just
applying this same policy.

[1] https://lore.kernel.org/patchwork/patch/833004/
[2] https://lkml.org/lkml/2018/8/22/391
[3] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html


diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d1b0475..ada00bc 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -449,20 +449,17 @@ static void __iommu_dma_free_pages(struct page **pages, int count)
kvfree(pages);
 }
 
-static struct page **__iommu_dma_alloc_pages(unsigned int count,
-   unsigned long order_mask, gfp_t gfp)
+static struct page **__iommu_dma_alloc_pages(struct device *dev,
+   unsigned int count, unsigned long order_mask, gfp_t gfp)
 {
struct page **pages;
-   unsigned int i = 0, array_size = count * sizeof(*pages);
+   unsigned int i = 0, nid = dev_to_node(dev);
 
order_mask &= (2U << MAX_ORDER) - 1;
if (!order_mask)
return NULL;
 
-   if (array_size <= PAGE_SIZE)
-   pages = kzalloc(array_size, GFP_KERNEL);
-   else
-   pages = vzalloc(array_size);
+   pages = kvzalloc_node(count * sizeof(*pages), GFP_KERNEL, nid);
if (!pages)
return NULL;
 
@@ -483,8 +480,10 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
unsigned int order = __fls(order_mask);
 
order_size = 1U << order;
-   page = alloc_pages((order_mask - order_size) ?
-  gfp | __GFP_NORETRY : gfp, order);
+   page = alloc_pages_node(nid,
+   (order_mask - order_size) ?
+   gfp | __GFP_NORETRY : gfp,
+   order);
if (!page)
continue;
if (!order)
@@ -569,7 +568,8 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
alloc_sizes = min_size;
 
count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp);
+   pages = __iommu_dma_alloc_pages(dev, count, alloc_sizes >> PAGE_SHIFT,
+   gfp);
if (!pages)
return NULL;
 
-- 
1.9.1



Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

2018-11-20 Thread Jean-Philippe Brucker
On 20/11/2018 09:16, Jonathan Cameron wrote:
> +CC Jean-Philippe and iommu list.

Thanks for the Cc, sorry I don't have enough bandwidth to follow this
thread at the moment.

> In WarpDrive/uacce, we make this simple. If you support an IOMMU and it
> supports SVM/SVA, everything will be fine, just like ODP implicit mode,
> and you don't need to write any code for that, because it has been done
> by the IOMMU framework. If it

 Looks like the IOMMU code uses mmu_notifier, so it is identical to
 IB's ODP. The only difference is that IB tends to have the IOMMU page
 table in the device, not in the CPU.

 The only case I know if that is different is the new-fangled CAPI
 stuff where the IOMMU can directly use the CPU's page table and the
 IOMMU page table (in device or CPU) is eliminated.  
>>>
>>> Yes. We are not focusing on the current implementation. As mentioned in the
>>> cover letter, we are expecting Jean-Philippe's SVA patches:
>>> git://linux-arm.org/linux-jpb.  
>>
>> This SVA stuff does not look comparable to CAPI as it still requires
>> maintaining separate IOMMU page tables.

With SVA, we use the same page tables in the IOMMU and CPU. It's the
same pgd pointer, there is no mirroring of mappings. We bind the process
page tables with the device using a PASID (Process Address Space ID).

After fork(), the child's mm is different from the parent's, and is
not automatically bound to the device. The device driver will have to
issue a new bind() request, and the child mm will be bound with a
different PASID.

There could be a problem if the child inherits the parent's device
handle. Then depending on the device, the child could be able to program
DMA and possibly access the parent's address space. The parent needs to
be aware of that when using the bind() API, and close the device fd in
the child after fork().
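
For concreteness, a hedged sketch of the bind flow being described (the
signature approximates the one in the SVA series under review; it has
changed across revisions):

	int pasid, ret;

	/* Bind the current process address space to the device: the IOMMU
	 * then walks this mm's page tables for DMA tagged with 'pasid'. */
	ret = iommu_sva_bind_device(dev, current->mm, &pasid, 0, drvdata);
	if (ret)
		return ret;

	/* After fork(), a child must issue its own bind() and gets a
	 * different PASID; the parent's binding is not inherited. */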

We use MMU notifiers for some address space changes:

* The IOTLB needs to be invalidated after any unmap() to the process
address space. On Arm systems the SMMU IOTLBs can be invalidated by the
CPU TLBI instructions, but we still need to invalidate TLBs private to
devices that are arch-agnostic (Address Translation Cache in PCI ATS).

* When the process mm exits, we need to remove the associated PASID
configuration in the IOMMU and invalidate the TLBs.
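
A sketch of how those invalidations are typically driven (the callbacks
are real mmu_notifier_ops hooks; the bodies are placeholders):

	static void sva_mn_invalidate_range(struct mmu_notifier *mn,
					    struct mm_struct *mm,
					    unsigned long start,
					    unsigned long end)
	{
		/* Invalidate the IOTLB and any device-private TLBs (e.g. the
		 * PCI ATS Address Translation Cache) for the bound PASID. */
	}

	static void sva_mn_release(struct mmu_notifier *mn, struct mm_struct *mm)
	{
		/* mm is exiting: clear the PASID configuration in the IOMMU
		 * and invalidate the TLBs. */
	}

	static const struct mmu_notifier_ops sva_mn_ops = {
		.invalidate_range	= sva_mn_invalidate_range,
		.release		= sva_mn_release,
	};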

Thanks,
Jean


Re: [PATCH] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-20 Thread John Garry

On 20/11/2018 10:09, Ganapatrao Kulkarni wrote:

Hi John,

On Tue, Nov 20, 2018 at 3:35 PM John Garry  wrote:


On 08/11/2018 17:55, John Garry wrote:

Change function __iommu_dma_alloc_pages() to allocate memory/pages
for DMA from respective device NUMA node.



Ping: a friendly reminder on this patch.

Thanks


Originally-from: Ganapatrao Kulkarni 
Signed-off-by: John Garry 
---

This patch was originally posted by Ganapatrao in [1] *.

However, after initial review, it was never reposted (due to lack of
cycles, I think). In addition, the functionality in its sibling patches
was merged through other patches, as mentioned in [2]; this also refers to a
discussion on device-local allocations vs CPU-local allocations for the DMA
pool, and which is better [3].

However, as mentioned in [3], dma_alloc_coherent() uses the locality
information from the device - as in direct DMA - so this patch is just
applying this same policy.

[1] https://lore.kernel.org/patchwork/patch/833004/
[2] https://lkml.org/lkml/2018/8/22/391
[3] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html

* Authorship on this updated patch may need to be fixed - I did not want to
  add Ganapatrao's SOB without permission.


Thanks for taking this up. Please feel free to add my SoB.


OK, I will also make you the author and repost.

Thanks,
John



diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d1b0475..ada00bc 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -449,20 +449,17 @@ static void __iommu_dma_free_pages(struct page **pages, int count)
  kvfree(pages);
 }

-static struct page **__iommu_dma_alloc_pages(unsigned int count,
- unsigned long order_mask, gfp_t gfp)
+static struct page **__iommu_dma_alloc_pages(struct device *dev,
+ unsigned int count, unsigned long order_mask, gfp_t gfp)
 {
  struct page **pages;
- unsigned int i = 0, array_size = count * sizeof(*pages);
+ unsigned int i = 0, nid = dev_to_node(dev);

  order_mask &= (2U << MAX_ORDER) - 1;
  if (!order_mask)
  return NULL;

- if (array_size <= PAGE_SIZE)
- pages = kzalloc(array_size, GFP_KERNEL);
- else
- pages = vzalloc(array_size);
+ pages = kvzalloc_node(count * sizeof(*pages), GFP_KERNEL, nid);
  if (!pages)
  return NULL;

@@ -483,8 +480,10 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
  unsigned int order = __fls(order_mask);

  order_size = 1U << order;
- page = alloc_pages((order_mask - order_size) ?
-gfp | __GFP_NORETRY : gfp, order);
+ page = alloc_pages_node(nid,
+ (order_mask - order_size) ?
+ gfp | __GFP_NORETRY : gfp,
+ order);
  if (!page)
  continue;
  if (!order)
@@ -569,7 +568,8 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
  alloc_sizes = min_size;

  count = PAGE_ALIGN(size) >> PAGE_SHIFT;
- pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp);
+ pages = __iommu_dma_alloc_pages(dev, count, alloc_sizes >> PAGE_SHIFT,
+ gfp);
  if (!pages)
  return NULL;







thanks
Ganapat







Re: [PATCH] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-20 Thread Ganapatrao Kulkarni
Hi John,

On Tue, Nov 20, 2018 at 3:35 PM John Garry  wrote:
>
> On 08/11/2018 17:55, John Garry wrote:
> > Change function __iommu_dma_alloc_pages() to allocate memory/pages
> > for DMA from respective device NUMA node.
> >
>
> Ping: a friendly reminder on this patch.
>
> Thanks
>
> > Originally-from: Ganapatrao Kulkarni 
> > Signed-off-by: John Garry 
> > ---
> >
> > This patch was originally posted by Ganapatrao in [1] *.
> >
> > However, after initial review, it was never reposted (due to lack of
> > cycles, I think). In addition, the functionality in its sibling patches
> > was merged through other patches, as mentioned in [2]; this also refers to a
> > discussion on device-local allocations vs CPU-local allocations for the DMA
> > pool, and which is better [3].
> >
> > However, as mentioned in [3], dma_alloc_coherent() uses the locality
> > information from the device - as in direct DMA - so this patch is just
> > applying this same policy.
> >
> > [1] https://lore.kernel.org/patchwork/patch/833004/
> > [2] https://lkml.org/lkml/2018/8/22/391
> > [3] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html
> >
> > * Authorship on this updated patch may need to be fixed - I did not want to
> >   add Ganapatrao's SOB without permission.

Thanks for taking this up. Please feel free to add my SoB.
> >
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index d1b0475..ada00bc 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -449,20 +449,17 @@ static void __iommu_dma_free_pages(struct page **pages, int count)
> >   kvfree(pages);
> >  }
> >
> > -static struct page **__iommu_dma_alloc_pages(unsigned int count,
> > - unsigned long order_mask, gfp_t gfp)
> > +static struct page **__iommu_dma_alloc_pages(struct device *dev,
> > + unsigned int count, unsigned long order_mask, gfp_t gfp)
> >  {
> >   struct page **pages;
> > - unsigned int i = 0, array_size = count * sizeof(*pages);
> > + unsigned int i = 0, nid = dev_to_node(dev);
> >
> >   order_mask &= (2U << MAX_ORDER) - 1;
> >   if (!order_mask)
> >   return NULL;
> >
> > - if (array_size <= PAGE_SIZE)
> > - pages = kzalloc(array_size, GFP_KERNEL);
> > - else
> > - pages = vzalloc(array_size);
> > + pages = kvzalloc_node(count * sizeof(*pages), GFP_KERNEL, nid);
> >   if (!pages)
> >   return NULL;
> >
> > @@ -483,8 +480,10 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
> >   unsigned int order = __fls(order_mask);
> >
> >   order_size = 1U << order;
> > - page = alloc_pages((order_mask - order_size) ?
> > -gfp | __GFP_NORETRY : gfp, order);
> > + page = alloc_pages_node(nid,
> > + (order_mask - order_size) ?
> > + gfp | __GFP_NORETRY : gfp,
> > + order);
> >   if (!page)
> >   continue;
> >   if (!order)
> > @@ -569,7 +568,8 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
> >   alloc_sizes = min_size;
> >
> >   count = PAGE_ALIGN(size) >> PAGE_SHIFT;
> > - pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, 
> > gfp);
> > + pages = __iommu_dma_alloc_pages(dev, count, alloc_sizes >> PAGE_SHIFT,
> > + gfp);
> >   if (!pages)
> >   return NULL;
> >
> >
>
>

thanks
Ganapat


Re: [PATCH] iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages()

2018-11-20 Thread John Garry

On 08/11/2018 17:55, John Garry wrote:

Change function __iommu_dma_alloc_pages() to allocate memory/pages
for DMA from respective device NUMA node.



Ping: a friendly reminder on this patch.

Thanks


Originally-from: Ganapatrao Kulkarni 
Signed-off-by: John Garry 
---

This patch was originally posted by Ganapatrao in [1] *.

However, after initial review, it was never reposted (due to lack of
cycles, I think). In addition, the functionality in its sibling patches
was merged through other patches, as mentioned in [2]; this also refers to a
discussion on device-local allocations vs CPU-local allocations for the DMA
pool, and which is better [3].

However, as mentioned in [3], dma_alloc_coherent() uses the locality
information from the device - as in direct DMA - so this patch is just
applying this same policy.

[1] https://lore.kernel.org/patchwork/patch/833004/
[2] https://lkml.org/lkml/2018/8/22/391
[3] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1692998.html

* Authorship on this updated patch may need to be fixed - I did not want to
  add Ganapatrao's SOB without permission.

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d1b0475..ada00bc 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -449,20 +449,17 @@ static void __iommu_dma_free_pages(struct page **pages, int count)
kvfree(pages);
 }

-static struct page **__iommu_dma_alloc_pages(unsigned int count,
-   unsigned long order_mask, gfp_t gfp)
+static struct page **__iommu_dma_alloc_pages(struct device *dev,
+   unsigned int count, unsigned long order_mask, gfp_t gfp)
 {
struct page **pages;
-   unsigned int i = 0, array_size = count * sizeof(*pages);
+   unsigned int i = 0, nid = dev_to_node(dev);

order_mask &= (2U << MAX_ORDER) - 1;
if (!order_mask)
return NULL;

-   if (array_size <= PAGE_SIZE)
-   pages = kzalloc(array_size, GFP_KERNEL);
-   else
-   pages = vzalloc(array_size);
+   pages = kvzalloc_node(count * sizeof(*pages), GFP_KERNEL, nid);
if (!pages)
return NULL;

@@ -483,8 +480,10 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
unsigned int order = __fls(order_mask);

order_size = 1U << order;
-   page = alloc_pages((order_mask - order_size) ?
-  gfp | __GFP_NORETRY : gfp, order);
+   page = alloc_pages_node(nid,
+   (order_mask - order_size) ?
+   gfp | __GFP_NORETRY : gfp,
+   order);
if (!page)
continue;
if (!order)
@@ -569,7 +568,8 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
alloc_sizes = min_size;

count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   pages = __iommu_dma_alloc_pages(count, alloc_sizes >> PAGE_SHIFT, gfp);
+   pages = __iommu_dma_alloc_pages(dev, count, alloc_sizes >> PAGE_SHIFT,
+   gfp);
if (!pages)
return NULL;







Re: [PATCH 06/10] swiotlb: use swiotlb_map_page in swiotlb_map_sg_attrs

2018-11-20 Thread Christoph Hellwig
On Mon, Nov 19, 2018 at 03:22:13PM -0800, John Stultz wrote:
> > +   sg->dma_address = dma_addr;
> > sg_dma_len(sg) = sg->length;
> > }
> 
> I know Robin has already replied with more detailed info, but just to
> close the loop as I'm finally home, applying this patch didn't seem to
> help with the IO hangs I'm seeing w/ HiKey960.

If Robin's observation is right, this should fix the problem for you:

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index bd73e7a91410..1833f0c1fba0 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -5,7 +5,7 @@
 #include 
 #include 
 
-#define DIRECT_MAPPING_ERROR   0
+#define DIRECT_MAPPING_ERROR   (~(dma_addr_t)0x0)
 
 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
 #include 


Re: [PATCH 06/10] swiotlb: use swiotlb_map_page in swiotlb_map_sg_attrs

2018-11-20 Thread Christoph Hellwig
On Mon, Nov 19, 2018 at 07:36:44PM +, Robin Murphy wrote:
> OK, having brought my Hikey to life and reproduced John's stall with rc1, 
> what's going on is that at some point dma_map_sg() returns 0, which causes 
> the SCSI/UFS layer to go round in circles repeatedly trying to map the same 
> list(s) equally unsuccessfully.
>
> Why does dma_map_sg() fail? Turns out what we all managed to overlook is 
> that this patch *does* introduce a subtle change in behaviour, in that 
> previously the non-bounced case assigned dev_addr to sg->dma_address 
> without looking at it; now with the swiotlb_map_page() call we check the 
> return value against DIRECT_MAPPING_ERROR regardless of whether it was 
> bounced or not.
>
> Flash back to the other thread when I said "...but I suspect there may well 
> be non-IOMMU platforms where DMA to physical address 0 is a thing :("? I 
> have the 3GB Hikey where all the RAM is below 32 bits so SWIOTLB never ever 
> bounces, but sure enough, guess where that RAM starts...
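
To make the change concrete, this is roughly the shape of the new map_sg
path (a sketch of the pattern described above, not the exact patch):

	for_each_sg(sgl, sg, nelems, i) {
		dma_addr_t dma_addr;

		dma_addr = swiotlb_map_page(dev, sg_page(sg), sg->offset,
					    sg->length, dir, attrs);
		/* Now checked even for the non-bounced case, so a valid
		 * physical address of 0 is mistaken for a failure: */
		if (dma_addr == DIRECT_MAPPING_ERROR)
			goto out_unmap;
		sg->dma_address = dma_addr;
		sg_dma_len(sg) = sg->length;
	}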

What is PAGE_OFFSET on that machine?  We usually don't use kernel
virtual address 0 so that we can deal with NULL pointer dereferences sanely,
but I guess we can get to physical address 0.

I guess the quick fix would be to move DIRECT_MAPPING_ERROR to all-Fs
ASAP.


Re: [PATCH RFC] dma-direct: do not allocate a single page from CMA area

2018-11-20 Thread Christoph Hellwig
On Mon, Nov 05, 2018 at 02:40:51PM -0800, Nicolin Chen wrote:
> > > In general, this seems to make sense to me. It does represent a 
> > > theoretical 
> > > change in behaviour for devices which have their own CMA area somewhere 
> > > other than kernel memory, and only ever make non-atomic allocations, but 
> > > I'm not sure whether that's a realistic or common enough case to really 
> > > worry about.
> > 
> > Yes, I think we should make the decision in dma_alloc_from_contiguous
> > based on having a per-dev CMA area or not.  There is a lot of cruft in
> 
> It seems that cma_alloc() already has a CMA area check? Wouldn't it
> be duplication to have a similar one in dma_alloc_from_contiguous?

It isn't duplicate if it serves a different purpose.

> > this area that should be cleaned up while we're at it, like always
> > falling back to the normal page allocator if there is no CMA area or
> > nothing suitable found in dma_alloc_from_contiguous instead of
> > having to duplicate all that in the caller.
> 
> Am I supposed to clean up the things mentioned above by moving
> the fallback allocator into dma_alloc_from_contiguous, or to just
> move my change (the count check) into dma_alloc_from_contiguous?
> 
> I understand it'd be great to have a cleanup, yet I feel it could
> be done separately, as this patch isn't really a cleanup change.

I can take care of any cleanups.  I've been trying to dust up that
area anyway.


Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

2018-11-20 Thread Jonathan Cameron
+CC Jean-Philippe and iommu list.


On Mon, 19 Nov 2018 20:29:39 -0700
Jason Gunthorpe  wrote:

> On Tue, Nov 20, 2018 at 11:07:02AM +0800, Kenneth Lee wrote:
> > On Mon, Nov 19, 2018 at 11:49:54AM -0700, Jason Gunthorpe wrote:  
> > > 
> > > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> > >
> > > > If the hardware cannot share a page table with the CPU, we then need to
> > > > have some way to change the device page table. This is what happens in
> > > > ODP: it invalidates the page table in the device upon an mmu_notifier
> > > > callback. But this cannot solve the COW problem: if the user process A
> > > > shares a page P with the device, and A forks a new process B and then
> > > > continues to write to the page, then by COW, process B will keep page P,
> > > > while A will get a new page P'. But you have no way to let the device
> > > > know it should use P' rather than P.  
> > > 
> > > Is this true? I thought mmu_notifiers covered all these cases.
> > > 
> > > The mmu_notifier for A should fire if B causes the physical address of
> > > A's pages to change via COW. 
> > > 
> > > And this causes the device page tables to re-synchronize.  
> > 
> > I don't see such code. The current do_cow_fault() implementation has
> > nothing to do with mmu_notifier.  
> 
> Well, that sure sounds like it would be a bug in mmu_notifiers..
> 
> But considering Jean's SVA stuff seems based on mmu notifiers, I have
> a hard time believing that it has any different behavior from RDMA's
> ODP, and if it does have different behavior, then it is probably just
> a bug in the ODP implementation.
> 
> > > > In WarpDrive/uacce, we make this simple. If you support an IOMMU and it
> > > > supports SVM/SVA, everything will be fine, just like ODP implicit mode,
> > > > and you don't need to write any code for that, because it has been done
> > > > by the IOMMU framework. If it  
> > > 
> > > Looks like the IOMMU code uses mmu_notifier, so it is identical to
> > > IB's ODP. The only difference is that IB tends to have the IOMMU page
> > > table in the device, not in the CPU.
> > > 
> > > The only case I know if that is different is the new-fangled CAPI
> > > stuff where the IOMMU can directly use the CPU's page table and the
> > > IOMMU page table (in device or CPU) is eliminated.  
> >
> > Yes. We are not focusing on the current implementation. As mentioned in the
> > cover letter, we are expecting Jean-Philippe's SVA patches:
> > git://linux-arm.org/linux-jpb.  
> 
> This SVA stuff does not look comparable to CAPI as it still requires
> maintaining separate IOMMU page tables.
> 
> Also, those patches from Jean have a lot of references to
> mmu_notifiers (ie look at iommu_mmu_notifier).
> 
> Are you really sure it is actually any different at all?
> 
> > > Anyhow, I don't think a single instance of hardware should justify an
> > > entire new subsystem. Subsystems are hard to make and without multiple
> > > hardware examples there is no way to expect that it would cover any
> > > future use cases.  
> > 
> > Yes. That's our first expectation. We can keep it with our driver. But
> > there is no user driver support for any accelerator in the mainline
> > kernel; even the well-known QuickAssist has to be maintained out of tree.
> > So we are trying to see if people are interested in working together to
> > solve the problem.  
> 
> Well, you should come with patches ack'ed by these other groups.
> 
> > > If all your driver needs is to mmap some PCI bar space, route
> > > interrupts and do DMA mapping then mediated VFIO is probably a good
> > > choice.   
> > 
> > Yes. That is what is done in our RFCv1/v2. But we accepted Jerome's opinion
> > and are trying not to add complexity to the mm subsystem.  
> 
> Why would a mediated VFIO driver touch the mm subsystem? Sounds like
> you don't have a VFIO driver if it needs to do stuff like that...
> 
> > > If it needs to do a bunch of other stuff, not related to PCI bar
> > > space, i