Re: [PATCH 2/4] dma-buf: lock the reservation object during (un)map_dma_buf v2

2018-07-03 Thread Christian König

On 03.07.2018 at 14:52, Daniel Vetter wrote:

On Tue, Jul 03, 2018 at 01:46:44PM +0200, Christian König wrote:

On 25.06.2018 at 11:12, Daniel Vetter wrote:

On Mon, Jun 25, 2018 at 10:22:31AM +0200, Daniel Vetter wrote:

On Fri, Jun 22, 2018 at 04:11:01PM +0200, Christian König wrote:

First step towards unpinned DMA buf operation.

I've checked the DRM drivers for potential locking of the reservation
object, but essentially we need to audit all implementations of the
dma_buf_ops for this to work.

v2: reordered

Signed-off-by: Christian König 

Reviewed-by: Daniel Vetter 

Ok I did review drivers a bit, but apparently not well enough by far. i915
CI is unhappy:

https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9400/fi-whl-u/igt@gem_mmap_...@basic-small-bo-tiledx.html

So yeah inserting that lock in there isn't the most trivial thing :-/

I kinda assume that other drivers will have similar issues, e.g. omapdrm's
use of dev->struct_mutex also very much looks like it'll result in a new
locking inversion.

Ah, crap. I already feared that this wouldn't be easy, but that it is as
bad as this is rather disappointing.

Thanks for the info, going to keep thinking about how to solve those issues.

Side note: We want to make sure that drivers don't get the reservation_obj
locking hierarchy wrong in other places (using dev->struct_mutex is kinda
a pre-existing mis-use that we can't wish away retroactively
unfortunately). One really important thing is that shrinker vs resv_obj
must work with trylocks in the shrinker, so that you can allocate memory
while holding reservation objects.

One neat trick to teach lockdep about this would be to have a dummy

if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
	ww_mutex_lock(&dmabuf->resv->lock, NULL);
	fs_reclaim_acquire(GFP_KERNEL);
	fs_reclaim_release(GFP_KERNEL);
	ww_mutex_unlock(&dmabuf->resv->lock);
}

in dma_buf_init(). We're using the fs_reclaim_acquire/release check very
successfully to improve our igt test coverage for i915.ko in other areas.

Totally unrelated to dev->struct_mutex, but thoughts? Well for
dev->struct_mutex we could at least decide on one true way to nest
resv_obj vs. dev->struct_mutex as maybe an interim step, but not sure how
much that would help.


I don't think that would help. As far as I can see we only have two choices:

1. Either have a big patch which fixes all DMA-buf implementations to 
allow the reservation lock to be held during map/unmap (unrealistic).


2. Add a flag to, at least in the mid term, tell the DMA-buf helper
functions what to do, e.g. create the mapping without the reservation
lock held.



How about moving the SGL caching from the DRM layer into the DMA-buf
layer and adding a flag for whether the exporter wants/needs this caching?


Then only the implementations which can deal with dynamic invalidation
would disable SGL caching, and with it enable creating the sg_table with
the reservation object locked.


This way we can kill two birds with one stone by both avoiding the SGL
caching in the DRM layer as well as having sane handling for the locking.
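Purely to illustrate the idea (the cache_sgt_mapping flag and the attach->sgt
field are made-up names, not an existing interface), the core side could look
roughly like this:

/* sketch of dma_buf_map_attachment() with a hypothetical exporter flag */
struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
					enum dma_data_direction direction)
{
	struct dma_buf *dmabuf = attach->dmabuf;
	struct sg_table *sgt;

	/* exporters which can't handle invalidation keep the cached mapping */
	if (dmabuf->ops->cache_sgt_mapping && attach->sgt)
		return attach->sgt;

	/* dynamic exporters instead require the reservation object locked */
	reservation_object_assert_held(dmabuf->resv);

	sgt = dmabuf->ops->map_dma_buf(attach, direction);
	if (dmabuf->ops->cache_sgt_mapping)
		attach->sgt = sgt;

	return sgt;
}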


Thoughts?

Christian.



-Daniel




Re: [PATCH 04/15] dma-fence: Make ->wait callback optional

2018-05-04 Thread Christian König

On 04.05.2018 at 11:25, Daniel Vetter wrote:

On Fri, May 4, 2018 at 11:16 AM, Chris Wilson  wrote:

Quoting Daniel Vetter (2018-05-04 09:57:59)

On Fri, May 04, 2018 at 09:31:33AM +0100, Chris Wilson wrote:

Quoting Daniel Vetter (2018-05-04 09:23:01)

On Fri, May 04, 2018 at 10:17:22AM +0200, Daniel Vetter wrote:

On Fri, May 04, 2018 at 09:09:10AM +0100, Chris Wilson wrote:

Quoting Daniel Vetter (2018-05-03 15:25:52)

Almost everyone uses dma_fence_default_wait.

v2: Also remove the BUG_ON(!ops->wait) (Chris).

I just don't get the rationale for implicit over explicit.

Closer approximation of dwim semantics. There's been tons of patch series
all over drm and related places to get there, once we have a big pile of
implementations and know what the dwim semantics should be. Individually
they're all not much, in aggregate they substantially simplify simple
drivers.

I also think clearer separation between optional optimization hooks and
mandatory core parts is useful in itself.

A new spelling of midlayer ;) I don't see the contradiction with a
driver saying use the default and simplicity. (I know which one the
compiler thinks is simpler ;)

If the compiler overhead is real then I guess it would make sense to be
explicit. I don't expect that to be a problem though for a blocking
function.

I disagree on this being a midlayer - you can still overwrite everything
you please to. What it does help is people doing less copypasting (and
assorted bugs), at least in the grand scheme of things. And we do have a
_lot_ more random small drivers than just a few years ago. Reducing the
amount of explicit typing just to get default behaviour has been an
ongoing theme for a few years now, and your objection here is the first
that this is not a good idea. So I'm somewhat confused.

I'm just saying I don't see any rationale for this patch.

 "Almost everyone uses dma_fence_default_wait."

Why change?

Making it look simpler on the surface, so that you don't have to think
about things straight away? I understand the appeal, but I do worry
about it just being an illusion. (Cutting and pasting a line saying
.wait = default_wait, doesn't feel that onerous, as you likely cut and
paste the ops anyway, and at the very least you are reminded about some
of the interactions. You could even have default initializers and/or
magic macros to hide the cut and paste; maybe a simple_dma_fence [now
that's a midlayer!] but I haven't looked.)

In really monolithic vtables like drm_driver we do use default
function macros, so you type 1 line, get them all. But dma_fence_ops
is pretty small, and most drivers only implement a few callbacks. Also
note that e.g. the ->release callback already works like that, so this
pattern is there already. I simply extended it to ->wait and
->enable_signaling. Also note that I leave the EXPORT_SYMBOL in place,
you can still wrap dma_fence_default_wait if you wish to do so.

But I just realized that I didn't clean out the optional release
hooks, I guess I should do that too (for the few cases it's not yet
done) and respin.


I kind of agree with Chris here, but I also see the practical problem of
copying the default function into all the implementations.


We had the same problem in TTM, and I also don't really like the result of
always having that "if (some_callback) some_callback(); else default();" pattern.


Might be that the run time overhead is negligible, but it doesn't feel
right from the coding style perspective.
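For reference, the fallback dispatch being discussed boils down to roughly
this in the core (sketch only, tracing and error handling trimmed); the
question above is purely whether this implicit default is good style:

signed long
dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
{
	if (WARN_ON(timeout < 0))
		return -EINVAL;

	/* optional optimization hook, otherwise the generic implementation */
	if (fence->ops->wait)
		return fence->ops->wait(fence, intr, timeout);

	return dma_fence_default_wait(fence, intr, timeout);
}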


Christian.


-Daniel




Re: [PATCH 04/17] dma-fence: Allow wait_any_timeout for all fences

2018-04-29 Thread Christian König

On 27.04.2018 at 08:17, Daniel Vetter wrote:

When this was introduced in

commit a519435a96597d8cd96123246fea4ae5a6c90b02
Author: Christian König <christian.koe...@amd.com>
Date:   Tue Oct 20 16:34:16 2015 +0200

 dma-buf/fence: add fence_wait_any_timeout function v2

there was a restriction added that this only works if the dma-fence
uses the dma_fence_default_wait hook. Which works for amdgpu, which is
the only caller. Well, until you share some buffers with e.g. i915,
then you get an -EINVAL.

But there's really no reason for this, because all drivers must
support callbacks. The special ->wait hook is only an optimization;
if the driver needs to create a worker thread for an active callback,
then it can avoid doing that if it knows that there's a process
context available already. So ->wait is just an optimization; just
using the logic in dma_fence_default_wait() should work for all
drivers.

Let's remove this restriction.


Mhm, that was intentionally introduced because for radeon that is not only 
an optimization, but mandatory for correct operation.


On the other hand radeon isn't using this function, so it should be fine 
as long as the Intel driver can live with it.


Christian.



Signed-off-by: Daniel Vetter <daniel.vet...@intel.com>
Cc: Sumit Semwal <sumit.sem...@linaro.org>
Cc: Gustavo Padovan <gust...@padovan.org>
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Cc: Christian König <christian.koe...@amd.com>
Cc: Alex Deucher <alexander.deuc...@amd.com>
---
  drivers/dma-buf/dma-fence.c | 5 -
  1 file changed, 5 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 7b5b40d6b70e..59049375bd19 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -503,11 +503,6 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, 
uint32_t count,
for (i = 0; i < count; ++i) {
struct dma_fence *fence = fences[i];
  
-		if (fence->ops->wait != dma_fence_default_wait) {

-   ret = -EINVAL;
-   goto fence_rm_cb;
-   }
-
cb[i].task = current;
if (dma_fence_add_callback(fence, &cb[i].base,
   dma_fence_default_wait_cb)) {




Re: [PATCH 4/8] dma-buf: add peer2peer flag

2018-04-20 Thread Christian König

On 20.04.2018 at 12:17, Christoph Hellwig wrote:

On Fri, Apr 20, 2018 at 10:58:50AM +0200, Christian König wrote:

Yes there's a bit of a layering violation insofar as drivers really
shouldn't each have their own copy of "how do I convert a piece of dma
memory into a dma-buf", but that doesn't render the interface a bad idea.

Completely agree on that.

What we need is an sg_alloc_table_from_resources(dev, resources,
num_resources) which does the handling common to all drivers.

A structure that contains

{page,offset,len} + {dma_addr+dma_len}

is not a good container for storing

{virt addr, dma_addr, len}

no matter what interface you build around it.


Why not? I mean at least for my use case we actually don't need the 
virtual address.


What we need is {dma_addr+dma_len} in a consistent interface which can 
come from both {page,offset,len} as well as {resource, len}.


What I actually don't need is separate handling for system memory and
resources, but that is exactly what we get when we don't use sg_table.
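Purely as an illustration of that point (these structures do not exist, the
names are made up), the container being asked for could be as small as:

/* one DMA-mapped chunk: no struct page, no virtual address */
struct dma_buf_map_entry {
	dma_addr_t	addr;	/* device address as seen by the importer */
	unsigned int	len;	/* length of the contiguous chunk in bytes */
};

/* a complete mapping is then just an array of such chunks */
struct dma_buf_mapping {
	unsigned int			nr_entries;
	struct dma_buf_map_entry	entries[];
};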


Christian.


And that is discounting
all the problems around mapping coherent allocations for other devices,
or the iommu merging problem we are having another thread on.

So let's come up with a better high level interface first, and then
worry about how to implement it in the low-level dma-mapping interface
second.  Especially given that my consolidation of the dma_map_ops
implementation is in full stream and there shouldn't be all that many
to bother with.

So first question:  Do you actually care about having multiple
pairs of the above, or could you, instead of multiple chunks, just deal with
a single one of the above?  In that case we really should not need that many
new interfaces as dma_map_resource will be all you need anyway.


Christian.


-Daniel

---end quoted text---




Re: [PATCH 4/8] dma-buf: add peer2peer flag

2018-04-20 Thread Christian König

On 20.04.2018 at 09:13, Daniel Vetter wrote:

On Thu, Apr 19, 2018 at 01:16:57AM -0700, Christoph Hellwig wrote:

On Mon, Apr 16, 2018 at 03:38:56PM +0200, Daniel Vetter wrote:

We've broken that assumption in i915 years ago. Not struct page backed
gpu memory is very real.

Of course we'll never feed such a strange sg table to a driver which
doesn't understand it, but allowing sg_page == NULL works perfectly
fine. At least for gpu drivers.

For GPU drivers on x86 with no dma coherency problems, sure.  But not
all the world is x86.  We already have problems due to dmabuf's use
of the awkward get_sgtable interface (see the comment on
arm_dma_get_sgtable that I fully agree with), and doing this for memory
that doesn't have a struct page at all will make things even worse.

x86 dma isn't coherent either, if you're a GPU :-) Flushing gpu caches
tends to be too expensive, so there's pci-e support and chipset support to
forgo it. Plus drivers flushing caches themselves.

The dma_get_sgtable thing is indeed fun, right solution would probably be
to push the dma-buf export down into the dma layer. The comment for
arm_dma_get_sgtable is also not a real concern, because dma-buf also
abstracts away the flushing (or well is supposed to), so there really
shouldn't be anyone calling the streaming apis on the returned sg table.
That's why dma-buf gives you an sg table that's mapped already.
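To spell that out, the importer side today looks roughly like this (sketch,
error paths trimmed to the essentials); the returned table already carries
dma_addr/dma_len entries, so the importer never calls dma_map_sg() or the
dma_sync_* functions itself:

static int example_import(struct device *dev, struct dma_buf *dmabuf)
{
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;

	attach = dma_buf_attach(dmabuf, dev);
	if (IS_ERR(attach))
		return PTR_ERR(attach);

	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt)) {
		dma_buf_detach(dmabuf, attach);
		return PTR_ERR(sgt);
	}

	/* consume sg_dma_address()/sg_dma_len() of each entry, nothing else */

	dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
	dma_buf_detach(dmabuf, attach);
	return 0;
}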


If that's not acceptable then I guess we could go over the entire tree
and frob all the gpu related code to switch over to a new struct
sg_table_might_not_be_struct_page_backed, including all the other
functions we added over the past few years to iterate over sg tables.
But seems slightly silly, given that sg tables seem to do exactly what
we need.

It isn't silly.  We will have to do some surgery like that anyway
because the current APIs don't work.  So relax, sit back and come up
with an API that solves the existing issues and serves us well in
the future.

So we should just implement a copy of sg table for dma-buf, since I still
think it does exactly what we need for gpus?

Yes there's a bit of a layering violation insofar as drivers really
shouldn't each have their own copy of "how do I convert a piece of dma
memory into a dma-buf", but that doesn't render the interface a bad idea.


Completely agree on that.

What we need is an sg_alloc_table_from_resources(dev, resources, 
num_resources) which does the handling common to all drivers.


Christian.


-Daniel




[PATCH 5/5] drm/amdgpu: add independent DMA-buf import v3

2018-04-11 Thread Christian König
Instead of relying on the DRM functions just implement our own import
functions. This adds support for taking care of unpinned DMA-buf.

v2: enable for all exporters, not just amdgpu, fix invalidation
handling, lock reservation object while setting callback
v3: change to new dma_buf attach interface

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 72 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 32 +++---
 2 files changed, 89 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 7fef95f0fed1..58dcfba0225a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -86,28 +86,24 @@ int amdgpu_gem_prime_mmap(struct drm_gem_object *obj, 
struct vm_area_struct *vma
return ret;
 }
 
-struct drm_gem_object *
-amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
-struct dma_buf_attachment *attach,
-struct sg_table *sg)
+static struct drm_gem_object *
+amdgpu_gem_prime_create_obj(struct drm_device *dev, struct dma_buf *dma_buf)
 {
-   struct reservation_object *resv = attach->dmabuf->resv;
+   struct reservation_object *resv = dma_buf->resv;
struct amdgpu_device *adev = dev->dev_private;
struct amdgpu_bo *bo;
int ret;
 
ww_mutex_lock(&resv->lock, NULL);
-   ret = amdgpu_bo_create(adev, attach->dmabuf->size, PAGE_SIZE,
+   ret = amdgpu_bo_create(adev, dma_buf->size, PAGE_SIZE,
   AMDGPU_GEM_DOMAIN_CPU, 0, ttm_bo_type_sg,
   resv, &bo);
if (ret)
goto error;
 
-   bo->tbo.sg = sg;
-   bo->tbo.ttm->sg = sg;
bo->allowed_domains = AMDGPU_GEM_DOMAIN_GTT;
bo->preferred_domains = AMDGPU_GEM_DOMAIN_GTT;
-   if (attach->dmabuf->ops != &amdgpu_dmabuf_ops)
+   if (dma_buf->ops != &amdgpu_dmabuf_ops)
bo->prime_shared_count = 1;
 
ww_mutex_unlock(&resv->lock);
@@ -118,6 +114,26 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
return ERR_PTR(ret);
 }
 
+struct drm_gem_object *
+amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
+struct dma_buf_attachment *attach,
+struct sg_table *sg)
+{
+   struct drm_gem_object *obj;
+   struct amdgpu_bo *bo;
+
+   obj = amdgpu_gem_prime_create_obj(dev, attach->dmabuf);
+   if (IS_ERR(obj))
+   return obj;
+
+   bo = gem_to_amdgpu_bo(obj);
+   amdgpu_bo_reserve(bo, true);
+   bo->tbo.sg = sg;
+   bo->tbo.ttm->sg = sg;
+   amdgpu_bo_unreserve(bo);
+   return obj;
+}
+
 static struct sg_table *
 amdgpu_gem_map_dma_buf(struct dma_buf_attachment *attach,
   enum dma_data_direction dir)
@@ -293,9 +309,29 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_device 
*dev,
return buf;
 }
 
+static void
+amdgpu_gem_prime_invalidate_mappings(struct dma_buf_attachment *attach)
+{
+   struct ttm_operation_ctx ctx = { false, false };
+   struct drm_gem_object *obj = attach->priv;
+   struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
+   struct ttm_placement placement = {};
+   int r;
+
+   r = ttm_bo_validate(&bo->tbo, &placement, &ctx);
+   if (r)
+   DRM_ERROR("Failed to unmap DMA-buf import (%d)\n", r);
+}
+
 struct drm_gem_object *amdgpu_gem_prime_import(struct drm_device *dev,
struct dma_buf *dma_buf)
 {
+   struct dma_buf_attach_info attach_info = {
+   .dev = dev->dev,
+   .dmabuf = dma_buf,
+   .invalidate = amdgpu_gem_prime_invalidate_mappings
+   };
+   struct dma_buf_attachment *attach;
struct drm_gem_object *obj;
 
if (dma_buf->ops == &amdgpu_dmabuf_ops) {
@@ -310,5 +346,21 @@ struct drm_gem_object *amdgpu_gem_prime_import(struct 
drm_device *dev,
}
}
 
-   return drm_gem_prime_import(dev, dma_buf);
+   if (!dma_buf->invalidation_supported)
+   return drm_gem_prime_import(dev, dma_buf);
+
+   obj = amdgpu_gem_prime_create_obj(dev, dma_buf);
+   if (IS_ERR(obj))
+   return obj;
+
+   attach_info.importer_priv = obj;
+   attach = dma_buf_attach(&attach_info);
+   if (IS_ERR(attach)) {
+   drm_gem_object_put(obj);
+   return ERR_CAST(attach);
+   }
+
+   get_dma_buf(dma_buf);
+   obj->import_attach = attach;
+   return obj;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index ab73300e6c7f..2a8f328918cb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgp

[PATCH 1/5] dma-buf: use parameter structure for dma_buf_attach

2018-04-11 Thread Christian König
Move the parameters into a structure to make it simpler to extend it in
follow up patches.

This also adds the importer private as parameter so that we can directly
work with a completely filled in attachment structure.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/dma-buf/dma-buf.c | 15 ---
 drivers/gpu/drm/armada/armada_gem.c   |  6 +-
 drivers/gpu/drm/drm_prime.c   |  6 +-
 drivers/gpu/drm/i915/i915_gem_dmabuf.c|  6 +-
 drivers/gpu/drm/tegra/gem.c   |  6 +-
 drivers/gpu/drm/udl/udl_dmabuf.c  |  6 +-
 drivers/media/common/videobuf2/videobuf2-dma-contig.c |  6 +-
 drivers/media/common/videobuf2/videobuf2-dma-sg.c |  6 +-
 drivers/staging/media/tegra-vde/tegra-vde.c   |  6 +-
 include/linux/dma-buf.h   | 17 +++--
 10 files changed, 63 insertions(+), 17 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index d78d5fc173dc..4b46982c6d9c 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -534,8 +534,9 @@ EXPORT_SYMBOL_GPL(dma_buf_put);
 /**
  * dma_buf_attach - Add the device to dma_buf's attachments list; optionally,
  * calls attach() of dma_buf_ops to allow device-specific attach functionality
- * @dmabuf:[in]buffer to attach device to.
- * @dev:   [in]device to be attached.
+ * @info:  [in]holds all the attach related information provided
+ * by the importer. See &struct dma_buf_attach_info
+ * for further details.
  *
  * Returns struct dma_buf_attachment pointer for this attachment. Attachments
  * must be cleaned up by calling dma_buf_detach().
@@ -549,26 +550,26 @@ EXPORT_SYMBOL_GPL(dma_buf_put);
  * accessible to @dev, and cannot be moved to a more suitable place. This is
  * indicated with the error code -EBUSY.
  */
-struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
- struct device *dev)
+struct dma_buf_attachment *dma_buf_attach(const struct dma_buf_attach_info 
*info)
 {
+   struct dma_buf *dmabuf = info->dmabuf;
struct dma_buf_attachment *attach;
int ret;
 
-   if (WARN_ON(!dmabuf || !dev))
+   if (WARN_ON(!dmabuf || !info->dev))
return ERR_PTR(-EINVAL);
 
attach = kzalloc(sizeof(*attach), GFP_KERNEL);
if (!attach)
return ERR_PTR(-ENOMEM);
 
-   attach->dev = dev;
+   attach->dev = info->dev;
attach->dmabuf = dmabuf;
 
mutex_lock(&dmabuf->lock);
 
if (dmabuf->ops->attach) {
-   ret = dmabuf->ops->attach(dmabuf, dev, attach);
+   ret = dmabuf->ops->attach(dmabuf, info->dev, attach);
if (ret)
goto err_attach;
}
diff --git a/drivers/gpu/drm/armada/armada_gem.c 
b/drivers/gpu/drm/armada/armada_gem.c
index a97f509743a5..f4d1c11f57ea 100644
--- a/drivers/gpu/drm/armada/armada_gem.c
+++ b/drivers/gpu/drm/armada/armada_gem.c
@@ -514,6 +514,10 @@ armada_gem_prime_export(struct drm_device *dev, struct 
drm_gem_object *obj,
 struct drm_gem_object *
 armada_gem_prime_import(struct drm_device *dev, struct dma_buf *buf)
 {
+   struct dma_buf_attach_info attach_info = {
+   .dev = dev->dev,
+   .dmabuf = buf
+   };
struct dma_buf_attachment *attach;
struct armada_gem_object *dobj;
 
@@ -529,7 +533,7 @@ armada_gem_prime_import(struct drm_device *dev, struct 
dma_buf *buf)
}
}
 
-   attach = dma_buf_attach(buf, dev->dev);
+   attach = dma_buf_attach(&attach_info);
if (IS_ERR(attach))
return ERR_CAST(attach);
 
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 7856a9b3f8a8..4da242de51c2 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -707,6 +707,10 @@ struct drm_gem_object *drm_gem_prime_import_dev(struct 
drm_device *dev,
struct dma_buf *dma_buf,
struct device *attach_dev)
 {
+   struct dma_buf_attach_info attach_info = {
+   .dev = attach_dev,
+   .dmabuf = dma_buf
+   };
struct dma_buf_attachment *attach;
struct sg_table *sgt;
struct drm_gem_object *obj;
@@ -727,7 +731,7 @@ struct drm_gem_object *drm_gem_prime_import_dev(struct 
drm_device *dev,
if (!dev->driver->gem_prime_import_sg_table)
return ERR_PTR(-EINVAL);
 
-   attach = dma_buf_attach(dma_buf, attach_dev);
+   attach = dma_buf_attach(&attach_info);
if (IS_ERR(attach))
return ERR_CAST(attach);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i9

[PATCH 3/5] drm/ttm: remove the backing store if no placement is given

2018-04-11 Thread Christian König
Pipeline removal of the BO's backing store when no placement is given
during validation.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 98e06f8bf23b..17e821f01d0a 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1078,6 +1078,18 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
uint32_t new_flags;
 
reservation_object_assert_held(bo->resv);
+
+   /*
+* Remove the backing store if no placement is given.
+*/
+   if (!placement->num_placement && !placement->num_busy_placement) {
+   ret = ttm_bo_pipeline_gutting(bo);
+   if (ret)
+   return ret;
+
+   return ttm_tt_create(bo, false);
+   }
+
/*
 * Check whether we need to move buffer.
 */
-- 
2.14.1



[PATCH 4/5] drm/amdgpu: add independent DMA-buf export v2

2018-04-11 Thread Christian König
Instead of relying on the DRM functions just implement our own export
functions. This adds support for taking care of unpinned DMA-buf.

v2: fix unintended recursion, remove debugging leftovers

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   5 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c  | 108 ++---
 4 files changed, 72 insertions(+), 43 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 2babfad1fd7f..3e1727251edc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -371,7 +371,6 @@ int amdgpu_gem_object_open(struct drm_gem_object *obj,
 void amdgpu_gem_object_close(struct drm_gem_object *obj,
struct drm_file *file_priv);
 unsigned long amdgpu_gem_timeout(uint64_t timeout_ns);
-struct sg_table *amdgpu_gem_prime_get_sg_table(struct drm_gem_object *obj);
 struct drm_gem_object *
 amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
 struct dma_buf_attachment *attach,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 0b19482b36b8..6107b845d8a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -890,7 +890,6 @@ static struct drm_driver kms_driver = {
.gem_prime_export = amdgpu_gem_prime_export,
.gem_prime_import = amdgpu_gem_prime_import,
.gem_prime_res_obj = amdgpu_gem_prime_res_obj,
-   .gem_prime_get_sg_table = amdgpu_gem_prime_get_sg_table,
.gem_prime_import_sg_table = amdgpu_gem_prime_import_sg_table,
.gem_prime_vmap = amdgpu_gem_prime_vmap,
.gem_prime_vunmap = amdgpu_gem_prime_vunmap,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 9e23d6f6f3f3..15bb48cebbc4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -31,6 +31,7 @@
  */
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -953,6 +954,10 @@ void amdgpu_bo_move_notify(struct ttm_buffer_object *bo,
 
amdgpu_bo_kunmap(abo);
 
+   if (abo->gem_base.dma_buf && !abo->gem_base.import_attach &&
+   bo->mem.mem_type != TTM_PL_SYSTEM)
+   dma_buf_invalidate_mappings(abo->gem_base.dma_buf);
+
/* remember the eviction */
if (evict)
atomic64_inc(&adev->num_evictions);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 4b584cb75bf4..7fef95f0fed1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -32,14 +32,6 @@
 
 static const struct dma_buf_ops amdgpu_dmabuf_ops;
 
-struct sg_table *amdgpu_gem_prime_get_sg_table(struct drm_gem_object *obj)
-{
-   struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
-   int npages = bo->tbo.num_pages;
-
-   return drm_prime_pages_to_sg(bo->tbo.ttm->pages, npages);
-}
-
 void *amdgpu_gem_prime_vmap(struct drm_gem_object *obj)
 {
struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
@@ -126,23 +118,22 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
return ERR_PTR(ret);
 }
 
-static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
-struct device *target_dev,
-struct dma_buf_attachment *attach)
+static struct sg_table *
+amdgpu_gem_map_dma_buf(struct dma_buf_attachment *attach,
+  enum dma_data_direction dir)
 {
+   struct dma_buf *dma_buf = attach->dmabuf;
struct drm_gem_object *obj = dma_buf->priv;
struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
+   struct sg_table *sgt;
long r;
 
-   r = drm_gem_map_attach(dma_buf, target_dev, attach);
-   if (r)
-   return r;
-
-   r = amdgpu_bo_reserve(bo, false);
-   if (unlikely(r != 0))
-   goto error_detach;
-
+   if (!attach->invalidate) {
+   r = amdgpu_bo_reserve(bo, false);
+   if (unlikely(r != 0))
+   return ERR_PTR(r);
+   }
 
if (attach->dev->driver != adev->dev->driver) {
/*
@@ -158,42 +149,77 @@ static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
}
}
 
-   /* pin buffer into GTT */
-   r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT, NULL);
-   if (r)
+   if (!attach->invalidate) {
+   /* pin buffer into GTT */
+   r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT, NULL);
+   if (r)
+   goto er

[PATCH 2/5] dma-buf: add optional invalidate_mappings callback v3

2018-04-11 Thread Christian König
Each importer can now provide an invalidate_mappings callback.

This allows the exporter to provide the mappings without the need to pin
the backing store.

v2: don't try to invalidate mappings when the callback is NULL,
lock the reservation obj while using the attachments,
add helper to set the callback
v3: move flag for invalidation support into the DMA-buf,
use new attach_info structure to set the callback
v4: use importer_priv field instead of mangling exporter priv.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/dma-buf/dma-buf.c | 44 
 include/linux/dma-buf.h   | 36 ++--
 2 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 4b46982c6d9c..ffdaab10e2f2 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -565,6 +565,8 @@ struct dma_buf_attachment *dma_buf_attach(const struct 
dma_buf_attach_info *info
 
attach->dev = info->dev;
attach->dmabuf = dmabuf;
+   attach->importer_priv = info->importer_priv;
+   attach->invalidate = info->invalidate;
 
mutex_lock(>lock);
 
@@ -573,7 +575,9 @@ struct dma_buf_attachment *dma_buf_attach(const struct 
dma_buf_attach_info *info
if (ret)
goto err_attach;
}
+   reservation_object_lock(dmabuf->resv, NULL);
list_add(&attach->node, &dmabuf->attachments);
+   reservation_object_unlock(dmabuf->resv);
 
mutex_unlock(>lock);
return attach;
@@ -599,7 +603,9 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct 
dma_buf_attachment *attach)
return;
 
mutex_lock(>lock);
+   reservation_object_lock(dmabuf->resv, NULL);
list_del(&attach->node);
+   reservation_object_unlock(dmabuf->resv);
if (dmabuf->ops->detach)
dmabuf->ops->detach(dmabuf, attach);
 
@@ -633,10 +639,23 @@ struct sg_table *dma_buf_map_attachment(struct 
dma_buf_attachment *attach,
if (WARN_ON(!attach || !attach->dmabuf))
return ERR_PTR(-EINVAL);
 
+   /*
+* Mapping a DMA-buf can trigger its invalidation, prevent sending this
+* event to the caller by temporarily removing this attachment from the
+* list.
+*/
+   if (attach->invalidate) {
+   reservation_object_assert_held(attach->dmabuf->resv);
+   list_del(&attach->node);
+   }
+
sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);
if (!sg_table)
sg_table = ERR_PTR(-ENOMEM);
 
+   if (attach->invalidate)
+   list_add(&attach->node, &attach->dmabuf->attachments);
+
return sg_table;
 }
 EXPORT_SYMBOL_GPL(dma_buf_map_attachment);
@@ -657,6 +676,9 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment 
*attach,
 {
might_sleep();
 
+   if (attach->invalidate)
+   reservation_object_assert_held(attach->dmabuf->resv);
+
if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
return;
 
@@ -665,6 +687,26 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment 
*attach,
 }
 EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
 
+/**
+ * dma_buf_invalidate_mappings - invalidate all mappings of this dma_buf
+ *
+ * @dmabuf: [in] buffer whose mappings should be invalidated
+ *
+ * Informs all attachments that they need to destroy and recreate all their
+ * mappings.
+ */
+void dma_buf_invalidate_mappings(struct dma_buf *dmabuf)
+{
+   struct dma_buf_attachment *attach;
+
+   reservation_object_assert_held(dmabuf->resv);
+
list_for_each_entry(attach, &dmabuf->attachments, node)
+   if (attach->invalidate)
+   attach->invalidate(attach);
+}
+EXPORT_SYMBOL_GPL(dma_buf_invalidate_mappings);
+
 /**
  * DOC: cpu access
  *
@@ -1122,10 +1164,12 @@ static int dma_buf_debug_show(struct seq_file *s, void 
*unused)
seq_puts(s, "\tAttached Devices:\n");
attach_count = 0;
 
+   reservation_object_lock(buf_obj->resv, NULL);
list_for_each_entry(attach_obj, &buf_obj->attachments, node) {
seq_printf(s, "\t%s\n", dev_name(attach_obj->dev));
attach_count++;
}
+   reservation_object_unlock(buf_obj->resv);
 
seq_printf(s, "Total %d devices attached\n\n",
attach_count);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 414b4dde5eb7..566503dd2b4f 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -270,6 +270,8 @@ struct dma_buf_ops {
  * @poll: for userspace poll support
  * @cb_excl: for userspace poll support
  * @cb_shared: for userspace po

Re: [PATCH v2] Add udmabuf misc device

2018-04-06 Thread Christian König

On 06.04.2018 at 11:33, Gerd Hoffmann wrote:

   Hi,


The pages backing a DMA-buf are not allowed to move (at least not without a
patch set I'm currently working on), but for certain MM operations to work
correctly you must be able to modify the page table entries and move the
pages backing them around.

For example try to use fork() with some copy on write pages with this
approach. You will find that you have only two options to correctly handle
this.

The fork() issue should go away with shared memory pages (no cow).
I guess this is the reason why vgem is internally backed by shmem.


Yes, exactly that is also an approach which should work fine. Just don't 
try to get this working with get_user_pages().




Hmm.  So I could try to limit the udmabuf driver to shmem too (i.e.
have the ioctl take a shmem filehandle and offset instead of a virtual
address).

But maybe it is better then to just extend vgem, i.e. add support to
create gem objects from existing shmem.

Comments?


Yes, extending vgem instead of creating something new sounds like a good 
idea to me as well.


Regards,
Christian.



cheers,
   Gerd





Re: [PATCH 2/8] PCI: Add pci_find_common_upstream_dev()

2018-03-29 Thread Christian König

On 29.03.2018 at 18:25, Logan Gunthorpe wrote:


On 29/03/18 10:10 AM, Christian König wrote:

Why not? I mean the dma_map_resource() function is for P2P while other
dma_map_* functions are only for system memory.

Oh, hmm, I wasn't aware dma_map_resource was exclusively for mapping
P2P. Though it's a bit odd seeing we've been working under the
assumption that PCI P2P is different as it has to translate the PCI bus
address. Whereas P2P for devices on other buses is a big unknown.


Yeah, completely agree. On my TODO list (but rather far down) is 
actually supporting P2P with USB devices.


And no, I don't have the slightest idea how to do this at the moment.


And this is necessary to
check if the DMA ops in use support it or not. We can't have the
dma_map_X() functions do the wrong thing because they don't support it yet.

Well that sounds like we should just return an error from
dma_map_resources() when an architecture doesn't support P2P yet as Alex
suggested.

Yes, well except in our patch-set we can't easily use
dma_map_resources() as we either have SGLs to deal with or we need to
create whole new interfaces to a number of subsystems.


Agree as well. I was also in clear favor of extending the SGLs to have a 
flag for this instead of the dma_map_resource() interface, but for some 
reason that didn't make it into the kernel.



You don't seem to understand the implications: The devices do have a
common upstream bridge! In other words your code would currently claim
that P2P is supported, but in practice it doesn't work.

Do they? They don't on any of the Intel machines I'm looking at. The
previous version of the patchset not only required a common upstream
bridge but two layers of upstream bridges on both devices which would
effectively limit transfers to PCIe switches only. But Bjorn did not
like this.


At least to me that sounds like a good idea, it would at least disable 
(the incorrect) auto detection of P2P for such devices.



You need to include both drivers which participate in the P2P
transaction to make sure that both support this and to give them the
opportunity to chicken out and, in the case of AMD APUs, even redirect the
request to another location (e.g. participate in the DMA translation).

I don't think it's the drivers responsibility to reject P2P . The
topology is what governs support or not. The discussions we had with
Bjorn settled on if the devices are all behind the same bridge they can
communicate with each other. This is essentially guaranteed by the PCI spec.


Well it is not only about rejecting P2P; see, the devices I need to worry about 
are essentially part of the CPU. Their resources look like a PCI BAR to 
the BIOS and OS, but are actually backed by stolen system memory.


So as crazy as it sounds what you get is an operation which starts as 
P2P, but then the GPU driver sees it and says: hey, please don't write 
that to my PCIe BAR, but rather system memory location X.



DMA-buf fortunately seems to handle all this already, that's why we
choose it as base for our implementation.

Well, unfortunately DMA-buf doesn't help for the drivers we are working
with as neither the block layer nor the RDMA subsystem have any
interfaces for it.


A fact that gives me quite some sleepless nights as well. I think we 
sooner or later need to extend those interfaces to work with DMA-bufs as 
well.


I will try to give your patch set a review when I'm back from vacation 
and rebase my DMA-buf work on top of that.


Regards,
Christian.



Logan




Re: [PATCH 2/8] PCI: Add pci_find_common_upstream_dev()

2018-03-29 Thread Christian König

On 29.03.2018 at 17:45, Logan Gunthorpe wrote:


On 29/03/18 05:44 AM, Christian König wrote:

On 28.03.2018 at 21:53, Logan Gunthorpe wrote:

On 28/03/18 01:44 PM, Christian König wrote:

Well, isn't that exactly what dma_map_resource() is good for? As far as
I can see it makes sure IOMMU is aware of the access route and
translates a CPU address into a PCI Bus address.
I'm using that with the AMD IOMMU driver and at least there it works
perfectly fine.

Yes, it would be nice, but no arch has implemented this yet. We are just
lucky in the x86 case because that arch is simple and doesn't need to do
anything for P2P (partially due to the Bus and CPU addresses being the
same). But in the general case, you can't rely on it.

Well, that an arch hasn't implemented it doesn't mean that we don't have
the right interface to do it.

Yes, but right now we don't have a performant way to check if we are
doing P2P or not in the dma_map_X() wrappers.


Why not? I mean the dma_map_resource() function is for P2P while other 
dma_map_* functions are only for system memory.
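As a hedged sketch of that usage (error handling trimmed; bar_phys/size are
assumed to describe the chunk of the exporter's BAR being shared):

static int example_map_peer_bar(struct device *importer, phys_addr_t bar_phys,
				size_t size, dma_addr_t *out)
{
	dma_addr_t addr;

	/* translates the CPU physical BAR address into a bus/IOVA address */
	addr = dma_map_resource(importer, bar_phys, size,
				DMA_BIDIRECTIONAL, 0);
	if (dma_mapping_error(importer, addr))
		return -ENOMEM;

	*out = addr;	/* what the importing device programs into its engine */
	return 0;
}

static void example_unmap_peer_bar(struct device *importer, dma_addr_t addr,
				   size_t size)
{
	dma_unmap_resource(importer, addr, size, DMA_BIDIRECTIONAL, 0);
}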



And this is necessary to
check if the DMA ops in use support it or not. We can't have the
dma_map_X() functions do the wrong thing because they don't support it yet.


Well that sounds like we should just return an error from 
dma_map_resources() when an architecture doesn't support P2P yet as Alex 
suggested.



Devices integrated in the CPU usually only "claim" to be PCIe devices.
In reality their memory request path goes directly through the integrated
north bridge. The reason for this is simply better throughput/latency.

These are just more reasons why our patchset restricts to devices behind
a switch. And more mess for someone to deal with if they need to relax
that restriction.


You don't seem to understand the implications: The devices do have a 
common upstream bridge! In other words your code would currently claim 
that P2P is supported, but in practice it doesn't work.


You need to include both drivers which participate in the P2P 
transaction to make sure that both support this and to give them the 
opportunity to chicken out and, in the case of AMD APUs, even redirect the 
request to another location (e.g. participate in the DMA translation).


DMA-buf fortunately seems to handle all this already, that's why we 
choose it as base for our implementation.


Regards,
Christian.


Re: [PATCH 2/8] PCI: Add pci_find_common_upstream_dev()

2018-03-29 Thread Christian König

On 28.03.2018 at 21:53, Logan Gunthorpe wrote:


On 28/03/18 01:44 PM, Christian König wrote:

Well, isn't that exactly what dma_map_resource() is good for? As far as
I can see it makes sure IOMMU is aware of the access route and
translates a CPU address into a PCI Bus address.
I'm using that with the AMD IOMMU driver and at least there it works
perfectly fine.

Yes, it would be nice, but no arch has implemented this yet. We are just
lucky in the x86 case because that arch is simple and doesn't need to do
anything for P2P (partially due to the Bus and CPU addresses being the
same). But in the general case, you can't rely on it.


Well, that an arch hasn't implemented it doesn't mean that we don't have 
the right interface to do it.



Yeah, but not for ours. See if you want to do real peer 2 peer you need
to keep both the operation as well as the direction into account.

Not sure what you are saying here... I'm pretty sure we are doing "real"
peer 2 peer...


For example when you can do writes between A and B that doesn't mean
that writes between B and A work. And reads are generally less likely to
work than writes. etc...

If both devices are behind a switch then the PCI spec guarantees that A
can both read and write B and vice versa.

Sorry to say that, but I know a whole bunch of PCI devices which
horribly ignore that.

Can you elaborate? As far as the device is concerned it shouldn't know
whether a request comes from a peer or from the host. If it does do
crazy stuff like that it's well out of spec. It's up to the switch (or
root complex if good support exists) to route the request to the device
and it's the root complex that tends to be what drops the load requests
which causes the asymmetries.


Devices integrated in the CPU usually only "claim" to be PCIe devices. 
In reality their memory request path goes directly through the integrated 
north bridge. The reason for this is simply better throughput/latency.


That is hidden from the software, for example the BIOS just allocates 
address space for the BARs as if it's a normal PCIe device.


The only crux is that when you then do peer2peer your requests simply go into 
nirvana and are not handled by anything because the BARs are only 
visible from the CPU side of the northbridge.


Regards,
Christian.



Logan
___
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




Re: [PATCH 4/8] dma-buf: add peer2peer flag

2018-03-29 Thread Christian König

On 29.03.2018 at 08:57, Daniel Vetter wrote:

On Sun, Mar 25, 2018 at 12:59:56PM +0200, Christian König wrote:

Add a peer2peer flag noting that the importer can deal with device
resources which are not backed by pages.

Signed-off-by: Christian König <christian.koe...@amd.com>

Um strictly speaking they all should, but ttm never bothered to use the
real interfaces but just hacked around the provided sg list, grabbing the
underlying struct pages, then rebuilding the sg list again.


Actually that isn't correct. TTM converts them to a dma address array 
because drivers need it like this (at least nouveau, radeon and amdgpu).
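To make that concrete, what such a driver actually consumes from an imported
sg_table is only the DMA side of each entry, roughly like this (sketch, the
example_* names are made up):

static void example_program_gart(struct example_device *edev,
				 struct sg_table *sgt)
{
	struct scatterlist *sg;
	unsigned int i;

	for_each_sg(sgt->sgl, sg, sgt->nents, i) {
		dma_addr_t addr = sg_dma_address(sg);
		unsigned int len = sg_dma_len(sg);

		/* program addr/len into the device's page tables here */
		example_write_gart_entries(edev, addr, len);
	}
}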


I've fixed radeon and amdgpu to be able to deal without it and mailed 
with Ben about nouveau, but the outcome is they don't really know.


TTM itself doesn't have any need for the pages on imported BOs (you 
can't mmap them anyway); the real underlying problem is that sg tables 
don't provide what drivers need.


I think we could rather easily fix sg tables, but that is a totally 
separate task.



The entire point of using sg lists was exactly to allow this use case of
peer2peer dma (or well in general have special exporters which managed
memory/IO ranges not backed by struct page). So essentially you're having
a "I'm totally not broken flag" here.


No, independent of the need for struct page pointers we need to note if the 
exporter can handle peer2peer stuff from the hardware side in general.


So what I did is just to set peer2peer allowed on the importer 
because of the driver needs and clear it in the exporter if the hardware 
can't handle that.
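A hedged sketch of how an exporter could then use the flag in its map_dma_buf
callback (the example_* helpers are made up for illustration):

static struct sg_table *
example_map_dma_buf(struct dma_buf_attachment *attach,
		    enum dma_data_direction dir)
{
	/*
	 * If the buffer currently has no struct pages backing it (e.g. it
	 * lives in VRAM) and the importer did not announce peer2peer
	 * support, migrate it to system memory before building the table.
	 */
	if (example_buffer_has_no_pages(attach->dmabuf) && !attach->peer2peer)
		example_migrate_to_system_memory(attach->dmabuf);

	return example_build_sg_table(attach, dir);
}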



I think a better approach would be if we add a requires_struct_page or so,
and annotate the current importers accordingly. Or we just fix them up (it
is all in shared ttm code after all, I think everyone else got this
right).


I would rather not bet on that.

Christian.


-Daniel


---
  drivers/dma-buf/dma-buf.c | 1 +
  include/linux/dma-buf.h   | 4 
  2 files changed, 5 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index ffaa2f9a9c2c..f420225f93c6 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -565,6 +565,7 @@ struct dma_buf_attachment *dma_buf_attach(const struct 
dma_buf_attach_info *info
  
  	attach->dev = info->dev;

attach->dmabuf = dmabuf;
+   attach->peer2peer = info->peer2peer;
attach->priv = info->priv;
attach->invalidate = info->invalidate;
  
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h

index 15dd8598bff1..1ef50bd9bc5b 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -313,6 +313,7 @@ struct dma_buf {
   * @dmabuf: buffer for this attachment.
   * @dev: device attached to the buffer.
   * @node: list of dma_buf_attachment.
+ * @peer2peer: true if the importer can handle peer resources without pages.
   * @priv: exporter specific attachment data.
   *
   * This structure holds the attachment information between the dma_buf buffer
@@ -328,6 +329,7 @@ struct dma_buf_attachment {
struct dma_buf *dmabuf;
struct device *dev;
struct list_head node;
+   bool peer2peer;
void *priv;
  
  	/**

@@ -392,6 +394,7 @@ struct dma_buf_export_info {
   * @dmabuf:   the exported dma_buf
   * @dev:  the device which wants to import the attachment
   * @priv: private data of importer to this attachment
+ * @peer2peer: true if the importer can handle peer resources without pages
   * @invalidate:   callback to use for invalidating mappings
   *
   * This structure holds the information required to attach to a buffer. Used
@@ -401,6 +404,7 @@ struct dma_buf_attach_info {
struct dma_buf *dmabuf;
struct device *dev;
void *priv;
+   bool peer2peer;
void (*invalidate)(struct dma_buf_attachment *attach);
  };
  
--

2.14.1

___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel




Re: [PATCH 2/8] PCI: Add pci_find_common_upstream_dev()

2018-03-28 Thread Christian König

On 28.03.2018 at 20:57, Logan Gunthorpe wrote:


On 28/03/18 12:28 PM, Christian König wrote:

I'm just using amdgpu as blueprint because I'm the co-maintainer of it
and know it mostly inside out.

Ah, I see.


The resource addresses are translated using dma_map_resource(). As far
as I know that should be sufficient to offload all the architecture
specific stuff to the DMA subsystem.

It's not. The dma_map infrastructure currently has no concept of
peer-to-peer mappings and is designed for system memory only. No
architecture I'm aware of will translate PCI CPU addresses into PCI Bus
addresses which is necessary for any transfer that doesn't go through
the root complex (though on arches like x86 the CPU and Bus address
happen to be the same). There's a lot of people that would like to see
this change but it's likely going to be a long road before it does.


Well, isn't that exactly what dma_map_resource() is good for? As far as 
I can see it makes sure IOMMU is aware of the access route and 
translates a CPU address into a PCI Bus address.



Furthermore, one of the reasons our patch-set avoids going through the
root complex at all is that IOMMU drivers will need to be made aware
that it is operating on P2P memory and do arch-specific things
accordingly. There will also need to be flags that indicate whether a
given IOMMU driver supports this. None of this work is done or easy.


I'm using that with the AMD IOMMU driver and at least there it works 
perfectly fine.



Yeah, but not for ours. See if you want to do real peer 2 peer you need
to keep both the operation as well as the direction into account.

Not sure what you are saying here... I'm pretty sure we are doing "real"
peer 2 peer...


For example when you can do writes between A and B that doesn't mean
that writes between B and A work. And reads are generally less likely to
work than writes. etc...

If both devices are behind a switch then the PCI spec guarantees that A
can both read and write B and vice versa.


Sorry to say that, but I know a whole bunch of PCI devices which 
horribly ignore that.


For example all AMD APUs fall under that category...


Only once you involve root
complexes do you have this problem. Ie. you have unknown support which
may be no support, or partial support (stores but not loads); or
sometimes bad performance; or a combination of both... and you need some
way to figure out all this mess and that is hard. Whoever tries to
implement a white list will have to sort all this out.


Yes, exactly and unfortunately it looks like I'm the poor guy who needs 
to do this :)


Regards,
Christian.



Logan
___
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




Re: [PATCH 2/8] PCI: Add pci_find_common_upstream_dev()

2018-03-28 Thread Christian König

On 28.03.2018 at 18:25, Logan Gunthorpe wrote:


On 28/03/18 10:02 AM, Christian König wrote:

Yeah, that looks very similar to what I picked up from the older
patches, going to read up on that after my vacation.

Yeah, I was just reading through your patchset and there are a lot of
similarities. Though, I'm not sure what you're trying to accomplish as I
could not find a cover letter and it seems to only enable one driver.


Yeah, it was the last day before my easter vacation and I wanted it out 
of the door.



Is it meant to enable DMA transactions only between two AMD GPUs?


Not really, DMA-buf is a general framework for sharing buffers between 
device drivers.


It is widely used in the GFX stack on laptops with both Intel+AMD, 
Intel+NVIDIA or AMD+AMD graphics devices.


Additionally, ARM uses it quite massively for their GFX stacks 
because they have rendering and display devices separated.


I'm just using amdgpu as blueprint because I'm the co-maintainer of it 
and know it mostly inside out.



I also don't see where you've taken into account the PCI bus address. On
some architectures this is not the same as the CPU physical address.


The resource addresses are translated using dma_map_resource(). As far 
as I know that should be sufficient to offload all the architecture 
specific stuff to the DMA subsystem.





Just in general why are you interested in the "distance" of the devices?

We've taken a general approach where some drivers may provide p2p memory
(ie. an NVMe card or an RDMA NIC) and other drivers make use of it (ie.
the NVMe-of driver). The orchestrator driver needs to find the most
applicable provider device for a transaction in a situation that may
have multiple providers and multiple clients. So the most applicable
provider is the one that's closest ("distance"-wise) to all the clients
for the P2P transaction.


That seems to make sense.




And BTW: at least for writes, it is actually more common that Peer 2 Peer
transactions between different root complexes work than the other way
around.

Maybe on x86 with hardware made in the last few years. But on PowerPC,
ARM64, and likely a lot more the chance of support is *much* less. Also,
hardware that only supports P2P stores is hardly full support and is
insufficient for our needs.


Yeah, but not for ours. See if you want to do real peer 2 peer you need 
to keep both the operation as well as the direction into account.


For example when you can do writes between A and B that doesn't mean 
that writes between B and A work. And reads are generally less likely to 
work than writes. etc...


Since the use case I'm targeting is GFX or GFX+V4L (or GFX+NIC in 
the future) I really need to handle all such use cases as well.





So I'm a bit torn between using a blacklist or a whitelist. A whitelist
is certainly the more conservative approach, but that could get a bit long.

I think a whitelist approach is correct. Given old hardware and other
architectures, a black list is going to be too long and too difficult to
comprehensively populate.


Yeah, it would certainly be better if we had something in the root 
complex capabilities. But you're right that a whitelist sounds like the 
less painful way.


Regards,
Christian.




Logan
___
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




Re: [PATCH 2/8] PCI: Add pci_find_common_upstream_dev()

2018-03-28 Thread Christian König

On 28.03.2018 at 17:47, Logan Gunthorpe wrote:


On 28/03/18 09:07 AM, Christian König wrote:

On 28.03.2018 at 14:38, Christoph Hellwig wrote:

On Sun, Mar 25, 2018 at 12:59:54PM +0200, Christian König wrote:

From: "wda...@nvidia.com" <wda...@nvidia.com>

Add an interface to find the first device which is upstream of both
devices.

Please work with Logan and base this on top of the outstanding peer
to peer patchset.

Can you point me to that? The last code I could find about that was from
2015.

The latest posted series is here:

https://lkml.org/lkml/2018/3/12/830

However, we've made some significant changes to the area that's similar
to what you are doing. You can find lasted un-posted here:

https://github.com/sbates130272/linux-p2pmem/tree/pci-p2p-v4-pre2

Specifically this function would be of interest to you:

https://github.com/sbates130272/linux-p2pmem/blob/0e9468ae2a5a5198513dd12990151e09105f0351/drivers/pci/p2pdma.c#L239

However, the difference between what we are doing is that we are
interested in the distance through the common upstream device and you
appear to be finding the actual common device.


Yeah, that looks very similar to what I picked up from the older 
patches, going to read up on that after my vacation.


Just in general why are you interested in the "distance" of the devices?

And BTW: at least for writes, it is actually more common that Peer 2 Peer 
transactions between different root complexes work than the other way 
around.


So I'm a bit torn between using a blacklist or a whitelist. A whitelist 
is certainly the more conservative approach, but that could get a bit long.


Thanks,
Christian.



Thanks,

Logan




Re: [PATCH 2/8] PCI: Add pci_find_common_upstream_dev()

2018-03-28 Thread Christian König

On 28.03.2018 at 14:38, Christoph Hellwig wrote:

On Sun, Mar 25, 2018 at 12:59:54PM +0200, Christian König wrote:

From: "wda...@nvidia.com" <wda...@nvidia.com>

Add an interface to find the first device which is upstream of both
devices.

Please work with Logan and base this on top of the outstanding peer
to peer patchset.


Can you point me to that? The last code I could find about that was from 
2015.


Thanks,
Christian.


Re: [Linaro-mm-sig] [PATCH 1/5] dma-buf: add optional invalidate_mappings callback v2

2018-03-27 Thread Christian König

On 27.03.2018 at 09:53, Daniel Vetter wrote:

[SNIP]

[SNIP]
A slightly better solution is using atomic counter:
driver_range_start() {
  atomic_inc(&driver->notifier_count);

...

Yeah, that is exactly what amdgpu is doing now. Sorry if my description
didn't make that clear.


I would like to see driver using same code, as it means one place to fix
issues. I had for a long time on my TODO list doing the above conversion
to the amd or radeon kernel driver. I am pushing it up my todo list; hopefully in
the next few weeks I can send an RFC so people can have a real sense of how
it can look.

Certainly a good idea, but I think we might have that separate to HMM.

TTM really suffered from feature overload, e.g. trying to do everything in a
single subsystem. And it would be rather nice to have coherent userptr
handling for GPUs as a separate feature.

TTM suffered from being a midlayer imo, not from doing too much.


Yeah, completely agree.

midlayers work as long as you concentrate on doing exactly one thing in 
your midlayer. E.g. in the case of TTM the callback for BO move handling 
is well justified.


Only all the stuff around it like address space handling etc. is 
really wrongly designed and should be separated (which is exactly what DRM 
MM did, but TTM still uses this in the wrong way).



HMM is apparently structured like a toolbox (despite its documentation claiming
otherwise), so you can pick freely.


That sounds good, but I would still have a better feeling if userptr 
handling would be separated. That avoids mangling things together again.


Christian.


-Daniel




Re: [Linaro-mm-sig] [PATCH 1/5] dma-buf: add optional invalidate_mappings callback v2

2018-03-27 Thread Christian König

On 26.03.2018 at 17:42, Jerome Glisse wrote:

On Mon, Mar 26, 2018 at 10:01:21AM +0200, Daniel Vetter wrote:

On Thu, Mar 22, 2018 at 10:58:55AM +0100, Christian König wrote:

On 22.03.2018 at 08:18, Daniel Vetter wrote:
[SNIP]
Key take away from that was that you can't take any locks in either the
MMU notifier or the shrinker that you also take while calling kmalloc (or,
simply speaking, get_user_pages()).

Additionally, in the MMU or shrinker callback all different kinds of
locks might be held, so you basically can't assume that you can do things like
recursive page table walks or call dma_unmap_anything.

That sounds like a design bug in mmu_notifiers, since it would render them
useless for KVM. And they were developed for that originally. I think I'll
chat with Jerome to understand this, since it's all rather confusing.

Doing dma_unmap() during the mmu_notifier callback should be fine, it was last
time I checked. However there is no formal contract that it is ok to do so.


As I said before dma_unmap() isn't the real problem here.

The issue is more that you can't take a lock in the MMU notifier which 
you would also take while allocating memory without GFP_NOIO.


That makes it rather tricky to do any command submission, e.g. you need 
to grab all the pages/memory/resources beforehand, then make sure that you 
don't have an MMU notifier running concurrently, and only then do the submission.


If any of the prerequisites isn't fulfilled we need to restart the 
operation.



[SNIP]
A slightly better solution is using an atomic counter:
   driver_range_start() {
 atomic_inc(&dev->notifier_count);

...

Yeah, that is exactly what amdgpu is doing now. Sorry if my description 
didn't make that clear.



I would like to see drivers using the same code, as it means one place to fix
issues. I have had doing the above conversion for the amd or radeon kernel
driver on my TODO list for a long time. I am pushing it up my todo list;
hopefully in the next few weeks I can send an RFC so people can have a real
sense of how it can look.


Certainly a good idea, but I think we might want to keep that separate from HMM.

TTM really suffered from feature overload, e.g. trying to do everything 
in a single subsystem. And it would be rather nice to have coherent 
userptr handling for GPUs as a separate feature.


Regards,
Christian.


Re: [PATCH] dma-buf: use parameter structure for dma_buf_attach

2018-03-26 Thread Christian König

Am 26.03.2018 um 10:36 schrieb Daniel Vetter:

On Sun, Mar 25, 2018 at 01:34:51PM +0200, Christian König wrote:

[SNIP]

-   attach->dev = dev;
+   attach->dev = info->dev;
attach->dmabuf = dmabuf;
+   attach->priv = info->priv;

The ->priv field is for the exporter, not the importer. See e.g.
drm_gem_map_attach. You can't let the importer set this now too, so needs
to be removed from the info struct.


Crap, in this case I need to add an importer_priv field because we now 
need to map from the attachment to its importer object as well.


Thanks for noticing this.

Regards,
Christian.




[PATCH] dma-buf: use parameter structure for dma_buf_attach

2018-03-25 Thread Christian König
Move the parameters into a structure to make it simpler to extend it in
follow up patches.

This also adds the importer private as parameter so that we can directly
work with a completely filled in attachment structure.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/dma-buf/dma-buf.c | 16 +---
 drivers/gpu/drm/armada/armada_gem.c   |  6 +-
 drivers/gpu/drm/drm_prime.c   |  6 +-
 drivers/gpu/drm/i915/i915_gem_dmabuf.c|  6 +-
 drivers/gpu/drm/tegra/gem.c   |  6 +-
 drivers/gpu/drm/udl/udl_dmabuf.c  |  6 +-
 drivers/media/common/videobuf2/videobuf2-dma-contig.c |  6 +-
 drivers/media/common/videobuf2/videobuf2-dma-sg.c |  6 +-
 drivers/staging/media/tegra-vde/tegra-vde.c   |  6 +-
 include/linux/dma-buf.h   | 19 +--
 10 files changed, 66 insertions(+), 17 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index d78d5fc173dc..d2e8ca0d9427 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -534,8 +534,9 @@ EXPORT_SYMBOL_GPL(dma_buf_put);
 /**
  * dma_buf_attach - Add the device to dma_buf's attachments list; optionally,
  * calls attach() of dma_buf_ops to allow device-specific attach functionality
- * @dmabuf:[in]buffer to attach device to.
- * @dev:   [in]device to be attached.
+ * @info:  [in]holds all the attach related information provided
+ * by the importer. see  dma_buf_attach_info
+ * for further details.
  *
  * Returns struct dma_buf_attachment pointer for this attachment. Attachments
  * must be cleaned up by calling dma_buf_detach().
@@ -549,26 +550,27 @@ EXPORT_SYMBOL_GPL(dma_buf_put);
  * accessible to @dev, and cannot be moved to a more suitable place. This is
  * indicated with the error code -EBUSY.
  */
-struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
- struct device *dev)
+struct dma_buf_attachment *dma_buf_attach(const struct dma_buf_attach_info 
*info)
 {
+   struct dma_buf *dmabuf = info->dmabuf;
struct dma_buf_attachment *attach;
int ret;
 
-   if (WARN_ON(!dmabuf || !dev))
+   if (WARN_ON(!dmabuf || !info->dev))
return ERR_PTR(-EINVAL);
 
attach = kzalloc(sizeof(*attach), GFP_KERNEL);
if (!attach)
return ERR_PTR(-ENOMEM);
 
-   attach->dev = dev;
+   attach->dev = info->dev;
attach->dmabuf = dmabuf;
+   attach->priv = info->priv;
 
	mutex_lock(&dmabuf->lock);
 
if (dmabuf->ops->attach) {
-   ret = dmabuf->ops->attach(dmabuf, dev, attach);
+   ret = dmabuf->ops->attach(dmabuf, info->dev, attach);
if (ret)
goto err_attach;
}
diff --git a/drivers/gpu/drm/armada/armada_gem.c 
b/drivers/gpu/drm/armada/armada_gem.c
index a97f509743a5..f4d1c11f57ea 100644
--- a/drivers/gpu/drm/armada/armada_gem.c
+++ b/drivers/gpu/drm/armada/armada_gem.c
@@ -514,6 +514,10 @@ armada_gem_prime_export(struct drm_device *dev, struct 
drm_gem_object *obj,
 struct drm_gem_object *
 armada_gem_prime_import(struct drm_device *dev, struct dma_buf *buf)
 {
+   struct dma_buf_attach_info attach_info = {
+   .dev = dev->dev,
+   .dmabuf = buf
+   };
struct dma_buf_attachment *attach;
struct armada_gem_object *dobj;
 
@@ -529,7 +533,7 @@ armada_gem_prime_import(struct drm_device *dev, struct 
dma_buf *buf)
}
}
 
-   attach = dma_buf_attach(buf, dev->dev);
+   attach = dma_buf_attach(&attach_info);
if (IS_ERR(attach))
return ERR_CAST(attach);
 
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 7856a9b3f8a8..4da242de51c2 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -707,6 +707,10 @@ struct drm_gem_object *drm_gem_prime_import_dev(struct 
drm_device *dev,
struct dma_buf *dma_buf,
struct device *attach_dev)
 {
+   struct dma_buf_attach_info attach_info = {
+   .dev = attach_dev,
+   .dmabuf = dma_buf
+   };
struct dma_buf_attachment *attach;
struct sg_table *sgt;
struct drm_gem_object *obj;
@@ -727,7 +731,7 @@ struct drm_gem_object *drm_gem_prime_import_dev(struct 
drm_device *dev,
if (!dev->driver->gem_prime_import_sg_table)
return ERR_PTR(-EINVAL);
 
-   attach = dma_buf_attach(dma_buf, attach_dev);
+   attach = dma_buf_attach(&attach_info);
if (IS_ERR(attach))
return ERR_CAST(attach);
 
diff --git a/drivers/gpu/drm/i915

[PATCH 8/8] drm/amdgpu: add support for exporting VRAM using DMA-buf

2018-03-25 Thread Christian König
We should be able to do this now after checking all the prerequisites.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c| 44 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h  |  9 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 91 
 3 files changed, 135 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 133596df0775..86d983a0678b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -197,23 +197,44 @@ amdgpu_gem_map_dma_buf(struct dma_buf_attachment *attach,
} else {
/* move buffer into GTT */
struct ttm_operation_ctx ctx = { false, false };
+   unsigned domains = AMDGPU_GEM_DOMAIN_GTT;
 
-   amdgpu_ttm_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
+   if (bo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM &&
+   attach->peer2peer) {
+   bo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
+   domains |= AMDGPU_GEM_DOMAIN_VRAM;
+   }
+   amdgpu_ttm_placement_from_domain(bo, domains);
	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
if (r)
goto error_unreserve;
}
 
-   sgt = drm_prime_pages_to_sg(bo->tbo.ttm->pages, bo->tbo.num_pages);
-   if (IS_ERR(sgt)) {
-   r = PTR_ERR(sgt);
+   switch (bo->tbo.mem.mem_type) {
+   case TTM_PL_TT:
+   sgt = drm_prime_pages_to_sg(bo->tbo.ttm->pages,
+   bo->tbo.num_pages);
+   if (IS_ERR(sgt)) {
+   r = PTR_ERR(sgt);
+   goto error_unreserve;
+   }
+
+   if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
+ DMA_ATTR_SKIP_CPU_SYNC))
+   goto error_free;
+   break;
+
+   case TTM_PL_VRAM:
+   r = amdgpu_vram_mgr_alloc_sgt(adev, &bo->tbo.mem, attach->dev,
+ dir, &sgt);
+   if (r)
+   goto error_unreserve;
+   break;
+   default:
+   r = -EINVAL;
goto error_unreserve;
}
 
-   if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
- DMA_ATTR_SKIP_CPU_SYNC))
-   goto error_free;
-
if (attach->dev->driver != adev->dev->driver)
bo->prime_shared_count++;
 
@@ -254,10 +275,15 @@ static void amdgpu_gem_unmap_dma_buf(struct 
dma_buf_attachment *attach,
if (!attach->invalidate)
amdgpu_bo_unreserve(bo);
 
-   if (sgt) {
+   if (!sgt)
+   return;
+
+   if (sgt->sgl->page_link) {
dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir);
sg_free_table(sgt);
kfree(sgt);
+   } else {
+   amdgpu_vram_mgr_free_sgt(adev, attach->dev, dir, sgt);
}
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 6ea7de863041..b483900abed2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -73,6 +73,15 @@ bool amdgpu_gtt_mgr_has_gart_addr(struct ttm_mem_reg *mem);
 uint64_t amdgpu_gtt_mgr_usage(struct ttm_mem_type_manager *man);
 int amdgpu_gtt_mgr_recover(struct ttm_mem_type_manager *man);
 
+int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
+ struct ttm_mem_reg *mem,
+ struct device *dev,
+ enum dma_data_direction dir,
+ struct sg_table **sgt);
+void amdgpu_vram_mgr_free_sgt(struct amdgpu_device *adev,
+ struct device *dev,
+ enum dma_data_direction dir,
+ struct sg_table *sgt);
 uint64_t amdgpu_vram_mgr_usage(struct ttm_mem_type_manager *man);
 uint64_t amdgpu_vram_mgr_vis_usage(struct ttm_mem_type_manager *man);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 9aca653bec07..eb8f75525a81 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -233,6 +233,97 @@ static void amdgpu_vram_mgr_del(struct 
ttm_mem_type_manager *man,
mem->mm_node = NULL;
 }
 
+/**
+ * amdgpu_vram_mgr_alloc_sgt - allocate and fill a sg table
+ *
+ * @adev: amdgpu device pointer
+ * @mem: TTM memory object
+ * @dev: the other device
+ * @dir: dma direction
+ * @sgt: resulting sg table
+ *
+ 

[PATCH 1/8] lib/scatterlist: add sg_set_dma_addr() helper

2018-03-25 Thread Christian König
Use this function to set an sg entry to point to device resources mapped
using dma_map_resource(). The page pointer is set to NULL and only the DMA
address, length and offset values are valid.
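
A minimal (hypothetical) exporter-side usage could look like this, assuming
"busaddr" and "size" describe the device resource, "dir" and "attach" come
from the map_dma_buf call and "sgt" has a single entry:

	dma_addr_t addr;

	addr = dma_map_resource(attach->dev, busaddr, size, dir, 0);
	if (dma_mapping_error(attach->dev, addr))
		return -EFAULT;

	sg_set_dma_addr(sgt->sgl, addr, size, 0);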

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 include/linux/scatterlist.h | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 22b2131bcdcd..f944ee4e482c 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -149,6 +149,29 @@ static inline void sg_set_buf(struct scatterlist *sg, 
const void *buf,
sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
 }
 
+/**
+ * sg_set_dma_addr - Set sg entry to point at specified dma address
+ * @sg: SG entry
+ * @address:DMA address to set
+ * @len:Length of data
+ * @offset: Offset into page
+ *
+ * Description:
+ *   Use this function to set an sg entry to point to device resources mapped
+ *   using dma_map_resource(). The page pointer is set to NULL and only the DMA
+ *   address, length and offset values are valid.
+ *
+ **/
+static inline void sg_set_dma_addr(struct scatterlist *sg, dma_addr_t address,
+  unsigned int len, unsigned int offset)
+{
+   sg_set_page(sg, NULL, len, offset);
+   sg->dma_address = address;
+#ifdef CONFIG_NEED_SG_DMA_LENGTH
+   sg->dma_length = len;
+#endif
+}
+
 /*
  * Loop over each sg element, following the pointer to a new list if necessary
  */
-- 
2.14.1



[PATCH 2/8] PCI: Add pci_find_common_upstream_dev()

2018-03-25 Thread Christian König
From: "wda...@nvidia.com" <wda...@nvidia.com>

Add an interface to find the first device which is upstream of both
devices.
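
A minimal (hypothetical) caller, for example one deciding whether to even
attempt peer-to-peer transfers between two endpoints, could look like this:

	struct pci_dev *common;

	common = pci_find_common_upstream_dev(gpu_pdev, peer_pdev);
	if (!common)
		return false;	/* only reachable through separate host bridges */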

Signed-off-by: Will Davis <wda...@nvidia.com>
Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/pci/search.c | 24 
 include/linux/pci.h  |  2 ++
 2 files changed, 26 insertions(+)

diff --git a/drivers/pci/search.c b/drivers/pci/search.c
index bc1e023f1353..446648f4238b 100644
--- a/drivers/pci/search.c
+++ b/drivers/pci/search.c
@@ -393,3 +393,27 @@ int pci_dev_present(const struct pci_device_id *ids)
return 0;
 }
 EXPORT_SYMBOL(pci_dev_present);
+
+/**
+ * pci_find_common_upstream_dev - Returns the first common upstream device
+ * @dev: the PCI device from which the bus hierarchy walk will start
+ * @other: the PCI device whose bus number will be used for the search
+ *
+ * Walks up the bus hierarchy from the @dev device, looking for the first bus
+ * which the @other device is downstream of. Returns %NULL if the devices do
+ * not share any upstream devices.
+ */
+struct pci_dev *pci_find_common_upstream_dev(struct pci_dev *dev,
+struct pci_dev *other)
+{
+   struct pci_dev *pdev = dev;
+
+   while (pdev != NULL) {
+   if ((other->bus->number >= pdev->bus->busn_res.start) &&
+   (other->bus->number <= pdev->bus->busn_res.end))
+   return pdev;
+   pdev = pci_upstream_bridge(pdev);
+   }
+
+   return NULL;
+}
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 024a1beda008..0d29f5cdcb04 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -956,6 +956,8 @@ static inline struct pci_dev *pci_get_bus_and_slot(unsigned 
int bus,
 }
 struct pci_dev *pci_get_class(unsigned int class, struct pci_dev *from);
 int pci_dev_present(const struct pci_device_id *ids);
+struct pci_dev *pci_find_common_upstream_dev(struct pci_dev *from,
+struct pci_dev *to);
 
 int pci_bus_read_config_byte(struct pci_bus *bus, unsigned int devfn,
 int where, u8 *val);
-- 
2.14.1



[PATCH 4/8] dma-buf: add peer2peer flag

2018-03-25 Thread Christian König
Add a peer2peer flag noting that the importer can deal with device
resources which are not backed by pages.
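
An importer setting this flag has to be prepared for sg entries without a
struct page behind them; a minimal (hypothetical) consumer would only look at
the DMA address and length, roughly like this:

	struct scatterlist *sg;
	int i;

	for_each_sg(sgt->sgl, sg, sgt->nents, i) {
		dma_addr_t addr = sg_dma_address(sg);
		unsigned int len = sg_dma_len(sg);

		/* program the importing device with addr/len,
		 * never touch sg_page(sg) here */
		my_device_program_range(importer, addr, len);
	}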

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/dma-buf/dma-buf.c | 1 +
 include/linux/dma-buf.h   | 4 
 2 files changed, 5 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index ffaa2f9a9c2c..f420225f93c6 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -565,6 +565,7 @@ struct dma_buf_attachment *dma_buf_attach(const struct 
dma_buf_attach_info *info
 
attach->dev = info->dev;
attach->dmabuf = dmabuf;
+   attach->peer2peer = info->peer2peer;
attach->priv = info->priv;
attach->invalidate = info->invalidate;
 
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 15dd8598bff1..1ef50bd9bc5b 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -313,6 +313,7 @@ struct dma_buf {
  * @dmabuf: buffer for this attachment.
  * @dev: device attached to the buffer.
  * @node: list of dma_buf_attachment.
+ * @peer2peer: true if the importer can handle peer resources without pages.
  * @priv: exporter specific attachment data.
  *
  * This structure holds the attachment information between the dma_buf buffer
@@ -328,6 +329,7 @@ struct dma_buf_attachment {
struct dma_buf *dmabuf;
struct device *dev;
struct list_head node;
+   bool peer2peer;
void *priv;
 
/**
@@ -392,6 +394,7 @@ struct dma_buf_export_info {
  * @dmabuf:the exported dma_buf
  * @dev:   the device which wants to import the attachment
  * @priv:  private data of importer to this attachment
+ * @peer2peer: true if the importer can handle peer resources without pages
  * @invalidate:callback to use for invalidating mappings
  *
  * This structure holds the information required to attach to a buffer. Used
@@ -401,6 +404,7 @@ struct dma_buf_attach_info {
struct dma_buf *dmabuf;
struct device *dev;
void *priv;
+   bool peer2peer;
void (*invalidate)(struct dma_buf_attachment *attach);
 };
 
-- 
2.14.1



[PATCH 3/8] PCI: Add pci_peer_traffic_supported()

2018-03-25 Thread Christian König
From: "wda...@nvidia.com" <wda...@nvidia.com>

Add checks for topology and ACS configuration to determine whether or not
peer traffic should be supported between two PCI devices.

Signed-off-by: Will Davis <wda...@nvidia.com>
Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/pci/pci.c   | 112 
 include/linux/pci.h |   1 +
 2 files changed, 113 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f6a4dd10d9b0..efdca3c9dad1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -5285,6 +5286,117 @@ void pci_ignore_hotplug(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_GPL(pci_ignore_hotplug);
 
+static bool pci_peer_host_traffic_supported(struct pci_host_bridge *host)
+{
+   struct pci_dev *bridge = pci_get_slot(host->bus, PCI_DEVFN(0, 0));
+   unsigned short vendor, device;
+
+   if (!bridge)
+   return false;
+
+   vendor = bridge->vendor;
+   device = bridge->device;
+   pci_dev_put(bridge);
+
+   /* AMD ZEN host bridges can do peer to peer */
+   if (vendor == PCI_VENDOR_ID_AMD && device == 0x1450)
+   return true;
+
+   /* TODO: Extend that to a proper whitelist */
+   return false;
+}
+
+bool pci_peer_traffic_supported(struct pci_dev *dev, struct pci_dev *peer)
+{
+   struct pci_dev *rp_dev, *rp_peer, *common_upstream;
+   struct pci_host_bridge *peer_host_bridge;
+   struct pci_host_bridge *dev_host_bridge;
+   int pos;
+   u16 cap;
+
+   /* This is tricky enough, focus on PCIe for now */
+   if (!pci_is_pcie(dev) || !pci_is_pcie(peer))
+   return false;
+
+   /*
+* Disallow the peer-to-peer traffic if the devices do not share a
+* host bridge. The PCI specifications does not make any guarantees
+* about P2P capabilities between devices under separate domains.
+*
+* PCI Local Bus Specification Revision 3.0, section 3.10:
+*"Peer-to-peer transactions crossing multiple host bridges
+* PCI host bridges may, but are not required to, support PCI
+* peer-to-peer transactions that traverse multiple PCI host
+* bridges."
+*/
+   dev_host_bridge = pci_find_host_bridge(dev->bus);
+   peer_host_bridge = pci_find_host_bridge(peer->bus);
+   if (dev_host_bridge != peer_host_bridge)
+   return pci_peer_host_traffic_supported(dev_host_bridge);
+
+   rp_dev = pcie_find_root_port(dev);
+   rp_peer = pcie_find_root_port(peer);
+   common_upstream = pci_find_common_upstream_dev(dev, peer);
+
+   /*
+* Access Control Services (ACS) Checks
+*
+* ACS has a capability bit for P2P Request Redirects (RR),
+* but unfortunately it doesn't tell us much about the real
+* capabilities of the hardware.
+*
+* PCI Express Base Specification Revision 3.0, section
+* 6.12.1.1:
+*"ACS P2P Request Redirect: must be implemented by Root
+* Ports that support peer-to-peer traffic with other
+* Root Ports; [80]"
+* but
+*"[80] Root Port indication of ACS P2P Request Redirect
+* or ACS P2P Completion Redirect support does not imply
+* any particular level of peer-to-peer support by the
+* Root Complex, or that peer-to-peer traffic is
+* supported at all"
+*/
+
+   /*
+* If ACS is not implemented, we have no idea about P2P
+* support. Optimistically allow this if there is a common
+* upstream device.
+*/
+   pos = pci_find_ext_capability(rp_dev, PCI_EXT_CAP_ID_ACS);
+   if (!pos)
+   return common_upstream != NULL;
+
+   /*
+* If the devices are under the same root port and have a common
+* upstream device, allow if the root port is further upstream
+* from the common upstream device and the common upstream
+* device has Upstream Forwarding disabled, or if the root port
+* is the common upstream device and ACS is not implemented.
+*/
+   pci_read_config_word(rp_dev, pos + PCI_ACS_CAP, &cap);
+   if ((rp_dev == rp_peer && common_upstream) &&
+   (((common_upstream != rp_dev) &&
+ !pci_acs_enabled(common_upstream, PCI_ACS_UF)) ||
+((common_upstream == rp_dev) && ((cap & PCI_ACS_RR) == 0))))
+   return true;
+
+   /*
+* If ACS RR is implemented and disabled, allow only if the
+* devices are under the same root port.
+*/
+   if (cap & PCI_ACS_RR && !pci_acs_enabled(rp_dev, PCI_ACS_RR))
+   return rp_dev == rp_peer;
+
+   /*
+  

[PATCH 7/8] drm/amdgpu: add amdgpu_gem_attach

2018-03-25 Thread Christian König
Check if we can do peer2peer on the PCIe bus.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 2566268806c3..133596df0775 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -134,6 +134,29 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
return obj;
 }
 
+static int amdgpu_gem_attach(struct dma_buf *dma_buf, struct device *dev,
+struct dma_buf_attachment *attach)
+{
+   struct drm_gem_object *obj = dma_buf->priv;
+   struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
+   struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
+
+   if (!attach->peer2peer)
+   return 0;
+
+   if (!dev_is_pci(dev))
+   goto no_peer2peer;
+
+   if (!pci_peer_traffic_supported(adev->pdev, to_pci_dev(dev)))
+   goto no_peer2peer;
+
+   return 0;
+
+no_peer2peer:
+   attach->peer2peer = false;
+   return 0;
+}
+
 static struct sg_table *
 amdgpu_gem_map_dma_buf(struct dma_buf_attachment *attach,
   enum dma_data_direction dir)
@@ -274,6 +297,7 @@ static int amdgpu_gem_begin_cpu_access(struct dma_buf 
*dma_buf,
 }
 
 static const struct dma_buf_ops amdgpu_dmabuf_ops = {
+   .attach = amdgpu_gem_attach,
.map_dma_buf = amdgpu_gem_map_dma_buf,
.unmap_dma_buf = amdgpu_gem_unmap_dma_buf,
.release = drm_gem_dmabuf_release,
-- 
2.14.1



[PATCH 6/8] drm/amdgpu: note that we can handle peer2peer DMA-buf

2018-03-25 Thread Christian König
Importing should work out of the box.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index fb43faf88eb0..2566268806c3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -353,6 +353,7 @@ struct drm_gem_object *amdgpu_gem_prime_import(struct 
drm_device *dev,
if (IS_ERR(obj))
return obj;
 
+   attach_info.peer2peer = true;
attach_info.priv = obj;
	attach = dma_buf_attach(&attach_info);
if (IS_ERR(attach)) {
-- 
2.14.1



[PATCH 5/8] drm/amdgpu: print DMA-buf status in debugfs

2018-03-25 Thread Christian König
Just note if a BO was imported/exported.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 46b9ea4e6103..7c1f761bb1a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -777,6 +777,8 @@ static int amdgpu_debugfs_gem_bo_info(int id, void *ptr, 
void *data)
struct amdgpu_bo *bo = gem_to_amdgpu_bo(gobj);
struct seq_file *m = data;
 
+   struct dma_buf_attachment *attachment;
+   struct dma_buf *dma_buf;
unsigned domain;
const char *placement;
unsigned pin_count;
@@ -805,6 +807,15 @@ static int amdgpu_debugfs_gem_bo_info(int id, void *ptr, 
void *data)
pin_count = READ_ONCE(bo->pin_count);
if (pin_count)
seq_printf(m, " pin count %d", pin_count);
+
+   dma_buf = READ_ONCE(bo->gem_base.dma_buf);
+   attachment = READ_ONCE(bo->gem_base.import_attach);
+
+   if (attachment)
+   seq_printf(m, " imported from %p", dma_buf);
+   else if (dma_buf)
+   seq_printf(m, " exported as %p", dma_buf);
+
seq_printf(m, "\n");
 
return 0;
-- 
2.14.1



[PATCH 4/5] drm/amdgpu: add independent DMA-buf export v2

2018-03-25 Thread Christian König
Instead of relying on the DRM functions just implement our own export
functions. This adds support for taking care of unpinned DMA-buf.

v2: fix unintended recursion, remove debugging leftovers

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   5 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c  | 108 ++---
 4 files changed, 72 insertions(+), 43 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index a7e22298ce5c..190bada6f487 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -371,7 +371,6 @@ int amdgpu_gem_object_open(struct drm_gem_object *obj,
 void amdgpu_gem_object_close(struct drm_gem_object *obj,
struct drm_file *file_priv);
 unsigned long amdgpu_gem_timeout(uint64_t timeout_ns);
-struct sg_table *amdgpu_gem_prime_get_sg_table(struct drm_gem_object *obj);
 struct drm_gem_object *
 amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
 struct dma_buf_attachment *attach,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 1bfce79bc074..33893f0311c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -892,7 +892,6 @@ static struct drm_driver kms_driver = {
.gem_prime_export = amdgpu_gem_prime_export,
.gem_prime_import = amdgpu_gem_prime_import,
.gem_prime_res_obj = amdgpu_gem_prime_res_obj,
-   .gem_prime_get_sg_table = amdgpu_gem_prime_get_sg_table,
.gem_prime_import_sg_table = amdgpu_gem_prime_import_sg_table,
.gem_prime_vmap = amdgpu_gem_prime_vmap,
.gem_prime_vunmap = amdgpu_gem_prime_vunmap,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index ac1fa3394a9e..11d2ac175b30 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -31,6 +31,7 @@
  */
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -935,6 +936,10 @@ void amdgpu_bo_move_notify(struct ttm_buffer_object *bo,
 
amdgpu_bo_kunmap(abo);
 
+   if (abo->gem_base.dma_buf && !abo->gem_base.import_attach &&
+   bo->mem.mem_type != TTM_PL_SYSTEM)
+   dma_buf_invalidate_mappings(abo->gem_base.dma_buf);
+
/* remember the eviction */
if (evict)
atomic64_inc(>num_evictions);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 4b584cb75bf4..7fef95f0fed1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -32,14 +32,6 @@
 
 static const struct dma_buf_ops amdgpu_dmabuf_ops;
 
-struct sg_table *amdgpu_gem_prime_get_sg_table(struct drm_gem_object *obj)
-{
-   struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
-   int npages = bo->tbo.num_pages;
-
-   return drm_prime_pages_to_sg(bo->tbo.ttm->pages, npages);
-}
-
 void *amdgpu_gem_prime_vmap(struct drm_gem_object *obj)
 {
struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
@@ -126,23 +118,22 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
return ERR_PTR(ret);
 }
 
-static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
-struct device *target_dev,
-struct dma_buf_attachment *attach)
+static struct sg_table *
+amdgpu_gem_map_dma_buf(struct dma_buf_attachment *attach,
+  enum dma_data_direction dir)
 {
+   struct dma_buf *dma_buf = attach->dmabuf;
struct drm_gem_object *obj = dma_buf->priv;
struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
+   struct sg_table *sgt;
long r;
 
-   r = drm_gem_map_attach(dma_buf, target_dev, attach);
-   if (r)
-   return r;
-
-   r = amdgpu_bo_reserve(bo, false);
-   if (unlikely(r != 0))
-   goto error_detach;
-
+   if (!attach->invalidate) {
+   r = amdgpu_bo_reserve(bo, false);
+   if (unlikely(r != 0))
+   return ERR_PTR(r);
+   }
 
if (attach->dev->driver != adev->dev->driver) {
/*
@@ -158,42 +149,77 @@ static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
}
}
 
-   /* pin buffer into GTT */
-   r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT, NULL);
-   if (r)
+   if (!attach->invalidate) {
+   /* pin buffer into GTT */
+   r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT, NULL);
+   if (r)
+   goto er

[PATCH 2/5] drm/ttm: keep a reference to transfer pipelined BOs

2018-03-25 Thread Christian König
Make sure the transferred BO is never destroyed before the transfer BO.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 50 +++
 1 file changed, 30 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 1f730b3f18e5..1ee20558ee31 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -39,6 +39,11 @@
 #include 
 #include 
 
+struct ttm_transfer_obj {
+   struct ttm_buffer_object base;
+   struct ttm_buffer_object *bo;
+};
+
 void ttm_bo_free_old_node(struct ttm_buffer_object *bo)
 {
	ttm_bo_mem_put(bo, &bo->mem);
@@ -435,7 +440,11 @@ EXPORT_SYMBOL(ttm_bo_move_memcpy);
 
 static void ttm_transfered_destroy(struct ttm_buffer_object *bo)
 {
-   kfree(bo);
+   struct ttm_transfer_obj *fbo;
+
+   fbo = container_of(bo, struct ttm_transfer_obj, base);
+   ttm_bo_unref(&fbo->bo);
+   kfree(fbo);
 }
 
 /**
@@ -456,14 +465,15 @@ static void ttm_transfered_destroy(struct 
ttm_buffer_object *bo)
 static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
  struct ttm_buffer_object **new_obj)
 {
-   struct ttm_buffer_object *fbo;
+   struct ttm_transfer_obj *fbo;
int ret;
 
fbo = kmalloc(sizeof(*fbo), GFP_KERNEL);
if (!fbo)
return -ENOMEM;
 
-   *fbo = *bo;
+   fbo->base = *bo;
+   fbo->bo = ttm_bo_reference(bo);
 
/**
 * Fix up members that we shouldn't copy directly:
@@ -471,25 +481,25 @@ static int ttm_buffer_object_transfer(struct 
ttm_buffer_object *bo,
 */
 
	atomic_inc(&bo->bdev->glob->bo_count);
-   INIT_LIST_HEAD(&fbo->ddestroy);
-   INIT_LIST_HEAD(&fbo->lru);
-   INIT_LIST_HEAD(&fbo->swap);
-   INIT_LIST_HEAD(&fbo->io_reserve_lru);
-   mutex_init(&fbo->wu_mutex);
-   fbo->moving = NULL;
-   drm_vma_node_reset(&fbo->vma_node);
-   atomic_set(&fbo->cpu_writers, 0);
-
-   kref_init(&fbo->list_kref);
-   kref_init(&fbo->kref);
-   fbo->destroy = &ttm_transfered_destroy;
-   fbo->acc_size = 0;
-   fbo->resv = &fbo->ttm_resv;
-   reservation_object_init(fbo->resv);
-   ret = reservation_object_trylock(fbo->resv);
+   INIT_LIST_HEAD(&fbo->base.ddestroy);
+   INIT_LIST_HEAD(&fbo->base.lru);
+   INIT_LIST_HEAD(&fbo->base.swap);
+   INIT_LIST_HEAD(&fbo->base.io_reserve_lru);
+   mutex_init(&fbo->base.wu_mutex);
+   fbo->base.moving = NULL;
+   drm_vma_node_reset(&fbo->base.vma_node);
+   atomic_set(&fbo->base.cpu_writers, 0);
+
+   kref_init(&fbo->base.list_kref);
+   kref_init(&fbo->base.kref);
+   fbo->base.destroy = &ttm_transfered_destroy;
+   fbo->base.acc_size = 0;
+   fbo->base.resv = &fbo->base.ttm_resv;
+   reservation_object_init(fbo->base.resv);
+   ret = reservation_object_trylock(fbo->base.resv);
	WARN_ON(!ret);
 
-   *new_obj = fbo;
+   *new_obj = &fbo->base;
return 0;
 }
 
-- 
2.14.1



[PATCH 5/5] drm/amdgpu: add independent DMA-buf import v3

2018-03-25 Thread Christian König
Instead of relying on the DRM functions just implement our own import
functions. This adds support for taking care of unpinned DMA-buf.

v2: enable for all exporters, not just amdgpu, fix invalidation
handling, lock reservation object while setting callback
v3: change to new dma_buf attach interface

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 72 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 32 +++---
 2 files changed, 89 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 7fef95f0fed1..fb43faf88eb0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -86,28 +86,24 @@ int amdgpu_gem_prime_mmap(struct drm_gem_object *obj, 
struct vm_area_struct *vma
return ret;
 }
 
-struct drm_gem_object *
-amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
-struct dma_buf_attachment *attach,
-struct sg_table *sg)
+static struct drm_gem_object *
+amdgpu_gem_prime_create_obj(struct drm_device *dev, struct dma_buf *dma_buf)
 {
-   struct reservation_object *resv = attach->dmabuf->resv;
+   struct reservation_object *resv = dma_buf->resv;
struct amdgpu_device *adev = dev->dev_private;
struct amdgpu_bo *bo;
int ret;
 
	ww_mutex_lock(&resv->lock, NULL);
-   ret = amdgpu_bo_create(adev, attach->dmabuf->size, PAGE_SIZE,
+   ret = amdgpu_bo_create(adev, dma_buf->size, PAGE_SIZE,
   AMDGPU_GEM_DOMAIN_CPU, 0, ttm_bo_type_sg,
	   resv, &bo);
if (ret)
goto error;
 
-   bo->tbo.sg = sg;
-   bo->tbo.ttm->sg = sg;
bo->allowed_domains = AMDGPU_GEM_DOMAIN_GTT;
bo->preferred_domains = AMDGPU_GEM_DOMAIN_GTT;
-   if (attach->dmabuf->ops != &amdgpu_dmabuf_ops)
+   if (dma_buf->ops != &amdgpu_dmabuf_ops)
bo->prime_shared_count = 1;
 
	ww_mutex_unlock(&resv->lock);
@@ -118,6 +114,26 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
return ERR_PTR(ret);
 }
 
+struct drm_gem_object *
+amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
+struct dma_buf_attachment *attach,
+struct sg_table *sg)
+{
+   struct drm_gem_object *obj;
+   struct amdgpu_bo *bo;
+
+   obj = amdgpu_gem_prime_create_obj(dev, attach->dmabuf);
+   if (IS_ERR(obj))
+   return obj;
+
+   bo = gem_to_amdgpu_bo(obj);
+   amdgpu_bo_reserve(bo, true);
+   bo->tbo.sg = sg;
+   bo->tbo.ttm->sg = sg;
+   amdgpu_bo_unreserve(bo);
+   return obj;
+}
+
 static struct sg_table *
 amdgpu_gem_map_dma_buf(struct dma_buf_attachment *attach,
   enum dma_data_direction dir)
@@ -293,9 +309,29 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_device 
*dev,
return buf;
 }
 
+static void
+amdgpu_gem_prime_invalidate_mappings(struct dma_buf_attachment *attach)
+{
+   struct ttm_operation_ctx ctx = { false, false };
+   struct drm_gem_object *obj = attach->priv;
+   struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
+   struct ttm_placement placement = {};
+   int r;
+
+   r = ttm_bo_validate(&bo->tbo, &placement, &ctx);
+   if (r)
+   DRM_ERROR("Failed to unmap DMA-buf import (%d))\n", r);
+}
+
 struct drm_gem_object *amdgpu_gem_prime_import(struct drm_device *dev,
struct dma_buf *dma_buf)
 {
+   struct dma_buf_attach_info attach_info = {
+   .dev = dev->dev,
+   .dmabuf = dma_buf,
+   .invalidate = amdgpu_gem_prime_invalidate_mappings
+   };
+   struct dma_buf_attachment *attach;
struct drm_gem_object *obj;
 
	if (dma_buf->ops == &amdgpu_dmabuf_ops) {
@@ -310,5 +346,21 @@ struct drm_gem_object *amdgpu_gem_prime_import(struct 
drm_device *dev,
}
}
 
-   return drm_gem_prime_import(dev, dma_buf);
+   if (!dma_buf->invalidation_supported)
+   return drm_gem_prime_import(dev, dma_buf);
+
+   obj = amdgpu_gem_prime_create_obj(dev, dma_buf);
+   if (IS_ERR(obj))
+   return obj;
+
+   attach_info.priv = obj;
+   attach = dma_buf_attach(&attach_info);
+   if (IS_ERR(attach)) {
+   drm_gem_object_put(obj);
+   return ERR_CAST(attach);
+   }
+
+   get_dma_buf(dma_buf);
+   obj->import_attach = attach;
+   return obj;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index d2ab40494a4c..ad93f201e7b6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -43,6 

[PATCH 1/5] dma-buf: add optional invalidate_mappings callback v3

2018-03-25 Thread Christian König
Each importer can now provide an invalidate_mappings callback.

This allows the exporter to provide the mappings without the need to pin
the backing store.

v2: don't try to invalidate mappings when the callback is NULL,
lock the reservation obj while using the attachments,
add helper to set the callback
v3: move flag for invalidation support into the DMA-buf,
use new attach_info structure to set the callback
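
A minimal (hypothetical) importer-side usage could look like the sketch below;
the callback is called with the reservation object locked, so the importer only
marks its cached mapping as stale here and re-creates it on next use under the
same lock:

	static void my_invalidate(struct dma_buf_attachment *attach)
	{
		/* hypothetical lookup from attachment to importer object */
		struct my_importer *imp = my_importer_from_attach(attach);

		imp->sgt_stale = true;
	}

	static struct dma_buf_attachment *my_attach(struct my_importer *imp,
						    struct dma_buf *dma_buf)
	{
		struct dma_buf_attach_info info = {
			.dmabuf     = dma_buf,
			.dev        = imp->dev,
			.invalidate = my_invalidate,
		};

		return dma_buf_attach(&info);
	}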

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/dma-buf/dma-buf.c | 43 +++
 include/linux/dma-buf.h   | 28 
 2 files changed, 71 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index d2e8ca0d9427..ffaa2f9a9c2c 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -566,6 +566,7 @@ struct dma_buf_attachment *dma_buf_attach(const struct 
dma_buf_attach_info *info
attach->dev = info->dev;
attach->dmabuf = dmabuf;
attach->priv = info->priv;
+   attach->invalidate = info->invalidate;
 
	mutex_lock(&dmabuf->lock);
 
@@ -574,7 +575,9 @@ struct dma_buf_attachment *dma_buf_attach(const struct 
dma_buf_attach_info *info
if (ret)
goto err_attach;
}
+   reservation_object_lock(dmabuf->resv, NULL);
	list_add(&attach->node, &dmabuf->attachments);
+   reservation_object_unlock(dmabuf->resv);
 
	mutex_unlock(&dmabuf->lock);
return attach;
@@ -600,7 +603,9 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct 
dma_buf_attachment *attach)
return;
 
	mutex_lock(&dmabuf->lock);
+   reservation_object_lock(dmabuf->resv, NULL);
	list_del(&attach->node);
+   reservation_object_unlock(dmabuf->resv);
if (dmabuf->ops->detach)
dmabuf->ops->detach(dmabuf, attach);
 
@@ -634,10 +639,23 @@ struct sg_table *dma_buf_map_attachment(struct 
dma_buf_attachment *attach,
if (WARN_ON(!attach || !attach->dmabuf))
return ERR_PTR(-EINVAL);
 
+   /*
+* Mapping a DMA-buf can trigger its invalidation, prevent sending this
+* event to the caller by temporary removing this attachment from the
+* list.
+*/
+   if (attach->invalidate) {
+   reservation_object_assert_held(attach->dmabuf->resv);
+   list_del(&attach->node);
+   }
+
sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);
if (!sg_table)
sg_table = ERR_PTR(-ENOMEM);
 
+   if (attach->invalidate)
+   list_add(&attach->node, &attach->dmabuf->attachments);
+
return sg_table;
 }
 EXPORT_SYMBOL_GPL(dma_buf_map_attachment);
@@ -658,6 +676,9 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment 
*attach,
 {
might_sleep();
 
+   if (attach->invalidate)
+   reservation_object_assert_held(attach->dmabuf->resv);
+
if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
return;
 
@@ -666,6 +687,26 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment 
*attach,
 }
 EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
 
+/**
+ * dma_buf_invalidate_mappings - invalidate all mappings of this dma_buf
+ *
+ * @dmabuf:	[in]	buffer whose mappings should be invalidated
+ *
+ * Informs all attachments that they need to destroy and recreate all their
+ * mappings.
+ */
+void dma_buf_invalidate_mappings(struct dma_buf *dmabuf)
+{
+   struct dma_buf_attachment *attach;
+
+   reservation_object_assert_held(dmabuf->resv);
+
+   list_for_each_entry(attach, &dmabuf->attachments, node)
+   if (attach->invalidate)
+   attach->invalidate(attach);
+}
+EXPORT_SYMBOL_GPL(dma_buf_invalidate_mappings);
+
 /**
  * DOC: cpu access
  *
@@ -1123,10 +1164,12 @@ static int dma_buf_debug_show(struct seq_file *s, void 
*unused)
seq_puts(s, "\tAttached Devices:\n");
attach_count = 0;
 
+   reservation_object_lock(buf_obj->resv, NULL);
	list_for_each_entry(attach_obj, &buf_obj->attachments, node) {
seq_printf(s, "\t%s\n", dev_name(attach_obj->dev));
attach_count++;
}
+   reservation_object_unlock(buf_obj->resv);
 
seq_printf(s, "Total %d devices attached\n\n",
attach_count);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 2c27568d44af..15dd8598bff1 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -270,6 +270,8 @@ struct dma_buf_ops {
  * @poll: for userspace poll support
  * @cb_excl: for userspace poll support
  * @cb_shared: for userspace poll support
+ * @invalidation_supported: True when the exporter supports unpinned operation
+ *  

[PATCH 3/5] drm/ttm: remove the backing store if no placement is given

2018-03-25 Thread Christian König
Pipeline removal of the BO's backing store when no placement is given
during validation.
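
A (hypothetical) caller can then drop a BO's backing store simply by validating
it with an empty placement, roughly like this:

	/* tbo: the struct ttm_buffer_object to gut, reservation held */
	struct ttm_operation_ctx ctx = { false, false };
	struct ttm_placement placement = {};
	int r;

	r = ttm_bo_validate(tbo, &placement, &ctx);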

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 98e06f8bf23b..17e821f01d0a 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1078,6 +1078,18 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
uint32_t new_flags;
 
reservation_object_assert_held(bo->resv);
+
+   /*
+* Remove the backing store if no placement is given.
+*/
+   if (!placement->num_placement && !placement->num_busy_placement) {
+   ret = ttm_bo_pipeline_gutting(bo);
+   if (ret)
+   return ret;
+
+   return ttm_tt_create(bo, false);
+   }
+
/*
 * Check whether we need to move buffer.
 */
-- 
2.14.1



Re: [Linaro-mm-sig] [PATCH 1/5] dma-buf: add optional invalidate_mappings callback v2

2018-03-22 Thread Christian König

Am 22.03.2018 um 08:18 schrieb Daniel Vetter:

On Wed, Mar 21, 2018 at 12:54:20PM +0100, Christian König wrote:

Am 21.03.2018 um 09:28 schrieb Daniel Vetter:

On Tue, Mar 20, 2018 at 06:47:57PM +0100, Christian König wrote:

Am 20.03.2018 um 15:08 schrieb Daniel Vetter:

[SNIP]
For the in-driver reservation path (CS) having a slow-path that grabs a
temporary reference, drops the vram lock and then locks the reservation
normally (using the acquire context used already for the entire CS) is a
bit tricky, but totally feasible. Ttm doesn't do that though.

That is exactly what we do in amdgpu as well, it's just neither very efficient
nor reliable to retry getting the right pages for a submission over and over
again.

Out of curiosity, where's that code? I did read the ttm eviction code way
back, and that one definitely didn't do that. Would be interesting to
update my understanding.

That is in amdgpu_cs.c. amdgpu_cs_parser_bos() does a horrible dance with
grabbing, releasing and regrabbing locks in a loop.

Then in amdgpu_cs_submit() we grab a lock preventing page table updates and
check if all the pages are still the ones we want to have:

    amdgpu_mn_lock(p->mn);
    if (p->bo_list) {
        for (i = p->bo_list->first_userptr;
             i < p->bo_list->num_entries; ++i) {
            struct amdgpu_bo *bo = p->bo_list->array[i].robj;

            if (amdgpu_ttm_tt_userptr_needs_pages(bo->tbo.ttm)) {
                amdgpu_mn_unlock(p->mn);
                return -ERESTARTSYS;
            }
        }
    }

If anything changed on the page tables we restart the whole IOCTL using
-ERESTARTSYS and try again.

I'm not talking about userptr here, but general bo eviction. Sorry for the
confusion.

The reason I'm dragging all the general bo management into this
discussions is because we do seem to have fairly fundamental difference in
how that's done, with resulting consequences for the locking hierarchy.

And if this invalidate_mapping stuff should work, together with userptr
and everything else, I think we're required to agree on how this is all
supposed to nest, and how exactly we should back off for the other side
that needs to break the locking circle.

That aside, I don't entirely understand why you need to restart so much. I
figured that get_user_pages is ordered correctly against mmu
invalidations, but I get the impression you think that's not the case. How
does that happen?


Correct. I've had the same assumption, but both Jerome as well as our 
internal tests proved me wrong on that.


The key takeaway from that was that you can't take any locks in either 
the MMU notifier or the shrinker that you also take while calling kmalloc 
(or, simpler, get_user_pages()).


In addition, in the MMU notifier or shrinker callback all kinds of different 
locks might be held, so you basically can't assume that you can do things 
like recursive page table walks or call dma_unmap_anything.


Thinking a moment about that it actually seems to make perfect sense.

So it doesn't matter what ordering you establish between the mmap_sem and 
your buffer or allocation lock, it will simply be incorrect with respect to 
other locks in the system anyway.


The only solution to that problem we have found is the dance we have in 
amdgpu now:


1. Inside invalidate_range_start you grab a recursive read side lock.
2. Wait for all GPU operations on those pages to finish.
3. Then mark all pages as dirty and accessed.
4. Inside invalidate_range_end you release your recursive read side lock 
again.


Then during command submission you do the following:

1. Take the locks and figure out all the userptr BOs which need new pages.
2. Drop the locks again and call get_user_pages().
3. Install the new page arrays and reiterate from #1.
4. As soon as everything has its pages, update your page tables and prepare 
the command submission.

5. Take the write side of our recursive lock.
6. Check if all pages are still valid, if not restart the whole IOCTL.
7. Submit the job to the hardware.
8. Drop the write side of our recursive lock.
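
Very roughly, and with hypothetical helper names instead of the real amdgpu
functions, the command submission side of that dance looks like this:

static int my_cs_ioctl(struct my_cs *cs)
{
	int r;

restart:
	/* steps 1-3: collect userptr pages without holding any of our locks */
	r = my_collect_userptr_pages(cs);
	if (r)
		return r;

	/* step 4: update page tables and prepare the submission */
	my_prepare_submission(cs);

	down_write(&cs->mn_lock);		/* step 5: blocks invalidate_range_start */
	if (!my_userptr_pages_still_valid(cs)) {
		up_write(&cs->mn_lock);		/* step 6: pages changed, redo it all */
		goto restart;			/* (the real code returns -ERESTARTSYS) */
	}
	r = my_submit_to_hw(cs);		/* step 7 */
	up_write(&cs->mn_lock);			/* step 8 */

	return r;
}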

Regards,
Christian.


-Daniel




Re: [Linaro-mm-sig] [PATCH 1/5] dma-buf: add optional invalidate_mappings callback v2

2018-03-22 Thread Christian König

Am 22.03.2018 um 08:14 schrieb Daniel Vetter:

On Wed, Mar 21, 2018 at 10:34:05AM +0100, Christian König wrote:

Am 21.03.2018 um 09:18 schrieb Daniel Vetter:

[SNIP]

For correct operation you always need to implement invalidate_range_end as
well and add some lock/completion work Otherwise get_user_pages() can again
grab the reference to the wrong page.

Is this really a problem?


Yes, and quite a big one.


I figured that if a mmu_notifier invalidation is
going on, a get_user_pages on that mm from anywhere else (whether i915 or
anyone really) will serialize with the ongoing invalidate?


No, that isn't correct. Jerome can probably better explain that than I do.


If that's not the case, then really any get_user_pages is racy, including all 
the
DIRECT_IO ones.


The key point here is that get_user_pages() grabs a reference to the 
page. So what you get is a bunch of pages which were mapped at that 
location at a specific point in time.


There is no guarantee that after get_user_pages() returns you still have 
the same pages mapped at that location; you are only guaranteed that the 
pages are not reused for something else.


That is perfectly sufficient for a task like DIRECT_IO, where you only 
have block or network I/O, but unfortunately not really for GPUs, where 
you crunch results, write them back to pages and actually count on the 
CPU seeing the result in the right place.



[SNIP]
So no matter how you put it i915 is clearly doing something wrong here :)

tbh I'm not entirely clear on the reasons why this works, but
cross-release lockdep catches these things, and it did not complain.
On a high-level we make sure that mm locks needed by get_user_pages do
_not_ nest within dev->struct_mutex. We have massive back-off slowpaths to
do anything that could fault outside of our own main gem locking.


I'm pretty sure that this doesn't work as intended and just hides the 
real problem.



That was (at least in the past) a major difference with amdgpu, which
essentially has none of these paths. That would trivially deadlock with
your own gem mmap fault handler, so you had (maybe that changed) a dumb
retry loop, which did shut up lockdep but didn't fix any of the locking
inversions.


Any lock you grab in an MMU notifier callback can't even be held while you 
call kmalloc() or get_free_page() (without GFP_NOIO).


Even simple things like drm_vm_open() violate that by using GFP_KERNEL. 
So I can 100% assure you that what you do here is not correct.



So yeah, grabbing dev->struct_mutex is in principle totally fine while
holding all kinds of struct mm/vma locks. I'm not entirely clear why we
punt the actual unmapping to the worker though, maybe simply to not have a
constrained stack.


I strongly disagree on that. As far as I can see, what TTM does actually 
looks like the right approach to the problem.



This is re: your statement that you can't unamp sg tables from the
shrinker. We can, because we've actually untangled the locking depencies
so that you can fully operate on gem objects from within mm/vma locks.
Maybe code has changed, but last time I looked at radeon/ttm a while back
that was totally not the case, and if you don't do all this work then yes
you'll deadlock.

Doesn't mean it's impossible, because we've done it :-)


And I'm pretty sure you didn't do it correctly :D


Well, it actually gets the job done. We'd need to at least get to
per-object locking, and probably even then we'd need to rewrite the code a
lot. But please note that this here is only to avoid the GFP_NOIO
constraint, all the other bits I clarified around why we don't actually
have circular locking (because the entire hierarchy is inverted for us)
still hold even if you would only trylock here.


Well, you reversed the order of your allocation lock and mmap_sem, which 
avoids the lock inversion during page faults, but it doesn't help you at 
all with the MMU notifier and shrinker because there are a lot more locks 
involved there.


Regards,
Christian.


Aside: Given that yesterday a bunch of folks complained on #dri-devel that
amdgpu prematurely OOMs compared to i915, and that we've switched from a
simple trylock to this nastiness to be able to recover from more low
memory situation it's maybe not such a silly idea. Horrible, but not silly
because actually necessary.
-Daniel




Re: [Linaro-mm-sig] [PATCH 1/5] dma-buf: add optional invalidate_mappings callback v2

2018-03-21 Thread Christian König

Am 21.03.2018 um 09:28 schrieb Daniel Vetter:

On Tue, Mar 20, 2018 at 06:47:57PM +0100, Christian König wrote:

Am 20.03.2018 um 15:08 schrieb Daniel Vetter:

[SNIP]
For the in-driver reservation path (CS) having a slow-path that grabs a
temporary reference, drops the vram lock and then locks the reservation
normally (using the acquire context used already for the entire CS) is a
bit tricky, but totally feasible. Ttm doesn't do that though.

That is exactly what we do in amdgpu as well, it's just neither very efficient
nor reliable to retry getting the right pages for a submission over and over
again.

Out of curiosity, where's that code? I did read the ttm eviction code way
back, and that one definitely didn't do that. Would be interesting to
update my understanding.


That is in amdgpu_cs.c. amdgpu_cs_parser_bos() does a horrible dance 
with grabbing, releasing and regrabbing locks in a loop.


Then in amdgpu_cs_submit() we grab a lock preventing page table updates 
and check if all the pages are still the ones we want to have:

    amdgpu_mn_lock(p->mn);
    if (p->bo_list) {
        for (i = p->bo_list->first_userptr;
             i < p->bo_list->num_entries; ++i) {
            struct amdgpu_bo *bo = p->bo_list->array[i].robj;

            if (amdgpu_ttm_tt_userptr_needs_pages(bo->tbo.ttm)) {
                amdgpu_mn_unlock(p->mn);
                return -ERESTARTSYS;
            }
        }
    }


If anything changed on the page tables we restart the whole IOCTL using 
-ERESTARTSYS and try again.


Regards,
Christian.


-Daniel




Re: [Linaro-mm-sig] [PATCH 1/5] dma-buf: add optional invalidate_mappings callback v2

2018-03-21 Thread Christian König

Am 21.03.2018 um 09:18 schrieb Daniel Vetter:

[SNIP]
They're both in i915_gem_userptr.c, somewhat interleaved. Would be
interesting if you could show what you think is going wrong in there
compared to amdgpu_mn.c.


i915 implements only one callback:

static const struct mmu_notifier_ops i915_gem_userptr_notifier = {
    .invalidate_range_start = i915_gem_userptr_mn_invalidate_range_start,
};
For correct operation you always need to implement invalidate_range_end 
as well and add some lock/completion work. Otherwise get_user_pages() can 
again grab a reference to the wrong page.
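
Something along these lines (hypothetical names, only a sketch of the idea):

struct my_mn {
	struct mmu_notifier mn;
	struct rw_semaphore lock;	/* write side held during command submission */
};

static void my_mn_invalidate_range_start(struct mmu_notifier *mn,
					 struct mm_struct *mm,
					 unsigned long start, unsigned long end)
{
	struct my_mn *mmn = container_of(mn, struct my_mn, mn);

	down_read(&mmn->lock);		/* serializes against command submission */
	my_wait_for_gpu_idle(mmn, start, end);
	my_mark_userptr_ranges_invalid(mmn, start, end);
}

static void my_mn_invalidate_range_end(struct mmu_notifier *mn,
				       struct mm_struct *mm,
				       unsigned long start, unsigned long end)
{
	struct my_mn *mmn = container_of(mn, struct my_mn, mn);

	up_read(&mmn->lock);
}

static const struct mmu_notifier_ops my_mn_ops = {
	.invalidate_range_start	= my_mn_invalidate_range_start,
	.invalidate_range_end	= my_mn_invalidate_range_end,
};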


The next problem seems to be that cancel_userptr() doesn't prevent any 
new command submission. E.g.

i915_gem_object_wait(obj, I915_WAIT_ALL, MAX_SCHEDULE_TIMEOUT, NULL);
What prevents new command submissions from using the GEM object directly 
after you have finished waiting here?



I get a feeling we're talking past each another here.

Yeah, agreed. On top of that I don't know the i915 code very well.


Can you perhaps explain what exactly the race is you're seeing? The i915 
userptr code is
fairly convoluted and pushes a lot of stuff to workers (but then syncs
with those workers again later on), so easily possible you've overlooked
one of these lines that might guarantee already what you think needs to be
guaranteed. We're definitely not aiming to allow userspace to allow
writing to random pages all over.


You don't read/write random pages; there is still a reference to the 
page, so the page can't be reused until you are done.

The problem is rather that you can't guarantee that you write to the 
page which is mapped into the process at that location. E.g. the CPU and 
the GPU might see two different things.



Leaking the IOMMU mappings otoh means rogue userspace could do a bunch of
stray writes (I don't see anywhere code in amdgpu_mn.c to unmap at least
the gpu side PTEs to make stuff inaccessible) and wreak the core kernel's
book-keeping.

In i915 we guarantee that we call set_page_dirty/mark_page_accessed only
after all the mappings are really gone (both GPU PTEs and sg mapping),
guaranteeing that any stray writes from either the GPU or IOMMU will
result in faults (except bugs in the IOMMU, but can't have it all, "IOMMU
actually works" is an assumption behind device isolation).

Well exactly that's the point, the handling in i915 looks incorrect to me.
You need to call set_page_dirty/mark_page_accessed way before the mapping is
destroyed.

To be more precise for userptrs it must be called from the
invalidate_range_start, but i915 seems to delegate everything into a
background worker to avoid the locking problems.

Yeah, and at the end of the function there's a flush_work to make sure the
worker has caught up.

Ah yes, I hadn't seen that.

But then grabbing the obj->base.dev->struct_mutex lock in 
cancel_userptr() is rather evil. You just silenced lockdep because you 
offloaded that into a work item.


So no matter how you put it i915 is clearly doing something wrong here :)


I know. i915 gem has tons of fallbacks and retry loops (we restart the
entire CS if needed), and i915 userptr pushes the entire get_user_pages
dance off into a worker if the fastpath doesn't succeed and we run out of
memory or hit contended locks. We also have obscene amounts of
__GFP_NORETRY and __GFP_NOWARN all over the place to make sure the core mm
code doesn't do something we don't want it do to do in the fastpaths
(because there's really no point in spending lots of time trying to make
memory available if we have a slowpath fallback with much less
constraints).
Well I haven't audited the code, but I'm pretty sure that just mitigates 
the problem and silences lockdep instead of really fixing the issue.



We're also not limiting ourselves to GFP_NOIO, but instead have a
recursion detection in our own shrinker callback to avoid these
deadlocks.


Which if you ask me is absolutely horrible. I mean the comment in 
linux/mutex.h sums it up pretty well:

 * This function should not be used, _ever_. It is purely for hysterical GEM
 * raisins, and once those are gone this will be removed.


Regards,
Christian.


Re: [Linaro-mm-sig] [PATCH 1/5] dma-buf: add optional invalidate_mappings callback v2

2018-03-20 Thread Christian König

Am 20.03.2018 um 15:08 schrieb Daniel Vetter:

[SNIP]
For the in-driver reservation path (CS) having a slow-path that grabs a
temporary reference, drops the vram lock and then locks the reservation
normally (using the acquire context used already for the entire CS) is a
bit tricky, but totally feasible. Ttm doesn't do that though.


That is exactly what we do in amdgpu as well, it's just neither very 
efficient nor reliable to retry getting the right pages for a submission 
over and over again.



[SNIP]
Note that there are 2 paths for i915 userptr. One is the mmu notifier, the
other one is the root-only hack we have for dubious reasons (or that I
really don't see the point in myself).


Well I'm referring to i915_gem_userptr.c, if that isn't what you are 
exposing then just feel free to ignore this whole discussion.



For coherent usage you need to install some lock to prevent concurrent
get_user_pages(), command submission and
invalidate_range_start/invalidate_range_end from the MMU notifier.

Otherwise you can't guarantee that you are actually accessing the right page
in the case of a fork() or mprotect().

Yeah doing that with a full lock will create endless amounts of issues,
but I don't see why we need that. Userspace racing stuff with itself gets
to keep all the pieces. This is like racing DIRECT_IO against mprotect and
fork.


First of all I strongly disagree on that. A thread calling fork() 
because it wants to run a command is not something we can forbid just 
because we have a gfx stack loaded. That the video driver is not capable 
of handling that correctly is certainly not the problem of userspace.


Second it's not only userspace racing here, you can get into this kind 
of issues just because of transparent huge page support where the 
background daemon tries to reallocate the page tables into bigger chunks.


And if I'm not completely mistaken you can also open up quite a bunch of 
security problems if you suddenly access the wrong page.



Leaking the IOMMU mappings otoh means rogue userspace could do a bunch of
stray writes (I don't see anywhere code in amdgpu_mn.c to unmap at least
the gpu side PTEs to make stuff inaccessible) and wreak the core kernel's
book-keeping.

In i915 we guarantee that we call set_page_dirty/mark_page_accessed only
after all the mappings are really gone (both GPU PTEs and sg mapping),
guaranteeing that any stray writes from either the GPU or IOMMU will
result in faults (except bugs in the IOMMU, but can't have it all, "IOMMU
actually works" is an assumption behind device isolation).
Well exactly that's the point, the handling in i915 looks incorrect to 
me. You need to call set_page_dirty/mark_page_accessed way before the 
mapping is destroyed.


To be more precise for userptrs it must be called from the 
invalidate_range_start, but i915 seems to delegate everything into a 
background worker to avoid the locking problems.
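
To make that ordering concrete, a minimal sketch (not from any driver; the example_* names are hypothetical and the 4.16-era mmu_notifier callback signature is assumed):

/* example_userptr and example_unbind_gpu_mapping() are assumed driver bits. */
static void example_unbind_gpu_mapping(struct example_userptr *up);

static void example_invalidate_range_start(struct mmu_notifier *mn,
                                           struct mm_struct *mm,
                                           unsigned long start,
                                           unsigned long end)
{
        struct example_userptr *up =
                container_of(mn, struct example_userptr, mn);
        unsigned long i;

        /* Account dirty/accessed state while the pages are still mapped. */
        for (i = 0; i < up->npages; i++) {
                if (up->write)
                        set_page_dirty(up->pages[i]);
                mark_page_accessed(up->pages[i]);
        }

        /* Only now tear down the GPU PTEs and the sg table. */
        example_unbind_gpu_mapping(up);
}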



Felix and I hammered for quite some time on amdgpu until all of this was
handled correctly, see drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c.

Maybe we should have more shared code in this, it seems to be a source of
endless amounts of fun ...


I can try to gather the lockdep splat from my mail history, but it
essentially took us multiple years to get rid of all of them.

I'm very much interested in specifically the splat that makes it
impossible for you folks to remove the sg mappings. That one sounds bad.
And would essentially make mmu_notifiers useless for their primary use
case, which is handling virtual machines where you really have to make
sure the IOMMU mapping is gone before you claim it's gone, because there's
no 2nd level of device checks (like GPU PTEs, or command checker) catching
stray writes.


Well to be more precise the problem is not that we can't destroy the sg 
table, but rather that we can't grab the locks to do so.


See when you need to destroy the sg table you usually need to grab the 
same lock you grabbed when you created it.


And all locks taken while in an MMU notifier can only depend on memory 
allocation with GFP_NOIO, which is not really feasible for gfx drivers.


Regards,
Christian.


Re: [Linaro-mm-sig] [PATCH 1/5] dma-buf: add optional invalidate_mappings callback v2

2018-03-20 Thread Christian König

Am 20.03.2018 um 08:44 schrieb Daniel Vetter:

On Mon, Mar 19, 2018 at 5:23 PM, Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:

Am 19.03.2018 um 16:53 schrieb Chris Wilson:

Quoting Christian König (2018-03-16 14:22:32)
[snip, probably lost too much context]

This allows for full grown pipelining, e.g. the exporter can say I need
to move the buffer for some operation. Then let the move operation wait
for all existing fences in the reservation object and install the fence
of the move operation as exclusive fence.

Ok, the situation I have in mind is the non-pipelined case: revoking
dma-buf for mmu_invalidate_range or shrink_slab. I would need a
completion event that can be waited on the cpu for all the invalidate
callbacks. (Essentially an atomic_t counter plus struct completion; a
lighter version of dma_fence, I wonder where I've seen that before ;)


Actually that is harmless.

When you need to unmap a DMA-buf because of mmu_invalidate_range or
shrink_slab you need to wait for its reservation object anyway.

reservation_object only prevents adding new fences, you still have to
wait for all the current ones to signal. Also, we have dma-access
without fences in i915. "I hold the reservation_object" does not imply
you can just go and nuke the backing storage.


I was not talking about taking the lock, but rather using 
reservation_object_wait_timeout_rcu().
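
For reference, a minimal sketch of that lock-free wait, assuming the 4.16-era reservation_object API:

/* Wait for the exclusive and all shared fences without taking the lock. */
static long example_wait_idle(struct reservation_object *resv)
{
        return reservation_object_wait_timeout_rcu(resv, true, false,
                                                   MAX_SCHEDULE_TIMEOUT);
}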


To be more precise you actually can't take the reservation object lock 
in an mmu_invalidate_range callback and you can only trylock it in a 
shrink_slab callback.



This needs to be done to make sure that the backing memory is now idle, it
doesn't matter if the jobs were submitted by DMA-buf importers or your own
driver.

The sg tables pointing to the now released memory might live a bit longer,
but that is unproblematic and actually intended.

I think that's very problematic. One reason for an IOMMU is that you
have device access isolation, and a broken device can't access memory
it shouldn't be able to access. From that security-in-depth point of
view it's not cool that there's some sg tables hanging around still
that a broken GPU could use. And let's not pretend hw is perfect,
especially GPUs :-)


I completely agree on that, but there is unfortunately no other way.

See you simply can't take a reservation object lock in an mmu or slab 
callback, you can only trylock them.


For example it would require changing all allocations done while holding 
any reservation lock to GFP_NOIO.
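
A minimal sketch of the trylock-only rule in a shrinker; the example_* types and helpers are hypothetical and the list locking is omitted for brevity:

/* example_device and example_bo are assumed driver structures. */
static unsigned long example_shrink_scan(struct shrinker *shrink,
                                         struct shrink_control *sc)
{
        struct example_device *edev =
                container_of(shrink, struct example_device, shrinker);
        struct example_bo *bo, *tmp;
        unsigned long freed = 0;

        list_for_each_entry_safe(bo, tmp, &edev->shrink_list, shrink_link) {
                /* Never block on a reservation lock from reclaim context. */
                if (!reservation_object_trylock(bo->resv))
                        continue;

                freed += example_bo_purge(bo);
                reservation_object_unlock(bo->resv);

                if (freed >= sc->nr_to_scan)
                        break;
        }

        return freed;
}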



When we would try to destroy the sg tables in an mmu_invalidate_range or
shrink_slab callback we would run into a lockdep horror.

So I'm no expert on this, but I think this is exactly what we're doing
in i915. Kinda no other way to actually free the memory without
throwing all the nice isolation aspects of an IOMMU into the wind. Can
you please paste the lockdeps you've seen with amdgpu when trying to
do that?


Taking a quick look at i915 I can definitely say that this is actually 
quite buggy what you guys do here.


For coherent usage you need to install some lock to prevent concurrent 
get_user_pages(), command submission and 
invalidate_range_start/invalidate_range_end from the MMU notifier.


Otherwise you can't guarantee that you are actually accessing the right 
page in the case of a fork() or mprotect().


Felix and I hammered for quite some time on amdgpu until all of this was 
handled correctly, see drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c.


I can try to gather the lockdep splat from my mail history, but it 
essentially took us multiple years to get rid of all of them.


Regards,
Christian.


-Daniel


Regards,
Christian.


Even so, it basically means passing a fence object down to the async
callbacks for them to signal when they are complete. Just to handle the
non-pipelined version. :|
-Chris
___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


___
Linaro-mm-sig mailing list
linaro-mm-...@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-mm-sig







Re: [PATCH 1/5] dma-buf: add optional invalidate_mappings callback v2

2018-03-19 Thread Christian König

Am 19.03.2018 um 16:53 schrieb Chris Wilson:

Quoting Christian König (2018-03-16 14:22:32)
[snip, probably lost too much context]

This allows for full grown pipelining, e.g. the exporter can say I need
to move the buffer for some operation. Then let the move operation wait
for all existing fences in the reservation object and install the fence
of the move operation as exclusive fence.

Ok, the situation I have in mind is the non-pipelined case: revoking
dma-buf for mmu_invalidate_range or shrink_slab. I would need a
completion event that can be waited on the cpu for all the invalidate
callbacks. (Essentially an atomic_t counter plus struct completion; a
lighter version of dma_fence, I wonder where I've seen that before ;)


Actually that is harmless.

When you need to unmap a DMA-buf because of mmu_invalidate_range or 
shrink_slab you need to wait for its reservation object anyway.


This needs to be done to make sure that the backing memory is now idle, 
it doesn't matter if the jobs were submitted by DMA-buf importers or 
your own driver.


The sg tables pointing to the now released memory might live a bit 
longer, but that is unproblematic and actually intended.


When we would try to destroy the sg tables in an mmu_invalidate_range or 
shrink_slab callback we would run into a lockdep horror.


Regards,
Christian.



Even so, it basically means passing a fence object down to the async
callbacks for them to signal when they are complete. Just to handle the
non-pipelined version. :|
-Chris
___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel




Re: [PATCH 1/5] dma-buf: add optional invalidate_mappings callback v2

2018-03-16 Thread Christian König

Am 16.03.2018 um 14:51 schrieb Chris Wilson:

Quoting Christian König (2018-03-16 13:20:45)

@@ -326,6 +338,29 @@ struct dma_buf_attachment {
 struct device *dev;
 struct list_head node;
 void *priv;
+
+   /**
+* @invalidate_mappings:
+*
+* Optional callback provided by the importer of the attachment which
+* must be set before mappings are created.
+*
+* If provided the exporter can avoid pinning the backing store while
+* mappings exists.

Hmm, no I don't think it avoids the pinning issue entirely. As it stands,
the importer doesn't have a page refcount and so they all rely on the
exporter keeping the dmabuf pages pinned while attached. What can happen
is that given the invalidate cb, the importers can revoke their
attachments, letting the exporter recover the pages/sg, and then start
again from scratch.


Yes, exactly. The wording is just not 100% precise and I haven't found 
something better so far.



That also neatly answers what happens if not all importers provide an
invalidate cb, or fail, the dmabuf remains pinned and the exporter must
retreat.


Yes, exactly as well. As soon as at least one importer says "I can't do 
this", we must fallback to the old behavior.



+*
+* The function is called with the lock of the reservation object
+* associated with the dma_buf held and the mapping function must be
+* called with this lock held as well. This makes sure that no mapping
+* is created concurrently with an ongoing invalidation.
+*
+* After the callback all existing mappings are still valid until all
+* fences in the dma_bufs reservation object are signaled, but should be
+* destroyed by the importer as soon as possible.
+*
+* New mappings can be created immediately, but can't be used before the
+* exclusive fence in the dma_bufs reservation object is signaled.
+*/
+   void (*invalidate_mappings)(struct dma_buf_attachment *attach);

The intent is that the invalidate is synchronous and immediate, while
locked? We are looking at recursing back into the dma_buf functions to
remove each attachment from the invalidate cb (as well as waiting for
dma), won't that cause some nasty issues?


No, with this idea invalidation is asynchronous. Already discussed that 
with Daniel as well and YES Daniel also already pointed out that I need 
to better document this.


When the exporter calls invalidate_mappings() it just means that all 
importers can no longer use their sg tables for new submissions; old 
ones stay active.


The existing sg tables are guaranteed to stay valid until all fences in 
the reservation object have signaled and the importer should also delay 
its call to dma_buf_unmap_attachment() until all the fences have 
signaled.


When the importer has new work to do, e.g. wants to attach a new fence 
to the reservation object, it must grab a new sg table for that. The 
importer also needs to make sure that all new work touching the dma-buf 
doesn't start before the exclusive fence in the reservation object signals.


This allows for full grown pipelining, e.g. the exporter can say I need 
to move the buffer for some operation. Then let the move operation wait 
for all existing fences in the reservation object and install the fence 
of the move operation as exclusive fence.


The importer can then immediately grab a new sg table for the new 
location of the buffer and use it to prepare the next operation.
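
A rough importer-side sketch of that flow (the example_* names are made up; the invalidate_mappings callback from this series is assumed):

static void example_invalidate_mappings(struct dma_buf_attachment *attach)
{
        struct example_import *imp = attach->priv;

        /* Called by the exporter with the reservation object locked. */
        imp->sgt_stale = true;  /* old sg table stays valid until the fences signal */
}

/* Called with the reservation object locked, before adding new fences. */
static int example_prepare_new_work(struct example_import *imp)
{
        struct sg_table *sgt;

        if (!imp->sgt_stale)
                return 0;

        /* Grab a mapping for the buffer's new location ... */
        sgt = dma_buf_map_attachment(imp->attach, DMA_BIDIRECTIONAL);
        if (IS_ERR(sgt))
                return PTR_ERR(sgt);

        imp->new_sgt = sgt;
        imp->sgt_stale = false;

        /*
         * ... and make the new work wait for the exclusive fence in the
         * reservation object before it touches the new mapping.
         */
        return 0;
}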


Regards,
Christian.


-Chris
___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel




[PATCH 4/5] drm/amdgpu: add independent DMA-buf export v2

2018-03-16 Thread Christian König
Instead of relying on the DRM functions just implement our own export
functions. This adds support for taking care of unpinned DMA-buf.

v2: fix unintended recursion, remove debugging leftovers

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   5 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c  | 110 ++---
 4 files changed, 73 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 96bcdb97e7e2..909dc9764a22 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -371,7 +371,6 @@ int amdgpu_gem_object_open(struct drm_gem_object *obj,
 void amdgpu_gem_object_close(struct drm_gem_object *obj,
struct drm_file *file_priv);
 unsigned long amdgpu_gem_timeout(uint64_t timeout_ns);
-struct sg_table *amdgpu_gem_prime_get_sg_table(struct drm_gem_object *obj);
 struct drm_gem_object *
 amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
 struct dma_buf_attachment *attach,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index e6709362994a..e32dcdf8d3db 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -886,7 +886,6 @@ static struct drm_driver kms_driver = {
.gem_prime_export = amdgpu_gem_prime_export,
.gem_prime_import = amdgpu_gem_prime_import,
.gem_prime_res_obj = amdgpu_gem_prime_res_obj,
-   .gem_prime_get_sg_table = amdgpu_gem_prime_get_sg_table,
.gem_prime_import_sg_table = amdgpu_gem_prime_import_sg_table,
.gem_prime_vmap = amdgpu_gem_prime_vmap,
.gem_prime_vunmap = amdgpu_gem_prime_vunmap,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 48e0115d4f76..18483a79dcdd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -31,6 +31,7 @@
  */
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -931,6 +932,10 @@ void amdgpu_bo_move_notify(struct ttm_buffer_object *bo,
 
amdgpu_bo_kunmap(abo);
 
+   if (abo->gem_base.dma_buf && !abo->gem_base.import_attach &&
+   bo->mem.mem_type != TTM_PL_SYSTEM)
+   dma_buf_invalidate_mappings(abo->gem_base.dma_buf);
+
/* remember the eviction */
if (evict)
 atomic64_inc(&adev->num_evictions);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 1c9991738477..7b9edbc938eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -32,14 +32,6 @@
 
 static const struct dma_buf_ops amdgpu_dmabuf_ops;
 
-struct sg_table *amdgpu_gem_prime_get_sg_table(struct drm_gem_object *obj)
-{
-   struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
-   int npages = bo->tbo.num_pages;
-
-   return drm_prime_pages_to_sg(bo->tbo.ttm->pages, npages);
-}
-
 void *amdgpu_gem_prime_vmap(struct drm_gem_object *obj)
 {
struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
@@ -126,22 +118,21 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
return ERR_PTR(ret);
 }
 
-static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
-struct device *target_dev,
-struct dma_buf_attachment *attach)
+static struct sg_table *
+amdgpu_gem_map_dma_buf(struct dma_buf_attachment *attach,
+  enum dma_data_direction dir)
 {
+   struct dma_buf *dma_buf = attach->dmabuf;
struct drm_gem_object *obj = dma_buf->priv;
struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
+   struct sg_table *sgt;
long r;
 
-   r = drm_gem_map_attach(dma_buf, target_dev, attach);
-   if (r)
-   return r;
-
-   r = amdgpu_bo_reserve(bo, false);
-   if (unlikely(r != 0))
-   goto error_detach;
-
+   if (!attach->invalidate_mappings) {
+   r = amdgpu_bo_reserve(bo, false);
+   if (unlikely(r != 0))
+   return ERR_PTR(r);
+   }
 
 if (dma_buf->ops != &amdgpu_dmabuf_ops) {
/*
@@ -157,41 +148,77 @@ static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
}
}
 
-   /* pin buffer into GTT */
-   r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT, NULL);
-   if (r)
+   if (!attach->invalidate_mappings) {
+   /* pin buffer into GTT */
+   r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT, NULL);
+   if (r)
+   goto error_unreserve;
+
+   } else {
+   /* move buffer into GTT */

[PATCH 3/5] drm/ttm: remove the backing store if no placement is given

2018-03-16 Thread Christian König
Pipeline removal of the BOs backing store when the placement is given
during validation.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 98e06f8bf23b..17e821f01d0a 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1078,6 +1078,18 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
uint32_t new_flags;
 
reservation_object_assert_held(bo->resv);
+
+   /*
+* Remove the backing store if no placement is given.
+*/
+   if (!placement->num_placement && !placement->num_busy_placement) {
+   ret = ttm_bo_pipeline_gutting(bo);
+   if (ret)
+   return ret;
+
+   return ttm_tt_create(bo, false);
+   }
+
/*
 * Check whether we need to move buffer.
 */
-- 
2.14.1



[PATCH 5/5] drm/amdgpu: add independent DMA-buf import v2

2018-03-16 Thread Christian König
Instead of relying on the DRM functions just implement our own import
functions. This adds support for taking care of unpinned DMA-buf.

v2: enable for all exporters, not just amdgpu, fix invalidation
handling, lock reservation object while setting callback

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 44 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 32 ++
 2 files changed, 70 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 7b9edbc938eb..cb6ec9b7844a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -291,10 +291,26 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_device 
*dev,
return buf;
 }
 
+static void
+amdgpu_gem_prime_invalidate_mappings(struct dma_buf_attachment *attach)
+{
+   struct ttm_operation_ctx ctx = { false, false };
+   struct drm_gem_object *obj = attach->priv;
+   struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
+   struct ttm_placement placement = {};
+   int r;
+
+   r = ttm_bo_validate(&bo->tbo, &placement, &ctx);
+   if (r)
+   DRM_ERROR("Failed to unmap DMA-buf import (%d))\n", r);
+}
+
 struct drm_gem_object *amdgpu_gem_prime_import(struct drm_device *dev,
struct dma_buf *dma_buf)
 {
+   struct dma_buf_attachment *attach;
struct drm_gem_object *obj;
+   int ret;
 
 if (dma_buf->ops == &amdgpu_dmabuf_ops) {
obj = dma_buf->priv;
@@ -308,5 +324,31 @@ struct drm_gem_object *amdgpu_gem_prime_import(struct 
drm_device *dev,
}
}
 
-   return drm_gem_prime_import(dev, dma_buf);
+   if (!dma_buf->ops->supports_mapping_invalidation)
+   return drm_gem_prime_import(dev, dma_buf);
+
+   attach = dma_buf_attach(dma_buf, dev->dev);
+   if (IS_ERR(attach))
+   return ERR_CAST(attach);
+
+   get_dma_buf(dma_buf);
+
+   obj = amdgpu_gem_prime_import_sg_table(dev, attach, NULL);
+   if (IS_ERR(obj)) {
+   ret = PTR_ERR(obj);
+   goto fail_detach;
+   }
+
+   obj->import_attach = attach;
+   attach->priv = obj;
+
+   dma_buf_set_invalidate_callback(attach,
+   amdgpu_gem_prime_invalidate_mappings);
+   return obj;
+
+fail_detach:
+   dma_buf_detach(dma_buf, attach);
+   dma_buf_put(dma_buf);
+
+   return ERR_PTR(ret);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 291dd3d600cd..aeead0281e92 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "amdgpu.h"
 #include "amdgpu_object.h"
 #include "amdgpu_trace.h"
@@ -685,6 +686,7 @@ struct amdgpu_ttm_gup_task_list {
 
 struct amdgpu_ttm_tt {
struct ttm_dma_tt   ttm;
+   struct amdgpu_bo*bo;
u64 offset;
uint64_tuserptr;
struct mm_struct*usermm;
@@ -993,6 +995,7 @@ static struct ttm_tt *amdgpu_ttm_tt_create(struct 
ttm_buffer_object *bo,
return NULL;
}
 gtt->ttm.ttm.func = &amdgpu_backend_func;
+   gtt->bo = ttm_to_amdgpu_bo(bo);
 if (ttm_sg_tt_init(&gtt->ttm, bo, page_flags)) {
kfree(gtt);
return NULL;
@@ -1005,7 +1008,6 @@ static int amdgpu_ttm_tt_populate(struct ttm_tt *ttm,
 {
struct amdgpu_device *adev = amdgpu_ttm_adev(ttm->bdev);
struct amdgpu_ttm_tt *gtt = (void *)ttm;
-   bool slave = !!(ttm->page_flags & TTM_PAGE_FLAG_SG);
 
if (gtt && gtt->userptr) {
ttm->sg = kzalloc(sizeof(struct sg_table), GFP_KERNEL);
@@ -1017,7 +1019,19 @@ static int amdgpu_ttm_tt_populate(struct ttm_tt *ttm,
return 0;
}
 
-   if (slave && ttm->sg) {
+   if (ttm->page_flags & TTM_PAGE_FLAG_SG) {
+   if (!ttm->sg) {
+   struct dma_buf_attachment *attach;
+   struct sg_table *sgt;
+
+   attach = gtt->bo->gem_base.import_attach;
+   sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
+   if (IS_ERR(sgt))
+   return PTR_ERR(sgt);
+
+   ttm->sg = sgt;
+   }
+
drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
 gtt->ttm.dma_address,
 ttm->num_pages);
@@ -1036,9 +1050,8 @@ static int amdgpu_ttm_tt_populate(struct ttm_tt *ttm

[PATCH 2/5] drm/ttm: keep a reference to transfer pipelined BOs

2018-03-16 Thread Christian König
Make sure the transferred BO is never destroyed before the transfer BO.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 50 +++
 1 file changed, 30 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 1f730b3f18e5..1ee20558ee31 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -39,6 +39,11 @@
 #include 
 #include 
 
+struct ttm_transfer_obj {
+   struct ttm_buffer_object base;
+   struct ttm_buffer_object *bo;
+};
+
 void ttm_bo_free_old_node(struct ttm_buffer_object *bo)
 {
 ttm_bo_mem_put(bo, &bo->mem);
@@ -435,7 +440,11 @@ EXPORT_SYMBOL(ttm_bo_move_memcpy);
 
 static void ttm_transfered_destroy(struct ttm_buffer_object *bo)
 {
-   kfree(bo);
+   struct ttm_transfer_obj *fbo;
+
+   fbo = container_of(bo, struct ttm_transfer_obj, base);
+   ttm_bo_unref(&fbo->bo);
+   kfree(fbo);
 }
 
 /**
@@ -456,14 +465,15 @@ static void ttm_transfered_destroy(struct 
ttm_buffer_object *bo)
 static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
  struct ttm_buffer_object **new_obj)
 {
-   struct ttm_buffer_object *fbo;
+   struct ttm_transfer_obj *fbo;
int ret;
 
fbo = kmalloc(sizeof(*fbo), GFP_KERNEL);
if (!fbo)
return -ENOMEM;
 
-   *fbo = *bo;
+   fbo->base = *bo;
+   fbo->bo = ttm_bo_reference(bo);
 
/**
 * Fix up members that we shouldn't copy directly:
@@ -471,25 +481,25 @@ static int ttm_buffer_object_transfer(struct 
ttm_buffer_object *bo,
 */
 
 atomic_inc(&bo->bdev->glob->bo_count);
-   INIT_LIST_HEAD(&fbo->ddestroy);
-   INIT_LIST_HEAD(&fbo->lru);
-   INIT_LIST_HEAD(&fbo->swap);
-   INIT_LIST_HEAD(&fbo->io_reserve_lru);
-   mutex_init(&fbo->wu_mutex);
-   fbo->moving = NULL;
-   drm_vma_node_reset(&fbo->vma_node);
-   atomic_set(&fbo->cpu_writers, 0);
-
-   kref_init(&fbo->list_kref);
-   kref_init(&fbo->kref);
-   fbo->destroy = &ttm_transfered_destroy;
-   fbo->acc_size = 0;
-   fbo->resv = &fbo->ttm_resv;
-   reservation_object_init(fbo->resv);
-   ret = reservation_object_trylock(fbo->resv);
+   INIT_LIST_HEAD(&fbo->base.ddestroy);
+   INIT_LIST_HEAD(&fbo->base.lru);
+   INIT_LIST_HEAD(&fbo->base.swap);
+   INIT_LIST_HEAD(&fbo->base.io_reserve_lru);
+   mutex_init(&fbo->base.wu_mutex);
+   fbo->base.moving = NULL;
+   drm_vma_node_reset(&fbo->base.vma_node);
+   atomic_set(&fbo->base.cpu_writers, 0);
+
+   kref_init(&fbo->base.list_kref);
+   kref_init(&fbo->base.kref);
+   fbo->base.destroy = &ttm_transfered_destroy;
+   fbo->base.acc_size = 0;
+   fbo->base.resv = &fbo->base.ttm_resv;
+   reservation_object_init(fbo->base.resv);
+   ret = reservation_object_trylock(fbo->base.resv);
 WARN_ON(!ret);
 
-   *new_obj = fbo;
+   *new_obj = &fbo->base;
return 0;
 }
 
-- 
2.14.1



[PATCH 1/5] dma-buf: add optional invalidate_mappings callback v2

2018-03-16 Thread Christian König
Each importer can now provide an invalidate_mappings callback.

This allows the exporter to provide the mappings without the need to pin
the backing store.

v2: don't try to invalidate mappings when the callback is NULL,
lock the reservation obj while using the attachments,
add helper to set the callback

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/dma-buf/dma-buf.c | 60 +++
 include/linux/dma-buf.h   | 38 ++
 2 files changed, 98 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index d78d5fc173dc..ed2b3559ba25 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -572,7 +572,9 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf 
*dmabuf,
if (ret)
goto err_attach;
}
+   reservation_object_lock(dmabuf->resv, NULL);
 list_add(&attach->node, &dmabuf->attachments);
+   reservation_object_unlock(dmabuf->resv);
 
mutex_unlock(>lock);
return attach;
@@ -598,7 +600,9 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct 
dma_buf_attachment *attach)
return;
 
mutex_lock(>lock);
+   reservation_object_lock(dmabuf->resv, NULL);
 list_del(&attach->node);
+   reservation_object_unlock(dmabuf->resv);
if (dmabuf->ops->detach)
dmabuf->ops->detach(dmabuf, attach);
 
@@ -632,10 +636,23 @@ struct sg_table *dma_buf_map_attachment(struct 
dma_buf_attachment *attach,
if (WARN_ON(!attach || !attach->dmabuf))
return ERR_PTR(-EINVAL);
 
+   /*
+* Mapping a DMA-buf can trigger its invalidation, prevent sending this
+* event to the caller by temporarily removing this attachment from the
+* list.
+*/
+   if (attach->invalidate_mappings) {
+   reservation_object_assert_held(attach->dmabuf->resv);
+   list_del(&attach->node);
+   }
+
sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);
if (!sg_table)
sg_table = ERR_PTR(-ENOMEM);
 
+   if (attach->invalidate_mappings)
+   list_add(&attach->node, &attach->dmabuf->attachments);
+
return sg_table;
 }
 EXPORT_SYMBOL_GPL(dma_buf_map_attachment);
@@ -656,6 +673,9 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment 
*attach,
 {
might_sleep();
 
+   if (attach->invalidate_mappings)
+   reservation_object_assert_held(attach->dmabuf->resv);
+
if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
return;
 
@@ -664,6 +684,44 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment 
*attach,
 }
 EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
 
+/**
+ * dma_buf_set_invalidate_callback - set the invalidate_mappings callback
+ *
+ * @attach:[in]attachment where to set the callback
+ * @cb:[in]the callback to set
+ *
+ * Makes sure to take the appropriate locks when updating the invalidate
+ * mappings callback.
+ */
+void dma_buf_set_invalidate_callback(struct dma_buf_attachment *attach,
+void (*cb)(struct dma_buf_attachment *))
+{
+   reservation_object_lock(attach->dmabuf->resv, NULL);
+   attach->invalidate_mappings = cb;
+   reservation_object_unlock(attach->dmabuf->resv);
+}
+EXPORT_SYMBOL_GPL(dma_buf_set_invalidate_callback);
+
+/**
+ * dma_buf_invalidate_mappings - invalidate all mappings of this dma_buf
+ *
+ * @dmabuf:[in]buffer which mappings should be invalidated
+ *
+ * Informs all attachments that they need to destroy and recreate all their
+ * mappings.
+ */
+void dma_buf_invalidate_mappings(struct dma_buf *dmabuf)
+{
+   struct dma_buf_attachment *attach;
+
+   reservation_object_assert_held(dmabuf->resv);
+
+   list_for_each_entry(attach, &dmabuf->attachments, node)
+   if (attach->invalidate_mappings)
+   attach->invalidate_mappings(attach);
+}
+EXPORT_SYMBOL_GPL(dma_buf_invalidate_mappings);
+
 /**
  * DOC: cpu access
  *
@@ -1121,10 +1179,12 @@ static int dma_buf_debug_show(struct seq_file *s, void 
*unused)
seq_puts(s, "\tAttached Devices:\n");
attach_count = 0;
 
+   reservation_object_lock(buf_obj->resv, NULL);
 list_for_each_entry(attach_obj, &buf_obj->attachments, node) {
seq_printf(s, "\t%s\n", dev_name(attach_obj->dev));
attach_count++;
}
+   reservation_object_unlock(buf_obj->resv);
 
seq_printf(s, "Total %d devices attached\n\n",
attach_count);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 085db2fee2d7..70c65fcfe1e3 1006

RFC: unpinned DMA-buf exporting v2

2018-03-16 Thread Christian König
Hi everybody,

since I've got positive feedback from Daniel I continued working on this 
approach.

A few issues are still open:
1. Daniel suggested that I make the invalidate_mappings callback a parameter of 
dma_buf_attach().

This approach unfortunately won't work because when the attachment is created 
the importer is not necessarily ready to handle invalidation events.

E.g. in the amdgpu example we first need to set up the imported GEM/TTM objects 
and install them in the attachment.

My solution is to introduce a separate function to grab the locks and set the 
callback; this function could then be used to pin the buffer later on if that 
turns out to be necessary after all. A short sketch of that import-side flow 
follows below.

2. With my example setup this currently results in a ping/pong situation 
because the exporter prefers a VRAM placement while the importer prefers a GTT 
placement.

This results in quite a performance drop, but can be fixed by a simple mesa 
patch which allows shared BOs to be placed in both VRAM and GTT.

Question is what should we do in the meantime? Accept the performance drop or 
only allow unpinned sharing with new Mesa?
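
For reference, the import-side flow from point 1 looks roughly like this, mirroring what patch 5/5 does for amdgpu; the example_* helpers are placeholders and error handling is trimmed:

static struct drm_gem_object *
example_prime_import(struct drm_device *dev, struct dma_buf *dma_buf)
{
        struct dma_buf_attachment *attach;
        struct drm_gem_object *obj;

        attach = dma_buf_attach(dma_buf, dev->dev);
        if (IS_ERR(attach))
                return ERR_CAST(attach);

        get_dma_buf(dma_buf);

        /* First set up the driver-side GEM/TTM object for the import ... */
        obj = example_import_sg_table(dev, attach);
        if (IS_ERR(obj)) {
                dma_buf_detach(dma_buf, attach);
                dma_buf_put(dma_buf);
                return obj;
        }
        obj->import_attach = attach;
        attach->priv = obj;

        /* ... and only then announce that invalidations can be handled. */
        dma_buf_set_invalidate_callback(attach,
                                        example_invalidate_mappings);
        return obj;
}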

Please review and comment,
Christian.



Re: [PATCH v2] Add udmabuf misc device

2018-03-16 Thread Christian König

Am 16.03.2018 um 08:46 schrieb Gerd Hoffmann:

A driver to let userspace turn iovecs into dma-bufs.

Use case:  Allows qemu create dmabufs for the vga framebuffer or
virtio-gpu ressources.  Then they can be passed around to display
those guest things on the host.  To spice client for classic full
framebuffer display, and hopefully some day to wayland server for
seamless guest window display.

Those dma-bufs are accounted against user's shm mlock bucket as the
pages are effectively locked in memory.


Sorry to disappoint you, but we have investigated the same idea for 
userptr use quite extensively and found that the whole approach doesn't work.


The pages backing a DMA-buf are not allowed to move (at least not 
without a patch set I'm currently working on), but for certain MM 
operations to work correctly you must be able to modify the page table 
entries and move the pages backing them around.


For example try to use fork() with some copy on write pages with this 
approach. You will find that you have only two options to correctly 
handle this.


Either you access memory which could no longer belong to your process, 
which in turn is a major security no-go.


Or you use an MM notifier, but without being able to unmap DMA-bufs that 
won't work and you will certainly create a deadlock.


Even with the patch set I'm currently working on the MM notifier 
approach is a real beast to get right.


Regards,
Christian.



Cc: David Airlie 
Cc: Tomeu Vizoso 
Cc: Daniel Vetter 
Signed-off-by: Gerd Hoffmann 
---
  include/uapi/linux/udmabuf.h  |  23 ++
  drivers/dma-buf/udmabuf.c | 261 ++
  tools/testing/selftests/drivers/dma-buf/udmabuf.c |  69 ++
  drivers/dma-buf/Kconfig   |   7 +
  drivers/dma-buf/Makefile  |   1 +
  tools/testing/selftests/drivers/dma-buf/Makefile  |   5 +
  6 files changed, 366 insertions(+)
  create mode 100644 include/uapi/linux/udmabuf.h
  create mode 100644 drivers/dma-buf/udmabuf.c
  create mode 100644 tools/testing/selftests/drivers/dma-buf/udmabuf.c
  create mode 100644 tools/testing/selftests/drivers/dma-buf/Makefile

diff --git a/include/uapi/linux/udmabuf.h b/include/uapi/linux/udmabuf.h
new file mode 100644
index 00..54ceba203a
--- /dev/null
+++ b/include/uapi/linux/udmabuf.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_UDMABUF_H
+#define _UAPI_LINUX_UDMABUF_H
+
+#include 
+#include 
+
+struct udmabuf_iovec {
+   __u64 base;
+   __u64 len;
+};
+
+#define UDMABUF_FLAGS_CLOEXEC  0x01
+
+struct udmabuf_create {
+   __u32 flags;
+   __u32 niov;
+   struct udmabuf_iovec iovs[];
+};
+
+#define UDMABUF_CREATE _IOW(0x42, 0x23, struct udmabuf_create)
+
+#endif /* _UAPI_LINUX_UDMABUF_H */
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
new file mode 100644
index 00..664ab4ee4e
--- /dev/null
+++ b/drivers/dma-buf/udmabuf.c
@@ -0,0 +1,261 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+struct udmabuf {
+   u32 pagecount;
+   struct page **pages;
+   struct user_struct *owner;
+};
+
+static int udmabuf_vm_fault(struct vm_fault *vmf)
+{
+   struct vm_area_struct *vma = vmf->vma;
+   struct udmabuf *ubuf = vma->vm_private_data;
+
+   if (WARN_ON(vmf->pgoff >= ubuf->pagecount))
+   return VM_FAULT_SIGBUS;
+
+   vmf->page = ubuf->pages[vmf->pgoff];
+   get_page(vmf->page);
+   return 0;
+}
+
+static const struct vm_operations_struct udmabuf_vm_ops = {
+   .fault = udmabuf_vm_fault,
+};
+
+static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)
+{
+   struct udmabuf *ubuf = buf->priv;
+
+   if ((vma->vm_flags & VM_SHARED) == 0)
+   return -EINVAL;
+
+   vma->vm_ops = &udmabuf_vm_ops;
+   vma->vm_private_data = ubuf;
+   return 0;
+}
+
+static struct sg_table *map_udmabuf(struct dma_buf_attachment *at,
+   enum dma_data_direction direction)
+{
+   struct udmabuf *ubuf = at->dmabuf->priv;
+   struct sg_table *sg;
+
+   sg = kzalloc(sizeof(*sg), GFP_KERNEL);
+   if (!sg)
+   goto err1;
+   if (sg_alloc_table_from_pages(sg, ubuf->pages, ubuf->pagecount,
+ 0, ubuf->pagecount << PAGE_SHIFT,
+ GFP_KERNEL) < 0)
+   goto err2;
+   if (!dma_map_sg(at->dev, sg->sgl, sg->nents, direction))
+   goto err3;
+
+   return sg;
+
+err3:
+   sg_free_table(sg);
+err2:
+   kfree(sg);
+err1:
+  

Re: [PATCH 1/4] dma-buf: add optional invalidate_mappings callback

2018-03-15 Thread Christian König

Am 15.03.2018 um 10:20 schrieb Daniel Vetter:

On Tue, Mar 13, 2018 at 06:20:07PM +0100, Christian König wrote:
[SNIP]
Take a look at the DOT graphs for atomic I've done a while ago. I think we
could make a formidable competition for who's doing the worst diagrams :-)


Thanks, going to give that a try.


[SNIP]
amdgpu: Expects that you never hold any of the heavywheight locks while
waiting for a fence (since gpu resets will need them).

i915: Happily blocks on fences while holding all kinds of locks, expects
gpu reset to be able to recover even in this case.


In this case I can comfort you: the locks amdgpu needs to grab during 
GPU reset are the reservation locks of the VM page tables. I have strong 
doubts that i915 will ever hold those.


Could be that we run into problems because thread A holds lock 1 and tries to 
take lock 2, while i915 holds lock 2 and our reset path needs lock 1.



[SNIP]

Yes, except for fallback paths and bootup self tests we simply never wait
for fences while holding locks.

That's not what I meant with "are you sure". Did you enable the
cross-release stuff (after patching the bunch of leftover core kernel
issues still present), annotate dma_fence with the cross-release stuff,
run a bunch of multi-driver (amdgpu vs i915) dma-buf sharing tests and
weep?


Ok, what exactly do you mean with cross-release checking?


I didn't do the full thing yet, but just within i915 we've found tons of
small little deadlocks we never really considered thanks to cross release,
and that wasn't even including the dma_fence annotation. Luckily nothing
that needed a full-on driver redesign.

I guess I need to ping core kernel maintainers about cross-release again.
I'd much prefer if we could validate ->invalidate_mapping and the
locking/fence dependency issues using that, instead of me having to read
and understand all the drivers.

[SNIP]

I fear that with the ->invalidate_mapping callback (which inverts the
control flow between importer and exporter) and tying dma_fences into all
this it will be a _lot_ worse. And I'm definitely too stupid to understand
all the dependency chains without the aid of lockdep and a full test suite
(we have a bunch of amdgpu/i915 dma-buf tests in igt btw).


Yes, that is also something I worry about.

Regards,
Christian.


Re: [PATCH 1/4] dma-buf: add optional invalidate_mappings callback

2018-03-13 Thread Christian König

Am 13.03.2018 um 17:00 schrieb Daniel Vetter:

On Tue, Mar 13, 2018 at 04:52:02PM +0100, Christian König wrote:

Am 13.03.2018 um 16:17 schrieb Daniel Vetter:
[SNIP]

Ok, so plan is to support fully pipeline moves and everything, with the
old sg tables lazily cleaned up. I was thinking more about evicting stuff
and throwing it out, where there's not going to be any new sg list but the
object is going to be swapped out.


Yes, exactly. Well my example was the unlikely case when the object is 
swapped out and immediately swapped in again because somebody needs it.




I think some state flow charts (we can do SVG or DOT) in the kerneldoc
would be sweet.

Yeah, probably a good idea.


Sounds good and I find it great that you're volunteering for that :D

Ok seriously, my drawing capabilities are a bit underdeveloped. So I 
would prefer if somebody could at least help with that.



Re GPU might cause a deadlock: Isn't that already a problem if you hold
reservations of buffers used on other gpus, which want those reservations
to complete the gpu reset, but that gpu reset blocks some fence that the
reservation holder is waiting for?

Correct, that's why amdgpu and TTM tries quite hard to never wait for a
fence while a reservation object is locked.

We might have a fairly huge mismatch of expectations here :-/


What do you mean with that?


The only use case I haven't fixed so far is reaping deleted objects during
eviction, but that is only a matter of my free time to fix it.

Yeah, this is the hard one.


Actually it isn't so hard, it's just that I haven't had time so far to 
clean it up, and we never hit that issue during our reset testing.


The main missing point is just a bit of functionality in the reservation 
object, and Chris and I already have a good idea how to implement that.



In general the assumption is that dma_fence will get signalled no matter
what you're doing, assuming the only thing you need is to not block
interrupts. The i915 gpu reset logic to make that work is a bit a work of
art ...


Correct, but I don't understand why that is so hard on i915? Our GPU 
scheduler makes all of that rather trivial, e.g. fences either signal 
correctly or are aborted and set as erroneous after a timeout.



If we expect amdgpu and i915 to cooperate with shared buffers I guess one
has to give in. No idea how to do that best.


Again at least from amdgpu side I don't see much of an issue with that. 
So what exactly do you have in mind here?



We have tons of fun with deadlocks against GPU resets, and lots of
testcases, and I kinda get the impression amdgpu is throwing a lot of
issues under the rug through trylock tricks that shut up lockdep, but
don't fix much really.

Hui? Why do you think that? The only trylock I'm aware of is during eviction
and there it isn't a problem.

mmap fault handler had one too last time I looked, and it smelled fishy.


Good point, never wrapped my head fully around that one either.


btw adding cross-release lockdep annotations for fences will probably turn
up _lots_ more bugs in this area.

At least for amdgpu that should be handled by now.

You're sure? :-)


Yes, except for fallback paths and bootup self tests we simply never 
wait for fences while holding locks.



Trouble is that cross-release wasn't even ever enabled, much less anyone
typed the dma_fence annotations. And just cross-release alone turned up
_lost_ of deadlocks in i915 between fences, async workers (userptr, gpu
reset) and core mm stuff.


Yeah, we had lots of fun with the mm locks as well but as far as I know 
Felix and I already fixed all of them.


Christian.


I'd be seriously surprised if it wouldn't find an entire rats nest of
issues around dma_fence once we enable it.
-Daniel


+*
+* New mappings can be created immediately, but can't be used before the
+* exclusive fence in the dma_bufs reservation object is signaled.
+*/
+   void (*invalidate_mappings)(struct dma_buf_attachment *attach);

Bunch of questions about exact semantics, but I very much like this. And I
think besides those technical details, the overall approach seems sound.

Yeah this initial implementation was buggy like hell. Just wanted to confirm
that the idea is going in the right direction.

I wanted this 7 years ago, idea very much acked :-)


Ok, thanks. Good to know.

Christian.




Re: [PATCH 1/4] dma-buf: add optional invalidate_mappings callback

2018-03-13 Thread Christian König

Am 13.03.2018 um 16:17 schrieb Daniel Vetter:

[SNIP]

I think a helper which both unmaps _and_ waits for all the fences to clear
would be best, with some guarantees that it'll either fail or all the
mappings _will_ be gone. The locking for that one will be hilarious, since
we need to figure out dmabuf->lock vs. the reservation. I kinda prefer we
throw away the dmabuf->lock and superseed it entirely by the reservation
lock.

Big NAK on that. The whole API is asynchronous, e.g. we never block for
any operation to finish.

Otherwise you run into big trouble with cross device GPU resets and stuff
like that.

But how will the unmapping work then? You can't throw the sg list away
before the dma stopped. The dma only stops once the fence is signalled.
The importer can't call dma_buf_detach because the reservation lock is
hogged already by the exporter trying to unmap everything.

How is this supposed to work?


Even after invalidation the sg list stays alive until it is explicitly 
destroyed by the importer using dma_buf_unmap_attachment() which in turn 
is only allowed after all fences have signaled.
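
Sketched out, that deferred teardown on the importer side could look like this (hypothetical example_* names; the locking rules of this series are assumed):

static void example_release_stale_sgt(struct example_import *imp)
{
        struct reservation_object *resv = imp->attach->dmabuf->resv;
        long ret;

        /* Wait for all fences first; the stale sg table stays valid meanwhile. */
        ret = reservation_object_wait_timeout_rcu(resv, true, true,
                                                  MAX_SCHEDULE_TIMEOUT);
        if (ret <= 0)
                return; /* interrupted, try again later */

        /* Unmapping needs the reservation lock in the unpinned case. */
        reservation_object_lock(resv, NULL);
        dma_buf_unmap_attachment(imp->attach, imp->stale_sgt,
                                 DMA_BIDIRECTIONAL);
        reservation_object_unlock(resv);
        imp->stale_sgt = NULL;
}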


The implementation is in ttm_bo_pipeline_gutting(), basically we use the 
same functionality as for pipelined moves/evictions which hangs the old 
backing store on a dummy object and destroys it after all fences signaled.


While the old sg list is still about to be destroyed the importer can 
request a new sg list for the new location of the DMA-buf using 
dma_buf_map_attachment(). This new location becomes valid after the move 
fence in the reservation object is signaled.


So from the CPU point of view multiple sg lists can exist at the same 
time, which allows us to have a seamless transition from the old to the 
new location from the GPU point of view.
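
And the exporter side of the same hand-over as a sketch (hypothetical example_* helpers; amdgpu/TTM-like pipelined move semantics assumed):

static int example_evict_shared_bo(struct example_bo *bo)
{
        struct dma_fence *move_fence;

        reservation_object_assert_held(bo->resv);

        /* Tell all importers that their current sg tables are now stale. */
        dma_buf_invalidate_mappings(bo->dma_buf);

        /* Pipeline the move; it waits for the existing fences itself. */
        move_fence = example_schedule_move(bo);
        if (IS_ERR(move_fence))
                return PTR_ERR(move_fence);

        /* New mappings become usable once this exclusive fence signals. */
        reservation_object_add_excl_fence(bo->resv, move_fence);
        dma_fence_put(move_fence);
        return 0;
}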



Re GPU might cause a deadlock: Isn't that already a problem if you hold
reservations of buffers used on other gpus, which want those reservations
to complete the gpu reset, but that gpu reset blocks some fence that the
reservation holder is waiting for?


Correct, that's why amdgpu and TTM tries quite hard to never wait for a 
fence while a reservation object is locked.


The only use case I haven't fixed so far is reaping deleted objects 
during eviction, but that is only a matter of my free time to fix it.



We have tons of fun with deadlocks against GPU resets, and lots of
testcases, and I kinda get the impression amdgpu is throwing a lot of
issues under the rug through trylock tricks that shut up lockdep, but
don't fix much really.


Hui? Why do you think that? The only trylock I'm aware of is during 
eviction and there it isn't a problem.



btw adding cross-release lockdep annotations for fences will probably turn
up _lots_ more bugs in this area.


At least for amdgpu that should be handled by now.


+*
+* New mappings can be created immediately, but can't be used before the
+* exclusive fence in the dma_bufs reservation object is signaled.
+*/
+   void (*invalidate_mappings)(struct dma_buf_attachment *attach);

Bunch of questions about exact semantics, but I very much like this. And I
think besides those technical details, the overall approach seems sound.

Yeah this initial implementation was buggy like hell. Just wanted to confirm
that the idea is going in the right direction.

I wanted this 7 years ago, idea very much acked :-)


Ok, thanks. Good to know.

Christian.


Re: RFC: unpinned DMA-buf exporting

2018-03-12 Thread Christian König

Am 12.03.2018 um 18:24 schrieb Daniel Vetter:

On Fri, Mar 09, 2018 at 08:11:40PM +0100, Christian König wrote:

This set of patches adds an option invalidate_mappings callback to each
DMA-buf attachment which can be filled in by the importer.

This callback allows the exporter to provide the DMA-buf content
without pinning it. The reservation object's lock acts as synchronization
point for buffer moves and creating mappings.

This set includes an implementation for amdgpu which should be rather
easily portable to other DRM drivers.

Bunch of higher level comments, and one I've forgotten in reply to patch
1:

- What happens when a dma-buf is pinned (e.g. i915 loves to pin buffers
   for scanout)?


When you need to pin an imported DMA-buf you need to detach and reattach 
without the invalidate_mappings callback.
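
A sketch of that fallback (hypothetical example_* names; the caller is assumed to take care of the reservation-lock rules around unmap):

static int example_pin_imported(struct example_import *imp)
{
        struct dma_buf *dmabuf = imp->attach->dmabuf;

        /* Give up the invalidation-capable attachment ... */
        dma_buf_unmap_attachment(imp->attach, imp->sgt, DMA_BIDIRECTIONAL);
        dma_buf_detach(dmabuf, imp->attach);

        /*
         * ... and attach again without setting an invalidate callback,
         * which makes the exporter fall back to pinning the backing store.
         */
        imp->attach = dma_buf_attach(dmabuf, imp->dev);
        if (IS_ERR(imp->attach))
                return PTR_ERR(imp->attach);

        imp->sgt = dma_buf_map_attachment(imp->attach, DMA_BIDIRECTIONAL);
        return PTR_ERR_OR_ZERO(imp->sgt);
}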



- pulling the dma-buf implementations into amdgpu makes sense, that's
   kinda how it was meant to be anyway. The gem prime helpers are a bit too
   much midlayer for my taste (mostly because nvidia wanted to bypass the
   EXPORT_SYMBOL_GPL of core dma-buf, hooray for legal bs). We can always
   extract more helpers once there's more ttm based drivers doing this.


Yeah, I thought to abstract that similarly to the AGP backend.

Just moving some callbacks around in TTM should be sufficient to 
de-midlayer the whole thing.


Thanks,
Christian.



Overall I like, there's some details to figure out first.
-Daniel




Re: [PATCH 1/4] dma-buf: add optional invalidate_mappings callback

2018-03-12 Thread Christian König

Am 12.03.2018 um 18:07 schrieb Daniel Vetter:

On Fri, Mar 09, 2018 at 08:11:41PM +0100, Christian König wrote:

[SNIP]
  
+/**

+ * dma_buf_invalidate_mappings - invalidate all mappings of this dma_buf
+ *
+ * @dmabuf:[in]buffer which mappings should be invalidated
+ *
+ * Informs all attachments that they need to destroy and recreate all their
+ * mappings.
+ */
+void dma_buf_invalidate_mappings(struct dma_buf *dmabuf)
+{
+   struct dma_buf_attachment *attach;
+
+   reservation_object_assert_held(dmabuf->resv);
+
+   list_for_each_entry(attach, &dmabuf->attachments, node)
+   attach->invalidate_mappings(attach);

To make the locking work I think we also need to require importers to hold
the reservation object while attaching/detaching. Otherwise the list walk
above could go boom.


Oh, good point. Going, to fix this.


[SNIP]

+   /**
+* @supports_mapping_invalidation:
+*
+* True for exporters which supports unpinned DMA-buf operation using
+* the reservation lock.
+*
+* When attachment->invalidate_mappings is set the @map_dma_buf and
+* @unmap_dma_buf callbacks can be called with the reservation lock
+* held.
+*/
+   bool supports_mapping_invalidation;

Why do we need this? Importer could simply always register with the
invalidate_mapping hook registered, and exporters could use it when they
see fit. That gives us more lockdep coverage to make sure importers use
their attachment callbacks correctly (aka they hold the reservation
object).


One sole reason: backward compatibility.

I didn't want to audit all those different drivers to see if they can handle 
being called with the reservation lock held.





+
/**
 * @map_dma_buf:
 *
@@ -326,6 +338,29 @@ struct dma_buf_attachment {
struct device *dev;
struct list_head node;
void *priv;
+
+   /**
+* @invalidate_mappings:
+*
+* Optional callback provided by the importer of the attachment which
+* must be set before mappings are created.

This doesn't work, it must be set before the attachment is created,
otherwise you race with your invalidate callback.


Another good point.



I think the simplest option would be to add a new dma_buf_attach_dynamic
(well except a less crappy name).


Well how about adding an optional invalidate_mappings parameter to the 
existing dma_buf_attach?





+*
+* If provided the exporter can avoid pinning the backing store while
+* mappings exists.
+*
+* The function is called with the lock of the reservation object
+* associated with the dma_buf held and the mapping function must be
+* called with this lock held as well. This makes sure that no mapping
+* is created concurrently with an ongoing invalidation.
+*
+* After the callback all existing mappings are still valid until all
+* fences in the dma_bufs reservation object are signaled, but should be
+* destroyed by the importer as soon as possible.

Do we guarantee that the importer will attach a fence, after which the
mapping will be gone? What about re-trying? Or just best effort (i.e. only
useful for evicting to try to make room).


The importer should attach fences for all its operations on the DMA-buf.


I think a helper which both unmaps _and_ waits for all the fences to clear
would be best, with some guarantees that it'll either fail or all the
mappings _will_ be gone. The locking for that one will be hilarious, since
we need to figure out dmabuf->lock vs. the reservation. I kinda prefer we
throw away the dmabuf->lock and superseed it entirely by the reservation
lock.


Big NAK on that. The whole API is asynchronous, e.g. we never block 
for any operation to finish.


Otherwise you run into big trouble with cross device GPU resets and 
stuff like that.



+*
+* New mappings can be created immediately, but can't be used before the
+* exclusive fence in the dma_bufs reservation object is signaled.
+*/
+   void (*invalidate_mappings)(struct dma_buf_attachment *attach);

Bunch of questions about exact semantics, but I very much like this. And I
think besides those technical details, the overall approach seems sound.


Yeah this initial implementation was buggy like hell. Just wanted to 
confirm that the idea is going in the right direction.


Thanks for the comments,
Christian.


-Daniel





[PATCH 1/4] dma-buf: add optional invalidate_mappings callback

2018-03-09 Thread Christian König
Each importer can now provide an invalidate_mappings callback.

This allows the exporter to provide the mappings without the need to pin
the backing store.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/dma-buf/dma-buf.c | 25 +
 include/linux/dma-buf.h   | 36 
 2 files changed, 61 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index d78d5fc173dc..ed8d5844ae74 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -629,6 +629,9 @@ struct sg_table *dma_buf_map_attachment(struct 
dma_buf_attachment *attach,
 
might_sleep();
 
+   if (attach->invalidate_mappings)
+   reservation_object_assert_held(attach->dmabuf->resv);
+
if (WARN_ON(!attach || !attach->dmabuf))
return ERR_PTR(-EINVAL);
 
@@ -656,6 +659,9 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment 
*attach,
 {
might_sleep();
 
+   if (attach->invalidate_mappings)
+   reservation_object_assert_held(attach->dmabuf->resv);
+
if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
return;
 
@@ -664,6 +670,25 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment 
*attach,
 }
 EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
 
+/**
+ * dma_buf_invalidate_mappings - invalidate all mappings of this dma_buf
+ *
+ * @dmabuf:[in]buffer which mappings should be invalidated
+ *
+ * Informs all attachments that they need to destroy and recreate all their
+ * mappings.
+ */
+void dma_buf_invalidate_mappings(struct dma_buf *dmabuf)
+{
+   struct dma_buf_attachment *attach;
+
+   reservation_object_assert_held(dmabuf->resv);
+
+   list_for_each_entry(attach, &dmabuf->attachments, node)
+   attach->invalidate_mappings(attach);
+}
+EXPORT_SYMBOL_GPL(dma_buf_invalidate_mappings);
+
 /**
  * DOC: cpu access
  *
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 085db2fee2d7..c1e2f7d93509 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -91,6 +91,18 @@ struct dma_buf_ops {
 */
void (*detach)(struct dma_buf *, struct dma_buf_attachment *);
 
+   /**
+* @supports_mapping_invalidation:
+*
+* True for exporters which supports unpinned DMA-buf operation using
+* the reservation lock.
+*
+* When attachment->invalidate_mappings is set the @map_dma_buf and
+* @unmap_dma_buf callbacks can be called with the reservation lock
+* held.
+*/
+   bool supports_mapping_invalidation;
+
/**
 * @map_dma_buf:
 *
@@ -326,6 +338,29 @@ struct dma_buf_attachment {
struct device *dev;
struct list_head node;
void *priv;
+
+   /**
+* @invalidate_mappings:
+*
+* Optional callback provided by the importer of the attachment which
+* must be set before mappings are created.
+*
+* If provided the exporter can avoid pinning the backing store while
+* mappings exists.
+*
+* The function is called with the lock of the reservation object
+* associated with the dma_buf held and the mapping function must be
+* called with this lock held as well. This makes sure that no mapping
+* is created concurrently with an ongoing invalidation.
+*
+* After the callback all existing mappings are still valid until all
+* fences in the dma_bufs reservation object are signaled, but should be
+* destroyed by the importer as soon as possible.
+*
+* New mappings can be created immediately, but can't be used before the
+* exclusive fence in the dma_bufs reservation object is signaled.
+*/
+   void (*invalidate_mappings)(struct dma_buf_attachment *attach);
 };
 
 /**
@@ -391,6 +426,7 @@ struct sg_table *dma_buf_map_attachment(struct 
dma_buf_attachment *,
enum dma_data_direction);
 void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *,
enum dma_data_direction);
+void dma_buf_invalidate_mappings(struct dma_buf *dma_buf);
 int dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
 enum dma_data_direction dir);
 int dma_buf_end_cpu_access(struct dma_buf *dma_buf,
-- 
2.14.1



[PATCH 3/4] drm/amdgpu: add independent DMA-buf export

2018-03-09 Thread Christian König
Instead of relying on the DRM functions just implement our own export
functions. This adds support for taking care of unpinned DMA-buf.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c  | 117 ++---
 4 files changed, 79 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 96bcdb97e7e2..909dc9764a22 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -371,7 +371,6 @@ int amdgpu_gem_object_open(struct drm_gem_object *obj,
 void amdgpu_gem_object_close(struct drm_gem_object *obj,
struct drm_file *file_priv);
 unsigned long amdgpu_gem_timeout(uint64_t timeout_ns);
-struct sg_table *amdgpu_gem_prime_get_sg_table(struct drm_gem_object *obj);
 struct drm_gem_object *
 amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
 struct dma_buf_attachment *attach,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index e6709362994a..e32dcdf8d3db 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -886,7 +886,6 @@ static struct drm_driver kms_driver = {
.gem_prime_export = amdgpu_gem_prime_export,
.gem_prime_import = amdgpu_gem_prime_import,
.gem_prime_res_obj = amdgpu_gem_prime_res_obj,
-   .gem_prime_get_sg_table = amdgpu_gem_prime_get_sg_table,
.gem_prime_import_sg_table = amdgpu_gem_prime_import_sg_table,
.gem_prime_vmap = amdgpu_gem_prime_vmap,
.gem_prime_vunmap = amdgpu_gem_prime_vunmap,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 48e0115d4f76..d5db5955a70a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -31,6 +31,7 @@
  */
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -931,6 +932,9 @@ void amdgpu_bo_move_notify(struct ttm_buffer_object *bo,
 
amdgpu_bo_kunmap(abo);
 
+   if (abo->gem_base.dma_buf)
+   dma_buf_invalidate_mappings(abo->gem_base.dma_buf);
+
/* remember the eviction */
if (evict)
	atomic64_inc(&adev->num_evictions);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 1c9991738477..f575fb46d7a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -32,13 +32,9 @@
 
 static const struct dma_buf_ops amdgpu_dmabuf_ops;
 
-struct sg_table *amdgpu_gem_prime_get_sg_table(struct drm_gem_object *obj)
-{
-   struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
-   int npages = bo->tbo.num_pages;
-
-   return drm_prime_pages_to_sg(bo->tbo.ttm->pages, npages);
-}
+static void amdgpu_gem_unmap_dma_buf(struct dma_buf_attachment *attach,
+struct sg_table *sgt,
+enum dma_data_direction dir);
 
 void *amdgpu_gem_prime_vmap(struct drm_gem_object *obj)
 {
@@ -126,22 +122,21 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
return ERR_PTR(ret);
 }
 
-static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
-struct device *target_dev,
-struct dma_buf_attachment *attach)
+static struct sg_table *
+amdgpu_gem_map_dma_buf(struct dma_buf_attachment *attach,
+  enum dma_data_direction dir)
 {
+   struct dma_buf *dma_buf = attach->dmabuf;
struct drm_gem_object *obj = dma_buf->priv;
struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
+   struct sg_table *sgt;
long r;
 
-   r = drm_gem_map_attach(dma_buf, target_dev, attach);
-   if (r)
-   return r;
-
-   r = amdgpu_bo_reserve(bo, false);
-   if (unlikely(r != 0))
-   goto error_detach;
-
+   if (!attach->invalidate_mappings) {
+   r = amdgpu_bo_reserve(bo, false);
+   if (unlikely(r != 0))
+   return ERR_PTR(r);
+   }
 
	if (dma_buf->ops != &amdgpu_dmabuf_ops) {
/*
@@ -157,41 +152,80 @@ static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
}
}
 
-   /* pin buffer into GTT */
-   r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT, NULL);
-   if (r)
-   goto error_unreserve;
+   if (!attach->invalidate_mappings || true) {
+   /* pin buffer into GTT */
+   r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT, NULL);
+   if (r)
+   goto error_unreserve;
+
+   } else {

[PATCH 4/4] drm/amdgpu: add independent DMA-buf import

2018-03-09 Thread Christian König
Instead of relying on the DRM functions just implement our own import
functions. This adds support for taking care of unpinned DMA-buf.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 64 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 32 +---
 2 files changed, 80 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index f575fb46d7a8..7963ce329519 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -298,22 +298,64 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_device 
*dev,
return buf;
 }
 
+static void
+amdgpu_gem_prime_invalidate_mappings(struct dma_buf_attachment *attach)
+{
+   struct ttm_operation_ctx ctx = { false, false };
+   struct drm_gem_object *obj = attach->priv;
+   struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
+   int r;
+
+   amdgpu_ttm_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
+   r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+   if (r)
+   DRM_ERROR("Failed to unmap DMA-buf import (%d)\n", r);
+}
+
 struct drm_gem_object *amdgpu_gem_prime_import(struct drm_device *dev,
struct dma_buf *dma_buf)
 {
+   struct dma_buf_attachment *attach;
struct drm_gem_object *obj;
+   int ret;
 
-   if (dma_buf->ops == &amdgpu_dmabuf_ops) {
-   obj = dma_buf->priv;
-   if (obj->dev == dev) {
-   /*
-* Importing dmabuf exported from out own gem increases
-* refcount on gem itself instead of f_count of dmabuf.
-*/
-   drm_gem_object_get(obj);
-   return obj;
-   }
+   if (dma_buf->ops != &amdgpu_dmabuf_ops)
+   return drm_gem_prime_import(dev, dma_buf);
+
+   obj = dma_buf->priv;
+   if (obj->dev == dev) {
+   /*
+* Importing dmabuf exported from out own gem increases
+* refcount on gem itself instead of f_count of dmabuf.
+*/
+   drm_gem_object_get(obj);
+   return obj;
+   }
+
+   /*
+* Attach, but don't map other amdgpu BOs
+*/
+   attach = dma_buf_attach(dma_buf, dev->dev);
+   if (IS_ERR(attach))
+   return ERR_CAST(attach);
+
+   get_dma_buf(dma_buf);
+
+   obj = amdgpu_gem_prime_import_sg_table(dev, attach, NULL);
+   if (IS_ERR(obj)) {
+   ret = PTR_ERR(obj);
+   goto fail_detach;
}
 
-   return drm_gem_prime_import(dev, dma_buf);
+   obj->import_attach = attach;
+   attach->invalidate_mappings = amdgpu_gem_prime_invalidate_mappings;
+   attach->priv = obj;
+
+   return obj;
+
+fail_detach:
+   dma_buf_detach(dma_buf, attach);
+   dma_buf_put(dma_buf);
+
+   return ERR_PTR(ret);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 291dd3d600cd..aeead0281e92 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "amdgpu.h"
 #include "amdgpu_object.h"
 #include "amdgpu_trace.h"
@@ -685,6 +686,7 @@ struct amdgpu_ttm_gup_task_list {
 
 struct amdgpu_ttm_tt {
struct ttm_dma_tt   ttm;
+   struct amdgpu_bo	*bo;
	u64	offset;
	uint64_t	userptr;
	struct mm_struct	*usermm;
@@ -993,6 +995,7 @@ static struct ttm_tt *amdgpu_ttm_tt_create(struct 
ttm_buffer_object *bo,
return NULL;
}
gtt->ttm.ttm.func = _backend_func;
+   gtt->bo = ttm_to_amdgpu_bo(bo);
if (ttm_sg_tt_init(>ttm, bo, page_flags)) {
kfree(gtt);
return NULL;
@@ -1005,7 +1008,6 @@ static int amdgpu_ttm_tt_populate(struct ttm_tt *ttm,
 {
struct amdgpu_device *adev = amdgpu_ttm_adev(ttm->bdev);
struct amdgpu_ttm_tt *gtt = (void *)ttm;
-   bool slave = !!(ttm->page_flags & TTM_PAGE_FLAG_SG);
 
if (gtt && gtt->userptr) {
ttm->sg = kzalloc(sizeof(struct sg_table), GFP_KERNEL);
@@ -1017,7 +1019,19 @@ static int amdgpu_ttm_tt_populate(struct ttm_tt *ttm,
return 0;
}
 
-   if (slave && ttm->sg) {
+   if (ttm->page_flags & TTM_PAGE_FLAG_SG) {
+   if (!ttm->sg) {
+   struct dma_buf_attachment *attach;
+   struct sg_table *sgt;
+
+   attach = gtt->bo->gem_base.import_attach;
+   sgt

RFC: unpinned DMA-buf exporting

2018-03-09 Thread Christian König
This set of patches adds an optional invalidate_mappings callback to each DMA-buf
attachment which can be filled in by the importer.

This callback allows the exporter to provide the DMA-buf content without
pinning it. The reservation object's lock acts as the synchronization point for
buffer moves and for creating mappings.

This set includes an implementation for amdgpu which should be rather easily 
portable to other DRM drivers.

Please comment,
Christian.
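
For context, the exporter-side flow this enables looks roughly as follows
(a sketch under the assumptions of this series; dmabuf and move_fence are
placeholders, not code from the patches):

	/* Exporter is about to move the backing store of an exported buffer. */
	reservation_object_lock(dmabuf->resv, NULL);

	/* Tell every importer that its mappings are about to become stale. */
	dma_buf_invalidate_mappings(dmabuf);

	/*
	 * Move the buffer and publish the move as the exclusive fence;
	 * importers must wait for it before using any newly created mapping.
	 */
	reservation_object_add_excl_fence(dmabuf->resv, move_fence);

	reservation_object_unlock(dmabuf->resv);

The amdgpu export patch in this series calls dma_buf_invalidate_mappings()
from amdgpu_bo_move_notify() along these lines.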



[PATCH 2/4] drm/ttm: keep a reference to transfer pipelined BOs

2018-03-09 Thread Christian König
Make sure the transferred BO is never destroyed before the transfer BO.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 50 +++
 1 file changed, 30 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 1f730b3f18e5..1ee20558ee31 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -39,6 +39,11 @@
 #include 
 #include 
 
+struct ttm_transfer_obj {
+   struct ttm_buffer_object base;
+   struct ttm_buffer_object *bo;
+};
+
 void ttm_bo_free_old_node(struct ttm_buffer_object *bo)
 {
	ttm_bo_mem_put(bo, &bo->mem);
@@ -435,7 +440,11 @@ EXPORT_SYMBOL(ttm_bo_move_memcpy);
 
 static void ttm_transfered_destroy(struct ttm_buffer_object *bo)
 {
-   kfree(bo);
+   struct ttm_transfer_obj *fbo;
+
+   fbo = container_of(bo, struct ttm_transfer_obj, base);
+   ttm_bo_unref(&fbo->bo);
+   kfree(fbo);
 }
 
 /**
@@ -456,14 +465,15 @@ static void ttm_transfered_destroy(struct 
ttm_buffer_object *bo)
 static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
  struct ttm_buffer_object **new_obj)
 {
-   struct ttm_buffer_object *fbo;
+   struct ttm_transfer_obj *fbo;
int ret;
 
fbo = kmalloc(sizeof(*fbo), GFP_KERNEL);
if (!fbo)
return -ENOMEM;
 
-   *fbo = *bo;
+   fbo->base = *bo;
+   fbo->bo = ttm_bo_reference(bo);
 
/**
 * Fix up members that we shouldn't copy directly:
@@ -471,25 +481,25 @@ static int ttm_buffer_object_transfer(struct 
ttm_buffer_object *bo,
 */
 
	atomic_inc(&bo->bdev->glob->bo_count);
-   INIT_LIST_HEAD(&fbo->ddestroy);
-   INIT_LIST_HEAD(&fbo->lru);
-   INIT_LIST_HEAD(&fbo->swap);
-   INIT_LIST_HEAD(&fbo->io_reserve_lru);
-   mutex_init(&fbo->wu_mutex);
-   fbo->moving = NULL;
-   drm_vma_node_reset(&fbo->vma_node);
-   atomic_set(&fbo->cpu_writers, 0);
-
-   kref_init(&fbo->list_kref);
-   kref_init(&fbo->kref);
-   fbo->destroy = &ttm_transfered_destroy;
-   fbo->acc_size = 0;
-   fbo->resv = &fbo->ttm_resv;
-   reservation_object_init(fbo->resv);
-   ret = reservation_object_trylock(fbo->resv);
+   INIT_LIST_HEAD(&fbo->base.ddestroy);
+   INIT_LIST_HEAD(&fbo->base.lru);
+   INIT_LIST_HEAD(&fbo->base.swap);
+   INIT_LIST_HEAD(&fbo->base.io_reserve_lru);
+   mutex_init(&fbo->base.wu_mutex);
+   fbo->base.moving = NULL;
+   drm_vma_node_reset(&fbo->base.vma_node);
+   atomic_set(&fbo->base.cpu_writers, 0);
+
+   kref_init(&fbo->base.list_kref);
+   kref_init(&fbo->base.kref);
+   fbo->base.destroy = &ttm_transfered_destroy;
+   fbo->base.acc_size = 0;
+   fbo->base.resv = &fbo->base.ttm_resv;
+   reservation_object_init(fbo->base.resv);
+   ret = reservation_object_trylock(fbo->base.resv);
	WARN_ON(!ret);
 
-   *new_obj = fbo;
+   *new_obj = &fbo->base;
return 0;
 }
 
-- 
2.14.1
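
The idiom used here (embed the base object in a wrapper, hold an extra
reference, recover the wrapper via container_of() in the destroy callback),
shown in isolation as a generic sketch with made-up names:

	struct wrapper {
		struct base_obj base;		/* embedded base object */
		struct base_obj *origin;	/* extra reference, dropped on destroy */
	};

	static void wrapper_destroy(struct base_obj *obj)
	{
		/* Recover the wrapper from the embedded base member. */
		struct wrapper *w = container_of(obj, struct wrapper, base);

		/* The original object can only go away after the wrapper. */
		put_base_obj(w->origin);
		kfree(w);
	}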



[PATCH 1/4] dma-buf: add optional invalidate_mappings callback

2018-03-09 Thread Christian König
Each importer can now provide an invalidate_mappings callback.

This allows the exporter to provide the mappings without the need to pin
the backing store.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/dma-buf/dma-buf.c | 25 +
 include/linux/dma-buf.h   | 36 
 2 files changed, 61 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index d78d5fc173dc..ed8d5844ae74 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -629,6 +629,9 @@ struct sg_table *dma_buf_map_attachment(struct 
dma_buf_attachment *attach,
 
might_sleep();
 
+   if (attach->invalidate_mappings)
+   reservation_object_assert_held(attach->dmabuf->resv);
+
if (WARN_ON(!attach || !attach->dmabuf))
return ERR_PTR(-EINVAL);
 
@@ -656,6 +659,9 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment 
*attach,
 {
might_sleep();
 
+   if (attach->invalidate_mappings)
+   reservation_object_assert_held(attach->dmabuf->resv);
+
if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
return;
 
@@ -664,6 +670,25 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment 
*attach,
 }
 EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
 
+/**
+ * dma_buf_invalidate_mappings - invalidate all mappings of this dma_buf
+ *
+ * @dmabuf:	[in]	buffer whose mappings should be invalidated
+ *
+ * Informs all attachments that they need to destroy and recreate all their
+ * mappings.
+ */
+void dma_buf_invalidate_mappings(struct dma_buf *dmabuf)
+{
+   struct dma_buf_attachment *attach;
+
+   reservation_object_assert_held(dmabuf->resv);
+
+   list_for_each_entry(attach, &dmabuf->attachments, node)
+   attach->invalidate_mappings(attach);
+}
+EXPORT_SYMBOL_GPL(dma_buf_invalidate_mappings);
+
 /**
  * DOC: cpu access
  *
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 085db2fee2d7..c1e2f7d93509 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -91,6 +91,18 @@ struct dma_buf_ops {
 */
void (*detach)(struct dma_buf *, struct dma_buf_attachment *);
 
+   /**
+* @supports_mapping_invalidation:
+*
+* True for exporters which support unpinned DMA-buf operation using
+* the reservation lock.
+*
+* When attachment->invalidate_mappings is set the @map_dma_buf and
+* @unmap_dma_buf callbacks can be called with the reservation lock
+* held.
+*/
+   bool supports_mapping_invalidation;
+
/**
 * @map_dma_buf:
 *
@@ -326,6 +338,29 @@ struct dma_buf_attachment {
struct device *dev;
struct list_head node;
void *priv;
+
+   /**
+* @invalidate_mappings:
+*
+* Optional callback provided by the importer of the attachment which
+* must be set before mappings are created.
+*
+* If provided, the exporter can avoid pinning the backing store while
+* mappings exist.
+*
+* The function is called with the lock of the reservation object
+* associated with the dma_buf held and the mapping function must be
+* called with this lock held as well. This makes sure that no mapping
+* is created concurrently with an ongoing invalidation.
+*
+* After the callback all existing mappings are still valid until all
+* fences in the dma_buf's reservation object are signaled, but should be
+* destroyed by the importer as soon as possible.
+*
+* New mappings can be created immediately, but can't be used before the
+* exclusive fence in the dma_buf's reservation object is signaled.
+*/
+   void (*invalidate_mappings)(struct dma_buf_attachment *attach);
 };
 
 /**
@@ -391,6 +426,7 @@ struct sg_table *dma_buf_map_attachment(struct 
dma_buf_attachment *,
enum dma_data_direction);
 void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *,
enum dma_data_direction);
+void dma_buf_invalidate_mappings(struct dma_buf *dma_buf);
 int dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
 enum dma_data_direction dir);
 int dma_buf_end_cpu_access(struct dma_buf *dma_buf,
-- 
2.14.1
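
With this callback in place, an importer that set attach->invalidate_mappings
maps and unmaps while holding the reservation lock, roughly like this
(a sketch; my_map_locked is a made-up helper, not part of the patch):

	static int my_map_locked(struct dma_buf_attachment *attach,
				 struct sg_table **sgt)
	{
		struct reservation_object *resv = attach->dmabuf->resv;

		reservation_object_lock(resv, NULL);

		/* No invalidation can run concurrently while the lock is held. */
		*sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);

		/*
		 * The new mapping must not be used before the exclusive fence
		 * in the reservation object has signaled.
		 */
		reservation_object_unlock(resv);

		return IS_ERR(*sgt) ? PTR_ERR(*sgt) : 0;
	}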



Re: [PATCH 1/3] dma-buf: make returning the exclusive fence optional

2018-01-16 Thread Christian König

Ping? Daniel you requested the patch with its user.

It would be nice if I could commit this, because we need it for debugging and
for cleaning up a bunch of other things as well.


Regards,
Christian.

Am 12.01.2018 um 10:47 schrieb Christian König:

Change reservation_object_get_fences_rcu to make the exclusive fence
pointer optional.

If not specified the exclusive fence is put into the fence array as
well.

This is helpful for a couple of cases where we need all fences in a
single array.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
  drivers/dma-buf/reservation.c | 31 ++-
  1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index b759a569b7b8..461afa9febd4 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -374,8 +374,9 @@ EXPORT_SYMBOL(reservation_object_copy_fences);
   * @pshared: the array of shared fence ptrs returned (array is krealloc'd to
   * the required size, and must be freed by caller)
   *
- * RETURNS
- * Zero or -errno
+ * Retrieve all fences from the reservation object. If the pointer for the
+ * exclusive fence is not specified the fence is put into the array of the
+ * shared fences as well. Returns either zero or -ENOMEM.
   */
  int reservation_object_get_fences_rcu(struct reservation_object *obj,
  struct dma_fence **pfence_excl,
@@ -389,8 +390,8 @@ int reservation_object_get_fences_rcu(struct 
reservation_object *obj,
  
  	do {

struct reservation_object_list *fobj;
-   unsigned seq;
-   unsigned int i;
+   unsigned int i, seq;
+   size_t sz = 0;
  
  		shared_count = i = 0;
  
@@ -402,9 +403,14 @@ int reservation_object_get_fences_rcu(struct reservation_object *obj,

goto unlock;
  
  		fobj = rcu_dereference(obj->fence);

-   if (fobj) {
+   if (fobj)
+   sz += sizeof(*shared) * fobj->shared_max;
+
+   if (!pfence_excl && fence_excl)
+   sz += sizeof(*shared);
+
+   if (sz) {
struct dma_fence **nshared;
-   size_t sz = sizeof(*shared) * fobj->shared_max;
  
  			nshared = krealloc(shared, sz,

   GFP_NOWAIT | __GFP_NOWARN);
@@ -420,13 +426,19 @@ int reservation_object_get_fences_rcu(struct 
reservation_object *obj,
break;
}
shared = nshared;
-   shared_count = fobj->shared_count;
-
+   shared_count = fobj ? fobj->shared_count : 0;
for (i = 0; i < shared_count; ++i) {
shared[i] = rcu_dereference(fobj->shared[i]);
if (!dma_fence_get_rcu(shared[i]))
break;
}
+
+   if (!pfence_excl && fence_excl) {
+   shared[i] = fence_excl;
+   fence_excl = NULL;
+   ++i;
+   ++shared_count;
+   }
}
  
  		if (i != shared_count || read_seqcount_retry(&obj->seq, seq)) {

@@ -448,7 +460,8 @@ int reservation_object_get_fences_rcu(struct 
reservation_object *obj,
  
  	*pshared_count = shared_count;

*pshared = shared;
-   *pfence_excl = fence_excl;
+   if (pfence_excl)
+   *pfence_excl = fence_excl;
  
  	return ret;

  }




[PATCH 2/3] drm/amdgpu: add amdgpu_pasid_free_delayed v2

2018-01-12 Thread Christian König
Free up a pasid after all fences have signaled.

v2: also handle the case when we can't allocate a fence array.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 82 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h |  2 +
 2 files changed, 84 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
index 5248a3232aff..842caa5ed73b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
@@ -40,6 +40,12 @@
  */
 static DEFINE_IDA(amdgpu_pasid_ida);
 
+/* Helper to free pasid from a fence callback */
+struct amdgpu_pasid_cb {
+   struct dma_fence_cb cb;
+   unsigned int pasid;
+};
+
 /**
  * amdgpu_pasid_alloc - Allocate a PASID
  * @bits: Maximum width of the PASID in bits, must be at least 1
@@ -75,6 +81,82 @@ void amdgpu_pasid_free(unsigned int pasid)
	ida_simple_remove(&amdgpu_pasid_ida, pasid);
 }
 
+static void amdgpu_pasid_free_cb(struct dma_fence *fence,
+struct dma_fence_cb *_cb)
+{
+   struct amdgpu_pasid_cb *cb =
+   container_of(_cb, struct amdgpu_pasid_cb, cb);
+
+   amdgpu_pasid_free(cb->pasid);
+   dma_fence_put(fence);
+   kfree(cb);
+}
+
+/**
+ * amdgpu_pasid_free_delayed - free pasid when fences signal
+ *
+ * @resv: reservation object with the fences to wait for
+ * @pasid: pasid to free
+ *
+ * Free the pasid only after all the fences in resv are signaled.
+ */
+void amdgpu_pasid_free_delayed(struct reservation_object *resv,
+  unsigned int pasid)
+{
+   struct dma_fence *fence, **fences;
+   struct amdgpu_pasid_cb *cb;
+   unsigned count;
+   int r;
+
+   r = reservation_object_get_fences_rcu(resv, NULL, &count, &fences);
+   if (r)
+   goto fallback;
+
+   if (count == 0) {
+   amdgpu_pasid_free(pasid);
+   return;
+   }
+
+   if (count == 1) {
+   fence = fences[0];
+   kfree(fences);
+   } else {
+   uint64_t context = dma_fence_context_alloc(1);
+   struct dma_fence_array *array;
+
+   array = dma_fence_array_create(count, fences, context,
+  1, false);
+   if (!array) {
+   kfree(fences);
+   goto fallback;
+   }
+   fence = &array->base;
+   }
+
+   cb = kmalloc(sizeof(*cb), GFP_KERNEL);
+   if (!cb) {
+   /* Last resort when we are OOM */
+   dma_fence_wait(fence, false);
+   dma_fence_put(fence);
+   amdgpu_pasid_free(pasid);
+   } else {
+   cb->pasid = pasid;
+   if (dma_fence_add_callback(fence, &cb->cb,
+  amdgpu_pasid_free_cb))
+   amdgpu_pasid_free_cb(fence, &cb->cb);
+   }
+
+   return;
+
+fallback:
+   /* Not enough memory for the delayed delete, as last resort
+* block for all the fences to complete.
+*/
+   reservation_object_wait_timeout_rcu(resv, true, false,
+   MAX_SCHEDULE_TIMEOUT);
+   amdgpu_pasid_free(pasid);
+}
+
 /*
  * VMID manager
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
index ad931fa570b3..38f37c16fc5e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
@@ -69,6 +69,8 @@ struct amdgpu_vmid_mgr {
 
 int amdgpu_pasid_alloc(unsigned int bits);
 void amdgpu_pasid_free(unsigned int pasid);
+void amdgpu_pasid_free_delayed(struct reservation_object *resv,
+  unsigned int pasid);
 
 bool amdgpu_vmid_had_gpu_reset(struct amdgpu_device *adev,
   struct amdgpu_vmid *id);
-- 
2.14.1
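
The intended call site, condensed from the following patch in this series,
keeps a reference on the root page directory so its reservation object is
still valid after the VM is torn down:

	pasid = fpriv->vm.pasid;
	pd = amdgpu_bo_ref(fpriv->vm.root.base.bo);

	amdgpu_vm_fini(adev, &fpriv->vm);
	if (pasid)
		amdgpu_pasid_free_delayed(pd->tbo.resv, pasid);
	amdgpu_bo_unref(&pd);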



[PATCH 1/3] dma-buf: make returning the exclusive fence optional

2018-01-12 Thread Christian König
Change reservation_object_get_fences_rcu to make the exclusive fence
pointer optional.

If not specified the exclusive fence is put into the fence array as
well.

This is helpful for a couple of cases where we need all fences in a
single array.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/dma-buf/reservation.c | 31 ++-
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index b759a569b7b8..461afa9febd4 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -374,8 +374,9 @@ EXPORT_SYMBOL(reservation_object_copy_fences);
  * @pshared: the array of shared fence ptrs returned (array is krealloc'd to
  * the required size, and must be freed by caller)
  *
- * RETURNS
- * Zero or -errno
+ * Retrieve all fences from the reservation object. If the pointer for the
+ * exclusive fence is not specified the fence is put into the array of the
+ * shared fences as well. Returns either zero or -ENOMEM.
  */
 int reservation_object_get_fences_rcu(struct reservation_object *obj,
  struct dma_fence **pfence_excl,
@@ -389,8 +390,8 @@ int reservation_object_get_fences_rcu(struct 
reservation_object *obj,
 
do {
struct reservation_object_list *fobj;
-   unsigned seq;
-   unsigned int i;
+   unsigned int i, seq;
+   size_t sz = 0;
 
shared_count = i = 0;
 
@@ -402,9 +403,14 @@ int reservation_object_get_fences_rcu(struct 
reservation_object *obj,
goto unlock;
 
fobj = rcu_dereference(obj->fence);
-   if (fobj) {
+   if (fobj)
+   sz += sizeof(*shared) * fobj->shared_max;
+
+   if (!pfence_excl && fence_excl)
+   sz += sizeof(*shared);
+
+   if (sz) {
struct dma_fence **nshared;
-   size_t sz = sizeof(*shared) * fobj->shared_max;
 
nshared = krealloc(shared, sz,
   GFP_NOWAIT | __GFP_NOWARN);
@@ -420,13 +426,19 @@ int reservation_object_get_fences_rcu(struct 
reservation_object *obj,
break;
}
shared = nshared;
-   shared_count = fobj->shared_count;
-
+   shared_count = fobj ? fobj->shared_count : 0;
for (i = 0; i < shared_count; ++i) {
shared[i] = rcu_dereference(fobj->shared[i]);
if (!dma_fence_get_rcu(shared[i]))
break;
}
+
+   if (!pfence_excl && fence_excl) {
+   shared[i] = fence_excl;
+   fence_excl = NULL;
+   ++i;
+   ++shared_count;
+   }
}
 
	if (i != shared_count || read_seqcount_retry(&obj->seq, seq)) {
@@ -448,7 +460,8 @@ int reservation_object_get_fences_rcu(struct 
reservation_object *obj,
 
*pshared_count = shared_count;
*pshared = shared;
-   *pfence_excl = fence_excl;
+   if (pfence_excl)
+   *pfence_excl = fence_excl;
 
return ret;
 }
-- 
2.14.1



[PATCH 3/3] drm/amdgpu: always allocate a PASIDs for each VM v2

2018-01-12 Thread Christian König
Start to always allocate a pasid for each VM.

v2: use dev_warn when we run out of PASIDs

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 43 ++---
 1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 5773a581761b..a108b30d8186 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -805,7 +805,7 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct 
drm_file *file_priv)
 {
struct amdgpu_device *adev = dev->dev_private;
struct amdgpu_fpriv *fpriv;
-   int r;
+   int r, pasid;
 
file_priv->driver_priv = NULL;
 
@@ -819,28 +819,25 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct 
drm_file *file_priv)
goto out_suspend;
}
 
-   r = amdgpu_vm_init(adev, &fpriv->vm,
-  AMDGPU_VM_CONTEXT_GFX, 0);
-   if (r) {
-   kfree(fpriv);
-   goto out_suspend;
+   pasid = amdgpu_pasid_alloc(16);
+   if (pasid < 0) {
+   dev_warn(adev->dev, "No more PASIDs available!");
+   pasid = 0;
}
+   r = amdgpu_vm_init(adev, &fpriv->vm, AMDGPU_VM_CONTEXT_GFX, pasid);
+   if (r)
+   goto error_pasid;
 
	fpriv->prt_va = amdgpu_vm_bo_add(adev, &fpriv->vm, NULL);
if (!fpriv->prt_va) {
r = -ENOMEM;
-   amdgpu_vm_fini(adev, &fpriv->vm);
-   kfree(fpriv);
-   goto out_suspend;
+   goto error_vm;
}
 
if (amdgpu_sriov_vf(adev)) {
	r = amdgpu_map_static_csa(adev, &fpriv->vm, &fpriv->csa_va);
-   if (r) {
-   amdgpu_vm_fini(adev, &fpriv->vm);
-   kfree(fpriv);
-   goto out_suspend;
-   }
+   if (r)
+   goto error_vm;
}
 
	mutex_init(&fpriv->bo_list_lock);
@@ -849,6 +846,16 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct 
drm_file *file_priv)
	amdgpu_ctx_mgr_init(&fpriv->ctx_mgr);
 
file_priv->driver_priv = fpriv;
+   goto out_suspend;
+
+error_vm:
+   amdgpu_vm_fini(adev, &fpriv->vm);
+
+error_pasid:
+   if (pasid)
+   amdgpu_pasid_free(pasid);
+
+   kfree(fpriv);
 
 out_suspend:
pm_runtime_mark_last_busy(dev->dev);
@@ -871,6 +878,8 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
struct amdgpu_device *adev = dev->dev_private;
struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
struct amdgpu_bo_list *list;
+   struct amdgpu_bo *pd;
+   unsigned int pasid;
int handle;
 
if (!fpriv)
@@ -895,7 +904,13 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
amdgpu_bo_unreserve(adev->virt.csa_obj);
}
 
+   pasid = fpriv->vm.pasid;
+   pd = amdgpu_bo_ref(fpriv->vm.root.base.bo);
+
	amdgpu_vm_fini(adev, &fpriv->vm);
+   if (pasid)
+   amdgpu_pasid_free_delayed(pd->tbo.resv, pasid);
+   amdgpu_bo_unref(&pd);
 
	idr_for_each_entry(&fpriv->bo_list_handles, list, handle)
amdgpu_bo_list_free(list);
-- 
2.14.1



Re: [Linaro-mm-sig] [PATCH] dma-buf: add some lockdep asserts to the reservation object implementation

2018-01-11 Thread Christian König

Yeah, somehow missed that one.

The patch looks mostly good, except for reservation_object_get_excl().

For that one, RCU protection is usually sufficient, so annotating it 
with reservation_object_assert_held() sounds incorrect to me.


Regards,
Christian.
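
For comparison, the RCU-only access pattern that usually suffices for the
exclusive fence looks roughly like this (a sketch; obj is an assumed
reservation_object pointer, and reservation_object_get_excl_rcu() wraps
essentially this):

	struct dma_fence *fence;

	rcu_read_lock();
	fence = dma_fence_get_rcu_safe(&obj->fence_excl);
	rcu_read_unlock();

	if (fence) {
		/* Use the snapshot, e.g. wait for it, then drop the reference. */
		dma_fence_wait(fence, false);
		dma_fence_put(fence);
	}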

Am 11.01.2018 um 11:43 schrieb Lucas Stach:

Did this fall through the cracks over the holidays? It really has made
my work much easier while reworking some of the reservation object
handling in etnaviv and I think it might benefit others.

Regards,
Lucas

Am Freitag, den 01.12.2017, 12:12 +0100 schrieb Lucas Stach:

This adds lockdep asserts to the reservation functions which state in their
documentation that obj->lock must be held. Allows builds with PROVE_LOCKING
enabled to check that the locking requirements are met.


Signed-off-by: Lucas Stach 

---
  drivers/dma-buf/reservation.c | 8 
  include/linux/reservation.h   | 2 ++
  2 files changed, 10 insertions(+)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index b44d9d7db347..accd398e2ea6 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -71,6 +71,8 @@ int reservation_object_reserve_shared(struct 
reservation_object *obj)

    struct reservation_object_list *fobj, *old;
    u32 max;
  

+   reservation_object_assert_held(obj);

+

    old = reservation_object_get_list(obj);
  

    if (old && old->shared_max) {

@@ -211,6 +213,8 @@ void reservation_object_add_shared_fence(struct 
reservation_object *obj,
  {

    struct reservation_object_list *old, *fobj = obj->staged;
  

+   reservation_object_assert_held(obj);

+

    old = reservation_object_get_list(obj);
    obj->staged = NULL;
  
@@ -236,6 +240,8 @@ void reservation_object_add_excl_fence(struct reservation_object *obj,

    struct reservation_object_list *old;
    u32 i = 0;
  

+   reservation_object_assert_held(obj);

+

    old = reservation_object_get_list(obj);
    if (old)
    i = old->shared_count;

@@ -276,6 +282,8 @@ int reservation_object_copy_fences(struct 
reservation_object *dst,

    size_t size;
    unsigned i;
  

+   reservation_object_assert_held(dst);

+

    rcu_read_lock();
    src_list = rcu_dereference(src->fence);
  
diff --git a/include/linux/reservation.h b/include/linux/reservation.h

index 21fc84d82d41..55e7318800fd 100644
--- a/include/linux/reservation.h
+++ b/include/linux/reservation.h
@@ -212,6 +212,8 @@ reservation_object_unlock(struct reservation_object *obj)
  static inline struct dma_fence *
  reservation_object_get_excl(struct reservation_object *obj)
  {

+   reservation_object_assert_held(obj);

+

    return rcu_dereference_protected(obj->fence_excl,
     reservation_object_held(obj));

  }





Re: [Linaro-mm-sig] [PATCH] dma-buf: make returning the exclusive fence optional

2018-01-10 Thread Christian König

Am 10.01.2018 um 14:21 schrieb Daniel Vetter:

On Wed, Jan 10, 2018 at 01:53:41PM +0100, Christian König wrote:

Change reservation_object_get_fences_rcu to make the exclusive fence
pointer optional.

If not specified the exclusive fence is put into the fence array as
well.

This is helpful for a couple of cases where we need all fences in a
single array.

Signed-off-by: Christian König <christian.koe...@amd.com>

Seeing the use-case for this would be a lot more interesting ...


Yeah, sorry, the use case is a 20-patch set on amd-gfx.

I didn't want to post all of those here as well.

Christian.


-Daniel


---
  drivers/dma-buf/reservation.c | 31 ++-
  1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index b759a569b7b8..461afa9febd4 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -374,8 +374,9 @@ EXPORT_SYMBOL(reservation_object_copy_fences);
   * @pshared: the array of shared fence ptrs returned (array is krealloc'd to
   * the required size, and must be freed by caller)
   *
- * RETURNS
- * Zero or -errno
+ * Retrieve all fences from the reservation object. If the pointer for the
+ * exclusive fence is not specified the fence is put into the array of the
+ * shared fences as well. Returns either zero or -ENOMEM.
   */
  int reservation_object_get_fences_rcu(struct reservation_object *obj,
  struct dma_fence **pfence_excl,
@@ -389,8 +390,8 @@ int reservation_object_get_fences_rcu(struct 
reservation_object *obj,
  
  	do {

struct reservation_object_list *fobj;
-   unsigned seq;
-   unsigned int i;
+   unsigned int i, seq;
+   size_t sz = 0;
  
  		shared_count = i = 0;
  
@@ -402,9 +403,14 @@ int reservation_object_get_fences_rcu(struct reservation_object *obj,

goto unlock;
  
  		fobj = rcu_dereference(obj->fence);

-   if (fobj) {
+   if (fobj)
+   sz += sizeof(*shared) * fobj->shared_max;
+
+   if (!pfence_excl && fence_excl)
+   sz += sizeof(*shared);
+
+   if (sz) {
struct dma_fence **nshared;
-   size_t sz = sizeof(*shared) * fobj->shared_max;
  
  			nshared = krealloc(shared, sz,

   GFP_NOWAIT | __GFP_NOWARN);
@@ -420,13 +426,19 @@ int reservation_object_get_fences_rcu(struct 
reservation_object *obj,
break;
}
shared = nshared;
-   shared_count = fobj->shared_count;
-
+   shared_count = fobj ? fobj->shared_count : 0;
for (i = 0; i < shared_count; ++i) {
shared[i] = rcu_dereference(fobj->shared[i]);
if (!dma_fence_get_rcu(shared[i]))
break;
}
+
+   if (!pfence_excl && fence_excl) {
+   shared[i] = fence_excl;
+   fence_excl = NULL;
+   ++i;
+   ++shared_count;
+   }
}
  
  		if (i != shared_count || read_seqcount_retry(&obj->seq, seq)) {

@@ -448,7 +460,8 @@ int reservation_object_get_fences_rcu(struct 
reservation_object *obj,
  
  	*pshared_count = shared_count;

*pshared = shared;
-   *pfence_excl = fence_excl;
+   if (pfence_excl)
+   *pfence_excl = fence_excl;
  
  	return ret;

  }
--
2.14.1
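
The typical use case looks like the PASID teardown elsewhere in this archive:
gather every fence, exclusive one included, into a single dma_fence_array and
wait on it or attach a callback. A condensed sketch (wait_all_fences is a
made-up helper, not part of the patch):

	static long wait_all_fences(struct reservation_object *resv)
	{
		struct dma_fence **fences;
		struct dma_fence_array *array;
		unsigned int count;
		long ret;
		int r;

		/* NULL for pfence_excl folds the exclusive fence into the array. */
		r = reservation_object_get_fences_rcu(resv, NULL, &count, &fences);
		if (r)
			return r;
		if (!count) {
			kfree(fences);
			return 0;
		}

		array = dma_fence_array_create(count, fences,
					       dma_fence_context_alloc(1),
					       1, false);
		if (!array) {
			kfree(fences);
			return -ENOMEM;
		}

		ret = dma_fence_wait(&array->base, false);
		dma_fence_put(&array->base);	/* releases the fences and the array */
		return ret;
	}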





[PATCH] dma-buf: make returning the exclusive fence optional

2018-01-10 Thread Christian König
Change reservation_object_get_fences_rcu to make the exclusive fence
pointer optional.

If not specified the exclusive fence is put into the fence array as
well.

This is helpful for a couple of cases where we need all fences in a
single array.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/dma-buf/reservation.c | 31 ++-
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index b759a569b7b8..461afa9febd4 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -374,8 +374,9 @@ EXPORT_SYMBOL(reservation_object_copy_fences);
  * @pshared: the array of shared fence ptrs returned (array is krealloc'd to
  * the required size, and must be freed by caller)
  *
- * RETURNS
- * Zero or -errno
+ * Retrieve all fences from the reservation object. If the pointer for the
+ * exclusive fence is not specified the fence is put into the array of the
+ * shared fences as well. Returns either zero or -ENOMEM.
  */
 int reservation_object_get_fences_rcu(struct reservation_object *obj,
  struct dma_fence **pfence_excl,
@@ -389,8 +390,8 @@ int reservation_object_get_fences_rcu(struct 
reservation_object *obj,
 
do {
struct reservation_object_list *fobj;
-   unsigned seq;
-   unsigned int i;
+   unsigned int i, seq;
+   size_t sz = 0;
 
shared_count = i = 0;
 
@@ -402,9 +403,14 @@ int reservation_object_get_fences_rcu(struct 
reservation_object *obj,
goto unlock;
 
fobj = rcu_dereference(obj->fence);
-   if (fobj) {
+   if (fobj)
+   sz += sizeof(*shared) * fobj->shared_max;
+
+   if (!pfence_excl && fence_excl)
+   sz += sizeof(*shared);
+
+   if (sz) {
struct dma_fence **nshared;
-   size_t sz = sizeof(*shared) * fobj->shared_max;
 
nshared = krealloc(shared, sz,
   GFP_NOWAIT | __GFP_NOWARN);
@@ -420,13 +426,19 @@ int reservation_object_get_fences_rcu(struct 
reservation_object *obj,
break;
}
shared = nshared;
-   shared_count = fobj->shared_count;
-
+   shared_count = fobj ? fobj->shared_count : 0;
for (i = 0; i < shared_count; ++i) {
shared[i] = rcu_dereference(fobj->shared[i]);
if (!dma_fence_get_rcu(shared[i]))
break;
}
+
+   if (!pfence_excl && fence_excl) {
+   shared[i] = fence_excl;
+   fence_excl = NULL;
+   ++i;
+   ++shared_count;
+   }
}
 
	if (i != shared_count || read_seqcount_retry(&obj->seq, seq)) {
@@ -448,7 +460,8 @@ int reservation_object_get_fences_rcu(struct 
reservation_object *obj,
 
*pshared_count = shared_count;
*pshared = shared;
-   *pfence_excl = fence_excl;
+   if (pfence_excl)
+   *pfence_excl = fence_excl;
 
return ret;
 }
-- 
2.14.1



Re: [PATCH 0/4] Backported amdgpu ttm deadlock fixes for 4.14

2017-12-01 Thread Christian König

Am 01.12.2017 um 01:23 schrieb Lyude Paul:

I haven't gone to see where it started, but as of late a good number of
pretty nasty deadlock issues have appeared with the kernel. Easy
reproduction recipe on a laptop with i915/amdgpu prime with lockdep enabled:

DRI_PRIME=1 glxinfo


Acked-by: Christian König <christian.koe...@amd.com>

Thanks for taking care of this,
Christian.



Additionally, some more race conditions exist that I've managed to
trigger with piglit and lockdep enabled after applying these patches:

 =
 WARNING: suspicious RCU usage
 4.14.3Lyude-Test+ #2 Not tainted
 -
 ./include/linux/reservation.h:216 suspicious rcu_dereference_protected() 
usage!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 1 lock held by ext_image_dma_b/27451:
  #0:  (reservation_ww_class_mutex){+.+.}, at: [] 
ttm_bo_unref+0x9f/0x3c0 [ttm]

 stack backtrace:
 CPU: 0 PID: 27451 Comm: ext_image_dma_b Not tainted 4.14.3Lyude-Test+ #2
 Hardware name: HP HP ZBook 15 G4/8275, BIOS P70 Ver. 01.02 06/09/2017
 Call Trace:
  dump_stack+0x8e/0xce
  lockdep_rcu_suspicious+0xc5/0x100
  reservation_object_copy_fences+0x292/0x2b0
  ? ttm_bo_unref+0x9f/0x3c0 [ttm]
  ttm_bo_unref+0xbd/0x3c0 [ttm]
  amdgpu_bo_unref+0x2a/0x50 [amdgpu]
  amdgpu_gem_object_free+0x4b/0x50 [amdgpu]
  drm_gem_object_free+0x1f/0x40 [drm]
  drm_gem_object_put_unlocked+0x40/0xb0 [drm]
  drm_gem_object_handle_put_unlocked+0x6c/0xb0 [drm]
  drm_gem_object_release_handle+0x51/0x90 [drm]
  drm_gem_handle_delete+0x5e/0x90 [drm]
  ? drm_gem_handle_create+0x40/0x40 [drm]
  drm_gem_close_ioctl+0x20/0x30 [drm]
  drm_ioctl_kernel+0x5d/0xb0 [drm]
  drm_ioctl+0x2f7/0x3b0 [drm]
  ? drm_gem_handle_create+0x40/0x40 [drm]
  ? trace_hardirqs_on_caller+0xf4/0x190
  ? trace_hardirqs_on+0xd/0x10
  amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
  do_vfs_ioctl+0x93/0x670
  ? __fget+0x108/0x1f0
  SyS_ioctl+0x79/0x90
  entry_SYSCALL_64_fastpath+0x23/0xc2

I've also added the relevant fixes for the issue mentioned above.

Christian König (3):
   drm/ttm: fix ttm_bo_cleanup_refs_or_queue once more
   dma-buf: make reservation_object_copy_fences rcu save
   drm/amdgpu: reserve root PD while releasing it

Michel Dänzer (1):
   drm/ttm: Always and only destroy bo->ttm_resv in ttm_bo_release_list

  drivers/dma-buf/reservation.c  | 56 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 ++--
  drivers/gpu/drm/ttm/ttm_bo.c   | 43 +-
  3 files changed, 74 insertions(+), 38 deletions(-)

--
2.14.3





Re: [PATCH] reservation: don't wait when timeout=0

2017-11-21 Thread Christian König

Am 21.11.2017 um 16:58 schrieb Chris Wilson:

Quoting Christian König (2017-11-21 15:49:55)

Am 21.11.2017 um 15:59 schrieb Rob Clark:

On Tue, Nov 21, 2017 at 9:38 AM, Chris Wilson <ch...@chris-wilson.co.uk> wrote:

Quoting Rob Clark (2017-11-21 14:08:46)

If we are testing if a reservation object's fences have been
signaled with timeout=0 (non-blocking), we need to pass 0 for
timeout to dma_fence_wait_timeout().

Plus bonus spelling correction.

Signed-off-by: Rob Clark <robdcl...@gmail.com>
---
   drivers/dma-buf/reservation.c | 11 +--
   1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index dec3a815455d..71f51140a9ad 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -420,7 +420,7 @@ EXPORT_SYMBOL_GPL(reservation_object_get_fences_rcu);
*
* RETURNS
* Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or
- * greater than zer on success.
+ * greater than zero on success.
*/
   long reservation_object_wait_timeout_rcu(struct reservation_object *obj,
   bool wait_all, bool intr,
@@ -483,7 +483,14 @@ long reservation_object_wait_timeout_rcu(struct 
reservation_object *obj,
  goto retry;
  }

-   ret = dma_fence_wait_timeout(fence, intr, ret);
+   /*
+* Note that dma_fence_wait_timeout() will return 1 if
+* the fence is already signaled, so in the wait_all
+* case when we go through the retry loop again, ret
+* will be greater than 0 and we don't want this to
+* cause _wait_timeout() to block
+*/
+   ret = dma_fence_wait_timeout(fence, intr, timeout ? ret : 0);

One should ask if we should just fix the interface to stop returning
incorrect results (stop "correcting" a completion with 0 jiffies remaining
as 1). A timeout can be distinguished by -ETIME (or your pick of errno).

perhaps -EBUSY, if we go that route (although maybe it should be a
follow-on patch, this one is suitable for backport to stable/lts if
one should so choose..)

I think current approach was chosen to match schedule_timeout() and
other such functions that take a timeout in jiffies.  Not making a
judgement on whether that is a good or bad reason..

We intentionally switched away from that to be in sync with the
wait_event_* interface.

Returning 1 when a function with a zero timeout succeeds is actually
quite common in the kernel.

We actually had this conversation last time, and outside of that it
isn't. Looking at all the convolution to first return 1, then undo the
damage in the caller, it looks pretty silly.


I don't find that very intuitive either, but you would also have to 
handle the error code in the calling function as well.


And it is what the whole kernel does all over the place with its
wait_event_* and scheduler timeouts as well.


Regards,
Christian.


-Chris





Re: [PATCH] reservation: don't wait when timeout=0

2017-11-21 Thread Christian König

Am 21.11.2017 um 15:59 schrieb Rob Clark:

On Tue, Nov 21, 2017 at 9:38 AM, Chris Wilson  wrote:

Quoting Rob Clark (2017-11-21 14:08:46)

If we are testing if a reservation object's fences have been
signaled with timeout=0 (non-blocking), we need to pass 0 for
timeout to dma_fence_wait_timeout().

Plus bonus spelling correction.

Signed-off-by: Rob Clark 
---
  drivers/dma-buf/reservation.c | 11 +--
  1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index dec3a815455d..71f51140a9ad 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -420,7 +420,7 @@ EXPORT_SYMBOL_GPL(reservation_object_get_fences_rcu);
   *
   * RETURNS
   * Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or
- * greater than zer on success.
+ * greater than zero on success.
   */
  long reservation_object_wait_timeout_rcu(struct reservation_object *obj,
  bool wait_all, bool intr,
@@ -483,7 +483,14 @@ long reservation_object_wait_timeout_rcu(struct 
reservation_object *obj,
 goto retry;
 }

-   ret = dma_fence_wait_timeout(fence, intr, ret);
+   /*
+* Note that dma_fence_wait_timeout() will return 1 if
+* the fence is already signaled, so in the wait_all
+* case when we go through the retry loop again, ret
+* will be greater than 0 and we don't want this to
+* cause _wait_timeout() to block
+*/
+   ret = dma_fence_wait_timeout(fence, intr, timeout ? ret : 0);

One should ask if we should just fix the interface to stop returning
incorrect results (stop "correcting" a completion with 0 jiffies remaining
as 1). A timeout can be distinguished by -ETIME (or your pick of errno).

perhaps -EBUSY, if we go that route (although maybe it should be a
follow-on patch, this one is suitable for backport to stable/lts if
one should so choose..)

I think current approach was chosen to match schedule_timeout() and
other such functions that take a timeout in jiffies.  Not making a
judgement on whether that is a good or bad reason..


We intentionally switched away from that to be in sync with the 
wait_event_* interface.


Returning 1 when a function with a zero timeout succeeds is actually 
quite common in the kernel.


Regards,
Christian.


BR,
-R


-Chris






Re: [PATCH] reservation: don't wait when timeout=0

2017-11-21 Thread Christian König

Am 21.11.2017 um 15:08 schrieb Rob Clark:

If we are testing if a reservation object's fences have been
signaled with timeout=0 (non-blocking), we need to pass 0 for
timeout to dma_fence_wait_timeout().

Plus bonus spelling correction.

Signed-off-by: Rob Clark <robdcl...@gmail.com>


Reviewed-by: Christian König <christian.koe...@amd.com>


---
  drivers/dma-buf/reservation.c | 11 +--
  1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index dec3a815455d..71f51140a9ad 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -420,7 +420,7 @@ EXPORT_SYMBOL_GPL(reservation_object_get_fences_rcu);
   *
   * RETURNS
   * Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or
- * greater than zer on success.
+ * greater than zero on success.
   */
  long reservation_object_wait_timeout_rcu(struct reservation_object *obj,
 bool wait_all, bool intr,
@@ -483,7 +483,14 @@ long reservation_object_wait_timeout_rcu(struct 
reservation_object *obj,
goto retry;
}
  
-		ret = dma_fence_wait_timeout(fence, intr, ret);

+   /*
+* Note that dma_fence_wait_timeout() will return 1 if
+* the fence is already signaled, so in the wait_all
+* case when we go through the retry loop again, ret
+* will be greater than 0 and we don't want this to
+* cause _wait_timeout() to block
+*/
+   ret = dma_fence_wait_timeout(fence, intr, timeout ? ret : 0);
dma_fence_put(fence);
if (ret > 0 && wait_all && (i + 1 < shared_count))
goto retry;
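
With the fix applied, a non-blocking "is this buffer idle?" check can simply
pass a zero timeout (a sketch; resv_is_busy is a made-up helper):

	static bool resv_is_busy(struct reservation_object *resv)
	{
		/* timeout == 0: poll only, never block */
		long ret = reservation_object_wait_timeout_rcu(resv, true,
							       false, 0);

		/*
		 * 0 means at least one fence is still pending; a negative
		 * value is an error, treated here as busy for simplicity.
		 */
		return ret <= 0;
	}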





Re: [PATCH 0/4] dma-buf: Silence dma_fence __rcu sparse warnings

2017-11-03 Thread Christian König

Patch #4 is Reviewed-by: Christian König <christian.koe...@amd.com>.

The rest is Acked-by: Christian König <christian.koe...@amd.com>.

Regards,
Christian.

Am 02.11.2017 um 21:03 schrieb Ville Syrjala:

From: Ville Syrjälä <ville.syrj...@linux.intel.com>

When building drm+i915 I get around 150 lines of sparse noise from
dma_fence __rcu warnings. This series eliminates all of that.

The first two patches were already posted by Chris, but there wasn't
any real reaction, so I figured I'd repost with a wider Cc list.

As for the other two patches, I'm no expert on dma_fence and I didn't
spend a lot of time looking at it so I can't be sure I annotated all
the accesses correctly. But I figured someone will scream at me if
I got it wrong ;)

Cc: Dave Airlie <airl...@redhat.com>
Cc: Jason Ekstrand <ja...@jlekstrand.net>
Cc: linaro-mm-...@lists.linaro.org
Cc: linux-media@vger.kernel.org
Cc: Alex Deucher <alexander.deuc...@amd.com>
Cc: Christian König <christian.koe...@amd.com>
Cc: Sumit Semwal <sumit.sem...@linaro.org>
Cc: Chris Wilson <ch...@chris-wilson.co.uk>

Chris Wilson (2):
   drm/syncobj: Mark up the fence as an RCU protected pointer
   dma-buf/fence: Sparse wants __rcu on the object itself

Ville Syrjälä (2):
   drm/syncobj: Use proper methods for accessing rcu protected pointers
   dma-buf: Use rcu_assign_pointer() to set rcu protected pointers

  drivers/dma-buf/reservation.c |  2 +-
  drivers/gpu/drm/drm_syncobj.c | 11 +++
  include/drm/drm_syncobj.h |  2 +-
  include/linux/dma-fence.h |  2 +-
  4 files changed, 10 insertions(+), 7 deletions(-)





Re: [PATCH] dma-buf: Cleanup comments on dma_buf_map_attachment()

2017-11-02 Thread Christian König

Am 01.11.2017 um 15:06 schrieb Liviu Dudau:

Mappings need to be unmapped by calling dma_buf_unmap_attachment() and
not by calling again dma_buf_map_attachment(). Also fix some spelling
mistakes.

Signed-off-by: Liviu Dudau <liviu.du...@arm.com>


Reviewed-by: Christian König <christian.koe...@amd.com>


---
  drivers/dma-buf/dma-buf.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index bc1cb284111cb..1792385405f0e 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -351,13 +351,13 @@ static inline int is_dma_buf_file(struct file *file)
   *
   * 2. Userspace passes this file-descriptors to all drivers it wants this 
buffer
   *to share with: First the filedescriptor is converted to a _buf using
- *dma_buf_get(). The the buffer is attached to the device using
+ *dma_buf_get(). Then the buffer is attached to the device using
   *dma_buf_attach().
   *
   *Up to this stage the exporter is still free to migrate or reallocate the
   *backing storage.
   *
- * 3. Once the buffer is attached to all devices userspace can inniate DMA
+ * 3. Once the buffer is attached to all devices userspace can initiate DMA
   *access to the shared buffer. In the kernel this is done by calling
   *dma_buf_map_attachment() and dma_buf_unmap_attachment().
   *
@@ -617,7 +617,7 @@ EXPORT_SYMBOL_GPL(dma_buf_detach);
   * Returns sg_table containing the scatterlist to be returned; returns ERR_PTR
   * on error. May return -EINTR if it is interrupted by a signal.
   *
- * A mapping must be unmapped again using dma_buf_map_attachment(). Note that
+ * A mapping must be unmapped by using dma_buf_unmap_attachment(). Note that
   * the underlying backing storage is pinned for as long as a mapping exists,
   * therefore users/importers should not hold onto a mapping for undue amounts 
of
   * time.
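
Put together, the importer-side sequence these comments describe is the
following (minimal sketch; error handling trimmed, dev is the importing
device and fd comes from userspace):

	struct dma_buf *buf = dma_buf_get(fd);
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;

	attach = dma_buf_attach(buf, dev);
	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);

	/* ... DMA to/from the pages described by sgt ... */

	dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
	dma_buf_detach(buf, attach);
	dma_buf_put(buf);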





Re: [PATCH] dma-fence: fix dma_fence_get_rcu_safe

2017-09-21 Thread Christian König

Am 20.09.2017 um 20:20 schrieb Daniel Vetter:

On Mon, Sep 11, 2017 at 01:06:32PM +0200, Christian König wrote:

Am 11.09.2017 um 12:01 schrieb Chris Wilson:

[SNIP]

Yeah, but that is illegal with fence objects.

When anybody allocates fences this way, it breaks at least
reservation_object_get_fences_rcu(),
reservation_object_wait_timeout_rcu() and
reservation_object_test_signaled_single().

Many, many months ago I sent patches to fix them all.

Found those after a bit of searching. Yeah, those patches were proposed more
than a year ago, but never pushed upstream.

Not sure if we really should go this way. dma_fence objects are shared
between drivers, and since we can't judge whether it's the correct fence based
on a criterion in the object (only the read counter, which is outside), all
drivers need to be correct for this.

I would rather go the route of changing dma_fence_release() to wrap
fence->ops->release into call_rcu(), keeping the whole RCU handling outside
of the individual drivers.

Hm, I entirely dropped the ball on this, I kinda assumed that we managed
to get some agreement on this between i915 and dma_fence. Adding a pile
more people.


In the meantime I've sent a v2 of this patch to fix at least the buggy 
return of NULL when we fail to grab the RCU reference, while keeping the 
extra checking for now.


Can I get an rb on this please so that we fix at least the bug at hand?

Thanks,
Christian.



Joonas, Tvrtko, I guess we need to fix this one way or the other.
-Daniel





[PATCH] dma-fence: fix dma_fence_get_rcu_safe v2

2017-09-15 Thread Christian König
From: Christian König <christian.koe...@amd.com>

When dma_fence_get_rcu() fails to acquire a reference it doesn't necessarily
mean that there is no fence at all.

It usually means that the fence was replaced by a new one, and in this situation
we certainly want to have the new one as the result and *NOT* NULL.

v2: Keep extra check after dma_fence_get_rcu().

Signed-off-by: Christian König <christian.koe...@amd.com>
Cc: Chris Wilson <ch...@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
Cc: Sumit Semwal <sumit.sem...@linaro.org>
Cc: linux-media@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Cc: linaro-mm-...@lists.linaro.org
---
 include/linux/dma-fence.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 0a186c4..f4f23cb 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -248,9 +248,12 @@ dma_fence_get_rcu_safe(struct dma_fence * __rcu *fencep)
struct dma_fence *fence;
 
fence = rcu_dereference(*fencep);
-   if (!fence || !dma_fence_get_rcu(fence))
+   if (!fence)
return NULL;
 
+   if (!dma_fence_get_rcu(fence))
+   continue;
+
/* The atomic_inc_not_zero() inside dma_fence_get_rcu()
 * provides a full memory barrier upon success (such as now).
 * This is paired with the write barrier from assigning
-- 
2.7.4



Re: [PATCH] dma-fence: fix dma_fence_get_rcu_safe

2017-09-11 Thread Christian König

Am 11.09.2017 um 12:01 schrieb Chris Wilson:

[SNIP]

Yeah, but that is illegal with fence objects.

When anybody allocates fences this way, it breaks at least
reservation_object_get_fences_rcu(),
reservation_object_wait_timeout_rcu() and
reservation_object_test_signaled_single().

Many, many months ago I sent patches to fix them all.


Found those after a bit of searching. Yeah, those patches were proposed 
more than a year ago, but never pushed upstream.


Not sure if we really should go this way. dma_fence objects are shared 
between drivers, and since we can't judge whether it's the correct fence 
based on a criterion in the object (only the read counter, which is outside), 
all drivers need to be correct for this.


I would rather go the route of changing dma_fence_release() to wrap 
fence->ops->release into call_rcu(), keeping the whole RCU handling 
outside of the individual drivers.


Regards,
Christian.
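
Very roughly, that suggestion would look like the sketch below (an
assumption-laden illustration, not a patch; it assumes struct dma_fence keeps
its rcu_head, as it already does for dma_fence_free()):

	static void dma_fence_release_rcu(struct rcu_head *rcu)
	{
		struct dma_fence *fence = container_of(rcu, struct dma_fence, rcu);

		if (fence->ops->release)
			fence->ops->release(fence);
		else
			kfree(fence);
	}

	void dma_fence_release(struct kref *kref)
	{
		struct dma_fence *fence =
			container_of(kref, struct dma_fence, refcount);

		/*
		 * Defer the driver release callback until after the grace
		 * period, so RCU readers can never see reused memory.
		 */
		call_rcu(&fence->rcu, dma_fence_release_rcu);
	}

That would let drivers keep a plain kfree() in their release callbacks
without each of them having to know about RCU.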


Re: [PATCH] dma-fence: fix dma_fence_get_rcu_safe

2017-09-11 Thread Christian König

Am 11.09.2017 um 11:23 schrieb Chris Wilson:

Quoting Christian König (2017-09-11 10:06:50)

Am 11.09.2017 um 10:59 schrieb Chris Wilson:

Quoting Christian König (2017-09-11 09:50:40)

Sorry for the delayed response, but your mail somehow ended up in the
Spam folder.

Am 04.09.2017 um 15:40 schrieb Chris Wilson:

Quoting Christian König (2017-09-04 14:27:33)

From: Christian König <christian.koe...@amd.com>

The logic is buggy and unnecessarily complex. When dma_fence_get_rcu() fails to
acquire a reference it doesn't necessarily mean that there is no fence at all.

It usually means that the fence was replaced by a new one, and in this situation
we certainly want to have the new one as the result and *NOT* NULL.

Which is not guaranteed by the code you wrote either.

The point of the comment is that the mb is only inside the successful
kref_atomic_inc_unless_zero, and that only after that mb do you know
whether or not you have the current fence.

You can argue that you want to replace the
if (!dma_fence_get_rcu())
return NULL
with
if (!dma_fence_get_rcu()
continue;
but it would be incorrect to say that by simply ignoring the
post-condition check that you do have the right fence.

You are completely missing the point here.

It is irrelevant if you have the current fence or not when you return.
You can only guarantee that it is the current fence when you take a look
and that is exactly what we want to avoid.

So the existing code is complete nonsense. Instead what we need to
guarantee is that we return *ANY* fence which we can grab a reference for.

Not quite. We can grab a reference on a fence that was already freed and
reused between the rcu_dereference() and dma_fence_get_rcu().

Reusing a memory structure before the RCU grace period is completed is
illegal, otherwise the whole RCU approach won't work.

RCU only protects that the pointer remains valid. If you use
SLAB_TYPESAFE_BY_RCU, it is possible to reuse the pointer within a grace
period. It does happen and that is the point the comment is trying to
make.


Yeah, but that is illegal with fence objects.

When anybody allocates fences this way, it breaks at least 
reservation_object_get_fences_rcu(), 
reservation_object_wait_timeout_rcu() and 
reservation_object_test_signaled_single().


Because all of them rely on dma_fence_get() returning NULL when the fence 
isn't valid any more in order to restart the operation.


When dma_fence_get_rcu() returns a reallocated fence, the operation 
won't restart correctly and the end result will most likely not be correct 
at all.


Using SLAB_TYPESAFE_BY_RCU is only valid if you can ensure that you have 
the right object using a second criterion, and that is not the case with 
fences.


Regards,
Christian.


Re: [PATCH] dma-fence: fix dma_fence_get_rcu_safe

2017-09-11 Thread Christian König

Am 11.09.2017 um 10:59 schrieb Chris Wilson:

Quoting Christian König (2017-09-11 09:50:40)

Sorry for the delayed response, but your mail somehow ended up in the
Spam folder.

Am 04.09.2017 um 15:40 schrieb Chris Wilson:

Quoting Christian König (2017-09-04 14:27:33)

From: Christian König <christian.koe...@amd.com>

The logic is buggy and unnecessarily complex. When dma_fence_get_rcu() fails to
acquire a reference it doesn't necessarily mean that there is no fence at all.

It usually means that the fence was replaced by a new one, and in this situation
we certainly want to have the new one as the result and *NOT* NULL.

Which is not guaranteed by the code you wrote either.

The point of the comment is that the mb is only inside the successful
kref_atomic_inc_unless_zero, and that only after that mb do you know
whether or not you have the current fence.

You can argue that you want to replace the
   if (!dma_fence_get_rcu())
   return NULL
with
   if (!dma_fence_get_rcu()
   continue;
but it would be incorrect to say that by simply ignoring the
post-condition check that you do have the right fence.

You are completely missing the point here.

It is irrelevant if you have the current fence or not when you return.
You can only guarantee that it is the current fence when you take a look
and that is exactly what we want to avoid.

So the existing code is complete nonsense. Instead what we need to
guarantee is that we return *ANY* fence which we can grab a reference for.

Not quite. We can grab a reference on a fence that was already freed and
reused between the rcu_dereference() and dma_fence_get_rcu().


Reusing a memory structure before the RCU grace period is completed is 
illegal, otherwise the whole RCU approach won't work.


So that case can't happen.

Regards,
Christian.


-Chris





Re: [PATCH] dma-fence: fix dma_fence_get_rcu_safe

2017-09-11 Thread Christian König
Sorry for the delayed response, but your mail somehow ended up in the 
Spam folder.


Am 04.09.2017 um 15:40 schrieb Chris Wilson:

Quoting Christian König (2017-09-04 14:27:33)

From: Christian König <christian.koe...@amd.com>

The logic is buggy and unnecessarily complex. When dma_fence_get_rcu() fails to
acquire a reference it doesn't necessarily mean that there is no fence at all.

It usually means that the fence was replaced by a new one, and in this situation
we certainly want to have the new one as the result and *NOT* NULL.

Which is not guaranteed by the code you wrote either.

The point of the comment is that the mb is only inside the successful
kref_atomic_inc_unless_zero, and that only after that mb do you know
whether or not you have the current fence.

You can argue that you want to replace the
if (!dma_fence_get_rcu())
return NULL
with
if (!dma_fence_get_rcu())
continue;
but it would be incorrect to say that by simply ignoring the
post-condition check that you do have the right fence.


You are completely missing the point here.

It is irrelevant whether you have the current fence or not when you return. 
You can only guarantee that it is the current fence at the moment you look, 
and taking that look is exactly what we want to avoid.


So the existing code is complete nonsense. Instead what we need to 
guarantee is that we return *ANY* fence which we can grab a reference for.


The usual life of a fence* variable looks like this:
1. assigning pointer to fence A;
2. assigning pointer to fence B;
3. assigning pointer to fence C;


When dma_fence_get_rcu_safe() is called between step #1 and step #2 for 
example it is perfectly valid to just return either fence A or fence B.


But it is invalid to return NULL because that suggests that we don't 
need to sync at all.
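
As a usage sketch (hypothetical caller, only to illustrate the expected
semantics): the caller just needs some fence it can hold a reference on;
getting NULL back while a fence is installed would skip the synchronization
entirely.

#include <linux/dma-fence.h>

static int sync_to_current_fence(struct dma_fence __rcu **slot)
{
    struct dma_fence *fence;
    long err;

    rcu_read_lock();
    fence = dma_fence_get_rcu_safe(slot);   /* fence A or fence B, both fine */
    rcu_read_unlock();

    if (!fence)
        return 0;       /* really nothing to wait for */

    err = dma_fence_wait(fence, false);
    dma_fence_put(fence);
    return err;
}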


Regards,
Christian.


Re: [PATCH] dma-fence: fix dma_fence_get_rcu_safe

2017-09-04 Thread Christian König
I really wonder what's wrong with my mail client, but it looks like this 
patch never made it at least to dri-devel.


Forwarding manually now,
Christian.

On 04.09.2017 at 15:16, Christian König wrote:

From: Christian König <christian.koe...@amd.com>

The logic is buggy and unnecessarily complex. When dma_fence_get_rcu() fails to
acquire a reference it doesn't necessarily mean that there is no fence at all.

It usually means that the fence was replaced by a new one and in this situation
we certainly want to have the new one as the result and *NOT* NULL.

Signed-off-by: Christian König <christian.koe...@amd.com>
Cc: Chris Wilson <ch...@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
Cc: Sumit Semwal <sumit.sem...@linaro.org>
Cc: linux-media@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Cc: linaro-mm-...@lists.linaro.org
---
  include/linux/dma-fence.h | 23 ++-
  1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index a5195a7..37f3d67 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -246,27 +246,8 @@ dma_fence_get_rcu_safe(struct dma_fence * __rcu *fencep)
struct dma_fence *fence;
  
  		fence = rcu_dereference(*fencep);

-   if (!fence || !dma_fence_get_rcu(fence))
-   return NULL;
-
-   /* The atomic_inc_not_zero() inside dma_fence_get_rcu()
-* provides a full memory barrier upon success (such as now).
-* This is paired with the write barrier from assigning
-* to the __rcu protected fence pointer so that if that
-* pointer still matches the current fence, we know we
-* have successfully acquire a reference to it. If it no
-* longer matches, we are holding a reference to some other
-* reallocated pointer. This is possible if the allocator
-* is using a freelist like SLAB_TYPESAFE_BY_RCU where the
-* fence remains valid for the RCU grace period, but it
-* may be reallocated. When using such allocators, we are
-* responsible for ensuring the reference we get is to
-* the right fence, as below.
-*/
-   if (fence == rcu_access_pointer(*fencep))
-   return rcu_pointer_handoff(fence);
-
-   dma_fence_put(fence);
+   if (!fence || dma_fence_get_rcu(fence))
+   return fence;
} while (1);
  }
  





[PATCH] dma-fence: fix dma_fence_get_rcu_safe

2017-09-04 Thread Christian König
From: Christian König <christian.koe...@amd.com>

The logic is buggy and unnecessarily complex. When dma_fence_get_rcu() fails to
acquire a reference it doesn't necessarily mean that there is no fence at all.

It usually means that the fence was replaced by a new one and in this situation
we certainly want to have the new one as the result and *NOT* NULL.

Signed-off-by: Christian König <christian.koe...@amd.com>
Cc: Chris Wilson <ch...@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
Cc: Sumit Semwal <sumit.sem...@linaro.org>
Cc: linux-media@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Cc: linaro-mm-...@lists.linaro.org
---
 include/linux/dma-fence.h | 23 ++-
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index a5195a7..37f3d67 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -246,27 +246,8 @@ dma_fence_get_rcu_safe(struct dma_fence * __rcu *fencep)
struct dma_fence *fence;
 
fence = rcu_dereference(*fencep);
-   if (!fence || !dma_fence_get_rcu(fence))
-   return NULL;
-
-   /* The atomic_inc_not_zero() inside dma_fence_get_rcu()
-* provides a full memory barrier upon success (such as now).
-* This is paired with the write barrier from assigning
-* to the __rcu protected fence pointer so that if that
-* pointer still matches the current fence, we know we
-* have successfully acquire a reference to it. If it no
-* longer matches, we are holding a reference to some other
-* reallocated pointer. This is possible if the allocator
-* is using a freelist like SLAB_TYPESAFE_BY_RCU where the
-* fence remains valid for the RCU grace period, but it
-* may be reallocated. When using such allocators, we are
-* responsible for ensuring the reference we get is to
-* the right fence, as below.
-*/
-   if (fence == rcu_access_pointer(*fencep))
-   return rcu_pointer_handoff(fence);
-
-   dma_fence_put(fence);
+   if (!fence || dma_fence_get_rcu(fence))
+   return fence;
} while (1);
 }
 
-- 
2.7.4




Re: [PATCH] dma-buf: fix reservation_object_wait_timeout_rcu to wait correctly v2

2017-07-31 Thread Christian König

Ping, what do you guys think of that?

On 25.07.2017 at 15:35, Christian König wrote:

From: Christian König <christian.koe...@amd.com>

With hardware resets in mind it is possible that all shared fences are
signaled, but the exclusive one isn't. Fix waiting for everything in this situation.

v2: make sure we always wait for the exclusive fence

Signed-off-by: Christian König <christian.koe...@amd.com>
---
  drivers/dma-buf/reservation.c | 33 +++--
  1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 393817e..9d4316d 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -373,12 +373,25 @@ long reservation_object_wait_timeout_rcu(struct 
reservation_object *obj,
long ret = timeout ? timeout : 1;
  
  retry:

-   fence = NULL;
shared_count = 0;
	seq = read_seqcount_begin(&obj->seq);
rcu_read_lock();
  
-	if (wait_all) {

+   fence = rcu_dereference(obj->fence_excl);
+   if (fence && !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
+   if (!dma_fence_get_rcu(fence))
+   goto unlock_retry;
+
+   if (dma_fence_is_signaled(fence)) {
+   dma_fence_put(fence);
+   fence = NULL;
+   }
+
+   } else {
+   fence = NULL;
+   }
+
+   if (!fence && wait_all) {
struct reservation_object_list *fobj =
rcu_dereference(obj->fence);
  
@@ -405,22 +418,6 @@ long reservation_object_wait_timeout_rcu(struct reservation_object *obj,

}
}
  
-	if (!shared_count) {

-   struct dma_fence *fence_excl = rcu_dereference(obj->fence_excl);
-
-   if (fence_excl &&
-   !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
-			      &fence_excl->flags)) {
-   if (!dma_fence_get_rcu(fence_excl))
-   goto unlock_retry;
-
-   if (dma_fence_is_signaled(fence_excl))
-   dma_fence_put(fence_excl);
-   else
-   fence = fence_excl;
-   }
-   }
-
rcu_read_unlock();
if (fence) {
		if (read_seqcount_retry(&obj->seq, seq)) {





[PATCH] dma-buf: fix reservation_object_wait_timeout_rcu to wait correctly v2

2017-07-25 Thread Christian König
From: Christian König <christian.koe...@amd.com>

With hardware resets in mind it is possible that all shared fences are
signaled, but the exclusive one isn't. Fix waiting for everything in this situation.

v2: make sure we always wait for the exclusive fence

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/dma-buf/reservation.c | 33 +++--
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 393817e..9d4316d 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -373,12 +373,25 @@ long reservation_object_wait_timeout_rcu(struct 
reservation_object *obj,
long ret = timeout ? timeout : 1;
 
 retry:
-   fence = NULL;
shared_count = 0;
	seq = read_seqcount_begin(&obj->seq);
rcu_read_lock();
 
-   if (wait_all) {
+   fence = rcu_dereference(obj->fence_excl);
+   if (fence && !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
+   if (!dma_fence_get_rcu(fence))
+   goto unlock_retry;
+
+   if (dma_fence_is_signaled(fence)) {
+   dma_fence_put(fence);
+   fence = NULL;
+   }
+
+   } else {
+   fence = NULL;
+   }
+
+   if (!fence && wait_all) {
struct reservation_object_list *fobj =
rcu_dereference(obj->fence);
 
@@ -405,22 +418,6 @@ long reservation_object_wait_timeout_rcu(struct 
reservation_object *obj,
}
}
 
-   if (!shared_count) {
-   struct dma_fence *fence_excl = rcu_dereference(obj->fence_excl);
-
-   if (fence_excl &&
-   !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
-			      &fence_excl->flags)) {
-   if (!dma_fence_get_rcu(fence_excl))
-   goto unlock_retry;
-
-   if (dma_fence_is_signaled(fence_excl))
-   dma_fence_put(fence_excl);
-   else
-   fence = fence_excl;
-   }
-   }
-
rcu_read_unlock();
if (fence) {
		if (read_seqcount_retry(&obj->seq, seq)) {
-- 
2.7.4



Re: [PATCH] dma-buf: fix reservation_object_wait_timeout_rcu to wait correctly

2017-07-25 Thread Christian König

On 24.07.2017 at 13:57, Daniel Vetter wrote:

On Mon, Jul 24, 2017 at 11:51 AM, Christian König
<deathsim...@vodafone.de> wrote:

On 24.07.2017 at 10:33, Daniel Vetter wrote:

On Fri, Jul 21, 2017 at 06:20:01PM +0200, Christian König wrote:

From: Christian König <christian.koe...@amd.com>

With hardware resets in mind it is possible that all shared fences are
signaled, but the exclusive one isn't. Fix waiting for everything in this
situation.

How did you end up with both shared and exclusive fences on the same
reservation object? At least I thought the point of exclusive was that
it's exclusive (and has an implicit barrier on all previous shared
fences). Same for shared fences, they need to wait for the exclusive one
(and replace it).

Is this fallout from the amdgpu trickery where by default you do all
shared fences? I thought we've aligned semantics a while back ...


No, that is perfectly normal even for other drivers. Take a look at the
reservation code.

The exclusive fence replaces all shared fences, but adding a shared fence
doesn't replace the exclusive fence. That actually makes sense, because when
you want to add more shared fences those need to wait for the last exclusive
fence as well.

Hm right.


Now normally I would agree that when you have shared fences it is sufficient
to wait for all of them because those operations can't start before the
exclusive one finishes. But with GPU reset and/or the ability to abort
already submitted operations it is perfectly possible that you end up with
an exclusive fence which isn't signaled and a shared fence which is signaled
in the same reservation object.

How does that work? The batch(es) with the shared fence are all
supposed to wait for the exclusive fence before they start, which
means even if you gpu reset and restart/cancel certain things, they
shouldn't be able to complete out of order.


Assume the following:
1. The exclusive fence is some move operation by the kernel which 
executes on a DMA engine.
2. The shared fence is a 3D operation submitted by userspace which 
executes on the 3D engine.


Now we find that the 3D engine is hung and needs a reset; all currently 
submitted jobs are aborted, marked with an error code and their fences 
put into the signaled state.


Since we only reset the 3D engine, the move operation (fortunately) 
isn't affected by this.


I think this applies to all drivers and isn't something amdgpu specific.

Regards,
Christian.



If you outright cancel a fence then you're supposed to first call
dma_fence_set_error(-EIO) and then complete it. Note that atm that
part might be slightly overengineered and I'm not sure about how we
expose stuff to userspace, e.g. dma_fence_set_error(-EAGAIN) is (or
soon, has been) used by i915 for its internal book-keeping, which
might not be the best to leak to other consumers. But never completing
fences (at least exported ones, where userspace or other drivers can
get at them) shouldn't be possible.
-Daniel
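
For reference, the cancel path described above amounts to roughly the
following in a reset handler (a sketch only; struct my_job and the pending
list are hypothetical driver state, not taken from any existing driver):

#include <linux/dma-fence.h>
#include <linux/list.h>

struct my_job {                         /* hypothetical driver job state */
    struct list_head node;
    struct dma_fence *hw_fence;
};

/* Abort everything queued on a hung engine: mark the fence with an error
 * first, then signal it, so waiters wake up and can look at fence->error
 * instead of blocking forever on a job that will never run.
 */
static void my_engine_abort_jobs(struct list_head *pending)
{
    struct my_job *job, *tmp;

    list_for_each_entry_safe(job, tmp, pending, node) {
        dma_fence_set_error(job->hw_fence, -EIO);
        dma_fence_signal(job->hw_fence);
        list_del(&job->node);
        dma_fence_put(job->hw_fence);
        /* the job itself is freed by the driver's usual teardown path */
    }
}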





Re: [PATCH] dma-buf: fix reservation_object_wait_timeout_rcu to wait correctly

2017-07-24 Thread Christian König

On 24.07.2017 at 10:34, zhoucm1 wrote:



On 2017-07-22 00:20, Christian König wrote:

From: Christian König <christian.koe...@amd.com>

With hardware resets in mind it is possible that all shared fences are
signaled, but the exclusive one isn't. Fix waiting for everything in this 
situation.


Signed-off-by: Christian König <christian.koe...@amd.com>
---
  drivers/dma-buf/reservation.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma-buf/reservation.c 
b/drivers/dma-buf/reservation.c

index e2eff86..ce3f9c1 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -461,7 +461,7 @@ long reservation_object_wait_timeout_rcu(struct 
reservation_object *obj,

  }
  }
  -if (!shared_count) {
+if (!fence) {
The previous code seems to be a bug; the exclusive fence isn't waited on at 
all if shared_count != 0.


With your fix, there is still a case where the exclusive fence could be 
skipped, namely when fobj->shared[shared_count-1] isn't signalled.


Yeah, indeed that looks like it needs to be fixed as well.

I'm still completely jet lagged and need to work through tons of stuff 
from last week. Do you have time to take care of fixing up this patch 
and send a v2?


Thanks in advance,
Christian.



Regards,
David Zhou
  struct dma_fence *fence_excl = 
rcu_dereference(obj->fence_excl);

if (fence_excl &&






Re: [PATCH] dma-buf: fix reservation_object_wait_timeout_rcu to wait correctly

2017-07-24 Thread Christian König

On 24.07.2017 at 10:33, Daniel Vetter wrote:

On Fri, Jul 21, 2017 at 06:20:01PM +0200, Christian König wrote:

From: Christian König <christian.koe...@amd.com>

With hardware resets in mind it is possible that all shared fences are
signaled, but the exclusive one isn't. Fix waiting for everything in this situation.

How did you end up with both shared and exclusive fences on the same
reservation object? At least I thought the point of exclusive was that
it's exclusive (and has an implicit barrier on all previous shared
fences). Same for shared fences, they need to wait for the exclusive one
(and replace it).

Is this fallout from the amdgpu trickery where by default you do all
shared fences? I thought we've aligned semantics a while back ...


No, that is perfectly normal even for other drivers. Take a look at the 
reservation code.


The exclusive fence replaces all shared fences, but adding a shared 
fence doesn't replace the exclusive fence. That actually makes sense, 
because when you want to add more shared fences those need to wait for the 
last exclusive fence as well.


Now normally I would agree that when you have shared fences it is 
sufficient to wait for all of them because those operations can't start 
before the exclusive one finishes. But with GPU reset and/or the ability 
to abort already submitted operations it is perfectly possible that you 
end up with an exclusive fence which isn't signaled and a shared fence 
which is signaled in the same reservation object.


Christian.
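
For reference, the add-shared/add-exclusive semantics described above boil
down to this (a sketch against the 2017-era reservation_object API; the
caller is assumed to hold the reservation lock, i.e. resv->lock):

#include <linux/reservation.h>
#include <linux/dma-fence.h>

static int publish_fences(struct reservation_object *resv,
                          struct dma_fence *excl,
                          struct dma_fence *shared)
{
    int ret;

    /* Replaces both the old exclusive fence and all shared fences. */
    reservation_object_add_excl_fence(resv, excl);

    /* Does not touch the exclusive fence; only grows the shared list,
     * so later users still have to wait on the exclusive fence too.
     */
    ret = reservation_object_reserve_shared(resv);
    if (ret)
        return ret;
    reservation_object_add_shared_fence(resv, shared);

    return 0;
}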


-Daniel


Signed-off-by: Christian König <christian.koe...@amd.com>
---
  drivers/dma-buf/reservation.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index e2eff86..ce3f9c1 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -461,7 +461,7 @@ long reservation_object_wait_timeout_rcu(struct 
reservation_object *obj,
}
}
  
-	if (!shared_count) {

+   if (!fence) {
struct dma_fence *fence_excl = rcu_dereference(obj->fence_excl);
  
  		if (fence_excl &&

--
2.7.4






[PATCH] dma-buf: fix reservation_object_wait_timeout_rcu to wait correctly

2017-07-21 Thread Christian König
From: Christian König <christian.koe...@amd.com>

With hardware resets in mind it is possible that all shared fences are
signaled, but the exclusive one isn't. Fix waiting for everything in this situation.

Signed-off-by: Christian König <christian.koe...@amd.com>
---
 drivers/dma-buf/reservation.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index e2eff86..ce3f9c1 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -461,7 +461,7 @@ long reservation_object_wait_timeout_rcu(struct 
reservation_object *obj,
}
}
 
-   if (!shared_count) {
+   if (!fence) {
struct dma_fence *fence_excl = rcu_dereference(obj->fence_excl);
 
if (fence_excl &&
-- 
2.7.4



Re: [PATCH] dma-buf: avoid scheduling on fence status query v2

2017-04-26 Thread Christian König

On 26.04.2017 at 16:46, Andres Rodriguez wrote:

When a timeout of zero is specified, the caller is only interested in
the fence status.

In the current implementation, dma_fence_default_wait will always call
schedule_timeout() at least once for an unsignaled fence. This adds a
significant overhead to a fence status query.

Avoid this overhead by returning early if a zero timeout is specified.

v2: move early return after enable_signaling

Signed-off-by: Andres Rodriguez <andre...@gmail.com>


Reviewed-by: Christian König <christian.koe...@amd.com>


---

  If I'm understanding correctly, I don't think we need to register the
  default wait callback. But if that isn't the case please let me know.

  This patch has the same perf improvements as v1.

  drivers/dma-buf/dma-fence.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 0918d3f..57da14c 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -402,6 +402,11 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, 
signed long timeout)
}
}
  
+	if (!timeout) {

+   ret = 0;
+   goto out;
+   }
+
cb.base.func = dma_fence_default_wait_cb;
cb.task = current;
	list_add(&cb.base.node, &fence->cb_list);





Re: [PATCH] dma-buf: avoid scheduling on fence status query

2017-04-26 Thread Christian König

On 26.04.2017 at 11:59, Dave Airlie wrote:

On 26 April 2017 at 17:20, Christian König <deathsim...@vodafone.de> wrote:

NAK, I'm wondering how often I have to reject that change. We should
probably add a comment here.

Even with a zero timeout we still need to enable signaling, otherwise some
fence will never signal if userspace just polls on them.

If a caller is only interested in the fence status without enabling the
signaling it should call dma_fence_is_signaled() instead.

Can we not move the return 0 (with spin unlock) down to after we enable
signalling, but before we enter the schedule_timeout(1)?


Yes, that would be an option.

Christian.



Dave.





Re: [PATCH v2 01/21] scatterlist: Introduce sg_map helper functions

2017-04-26 Thread Christian König

On 25.04.2017 at 20:20, Logan Gunthorpe wrote:

This patch introduces functions which kmap the pages inside an sgl.
These functions replace a common pattern of kmap(sg_page(sg)) that is
used in more than 50 places within the kernel.

The motivation for this work is to eventually safely support sgls that
contain io memory. In order for that to work, any access to the contents
of an iomem SGL will need to be done with iomemcpy or hit some warning.
(The exact details of how this will work have yet to be worked out.)
Having all the kmaps in one place is just a first step in that
direction. Additionally, seeing this helps cut down the users of sg_page,
it should make any effort to go to struct-page-less DMAs a little
easier (should that idea ever swing back into favour again).

A flags option is added to select between a regular or atomic mapping so
these functions can replace kmap(sg_page(...)) or kmap_atomic(sg_page(...)).
Future work may expand this to have flags for using page_address or
vmap. We include a flag to require the function not to fail to
support legacy code that has no easy error path. Much further in the
future, there may be a flag to allocate memory and copy the data
from/to iomem.

We also add the semantic that sg_map can fail to create a mapping,
despite the fact that the current code this is replacing is assumed to
never fail and the current version of these functions cannot fail. This
is to support iomem which may either have to fail to create the mapping or
allocate memory as a bounce buffer which itself can fail.

Also, in terms of cleanup, a few of the existing kmap(sg_page) users
play things a bit loose in terms of whether they apply sg->offset
so using these helper functions should help avoid such issues.

Signed-off-by: Logan Gunthorpe <log...@deltatee.com>
---


Good to know that somebody is working on this. Those problems troubled 
us as well.


Patch is Acked-by: Christian König <christian.koe...@amd.com>.

Regards,
Christian.
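
For context, a caller of the proposed helper would look roughly like this
(hypothetical sketch based on the sg_map() definition quoted below; the
sg_unmap() counterpart and its exact signature are assumed from the same
series):

#include <linux/scatterlist.h>
#include <linux/err.h>

/* Sketch: read out of the first sgl entry via sg_map() instead of
 * open-coding kmap(sg_page(sg)) and applying sg->offset by hand.
 */
static int read_first_entry(struct scatterlist *sg, void *dst, size_t len)
{
    void *vaddr;

    vaddr = sg_map(sg, 0, SG_KMAP);
    if (IS_ERR(vaddr))
        return PTR_ERR(vaddr);

    memcpy(dst, vaddr, min_t(size_t, len, sg->length));

    /* assumed counterpart helper from the same patch series */
    sg_unmap(sg, vaddr, 0, SG_KMAP);
    return 0;
}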


  include/linux/scatterlist.h | 85 +
  1 file changed, 85 insertions(+)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index cb3c8fe..fad170b 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -5,6 +5,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  
  struct scatterlist {

@@ -126,6 +127,90 @@ static inline struct page *sg_page(struct scatterlist *sg)
return (struct page *)((sg)->page_link & ~0x3);
  }
  
+#define SG_KMAP		 (1 << 0)	/* create a mapping with kmap */

+#define SG_KMAP_ATOMIC  (1 << 1) /* create a mapping with kmap_atomic 
*/
+#define SG_MAP_MUST_NOT_FAIL (1 << 2)/* indicate sg_map should not fail */
+
+/**
+ * sg_map - kmap a page inside an sgl
+ * @sg:SG entry
+ * @offset:Offset into entry
+ * @flags: Flags for creating the mapping
+ *
+ * Description:
+ *   Use this function to map a page in the scatterlist at the specified
+ *   offset. sg->offset is already added for you. Note: the semantics of
+ *   this function are that it may fail. Thus, its output should be checked
+ *   with IS_ERR and PTR_ERR. Otherwise, a pointer to the specified offset
+ *   in the mapped page is returned.
+ *
+ *   Flags can be any of:
+ * * SG_KMAP   - Use kmap to create the mapping
+ * * SG_KMAP_ATOMIC- Use kmap_atomic to map the page atomically.
+ *   Thus, the rules of that function apply: the
+ *   cpu may not sleep until it is unmapped.
+ * * SG_MAP_MUST_NOT_FAIL  - Indicate that sg_map must not fail.
+ *   If it does, it will issue a BUG_ON instead.
+ *   This is intended for legacy code only, it
+ *   is not to be used in new code.
+ *
+ *   Also, consider carefully whether this function is appropriate. It is
+ *   largely not recommended for new code and if the sgl came from another
+ *   subsystem and you don't know what kind of memory might be in the list
+ *   then you definitely should not call it. Non-mappable memory may be in
+ *   the sgl and thus this function may fail unexpectedly. Consider using
+ *   sg_copy_to_buffer instead.
+ **/
+static inline void *sg_map(struct scatterlist *sg, size_t offset, int flags)
+{
+   struct page *pg;
+   unsigned int pg_off;
+   void *ret;
+
+   offset += sg->offset;
+   pg = nth_page(sg_page(sg), offset >> PAGE_SHIFT);
+   pg_off = offset_in_page(offset);
+
+   if (flags & SG_KMAP_ATOMIC)
+   ret = kmap_atomic(pg) + pg_off;
+   else if (flags & SG_KMAP)
+   ret = kmap(pg) + pg_off;
+   else
+   ret = ERR_PTR(-EINVAL);
+
+   /*
+* In theory, this can't happen yet. Once we start adding
+* unmapable memory, it also shouldn't happen unless

Re: [PATCH] dma-buf: avoid scheduling on fence status query

2017-04-26 Thread Christian König
NAK, I'm wondering how often I have to reject that change. We should 
probably add a comment here.


Even with a zero timeout we still need to enable signaling, otherwise 
some fence will never signal if userspace just polls on them.


If a caller is only interested in the fence status without enabling the 
signaling it should call dma_fence_is_signaled() instead.


Regards,
Christian.
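
Put differently (illustrative sketch only), a pure status query and a
zero-timeout wait are two different operations:

#include <linux/dma-fence.h>

/* Pure status query: cheap, but does not enable signaling, so it must not
 * be the only thing a polling loop ever does.
 */
static bool fence_done_query(struct dma_fence *fence)
{
    return dma_fence_is_signaled(fence);
}

/* Zero-timeout wait: still has to enable signaling first, otherwise a
 * fence whose driver only signals after enable_signaling() could be
 * polled forever without ever completing.
 */
static bool fence_done_poll(struct dma_fence *fence)
{
    return dma_fence_wait_timeout(fence, false, 0) > 0;
}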

On 26.04.2017 at 04:50, Andres Rodriguez wrote:

CC a few extra lists I missed.

Regards,
Andres

On 2017-04-25 09:36 PM, Andres Rodriguez wrote:

When a timeout of zero is specified, the caller is only interested in
the fence status.

In the current implementation, dma_fence_default_wait will always call
schedule_timeout() at least once for an unsignaled fence. This adds a
significant overhead to a fence status query.

Avoid this overhead by returning early if a zero timeout is specified.

Signed-off-by: Andres Rodriguez 
---

This heavily affects the performance of the Source2 engine running on
radv.

This patch improves dota2(radv) perf on a i7-6700k+RX480 system from
72fps->81fps.

 drivers/dma-buf/dma-fence.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 0918d3f..348e9e2 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -380,6 +380,9 @@ dma_fence_default_wait(struct dma_fence *fence, 
bool intr, signed long timeout)

 if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
 return ret;

+if (!timeout)
+return 0;
+
 spin_lock_irqsave(fence->lock, flags);

 if (intr && signal_pending(current)) {





Re: [PATCH] dma-buf: fence debugging

2017-03-31 Thread Christian König

On 31.03.2017 at 12:00, Russell King wrote:

Add debugfs output to report shared and exclusive fences on a dma_buf
object.  This produces output such as:

Dma-buf Objects:
size        flags       mode        count       exp_name
0829440000050005drm
 Exclusive fence: etnaviv 134000.gpu signalled
 Attached Devices:
 gpu-subsystem
Total 1 devices attached


Total 1 objects, 8294400 bytes


Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>


Reviewed-by: Christian König <christian.koe...@amd.com>


---
  drivers/dma-buf/dma-buf.c | 34 +-
  1 file changed, 33 insertions(+), 1 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 0007b792827b..f72aaacbe023 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1059,7 +1059,11 @@ static int dma_buf_debug_show(struct seq_file *s, void 
*unused)
int ret;
struct dma_buf *buf_obj;
struct dma_buf_attachment *attach_obj;
-   int count = 0, attach_count;
+   struct reservation_object *robj;
+   struct reservation_object_list *fobj;
+   struct dma_fence *fence;
+   unsigned seq;
+   int count = 0, attach_count, shared_count, i;
size_t size = 0;
  
	ret = mutex_lock_interruptible(&db_list.lock);

@@ -1085,6 +1090,34 @@ static int dma_buf_debug_show(struct seq_file *s, void 
*unused)
file_count(buf_obj->file),
buf_obj->exp_name);
  
+		robj = buf_obj->resv;

+   while (true) {
+   seq = read_seqcount_begin(&robj->seq);
+   rcu_read_lock();
+   fobj = rcu_dereference(robj->fence);
+   shared_count = fobj ? fobj->shared_count : 0;
+   fence = rcu_dereference(robj->fence_excl);
+   if (!read_seqcount_retry(&robj->seq, seq))
+   break;
+   rcu_read_unlock();
+   }
+
+   if (fence)
+   seq_printf(s, "\tExclusive fence: %s %s %ssignalled\n",
+  fence->ops->get_driver_name(fence),
+  fence->ops->get_timeline_name(fence),
+  dma_fence_is_signaled(fence) ? "" : "un");
+   for (i = 0; i < shared_count; i++) {
+   fence = rcu_dereference(fobj->shared[i]);
+   if (!dma_fence_get_rcu(fence))
+   continue;
+   seq_printf(s, "\tShared fence: %s %s %ssignalled\n",
+  fence->ops->get_driver_name(fence),
+  fence->ops->get_timeline_name(fence),
+  dma_fence_is_signaled(fence) ? "" : "un");
+   }
+   rcu_read_unlock();
+
seq_puts(s, "\tAttached Devices:\n");
attach_count = 0;
  





Re: [PATCH] dma-buf: add support for compat ioctl

2017-02-21 Thread Christian König

On 21.02.2017 at 15:55, Marek Szyprowski wrote:

Dear All,

On 2017-02-21 15:37, Marek Szyprowski wrote:

Hi Christian,

On 2017-02-21 14:59, Christian König wrote:

On 21.02.2017 at 14:21, Marek Szyprowski wrote:
Add compat ioctl support to dma-buf. This lets one use the
DMA_BUF_IOCTL_SYNC ioctl from a 32-bit application on a 64-bit kernel.
Data structures for both 32- and 64-bit modes are the same, so there is
no need for an additional translation layer.


Well I might be wrong, but IIRC compat_ioctl was just optional and 
if not specified unlocked_ioctl was called instead.


If that is true your patch wouldn't have any effect at all.


Well, then why did I get -ENOTTY in the 32-bit test app for this ioctl on a 
64-bit ARM64 kernel without this patch?




I've checked fs/compat_ioctl.c and I see no fallback in COMPAT_SYSCALL_DEFINE3,
so one has to provide a compat_ioctl callback to have the ioctl working with
32-bit apps.
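
For reference, the kind of 32-bit test that hits this looks roughly like the
following (hypothetical userspace sketch, built with -m32; without a
compat_ioctl handler the ioctl fails with ENOTTY on a 64-bit kernel):

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

static int cpu_access(int dmabuf_fd)
{
    struct dma_buf_sync sync = {
        .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW,
    };

    if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync) < 0) {
        perror("DMA_BUF_IOCTL_SYNC");   /* ENOTTY without the compat handler */
        return -1;
    }

    /* ... CPU reads/writes of the mmap()ed buffer go here ... */

    sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
    return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}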


Then my memory cheated on me.

In this case the patch is Reviewed-by: Christian König 
<christian.koe...@amd.com>.


Regards,
Christian.



Best regards





Re: [PATCH] dma-buf: add support for compat ioctl

2017-02-21 Thread Christian König

On 21.02.2017 at 14:21, Marek Szyprowski wrote:

Add compat ioctl support to dma-buf. This lets one use the DMA_BUF_IOCTL_SYNC
ioctl from a 32-bit application on a 64-bit kernel. Data structures for both
32- and 64-bit modes are the same, so there is no need for an additional
translation layer.


Well I might be wrong, but IIRC compat_ioctl was just optional and if 
not specified unlocked_ioctl was called instead.


If that is true your patch wouldn't have any effect at all.

Regards,
Christian.



Signed-off-by: Marek Szyprowski 
---
  drivers/dma-buf/dma-buf.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 718f832a5c71..0007b792827b 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -325,6 +325,9 @@ static long dma_buf_ioctl(struct file *file,
.llseek = dma_buf_llseek,
.poll   = dma_buf_poll,
.unlocked_ioctl = dma_buf_ioctl,
+#ifdef CONFIG_COMPAT
+   .compat_ioctl   = dma_buf_ioctl,
+#endif
  };
  
  /*





Re: Enabling peer to peer device transactions for PCIe devices

2017-01-13 Thread Christian König

On 12.01.2017 at 16:11, Jerome Glisse wrote:

On Wed, Jan 11, 2017 at 10:54:39PM -0600, Stephen Bates wrote:

On Fri, January 6, 2017 4:10 pm, Logan Gunthorpe wrote:


On 06/01/17 11:26 AM, Jason Gunthorpe wrote:



Make a generic API for all of this and you'd have my vote..


IMHO, you must support basic pinning semantics - that is necessary to
support generic short lived DMA (eg filesystem, etc). That hardware can
clearly do that if it can support ODP.

I agree completely.


What we want is for RDMA, O_DIRECT, etc to just work with special VMAs
(ie. at least those backed with ZONE_DEVICE memory). Then
GPU/NVME/DAX/whatever drivers can just hand these VMAs to userspace
(using whatever interface is most appropriate) and userspace can do what
it pleases with them. This makes _so_ much sense and actually largely
already works today (as demonstrated by iopmem).

+1 for iopmem ;-)

I feel like we are going around and around on this topic. I would like to
see something that is upstream that enables P2P even if it is only the
minimum viable useful functionality to begin. I think aiming for the moon
(which is what HMM and things like it are) is simply going to take more
time, if it ever gets there.

There is a use case for in-kernel P2P PCIe transfers between two NVMe
devices and between an NVMe device and an RDMA NIC (using NVMe CMBs or
BARs on the NIC). I am even seeing users who now want to move data P2P
between FPGAs and NVMe SSDs and the upstream kernel should be able to
support these users or they will look elsewhere.

The iopmem patchset addressed all the use cases above and while it is not
an in kernel API it could have been modified to be one reasonably easily.
As Logan states the driver can then choose to pass the VMAs to user-space
in a manner that makes sense.

Earlier in the thread someone mentioned LSF/MM. There is already a
proposal to discuss this topic so if you are interested please respond to
the email letting the committee know this topic is of interest to you [1].

Also earlier in the thread someone discussed the issues around the IOMMU.
Given the known issues around P2P transfers in certain CPU root complexes
[2] it might just be a case of only allowing P2P when a PCIe switch
connects the two EPs. Another option is just to use CONFIG_EXPERT and make
sure people are aware of the pitfalls if they invoke the P2P option.


iopmem is not applicable to GPUs. What I propose is to split the issue in two,
so that everyone can reuse the part that needs to be common, namely the DMA
API part where you have to create an IOMMU mapping for one device to point
to the other device's memory.

We can have a DMA API that is agnostic to how the device memory is managed
(so it does not matter whether the device memory has struct pages or not).
This is what I have been arguing in this thread. To make progress on this
issue we need to stop conflating different use cases.

So I say let's solve the IOMMU issue first and let everyone use it in their
own way with their device. I do not think we can share much more than that.


Yeah, exactly what I said from the very beginning as well. Just hacking 
together quick solutions doesn't really solve the problem in the long term.


What we need is proper adjusting of the DMA API towards handling of P2P 
and then build solutions for the different use cases on top of that.


We should also avoid falling into the trap of trying to just handle the 
existing get_user_pages and co interfaces so that the existing code 
doesn't need to change. P2P needs to be validated for each use case 
individually and not implemented in workarounds with fingers crossed and 
hoping for the best.


Regards,
Christian.



Cheers,
Jérôme





Re: Enabling peer to peer device transactions for PCIe devices

2016-11-27 Thread Christian König

On 27.11.2016 at 15:02, Haggai Eran wrote:

On 11/25/2016 9:32 PM, Jason Gunthorpe wrote:

On Fri, Nov 25, 2016 at 02:22:17PM +0100, Christian König wrote:


Like you say below we have to handle short lived in the usual way, and
that covers basically every device except IB MRs, including the
command queue on a NVMe drive.

Well a problem which wasn't mentioned so far is that while GPUs do have a
page table to mirror the CPU page table, they usually can't recover from
page faults.
So what we do is make sure that all memory accessed by the GPU jobs stays
in place while those jobs run (pretty much the same pinning you do for the
DMA).

Yes, it is DMA, so this is a valid approach.

But, you don't need page faults from the GPU to do proper coherent
page table mirroring. Basically when the driver submits the work to
the GPU it 'faults' the pages into the CPU and mirror translation
table (instead of pinning).

Like in ODP, MMU notifiers/HMM are used to monitor for translation
changes. If a change comes in the GPU driver checks if an executing
command is touching those pages and blocks the MMU notifier until the
command flushes, then unfaults the page (blocking future commands) and
unblocks the mmu notifier.

I think blocking mmu notifiers against something that is basically
controlled by user-space can be problematic. This can block things like
memory reclaim. If you have user-space access to the device's queues,
user-space can block the mmu notifier forever.

Really good point.

I think this means that, if we don't have recoverable page faults, the 
bare minimum is to have preemption support like Felix described in his 
answer as well.


Going to keep that in mind,
Christian.
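
For reference, the mirroring scheme discussed above boils down to an MMU
notifier along these lines (a rough sketch against the 4.x-era notifier API;
struct my_mirror is hypothetical and real drivers are considerably more
involved):

#include <linux/mmu_notifier.h>
#include <linux/dma-fence.h>

struct my_mirror {                      /* hypothetical driver state */
    struct mmu_notifier mn;
    struct dma_fence *active;           /* last job using the mirrored range */
};

/* Block the invalidation until the commands touching these pages have
 * flushed, then drop the GPU-side translations.  Note the concern raised
 * above: this wait runs in paths like reclaim, so userspace must not be
 * able to stall the fence indefinitely.
 */
static void my_invalidate_range_start(struct mmu_notifier *mn,
                                      struct mm_struct *mm,
                                      unsigned long start, unsigned long end)
{
    struct my_mirror *m = container_of(mn, struct my_mirror, mn);

    if (m->active)
        dma_fence_wait(m->active, false);

    /* driver specific: unmap [start, end) from the GPU page tables */
}

static const struct mmu_notifier_ops my_mirror_ops = {
    .invalidate_range_start = my_invalidate_range_start,
};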


