Re: [PATCH] drm/scheduler: fix setting the priorty for entities - bisected

2018-08-07 Thread Dieter Nützel

On 06.08.2018 02:13, Dieter Nützel wrote:

On 04.08.2018 06:18, Dieter Nützel wrote:

On 04.08.2018 06:12, Dieter Nützel wrote:

On 04.08.2018 05:27, Dieter Nützel wrote:

On 03.08.2018 13:09, Christian König wrote:

On 03.08.2018 03:08, Dieter Nützel wrote:

Hello Christian, AMD guys,

this one _together_ with this series
[PATCH 1/7] drm/amdgpu: use new scheduler load balancing for VMs
https://lists.freedesktop.org/archives/amd-gfx/2018-August/024802.html

on top of
amd-staging-drm-next 53d5f1e4a6d9

freezes the whole system (Intel Xeon X3470, RX580) on the _first_ mouse
move.

Same at sddm login or on the first move in KDE Plasma 5.
NO logs so far. - Expected?


Not even remotely. Can you double-check which patch from the "[PATCH
1/7] drm/amdgpu: use new scheduler load balancing for VMs" series is
causing the issue?


Oops,

_both_ series on top of

bf1fd52b0632 (origin/amd-staging-drm-next) drm/amdgpu: move gem
definitions into amdgpu_gem header

work without a hitch.

But I have new (latest) µcode from openSUSE Tumbleweed.
kernel-firmware-20180730-35.1.src.rpm

Tested-by: Dieter Nützel 


I take this back.

It lasted much longer this time, but then the mouse froze again.
I could grab a dmesg remotely via phone ;-)

See the attachment.
Dieter


Argh, shi...
that was the wrong dmesg version.

This should be the right one. (For sure...)


Phew,

this took some time...
During the 'last' git bisect run (the one that printed 'first bad commit
is') I got the next freeze.

But I could get a new dmesg.log file remotely via phone (see the attachment).

git bisect log shows this:

SOURCE/amd-staging-drm-next> git bisect log
git bisect start
# good: [adebfff9c806afe1143d69a0174d4580cd27b23d] drm/scheduler: fix
setting the priorty for entities
git bisect good adebfff9c806afe1143d69a0174d4580cd27b23d
# bad: [43202e67a4e6fcb0e6b773e8eb1ed56e1721e882] drm/amdgpu: use
entity instead of ring for CS
git bisect bad 43202e67a4e6fcb0e6b773e8eb1ed56e1721e882
# bad: [9867b3a6ddfb73ee3105871541053f8e49949478] drm/amdgpu: use
scheduler load balancing for compute CS
git bisect bad 9867b3a6ddfb73ee3105871541053f8e49949478
# good: [5d097a4591aa2be16b21adbaa19a8abb76e47ea1] drm/amdgpu: use
scheduler load balancing for SDMA CS
git bisect good 5d097a4591aa2be16b21adbaa19a8abb76e47ea1
# first bad commit: [9867b3a6ddfb73ee3105871541053f8e49949478]
drm/amdgpu: use scheduler load balancing for compute CS

git log --oneline
5d097a4591aa (HEAD,
refs/bisect/good-5d097a4591aa2be16b21adbaa19a8abb76e47ea1) drm/amdgpu:
use scheduler load balancing for SDMA CS
d12ae5172f1f drm/amdgpu: use new scheduler load balancing for VMs
adebfff9c806
(refs/bisect/good-adebfff9c806afe1143d69a0174d4580cd27b23d)
drm/scheduler: fix setting the priorty for entities
bf1fd52b0632 (origin/amd-staging-drm-next) drm/amdgpu: move gem
definitions into amdgpu_gem header
5031ae5f9e5c drm/amdgpu: move psp macro into amdgpu_psp header
[-]

I'm not really sure that
drm/amdgpu: use scheduler load balancing for compute CS
is the offender.

It could also be one step earlier:
drm/amdgpu: use scheduler load balancing for SDMA CS

I'll try running with the SDMA CS patch for the next few days.

If you need more, ask!


Hello Christian,

running for the second day _without_ the 2nd patch
[2/7] drm/amdgpu: use scheduler load balancing for SDMA CS
my system is stable again.

To be clear:
I now have only #1 applied on top of amd-staging-drm-next.
'This one' (the patch in this thread) is still in.
So we should probably switch threads.

Dieter




Thanks,
Christian.



Greetings,
Dieter

On 01.08.2018 16:27, Christian König wrote:
Since we now deal with multiple rq we need to update all of them, not
just the current one.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   |  3 +--
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 36 ---
 include/drm/gpu_scheduler.h   |  5 ++---
 3 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index df6965761046..9fcc14e2dfcf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -407,12 +407,11 @@ void amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx,

 for (i = 0; i < adev->num_rings; i++) {
 ring = adev->rings[i];
 entity = &ctx->rings[i].entity;
-    rq = &ring->sched.sched_rq[ctx_prio];

 if (ring->funcs->type == AMDGPU_RING_TYPE_KIQ)
 continue;

-    drm_sched_entity_set_rq(entity, rq);
+    drm_sched_entity_set_priority(entity, ctx_prio);
 }
 }

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 05dc6ecd4003..85908c7f913e 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -419,29 +419,39 @@ static void drm_sched_entity_clear_dep(struct dma_fence *f, struct dma_fence_cb
 }

 /**
- * drm_sched_entity_set_rq - Sets the run queue for an entity
+ * drm_sched_entity_set_rq_priority - helper for 

Re: [PATCH libdrm 2/4] amdgpu: add a function to find bo by cpu mapping (v2)

2018-08-07 Thread zhoucm1



On 2018-08-08 12:08, Junwei Zhang wrote:

Userspace needs to know if the user memory is from BO or malloc.

v2: update mutex range and rebase

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu.h| 23 +++
  amdgpu/amdgpu_bo.c | 34 ++
  2 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index be83b45..a8c353c 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -678,6 +678,29 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle 
dev,
amdgpu_bo_handle *buf_handle);
  
  /**

+ * Validate if the user memory comes from BO
+ *
+ * \param dev - [in] Device handle. See #amdgpu_device_initialize()
+ * \param cpu - [in] CPU address of user allocated memory which we
+ * want to map to GPU address space (make GPU accessible)
+ * (This address must be correctly aligned).
+ * \param size - [in] Size of allocation (must be correctly aligned)
+ * \param buf_handle - [out] Buffer handle for the userptr memory
+ * if the user memory is not from BO, the buf_handle will be NULL.
+ * \param offset_in_bo - [out] offset in this BO for this user memory
+ *
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+ void *cpu,
+ uint64_t size,
+ amdgpu_bo_handle *buf_handle,
+ uint64_t *offset_in_bo);
+
+/**
   * Free previosuly allocated memory
   *
   * \param   dev  - \c [in] Device handle. See 
#amdgpu_device_initialize()
diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index b24e698..a7f0662 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -529,6 +529,40 @@ int amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,
}
  }
  
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,

+ void *cpu,
+ uint64_t size,
+ amdgpu_bo_handle *buf_handle,
+ uint64_t *offset_in_bo)
+{
+   int i;
+   struct amdgpu_bo *bo;
+
+   if (cpu == NULL || size == 0)
+   return -EINVAL;
+
+   pthread_mutex_lock(&dev->bo_table_mutex);
+   for (i = 0; i < dev->bo_handles.max_key; i++) {

Hi Jerry,

As Christian pointed out before, iterating over all BOs of the device
introduces a lot of CPU overhead; this isn't a good direction.
Since the CPU virtual address is per-process, you should go to the kernel
and look it up in the VM tree instead, which obviously takes less time
(see the sketch at the end of this message).

Regards,
David Zhou

+   bo = handle_table_lookup(&dev->bo_handles, i);
+   if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
+   continue;
+   if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + bo->alloc_size))
+   break;
+   }
+
+   if (i < dev->bo_handles.max_key) {
+   atomic_inc(&bo->refcount);
+   *buf_handle = bo;
+   *offset_in_bo = cpu - bo->cpu_ptr;
+   } else {
+   *buf_handle = NULL;
+   *offset_in_bo = 0;
+   }
+   pthread_mutex_unlock(&dev->bo_table_mutex);
+
+   return 0;
+}
+
  int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
void *cpu,
uint64_t size,


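To make the cost concern above concrete: if the per-process mappings were kept
sorted by start address, the covering BO could be found with a binary search
instead of walking every handle slot. This is only a userspace illustration of
the idea (the kernel-side VM tree lookup David suggests is the proper place);
all names below are hypothetical and not part of libdrm:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process bookkeeping: one entry per CPU mapping,
 * kept sorted by 'start' and assumed non-overlapping. */
struct mapping {
	uintptr_t start;   /* first mapped byte */
	uint64_t  size;    /* length of the mapping */
	void     *bo;      /* opaque handle of the backing BO */
};

/* Binary search for the mapping that contains 'addr':
 * O(log n) instead of scanning every BO handle. */
static struct mapping *find_mapping(struct mapping *maps, size_t n,
				    uintptr_t addr)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < maps[mid].start)
			hi = mid;
		else if (addr >= maps[mid].start + maps[mid].size)
			lo = mid + 1;
		else
			return &maps[mid];   /* addr falls inside this mapping */
	}
	return NULL;   /* not backed by a known BO */
}
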


[PATCH libdrm 3/4] tests/amdgpu: add test for finding bo by CPU mapping

2018-08-07 Thread Junwei Zhang
Add a test for API to query bo by CPU mapping

Signed-off-by: Junwei Zhang 
Reviewed-by: Christian König 
---
 tests/amdgpu/bo_tests.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/tests/amdgpu/bo_tests.c b/tests/amdgpu/bo_tests.c
index 9d4da4a..dc2de9b 100644
--- a/tests/amdgpu/bo_tests.c
+++ b/tests/amdgpu/bo_tests.c
@@ -27,6 +27,7 @@
 
 #include "amdgpu_test.h"
 #include "amdgpu_drm.h"
+#include "amdgpu_internal.h"
 
 #define BUFFER_SIZE (4*1024)
 #define BUFFER_ALIGN (4*1024)
@@ -44,6 +45,7 @@ static void amdgpu_bo_metadata(void);
 static void amdgpu_bo_map_unmap(void);
 static void amdgpu_memory_alloc(void);
 static void amdgpu_mem_fail_alloc(void);
+static void amdgpu_bo_find_by_cpu_mapping(void);
 
 CU_TestInfo bo_tests[] = {
{ "Export/Import",  amdgpu_bo_export_import },
@@ -51,6 +53,7 @@ CU_TestInfo bo_tests[] = {
{ "CPU map/unmap",  amdgpu_bo_map_unmap },
{ "Memory alloc Test",  amdgpu_memory_alloc },
{ "Memory fail alloc Test",  amdgpu_mem_fail_alloc },
+   { "Find bo by CPU mapping",  amdgpu_bo_find_by_cpu_mapping },
CU_TEST_INFO_NULL,
 };
 
@@ -262,3 +265,33 @@ static void amdgpu_mem_fail_alloc(void)
CU_ASSERT_EQUAL(r, 0);
}
 }
+
+static void amdgpu_bo_find_by_cpu_mapping(void)
+{
+   amdgpu_bo_handle bo_handle, find_bo_handle;
+   amdgpu_va_handle va_handle;
+   void *bo_cpu;
+   uint64_t bo_mc_address;
+   uint64_t offset;
+   int r;
+
+   r = amdgpu_bo_alloc_and_map(device_handle, 4096, 4096,
+   AMDGPU_GEM_DOMAIN_GTT, 0,
+   &bo_handle, &bo_cpu,
+   &bo_mc_address, &va_handle);
+   CU_ASSERT_EQUAL(r, 0);
+
+   r = amdgpu_find_bo_by_cpu_mapping(device_handle,
+ bo_cpu,
+ 4096,
+ &find_bo_handle,
+ &offset);
+   CU_ASSERT_EQUAL(r, 0);
+   CU_ASSERT_EQUAL(offset, 0);
+   CU_ASSERT_EQUAL(bo_handle->handle, find_bo_handle->handle);
+
+   atomic_dec(&find_bo_handle->refcount, 1);
+   r = amdgpu_bo_unmap_and_free(bo_handle, va_handle,
+bo_mc_address, 4096);
+   CU_ASSERT_EQUAL(r, 0);
+}
-- 
1.9.1



[PATCH libdrm 4/4] amdgpu: add a function to create amdgpu bo internally

2018-08-07 Thread Junwei Zhang
Add a helper function to create and initialize an amdgpu bo.

Signed-off-by: Junwei Zhang 
---
 amdgpu/amdgpu_bo.c | 81 ++
 1 file changed, 33 insertions(+), 48 deletions(-)

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index a7f0662..59cba69 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -48,11 +48,39 @@ static void amdgpu_close_kms_handle(amdgpu_device_handle 
dev,
drmIoctl(dev->fd, DRM_IOCTL_GEM_CLOSE, &args);
 }
 
+static int amdgpu_bo_create(amdgpu_device_handle dev,
+   uint64_t size,
+   uint32_t handle,
+   amdgpu_bo_handle *buf_handle)
+{
+   struct amdgpu_bo *bo;
+   int r = 0;
+
+   bo = calloc(1, sizeof(struct amdgpu_bo));
+   if (!bo)
+   return -ENOMEM;
+
+   atomic_set(&bo->refcount, 1);
+   bo->dev = dev;
+   bo->alloc_size = size;
+   bo->handle = handle;
+   pthread_mutex_init(&bo->cpu_access_mutex, NULL);
+
+   pthread_mutex_lock(&bo->dev->bo_table_mutex);
+   r = handle_table_insert(&bo->dev->bo_handles, bo->handle, bo);
+   pthread_mutex_unlock(&bo->dev->bo_table_mutex);
+   if (r)
+   amdgpu_bo_free(bo);
+   else
+   *buf_handle = bo;
+
+   return r;
+}
+
 int amdgpu_bo_alloc(amdgpu_device_handle dev,
struct amdgpu_bo_alloc_request *alloc_buffer,
amdgpu_bo_handle *buf_handle)
 {
-   struct amdgpu_bo *bo;
union drm_amdgpu_gem_create args;
unsigned heap = alloc_buffer->preferred_heap;
int r = 0;
@@ -61,14 +89,6 @@ int amdgpu_bo_alloc(amdgpu_device_handle dev,
if (!(heap & (AMDGPU_GEM_DOMAIN_GTT | AMDGPU_GEM_DOMAIN_VRAM)))
return -EINVAL;
 
-   bo = calloc(1, sizeof(struct amdgpu_bo));
-   if (!bo)
-   return -ENOMEM;
-
-   atomic_set(&bo->refcount, 1);
-   bo->dev = dev;
-   bo->alloc_size = alloc_buffer->alloc_size;
-
memset(&args, 0, sizeof(args));
args.in.bo_size = alloc_buffer->alloc_size;
args.in.alignment = alloc_buffer->phys_alignment;
@@ -80,25 +100,11 @@ int amdgpu_bo_alloc(amdgpu_device_handle dev,
/* Allocate the buffer with the preferred heap. */
r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_GEM_CREATE,
&args, sizeof(args));
-   if (r) {
-   free(bo);
-   return r;
-   }
-
-   bo->handle = args.out.handle;
-
-   pthread_mutex_lock(&bo->dev->bo_table_mutex);
-   r = handle_table_insert(&bo->dev->bo_handles, bo->handle, bo);
-   pthread_mutex_unlock(&bo->dev->bo_table_mutex);
-
-   pthread_mutex_init(&bo->cpu_access_mutex, NULL);
-
if (r)
-   amdgpu_bo_free(bo);
-   else
-   *buf_handle = bo;
+   return r;
 
-   return r;
+   return amdgpu_bo_create(dev, alloc_buffer->alloc_size, args.out.handle,
+   buf_handle);
 }
 
 int amdgpu_bo_set_metadata(amdgpu_bo_handle bo,
@@ -569,7 +575,6 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
amdgpu_bo_handle *buf_handle)
 {
int r;
-   struct amdgpu_bo *bo;
struct drm_amdgpu_gem_userptr args;
 
args.addr = (uintptr_t)cpu;
@@ -581,27 +586,7 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle 
dev,
if (r)
return r;
 
-   bo = calloc(1, sizeof(struct amdgpu_bo));
-   if (!bo)
-   return -ENOMEM;
-
-   atomic_set(&bo->refcount, 1);
-   bo->dev = dev;
-   bo->alloc_size = size;
-   bo->handle = args.handle;
-
-   pthread_mutex_lock(&bo->dev->bo_table_mutex);
-   r = handle_table_insert(&bo->dev->bo_handles, bo->handle, bo);
-   pthread_mutex_unlock(&bo->dev->bo_table_mutex);
-
-   pthread_mutex_init(&bo->cpu_access_mutex, NULL);
-
-   if (r)
-   amdgpu_bo_free(bo);
-   else
-   *buf_handle = bo;
-
-   return r;
+   return amdgpu_bo_create(dev, size, args.handle, buf_handle);
 }
 
 int amdgpu_bo_list_create(amdgpu_device_handle dev,
-- 
1.9.1



[PATCH libdrm 2/4] amdgpu: add a function to find bo by cpu mapping (v2)

2018-08-07 Thread Junwei Zhang
Userspace needs to know if the user memory is from BO or malloc.

v2: update mutex range and rebase

Signed-off-by: Junwei Zhang 
---
 amdgpu/amdgpu.h| 23 +++
 amdgpu/amdgpu_bo.c | 34 ++
 2 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index be83b45..a8c353c 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -678,6 +678,29 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle 
dev,
amdgpu_bo_handle *buf_handle);
 
 /**
+ * Validate if the user memory comes from BO
+ *
+ * \param dev - [in] Device handle. See #amdgpu_device_initialize()
+ * \param cpu - [in] CPU address of user allocated memory which we
+ * want to map to GPU address space (make GPU accessible)
+ * (This address must be correctly aligned).
+ * \param size - [in] Size of allocation (must be correctly aligned)
+ * \param buf_handle - [out] Buffer handle for the userptr memory
+ * if the user memory is not from BO, the buf_handle will be NULL.
+ * \param offset_in_bo - [out] offset in this BO for this user memory
+ *
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+ void *cpu,
+ uint64_t size,
+ amdgpu_bo_handle *buf_handle,
+ uint64_t *offset_in_bo);
+
+/**
  * Free previosuly allocated memory
  *
  * \param   dev   - \c [in] Device handle. See 
#amdgpu_device_initialize()
diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index b24e698..a7f0662 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -529,6 +529,40 @@ int amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,
}
 }
 
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+ void *cpu,
+ uint64_t size,
+ amdgpu_bo_handle *buf_handle,
+ uint64_t *offset_in_bo)
+{
+   int i;
+   struct amdgpu_bo *bo;
+
+   if (cpu == NULL || size == 0)
+   return -EINVAL;
+
+   pthread_mutex_lock(&dev->bo_table_mutex);
+   for (i = 0; i < dev->bo_handles.max_key; i++) {
+   bo = handle_table_lookup(&dev->bo_handles, i);
+   if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
+   continue;
+   if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + bo->alloc_size))
+   break;
+   }
+
+   if (i < dev->bo_handles.max_key) {
+   atomic_inc(&bo->refcount);
+   *buf_handle = bo;
+   *offset_in_bo = cpu - bo->cpu_ptr;
+   } else {
+   *buf_handle = NULL;
+   *offset_in_bo = 0;
+   }
+   pthread_mutex_unlock(&dev->bo_table_mutex);
+
+   return 0;
+}
+
 int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
void *cpu,
uint64_t size,
-- 
1.9.1



[PATCH libdrm 1/4] amdgpu: add bo from user memory to handle table

2018-08-07 Thread Junwei Zhang
When creating a bo from user memory, add it to the handle table
for future lookup.

Signed-off-by: Junwei Zhang 
Reviewed-by: Christian König 
---
 amdgpu/amdgpu_bo.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index 422c7c9..b24e698 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -556,7 +556,16 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle 
dev,
bo->alloc_size = size;
bo->handle = args.handle;
 
-   *buf_handle = bo;
+   pthread_mutex_lock(&bo->dev->bo_table_mutex);
+   r = handle_table_insert(&bo->dev->bo_handles, bo->handle, bo);
+   pthread_mutex_unlock(&bo->dev->bo_table_mutex);
+
+   pthread_mutex_init(&bo->cpu_access_mutex, NULL);
+
+   if (r)
+   amdgpu_bo_free(bo);
+   else
+   *buf_handle = bo;
 
return r;
 }
-- 
1.9.1



[PATCH] drm/amdgpu/sriov: give 8s for recover vram under RUNTIME

2018-08-07 Thread Emily Deng
Modify the commit message

Extend the timeout for recovering vram bos from shadows on sr-iov
to cover the worst case scenario for timeslices and VFs

Under runtime, the wait-fence time could be quite long when
other VFs are in exclusive mode. For example, with 4 VFs and every
VF's exclusive timeout set to 3s, the worst case is 9s. If the VF
number is more than 4, the worst-case time will be even longer.
The 8s value comes from test data: with the timeout set to 8s, the TDR
test passes 1000 times in a row.

SWDEV-161490

Change-Id: Ifc32d56ca7fde01b1f4fe2b0db6959b51909008a
Signed-off-by: Monk Liu 
Signed-off-by: Emily Deng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1d933db..ef82ad1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3124,7 +3124,7 @@ static int amdgpu_device_handle_vram_lost(struct 
amdgpu_device *adev)
long tmo;
 
if (amdgpu_sriov_runtime(adev))
-   tmo = msecs_to_jiffies(amdgpu_lockup_timeout);
+   tmo = msecs_to_jiffies(8000);
else
tmo = msecs_to_jiffies(100);
 
-- 
2.7.4



RE: [PATCH] drm/amdgpu/sriov: give 8s for recover vram under RUNTIME

2018-08-07 Thread Deng, Emily
Hi Alex,
 Thanks for your review.
 The host GIM driver gives every VF a fullaccess_timeout of 3000 ms, so for 4
VFs the worst case is 3 * 3000 ms (9000 ms); if the VF number is more than 4,
the time will be even longer, so 8000 ms is not the true worst case.
 But using 8000 ms passes all of the TDR tests, so it is OK. I will
send a patch that modifies the commit message as you suggested.
 
Best wishes
Emily Deng 
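
For reference, the arithmetic above written out as a trivial helper; this is
only an illustration (the 3000 ms exclusive window is the GIM default Emily
quotes, and deriving the timeout from the VF count instead of hard-coding
8000 ms is merely a suggestion, not what the patch does):

static unsigned long worst_case_wait_ms(unsigned int vf_num,
					unsigned long exclusive_ms)
{
	/* Every *other* VF may burn its full exclusive window before
	 * this VF gets to wait on the recovery fence. */
	return (vf_num - 1) * exclusive_ms;
}

/* worst_case_wait_ms(4, 3000) == 9000 ms, matching the 3 * 3000 above;
 * with more than 4 VFs it keeps growing, which the fixed
 * msecs_to_jiffies(8000) in the patch does not. */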

-Original Message-
From: Alex Deucher  
Sent: Wednesday, August 8, 2018 12:42 AM
To: Deng, Emily 
Cc: amd-gfx list ; Liu, Monk 
Subject: Re: [PATCH] drm/amdgpu/sriov: give 8s for recover vram under RUNTIME

On Tue, Aug 7, 2018 at 6:22 AM, Emily Deng  wrote:
> Under runtime, the wait fence time could be quite long when other VFs 
> are in exclusive mode.
>
> SWDEV-161490
>
> Change-Id: Ifc32d56ca7fde01b1f4fe2b0db6959b51909008a
> Signed-off-by: Monk Liu 
> Signed-off-by: Emily Deng 

Seems pretty long.  Is this value based on any evidence (e.g., worst-case 
length of time slices, etc.) or just a long value that happens to work?  Might 
be nice to provide a bit more context in the commit message.  E.g., extend the 
timeout for recovering vram bos from shadows on sr-iov to cover the worst-case 
scenario for timeslices and VFs.

Alex


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 1d933db..ef82ad1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3124,7 +3124,7 @@ static int amdgpu_device_handle_vram_lost(struct 
> amdgpu_device *adev)
> long tmo;
>
> if (amdgpu_sriov_runtime(adev))
> -   tmo = msecs_to_jiffies(amdgpu_lockup_timeout);
> +   tmo = msecs_to_jiffies(8000);
> else
> tmo = msecs_to_jiffies(100);
>
> --
> 2.7.4
>


RE: [PATCH 2/2] drm/amdgpu/pp: endian fixes for processpptables.c

2018-08-07 Thread Zhu, Rex
Series is:

Reviewed-by: Rex Zhu 

Regards
Rex
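
For readers less familiar with these tables: the vbios stores multi-byte
fields little-endian in packed structs, so every 16/32-bit load has to go
through the byte-swap helpers or it silently breaks on big-endian hosts.
A minimal sketch of the pattern (the struct and field names here are made up;
le16_to_cpu()/le32_to_cpu() are the real kernel helpers the patch uses):

#include <linux/kernel.h>
#include <linux/types.h>	/* __le16/__le32, u16/u32 */

/* Made-up vbios-style table: fields are little-endian on the wire. */
struct example_pp_table {
	__le16 usStructureSize;
	__le32 ulPlatformTDP;
} __packed;

static void parse_example(const struct example_pp_table *tbl,
			  u16 *size, u32 *tdp)
{
	/* Wrong on big-endian: *size = tbl->usStructureSize; */
	*size = le16_to_cpu(tbl->usStructureSize);	/* no-op on LE hosts */
	*tdp  = le32_to_cpu(tbl->ulPlatformTDP);
}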
-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Wednesday, August 8, 2018 5:32 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: [PATCH 2/2] drm/amdgpu/pp: endian fixes for processpptables.c

Properly swap when reading from the vbios.

Signed-off-by: Alex Deucher 
---
 .../gpu/drm/amd/powerplay/hwmgr/processpptables.c  | 30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/processpptables.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/processpptables.c
index 925e17104f90..77c14671866c 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/processpptables.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/processpptables.c
@@ -757,8 +757,8 @@ static int init_non_clock_fields(struct pp_hwmgr *hwmgr,
ps->validation.supportedPowerLevels = pnon_clock_info->ucRequiredPower;
 
if (ATOM_PPLIB_NONCLOCKINFO_VER1 < version) {
-   ps->uvd_clocks.VCLK = pnon_clock_info->ulVCLK;
-   ps->uvd_clocks.DCLK = pnon_clock_info->ulDCLK;
+   ps->uvd_clocks.VCLK = le32_to_cpu(pnon_clock_info->ulVCLK);
+   ps->uvd_clocks.DCLK = le32_to_cpu(pnon_clock_info->ulDCLK);
} else {
ps->uvd_clocks.VCLK = 0;
ps->uvd_clocks.DCLK = 0;
@@ -937,8 +937,9 @@ int pp_tables_get_entry(struct pp_hwmgr *hwmgr,
if (entry_index > powerplay_table->ucNumStates)
return -1;
 
-   pstate_entry = (ATOM_PPLIB_STATE *)((unsigned 
long)powerplay_table + powerplay_table->usStateArrayOffset +
-   entry_index * 
powerplay_table->ucStateEntrySize);
+   pstate_entry = (ATOM_PPLIB_STATE *)((unsigned 
long)powerplay_table +
+   
le16_to_cpu(powerplay_table->usStateArrayOffset) +
+   entry_index * 
powerplay_table->ucStateEntrySize);
 
pnon_clock_info = (ATOM_PPLIB_NONCLOCK_INFO *)((unsigned 
long)powerplay_table +

le16_to_cpu(powerplay_table->usNonClockInfoArrayOffset) + @@ -1063,13 +1064,13 
@@ static int init_overdrive_limits(struct pp_hwmgr *hwmgr,
 , , );
 
if ((fw_info->ucTableFormatRevision == 1)
-   && (fw_info->usStructureSize >= 
sizeof(ATOM_FIRMWARE_INFO_V1_4)))
+   && (le16_to_cpu(fw_info->usStructureSize) >= 
+sizeof(ATOM_FIRMWARE_INFO_V1_4)))
result = init_overdrive_limits_V1_4(hwmgr,
powerplay_table,
(const ATOM_FIRMWARE_INFO_V1_4 *)fw_info);
 
else if ((fw_info->ucTableFormatRevision == 2)
-   && (fw_info->usStructureSize >= 
sizeof(ATOM_FIRMWARE_INFO_V2_1)))
+&& (le16_to_cpu(fw_info->usStructureSize) >= 
+sizeof(ATOM_FIRMWARE_INFO_V2_1)))
result = init_overdrive_limits_V2_1(hwmgr,
powerplay_table,
(const ATOM_FIRMWARE_INFO_V2_1 *)fw_info); @@ 
-1303,7 +1304,7 @@ static int init_clock_voltage_dependency(struct pp_hwmgr 
*hwmgr,
if (0 != powerplay_table4->usVddcDependencyOnSCLKOffset) {
table = (ATOM_PPLIB_Clock_Voltage_Dependency_Table *)
(((unsigned long) powerplay_table4) +
-   powerplay_table4->usVddcDependencyOnSCLKOffset);
+
le16_to_cpu(powerplay_table4->usVddcDependencyOnSCLKOffset));
result = get_clock_voltage_dependency_table(hwmgr,
>dyn_state.vddc_dependency_on_sclk, 
table);
}
@@ -1311,7 +1312,7 @@ static int init_clock_voltage_dependency(struct pp_hwmgr 
*hwmgr,
if (result == 0 && (0 != 
powerplay_table4->usVddciDependencyOnMCLKOffset)) {
table = (ATOM_PPLIB_Clock_Voltage_Dependency_Table *)
(((unsigned long) powerplay_table4) +
-   
powerplay_table4->usVddciDependencyOnMCLKOffset);
+
le16_to_cpu(powerplay_table4->usVddciDependencyOnMCLKOffset));
result = get_clock_voltage_dependency_table(hwmgr,
>dyn_state.vddci_dependency_on_mclk, 
table);
}
@@ -1319,7 +1320,7 @@ static int init_clock_voltage_dependency(struct pp_hwmgr 
*hwmgr,
if (result == 0 && (0 != 
powerplay_table4->usVddcDependencyOnMCLKOffset)) {
table = (ATOM_PPLIB_Clock_Voltage_Dependency_Table *)
(((unsigned long) powerplay_table4) +
-   powerplay_table4->usVddcDependencyOnMCLKOffset);
+

RE: [PATCH 2/2] drm/amdgpu/pp: endian fixes for processpptables.c

2018-08-07 Thread Quan, Evan
Reviewed-by: Evan Quan 

> -Original Message-
> From: amd-gfx  On Behalf Of Alex
> Deucher
> Sent: Wednesday, August 08, 2018 5:32 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH 2/2] drm/amdgpu/pp: endian fixes for processpptables.c
> 
> Properly swap when reading from the vbios.
> 
> Signed-off-by: Alex Deucher 
> ---
>  .../gpu/drm/amd/powerplay/hwmgr/processpptables.c  | 30
> --
>  1 file changed, 16 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/processpptables.c
> b/drivers/gpu/drm/amd/powerplay/hwmgr/processpptables.c
> index 925e17104f90..77c14671866c 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/processpptables.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/processpptables.c
> @@ -757,8 +757,8 @@ static int init_non_clock_fields(struct pp_hwmgr
> *hwmgr,
>   ps->validation.supportedPowerLevels = pnon_clock_info-
> >ucRequiredPower;
> 
>   if (ATOM_PPLIB_NONCLOCKINFO_VER1 < version) {
> - ps->uvd_clocks.VCLK = pnon_clock_info->ulVCLK;
> - ps->uvd_clocks.DCLK = pnon_clock_info->ulDCLK;
> + ps->uvd_clocks.VCLK = le32_to_cpu(pnon_clock_info-
> >ulVCLK);
> + ps->uvd_clocks.DCLK = le32_to_cpu(pnon_clock_info-
> >ulDCLK);
>   } else {
>   ps->uvd_clocks.VCLK = 0;
>   ps->uvd_clocks.DCLK = 0;
> @@ -937,8 +937,9 @@ int pp_tables_get_entry(struct pp_hwmgr *hwmgr,
>   if (entry_index > powerplay_table->ucNumStates)
>   return -1;
> 
> - pstate_entry = (ATOM_PPLIB_STATE *)((unsigned
> long)powerplay_table + powerplay_table->usStateArrayOffset +
> - entry_index * powerplay_table-
> >ucStateEntrySize);
> + pstate_entry = (ATOM_PPLIB_STATE *)((unsigned
> long)powerplay_table +
> +
> le16_to_cpu(powerplay_table->usStateArrayOffset) +
> + entry_index *
> powerplay_table->ucStateEntrySize);
> 
>   pnon_clock_info = (ATOM_PPLIB_NONCLOCK_INFO
> *)((unsigned long)powerplay_table +
> 
>   le16_to_cpu(powerplay_table->usNonClockInfoArrayOffset) + @@ -
> 1063,13 +1064,13 @@ static int init_overdrive_limits(struct pp_hwmgr
> *hwmgr,
>, , );
> 
>   if ((fw_info->ucTableFormatRevision == 1)
> - && (fw_info->usStructureSize >=
> sizeof(ATOM_FIRMWARE_INFO_V1_4)))
> + && (le16_to_cpu(fw_info->usStructureSize) >=
> +sizeof(ATOM_FIRMWARE_INFO_V1_4)))
>   result = init_overdrive_limits_V1_4(hwmgr,
>   powerplay_table,
>   (const ATOM_FIRMWARE_INFO_V1_4
> *)fw_info);
> 
>   else if ((fw_info->ucTableFormatRevision == 2)
> - && (fw_info->usStructureSize >=
> sizeof(ATOM_FIRMWARE_INFO_V2_1)))
> +  && (le16_to_cpu(fw_info->usStructureSize) >=
> +sizeof(ATOM_FIRMWARE_INFO_V2_1)))
>   result = init_overdrive_limits_V2_1(hwmgr,
>   powerplay_table,
>   (const ATOM_FIRMWARE_INFO_V2_1
> *)fw_info); @@ -1303,7 +1304,7 @@ static int
> init_clock_voltage_dependency(struct pp_hwmgr *hwmgr,
>   if (0 != powerplay_table4-
> >usVddcDependencyOnSCLKOffset) {
>   table =
> (ATOM_PPLIB_Clock_Voltage_Dependency_Table *)
>   (((unsigned long) powerplay_table4) +
> - powerplay_table4-
> >usVddcDependencyOnSCLKOffset);
> +  le16_to_cpu(powerplay_table4-
> >usVddcDependencyOnSCLKOffset));
>   result =
> get_clock_voltage_dependency_table(hwmgr,
>   
> >dyn_state.vddc_dependency_on_sclk, table);
>   }
> @@ -1311,7 +1312,7 @@ static int init_clock_voltage_dependency(struct
> pp_hwmgr *hwmgr,
>   if (result == 0 && (0 != powerplay_table4-
> >usVddciDependencyOnMCLKOffset)) {
>   table =
> (ATOM_PPLIB_Clock_Voltage_Dependency_Table *)
>   (((unsigned long) powerplay_table4) +
> - powerplay_table4-
> >usVddciDependencyOnMCLKOffset);
> +  le16_to_cpu(powerplay_table4-
> >usVddciDependencyOnMCLKOffset));
>   result =
> get_clock_voltage_dependency_table(hwmgr,
>   
> >dyn_state.vddci_dependency_on_mclk, table);
>   }
> @@ -1319,7 +1320,7 @@ static int init_clock_voltage_dependency(struct
> pp_hwmgr *hwmgr,
>   if (result == 0 && (0 != powerplay_table4-
> >usVddcDependencyOnMCLKOffset)) {
>   table =
> (ATOM_PPLIB_Clock_Voltage_Dependency_Table *)
>   (((unsigned long) powerplay_table4) +
> - powerplay_table4-
> >usVddcDependencyOnMCLKOffset);
> +  

RE: [PATCH 1/2] drm/amdgpu/pp: endian fixes for process_pptables_v1_0.c

2018-08-07 Thread Quan, Evan
Reviewed-by: Evan Quan 

> -Original Message-
> From: amd-gfx  On Behalf Of Alex
> Deucher
> Sent: Wednesday, August 08, 2018 5:32 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH 1/2] drm/amdgpu/pp: endian fixes for
> process_pptables_v1_0.c
> 
> Properly swap when reading from the vbios.
> 
> Signed-off-by: Alex Deucher 
> ---
>  .../amd/powerplay/hwmgr/process_pptables_v1_0.c| 194 ++---
> 
>  1 file changed, 97 insertions(+), 97 deletions(-)
> 
> diff --git
> a/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
> b/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
> index 4e1fd5393845..ae64ff7153d6 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
> @@ -214,23 +214,23 @@ static int
> get_platform_power_management_table(
>   ptr->ppm_design
>   = atom_ppm_table->ucPpmDesign;
>   ptr->cpu_core_number
> - = atom_ppm_table->usCpuCoreNumber;
> + = le16_to_cpu(atom_ppm_table->usCpuCoreNumber);
>   ptr->platform_tdp
> - = atom_ppm_table->ulPlatformTDP;
> + = le32_to_cpu(atom_ppm_table->ulPlatformTDP);
>   ptr->small_ac_platform_tdp
> - = atom_ppm_table->ulSmallACPlatformTDP;
> + = le32_to_cpu(atom_ppm_table->ulSmallACPlatformTDP);
>   ptr->platform_tdc
> - = atom_ppm_table->ulPlatformTDC;
> + = le32_to_cpu(atom_ppm_table->ulPlatformTDC);
>   ptr->small_ac_platform_tdc
> - = atom_ppm_table->ulSmallACPlatformTDC;
> + = le32_to_cpu(atom_ppm_table->ulSmallACPlatformTDC);
>   ptr->apu_tdp
> - = atom_ppm_table->ulApuTDP;
> + = le32_to_cpu(atom_ppm_table->ulApuTDP);
>   ptr->dgpu_tdp
> - = atom_ppm_table->ulDGpuTDP;
> + = le32_to_cpu(atom_ppm_table->ulDGpuTDP);
>   ptr->dgpu_ulv_power
> - = atom_ppm_table->ulDGpuUlvPower;
> + = le32_to_cpu(atom_ppm_table->ulDGpuUlvPower);
>   ptr->tj_max
> - = atom_ppm_table->ulTjmax;
> + = le32_to_cpu(atom_ppm_table->ulTjmax);
> 
>   pp_table_information->ppm_parameter_table = ptr;
> 
> @@ -355,11 +355,11 @@ static int get_hard_limits(
>   PP_ASSERT_WITH_CODE((0 != limitable->ucNumEntries), "Invalid
> PowerPlay Table!", return -1);
> 
>   /* currently we always take entries[0] parameters */
> - limits->sclk = (uint32_t)limitable->entries[0].ulSCLKLimit;
> - limits->mclk = (uint32_t)limitable->entries[0].ulMCLKLimit;
> - limits->vddc = (uint16_t)limitable->entries[0].usVddcLimit;
> - limits->vddci = (uint16_t)limitable->entries[0].usVddciLimit;
> - limits->vddgfx = (uint16_t)limitable->entries[0].usVddgfxLimit;
> + limits->sclk = le32_to_cpu(limitable->entries[0].ulSCLKLimit);
> + limits->mclk = le32_to_cpu(limitable->entries[0].ulMCLKLimit);
> + limits->vddc = le16_to_cpu(limitable->entries[0].usVddcLimit);
> + limits->vddci = le16_to_cpu(limitable->entries[0].usVddciLimit);
> + limits->vddgfx = le16_to_cpu(limitable->entries[0].usVddgfxLimit);
> 
>   return 0;
>  }
> @@ -396,10 +396,10 @@ static int get_mclk_voltage_dependency_table(
> 
>   ATOM_Tonga_MCLK_Dependency_Record,
>   entries, mclk_dep_table, i);
>   mclk_table_record->vddInd = mclk_dep_record->ucVddcInd;
> - mclk_table_record->vdd_offset = mclk_dep_record-
> >usVddgfxOffset;
> - mclk_table_record->vddci = mclk_dep_record->usVddci;
> - mclk_table_record->mvdd = mclk_dep_record->usMvdd;
> - mclk_table_record->clk = mclk_dep_record->ulMclk;
> + mclk_table_record->vdd_offset =
> le16_to_cpu(mclk_dep_record->usVddgfxOffset);
> + mclk_table_record->vddci = le16_to_cpu(mclk_dep_record-
> >usVddci);
> + mclk_table_record->mvdd = le16_to_cpu(mclk_dep_record-
> >usMvdd);
> + mclk_table_record->clk = le32_to_cpu(mclk_dep_record-
> >ulMclk);
>   }
> 
>   *pp_tonga_mclk_dep_table = mclk_table; @@ -443,8 +443,8 @@
> static int get_sclk_voltage_dependency_table(
> 
>   phm_ppt_v1_clock_voltage_dependency_record,
>   entries, sclk_table, i);
>   sclk_table_record->vddInd = sclk_dep_record-
> >ucVddInd;
> - sclk_table_record->vdd_offset = sclk_dep_record-
> >usVddcOffset;
> - sclk_table_record->clk = sclk_dep_record->ulSclk;
> + sclk_table_record->vdd_offset =
> le16_to_cpu(sclk_dep_record->usVddcOffset);
> + sclk_table_record->clk =
> le32_to_cpu(sclk_dep_record->ulSclk);
>   sclk_table_record->cks_enable =
>   (((sclk_dep_record-
> >ucCKSVOffsetandDisable & 0x80) >> 7) == 0) ? 1 : 0;
>

[PATCH 2/2] drm/amdgpu/pp: endian fixes for processpptables.c

2018-08-07 Thread Alex Deucher
Properly swap when reading from the vbios.

Signed-off-by: Alex Deucher 
---
 .../gpu/drm/amd/powerplay/hwmgr/processpptables.c  | 30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/processpptables.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/processpptables.c
index 925e17104f90..77c14671866c 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/processpptables.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/processpptables.c
@@ -757,8 +757,8 @@ static int init_non_clock_fields(struct pp_hwmgr *hwmgr,
ps->validation.supportedPowerLevels = pnon_clock_info->ucRequiredPower;
 
if (ATOM_PPLIB_NONCLOCKINFO_VER1 < version) {
-   ps->uvd_clocks.VCLK = pnon_clock_info->ulVCLK;
-   ps->uvd_clocks.DCLK = pnon_clock_info->ulDCLK;
+   ps->uvd_clocks.VCLK = le32_to_cpu(pnon_clock_info->ulVCLK);
+   ps->uvd_clocks.DCLK = le32_to_cpu(pnon_clock_info->ulDCLK);
} else {
ps->uvd_clocks.VCLK = 0;
ps->uvd_clocks.DCLK = 0;
@@ -937,8 +937,9 @@ int pp_tables_get_entry(struct pp_hwmgr *hwmgr,
if (entry_index > powerplay_table->ucNumStates)
return -1;
 
-   pstate_entry = (ATOM_PPLIB_STATE *)((unsigned 
long)powerplay_table + powerplay_table->usStateArrayOffset +
-   entry_index * 
powerplay_table->ucStateEntrySize);
+   pstate_entry = (ATOM_PPLIB_STATE *)((unsigned 
long)powerplay_table +
+   
le16_to_cpu(powerplay_table->usStateArrayOffset) +
+   entry_index * 
powerplay_table->ucStateEntrySize);
 
pnon_clock_info = (ATOM_PPLIB_NONCLOCK_INFO *)((unsigned 
long)powerplay_table +

le16_to_cpu(powerplay_table->usNonClockInfoArrayOffset) +
@@ -1063,13 +1064,13 @@ static int init_overdrive_limits(struct pp_hwmgr *hwmgr,
 , , );
 
if ((fw_info->ucTableFormatRevision == 1)
-   && (fw_info->usStructureSize >= 
sizeof(ATOM_FIRMWARE_INFO_V1_4)))
+   && (le16_to_cpu(fw_info->usStructureSize) >= 
sizeof(ATOM_FIRMWARE_INFO_V1_4)))
result = init_overdrive_limits_V1_4(hwmgr,
powerplay_table,
(const ATOM_FIRMWARE_INFO_V1_4 *)fw_info);
 
else if ((fw_info->ucTableFormatRevision == 2)
-   && (fw_info->usStructureSize >= 
sizeof(ATOM_FIRMWARE_INFO_V2_1)))
+&& (le16_to_cpu(fw_info->usStructureSize) >= 
sizeof(ATOM_FIRMWARE_INFO_V2_1)))
result = init_overdrive_limits_V2_1(hwmgr,
powerplay_table,
(const ATOM_FIRMWARE_INFO_V2_1 *)fw_info);
@@ -1303,7 +1304,7 @@ static int init_clock_voltage_dependency(struct pp_hwmgr 
*hwmgr,
if (0 != powerplay_table4->usVddcDependencyOnSCLKOffset) {
table = (ATOM_PPLIB_Clock_Voltage_Dependency_Table *)
(((unsigned long) powerplay_table4) +
-   powerplay_table4->usVddcDependencyOnSCLKOffset);
+
le16_to_cpu(powerplay_table4->usVddcDependencyOnSCLKOffset));
result = get_clock_voltage_dependency_table(hwmgr,
&hwmgr->dyn_state.vddc_dependency_on_sclk, 
table);
}
@@ -1311,7 +1312,7 @@ static int init_clock_voltage_dependency(struct pp_hwmgr 
*hwmgr,
if (result == 0 && (0 != 
powerplay_table4->usVddciDependencyOnMCLKOffset)) {
table = (ATOM_PPLIB_Clock_Voltage_Dependency_Table *)
(((unsigned long) powerplay_table4) +
-   
powerplay_table4->usVddciDependencyOnMCLKOffset);
+
le16_to_cpu(powerplay_table4->usVddciDependencyOnMCLKOffset));
result = get_clock_voltage_dependency_table(hwmgr,
&hwmgr->dyn_state.vddci_dependency_on_mclk, 
table);
}
@@ -1319,7 +1320,7 @@ static int init_clock_voltage_dependency(struct pp_hwmgr 
*hwmgr,
if (result == 0 && (0 != 
powerplay_table4->usVddcDependencyOnMCLKOffset)) {
table = (ATOM_PPLIB_Clock_Voltage_Dependency_Table *)
(((unsigned long) powerplay_table4) +
-   powerplay_table4->usVddcDependencyOnMCLKOffset);
+
le16_to_cpu(powerplay_table4->usVddcDependencyOnMCLKOffset));
result = get_clock_voltage_dependency_table(hwmgr,
&hwmgr->dyn_state.vddc_dependency_on_mclk, 
table);
}
@@ -1327,7 +1328,7 @@ static int init_clock_voltage_dependency(struct pp_hwmgr 

[PATCH 1/2] drm/amdgpu/pp: endian fixes for process_pptables_v1_0.c

2018-08-07 Thread Alex Deucher
Properly swap when reading from the vbios.

Signed-off-by: Alex Deucher 
---
 .../amd/powerplay/hwmgr/process_pptables_v1_0.c| 194 ++---
 1 file changed, 97 insertions(+), 97 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
index 4e1fd5393845..ae64ff7153d6 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c
@@ -214,23 +214,23 @@ static int get_platform_power_management_table(
ptr->ppm_design
= atom_ppm_table->ucPpmDesign;
ptr->cpu_core_number
-   = atom_ppm_table->usCpuCoreNumber;
+   = le16_to_cpu(atom_ppm_table->usCpuCoreNumber);
ptr->platform_tdp
-   = atom_ppm_table->ulPlatformTDP;
+   = le32_to_cpu(atom_ppm_table->ulPlatformTDP);
ptr->small_ac_platform_tdp
-   = atom_ppm_table->ulSmallACPlatformTDP;
+   = le32_to_cpu(atom_ppm_table->ulSmallACPlatformTDP);
ptr->platform_tdc
-   = atom_ppm_table->ulPlatformTDC;
+   = le32_to_cpu(atom_ppm_table->ulPlatformTDC);
ptr->small_ac_platform_tdc
-   = atom_ppm_table->ulSmallACPlatformTDC;
+   = le32_to_cpu(atom_ppm_table->ulSmallACPlatformTDC);
ptr->apu_tdp
-   = atom_ppm_table->ulApuTDP;
+   = le32_to_cpu(atom_ppm_table->ulApuTDP);
ptr->dgpu_tdp
-   = atom_ppm_table->ulDGpuTDP;
+   = le32_to_cpu(atom_ppm_table->ulDGpuTDP);
ptr->dgpu_ulv_power
-   = atom_ppm_table->ulDGpuUlvPower;
+   = le32_to_cpu(atom_ppm_table->ulDGpuUlvPower);
ptr->tj_max
-   = atom_ppm_table->ulTjmax;
+   = le32_to_cpu(atom_ppm_table->ulTjmax);
 
pp_table_information->ppm_parameter_table = ptr;
 
@@ -355,11 +355,11 @@ static int get_hard_limits(
PP_ASSERT_WITH_CODE((0 != limitable->ucNumEntries), "Invalid PowerPlay 
Table!", return -1);
 
/* currently we always take entries[0] parameters */
-   limits->sclk = (uint32_t)limitable->entries[0].ulSCLKLimit;
-   limits->mclk = (uint32_t)limitable->entries[0].ulMCLKLimit;
-   limits->vddc = (uint16_t)limitable->entries[0].usVddcLimit;
-   limits->vddci = (uint16_t)limitable->entries[0].usVddciLimit;
-   limits->vddgfx = (uint16_t)limitable->entries[0].usVddgfxLimit;
+   limits->sclk = le32_to_cpu(limitable->entries[0].ulSCLKLimit);
+   limits->mclk = le32_to_cpu(limitable->entries[0].ulMCLKLimit);
+   limits->vddc = le16_to_cpu(limitable->entries[0].usVddcLimit);
+   limits->vddci = le16_to_cpu(limitable->entries[0].usVddciLimit);
+   limits->vddgfx = le16_to_cpu(limitable->entries[0].usVddgfxLimit);
 
return 0;
 }
@@ -396,10 +396,10 @@ static int get_mclk_voltage_dependency_table(
ATOM_Tonga_MCLK_Dependency_Record,
entries, mclk_dep_table, i);
mclk_table_record->vddInd = mclk_dep_record->ucVddcInd;
-   mclk_table_record->vdd_offset = mclk_dep_record->usVddgfxOffset;
-   mclk_table_record->vddci = mclk_dep_record->usVddci;
-   mclk_table_record->mvdd = mclk_dep_record->usMvdd;
-   mclk_table_record->clk = mclk_dep_record->ulMclk;
+   mclk_table_record->vdd_offset = 
le16_to_cpu(mclk_dep_record->usVddgfxOffset);
+   mclk_table_record->vddci = 
le16_to_cpu(mclk_dep_record->usVddci);
+   mclk_table_record->mvdd = le16_to_cpu(mclk_dep_record->usMvdd);
+   mclk_table_record->clk = le32_to_cpu(mclk_dep_record->ulMclk);
}
 
*pp_tonga_mclk_dep_table = mclk_table;
@@ -443,8 +443,8 @@ static int get_sclk_voltage_dependency_table(

phm_ppt_v1_clock_voltage_dependency_record,
entries, sclk_table, i);
sclk_table_record->vddInd = sclk_dep_record->ucVddInd;
-   sclk_table_record->vdd_offset = 
sclk_dep_record->usVddcOffset;
-   sclk_table_record->clk = sclk_dep_record->ulSclk;
+   sclk_table_record->vdd_offset = 
le16_to_cpu(sclk_dep_record->usVddcOffset);
+   sclk_table_record->clk = 
le32_to_cpu(sclk_dep_record->ulSclk);
sclk_table_record->cks_enable =
(((sclk_dep_record->ucCKSVOffsetandDisable & 
0x80) >> 7) == 0) ? 1 : 0;
sclk_table_record->cks_voffset = 
(sclk_dep_record->ucCKSVOffsetandDisable & 0x7F);
@@ -475,12 +475,12 @@ static int get_sclk_voltage_dependency_table(

phm_ppt_v1_clock_voltage_dependency_record,

[PATCH v6] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Chris Wilson
amdgpu only uses shared-fences internally, but dmabuf importers rely on
implicit write hazard tracking via the reservation_object.fence_excl.
For example, the importer uses the write hazard for timing a page flip to
only occur after the exporter has finished flushing its write into the
surface. As such, on exporting a dmabuf, we must either flush all
outstanding fences (for we do not know which are writes and should have
been exclusive) or alternatively create a new exclusive fence that is
the composite of all the existing shared fences, and so will only be
signaled when all earlier fences are signaled (ensuring that we can not
be signaled before the completion of any earlier write).

v2: reservation_object is already locked by amdgpu_bo_reserve()
v3: Replace looping with get_fences_rcu and special case the promotion
of a single shared fence directly to an exclusive fence, bypassing the
fence array.
v4: Drop the fence array ref after assigning to reservation_object

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107341
Testcase: igt/amd_prime/amd-to-i915
References: 8e94a46c1770 ("drm/amdgpu: Attach exclusive fence to prime exported 
bo's. (v5)")
Signed-off-by: Chris Wilson 
Cc: Alex Deucher 
Cc: "Christian König" 
Reviewed-by: "Christian König" 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 59 ---
 1 file changed, 51 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 3fdd5688da0b..0394f5c77f81 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -37,6 +37,7 @@
 #include "amdgpu_display.h"
 #include <drm/amdgpu_drm.h>
 #include <linux/dma-buf.h>
+#include <linux/dma-fence-array.h>
 
 static const struct dma_buf_ops amdgpu_dmabuf_ops;
 
@@ -188,6 +189,48 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
return ERR_PTR(ret);
 }
 
+static int
+__reservation_object_make_exclusive(struct reservation_object *obj)
+{
+   struct dma_fence **fences;
+   unsigned int count;
+   int r;
+
+   if (!reservation_object_get_list(obj)) /* no shared fences to convert */
+   return 0;
+
+   r = reservation_object_get_fences_rcu(obj, NULL, &count, &fences);
+   if (r)
+   return r;
+
+   if (count == 0) {
+   /* Now that was unexpected. */
+   } else if (count == 1) {
+   reservation_object_add_excl_fence(obj, fences[0]);
+   dma_fence_put(fences[0]);
+   kfree(fences);
+   } else {
+   struct dma_fence_array *array;
+
+   array = dma_fence_array_create(count, fences,
+  dma_fence_context_alloc(1), 0,
+  false);
+   if (!array)
+   goto err_fences_put;
+
+   reservation_object_add_excl_fence(obj, &array->base);
+   dma_fence_put(&array->base);
+   }
+
+   return 0;
+
+err_fences_put:
+   while (count--)
+   dma_fence_put(fences[count]);
+   kfree(fences);
+   return -ENOMEM;
+}
+
 /**
  * amdgpu_gem_map_attach - &dma_buf_ops.attach implementation
  * @dma_buf: shared DMA buffer
@@ -219,16 +262,16 @@ static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
 
if (attach->dev->driver != adev->dev->driver) {
/*
-* Wait for all shared fences to complete before we switch to 
future
-* use of exclusive fence on this prime shared bo.
+* We only create shared fences for internal use, but importers
+* of the dmabuf rely on exclusive fences for implicitly
+* tracking write hazards. As any of the current fences may
+* correspond to a write, we need to convert all existing
+* fences on the reservation object into a single exclusive
+* fence.
 */
-   r = reservation_object_wait_timeout_rcu(bo->tbo.resv,
-   true, false,
-   MAX_SCHEDULE_TIMEOUT);
-   if (unlikely(r < 0)) {
-   DRM_DEBUG_PRIME("Fence wait failed: %li\n", r);
+   r = __reservation_object_make_exclusive(bo->tbo.resv);
+   if (r)
goto error_unreserve;
-   }
}
 
/* pin buffer into GTT */
-- 
2.18.0
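
For context on the importer side this refers to: the page-flip path typically
waits only on the exclusive fence of the reservation object, which is why
fences that amdgpu attached as shared-only were invisible to it. A rough
sketch of that consumer pattern (not i915's actual code, just the shape of
it, using the reservation/dma-fence APIs of this kernel vintage):

#include <linux/reservation.h>
#include <linux/dma-fence.h>

/* Before flipping to a dmabuf-backed framebuffer, wait for the writer.
 * Only the exclusive fence is considered; shared (read) fences are not. */
static long wait_for_exporter_writes(struct reservation_object *resv)
{
	struct dma_fence *excl;
	long ret = 0;

	excl = reservation_object_get_excl_rcu(resv);
	if (excl) {
		ret = dma_fence_wait(excl, true);	/* interruptible wait */
		dma_fence_put(excl);
	}
	return ret;
}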



Re: [PATCH v5] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Christian König

On 07.08.2018 20:32, Chris Wilson wrote:

amdgpu only uses shared-fences internally, but dmabuf importers rely on
implicit write hazard tracking via the reservation_object.fence_excl.
For example, the importer use the write hazard for timing a page flip to
only occur after the exporter has finished flushing its write into the
surface. As such, on exporting a dmabuf, we must either flush all
outstanding fences (for we do not know which are writes and should have
been exclusive) or alternatively create a new exclusive fence that is
the composite of all the existing shared fences, and so will only be
signaled when all earlier fences are signaled (ensuring that we can not
be signaled before the completion of any earlier write).

v2: reservation_object is already locked by amdgpu_bo_reserve()
v3: Replace looping with get_fences_rcu and special case the promotion
of a single shared fence directly to an exclusive fence, bypassing the
fence array.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107341
Testcase: igt/amd_prime/amd-to-i915
References: 8e94a46c1770 ("drm/amdgpu: Attach exclusive fence to prime exported 
bo's. (v5)")
Signed-off-by: Chris Wilson 
Cc: Alex Deucher 
Cc: "Christian König" 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 58 +++
  1 file changed, 50 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 3fdd5688da0b..06a310b5b07b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -37,6 +37,7 @@
  #include "amdgpu_display.h"
  #include 
  #include 
+#include 
  
  static const struct dma_buf_ops amdgpu_dmabuf_ops;
  
@@ -188,6 +189,47 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,

return ERR_PTR(ret);
  }
  
+static int

+__reservation_object_make_exclusive(struct reservation_object *obj)
+{
+   struct dma_fence **fences;
+   unsigned int count;
+   int r;
+
+   if (!reservation_object_get_list(obj)) /* no shared fences to convert */
+   return 0;
+
+   r = reservation_object_get_fences_rcu(obj, NULL, , );
+   if (r)
+   return r;
+
+   if (count == 0) {
+   /* Now that was unexpected. */
+   } else if (count == 1) {
+   reservation_object_add_excl_fence(obj, fences[0]);
+   dma_fence_put(fences[0]);
+   kfree(fences);
+   } else {
+   struct dma_fence_array *array;
+
+   array = dma_fence_array_create(count, fences,
+  dma_fence_context_alloc(1), 0,
+  false);
+   if (!array)
+   goto err_fences_put;
+
+   reservation_object_add_excl_fence(obj, >base);
+   }
+
+   return 0;
+
+err_fences_put:
+   while (count--)
+   dma_fence_put(fences[count]);
+   kfree(fences);
+   return -ENOMEM;
+}
+
  /**
   * amdgpu_gem_map_attach - _buf_ops.attach implementation
   * @dma_buf: shared DMA buffer
@@ -219,16 +261,16 @@ static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
  
  	if (attach->dev->driver != adev->dev->driver) {

/*
-* Wait for all shared fences to complete before we switch to 
future
-* use of exclusive fence on this prime shared bo.
+* We only create shared fences for internal use, but importers
+* of the dmabuf rely on exclusive fences for implicitly
+* tracking write hazards. As any of the current fences may
+* correspond to a write, we need to convert all existing
+* fences on the reservation object into a single exclusive
+* fence.
 */
-   r = reservation_object_wait_timeout_rcu(bo->tbo.resv,
-   true, false,
-   MAX_SCHEDULE_TIMEOUT);
-   if (unlikely(r < 0)) {
-   DRM_DEBUG_PRIME("Fence wait failed: %li\n", r);
+   r = __reservation_object_make_exclusive(bo->tbo.resv);
+   if (r)
goto error_unreserve;
-   }
}
  
  	/* pin buffer into GTT */




[PATCH v5] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Chris Wilson
amdgpu only uses shared-fences internally, but dmabuf importers rely on
implicit write hazard tracking via the reservation_object.fence_excl.
For example, the importer uses the write hazard for timing a page flip to
only occur after the exporter has finished flushing its write into the
surface. As such, on exporting a dmabuf, we must either flush all
outstanding fences (for we do not know which are writes and should have
been exclusive) or alternatively create a new exclusive fence that is
the composite of all the existing shared fences, and so will only be
signaled when all earlier fences are signaled (ensuring that we can not
be signaled before the completion of any earlier write).

v2: reservation_object is already locked by amdgpu_bo_reserve()
v3: Replace looping with get_fences_rcu and special case the promotion
of a single shared fence directly to an exclusive fence, bypassing the
fence array.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107341
Testcase: igt/amd_prime/amd-to-i915
References: 8e94a46c1770 ("drm/amdgpu: Attach exclusive fence to prime exported 
bo's. (v5)")
Signed-off-by: Chris Wilson 
Cc: Alex Deucher 
Cc: "Christian König" 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 58 +++
 1 file changed, 50 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 3fdd5688da0b..06a310b5b07b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -37,6 +37,7 @@
 #include "amdgpu_display.h"
 #include <drm/amdgpu_drm.h>
 #include <linux/dma-buf.h>
+#include <linux/dma-fence-array.h>
 
 static const struct dma_buf_ops amdgpu_dmabuf_ops;
 
@@ -188,6 +189,47 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
return ERR_PTR(ret);
 }
 
+static int
+__reservation_object_make_exclusive(struct reservation_object *obj)
+{
+   struct dma_fence **fences;
+   unsigned int count;
+   int r;
+
+   if (!reservation_object_get_list(obj)) /* no shared fences to convert */
+   return 0;
+
+   r = reservation_object_get_fences_rcu(obj, NULL, &count, &fences);
+   if (r)
+   return r;
+
+   if (count == 0) {
+   /* Now that was unexpected. */
+   } else if (count == 1) {
+   reservation_object_add_excl_fence(obj, fences[0]);
+   dma_fence_put(fences[0]);
+   kfree(fences);
+   } else {
+   struct dma_fence_array *array;
+
+   array = dma_fence_array_create(count, fences,
+  dma_fence_context_alloc(1), 0,
+  false);
+   if (!array)
+   goto err_fences_put;
+
+   reservation_object_add_excl_fence(obj, &array->base);
+   }
+
+   return 0;
+
+err_fences_put:
+   while (count--)
+   dma_fence_put(fences[count]);
+   kfree(fences);
+   return -ENOMEM;
+}
+
 /**
 * amdgpu_gem_map_attach - &dma_buf_ops.attach implementation
  * @dma_buf: shared DMA buffer
@@ -219,16 +261,16 @@ static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
 
if (attach->dev->driver != adev->dev->driver) {
/*
-* Wait for all shared fences to complete before we switch to 
future
-* use of exclusive fence on this prime shared bo.
+* We only create shared fences for internal use, but importers
+* of the dmabuf rely on exclusive fences for implicitly
+* tracking write hazards. As any of the current fences may
+* correspond to a write, we need to convert all existing
+* fences on the reservation object into a single exclusive
+* fence.
 */
-   r = reservation_object_wait_timeout_rcu(bo->tbo.resv,
-   true, false,
-   MAX_SCHEDULE_TIMEOUT);
-   if (unlikely(r < 0)) {
-   DRM_DEBUG_PRIME("Fence wait failed: %li\n", r);
+   r = __reservation_object_make_exclusive(bo->tbo.resv);
+   if (r)
goto error_unreserve;
-   }
}
 
/* pin buffer into GTT */
-- 
2.18.0



Re: [PATCH v4] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Chris Wilson
Quoting Christian König (2018-08-07 19:18:55)
> On 07.08.2018 20:14, Chris Wilson wrote:
> > Quoting Christian König (2018-08-07 18:57:16)
> >> On 07.08.2018 18:08, Chris Wilson wrote:
> >>> amdgpu only uses shared-fences internally, but dmabuf importers rely on
> >>> implicit write hazard tracking via the reservation_object.fence_excl.
> >>> For example, the importer use the write hazard for timing a page flip to
> >>> only occur after the exporter has finished flushing its write into the
> >>> surface. As such, on exporting a dmabuf, we must either flush all
> >>> outstanding fences (for we do not know which are writes and should have
> >>> been exclusive) or alternatively create a new exclusive fence that is
> >>> the composite of all the existing shared fences, and so will only be
> >>> signaled when all earlier fences are signaled (ensuring that we can not
> >>> be signaled before the completion of any earlier write).
> >>>
> >>> v2: reservation_object is already locked by amdgpu_bo_reserve()
> >>>
> >>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107341
> >>> Testcase: igt/amd_prime/amd-to-i915
> >>> References: 8e94a46c1770 ("drm/amdgpu: Attach exclusive fence to prime 
> >>> exported bo's. (v5)")
> >>> Signed-off-by: Chris Wilson 
> >>> Cc: Alex Deucher 
> >>> Cc: "Christian König" 
> >>> ---
> >>>
> >>> This time, hopefully proofread and references complete.
> >>> -Chris
> >>>
> >>> ---
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 68 ---
> >>>1 file changed, 60 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> >>> index 1c5d97f4b4dd..dff2b01a3d89 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> >>> @@ -37,6 +37,7 @@
> >>>#include "amdgpu_display.h"
> >>>#include 
> >>>#include 
> >>> +#include 
> >>>
> >>>static const struct dma_buf_ops amdgpu_dmabuf_ops;
> >>>
> >>> @@ -188,6 +189,57 @@ amdgpu_gem_prime_import_sg_table(struct drm_device 
> >>> *dev,
> >>>return ERR_PTR(ret);
> >>>}
> >>>
> >>> +static int
> >>> +__reservation_object_make_exclusive(struct reservation_object *obj)
> >>> +{
> >>> + struct reservation_object_list *fobj;
> >>> + struct dma_fence_array *array;
> >>> + struct dma_fence **fences;
> >>> + unsigned int count, i;
> >>> +
> >>> + fobj = reservation_object_get_list(obj);
> >>> + if (!fobj)
> >>> + return 0;
> >>> +
> >>> + count = !!rcu_access_pointer(obj->fence_excl);
> >>> + count += fobj->shared_count;
> >>> +
> >>> + fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL);
> >>> + if (!fences)
> >>> + return -ENOMEM;
> >>> +
> >>> + for (i = 0; i < fobj->shared_count; i++) {
> >>> + struct dma_fence *f =
> >>> + rcu_dereference_protected(fobj->shared[i],
> >>> +   
> >>> reservation_object_held(obj));
> >>> +
> >>> + fences[i] = dma_fence_get(f);
> >>> + }
> >>> +
> >>> + if (rcu_access_pointer(obj->fence_excl)) {
> >>> + struct dma_fence *f =
> >>> + rcu_dereference_protected(obj->fence_excl,
> >>> +   
> >>> reservation_object_held(obj));
> >>> +
> >>> + fences[i] = dma_fence_get(f);
> >>> + }
> >>> +
> >>> + array = dma_fence_array_create(count, fences,
> >>> +dma_fence_context_alloc(1), 0,
> >>> +false);
> >>> + if (!array)
> >>> + goto err_fences_put;
> >>> +
> >>> + reservation_object_add_excl_fence(obj, >base);
> >>> + return 0;
> >>> +
> >>> +err_fences_put:
> >>> + for (i = 0; i < count; i++)
> >>> + dma_fence_put(fences[i]);
> >>> + kfree(fences);
> >>> + return -ENOMEM;
> >>> +}
> >>> +
> >> This can be simplified a lot. See amdgpu_pasid_free_delayed() for an
> >> example:
> > {
> >   if (!reservation_object_get_list(obj))
> >   return 0;
> >
> >   r = reservation_object_get_fences_rcu(obj, NULL, &count, &fences);
> >   if (r)
> >   return r;
> >
> >   array = dma_fence_array_create(count, fences,
> >  dma_fence_context_alloc(1), 0,
> >  false);
> >   if (!array)
> >   goto err_fences_put;
> >
> >   reservation_object_add_excl_fence(obj, &array->base);
> >   return 0;
> >
> > err:
> >   ...
> > }
> >
> > My starting point was going to be use get_fences_rcu, but get_fences_rcu
> > can hardly be called simple for where the lock is required to be held
> > start to finish ;)
> 
> What are you talking about? get_fences_rcu doesn't require any locking 
> at all.
> 
> You only need the locking to make sure that 

Re: [PATCH v4] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Christian König

Am 07.08.2018 um 20:14 schrieb Chris Wilson:

Quoting Christian König (2018-08-07 18:57:16)

Am 07.08.2018 um 18:08 schrieb Chris Wilson:

amdgpu only uses shared-fences internally, but dmabuf importers rely on
implicit write hazard tracking via the reservation_object.fence_excl.
For example, the importer uses the write hazard for timing a page flip to
only occur after the exporter has finished flushing its write into the
surface. As such, on exporting a dmabuf, we must either flush all
outstanding fences (for we do not know which are writes and should have
been exclusive) or alternatively create a new exclusive fence that is
the composite of all the existing shared fences, and so will only be
signaled when all earlier fences are signaled (ensuring that we can not
be signaled before the completion of any earlier write).

v2: reservation_object is already locked by amdgpu_bo_reserve()

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107341
Testcase: igt/amd_prime/amd-to-i915
References: 8e94a46c1770 ("drm/amdgpu: Attach exclusive fence to prime exported 
bo's. (v5)")
Signed-off-by: Chris Wilson 
Cc: Alex Deucher 
Cc: "Christian König" 
---

This time, hopefully proofread and references complete.
-Chris

---
   drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 68 ---
   1 file changed, 60 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 1c5d97f4b4dd..dff2b01a3d89 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -37,6 +37,7 @@
   #include "amdgpu_display.h"
   #include 
   #include 
+#include 
   
   static const struct dma_buf_ops amdgpu_dmabuf_ops;
   
@@ -188,6 +189,57 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,

   return ERR_PTR(ret);
   }
   
+static int

+__reservation_object_make_exclusive(struct reservation_object *obj)
+{
+ struct reservation_object_list *fobj;
+ struct dma_fence_array *array;
+ struct dma_fence **fences;
+ unsigned int count, i;
+
+ fobj = reservation_object_get_list(obj);
+ if (!fobj)
+ return 0;
+
+ count = !!rcu_access_pointer(obj->fence_excl);
+ count += fobj->shared_count;
+
+ fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL);
+ if (!fences)
+ return -ENOMEM;
+
+ for (i = 0; i < fobj->shared_count; i++) {
+ struct dma_fence *f =
+ rcu_dereference_protected(fobj->shared[i],
+   reservation_object_held(obj));
+
+ fences[i] = dma_fence_get(f);
+ }
+
+ if (rcu_access_pointer(obj->fence_excl)) {
+ struct dma_fence *f =
+ rcu_dereference_protected(obj->fence_excl,
+   reservation_object_held(obj));
+
+ fences[i] = dma_fence_get(f);
+ }
+
+ array = dma_fence_array_create(count, fences,
+dma_fence_context_alloc(1), 0,
+false);
+ if (!array)
+ goto err_fences_put;
+
+ reservation_object_add_excl_fence(obj, &array->base);
+ return 0;
+
+err_fences_put:
+ for (i = 0; i < count; i++)
+ dma_fence_put(fences[i]);
+ kfree(fences);
+ return -ENOMEM;
+}
+

This can be simplified a lot. See amdgpu_pasid_free_delayed() for an
example:

{
if (!reservation_object_get_list(obj))
return 0;

r = reservation_object_get_fences_rcu(obj, NULL, &count, &fences);
if (r)
return r;

array = dma_fence_array_create(count, fences,
   dma_fence_context_alloc(1), 0,
   false);
if (!array)
goto err_fences_put;

reservation_object_add_excl_fence(obj, &array->base);
return 0;

err:
...
}

My starting point was going to be use get_fences_rcu, but get_fences_rcu
can hardly be called simple for where the lock is required to be held
start to finish ;)


What are you talking about? get_fences_rcu doesn't require any locking 
at all.


You only need the locking to make sure that between creating the 
fence array and calling reservation_object_add_excl_fence() no other 
fence is added.


Christian.


-Chris
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
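
In other words, the simplification under discussion is: collect every fence on the reservation object with reservation_object_get_fences_rcu(), fold them into a single dma_fence_array and install that as the exclusive fence, keeping the reservation object locked across the whole sequence so that no new shared fence can be added between collection and reservation_object_add_excl_fence(). A minimal sketch along those lines, using the 4.18-era reservation/dma-fence API names; this is illustrative only, not the patch under review and not the code that was eventually merged:

#include <linux/dma-fence-array.h>
#include <linux/reservation.h>
#include <linux/slab.h>

/*
 * Caller is assumed to hold the reservation lock (e.g. via
 * amdgpu_bo_reserve()) for the whole sequence, so nothing can be added
 * between fence collection and reservation_object_add_excl_fence().
 */
static int make_exclusive_sketch(struct reservation_object *obj)
{
        struct dma_fence_array *array;
        struct dma_fence **fences;
        unsigned int count, i;
        int r;

        if (!reservation_object_get_list(obj))
                return 0;       /* no shared fences, nothing to fold */

        r = reservation_object_get_fences_rcu(obj, NULL, &count, &fences);
        if (r)
                return r;

        if (count == 0) {
                kfree(fences);
                return 0;
        }

        /* Fold all collected fences into one composite exclusive fence. */
        array = dma_fence_array_create(count, fences,
                                       dma_fence_context_alloc(1), 0, false);
        if (!array) {
                for (i = 0; i < count; i++)
                        dma_fence_put(fences[i]);
                kfree(fences);
                return -ENOMEM;
        }

        /*
         * The array has taken ownership of fences[]; the reservation
         * object grabs its own reference, so drop the creation reference.
         */
        reservation_object_add_excl_fence(obj, &array->base);
        dma_fence_put(&array->base);
        return 0;
}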


Re: [PATCH v4] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Chris Wilson
Quoting Christian König (2018-08-07 18:57:16)
> Am 07.08.2018 um 18:08 schrieb Chris Wilson:
> > amdgpu only uses shared-fences internally, but dmabuf importers rely on
> > implicit write hazard tracking via the reservation_object.fence_excl.
> > For example, the importer uses the write hazard for timing a page flip to
> > only occur after the exporter has finished flushing its write into the
> > surface. As such, on exporting a dmabuf, we must either flush all
> > outstanding fences (for we do not know which are writes and should have
> > been exclusive) or alternatively create a new exclusive fence that is
> > the composite of all the existing shared fences, and so will only be
> > signaled when all earlier fences are signaled (ensuring that we can not
> > be signaled before the completion of any earlier write).
> >
> > v2: reservation_object is already locked by amdgpu_bo_reserve()
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107341
> > Testcase: igt/amd_prime/amd-to-i915
> > References: 8e94a46c1770 ("drm/amdgpu: Attach exclusive fence to prime 
> > exported bo's. (v5)")
> > Signed-off-by: Chris Wilson 
> > Cc: Alex Deucher 
> > Cc: "Christian König" 
> > ---
> >
> > This time, hopefully proofread and references complete.
> > -Chris
> >
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 68 ---
> >   1 file changed, 60 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> > index 1c5d97f4b4dd..dff2b01a3d89 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> > @@ -37,6 +37,7 @@
> >   #include "amdgpu_display.h"
> >   #include 
> >   #include 
> > +#include 
> >   
> >   static const struct dma_buf_ops amdgpu_dmabuf_ops;
> >   
> > @@ -188,6 +189,57 @@ amdgpu_gem_prime_import_sg_table(struct drm_device 
> > *dev,
> >   return ERR_PTR(ret);
> >   }
> >   
> > +static int
> > +__reservation_object_make_exclusive(struct reservation_object *obj)
> > +{
> > + struct reservation_object_list *fobj;
> > + struct dma_fence_array *array;
> > + struct dma_fence **fences;
> > + unsigned int count, i;
> > +
> > + fobj = reservation_object_get_list(obj);
> > + if (!fobj)
> > + return 0;
> > +
> > + count = !!rcu_access_pointer(obj->fence_excl);
> > + count += fobj->shared_count;
> > +
> > + fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL);
> > + if (!fences)
> > + return -ENOMEM;
> > +
> > + for (i = 0; i < fobj->shared_count; i++) {
> > + struct dma_fence *f =
> > + rcu_dereference_protected(fobj->shared[i],
> > +   
> > reservation_object_held(obj));
> > +
> > + fences[i] = dma_fence_get(f);
> > + }
> > +
> > + if (rcu_access_pointer(obj->fence_excl)) {
> > + struct dma_fence *f =
> > + rcu_dereference_protected(obj->fence_excl,
> > +   
> > reservation_object_held(obj));
> > +
> > + fences[i] = dma_fence_get(f);
> > + }
> > +
> > + array = dma_fence_array_create(count, fences,
> > +dma_fence_context_alloc(1), 0,
> > +false);
> > + if (!array)
> > + goto err_fences_put;
> > +
> > + reservation_object_add_excl_fence(obj, &array->base);
> > + return 0;
> > +
> > +err_fences_put:
> > + for (i = 0; i < count; i++)
> > + dma_fence_put(fences[i]);
> > + kfree(fences);
> > + return -ENOMEM;
> > +}
> > +
> 
> This can be simplified a lot. See amdgpu_pasid_free_delayed() for an 
> example:

{
if (!reservation_object_get_list(obj))
return 0;

r = reservation_object_get_fences_rcu(obj, NULL, &count, &fences);
if (r)
return r;

array = dma_fence_array_create(count, fences,
   dma_fence_context_alloc(1), 0,
   false);
if (!array)
goto err_fences_put;

reservation_object_add_excl_fence(obj, &array->base);
return 0;

err:
...
}

My starting point was going to be use get_fences_rcu, but get_fences_rcu
can hardly be called simple for where the lock is required to be held
start to finish ;)
-Chris
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v4] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Christian König

Am 07.08.2018 um 18:08 schrieb Chris Wilson:

amdgpu only uses shared-fences internally, but dmabuf importers rely on
implicit write hazard tracking via the reservation_object.fence_excl.
For example, the importer uses the write hazard for timing a page flip to
only occur after the exporter has finished flushing its write into the
surface. As such, on exporting a dmabuf, we must either flush all
outstanding fences (for we do not know which are writes and should have
been exclusive) or alternatively create a new exclusive fence that is
the composite of all the existing shared fences, and so will only be
signaled when all earlier fences are signaled (ensuring that we can not
be signaled before the completion of any earlier write).

v2: reservation_object is already locked by amdgpu_bo_reserve()

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107341
Testcase: igt/amd_prime/amd-to-i915
References: 8e94a46c1770 ("drm/amdgpu: Attach exclusive fence to prime exported 
bo's. (v5)")
Signed-off-by: Chris Wilson 
Cc: Alex Deucher 
Cc: "Christian König" 
---

This time, hopefully proofread and references complete.
-Chris

---
  drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 68 ---
  1 file changed, 60 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 1c5d97f4b4dd..dff2b01a3d89 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -37,6 +37,7 @@
  #include "amdgpu_display.h"
  #include 
  #include 
+#include 
  
  static const struct dma_buf_ops amdgpu_dmabuf_ops;
  
@@ -188,6 +189,57 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,

return ERR_PTR(ret);
  }
  
+static int

+__reservation_object_make_exclusive(struct reservation_object *obj)
+{
+   struct reservation_object_list *fobj;
+   struct dma_fence_array *array;
+   struct dma_fence **fences;
+   unsigned int count, i;
+
+   fobj = reservation_object_get_list(obj);
+   if (!fobj)
+   return 0;
+
+   count = !!rcu_access_pointer(obj->fence_excl);
+   count += fobj->shared_count;
+
+   fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL);
+   if (!fences)
+   return -ENOMEM;
+
+   for (i = 0; i < fobj->shared_count; i++) {
+   struct dma_fence *f =
+   rcu_dereference_protected(fobj->shared[i],
+ reservation_object_held(obj));
+
+   fences[i] = dma_fence_get(f);
+   }
+
+   if (rcu_access_pointer(obj->fence_excl)) {
+   struct dma_fence *f =
+   rcu_dereference_protected(obj->fence_excl,
+ reservation_object_held(obj));
+
+   fences[i] = dma_fence_get(f);
+   }
+
+   array = dma_fence_array_create(count, fences,
+  dma_fence_context_alloc(1), 0,
+  false);
+   if (!array)
+   goto err_fences_put;
+
+   reservation_object_add_excl_fence(obj, &array->base);
+   return 0;
+
+err_fences_put:
+   for (i = 0; i < count; i++)
+   dma_fence_put(fences[i]);
+   kfree(fences);
+   return -ENOMEM;
+}
+


This can be simplified a lot. See amdgpu_pasid_free_delayed() for an 
example:
    r = reservation_object_get_fences_rcu(resv, NULL, &count, &fences);

    if (r)
    goto fallback;

    if (count == 0) {
    amdgpu_pasid_free(pasid);
    return;
    }

    if (count == 1) {
    fence = fences[0];
    kfree(fences);
    } else {
    uint64_t context = dma_fence_context_alloc(1);
    struct dma_fence_array *array;

    array = dma_fence_array_create(count, fences, context,
   1, false);
    if (!array) {
    kfree(fences);
    goto fallback;
    }
    fence = &array->base;
    }



Regards,
Christian.


  /**
   * amdgpu_gem_map_attach - &dma_buf_ops.attach implementation
   * @dma_buf: shared DMA buffer
@@ -219,16 +271,16 @@ static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
  
  	if (attach->dev->driver != adev->dev->driver) {

/*
-* Wait for all shared fences to complete before we switch to 
future
-* use of exclusive fence on this prime shared bo.
+* We only create shared fences for internal use, but importers
+* of the dmabuf rely on exclusive fences for implicitly
+* tracking write hazards. As any of the current fences may
+* correspond to a write, we need to convert all existing
+* fences on the reservation object into a single exclusive
+* 
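
The amdgpu_pasid_free_delayed() pattern quoted above also suggests short-circuiting the zero- and one-fence cases, so that no dma_fence_array has to be allocated when it is not needed. A sketch of a helper that would run after reservation_object_get_fences_rcu() has collected the fences; the name is hypothetical, the API calls are the 4.18-era ones, and this is only an illustration of the idea, not the code that was merged:

#include <linux/dma-fence-array.h>
#include <linux/reservation.h>
#include <linux/slab.h>

/*
 * Hypothetical helper: given the fences already collected from the
 * reservation object (references owned by the caller), install a single
 * exclusive fence, special-casing zero and one fence as in the pasid
 * example above. Reservation lock assumed held by the caller.
 */
static int install_exclusive_from_fences(struct reservation_object *obj,
                                         struct dma_fence **fences,
                                         unsigned int count)
{
        struct dma_fence_array *array;

        if (count == 0) {
                kfree(fences);
                return 0;       /* nothing to wait for */
        }

        if (count == 1) {
                /* The one fence simply becomes the exclusive fence. */
                reservation_object_add_excl_fence(obj, fences[0]);
                dma_fence_put(fences[0]);
                kfree(fences);
                return 0;
        }

        array = dma_fence_array_create(count, fences,
                                       dma_fence_context_alloc(1), 0, false);
        if (!array) {
                while (count--)
                        dma_fence_put(fences[count]);
                kfree(fences);
                return -ENOMEM;
        }

        /* The reservation object takes its own reference to the array. */
        reservation_object_add_excl_fence(obj, &array->base);
        dma_fence_put(&array->base);
        return 0;
}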

Re: [PATCH] drm: Remove defunct dma_buf_kmap stubs

2018-08-07 Thread Daniel Vetter
On Tue, Aug 07, 2018 at 06:47:48PM +0100, Chris Wilson wrote:
> Since commit 09ea0dfbf972 ("dma-buf: make map_atomic and map function
> pointers optional"), we no longer need to provide stub no-op functions
> as the core now provides them directly.
> 
> References: 09ea0dfbf972 ("dma-buf: make map_atomic and map function pointers 
> optional")
> Signed-off-by: Chris Wilson 
> Cc: Daniel Vetter 
> Cc: Gerd Hoffmann 
> Cc: Alex Deucher 
> Cc: "Christian König" 

Reviewed-by: Daniel Vetter 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c |  2 --
>  drivers/gpu/drm/drm_prime.c   | 30 ---
>  include/drm/drm_prime.h   |  3 ---
>  3 files changed, 35 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> index 1c5d97f4b4dd..3fdd5688da0b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> @@ -338,8 +338,6 @@ static const struct dma_buf_ops amdgpu_dmabuf_ops = {
>   .unmap_dma_buf = drm_gem_unmap_dma_buf,
>   .release = drm_gem_dmabuf_release,
>   .begin_cpu_access = amdgpu_gem_begin_cpu_access,
> - .map = drm_gem_dmabuf_kmap,
> - .unmap = drm_gem_dmabuf_kunmap,
>   .mmap = drm_gem_dmabuf_mmap,
>   .vmap = drm_gem_dmabuf_vmap,
>   .vunmap = drm_gem_dmabuf_vunmap,
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index 186db2e4c57a..e03d3bdaa59b 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -433,34 +433,6 @@ void drm_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void 
> *vaddr)
>  }
>  EXPORT_SYMBOL(drm_gem_dmabuf_vunmap);
>  
> -/**
> - * drm_gem_dmabuf_kmap - map implementation for GEM
> - * @dma_buf: buffer to be mapped
> - * @page_num: page number within the buffer
> - *
> - * Not implemented. This can be used as the &dma_buf_ops.map callback.
> - */
> -void *drm_gem_dmabuf_kmap(struct dma_buf *dma_buf, unsigned long page_num)
> -{
> - return NULL;
> -}
> -EXPORT_SYMBOL(drm_gem_dmabuf_kmap);
> -
> -/**
> - * drm_gem_dmabuf_kunmap - unmap implementation for GEM
> - * @dma_buf: buffer to be unmapped
> - * @page_num: page number within the buffer
> - * @addr: virtual address of the buffer
> - *
> - * Not implemented. This can be used as the &dma_buf_ops.unmap callback.
> - */
> -void drm_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_num,
> -void *addr)
> -{
> -
> -}
> -EXPORT_SYMBOL(drm_gem_dmabuf_kunmap);
> -
>  /**
>   * drm_gem_dmabuf_mmap - dma_buf mmap implementation for GEM
>   * @dma_buf: buffer to be mapped
> @@ -489,8 +461,6 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops 
> =  {
>   .map_dma_buf = drm_gem_map_dma_buf,
>   .unmap_dma_buf = drm_gem_unmap_dma_buf,
>   .release = drm_gem_dmabuf_release,
> - .map = drm_gem_dmabuf_kmap,
> - .unmap = drm_gem_dmabuf_kunmap,
>   .mmap = drm_gem_dmabuf_mmap,
>   .vmap = drm_gem_dmabuf_vmap,
>   .vunmap = drm_gem_dmabuf_vunmap,
> diff --git a/include/drm/drm_prime.h b/include/drm/drm_prime.h
> index d716d653b096..e2032fbc0f08 100644
> --- a/include/drm/drm_prime.h
> +++ b/include/drm/drm_prime.h
> @@ -93,9 +93,6 @@ void drm_gem_unmap_dma_buf(struct dma_buf_attachment 
> *attach,
>  enum dma_data_direction dir);
>  void *drm_gem_dmabuf_vmap(struct dma_buf *dma_buf);
>  void drm_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr);
> -void *drm_gem_dmabuf_kmap(struct dma_buf *dma_buf, unsigned long page_num);
> -void drm_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_num,
> -void *addr);
>  int drm_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma);
>  
>  int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page 
> **pages,
> -- 
> 2.18.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm: Remove defunct dma_buf_kmap stubs

2018-08-07 Thread Christian König

Am 07.08.2018 um 19:47 schrieb Chris Wilson:

Since commit 09ea0dfbf972 ("dma-buf: make map_atomic and map function
pointers optional"), we no longer need to provide stub no-op functions
as the core now provides them directly.

References: 09ea0dfbf972 ("dma-buf: make map_atomic and map function pointers 
optional")
Signed-off-by: Chris Wilson 
Cc: Daniel Vetter 
Cc: Gerd Hoffmann 
Cc: Alex Deucher 
Cc: "Christian König" 


Reviewed-by: Christian König 



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c |  2 --
  drivers/gpu/drm/drm_prime.c   | 30 ---
  include/drm/drm_prime.h   |  3 ---
  3 files changed, 35 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 1c5d97f4b4dd..3fdd5688da0b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -338,8 +338,6 @@ static const struct dma_buf_ops amdgpu_dmabuf_ops = {
.unmap_dma_buf = drm_gem_unmap_dma_buf,
.release = drm_gem_dmabuf_release,
.begin_cpu_access = amdgpu_gem_begin_cpu_access,
-   .map = drm_gem_dmabuf_kmap,
-   .unmap = drm_gem_dmabuf_kunmap,
.mmap = drm_gem_dmabuf_mmap,
.vmap = drm_gem_dmabuf_vmap,
.vunmap = drm_gem_dmabuf_vunmap,
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 186db2e4c57a..e03d3bdaa59b 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -433,34 +433,6 @@ void drm_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void 
*vaddr)
  }
  EXPORT_SYMBOL(drm_gem_dmabuf_vunmap);
  
-/**

- * drm_gem_dmabuf_kmap - map implementation for GEM
- * @dma_buf: buffer to be mapped
- * @page_num: page number within the buffer
- *
- * Not implemented. This can be used as the &dma_buf_ops.map callback.
- */
-void *drm_gem_dmabuf_kmap(struct dma_buf *dma_buf, unsigned long page_num)
-{
-   return NULL;
-}
-EXPORT_SYMBOL(drm_gem_dmabuf_kmap);
-
-/**
- * drm_gem_dmabuf_kunmap - unmap implementation for GEM
- * @dma_buf: buffer to be unmapped
- * @page_num: page number within the buffer
- * @addr: virtual address of the buffer
- *
- * Not implemented. This can be used as the &dma_buf_ops.unmap callback.
- */
-void drm_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_num,
-  void *addr)
-{
-
-}
-EXPORT_SYMBOL(drm_gem_dmabuf_kunmap);
-
  /**
   * drm_gem_dmabuf_mmap - dma_buf mmap implementation for GEM
   * @dma_buf: buffer to be mapped
@@ -489,8 +461,6 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops =  
{
.map_dma_buf = drm_gem_map_dma_buf,
.unmap_dma_buf = drm_gem_unmap_dma_buf,
.release = drm_gem_dmabuf_release,
-   .map = drm_gem_dmabuf_kmap,
-   .unmap = drm_gem_dmabuf_kunmap,
.mmap = drm_gem_dmabuf_mmap,
.vmap = drm_gem_dmabuf_vmap,
.vunmap = drm_gem_dmabuf_vunmap,
diff --git a/include/drm/drm_prime.h b/include/drm/drm_prime.h
index d716d653b096..e2032fbc0f08 100644
--- a/include/drm/drm_prime.h
+++ b/include/drm/drm_prime.h
@@ -93,9 +93,6 @@ void drm_gem_unmap_dma_buf(struct dma_buf_attachment *attach,
   enum dma_data_direction dir);
  void *drm_gem_dmabuf_vmap(struct dma_buf *dma_buf);
  void drm_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr);
-void *drm_gem_dmabuf_kmap(struct dma_buf *dma_buf, unsigned long page_num);
-void drm_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_num,
-  void *addr);
  int drm_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma);
  
  int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages,


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm: Remove defunct dma_buf_kmap stubs

2018-08-07 Thread Chris Wilson
Since commit 09ea0dfbf972 ("dma-buf: make map_atomic and map function
pointers optional"), we no longer need to provide stub no-op functions
as the core now provides them directly.

References: 09ea0dfbf972 ("dma-buf: make map_atomic and map function pointers 
optional")
Signed-off-by: Chris Wilson 
Cc: Daniel Vetter 
Cc: Gerd Hoffmann 
Cc: Alex Deucher 
Cc: "Christian König" 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c |  2 --
 drivers/gpu/drm/drm_prime.c   | 30 ---
 include/drm/drm_prime.h   |  3 ---
 3 files changed, 35 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 1c5d97f4b4dd..3fdd5688da0b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -338,8 +338,6 @@ static const struct dma_buf_ops amdgpu_dmabuf_ops = {
.unmap_dma_buf = drm_gem_unmap_dma_buf,
.release = drm_gem_dmabuf_release,
.begin_cpu_access = amdgpu_gem_begin_cpu_access,
-   .map = drm_gem_dmabuf_kmap,
-   .unmap = drm_gem_dmabuf_kunmap,
.mmap = drm_gem_dmabuf_mmap,
.vmap = drm_gem_dmabuf_vmap,
.vunmap = drm_gem_dmabuf_vunmap,
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 186db2e4c57a..e03d3bdaa59b 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -433,34 +433,6 @@ void drm_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void 
*vaddr)
 }
 EXPORT_SYMBOL(drm_gem_dmabuf_vunmap);
 
-/**
- * drm_gem_dmabuf_kmap - map implementation for GEM
- * @dma_buf: buffer to be mapped
- * @page_num: page number within the buffer
- *
- * Not implemented. This can be used as the &dma_buf_ops.map callback.
- */
-void *drm_gem_dmabuf_kmap(struct dma_buf *dma_buf, unsigned long page_num)
-{
-   return NULL;
-}
-EXPORT_SYMBOL(drm_gem_dmabuf_kmap);
-
-/**
- * drm_gem_dmabuf_kunmap - unmap implementation for GEM
- * @dma_buf: buffer to be unmapped
- * @page_num: page number within the buffer
- * @addr: virtual address of the buffer
- *
- * Not implemented. This can be used as the &dma_buf_ops.unmap callback.
- */
-void drm_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_num,
-  void *addr)
-{
-
-}
-EXPORT_SYMBOL(drm_gem_dmabuf_kunmap);
-
 /**
  * drm_gem_dmabuf_mmap - dma_buf mmap implementation for GEM
  * @dma_buf: buffer to be mapped
@@ -489,8 +461,6 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops =  
{
.map_dma_buf = drm_gem_map_dma_buf,
.unmap_dma_buf = drm_gem_unmap_dma_buf,
.release = drm_gem_dmabuf_release,
-   .map = drm_gem_dmabuf_kmap,
-   .unmap = drm_gem_dmabuf_kunmap,
.mmap = drm_gem_dmabuf_mmap,
.vmap = drm_gem_dmabuf_vmap,
.vunmap = drm_gem_dmabuf_vunmap,
diff --git a/include/drm/drm_prime.h b/include/drm/drm_prime.h
index d716d653b096..e2032fbc0f08 100644
--- a/include/drm/drm_prime.h
+++ b/include/drm/drm_prime.h
@@ -93,9 +93,6 @@ void drm_gem_unmap_dma_buf(struct dma_buf_attachment *attach,
   enum dma_data_direction dir);
 void *drm_gem_dmabuf_vmap(struct dma_buf *dma_buf);
 void drm_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr);
-void *drm_gem_dmabuf_kmap(struct dma_buf *dma_buf, unsigned long page_num);
-void drm_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_num,
-  void *addr);
 int drm_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma);
 
 int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages,
-- 
2.18.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu/sriov: give 8s for recover vram under RUNTIME

2018-08-07 Thread Alex Deucher
On Tue, Aug 7, 2018 at 6:22 AM, Emily Deng  wrote:
> Under runtime, the wait fence time could be quite long when
> other VFs are in exclusive mode.
>
> SWDEV-161490
>
> Change-Id: Ifc32d56ca7fde01b1f4fe2b0db6959b51909008a
> Signed-off-by: Monk Liu 
> Signed-off-by: Emily Deng 

Seems pretty long.  Is this value based on any evidence (e.g., worst-case
length of time slices, etc.) or just a long value that happens to
work?  Might be nice to provide a bit more context in the commit
message.  E.g., extend the timeout for recovering vram bos from
shadows on sr-iov to cover the worst case scenario for timeslices and
VFs.

Alex


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 1d933db..ef82ad1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3124,7 +3124,7 @@ static int amdgpu_device_handle_vram_lost(struct 
> amdgpu_device *adev)
> long tmo;
>
> if (amdgpu_sriov_runtime(adev))
> -   tmo = msecs_to_jiffies(amdgpu_lockup_timeout);
> +   tmo = msecs_to_jiffies(8000);
> else
> tmo = msecs_to_jiffies(100);
>
> --
> 2.7.4
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v4] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Chris Wilson
amdgpu only uses shared-fences internally, but dmabuf importers rely on
implicit write hazard tracking via the reservation_object.fence_excl.
For example, the importer uses the write hazard for timing a page flip to
only occur after the exporter has finished flushing its write into the
surface. As such, on exporting a dmabuf, we must either flush all
outstanding fences (for we do not know which are writes and should have
been exclusive) or alternatively create a new exclusive fence that is
the composite of all the existing shared fences, and so will only be
signaled when all earlier fences are signaled (ensuring that we can not
be signaled before the completion of any earlier write).

v2: reservation_object is already locked by amdgpu_bo_reserve()

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107341
Testcase: igt/amd_prime/amd-to-i915
References: 8e94a46c1770 ("drm/amdgpu: Attach exclusive fence to prime exported 
bo's. (v5)")
Signed-off-by: Chris Wilson 
Cc: Alex Deucher 
Cc: "Christian König" 
---

This time, hopefully proofread and references complete.
-Chris

---
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 68 ---
 1 file changed, 60 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 1c5d97f4b4dd..dff2b01a3d89 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -37,6 +37,7 @@
 #include "amdgpu_display.h"
 #include 
 #include 
+#include 
 
 static const struct dma_buf_ops amdgpu_dmabuf_ops;
 
@@ -188,6 +189,57 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
return ERR_PTR(ret);
 }
 
+static int
+__reservation_object_make_exclusive(struct reservation_object *obj)
+{
+   struct reservation_object_list *fobj;
+   struct dma_fence_array *array;
+   struct dma_fence **fences;
+   unsigned int count, i;
+
+   fobj = reservation_object_get_list(obj);
+   if (!fobj)
+   return 0;
+
+   count = !!rcu_access_pointer(obj->fence_excl);
+   count += fobj->shared_count;
+
+   fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL);
+   if (!fences)
+   return -ENOMEM;
+
+   for (i = 0; i < fobj->shared_count; i++) {
+   struct dma_fence *f =
+   rcu_dereference_protected(fobj->shared[i],
+ reservation_object_held(obj));
+
+   fences[i] = dma_fence_get(f);
+   }
+
+   if (rcu_access_pointer(obj->fence_excl)) {
+   struct dma_fence *f =
+   rcu_dereference_protected(obj->fence_excl,
+ reservation_object_held(obj));
+
+   fences[i] = dma_fence_get(f);
+   }
+
+   array = dma_fence_array_create(count, fences,
+  dma_fence_context_alloc(1), 0,
+  false);
+   if (!array)
+   goto err_fences_put;
+
+   reservation_object_add_excl_fence(obj, &array->base);
+   return 0;
+
+err_fences_put:
+   for (i = 0; i < count; i++)
+   dma_fence_put(fences[i]);
+   kfree(fences);
+   return -ENOMEM;
+}
+
 /**
  * amdgpu_gem_map_attach - &dma_buf_ops.attach implementation
  * @dma_buf: shared DMA buffer
@@ -219,16 +271,16 @@ static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
 
if (attach->dev->driver != adev->dev->driver) {
/*
-* Wait for all shared fences to complete before we switch to 
future
-* use of exclusive fence on this prime shared bo.
+* We only create shared fences for internal use, but importers
+* of the dmabuf rely on exclusive fences for implicitly
+* tracking write hazards. As any of the current fences may
+* correspond to a write, we need to convert all existing
+* fences on the reservation object into a single exclusive
+* fence.
 */
-   r = reservation_object_wait_timeout_rcu(bo->tbo.resv,
-   true, false,
-   MAX_SCHEDULE_TIMEOUT);
-   if (unlikely(r < 0)) {
-   DRM_DEBUG_PRIME("Fence wait failed: %li\n", r);
+   r = __reservation_object_make_exclusive(bo->tbo.resv);
+   if (r)
goto error_unreserve;
-   }
}
 
/* pin buffer into GTT */
-- 
2.18.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
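
On the importer side, the implicit write-hazard tracking described in the commit message above comes down to waiting on the exclusive fence, and only on it, before touching the shared buffer (for instance before a page flip). A hedged sketch of such an importer-side wait; the helper name is hypothetical, the reservation/dma-fence calls are the 4.18-era API:

#include <linux/dma-fence.h>
#include <linux/reservation.h>

/*
 * Hypothetical importer-side helper: wait only on the exclusive fence
 * before scanning out the shared buffer. If the exporter never populated
 * fence_excl, pending writes are simply missed here - which is exactly
 * the problem the patch above addresses on the amdgpu export path.
 */
static long wait_for_exporter_writes(struct reservation_object *resv)
{
        struct dma_fence *excl;
        long ret = 0;

        excl = reservation_object_get_excl_rcu(resv);
        if (excl) {
                ret = dma_fence_wait(excl, true);       /* interruptible */
                dma_fence_put(excl);
        }

        return ret;
}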


Re: [Linux-v4.18-rc6] modpost-errors when compiling with clang-7 and CONFIG_DRM_AMDGPU=m

2018-08-07 Thread Sedat Dilek
On Sun, Jul 29, 2018 at 4:39 PM, Christian König
 wrote:
>> Do you need further informations?
>
> No, that is a known issue.
>

Hi,

is there any progress on this?

Thanks.

Regards,
- Sedat -

> Regards,
> Christian.
>
>
> Am 29.07.2018 um 15:52 schrieb Sedat Dilek:
>>
>> Hi,
>>
>> when compiling with clang-7 and CONFIG_DRM_AMDGPU=m I see the following...
>>
>>if [ "" = "-pg" ]; then if [ arch/x86/boot/compressed/misc.o !=
>> "scripts/mod/empty.o" ]; then ./scripts/recordmcount
>> "arch/x86/boot/compressed/misc.o"; fi; fi;
>> ERROR: "__addsf3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__subdf3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__gedf2" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__fixunssfsi" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__floatunsisf" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__unordsf2" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__gesf2" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__mulsf3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__truncdfsf2" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__ltsf2" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__muldf3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__divdf3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__eqsf2" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__floatsisf" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__ledf2" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__gtsf2" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__fixdfsi" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__floatunsidf" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__nesf2" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__adddf3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__extendsfdf2" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__fixunsdfsi" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__lesf2" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__ltdf2" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__floatsidf" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__subsf3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__gtdf2" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__fixsfsi" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__divsf3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> ERROR: "__floatdidf" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
>> make[4]: *** [scripts/Makefile.modpost:92: __modpost] Error 1
>> make[3]: *** [Makefile:1208: modules] Error 2
>> make[3]: *** Waiting for unfinished jobs
>>
>> For now I have disabled CONFIG_DRM_AMDGPU=n.
>>
>> Do you need further informations?
>>
>> Kind regards,
>> - Sedat -
>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v3] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Daniel Vetter
On Tue, Aug 7, 2018 at 3:54 PM, Christian König
 wrote:
> Am 07.08.2018 um 15:37 schrieb Daniel Vetter:
>>
>> On Tue, Aug 07, 2018 at 03:17:06PM +0200, Christian König wrote:
>>>
>>> Am 07.08.2018 um 14:59 schrieb Daniel Vetter:

 On Tue, Aug 07, 2018 at 02:51:50PM +0200, Christian König wrote:
>
> Am 07.08.2018 um 14:43 schrieb Daniel Vetter:
>>
>> On Tue, Aug 07, 2018 at 01:23:19PM +0200, Christian König wrote:
>>>
>>> Am 07.08.2018 um 13:05 schrieb Chris Wilson:

 amdgpu only uses shared-fences internally, but dmabuf importers rely
 on
 implicit write hazard tracking via the
 reservation_object.fence_excl.
>>>
>>> Well I would rather suggest a solution where we stop doing this.
>>>
>>> The problem here is that i915 assumes that a write operation always
>>> needs
>>> exclusive access to an object protected by a reservation object.
>>>
>>> At least for amdgpu, radeon and nouveau this assumption is incorrect,
>>> but
>>> only amdgpu has a design where userspace is not lying to the kernel
>>> about
>>> its read/write access.
>>>
>>> What we should do instead is to add a flag to each shared fence to
>>> note if
>>> it is a write operation or not. Then we can trivially add a function
>>> to wait
>>> on those in i915.
>>>
>>> I should have pushed harder for this solution when the problem came
>>> up
>>> initially,
>>
>> For shared buffers in an implicit syncing setup like dri2 the
>> exclusive
>> fence _is_ your write fence.
>
> And exactly that is absolutely not correct if you ask me.
>
> The exclusive fence is for two use cases, the first one is for drivers
> which
> don't care about concurrent accesses and only use serialized accesses
> and
> the second is for kernel uses under the hood of userspace, evictions,
> buffer
> moves etc..
>
> What i915 does is to abuse that concept for write once read many
> situations,
> and that in turn is the bug we need to fix here.

 Again, the exclusive fence was added for implicit sync usage like
 dri2/3.
 _Not_ for your own buffer manager. If you want to separate these two
 usages, then I guess we can do that, but claiming that i915 is the odd
 one
 out just aint true. You're arguing against all other kms drivers we have
 here.
>>>
>>> No I'm not. I discussed exactly that use case with Maarten when the
>>> reservation object was introduced.
>>>
>>> I think that Maarten named the exclusive fence on purpose "exclusive" fence
>>> and not "write" fence to make that distinction clear.
>>>
>>> I have to admit that this wasn't really documented, but it indeed was the
>>> original purpose to get away from the idea that writes need to be
>>> exclusive.
>>>
>> That's how this stuff works. Note it's only
>> for implicit fencing for dri2 shared buffers. i915 lies as much as
>> everyone else for explicit syncing.
>
> That is really really ugly and should be fixed instead.

 It works and is uapi now. What's the gain in "fixing" it?
>>>
>>> It allows you to implement explicit fencing in the uapi without breaking
>>> backward compatibility.
>>>
>>> See the situation with amdgpu and intel is the same as with userspace
>>> with
>>> mixed implicit and explicit fencing.
>>>
>>> If that isn't fixed we will run into the same problem when DRI3 gets
>>> implicit fencing and some applications starts to use it while others
>>> still
>>> rely on the explicit fencing.
>>
>> I think we're a bit cross-eyed on exact semantics, but yes this is exactly
>> the use-case. Chris Wilson's use-case tries to emulate exactly what would
>> happen if implicitly fenced amdgpu rendered buffer should be displayed on
>> an i915 output. Then you need to set the exclusive fence correctly.
>
>
> Yes, exactly. That's what I totally agree on.
>
>> And yes we called them exclusive/shared because shared fences could also
>> be write fences. But for the implicitly synced userspace case, exclusive =
>> The write fence.
>>
>> So probably Chris' patch ends up syncing too much (since for explicit
>> syncing you don't want to attach an exclusive fence, because userspace
>> passes the fence around already to the atomic ioctl). But at least it's
>> correct for the implicit case. But that's entirely an optimization problem
>> within amdgpu.
>
>
> Completely agree as well, but Chris patch actually just optimizes things. It
> is not necessary for correct operation.

Duh, I didn't read the patch carefully enough ...

> See previously we just waited for the BO to be idle, now Chris patch
> collects all fences (shared and exclusive) and adds that as a single exclusive
> fence to avoid blocking.
>
>> Summary: If you have a shared buffer used in implicitly synced buffer
>> sharing, you _must_ set the exclusive fence to cover all write access.
>
>
> And 

Re: [PATCH v3] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Christian König

Am 07.08.2018 um 15:37 schrieb Daniel Vetter:

On Tue, Aug 07, 2018 at 03:17:06PM +0200, Christian König wrote:

Am 07.08.2018 um 14:59 schrieb Daniel Vetter:

On Tue, Aug 07, 2018 at 02:51:50PM +0200, Christian König wrote:

Am 07.08.2018 um 14:43 schrieb Daniel Vetter:

On Tue, Aug 07, 2018 at 01:23:19PM +0200, Christian König wrote:

Am 07.08.2018 um 13:05 schrieb Chris Wilson:

amdgpu only uses shared-fences internally, but dmabuf importers rely on
implicit write hazard tracking via the reservation_object.fence_excl.

Well I would rather suggest a solution where we stop doing this.

The problem here is that i915 assumes that a write operation always needs
exclusive access to an object protected by a reservation object.

At least for amdgpu, radeon and nouveau this assumption is incorrect, but
only amdgpu has a design where userspace is not lying to the kernel about
its read/write access.

What we should do instead is to add a flag to each shared fence to note if
it is a write operation or not. Then we can trivially add a function to wait
on those in i915.

I should have pushed harder for this solution when the problem came up
initially,

For shared buffers in an implicit syncing setup like dri2 the exclusive
fence _is_ your write fence.

And exactly that is absolutely not correct if you ask me.

The exclusive fence is for two use cases, the first one is for drivers which
don't care about concurrent accesses and only use serialized accesses and
the second is for kernel uses under the hood of userspace, evictions, buffer
moves etc..

What i915 does is to abuse that concept for write once read many situations,
and that in turn is the bug we need to fix here.

Again, the exclusive fence was added for implicit sync usage like dri2/3.
_Not_ for your own buffer manager. If you want to separate these two
usages, then I guess we can do that, but claiming that i915 is the odd one
out just aint true. You're arguing against all other kms drivers we have
here.

No I'm not. I discussed exactly that use case with Maarten when the
reservation object was introduced.

I think that Maarten named the exclusive fence on purpose "exclusive" fence
and not "write" fence to make that distinction clear.

I have to admit that this wasn't really documented, but it indeed was the
original purpose to get away from the idea that writes need to be
exclusive.


That's how this stuff works. Note it's only
for implicit fencing for dri2 shared buffers. i915 lies as much as
everyone else for explicit syncing.

That is really really ugly and should be fixed instead.

It works and is uapi now. What's the gain in "fixing" it?

It allows you to implement explicit fencing in the uapi without breaking
backward compatibility.

See the situation with amdgpu and intel is the same as with userspace with
mixed implicit and explicit fencing.

If that isn't fixed we will run into the same problem when DRI3 gets
implicit fencing and some applications starts to use it while others still
rely on the explicit fencing.

I think we're a bit cross-eyed on exact semantics, but yes this is exactly
the use-case. Chris Wilson's use-case tries to emulate exactly what would
happen if implicitly fenced amdgpu rendered buffer should be displayed on
an i915 output. Then you need to set the exclusive fence correctly.


Yes, exactly. That's what I totally agree on.


And yes we called them exclusive/shared because shared fences could also
be write fences. But for the implicitly synced userspace case, exclusive =
The write fence.

So probably Chris' patch ends up syncing too much (since for explicit
syncing you don't want to attach an exclusive fence, because userspace
passes the fence around already to the atomic ioctl). But at least it's
correct for the implicit case. But that's entirely an optimization problem
within amdgpu.


Completely agree as well, but Chris patch actually just optimizes 
things. It is not necessary for correct operation.


See previously we just waited for the BO to be idle, now Chris patch 
collects all fences (shared and exclusive) and adds that as a single 
exclusive fence to avoid blocking.



Summary: If you have a shared buffer used in implicitly synced buffer
sharing, you _must_ set the exclusive fence to cover all write access.


And how should command submission know that?

I mean we add the exclusive or shared fence during command submission, 
but that IOCTL has not the slightest idea if the BO is then used for 
explicit or for implicit fencing.



In any other case (not shared, or not as part of implicitly synced
protocol), you can do whatever you want. So we're not conflating
exclusive with write here at all. And yes this is exactly what i915 and
all other drivers do - for explicit fencing we don't set the exclusive
fence, but leave that all to userspace. Also, userspace tells us (because
it knows how the protocol works, not the kernel) when to set an exclusive
fence.


To extend that at least the lower layers of the user space 

Re: [PATCH v3] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Daniel Vetter
On Tue, Aug 07, 2018 at 03:17:06PM +0200, Christian König wrote:
> Am 07.08.2018 um 14:59 schrieb Daniel Vetter:
> > On Tue, Aug 07, 2018 at 02:51:50PM +0200, Christian König wrote:
> > > Am 07.08.2018 um 14:43 schrieb Daniel Vetter:
> > > > On Tue, Aug 07, 2018 at 01:23:19PM +0200, Christian König wrote:
> > > > > Am 07.08.2018 um 13:05 schrieb Chris Wilson:
> > > > > > amdgpu only uses shared-fences internally, but dmabuf importers 
> > > > > > rely on
> > > > > > implicit write hazard tracking via the 
> > > > > > reservation_object.fence_excl.
> > > > > Well I would rather suggest a solution where we stop doing this.
> > > > > 
> > > > > The problem here is that i915 assumes that a write operation always 
> > > > > needs
> > > > > exclusive access to an object protected by a reservation object.
> > > > > 
> > > > > At least for amdgpu, radeon and nouveau this assumption is incorrect, 
> > > > > but
> > > > > only amdgpu has a design where userspace is not lying to the kernel 
> > > > > about
> > > > > its read/write access.
> > > > > 
> > > > > What we should do instead is to add a flag to each shared fence to 
> > > > > note if
> > > > > it is a write operation or not. Then we can trivially add a function 
> > > > > to wait
> > > > > on those in i915.
> > > > > 
> > > > > I should have pushed harder for this solution when the problem came up
> > > > > initially,
> > > > For shared buffers in an implicit syncing setup like dri2 the exclusive
> > > > fence _is_ your write fence.
> > > And exactly that is absolutely not correct if you ask me.
> > > 
> > > The exclusive fence is for two use cases, the first one is for drivers 
> > > which
> > > don't care about concurrent accesses and only use serialized accesses and
> > > the second is for kernel uses under the hood of userspace, evictions, 
> > > buffer
> > > moves etc..
> > > 
> > > What i915 does is to abuse that concept for write once read many 
> > > situations,
> > > and that in turn is the bug we need to fix here.
> > Again, the exclusive fence was added for implicit sync usage like dri2/3.
> > _Not_ for your own buffer manager. If you want to separate these two
> > usages, then I guess we can do that, but claiming that i915 is the odd one
> > out just aint true. You're arguing against all other kms drivers we have
> > here.
> 
> No I'm not. I discussed exactly that use case with Maarten when the
> reservation object was introduced.
> 
> I think that Maarten named the exclusive fence on purpose "exclusive" fence
> and not "write" fence to make that distinction clear.
> 
> I have to admit that this wasn't really documented, but it indeed was the
> original purpose to get away from the idea that writes need to be
> exclusive.
> 
> > > > That's how this stuff works. Note it's only
> > > > for implicit fencing for dri2 shared buffers. i915 lies as much as
> > > > everyone else for explicit syncing.
> > > That is really really ugly and should be fixed instead.
> > It works and is uapi now. What's the gain in "fixing" it?
> 
> It allows you to implement explicit fencing in the uapi without breaking
> backward compatibility.
> 
> See the situation with amdgpu and intel is the same as with userspace with
> mixed implicit and explicit fencing.
> 
> If that isn't fixed we will run into the same problem when DRI3 gets
> implicit fencing and some applications starts to use it while others still
> rely on the explicit fencing.

I think we're a bit cross-eyed on exact semantics, but yes this is exactly
the use-case. Chris Wilson's use-case tries to emulate exactly what would
happen if implicitly fenced amdgpu rendered buffer should be displayed on
an i915 output. Then you need to set the exclusive fence correctly.

And yes we called them exclusive/shared because shared fences could also
be write fences. But for the implicitly synced userspace case, exclusive =
The write fence.

So probably Chris' patch ends up syncing too much (since for explicit
syncing you don't want to attach an exclusive fence, because userspace
passes the fence around already to the atomic ioctl). But at least it's
correct for the implicit case. But that's entirely an optimization problem
within amdgpu.

Summary: If you have a shared buffer used in implicitly synced buffer
sharing, you _must_ set the exclusive fence to cover all write access.

In any other case (not shared, or not as part of implicitly synced
protocol), you can do whatever you want. So we're not conflating
exclusive with write here at all. And yes this is exactly what i915 and
all other drivers do - for explicit fencing we don't set the exclusive
fence, but leave that all to userspace. Also, userspace tells us (because
it knows how the protocol works, not the kernel) when to set an exclusive
fence. For historical reasons the relevant uapi parts are called
read/write, but that's just an accident of (maybe too clever) uapi reuse,
not their actual semantics.
-Daniel
> 
> Regards,
> Christian.
> 
> > And this was
> > 
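
Expressed as code, the rule summarised above might look roughly like the following. The helper name and the implicit_sync_shared flag are hypothetical bookkeeping, not existing amdgpu or reservation-object interfaces; the two reservation_object_add_*_fence() calls are the real 4.18-era API, and reservation_object_reserve_shared() is assumed to have been called before adding a shared fence:

#include <linux/dma-fence.h>
#include <linux/reservation.h>

/*
 * Hypothetical helper illustrating the rule above: a write to an
 * implicitly synced, externally shared BO must end up as the exclusive
 * fence; everything else can stay a shared fence. The caller is assumed
 * to hold the reservation lock and, for the shared case, to have called
 * reservation_object_reserve_shared() beforehand.
 */
static void publish_cs_fence(struct reservation_object *resv,
                             struct dma_fence *fence,
                             bool is_write, bool implicit_sync_shared)
{
        if (is_write && implicit_sync_shared)
                reservation_object_add_excl_fence(resv, fence);
        else
                reservation_object_add_shared_fence(resv, fence);
}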

Re: [PATCH v3] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Christian König

Am 07.08.2018 um 14:59 schrieb Daniel Vetter:

On Tue, Aug 07, 2018 at 02:51:50PM +0200, Christian König wrote:

Am 07.08.2018 um 14:43 schrieb Daniel Vetter:

On Tue, Aug 07, 2018 at 01:23:19PM +0200, Christian König wrote:

Am 07.08.2018 um 13:05 schrieb Chris Wilson:

amdgpu only uses shared-fences internally, but dmabuf importers rely on
implicit write hazard tracking via the reservation_object.fence_excl.

Well I would rather suggest a solution where we stop doing this.

The problem here is that i915 assumes that a write operation always needs
exclusive access to an object protected by a reservation object.

At least for amdgpu, radeon and nouveau this assumption is incorrect, but
only amdgpu has a design where userspace is not lying to the kernel about
its read/write access.

What we should do instead is to add a flag to each shared fence to note if
it is a write operation or not. Then we can trivially add a function to wait
on those in i915.

I should have pushed harder for this solution when the problem came up
initially,

For shared buffers in an implicit syncing setup like dri2 the exclusive
fence _is_ your write fence.

And exactly that is absolutely not correct if you ask me.

The exclusive fence is for two use cases, the first one is for drivers which
don't care about concurrent accesses and only use serialized accesses and
the second is for kernel uses under the hood of userspace, evictions, buffer
moves etc..

What i915 does is to abuse that concept for write once read many situations,
and that in turn is the bug we need to fix here.

Again, the exclusive fence was added for implicit sync usage like dri2/3.
_Not_ for your own buffer manager. If you want to separate these two
usages, then I guess we can do that, but claiming that i915 is the odd one
out just aint true. You're arguing against all other kms drivers we have
here.


No I'm not. I discussed exactly that use case with Maarten when the 
reservation object was introduced.


I think that Maarten named the exclusive fence on purpose "exclusive" 
fence and not "write" fence to make that distinction clear.


I have to admit that this wasn't really documented, but it indeed was 
the original purpose to get away from the idea that writes need to be 
exclusive.



That's how this stuff works. Note it's only
for implicit fencing for dri2 shared buffers. i915 lies as much as
everyone else for explicit syncing.

That is really really ugly and should be fixed instead.

It works and is uapi now. What's the gain in "fixing" it?


It allows you to implement explicit fencing in the uapi without breaking 
backward compatibility.


See the situation with amdgpu and intel is the same as with userspace 
with mixed implicit and explicit fencing.


If that isn't fixed we will run into the same problem when DRI3 gets 
implicit fencing and some applications starts to use it while others 
still rely on the explicit fencing.


Regards,
Christian.


And this was
discussed at length when intel and freedreno folks talked about
implementing explicit sync and android sync pts.


Now as long as you only share within amdgpu you can do whatever you want
to, but for anything shared outside of amdgpu, if the buffer isn't shared
through an explicit syncing protocol, then you do have to set the
exclusive fence. That's at least how this stuff works right now with all
other drivers.

For i915 we had to do some uapi trickery to make this work in all cases,
since only really userspace knows whether a write should be a shared or
exclusive write. But that's strictly an issue limited to your driver, and
doesn't need changes to the reservation object all throughout the stack.

You don't need to change the reservation object all throughout the stack. A
simple flag if a shared fence is a write or not should be doable.

Give me a day or two to prototype that,

See my other reply, i think that's only needed for amdgpu internal
book-keeping, so that you can create the correct exclusive fence on first
export. But yeah, adding a bitfield to track which fences need to become
exclusive shouldn't be a tricky solution to implement for amdgpu.
-Daniel


Christian.


Aside: If you want to attach multiple write fences to the exclusive fence,
that should be doable with the array fences.
-Daniel


Christian.


For example, the importer uses the write hazard for timing a page flip to
only occur after the exporter has finished flushing its write into the
surface. As such, on exporting a dmabuf, we must either flush all
outstanding fences (for we do not know which are writes and should have
been exclusive) or alternatively create a new exclusive fence that is
the composite of all the existing shared fences, and so will only be
signaled when all earlier fences are signaled (ensuring that we can not
be signaled before the completion of any earlier write).

v2: reservation_object is already locked by amdgpu_bo_reserve()

Testcase: igt/amd_prime/amd-to-i915
Signed-off-by: Chris Wilson 
Cc: Alex 

Re: [PATCH v3] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Daniel Vetter
On Tue, Aug 07, 2018 at 02:51:50PM +0200, Christian König wrote:
> Am 07.08.2018 um 14:43 schrieb Daniel Vetter:
> > On Tue, Aug 07, 2018 at 01:23:19PM +0200, Christian König wrote:
> > > Am 07.08.2018 um 13:05 schrieb Chris Wilson:
> > > > amdgpu only uses shared-fences internally, but dmabuf importers rely on
> > > > implicit write hazard tracking via the reservation_object.fence_excl.
> > > Well I would rather suggest a solution where we stop doing this.
> > > 
> > > The problem here is that i915 assumes that a write operation always needs
> > > exclusive access to an object protected by a reservation object.
> > > 
> > > At least for amdgpu, radeon and nouveau this assumption is incorrect, but
> > > only amdgpu has a design where userspace is not lying to the kernel about
> > > its read/write access.
> > > 
> > > What we should do instead is to add a flag to each shared fence to note if
> > > it is a write operation or not. Then we can trivially add a function to 
> > > wait
> > > on those in i915.
> > > 
> > > I should have pushed harder for this solution when the problem came up
> > > initially,
> > For shared buffers in an implicit syncing setup like dri2 the exclusive
> > fence _is_ your write fence.
> 
> And exactly that is absolutely not correct if you ask me.
> 
> The exclusive fence is for two use cases, the first one is for drivers which
> don't care about concurrent accesses and only use serialized accesses and
> the second is for kernel uses under the hood of userspace, evictions, buffer
> moves etc..
> 
> What i915 does is to abuse that concept for write once read many situations,
> and that in turn is the bug we need to fix here.

Again, the exclusive fence was added for implicit sync usage like dri2/3.
_Not_ for your own buffer manager. If you want to separate these two
usages, then I guess we can do that, but claiming that i915 is the odd one
out just aint true. You're arguing against all other kms drivers we have
here.

> > That's how this stuff works. Note it's only
> > for implicit fencing for dri2 shared buffers. i915 lies as much as
> > everyone else for explicit syncing.
> 
> That is really really ugly and should be fixed instead.

It works and is uapi now. What's the gain in "fixing" it? And this was
discussed at length when intel and freedreno folks talked about
implementing explicit sync and android sync pts.

> > Now as long as you only share within amdgpu you can do whatever you want
> > to, but for anything shared outside of amdgpu, if the buffer isn't shared
> > through an explicit syncing protocol, then you do have to set the
> > exclusive fence. That's at least how this stuff works right now with all
> > other drivers.
> > 
> > For i915 we had to do some uapi trickery to make this work in all cases,
> > since only really userspace knows whether a write should be a shared or
> > exclusive write. But that's strictly an issue limited to your driver, and
> > doesn't need changes to the reservation object all throughout the stack.
> 
> You don't need to change the reservation object all throughout the stack. A
> simple flag if a shared fence is a write or not should be doable.
> 
> Give me a day or two to prototype that,

See my other reply, I think that's only needed for amdgpu internal
book-keeping, so that you can create the correct exclusive fence on first
export. But yeah, adding a bitfield to track which fences need to become
exclusive shouldn't be a tricky solution to implement for amdgpu.
-Daniel
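
A minimal sketch of the book-keeping idea floated here, assuming amdgpu would
track its write fences itself (all names below are hypothetical, not existing
amdgpu code):

    /* hypothetical per-BO record of which shared fences were writes */
    struct amdgpu_bo_write_tracking {
            struct dma_fence *writes[8];    /* most recent write fences */
            unsigned int num_writes;
    };

    /* called when a CS that writes the BO adds its shared fence */
    static void amdgpu_bo_note_write_fence(struct amdgpu_bo_write_tracking *t,
                                           struct dma_fence *fence)
    {
            if (t->num_writes < ARRAY_SIZE(t->writes))
                    t->writes[t->num_writes++] = dma_fence_get(fence);
    }

    /* on first export, the remembered writes are exactly what needs to
     * become the exclusive fence, e.g. by folding them into a
     * dma_fence_array as the patch quoted below does. */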

> Christian.
> 
> > Aside: If you want to attach multiple write fences to the exclusive fence,
> > that should be doable with the array fences.
> > -Daniel
> > 
> > > Christian.
> > > 
> > > > For example, the importer uses the write hazard for timing a page flip to
> > > > only occur after the exporter has finished flushing its write into the
> > > > surface. As such, on exporting a dmabuf, we must either flush all
> > > > outstanding fences (for we do not know which are writes and should have
> > > > been exclusive) or alternatively create a new exclusive fence that is
> > > > the composite of all the existing shared fences, and so will only be
> > > > signaled when all earlier fences are signaled (ensuring that we can not
> > > > be signaled before the completion of any earlier write).
> > > > 
> > > > v2: reservation_object is already locked by amdgpu_bo_reserve()
> > > > 
> > > > Testcase: igt/amd_prime/amd-to-i915
> > > > Signed-off-by: Chris Wilson 
> > > > Cc: Alex Deucher 
> > > > Cc: "Christian König" 
> > > > ---
> > > >drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 68 
> > > > ---
> > > >1 file changed, 60 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
> > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> > > > index 1c5d97f4b4dd..576a83946c25 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> > > > @@ -37,6 +37,7 @@

Re: [PATCH v3] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Christian König

Am 07.08.2018 um 14:43 schrieb Daniel Vetter:

On Tue, Aug 07, 2018 at 01:23:19PM +0200, Christian König wrote:

Am 07.08.2018 um 13:05 schrieb Chris Wilson:

amdgpu only uses shared-fences internally, but dmabuf importers rely on
implicit write hazard tracking via the reservation_object.fence_excl.

Well I would rather suggest a solution where we stop doing this.

The problem here is that i915 assumes that a write operation always needs
exclusive access to an object protected by a reservation object.

At least for amdgpu, radeon and nouveau this assumption is incorrect, but
only amdgpu has a design where userspace is not lying to the kernel about
its read/write access.

What we should do instead is to add a flag to each shared fence to note if
it is a write operation or not. Then we can trivially add a function to wait
on those in i915.

I should have pushed harder for this solution when the problem came up
initially,

For shared buffers in an implicit syncing setup like dri2 the exclusive
fence _is_ your write fence.


And exactly that is absolutely not correct if you ask me.

The exclusive fence is for two use cases: the first one is for drivers 
which don't care about concurrent accesses and only use serialized 
accesses, and the second is for kernel uses under the hood of userspace, 
evictions, buffer moves etc.


What i915 does is to abuse that concept for write once read many 
situations, and that in turn is the bug we need to fix here.



That's how this stuff works. Note it's only
for implicit fencing for dri2 shared buffers. i915 lies as much as
everyone else for explicit syncing.


That is really really ugly and should be fixed instead.


Now as long as you only share within amdgpu you can do whatever you want
to, but for anything shared outside of amdgpu, if the buffer isn't shared
through an explicit syncing protocol, then you do have to set the
exclusive fence. That's at least how this stuff works right now with all
other drivers.

For i915 we had to do some uapi trickery to make this work in all cases,
since only really userspace knows whether a write should be a shared or
exclusive write. But that's strictly an issue limited to your driver, and
doesn't need changes to the reservation object all throughout the stack.


You don't need to change the reservation object all throughout the 
stack. A simple flag if a shared fence is a write or not should be doable.


Give me a day or two to prototype that,
Christian.


Aside: If you want to attach multiple write fences to the exclusive fence,
that should be doable with the array fences.
-Daniel


Christian.


For example, the importer uses the write hazard for timing a page flip to
only occur after the exporter has finished flushing its write into the
surface. As such, on exporting a dmabuf, we must either flush all
outstanding fences (for we do not know which are writes and should have
been exclusive) or alternatively create a new exclusive fence that is
the composite of all the existing shared fences, and so will only be
signaled when all earlier fences are signaled (ensuring that we can not
be signaled before the completion of any earlier write).

v2: reservation_object is already locked by amdgpu_bo_reserve()

Testcase: igt/amd_prime/amd-to-i915
Signed-off-by: Chris Wilson 
Cc: Alex Deucher 
Cc: "Christian König" 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 68 ---
   1 file changed, 60 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 1c5d97f4b4dd..576a83946c25 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -37,6 +37,7 @@
   #include "amdgpu_display.h"
   #include 
   #include 
+#include 
   static const struct dma_buf_ops amdgpu_dmabuf_ops;
@@ -188,6 +189,57 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
return ERR_PTR(ret);
   }
+static int
+__reservation_object_make_exclusive(struct reservation_object *obj)
+{
+   struct reservation_object_list *fobj;
+   struct dma_fence_array *array;
+   struct dma_fence **fences;
+   unsigned int count, i;
+
+   fobj = reservation_object_get_list(obj);
+   if (!fobj)
+   return 0;
+
+   count = !!rcu_access_pointer(obj->fence_excl);
+   count += fobj->shared_count;
+
+   fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL);
+   if (!fences)
+   return -ENOMEM;
+
+   for (i = 0; i < fobj->shared_count; i++) {
+   struct dma_fence *f =
+   rcu_dereference_protected(fobj->shared[i],
+ reservation_object_held(obj));
+
+   fences[i] = dma_fence_get(f);
+   }
+
+   if (rcu_access_pointer(obj->fence_excl)) {
+   struct dma_fence *f =
+   rcu_dereference_protected(obj->fence_excl,
+   

Re: [PATCH v3] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Daniel Vetter
On Tue, Aug 07, 2018 at 02:43:22PM +0200, Daniel Vetter wrote:
> On Tue, Aug 07, 2018 at 01:23:19PM +0200, Christian König wrote:
> > Am 07.08.2018 um 13:05 schrieb Chris Wilson:
> > > amdgpu only uses shared-fences internally, but dmabuf importers rely on
> > > implicit write hazard tracking via the reservation_object.fence_excl.
> > 
> > Well I would rather suggest a solution where we stop doing this.
> > 
> > The problem here is that i915 assumes that a write operation always needs
> > exclusive access to an object protected by a reservation object.
> > 
> > At least for amdgpu, radeon and nouveau this assumption is incorrect, but
> > only amdgpu has a design where userspace is not lying to the kernel about
> > its read/write access.
> > 
> > What we should do instead is to add a flag to each shared fence to note if
> > it is a write operation or not. Then we can trivially add a function to wait
> > on those in i915.
> > 
> > I should have pushed harder for this solution when the problem came up
> > initially,
> 
> For shared buffers in an implicit syncing setup like dri2 the exclusive
> fence _is_ your write fence. That's how this stuff works. Note it's only
> for implicit fencing for dri2 shared buffers. i915 lies as much as
> everyone else for explicit syncing.

Just realized that dri3 is exactly in the same boat still afaiui. Anyway,
tldr here isn't that i915 is the exception, but amdgpu. If you have a
shared buffer used in an implicitly synced protocol, you have to set the
exclusive fence to guard writes. How you do that is up to you really. And
if a bitfield in reservation_object would help in tracking these I guess
we could add that, but I think that could as well just be put into an
amdgpu structure somewhere. And would be less confusing for everyone else
if it's not in struct reservation_object.
-Daniel

> 
> Now as long as you only share within amdgpu you can do whatever you want
> to, but for anything shared outside of amdgpu, if the buffer isn't shared
> through an explicit syncing protocol, then you do have to set the
> exclusive fence. That's at least how this stuff works right now with all
> other drivers.
> 
> For i915 we had to do some uapi trickery to make this work in all cases,
> since only really userspace knows whether a write should be a shared or
> exclusive write. But that's strictly an issue limited to your driver, and
> doesn't need changes to the reservation object all throughout the stack.
> 
> Aside: If you want to attach multiple write fences to the exclusive fence,
> that should be doable with the array fences.
> -Daniel
> 
> > Christian.
> > 
> > > For example, the importer uses the write hazard for timing a page flip to
> > > only occur after the exporter has finished flushing its write into the
> > > surface. As such, on exporting a dmabuf, we must either flush all
> > > outstanding fences (for we do not know which are writes and should have
> > > been exclusive) or alternatively create a new exclusive fence that is
> > > the composite of all the existing shared fences, and so will only be
> > > signaled when all earlier fences are signaled (ensuring that we can not
> > > be signaled before the completion of any earlier write).
> > > 
> > > v2: reservation_object is already locked by amdgpu_bo_reserve()
> > > 
> > > Testcase: igt/amd_prime/amd-to-i915
> > > Signed-off-by: Chris Wilson 
> > > Cc: Alex Deucher 
> > > Cc: "Christian König" 
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 68 ---
> > >   1 file changed, 60 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> > > index 1c5d97f4b4dd..576a83946c25 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> > > @@ -37,6 +37,7 @@
> > >   #include "amdgpu_display.h"
> > >   #include 
> > >   #include 
> > > +#include 
> > >   static const struct dma_buf_ops amdgpu_dmabuf_ops;
> > > @@ -188,6 +189,57 @@ amdgpu_gem_prime_import_sg_table(struct drm_device 
> > > *dev,
> > >   return ERR_PTR(ret);
> > >   }
> > > +static int
> > > +__reservation_object_make_exclusive(struct reservation_object *obj)
> > > +{
> > > + struct reservation_object_list *fobj;
> > > + struct dma_fence_array *array;
> > > + struct dma_fence **fences;
> > > + unsigned int count, i;
> > > +
> > > + fobj = reservation_object_get_list(obj);
> > > + if (!fobj)
> > > + return 0;
> > > +
> > > + count = !!rcu_access_pointer(obj->fence_excl);
> > > + count += fobj->shared_count;
> > > +
> > > + fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL);
> > > + if (!fences)
> > > + return -ENOMEM;
> > > +
> > > + for (i = 0; i < fobj->shared_count; i++) {
> > > + struct dma_fence *f =
> > > + rcu_dereference_protected(fobj->shared[i],
> > > +   reservation_object_held(obj));
> > 

Re: [PATCH v3] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Daniel Vetter
On Tue, Aug 07, 2018 at 01:23:19PM +0200, Christian König wrote:
> Am 07.08.2018 um 13:05 schrieb Chris Wilson:
> > amdgpu only uses shared-fences internally, but dmabuf importers rely on
> > implicit write hazard tracking via the reservation_object.fence_excl.
> 
> Well I would rather suggest a solution where we stop doing this.
> 
> The problem here is that i915 assumes that a write operation always needs
> exclusive access to an object protected by a reservation object.
> 
> At least for amdgpu, radeon and nouveau this assumption is incorrect, but
> only amdgpu has a design where userspace is not lying to the kernel about
> its read/write access.
> 
> What we should do instead is to add a flag to each shared fence to note if
> it is a write operation or not. Then we can trivially add a function to wait
> on those in i915.
> 
> I should have pushed harder for this solution when the problem came up
> initially,

For shared buffers in an implicit syncing setup like dri2 the exclusive
fence _is_ your write fence. That's how this stuff works. Note it's only
for implicit fencing for dri2 shared buffers. i915 lies as much as
everyone else for explicit syncing.

Now as long as you only share within amdgpu you can do whatever you want
to, but for anything shared outside of amdgpu, if the buffer isn't shared
through an explicit syncing protocol, then you do have to set the
exclusive fence. That's at least how this stuff works right now with all
other drivers.

For i915 we had to do some uapi trickery to make this work in all cases,
since only really userspace knows whether a write should be a shared or
> exclusive write. But that's strictly an issue limited to your driver, and
> doesn't need changes to the reservation object all throughout the stack.

Aside: If you want to attach multiple write fences to the exclusive fence,
that should be doable with the array fences.
-Daniel
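
A condensed illustration of that aside; this is essentially what the patch
quoted below already does when it folds the collected fences into one
exclusive fence ('count', 'fences' and 'obj' as in that patch):

    /* fold 'count' collected fences into a single exclusive fence */
    struct dma_fence_array *array;

    array = dma_fence_array_create(count, fences,
                                   dma_fence_context_alloc(1), 0, false);
    if (!array)
            return -ENOMEM;    /* fences[] still owned by the caller */
    reservation_object_add_excl_fence(obj, &array->base);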

> Christian.
> 
> > For example, the importer uses the write hazard for timing a page flip to
> > only occur after the exporter has finished flushing its write into the
> > surface. As such, on exporting a dmabuf, we must either flush all
> > outstanding fences (for we do not know which are writes and should have
> > been exclusive) or alternatively create a new exclusive fence that is
> > the composite of all the existing shared fences, and so will only be
> > signaled when all earlier fences are signaled (ensuring that we can not
> > be signaled before the completion of any earlier write).
> > 
> > v2: reservation_object is already locked by amdgpu_bo_reserve()
> > 
> > Testcase: igt/amd_prime/amd-to-i915
> > Signed-off-by: Chris Wilson 
> > Cc: Alex Deucher 
> > Cc: "Christian König" 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 68 ---
> >   1 file changed, 60 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> > index 1c5d97f4b4dd..576a83946c25 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> > @@ -37,6 +37,7 @@
> >   #include "amdgpu_display.h"
> >   #include 
> >   #include 
> > +#include 
> >   static const struct dma_buf_ops amdgpu_dmabuf_ops;
> > @@ -188,6 +189,57 @@ amdgpu_gem_prime_import_sg_table(struct drm_device 
> > *dev,
> > return ERR_PTR(ret);
> >   }
> > +static int
> > +__reservation_object_make_exclusive(struct reservation_object *obj)
> > +{
> > +   struct reservation_object_list *fobj;
> > +   struct dma_fence_array *array;
> > +   struct dma_fence **fences;
> > +   unsigned int count, i;
> > +
> > +   fobj = reservation_object_get_list(obj);
> > +   if (!fobj)
> > +   return 0;
> > +
> > +   count = !!rcu_access_pointer(obj->fence_excl);
> > +   count += fobj->shared_count;
> > +
> > +   fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL);
> > +   if (!fences)
> > +   return -ENOMEM;
> > +
> > +   for (i = 0; i < fobj->shared_count; i++) {
> > +   struct dma_fence *f =
> > +   rcu_dereference_protected(fobj->shared[i],
> > + reservation_object_held(obj));
> > +
> > +   fences[i] = dma_fence_get(f);
> > +   }
> > +
> > +   if (rcu_access_pointer(obj->fence_excl)) {
> > +   struct dma_fence *f =
> > +   rcu_dereference_protected(obj->fence_excl,
> > + reservation_object_held(obj));
> > +
> > +   fences[i] = dma_fence_get(f);
> > +   }
> > +
> > +   array = dma_fence_array_create(count, fences,
> > +  dma_fence_context_alloc(1), 0,
> > +  false);
> > +   if (!array)
> > +   goto err_fences_put;
> > +
> > > +   reservation_object_add_excl_fence(obj, &array->base);
> > +   return 0;
> > +
> > +err_fences_put:
> > +   for (i = 0; i < count; i++)
> > +   dma_fence_put(fences[i]);
> > +   

Re: [PATCH libdrm 3/4] amdgpu: add a function to find bo by cpu mapping

2018-08-07 Thread Christian König

Am 07.08.2018 um 09:26 schrieb Junwei Zhang:

Userspace needs to know if the user memory is from BO or malloc.

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu.h| 23 +++
  amdgpu/amdgpu_bo.c | 34 ++
  2 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index be83b45..a8c353c 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -678,6 +678,29 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle 
dev,
amdgpu_bo_handle *buf_handle);
  
  /**

+ * Validate if the user memory comes from BO
+ *
+ * \param dev - [in] Device handle. See #amdgpu_device_initialize()
+ * \param cpu - [in] CPU address of user allocated memory which we
+ * want to map to GPU address space (make GPU accessible)
+ * (This address must be correctly aligned).
+ * \param size - [in] Size of allocation (must be correctly aligned)
+ * \param buf_handle - [out] Buffer handle for the userptr memory
+ * if the user memory is not from BO, the buf_handle will be NULL.
+ * \param offset_in_bo - [out] offset in this BO for this user memory
+ *
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+ void *cpu,
+ uint64_t size,
+ amdgpu_bo_handle *buf_handle,
+ uint64_t *offset_in_bo);
+
+/**
   * Free previously allocated memory
   *
   * \param   dev  - \c [in] Device handle. See 
#amdgpu_device_initialize()
diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index b24e698..a631050 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -529,6 +529,40 @@ int amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,
}
  }
  
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,

+ void *cpu,
+ uint64_t size,
+ amdgpu_bo_handle *buf_handle,
+ uint64_t *offset_in_bo)
+{
+   int i;
+   struct amdgpu_bo *bo;
+
+   if (cpu == NULL || size == 0)
+   return -EINVAL;
+
+   pthread_mutex_lock(&dev->bo_table_mutex);
+   for (i = 0; i <= dev->bo_handles.max_key; i++) {
+   bo = handle_table_lookup(&dev->bo_handles, i);
+   if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
+   continue;
+   if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + bo->alloc_size))
+   break;
+   }
+   pthread_mutex_unlock(&dev->bo_table_mutex);
+
+   if (i <= dev->bo_handles.max_key) {


This check needs to be inside the lock; otherwise max_key could change 
at the same time as you do the check.


Apart from that this looks good to me if you can live with the extra 
overhead of scanning all BOs.


Regards,
Christian.
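
A minimal sketch of that fix, based on the function quoted above: the
hit/miss branch simply moves before the unlock, so max_key and the handle
table cannot change in between (illustrative only, not a tested patch):

    pthread_mutex_lock(&dev->bo_table_mutex);
    for (i = 0; i <= dev->bo_handles.max_key; i++) {
            bo = handle_table_lookup(&dev->bo_handles, i);
            if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
                    continue;
            if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + bo->alloc_size))
                    break;
    }

    if (i <= dev->bo_handles.max_key) {
            atomic_inc(&bo->refcount);
            *buf_handle = bo;
            *offset_in_bo = cpu - bo->cpu_ptr;
    } else {
            *buf_handle = NULL;
            *offset_in_bo = 0;
    }
    pthread_mutex_unlock(&dev->bo_table_mutex);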


+   atomic_inc(&bo->refcount);
+   *buf_handle = bo;
+   *offset_in_bo = cpu - bo->cpu_ptr;
+   } else {
+   *buf_handle = NULL;
+   *offset_in_bo = 0;
+   }
+
+   return 0;
+}
+
  int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
void *cpu,
uint64_t size,


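
For reference, a hypothetical caller of the new helper might look roughly
like this (illustrative only; 'dev', 'ptr' and 'size' are assumed to exist
in the caller):

    amdgpu_bo_handle handle = NULL;
    uint64_t offset = 0;
    int r;

    /* is 'ptr' already backed by one of our BOs? */
    r = amdgpu_find_bo_by_cpu_mapping(dev, ptr, size, &handle, &offset);
    if (r == 0 && handle) {
            /* reuse the existing BO; 'offset' is where ptr lies inside it */
    } else {
            /* otherwise wrap the malloc'ed memory in a userptr BO */
            r = amdgpu_create_bo_from_user_mem(dev, ptr, size, &handle);
    }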


Re: [PATCH libdrm 3/4] amdgpu: add a function to find bo by cpu mapping

2018-08-07 Thread Christian König

Am 07.08.2018 um 12:23 schrieb Zhang, Jerry (Junwei):

On 08/07/2018 05:59 PM, Christian König wrote:

Am 07.08.2018 um 11:52 schrieb Zhang, Jerry (Junwei):

On 08/07/2018 05:33 PM, Christian König wrote:

Am 07.08.2018 um 10:28 schrieb Zhang, Jerry (Junwei):

On 08/07/2018 04:20 PM, Christian König wrote:
Well NAK, that wasn't the intention of putting all BOs into the 
handle table.


You should still use the kernel implementation.


I thought we had discussed that in the mail thread below. Any gap?

[PATCH 1/2] drm/amdgpu: return bo itself if userptr is cpu addr of 
bo (v3)

{{{
>> Well we could just completely drop the kernel implementation and use
>> a userspace implementation.
>
> Do you mean to implement finding bo by cpu address in libdrm 
completely?


Yes, exactly.
}}}


Ok, well that is a misunderstanding. I was just speculating about 
that.


On the other hand if searching all BOs for the right one is not too 
much overhead for that workaround I'm fine with it.


Just drop patch #2 in this series. All unused entries should be 
initialized to zero and walking the extra 512 isn't that much 
overhead.


Yes, reserve it locally for now.
May I get your RB?


I've got a few minor comments on the last patch as well.

Give me a moment to figure out how to push to the libdrm master repository 
now (they migrated to github and my old account doesn't work any more).


After that I can comment on your patches.


Got that, thanks.

BTW, where is the libdrm master github repo?


Yeah, well exactly that was the problem, the address had changed.

The new one is ssh://g...@gitlab.freedesktop.org:mesa/drm.git

Regards,
Christian.


I can only find it from freedesktop.

Jerry



Thanks,
Christian.



Regards,
Jerry



Regards,
Christian.



Regards,
Jerry



Christian.

Am 07.08.2018 um 09:26 schrieb Junwei Zhang:

Userspace needs to know if the user memory is from BO or malloc.

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu.h    | 23 +++
  amdgpu/amdgpu_bo.c | 34 ++
  2 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index be83b45..a8c353c 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -678,6 +678,29 @@ int 
amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,

  amdgpu_bo_handle *buf_handle);
  /**
+ * Validate if the user memory comes from BO
+ *
+ * \param dev - [in] Device handle. See 
#amdgpu_device_initialize()

+ * \param cpu - [in] CPU address of user allocated memory which we
+ * want to map to GPU address space (make GPU accessible)
+ * (This address must be correctly aligned).
+ * \param size - [in] Size of allocation (must be correctly 
aligned)

+ * \param buf_handle - [out] Buffer handle for the userptr memory
+ * if the user memory is not from BO, the buf_handle will be NULL.
+ * \param offset_in_bo - [out] offset in this BO for this user 
memory

+ *
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo);
+
+/**
   * Free previously allocated memory
   *
   * \param   dev   - \c [in] Device handle. See 
#amdgpu_device_initialize()

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index b24e698..a631050 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -529,6 +529,40 @@ int 
amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,

  }
  }
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo)
+{
+    int i;
+    struct amdgpu_bo *bo;
+
+    if (cpu == NULL || size == 0)
+    return -EINVAL;
+
+    pthread_mutex_lock(&dev->bo_table_mutex);
+    for (i = 0; i <= dev->bo_handles.max_key; i++) {
+    bo = handle_table_lookup(&dev->bo_handles, i);
+    if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
+    continue;
+    if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + bo->alloc_size))
+    break;
+    }
+    pthread_mutex_unlock(&dev->bo_table_mutex);
+
+    if (i <= dev->bo_handles.max_key) {
+    atomic_inc(&bo->refcount);
+    *buf_handle = bo;
+    *offset_in_bo = cpu - bo->cpu_ptr;
+    } else {
+    *buf_handle = NULL;
+    *offset_in_bo = 0;
+    }
+
+    return 0;
+}
+
  int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
  void *cpu,
  uint64_t size,









Re: [PATCH v3] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Christian König

Am 07.08.2018 um 13:05 schrieb Chris Wilson:

amdgpu only uses shared-fences internally, but dmabuf importers rely on
implicit write hazard tracking via the reservation_object.fence_excl.


Well I would rather suggest a solution where we stop doing this.

The problem here is that i915 assumes that a write operation always 
needs exclusive access to an object protected by a reservation object.


At least for amdgpu, radeon and nouveau this assumption is incorrect, 
but only amdgpu has a design where userspace is not lying to the kernel 
about its read/write access.


What we should do instead is to add a flag to each shared fence to note 
if it is a write operation or not. Then we can trivially add a function 
to wait on those in i915.


I should have pushed harder for this solution when the problem came up 
initially,

Christian.
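
A rough sketch of what such a per-shared-fence write flag could look like,
purely as an illustration of the proposal (nothing below is an existing
kernel API; names and layout are hypothetical):

    /* hypothetical: each shared fence slot carries a "this was a write" flag */
    #define SHARED_FENCE_FLAG_WRITE (1u << 0)

    struct shared_fence_entry {
            struct dma_fence *fence;
            unsigned int flags;
    };

    /* wait only on the shared fences that were flagged as writes */
    static long wait_for_write_fences(struct shared_fence_entry *entries,
                                      unsigned int count, long timeout)
    {
            unsigned int i;

            for (i = 0; i < count && timeout > 0; i++) {
                    if (!(entries[i].flags & SHARED_FENCE_FLAG_WRITE))
                            continue;
                    timeout = dma_fence_wait_timeout(entries[i].fence,
                                                     false, timeout);
                    if (timeout < 0)
                            return timeout;
            }
            return timeout;
    }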


For example, the importer uses the write hazard for timing a page flip to
only occur after the exporter has finished flushing its write into the
surface. As such, on exporting a dmabuf, we must either flush all
outstanding fences (for we do not know which are writes and should have
been exclusive) or alternatively create a new exclusive fence that is
the composite of all the existing shared fences, and so will only be
signaled when all earlier fences are signaled (ensuring that we can not
be signaled before the completion of any earlier write).

v2: reservation_object is already locked by amdgpu_bo_reserve()

Testcase: igt/amd_prime/amd-to-i915
Signed-off-by: Chris Wilson 
Cc: Alex Deucher 
Cc: "Christian König" 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 68 ---
  1 file changed, 60 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 1c5d97f4b4dd..576a83946c25 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -37,6 +37,7 @@
  #include "amdgpu_display.h"
  #include 
  #include 
+#include 
  
  static const struct dma_buf_ops amdgpu_dmabuf_ops;
  
@@ -188,6 +189,57 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,

return ERR_PTR(ret);
  }
  
+static int

+__reservation_object_make_exclusive(struct reservation_object *obj)
+{
+   struct reservation_object_list *fobj;
+   struct dma_fence_array *array;
+   struct dma_fence **fences;
+   unsigned int count, i;
+
+   fobj = reservation_object_get_list(obj);
+   if (!fobj)
+   return 0;
+
+   count = !!rcu_access_pointer(obj->fence_excl);
+   count += fobj->shared_count;
+
+   fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL);
+   if (!fences)
+   return -ENOMEM;
+
+   for (i = 0; i < fobj->shared_count; i++) {
+   struct dma_fence *f =
+   rcu_dereference_protected(fobj->shared[i],
+ reservation_object_held(obj));
+
+   fences[i] = dma_fence_get(f);
+   }
+
+   if (rcu_access_pointer(obj->fence_excl)) {
+   struct dma_fence *f =
+   rcu_dereference_protected(obj->fence_excl,
+ reservation_object_held(obj));
+
+   fences[i] = dma_fence_get(f);
+   }
+
+   array = dma_fence_array_create(count, fences,
+  dma_fence_context_alloc(1), 0,
+  false);
+   if (!array)
+   goto err_fences_put;
+
+   reservation_object_add_excl_fence(obj, &array->base);
+   return 0;
+
+err_fences_put:
+   for (i = 0; i < count; i++)
+   dma_fence_put(fences[i]);
+   kfree(fences);
+   return -ENOMEM;
+}
+
  /**
   * amdgpu_gem_map_attach - _buf_ops.attach implementation
   * @dma_buf: shared DMA buffer
@@ -219,16 +271,16 @@ static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
  
  	if (attach->dev->driver != adev->dev->driver) {

/*
-* Wait for all shared fences to complete before we switch to 
future
-* use of exclusive fence on this prime shared bo.
+* We only create shared fences for internal use, but importers
+* of the dmabuf rely on exclusive fences for implicitly
+* tracking write hazards. As any of the current fences may
+* correspond to a write, we need to convert all existing
+* fences on the reservation object into a single exclusive
+* fence.
 */
-   r = reservation_object_wait_timeout_rcu(bo->tbo.resv,
-   true, false,
-   MAX_SCHEDULE_TIMEOUT);
-   if (unlikely(r < 0)) {
-   DRM_DEBUG_PRIME("Fence wait failed: %li\n", r);
+   r = 

[PATCH v3] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Chris Wilson
amdgpu only uses shared-fences internally, but dmabuf importers rely on
implicit write hazard tracking via the reservation_object.fence_excl.
For example, the importer uses the write hazard for timing a page flip to
only occur after the exporter has finished flushing its write into the
surface. As such, on exporting a dmabuf, we must either flush all
outstanding fences (for we do not know which are writes and should have
been exclusive) or alternatively create a new exclusive fence that is
the composite of all the existing shared fences, and so will only be
signaled when all earlier fences are signaled (ensuring that we can not
be signaled before the completion of any earlier write).

v2: reservation_object is already locked by amdgpu_bo_reserve()

Testcase: igt/amd_prime/amd-to-i915
Signed-off-by: Chris Wilson 
Cc: Alex Deucher 
Cc: "Christian König" 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 68 ---
 1 file changed, 60 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 1c5d97f4b4dd..576a83946c25 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -37,6 +37,7 @@
 #include "amdgpu_display.h"
 #include 
 #include 
+#include 
 
 static const struct dma_buf_ops amdgpu_dmabuf_ops;
 
@@ -188,6 +189,57 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
return ERR_PTR(ret);
 }
 
+static int
+__reservation_object_make_exclusive(struct reservation_object *obj)
+{
+   struct reservation_object_list *fobj;
+   struct dma_fence_array *array;
+   struct dma_fence **fences;
+   unsigned int count, i;
+
+   fobj = reservation_object_get_list(obj);
+   if (!fobj)
+   return 0;
+
+   count = !!rcu_access_pointer(obj->fence_excl);
+   count += fobj->shared_count;
+
+   fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL);
+   if (!fences)
+   return -ENOMEM;
+
+   for (i = 0; i < fobj->shared_count; i++) {
+   struct dma_fence *f =
+   rcu_dereference_protected(fobj->shared[i],
+ reservation_object_held(obj));
+
+   fences[i] = dma_fence_get(f);
+   }
+
+   if (rcu_access_pointer(obj->fence_excl)) {
+   struct dma_fence *f =
+   rcu_dereference_protected(obj->fence_excl,
+ reservation_object_held(obj));
+
+   fences[i] = dma_fence_get(f);
+   }
+
+   array = dma_fence_array_create(count, fences,
+  dma_fence_context_alloc(1), 0,
+  false);
+   if (!array)
+   goto err_fences_put;
+
+   reservation_object_add_excl_fence(obj, &array->base);
+   return 0;
+
+err_fences_put:
+   for (i = 0; i < count; i++)
+   dma_fence_put(fences[i]);
+   kfree(fences);
+   return -ENOMEM;
+}
+
 /**
  * amdgpu_gem_map_attach - _buf_ops.attach implementation
  * @dma_buf: shared DMA buffer
@@ -219,16 +271,16 @@ static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
 
if (attach->dev->driver != adev->dev->driver) {
/*
-* Wait for all shared fences to complete before we switch to 
future
-* use of exclusive fence on this prime shared bo.
+* We only create shared fences for internal use, but importers
+* of the dmabuf rely on exclusive fences for implicitly
+* tracking write hazards. As any of the current fences may
+* correspond to a write, we need to convert all existing
+* fences on the reservation object into a single exclusive
+* fence.
 */
-   r = reservation_object_wait_timeout_rcu(bo->tbo.resv,
-   true, false,
-   MAX_SCHEDULE_TIMEOUT);
-   if (unlikely(r < 0)) {
-   DRM_DEBUG_PRIME("Fence wait failed: %li\n", r);
+   r = __reservation_object_make_exclusive(bo->tbo.resv);
+   if (r)
goto error_unreserve;
-   }
}
 
/* pin buffer into GTT */
-- 
2.18.0



Re: [PATCH] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Chris Wilson
Quoting Huang Rui (2018-08-07 11:56:24)
> On Tue, Aug 07, 2018 at 11:45:00AM +0100, Chris Wilson wrote:
> > amdgpu only uses shared-fences internally, but dmabuf importers rely on
> > implicit write hazard tracking via the reservation_object.fence_excl.
> > For example, the importer uses the write hazard for timing a page flip to
> > only occur after the exporter has finished flushing its write into the
> > surface. As such, on exporting a dmabuf, we must either flush all
> > outstanding fences (for we do not know which are writes and should have
> > been exclusive) or alternatively create a new exclusive fence that is
> > the composite of all the existing shared fences, and so will only be
> > signaled when all earlier fences are signaled (ensuring that we can not
> > be signaled before the completion of any earlier write).
> > 
> > Testcase: igt/amd_prime/amd-to-i915
> > Signed-off-by: Chris Wilson 
> > Cc: Alex Deucher 
> > Cc: "Christian König" 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 70 ---
> >  1 file changed, 62 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> > index 1c5d97f4b4dd..47e6ec5510b6 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> > @@ -37,6 +37,7 @@
> >  #include "amdgpu_display.h"
> >  #include 
> >  #include 
> > +#include 
> >  
> >  static const struct dma_buf_ops amdgpu_dmabuf_ops;
> >  
> > @@ -188,6 +189,57 @@ amdgpu_gem_prime_import_sg_table(struct drm_device 
> > *dev,
> >   return ERR_PTR(ret);
> >  }
> >  
> > +static int
> > +__reservation_object_make_exclusive(struct reservation_object *obj)
> > +{
> 
> Why don't you move the helper to reservation.c, and then export the symbol for
> this file?

I have not seen anything else that would wish to use this helper. The
first task is to solve this issue here before worrying about
generalisation.
-Chris


[PATCH v2] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Chris Wilson
amdgpu only uses shared-fences internally, but dmabuf importers rely on
implicit write hazard tracking via the reservation_object.fence_excl.
For example, the importer uses the write hazard for timing a page flip to
only occur after the exporter has finished flushing its write into the
surface. As such, on exporting a dmabuf, we must either flush all
outstanding fences (for we do not know which are writes and should have
been exclusive) or alternatively create a new exclusive fence that is
the composite of all the existing shared fences, and so will only be
signaled when all earlier fences are signaled (ensuring that we can not
be signaled before the completion of any earlier write).

v2: reservation_object is already locked by amdgpu_bo_reserve()

Testcase: igt/amd_prime/amd-to-i915
Signed-off-by: Chris Wilson 
Cc: Alex Deucher 
Cc: "Christian König" 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 68 ---
 1 file changed, 60 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 1c5d97f4b4dd..27f639c69e35 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -37,6 +37,7 @@
 #include "amdgpu_display.h"
 #include 
 #include 
+#include 
 
 static const struct dma_buf_ops amdgpu_dmabuf_ops;
 
@@ -188,6 +189,57 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
return ERR_PTR(ret);
 }
 
+static int
+__reservation_object_make_exclusive(struct reservation_object *obj)
+{
+   struct reservation_object_list *fobj;
+   struct dma_fence_array *array;
+   struct dma_fence **fences;
+   unsigned int count, i;
+
+   fobj = reservation_object_get_list(obj);
+   if (!fobj)
+   return 0;
+
+   count = !!rcu_access_pointer(obj->fence_excl);
+   count += fobj->shared_count;
+
+   fences = kmalloc_array(sizeof(*fences), count, GFP_KERNEL);
+   if (!fences)
+   return -ENOMEM;
+
+   for (i = 0; i < fobj->shared_count; i++) {
+   struct dma_fence *f =
+   rcu_dereference_protected(fobj->shared[i],
+ reservation_object_held(obj));
+
+   fences[i] = dma_fence_get(f);
+   }
+
+   if (rcu_access_pointer(obj->fence_excl)) {
+   struct dma_fence *f =
+   rcu_dereference_protected(obj->fence_excl,
+ reservation_object_held(obj));
+
+   fences[i] = dma_fence_get(f);
+   }
+
+   array = dma_fence_array_create(count, fences,
+  dma_fence_context_alloc(1), 0,
+  false);
+   if (!array)
+   goto err_fences_put;
+
+   reservation_object_add_excl_fence(obj, &array->base);
+   return 0;
+
+err_fences_put:
+   for (i = 0; i < count; i++)
+   dma_fence_put(fences[i]);
+   kfree(fences);
+   return -ENOMEM;
+}
+
 /**
  * amdgpu_gem_map_attach - _buf_ops.attach implementation
  * @dma_buf: shared DMA buffer
@@ -219,16 +271,16 @@ static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
 
if (attach->dev->driver != adev->dev->driver) {
/*
-* Wait for all shared fences to complete before we switch to 
future
-* use of exclusive fence on this prime shared bo.
+* We only create shared fences for internal use, but importers
+* of the dmabuf rely on exclusive fences for implicitly
+* tracking write hazards. As any of the current fences may
+* correspond to a write, we need to convert all existing
+* fences on the reservation object into a single exclusive
+* fence.
 */
-   r = reservation_object_wait_timeout_rcu(bo->tbo.resv,
-   true, false,
-   MAX_SCHEDULE_TIMEOUT);
-   if (unlikely(r < 0)) {
-   DRM_DEBUG_PRIME("Fence wait failed: %li\n", r);
+   r = __reservation_object_make_exclusive(bo->tbo.resv);
+   if (r)
goto error_unreserve;
-   }
}
 
/* pin buffer into GTT */
-- 
2.18.0



Re: [PATCH] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Huang Rui
On Tue, Aug 07, 2018 at 11:45:00AM +0100, Chris Wilson wrote:
> amdgpu only uses shared-fences internally, but dmabuf importers rely on
> implicit write hazard tracking via the reservation_object.fence_excl.
> For example, the importer uses the write hazard for timing a page flip to
> only occur after the exporter has finished flushing its write into the
> surface. As such, on exporting a dmabuf, we must either flush all
> outstanding fences (for we do not know which are writes and should have
> been exclusive) or alternatively create a new exclusive fence that is
> the composite of all the existing shared fences, and so will only be
> signaled when all earlier fences are signaled (ensuring that we can not
> be signaled before the completion of any earlier write).
> 
> Testcase: igt/amd_prime/amd-to-i915
> Signed-off-by: Chris Wilson 
> Cc: Alex Deucher 
> Cc: "Christian König" 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 70 ---
>  1 file changed, 62 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> index 1c5d97f4b4dd..47e6ec5510b6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> @@ -37,6 +37,7 @@
>  #include "amdgpu_display.h"
>  #include 
>  #include 
> +#include 
>  
>  static const struct dma_buf_ops amdgpu_dmabuf_ops;
>  
> @@ -188,6 +189,57 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
>   return ERR_PTR(ret);
>  }
>  
> +static int
> +__reservation_object_make_exclusive(struct reservation_object *obj)
> +{

Why don't you move the helper to reservation.c, and then export the symbol for
this file?

Thanks,
Ray

> + struct reservation_object_list *fobj;
> + struct dma_fence_array *array;
> + struct dma_fence **fences;
> + unsigned int count, i;
> +
> + fobj = reservation_object_get_list(obj);
> + if (!fobj)
> + return 0;
> +
> + count = !!rcu_access_pointer(obj->fence_excl);
> + count += fobj->shared_count;
> +
> + fences = kmalloc_array(sizeof(*fences), count, GFP_KERNEL);
> + if (!fences)
> + return -ENOMEM;
> +
> + for (i = 0; i < fobj->shared_count; i++) {
> + struct dma_fence *f =
> + rcu_dereference_protected(fobj->shared[i],
> +   reservation_object_held(obj));
> +
> + fences[i] = dma_fence_get(f);
> + }
> +
> + if (rcu_access_pointer(obj->fence_excl)) {
> + struct dma_fence *f =
> + rcu_dereference_protected(obj->fence_excl,
> +   reservation_object_held(obj));
> +
> + fences[i] = dma_fence_get(f);
> + }
> +
> + array = dma_fence_array_create(count, fences,
> +dma_fence_context_alloc(1), 0,
> +false);
> + if (!array)
> + goto err_fences_put;
> +
> + reservation_object_add_excl_fence(obj, &array->base);
> + return 0;
> +
> +err_fences_put:
> + for (i = 0; i < count; i++)
> + dma_fence_put(fences[i]);
> + kfree(fences);
> + return -ENOMEM;
> +}
> +
>  /**
>   * amdgpu_gem_map_attach - _buf_ops.attach implementation
>   * @dma_buf: shared DMA buffer
> @@ -219,16 +271,18 @@ static int amdgpu_gem_map_attach(struct dma_buf 
> *dma_buf,
>  
>   if (attach->dev->driver != adev->dev->driver) {
>   /*
> -  * Wait for all shared fences to complete before we switch to 
> future
> -  * use of exclusive fence on this prime shared bo.
> +  * We only create shared fences for internal use, but importers
> +  * of the dmabuf rely on exclusive fences for implicitly
> +  * tracking write hazards. As any of the current fences may
> +  * correspond to a write, we need to convert all existing
> +  * fences on the reservation object into a single exclusive
> +  * fence.
>*/
> - r = reservation_object_wait_timeout_rcu(bo->tbo.resv,
> - true, false,
> - MAX_SCHEDULE_TIMEOUT);
> - if (unlikely(r < 0)) {
> - DRM_DEBUG_PRIME("Fence wait failed: %li\n", r);
> + reservation_object_lock(bo->tbo.resv, NULL);
> + r = __reservation_object_make_exclusive(bo->tbo.resv);
> + reservation_object_unlock(bo->tbo.resv);
> + if (r)
>   goto error_unreserve;
> - }
>   }
>  
>   /* pin buffer into GTT */
> -- 
> 2.18.0
> 

[PATCH] drm/amdgpu: Transfer fences to dmabuf importer

2018-08-07 Thread Chris Wilson
amdgpu only uses shared-fences internally, but dmabuf importers rely on
implicit write hazard tracking via the reservation_object.fence_excl.
For example, the importer uses the write hazard for timing a page flip to
only occur after the exporter has finished flushing its write into the
surface. As such, on exporting a dmabuf, we must either flush all
outstanding fences (for we do not know which are writes and should have
been exclusive) or alternatively create a new exclusive fence that is
the composite of all the existing shared fences, and so will only be
signaled when all earlier fences are signaled (ensuring that we can not
be signaled before the completion of any earlier write).

Testcase: igt/amd_prime/amd-to-i915
Signed-off-by: Chris Wilson 
Cc: Alex Deucher 
Cc: "Christian König" 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 70 ---
 1 file changed, 62 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 1c5d97f4b4dd..47e6ec5510b6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -37,6 +37,7 @@
 #include "amdgpu_display.h"
 #include 
 #include 
+#include 
 
 static const struct dma_buf_ops amdgpu_dmabuf_ops;
 
@@ -188,6 +189,57 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
return ERR_PTR(ret);
 }
 
+static int
+__reservation_object_make_exclusive(struct reservation_object *obj)
+{
+   struct reservation_object_list *fobj;
+   struct dma_fence_array *array;
+   struct dma_fence **fences;
+   unsigned int count, i;
+
+   fobj = reservation_object_get_list(obj);
+   if (!fobj)
+   return 0;
+
+   count = !!rcu_access_pointer(obj->fence_excl);
+   count += fobj->shared_count;
+
+   fences = kmalloc_array(sizeof(*fences), count, GFP_KERNEL);
+   if (!fences)
+   return -ENOMEM;
+
+   for (i = 0; i < fobj->shared_count; i++) {
+   struct dma_fence *f =
+   rcu_dereference_protected(fobj->shared[i],
+ reservation_object_held(obj));
+
+   fences[i] = dma_fence_get(f);
+   }
+
+   if (rcu_access_pointer(obj->fence_excl)) {
+   struct dma_fence *f =
+   rcu_dereference_protected(obj->fence_excl,
+ reservation_object_held(obj));
+
+   fences[i] = dma_fence_get(f);
+   }
+
+   array = dma_fence_array_create(count, fences,
+  dma_fence_context_alloc(1), 0,
+  false);
+   if (!array)
+   goto err_fences_put;
+
+   reservation_object_add_excl_fence(obj, &array->base);
+   return 0;
+
+err_fences_put:
+   for (i = 0; i < count; i++)
+   dma_fence_put(fences[i]);
+   kfree(fences);
+   return -ENOMEM;
+}
+
 /**
  * amdgpu_gem_map_attach - _buf_ops.attach implementation
  * @dma_buf: shared DMA buffer
@@ -219,16 +271,18 @@ static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
 
if (attach->dev->driver != adev->dev->driver) {
/*
-* Wait for all shared fences to complete before we switch to 
future
-* use of exclusive fence on this prime shared bo.
+* We only create shared fences for internal use, but importers
+* of the dmabuf rely on exclusive fences for implicitly
+* tracking write hazards. As any of the current fences may
+* correspond to a write, we need to convert all existing
+* fences on the reservation object into a single exclusive
+* fence.
 */
-   r = reservation_object_wait_timeout_rcu(bo->tbo.resv,
-   true, false,
-   MAX_SCHEDULE_TIMEOUT);
-   if (unlikely(r < 0)) {
-   DRM_DEBUG_PRIME("Fence wait failed: %li\n", r);
+   reservation_object_lock(bo->tbo.resv, NULL);
+   r = __reservation_object_make_exclusive(bo->tbo.resv);
+   reservation_object_unlock(bo->tbo.resv);
+   if (r)
goto error_unreserve;
-   }
}
 
/* pin buffer into GTT */
-- 
2.18.0



RE: [PATCH] drm/amdkfd: Use true and false for boolean values

2018-08-07 Thread Russell, Kent
Thanks for that!
Reviewed-by: Kent Russell 

-Original Message-
From: amd-gfx  On Behalf Of Huang Rui
Sent: Tuesday, August 07, 2018 1:28 AM
To: Gustavo A. R. Silva ; Kuehling, Felix 

Cc: Oded Gabbay ; Zhou, David(ChunMing) 
; David Airlie ; 
linux-ker...@vger.kernel.org; amd-gfx@lists.freedesktop.org; 
dri-de...@lists.freedesktop.org; Deucher, Alexander 
; Koenig, Christian 
Subject: Re: [PATCH] drm/amdkfd: Use true and false for boolean values

On Sat, Aug 04, 2018 at 07:27:02PM -0500, Gustavo A. R. Silva wrote:
> Return statements in functions returning bool should use true or false 
> instead of an integer value.
> 
> This code was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva 

Looks good to me.
Reviewed-by: Huang Rui 

Add Felix for his awareness.

> ---
>  drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c 
> b/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
> index 5d2475d..16af9d1 100644
> --- a/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
> +++ b/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
> @@ -62,12 +62,12 @@ static bool cik_event_interrupt_isr(struct kfd_dev *dev,
>   vmid  = (ihre->ring_id & 0xff00) >> 8;
>   if (vmid < dev->vm_info.first_vmid_kfd ||
>   vmid > dev->vm_info.last_vmid_kfd)
> - return 0;
> + return false;
>  
>   /* If there is no valid PASID, it's likely a firmware bug */
>   pasid = (ihre->ring_id & 0x) >> 16;
>   if (WARN_ONCE(pasid == 0, "FW bug: No PASID in KFD interrupt"))
> - return 0;
> + return false;
>  
>   /* Interrupt types we care about: various signals and faults.
>* They will be forwarded to a work queue (see below).
> --
> 2.7.4
> 


Re: [PATCH libdrm 3/4] amdgpu: add a function to find bo by cpu mapping

2018-08-07 Thread Zhang, Jerry (Junwei)

On 08/07/2018 05:59 PM, Christian König wrote:

Am 07.08.2018 um 11:52 schrieb Zhang, Jerry (Junwei):

On 08/07/2018 05:33 PM, Christian König wrote:

Am 07.08.2018 um 10:28 schrieb Zhang, Jerry (Junwei):

On 08/07/2018 04:20 PM, Christian König wrote:

Well NAK, that wasn't the intention of putting all BOs into the handle table.

You should still use the kernel implementation.


I thought we had discussed that in the mail thread below. Any gap?

[PATCH 1/2] drm/amdgpu: return bo itself if userptr is cpu addr of bo (v3)
{{{
>> Well we could just completely drop the kernel implementation and use
>> a userspace implementation.
>
> Do you mean to implement finding bo by cpu address in libdrm completely?

Yes, exactly.
}}}


Ok, well that is a misunderstanding. I was just speculating about that.

On the other hand if searching all BOs for the right one is not too much 
overhead for that workaround I'm fine with it.

Just drop patch #2 in this series. All unused entries should be initialized to 
zero and walking the extra 512 isn't that much overhead.


Yes, reserve it locally for now.
May I get your RB?


I've got a few minor comments on the last patch as well.

Give me a moment to figure out how to push to the libdrm master repository now (they 
migrated to github and my old account doesn't work any more).

After that I can comment on your patches.


Got that, thanks.

BTW, where is the libdrm master github repo?
I can only find it from freedesktop.

Jerry



Thanks,
Christian.



Regards,
Jerry



Regards,
Christian.



Regards,
Jerry



Christian.

Am 07.08.2018 um 09:26 schrieb Junwei Zhang:

Userspace needs to know if the user memory is from BO or malloc.

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu.h| 23 +++
  amdgpu/amdgpu_bo.c | 34 ++
  2 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index be83b45..a8c353c 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -678,6 +678,29 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle 
dev,
  amdgpu_bo_handle *buf_handle);
  /**
+ * Validate if the user memory comes from BO
+ *
+ * \param dev - [in] Device handle. See #amdgpu_device_initialize()
+ * \param cpu - [in] CPU address of user allocated memory which we
+ * want to map to GPU address space (make GPU accessible)
+ * (This address must be correctly aligned).
+ * \param size - [in] Size of allocation (must be correctly aligned)
+ * \param buf_handle - [out] Buffer handle for the userptr memory
+ * if the user memory is not from BO, the buf_handle will be NULL.
+ * \param offset_in_bo - [out] offset in this BO for this user memory
+ *
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo);
+
+/**
   * Free previously allocated memory
   *
   * \param   dev   - \c [in] Device handle. See 
#amdgpu_device_initialize()
diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index b24e698..a631050 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -529,6 +529,40 @@ int amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,
  }
  }
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo)
+{
+int i;
+struct amdgpu_bo *bo;
+
+if (cpu == NULL || size == 0)
+return -EINVAL;
+
+pthread_mutex_lock(&dev->bo_table_mutex);
+for (i = 0; i <= dev->bo_handles.max_key; i++) {
+bo = handle_table_lookup(&dev->bo_handles, i);
+if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
+continue;
+if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + bo->alloc_size))
+break;
+}
+pthread_mutex_unlock(&dev->bo_table_mutex);
+
+if (i <= dev->bo_handles.max_key) {
+atomic_inc(&bo->refcount);
+*buf_handle = bo;
+*offset_in_bo = cpu - bo->cpu_ptr;
+} else {
+*buf_handle = NULL;
+*offset_in_bo = 0;
+}
+
+return 0;
+}
+
  int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
  void *cpu,
  uint64_t size,









[PATCH] drm/amdgpu/sriov: give 8s for recover vram under RUNTIME

2018-08-07 Thread Emily Deng
Under runtime, the wait fence time could be quite long when
other VFs are in exclusive mode.

SWDEV-161490

Change-Id: Ifc32d56ca7fde01b1f4fe2b0db6959b51909008a
Signed-off-by: Monk Liu 
Signed-off-by: Emily Deng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1d933db..ef82ad1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3124,7 +3124,7 @@ static int amdgpu_device_handle_vram_lost(struct 
amdgpu_device *adev)
long tmo;
 
if (amdgpu_sriov_runtime(adev))
-   tmo = msecs_to_jiffies(amdgpu_lockup_timeout);
+   tmo = msecs_to_jiffies(8000);
else
tmo = msecs_to_jiffies(100);
 
-- 
2.7.4



Re: [PATCH libdrm 3/4] amdgpu: add a function to find bo by cpu mapping

2018-08-07 Thread zhoucm1



On 2018年08月07日 17:30, Christian König wrote:

Am 07.08.2018 um 09:55 schrieb zhoucm1:



On 2018年08月07日 15:26, Junwei Zhang wrote:

Userspace needs to know if the user memory is from BO or malloc.

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu.h    | 23 +++
  amdgpu/amdgpu_bo.c | 34 ++
  2 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index be83b45..a8c353c 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -678,6 +678,29 @@ int 
amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,

  amdgpu_bo_handle *buf_handle);
    /**
+ * Validate if the user memory comes from BO
+ *
+ * \param dev - [in] Device handle. See #amdgpu_device_initialize()
+ * \param cpu - [in] CPU address of user allocated memory which we
+ * want to map to GPU address space (make GPU accessible)
+ * (This address must be correctly aligned).
+ * \param size - [in] Size of allocation (must be correctly aligned)
+ * \param buf_handle - [out] Buffer handle for the userptr memory
+ * if the user memory is not from BO, the buf_handle will be NULL.
+ * \param offset_in_bo - [out] offset in this BO for this user memory
+ *
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo);
+
+/**
   * Free previously allocated memory
   *
   * \param   dev   - \c [in] Device handle. See 
#amdgpu_device_initialize()

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index b24e698..a631050 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -529,6 +529,40 @@ int amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,
  }
  }
  +int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo)
+{
+    int i;
+    struct amdgpu_bo *bo;
+
+    if (cpu == NULL || size == 0)
+    return -EINVAL;
+
+    pthread_mutex_lock(&dev->bo_table_mutex);
+    for (i = 0; i <= dev->bo_handles.max_key; i++) {
+    bo = handle_table_lookup(&dev->bo_handles, i);

explicit cast is encouraged, like "

bo = (struct amdgpu_bo *)handle_table_lookup(>bo_handles, i);


Actually it isn't. We use kernel coding style here, so explicit casts 
from "void*" should be avoided:
Casting the return value which is a void pointer is redundant. The 
conversion from void pointer to any other pointer type is guaranteed 
by the C programming language.
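
For illustration, the two styles being discussed, using the lookup from the
patch above (the second form is the one considered redundant):

    /* kernel style: no cast when assigning from the void * return value */
    bo = handle_table_lookup(&dev->bo_handles, i);

    /* explicit cast, as suggested above; legal but redundant in C */
    bo = (struct amdgpu_bo *)handle_table_lookup(&dev->bo_handles, i);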

Understood; personally, I still like explicit casts, which read more easily.

David


I already had to remove quite a bunch of explicit casts because of 
this, so please stop adding new ones.


Regards,
Christian.




"

otherwise, the series looks good to me.

Regards,
David Zhou

+    if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
+    continue;
+    if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + 
bo->alloc_size))

+    break;
+    }
+    pthread_mutex_unlock(&dev->bo_table_mutex);
+
+    if (i <= dev->bo_handles.max_key) {
+    atomic_inc(&bo->refcount);
+    *buf_handle = bo;
+    *offset_in_bo = cpu - bo->cpu_ptr;
+    } else {
+    *buf_handle = NULL;
+    *offset_in_bo = 0;
+    }
+
+    return 0;
+}
+
  int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
  void *cpu,
  uint64_t size,


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
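
A minimal standalone sketch of the void-pointer rule discussed above (the struct name is made up and not part of any patch in this thread):

#include <stdlib.h>

struct example_bo { int dummy; };

int main(void)
{
	void *p = malloc(sizeof(struct example_bo));
	/* In C, void * converts implicitly to any object pointer type,
	 * so no cast is needed here (C++ would require one). */
	struct example_bo *bo = p;
	free(bo);
	return 0;
}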


Re: [PATCH libdrm 1/4] amdgpu: add bo from user memory to handle table

2018-08-07 Thread Christian König

On 07.08.2018 at 11:54, Zhang, Jerry (Junwei) wrote:

On 08/07/2018 03:51 PM, zhoucm1 wrote:



On 2018-08-07 15:26, Junwei Zhang wrote:

When creating a bo from user memory, add it to the handle table
for future queries.

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu_bo.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index 422c7c9..b24e698 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -556,7 +556,16 @@ int 
amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,

  bo->alloc_size = size;
  bo->handle = args.handle;
-    *buf_handle = bo;
+    pthread_mutex_lock(&bo->dev->bo_table_mutex);
+    r = handle_table_insert(&bo->dev->bo_handles, bo->handle, bo);
+    pthread_mutex_unlock(&bo->dev->bo_table_mutex);
+
+    pthread_mutex_init(&bo->cpu_access_mutex, NULL);

This line is nothing with patch itself, please separate from it.


We did not add it for the user-mem bo before, and we are not sure whether the
user would access that bo by cpu mapping,

so add it at the same time.

Do you mean to add it in another patch?


I'm ok with adding it in this patch, because it is a necessary fix.

But we might have another patch to move common BO field initialization
into a separate function, e.g. to avoid duplicating that stuff too often.


Anyway this patch is Reviewed-by: Christian König 
 for now.


Christian.



Jerry



Regards,
David Zhou

+
+    if (r)
+    amdgpu_bo_free(bo);
+    else
+    *buf_handle = bo;
  return r;
  }


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
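
A rough sketch of the shared-initializer idea mentioned above (the helper name and exact fields are assumptions based on the snippets quoted in this thread, not an actual libdrm patch): both amdgpu_bo_alloc() and amdgpu_create_bo_from_user_mem() could call one routine instead of open-coding the field setup and table insertion.

static int amdgpu_bo_register(struct amdgpu_bo *bo, amdgpu_device_handle dev,
			      uint64_t size, uint32_t handle)
{
	int r;

	bo->dev = dev;
	bo->alloc_size = size;
	bo->handle = handle;
	atomic_set(&bo->refcount, 1);
	pthread_mutex_init(&bo->cpu_access_mutex, NULL);

	/* publish the BO in the per-device handle table */
	pthread_mutex_lock(&dev->bo_table_mutex);
	r = handle_table_insert(&dev->bo_handles, handle, bo);
	pthread_mutex_unlock(&dev->bo_table_mutex);

	return r;
}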



Re: [PATCH libdrm 3/4] amdgpu: add a function to find bo by cpu mapping

2018-08-07 Thread Christian König

On 07.08.2018 at 11:52, Zhang, Jerry (Junwei) wrote:

On 08/07/2018 05:33 PM, Christian König wrote:

On 07.08.2018 at 10:28, Zhang, Jerry (Junwei) wrote:

On 08/07/2018 04:20 PM, Christian König wrote:
Well NAK, that wasn't the intention of putting all BOs into the 
handle table.


You should still use the kernel implementation.


I thought we had discussed that in the mail thread below. Any gap?

[PATCH 1/2] drm/amdgpu: return bo itself if userptr is cpu addr of 
bo (v3)

{{{
>> Well we could just completely drop the kernel implementation and use
>> an userspace implementation.
>
> Do you mean to implement finding bo by cpu address in libdrm 
completely?


Yes, exactly.
}}}


Ok, well that is a misunderstanding. I was just speculating about that.

On the other hand, if searching all BOs for the right one is not too
much overhead for that workaround, I'm fine with it.


Just drop patch #2 in this series. All unused entries should be 
initialized to zero and walking the extra 512 isn't that much overhead.


Yes, reserve it locally for now.
May I get your RB?


I've got a few minor comments on the last patch as well.

Give me a moment to figure out how to push to the libdrm master repository
now (they migrated to github and my old account doesn't work any more).


After that I can comment on your patches.

Thanks,
Christian.



Regards,
Jerry



Regards,
Christian.



Regards,
Jerry



Christian.

On 07.08.2018 at 09:26, Junwei Zhang wrote:

Userspace needs to know if the user memory is from BO or malloc.

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu.h    | 23 +++
  amdgpu/amdgpu_bo.c | 34 ++
  2 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index be83b45..a8c353c 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -678,6 +678,29 @@ int 
amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,

  amdgpu_bo_handle *buf_handle);
  /**
+ * Validate if the user memory comes from BO
+ *
+ * \param dev - [in] Device handle. See #amdgpu_device_initialize()
+ * \param cpu - [in] CPU address of user allocated memory which we
+ * want to map to GPU address space (make GPU accessible)
+ * (This address must be correctly aligned).
+ * \param size - [in] Size of allocation (must be correctly aligned)
+ * \param buf_handle - [out] Buffer handle for the userptr memory
+ * if the user memory is not from BO, the buf_handle will be NULL.
+ * \param offset_in_bo - [out] offset in this BO for this user 
memory

+ *
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo);
+
+/**
   * Free previosuly allocated memory
   *
   * \param   dev   - \c [in] Device handle. See 
#amdgpu_device_initialize()

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index b24e698..a631050 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -529,6 +529,40 @@ int amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,
  }
  }
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo)
+{
+    int i;
+    struct amdgpu_bo *bo;
+
+    if (cpu == NULL || size == 0)
+    return -EINVAL;
+
+    pthread_mutex_lock(&dev->bo_table_mutex);
+    for (i = 0; i <= dev->bo_handles.max_key; i++) {
+    bo = handle_table_lookup(&dev->bo_handles, i);
+    if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
+    continue;
+    if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + 
bo->alloc_size))

+    break;
+    }
+    pthread_mutex_unlock(&dev->bo_table_mutex);
+
+    if (i <= dev->bo_handles.max_key) {
+    atomic_inc(&bo->refcount);
+    *buf_handle = bo;
+    *offset_in_bo = cpu - bo->cpu_ptr;
+    } else {
+    *buf_handle = NULL;
+    *offset_in_bo = 0;
+    }
+
+    return 0;
+}
+
  int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
  void *cpu,
  uint64_t size,


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



Re: [PATCH libdrm 1/4] amdgpu: add bo from user memory to handle table

2018-08-07 Thread Zhang, Jerry (Junwei)

On 08/07/2018 03:51 PM, zhoucm1 wrote:



On 2018-08-07 15:26, Junwei Zhang wrote:

When creating a bo from user memory, add it to the handle table
for future queries.

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu_bo.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index 422c7c9..b24e698 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -556,7 +556,16 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle 
dev,
  bo->alloc_size = size;
  bo->handle = args.handle;
-*buf_handle = bo;
+pthread_mutex_lock(&bo->dev->bo_table_mutex);
+r = handle_table_insert(&bo->dev->bo_handles, bo->handle, bo);
+pthread_mutex_unlock(&bo->dev->bo_table_mutex);
+
+pthread_mutex_init(&bo->cpu_access_mutex, NULL);

This line is nothing with patch itself, please separate from it.


We did not add it for the user-mem bo before, and we are not sure whether the user would access
that bo by cpu mapping,
so add it at the same time.

Do you mean to add it in another patch?

Jerry



Regards,
David Zhou

+
+if (r)
+amdgpu_bo_free(bo);
+else
+*buf_handle = bo;
  return r;
  }


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



Re: [PATCH libdrm 3/4] amdgpu: add a function to find bo by cpu mapping

2018-08-07 Thread Zhang, Jerry (Junwei)

On 08/07/2018 05:33 PM, Christian König wrote:

On 07.08.2018 at 10:28, Zhang, Jerry (Junwei) wrote:

On 08/07/2018 04:20 PM, Christian König wrote:

Well NAK, that wasn't the intention of putting all BOs into the handle table.

You should still use the kernel implementation.


I thought we had discussed that in the mail thread below. Any gap?

[PATCH 1/2] drm/amdgpu: return bo itself if userptr is cpu addr of bo (v3)
{{{
>> Well we could just completely drop the kernel implementation and use
>> an userspace implementation.
>
> Do you mean to implement finding bo by cpu address in libdrm completely?

Yes, exactly.
}}}


Ok, well that is a misunderstanding. I was just speculating about that.

On the other hand, if searching all BOs for the right one is not too much
overhead for that workaround, I'm fine with it.

Just drop patch #2 in this series. All unused entries should be initialized to 
zero and walking the extra 512 isn't that much overhead.


Yes, reserve it locally for now.
May I get your RB?

Regards,
Jerry



Regards,
Christian.



Regards,
Jerry



Christian.

On 07.08.2018 at 09:26, Junwei Zhang wrote:

Userspace needs to know if the user memory is from BO or malloc.

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu.h| 23 +++
  amdgpu/amdgpu_bo.c | 34 ++
  2 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index be83b45..a8c353c 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -678,6 +678,29 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle 
dev,
  amdgpu_bo_handle *buf_handle);
  /**
+ * Validate if the user memory comes from BO
+ *
+ * \param dev - [in] Device handle. See #amdgpu_device_initialize()
+ * \param cpu - [in] CPU address of user allocated memory which we
+ * want to map to GPU address space (make GPU accessible)
+ * (This address must be correctly aligned).
+ * \param size - [in] Size of allocation (must be correctly aligned)
+ * \param buf_handle - [out] Buffer handle for the userptr memory
+ * if the user memory is not from BO, the buf_handle will be NULL.
+ * \param offset_in_bo - [out] offset in this BO for this user memory
+ *
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo);
+
+/**
   * Free previosuly allocated memory
   *
   * \param   dev   - \c [in] Device handle. See 
#amdgpu_device_initialize()
diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index b24e698..a631050 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -529,6 +529,40 @@ int amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,
  }
  }
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo)
+{
+int i;
+struct amdgpu_bo *bo;
+
+if (cpu == NULL || size == 0)
+return -EINVAL;
+
+pthread_mutex_lock(&dev->bo_table_mutex);
+for (i = 0; i <= dev->bo_handles.max_key; i++) {
+bo = handle_table_lookup(&dev->bo_handles, i);
+if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
+continue;
+if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + bo->alloc_size))
+break;
+}
+pthread_mutex_unlock(&dev->bo_table_mutex);
+
+if (i <= dev->bo_handles.max_key) {
+atomic_inc(&bo->refcount);
+*buf_handle = bo;
+*offset_in_bo = cpu - bo->cpu_ptr;
+} else {
+*buf_handle = NULL;
+*offset_in_bo = 0;
+}
+
+return 0;
+}
+
  int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
  void *cpu,
  uint64_t size,


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



Re: [PATCH libdrm 3/4] amdgpu: add a function to find bo by cpu mapping

2018-08-07 Thread Christian König

On 07.08.2018 at 10:28, Zhang, Jerry (Junwei) wrote:

On 08/07/2018 04:20 PM, Christian König wrote:
Well NAK, that wasn't the intention of putting all BOs into the 
handle table.


You should still use the kernel implementation.


I thought we had discussed that in the mail thread below. Any gap?

[PATCH 1/2] drm/amdgpu: return bo itself if userptr is cpu addr of bo 
(v3)

{{{
>> Well we could just completely drop the kernel implementation and use
>> an userspace implementation.
>
> Do you mean to implement finding bo by cpu address in libdrm 
completely?


Yes, exactly.
}}}


Ok, well that is a misunderstanding. I was just speculating about that.

On the other hand, if searching all BOs for the right one is not too much
overhead for that workaround, I'm fine with it.


Just drop patch #2 in this series. All unused entries should be 
initialized to zero and walking the extra 512 isn't that much overhead.


Regards,
Christian.



Regards,
Jerry



Christian.

On 07.08.2018 at 09:26, Junwei Zhang wrote:

Userspace needs to know if the user memory is from BO or malloc.

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu.h    | 23 +++
  amdgpu/amdgpu_bo.c | 34 ++
  2 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index be83b45..a8c353c 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -678,6 +678,29 @@ int 
amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,

  amdgpu_bo_handle *buf_handle);
  /**
+ * Validate if the user memory comes from BO
+ *
+ * \param dev - [in] Device handle. See #amdgpu_device_initialize()
+ * \param cpu - [in] CPU address of user allocated memory which we
+ * want to map to GPU address space (make GPU accessible)
+ * (This address must be correctly aligned).
+ * \param size - [in] Size of allocation (must be correctly aligned)
+ * \param buf_handle - [out] Buffer handle for the userptr memory
+ * if the user memory is not from BO, the buf_handle will be NULL.
+ * \param offset_in_bo - [out] offset in this BO for this user memory
+ *
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo);
+
+/**
   * Free previosuly allocated memory
   *
   * \param   dev   - \c [in] Device handle. See 
#amdgpu_device_initialize()

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index b24e698..a631050 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -529,6 +529,40 @@ int amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,
  }
  }
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo)
+{
+    int i;
+    struct amdgpu_bo *bo;
+
+    if (cpu == NULL || size == 0)
+    return -EINVAL;
+
+    pthread_mutex_lock(&dev->bo_table_mutex);
+    for (i = 0; i <= dev->bo_handles.max_key; i++) {
+    bo = handle_table_lookup(&dev->bo_handles, i);
+    if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
+    continue;
+    if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + 
bo->alloc_size))

+    break;
+    }
+    pthread_mutex_unlock(&dev->bo_table_mutex);
+
+    if (i <= dev->bo_handles.max_key) {
+    atomic_inc(&bo->refcount);
+    *buf_handle = bo;
+    *offset_in_bo = cpu - bo->cpu_ptr;
+    } else {
+    *buf_handle = NULL;
+    *offset_in_bo = 0;
+    }
+
+    return 0;
+}
+
  int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
  void *cpu,
  uint64_t size,


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



Re: [PATCH libdrm 3/4] amdgpu: add a function to find bo by cpu mapping

2018-08-07 Thread Christian König

On 07.08.2018 at 09:55, zhoucm1 wrote:



On 2018-08-07 15:26, Junwei Zhang wrote:

Userspace needs to know if the user memory is from BO or malloc.

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu.h    | 23 +++
  amdgpu/amdgpu_bo.c | 34 ++
  2 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index be83b45..a8c353c 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -678,6 +678,29 @@ int 
amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,

  amdgpu_bo_handle *buf_handle);
    /**
+ * Validate if the user memory comes from BO
+ *
+ * \param dev - [in] Device handle. See #amdgpu_device_initialize()
+ * \param cpu - [in] CPU address of user allocated memory which we
+ * want to map to GPU address space (make GPU accessible)
+ * (This address must be correctly aligned).
+ * \param size - [in] Size of allocation (must be correctly aligned)
+ * \param buf_handle - [out] Buffer handle for the userptr memory
+ * if the user memory is not from BO, the buf_handle will be NULL.
+ * \param offset_in_bo - [out] offset in this BO for this user memory
+ *
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo);
+
+/**
   * Free previosuly allocated memory
   *
   * \param   dev   - \c [in] Device handle. See 
#amdgpu_device_initialize()

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index b24e698..a631050 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -529,6 +529,40 @@ int amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,
  }
  }
  +int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo)
+{
+    int i;
+    struct amdgpu_bo *bo;
+
+    if (cpu == NULL || size == 0)
+    return -EINVAL;
+
+    pthread_mutex_lock(&dev->bo_table_mutex);
+    for (i = 0; i <= dev->bo_handles.max_key; i++) {
+    bo = handle_table_lookup(&dev->bo_handles, i);

An explicit cast is encouraged, like "

bo = (struct amdgpu_bo *)handle_table_lookup(&dev->bo_handles, i);


Actually it isn't. We use kernel coding style here, so explicit casts 
from "void*" should be avoided:
Casting the return value which is a void pointer is redundant. The 
conversion from void pointer to any other pointer type is guaranteed 
by the C programming language.


I already had to remove quite a bunch of explicit casts because of this, 
so please stop adding new ones.


Regards,
Christian.




"

otherwise, the series looks good to me.

Regards,
David Zhou

+    if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
+    continue;
+    if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + bo->alloc_size))
+    break;
+    }
+    pthread_mutex_unlock(&dev->bo_table_mutex);
+
+    if (i <= dev->bo_handles.max_key) {
+    atomic_inc(&bo->refcount);
+    *buf_handle = bo;
+    *offset_in_bo = cpu - bo->cpu_ptr;
+    } else {
+    *buf_handle = NULL;
+    *offset_in_bo = 0;
+    }
+
+    return 0;
+}
+
  int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
  void *cpu,
  uint64_t size,


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




Re: [PATCH] drm/amdgpu: fix unused variable ‘rq’

2018-08-07 Thread Christian König

On 07.08.2018 at 10:47, Huang Rui wrote:

This patch fixes the warning by removing the unused variable.

/home/ray/linux/drivers/gpu/drm//amd/amdgpu/amdgpu_ctx.c:
In function ‘amdgpu_ctx_priority_override’:
/home/ray/linux/drivers/gpu/drm//amd/amdgpu/amdgpu_ctx.c:397:23:
warning: unused variable ‘rq’ [-Wunused-variable]
   struct drm_sched_rq *rq;
^

Signed-off-by: Huang Rui 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 9fcc14e..02d563c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -394,7 +394,6 @@ void amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx,
  {
int i;
struct amdgpu_device *adev = ctx->adev;
-   struct drm_sched_rq *rq;
struct drm_sched_entity *entity;
struct amdgpu_ring *ring;
enum drm_sched_priority ctx_prio;


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: fix unused variable ‘rq’

2018-08-07 Thread Huang Rui
This patch fixes the warning by removing the unused variable.

/home/ray/linux/drivers/gpu/drm//amd/amdgpu/amdgpu_ctx.c:
In function ‘amdgpu_ctx_priority_override’:
/home/ray/linux/drivers/gpu/drm//amd/amdgpu/amdgpu_ctx.c:397:23:
warning: unused variable ‘rq’ [-Wunused-variable]
  struct drm_sched_rq *rq;
   ^

Signed-off-by: Huang Rui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 9fcc14e..02d563c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -394,7 +394,6 @@ void amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx,
 {
int i;
struct amdgpu_device *adev = ctx->adev;
-   struct drm_sched_rq *rq;
struct drm_sched_entity *entity;
struct amdgpu_ring *ring;
enum drm_sched_priority ctx_prio;
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
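
For context, a minimal reproduction of this class of warning (a hypothetical file, not from the kernel tree), compiled with -Wall:

/* unused.c */
int f(void)
{
	int rq;	/* declared but never used: triggers -Wunused-variable */

	return 0;
}

Deleting the unused declaration, as the patch above does, silences the warning.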


Re: [PATCH libdrm 3/4] amdgpu: add a function to find bo by cpu mapping

2018-08-07 Thread Zhang, Jerry (Junwei)

On 08/07/2018 04:20 PM, Christian König wrote:

Well NAK, that wasn't the intention of putting all BOs into the handle table.

You should still use the kernel implementation.


I thought we had discussed that in the mail thread below. Any gap?

[PATCH 1/2] drm/amdgpu: return bo itself if userptr is cpu addr of bo (v3)
{{{
>> Well we could just completely drop the kernel implementation and use
>> an userspace implementation.
>
> Do you mean to implement finding bo by cpu address in libdrm completely?

Yes, exactly.
}}}

Regards,
Jerry



Christian.

On 07.08.2018 at 09:26, Junwei Zhang wrote:

Userspace needs to know if the user memory is from BO or malloc.

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu.h| 23 +++
  amdgpu/amdgpu_bo.c | 34 ++
  2 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index be83b45..a8c353c 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -678,6 +678,29 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle 
dev,
  amdgpu_bo_handle *buf_handle);
  /**
+ * Validate if the user memory comes from BO
+ *
+ * \param dev - [in] Device handle. See #amdgpu_device_initialize()
+ * \param cpu - [in] CPU address of user allocated memory which we
+ * want to map to GPU address space (make GPU accessible)
+ * (This address must be correctly aligned).
+ * \param size - [in] Size of allocation (must be correctly aligned)
+ * \param buf_handle - [out] Buffer handle for the userptr memory
+ * if the user memory is not from BO, the buf_handle will be NULL.
+ * \param offset_in_bo - [out] offset in this BO for this user memory
+ *
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo);
+
+/**
   * Free previosuly allocated memory
   *
   * \param   dev   - \c [in] Device handle. See 
#amdgpu_device_initialize()
diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index b24e698..a631050 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -529,6 +529,40 @@ int amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,
  }
  }
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo)
+{
+int i;
+struct amdgpu_bo *bo;
+
+if (cpu == NULL || size == 0)
+return -EINVAL;
+
+pthread_mutex_lock(&dev->bo_table_mutex);
+for (i = 0; i <= dev->bo_handles.max_key; i++) {
+bo = handle_table_lookup(&dev->bo_handles, i);
+if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
+continue;
+if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + bo->alloc_size))
+break;
+}
+pthread_mutex_unlock(&dev->bo_table_mutex);
+
+if (i <= dev->bo_handles.max_key) {
+atomic_inc(&bo->refcount);
+*buf_handle = bo;
+*offset_in_bo = cpu - bo->cpu_ptr;
+} else {
+*buf_handle = NULL;
+*offset_in_bo = 0;
+}
+
+return 0;
+}
+
  int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
  void *cpu,
  uint64_t size,


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



Re: [PATCH libdrm 3/4] amdgpu: add a function to find bo by cpu mapping

2018-08-07 Thread Christian König
Well NAK, that wasn't the intention of putting all BOs into the handle 
table.


You should still use the kernel implementation.

Christian.

On 07.08.2018 at 09:26, Junwei Zhang wrote:

Userspace needs to know if the user memory is from BO or malloc.

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu.h| 23 +++
  amdgpu/amdgpu_bo.c | 34 ++
  2 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index be83b45..a8c353c 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -678,6 +678,29 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle 
dev,
amdgpu_bo_handle *buf_handle);
  
  /**

+ * Validate if the user memory comes from BO
+ *
+ * \param dev - [in] Device handle. See #amdgpu_device_initialize()
+ * \param cpu - [in] CPU address of user allocated memory which we
+ * want to map to GPU address space (make GPU accessible)
+ * (This address must be correctly aligned).
+ * \param size - [in] Size of allocation (must be correctly aligned)
+ * \param buf_handle - [out] Buffer handle for the userptr memory
+ * if the user memory is not from BO, the buf_handle will be NULL.
+ * \param offset_in_bo - [out] offset in this BO for this user memory
+ *
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+ void *cpu,
+ uint64_t size,
+ amdgpu_bo_handle *buf_handle,
+ uint64_t *offset_in_bo);
+
+/**
   * Free previosuly allocated memory
   *
   * \param   dev  - \c [in] Device handle. See 
#amdgpu_device_initialize()
diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index b24e698..a631050 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -529,6 +529,40 @@ int amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,
}
  }
  
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,

+ void *cpu,
+ uint64_t size,
+ amdgpu_bo_handle *buf_handle,
+ uint64_t *offset_in_bo)
+{
+   int i;
+   struct amdgpu_bo *bo;
+
+   if (cpu == NULL || size == 0)
+   return -EINVAL;
+
+   pthread_mutex_lock(&dev->bo_table_mutex);
+   for (i = 0; i <= dev->bo_handles.max_key; i++) {
+   bo = handle_table_lookup(&dev->bo_handles, i);
+   if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
+   continue;
+   if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + bo->alloc_size))
+   break;
+   }
+   pthread_mutex_unlock(&dev->bo_table_mutex);
+
+   if (i <= dev->bo_handles.max_key) {
+   atomic_inc(&bo->refcount);
+   *buf_handle = bo;
+   *offset_in_bo = cpu - bo->cpu_ptr;
+   } else {
+   *buf_handle = NULL;
+   *offset_in_bo = 0;
+   }
+
+   return 0;
+}
+
  int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
void *cpu,
uint64_t size,


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm 3/4] amdgpu: add a function to find bo by cpu mapping

2018-08-07 Thread zhoucm1



On 2018-08-07 15:26, Junwei Zhang wrote:

Userspace needs to know if the user memory is from BO or malloc.

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu.h| 23 +++
  amdgpu/amdgpu_bo.c | 34 ++
  2 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index be83b45..a8c353c 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -678,6 +678,29 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle 
dev,
amdgpu_bo_handle *buf_handle);
  
  /**

+ * Validate if the user memory comes from BO
+ *
+ * \param dev - [in] Device handle. See #amdgpu_device_initialize()
+ * \param cpu - [in] CPU address of user allocated memory which we
+ * want to map to GPU address space (make GPU accessible)
+ * (This address must be correctly aligned).
+ * \param size - [in] Size of allocation (must be correctly aligned)
+ * \param buf_handle - [out] Buffer handle for the userptr memory
+ * if the user memory is not from BO, the buf_handle will be NULL.
+ * \param offset_in_bo - [out] offset in this BO for this user memory
+ *
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+ void *cpu,
+ uint64_t size,
+ amdgpu_bo_handle *buf_handle,
+ uint64_t *offset_in_bo);
+
+/**
   * Free previosuly allocated memory
   *
   * \param   dev  - \c [in] Device handle. See 
#amdgpu_device_initialize()
diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index b24e698..a631050 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -529,6 +529,40 @@ int amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,
}
  }
  
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,

+ void *cpu,
+ uint64_t size,
+ amdgpu_bo_handle *buf_handle,
+ uint64_t *offset_in_bo)
+{
+   int i;
+   struct amdgpu_bo *bo;
+
+   if (cpu == NULL || size == 0)
+   return -EINVAL;
+
+   pthread_mutex_lock(&dev->bo_table_mutex);
+   for (i = 0; i <= dev->bo_handles.max_key; i++) {
+   bo = handle_table_lookup(&dev->bo_handles, i);

An explicit cast is encouraged, like "

bo = (struct amdgpu_bo *)handle_table_lookup(&dev->bo_handles, i);

"

otherwise, the series looks good to me.

Regards,
David Zhou

+   if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
+   continue;
+   if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + bo->alloc_size))
+   break;
+   }
+   pthread_mutex_unlock(&dev->bo_table_mutex);
+
+   if (i <= dev->bo_handles.max_key) {
+   atomic_inc(&bo->refcount);
+   *buf_handle = bo;
+   *offset_in_bo = cpu - bo->cpu_ptr;
+   } else {
+   *buf_handle = NULL;
+   *offset_in_bo = 0;
+   }
+
+   return 0;
+}
+
  int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
void *cpu,
uint64_t size,


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm 1/4] amdgpu: add bo from user memory to handle table

2018-08-07 Thread zhoucm1



On 2018-08-07 15:26, Junwei Zhang wrote:

When creating a bo from user memory, add it to the handle table
for future queries.

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu_bo.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index 422c7c9..b24e698 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -556,7 +556,16 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle 
dev,
bo->alloc_size = size;
bo->handle = args.handle;
  
-	*buf_handle = bo;

+   pthread_mutex_lock(&bo->dev->bo_table_mutex);
+   r = handle_table_insert(&bo->dev->bo_handles, bo->handle, bo);
+   pthread_mutex_unlock(&bo->dev->bo_table_mutex);
+
+   pthread_mutex_init(&bo->cpu_access_mutex, NULL);

This line is nothing with patch itself, please separate from it.

Regards,
David Zhou

+
+   if (r)
+   amdgpu_bo_free(bo);
+   else
+   *buf_handle = bo;
  
  	return r;

  }


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH libdrm 2/4] amdgpu: add count for handle table

2018-08-07 Thread Junwei Zhang
count indicates the total number of keys in the handle table;
max_key becomes the max value of the key.

Signed-off-by: Junwei Zhang 
---
 amdgpu/handle_table.c | 18 +++---
 amdgpu/handle_table.h |  1 +
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/amdgpu/handle_table.c b/amdgpu/handle_table.c
index 15cd476..34e8027 100644
--- a/amdgpu/handle_table.c
+++ b/amdgpu/handle_table.c
@@ -31,34 +31,37 @@
 drm_private int handle_table_insert(struct handle_table *table, uint32_t key,
void *value)
 {
-   if (key >= table->max_key) {
+   if (key >= table->count) {
uint32_t alignment = sysconf(_SC_PAGESIZE) / sizeof(void*);
-   uint32_t max_key = ALIGN(key, alignment);
+   uint32_t count = ALIGN(key, alignment);
void **values;
 
-   values = realloc(table->values, max_key * sizeof(void *));
+   values = realloc(table->values, count * sizeof(void *));
if (!values)
return -ENOMEM;
 
-   memset(values + table->max_key, 0, (max_key - table->max_key) *
+   memset(values + table->count, 0, (count - table->count) *
   sizeof(void *));
 
-   table->max_key = max_key;
+   table->count = count;
table->values = values;
}
+   if (key > table->max_key)
+   table->max_key = key;
+
table->values[key] = value;
return 0;
 }
 
 drm_private void handle_table_remove(struct handle_table *table, uint32_t key)
 {
-   if (key < table->max_key)
+   if (key <= table->max_key)
table->values[key] = NULL;
 }
 
 drm_private void *handle_table_lookup(struct handle_table *table, uint32_t key)
 {
-   if (key < table->max_key)
+   if (key <= table->max_key)
return table->values[key];
else
return NULL;
@@ -68,5 +71,6 @@ drm_private void handle_table_fini(struct handle_table *table)
 {
free(table->values);
table->max_key = 0;
+   table->count = 0;
table->values = NULL;
 }
diff --git a/amdgpu/handle_table.h b/amdgpu/handle_table.h
index 461193f..007bb58 100644
--- a/amdgpu/handle_table.h
+++ b/amdgpu/handle_table.h
@@ -29,6 +29,7 @@
 
 struct handle_table {
uint32_tmax_key;
+   uint32_tcount;
void**values;
 };
 
-- 
1.9.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
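
A self-contained toy model of the table semantics discussed in the review above (names and alignment are simplified, this is not the libdrm code): because the insert path zero-fills the newly grown tail, a linear walk over the whole allocated range only ever sees valid pointers or NULL, which is why the extra count field can be dropped and max_key alone is enough.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct toy_table {
	unsigned max_key;	/* number of allocated slots */
	void **values;
};

static int toy_insert(struct toy_table *t, unsigned key, void *value)
{
	if (key >= t->max_key) {
		unsigned new_max = (key + 512) & ~511u;	/* grow in chunks */
		void **v = realloc(t->values, new_max * sizeof(void *));

		if (!v)
			return -1;
		/* zero-fill the grown tail so a linear walk sees NULLs */
		memset(v + t->max_key, 0, (new_max - t->max_key) * sizeof(void *));
		t->values = v;
		t->max_key = new_max;
	}
	t->values[key] = value;
	return 0;
}

int main(void)
{
	struct toy_table t = { 0, NULL };
	int dummy;
	unsigned i, hits = 0;

	toy_insert(&t, 5, &dummy);
	for (i = 0; i < t.max_key; i++)	/* walks the unused slots too */
		if (t.values[i])
			hits++;
	printf("allocated %u slots, %u in use\n", t.max_key, hits);
	free(t.values);
	return 0;
}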


[PATCH libdrm 3/4] amdgpu: add a function to find bo by cpu mapping

2018-08-07 Thread Junwei Zhang
Userspace needs to know if the user memory is from BO or malloc.

Signed-off-by: Junwei Zhang 
---
 amdgpu/amdgpu.h| 23 +++
 amdgpu/amdgpu_bo.c | 34 ++
 2 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index be83b45..a8c353c 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -678,6 +678,29 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle 
dev,
amdgpu_bo_handle *buf_handle);
 
 /**
+ * Validate if the user memory comes from BO
+ *
+ * \param dev - [in] Device handle. See #amdgpu_device_initialize()
+ * \param cpu - [in] CPU address of user allocated memory which we
+ * want to map to GPU address space (make GPU accessible)
+ * (This address must be correctly aligned).
+ * \param size - [in] Size of allocation (must be correctly aligned)
+ * \param buf_handle - [out] Buffer handle for the userptr memory
+ * if the user memory is not from BO, the buf_handle will be NULL.
+ * \param offset_in_bo - [out] offset in this BO for this user memory
+ *
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+ void *cpu,
+ uint64_t size,
+ amdgpu_bo_handle *buf_handle,
+ uint64_t *offset_in_bo);
+
+/**
  * Free previosuly allocated memory
  *
  * \param   dev   - \c [in] Device handle. See 
#amdgpu_device_initialize()
diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index b24e698..a631050 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -529,6 +529,40 @@ int amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,
}
 }
 
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+ void *cpu,
+ uint64_t size,
+ amdgpu_bo_handle *buf_handle,
+ uint64_t *offset_in_bo)
+{
+   int i;
+   struct amdgpu_bo *bo;
+
+   if (cpu == NULL || size == 0)
+   return -EINVAL;
+
+   pthread_mutex_lock(&dev->bo_table_mutex);
+   for (i = 0; i <= dev->bo_handles.max_key; i++) {
+   bo = handle_table_lookup(&dev->bo_handles, i);
+   if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
+   continue;
+   if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + bo->alloc_size))
+   break;
+   }
+   pthread_mutex_unlock(&dev->bo_table_mutex);
+
+   if (i <= dev->bo_handles.max_key) {
+   atomic_inc(&bo->refcount);
+   *buf_handle = bo;
+   *offset_in_bo = cpu - bo->cpu_ptr;
+   } else {
+   *buf_handle = NULL;
+   *offset_in_bo = 0;
+   }
+
+   return 0;
+}
+
 int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
void *cpu,
uint64_t size,
-- 
1.9.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
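
A hedged usage sketch of the new entry point (the wrapper function, its arguments, and the error handling are assumptions, not part of the series): on a hit the lookup bumps the BO's refcount, so the caller has to drop that extra reference again (the test in patch 4/4 does this with atomic_dec on the refcount; the sketch below uses amdgpu_bo_free(), assuming it only releases one reference).

#include "amdgpu.h"

static void classify_user_memory(amdgpu_device_handle dev, void *ptr,
				 uint64_t size)
{
	amdgpu_bo_handle found = NULL;
	uint64_t offset = 0;

	if (amdgpu_find_bo_by_cpu_mapping(dev, ptr, size, &found, &offset) == 0 &&
	    found) {
		/* ptr lies inside an existing BO mapping; offset is the byte
		 * offset of ptr within that BO. */
		amdgpu_bo_free(found);	/* drop the extra reference */
	} else {
		/* plain malloc'ed memory: fall back to the userptr path,
		 * e.g. amdgpu_create_bo_from_user_mem() */
	}
}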


[PATCH libdrm 4/4] tests/amdgpu: add test for finding bo by CPU mapping

2018-08-07 Thread Junwei Zhang
Add a test for the API to query a bo by CPU mapping

Signed-off-by: Junwei Zhang 
Reviewed-by: Christian König 
---
 tests/amdgpu/bo_tests.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/tests/amdgpu/bo_tests.c b/tests/amdgpu/bo_tests.c
index 9d4da4a..dc2de9b 100644
--- a/tests/amdgpu/bo_tests.c
+++ b/tests/amdgpu/bo_tests.c
@@ -27,6 +27,7 @@
 
 #include "amdgpu_test.h"
 #include "amdgpu_drm.h"
+#include "amdgpu_internal.h"
 
 #define BUFFER_SIZE (4*1024)
 #define BUFFER_ALIGN (4*1024)
@@ -44,6 +45,7 @@ static void amdgpu_bo_metadata(void);
 static void amdgpu_bo_map_unmap(void);
 static void amdgpu_memory_alloc(void);
 static void amdgpu_mem_fail_alloc(void);
+static void amdgpu_bo_find_by_cpu_mapping(void);
 
 CU_TestInfo bo_tests[] = {
{ "Export/Import",  amdgpu_bo_export_import },
@@ -51,6 +53,7 @@ CU_TestInfo bo_tests[] = {
{ "CPU map/unmap",  amdgpu_bo_map_unmap },
{ "Memory alloc Test",  amdgpu_memory_alloc },
{ "Memory fail alloc Test",  amdgpu_mem_fail_alloc },
+   { "Find bo by CPU mapping",  amdgpu_bo_find_by_cpu_mapping },
CU_TEST_INFO_NULL,
 };
 
@@ -262,3 +265,33 @@ static void amdgpu_mem_fail_alloc(void)
CU_ASSERT_EQUAL(r, 0);
}
 }
+
+static void amdgpu_bo_find_by_cpu_mapping(void)
+{
+   amdgpu_bo_handle bo_handle, find_bo_handle;
+   amdgpu_va_handle va_handle;
+   void *bo_cpu;
+   uint64_t bo_mc_address;
+   uint64_t offset;
+   int r;
+
+   r = amdgpu_bo_alloc_and_map(device_handle, 4096, 4096,
+   AMDGPU_GEM_DOMAIN_GTT, 0,
+   &bo_handle, &bo_cpu,
+   &bo_mc_address, &va_handle);
+   CU_ASSERT_EQUAL(r, 0);
+
+   r = amdgpu_find_bo_by_cpu_mapping(device_handle,
+ bo_cpu,
+ 4096,
+ &find_bo_handle,
+ &offset);
+   CU_ASSERT_EQUAL(r, 0);
+   CU_ASSERT_EQUAL(offset, 0);
+   CU_ASSERT_EQUAL(bo_handle->handle, find_bo_handle->handle);
+
+   atomic_dec(&find_bo_handle->refcount, 1);
+   r = amdgpu_bo_unmap_and_free(bo_handle, va_handle,
+bo_mc_address, 4096);
+   CU_ASSERT_EQUAL(r, 0);
+}
-- 
1.9.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH libdrm 1/4] amdgpu: add bo from user memory to handle table

2018-08-07 Thread Junwei Zhang
When creating a bo from user memory, add it to the handle table
for future queries.

Signed-off-by: Junwei Zhang 
---
 amdgpu/amdgpu_bo.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index 422c7c9..b24e698 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -556,7 +556,16 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle 
dev,
bo->alloc_size = size;
bo->handle = args.handle;
 
-   *buf_handle = bo;
+   pthread_mutex_lock(&bo->dev->bo_table_mutex);
+   r = handle_table_insert(&bo->dev->bo_handles, bo->handle, bo);
+   pthread_mutex_unlock(&bo->dev->bo_table_mutex);
+
+   pthread_mutex_init(&bo->cpu_access_mutex, NULL);
+
+   if (r)
+   amdgpu_bo_free(bo);
+   else
+   *buf_handle = bo;
 
return r;
 }
-- 
1.9.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx