Re: [PATCH] drm/amdgpu: wait for all rings to drain before runtime suspending

2019-12-10 Thread zhoucm1


On 2019/12/11 6:08 AM, Alex Deucher wrote:

Add a safety check to runtime suspend to make sure all outstanding
fences have signaled before we suspend.  Doesn't fix any known issue.

We already do this via the fence driver suspend function, but we
just force completion rather than bailing.  This bails on runtime
suspend so we can try again later once the fences are signaled to
avoid missing any outstanding work.


The idea sounds OK to me, but if you want to drain the rings, you should
make sure there are no more submissions, right?


So you should park all schedulers before waiting for all outstanding
fences to complete.


-David
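
For illustration, a minimal sketch of that ordering (not part of the
patch; it assumes kthread_park()/kthread_unpark() on the scheduler
thread is an acceptable way to block new submissions at this point):

	/* sketch: stop new submissions first */
	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
		struct amdgpu_ring *ring = adev->rings[i];

		if (ring && ring->sched.thread)
			kthread_park(ring->sched.thread);
	}

	/* the fence count can now only go down; wait for the rings to drain */
	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
		struct amdgpu_ring *ring = adev->rings[i];

		if (ring && ring->sched.ready &&
		    amdgpu_fence_wait_empty(ring)) {
			/* unpark the schedulers again and return -EBUSY */
		}
	}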



Signed-off-by: Alex Deucher 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 12 +++-
  1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 2f367146c72c..81322b0a8acf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1214,13 +1214,23 @@ static int amdgpu_pmops_runtime_suspend(struct device *dev)
struct pci_dev *pdev = to_pci_dev(dev);
struct drm_device *drm_dev = pci_get_drvdata(pdev);
struct amdgpu_device *adev = drm_dev->dev_private;
-   int ret;
+   int ret, i;
  
  	if (!adev->runpm) {

pm_runtime_forbid(dev);
return -EBUSY;
}
  
+	/* wait for all rings to drain before suspending */

+   for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
+   struct amdgpu_ring *ring = adev->rings[i];
+   if (ring && ring->sched.ready) {
+   ret = amdgpu_fence_wait_empty(ring);
+   if (ret)
+   return -EBUSY;
+   }
+   }
+
if (amdgpu_device_supports_boco(drm_dev))
drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
drm_kms_helper_poll_disable(drm_dev);



Re: [PATCH v2] drm/amdgpu: Default disable GDS for compute+gfx

2019-08-29 Thread zhoucm1


On 2019/8/29 3:22 PM, Christian König wrote:

On 29.08.19 at 07:55, zhoucm1 wrote:



On 2019/8/29 1:08 AM, Marek Olšák wrote:
It can't break an older driver, because there is no older driver 
that requires the static allocation.


Note that closed source drivers don't count, because they don't need 
backward compatibility.


Yes, I agree, we don't need to take care of the closed-source stack.

But AMDVLK is indeed an open-source stack and many fans are using it; we
need to keep its compatibility, don't we?




Actually that is still under discussion.

But AMDVLK should have never ever used the static GDS space in the 
first place. We only added that for a transition time for old OpenGL 
and it shouldn't have leaked into the upstream driver.


Not sure what's the best approach here. We could revert "[PATCH] 
drm/amdgpu: remove static GDS, GWS and OA", but that would break KFD. 
So we can only choose between two evils here.


The only alternative I can see that would work for both would be to still
allocate the static GDS, GWS and OA space, but make it somehow dynamic
so that the KFD can swap it out again.


Agree with you.

-David



Christian.


-David



Marek

On Wed, Aug 28, 2019 at 2:44 AM zhoucm1 wrote:



On 2019/7/23 3:08 AM, Christian König wrote:
> On 22.07.19 at 17:34, Greathouse, Joseph wrote:
>> Units in the GDS block default to allowing all VMIDs access
to all
>> entries. Disable shader access to the GDS, GWS, and OA blocks
from all
>> compute and gfx VMIDs by default. For compute, HWS firmware
will set
>> up the access bits for the appropriate VMID when a compute queue
>> requires access to these blocks.
>> The driver will handle enabling access on-demand for graphics
VMIDs.

gds_switch depends on job->gds/gws/oa _base/size.

With "[PATCH] drm/amdgpu: remove static GDS, GWS and OA allocation",
the default allocations in the kernel were removed. If some UMD stacks
don't pass a gds/gws/oa allocation to the bo_list, then the kernel will
not enable access to them, and that will break previous drivers.

Do we need to revert "[PATCH] drm/amdgpu: remove static GDS, GWS
and OA allocation"?

-David

>>
>> Leaving VMID0 with full access because otherwise HWS cannot
save or
>> restore values during task switch.
>>
>> v2: Fixed code and comment styling.
>>
>> Change-Id: I3d768a96935d2ed1dff09b02c995090f4fbfa539
>> Signed-off-by: Joseph Greathouse
>
> Reviewed-by: Christian König
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 25 ++---
>>   drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c  | 24 +---
>>   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  | 24 +---
>>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 24 +---
>>   4 files changed, 69 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> index 73dcb632a3ce..2a9692bc34b4 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> @@ -1516,17 +1516,27 @@ static void
>> gfx_v10_0_init_compute_vmid(struct amdgpu_device *adev)
>>   }
>>   nv_grbm_select(adev, 0, 0, 0, 0);
>>   mutex_unlock(&adev->srbm_mutex);
>> +}
>>   -    /* Initialize all compute VMIDs to have no GDS, GWS, or OA
>> -   acccess. These should be enabled by FW for target
VMIDs. */
>> -    for (i = FIRST_COMPUTE_VMID; i < LAST_COMPUTE_VMID; i++) {
>> -    WREG32_SOC15_OFFSET(GC, 0, mmGDS_VMID0_BASE, 2 * i, 0);
>> -    WREG32_SOC15_OFFSET(GC, 0, mmGDS_VMID0_SIZE, 2 * i, 0);
>> -    WREG32_SOC15_OFFSET(GC, 0, mmGDS_GWS_VMID0, i, 0);
>> -    WREG32_SOC15_OFFSET(GC, 0, mmGDS_OA_VMID0, i, 0);
>> +static void gfx_v10_0_init_gds_vmid(struct amdgpu_device *adev)
>> +{
>> +    int vmid;
>> +
>> +    /*
>> + * Initialize all compute and user-gfx VMIDs to have no GDS, GWS, or OA
>> + * access. Compute VMIDs should be enabled by FW for target VMIDs,
>> + * the driver can enable them for graphics. VMID0 should maintain
>> + * access so that HWS firmware can save/restore entries.
>> + */
>> +    for (vmid = 1; vmid < 16; vmid++) {
>> +    WREG32_SOC15_OFFSET(GC, 0, mmGDS_VMID0_BASE, 2 * vmid, 0);
 

Re: [PATCH v2] drm/amdgpu: Default disable GDS for compute+gfx

2019-08-28 Thread zhoucm1


On 2019/8/29 1:08 AM, Marek Olšák wrote:
It can't break an older driver, because there is no older driver that 
requires the static allocation.


Note that closed source drivers don't count, because they don't need 
backward compatibility.


Yes, I agree, we don't need to take care of the closed-source stack.

But AMDVLK is indeed an open-source stack and many fans are using it; we
need to keep its compatibility, don't we?


-David



Marek

On Wed, Aug 28, 2019 at 2:44 AM zhoucm1 wrote:



On 2019/7/23 3:08 AM, Christian König wrote:
> On 22.07.19 at 17:34, Greathouse, Joseph wrote:
>> Units in the GDS block default to allowing all VMIDs access to all
>> entries. Disable shader access to the GDS, GWS, and OA blocks
from all
>> compute and gfx VMIDs by default. For compute, HWS firmware
will set
>> up the access bits for the appropriate VMID when a compute queue
>> requires access to these blocks.
>> The driver will handle enabling access on-demand for graphics
VMIDs.

gds_switch depends on job->gds/gws/oa _base/size.

With "[PATCH] drm/amdgpu: remove static GDS, GWS and OA allocation", the
default allocations in the kernel were removed. If some UMD stacks don't
pass a gds/gws/oa allocation to the bo_list, then the kernel will not enable
access to them, and that will break previous drivers.

Do we need to revert "[PATCH] drm/amdgpu: remove static GDS, GWS and OA
allocation"?

-David

>>
>> Leaving VMID0 with full access because otherwise HWS cannot save or
>> restore values during task switch.
>>
>> v2: Fixed code and comment styling.
>>
>> Change-Id: I3d768a96935d2ed1dff09b02c995090f4fbfa539
>> Signed-off-by: Joseph Greathouse
>
> Reviewed-by: Christian König
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 25 ++---
>>   drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c  | 24 +---
>>   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  | 24 +---
>>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 24 +---
>>   4 files changed, 69 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> index 73dcb632a3ce..2a9692bc34b4 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> @@ -1516,17 +1516,27 @@ static void
>> gfx_v10_0_init_compute_vmid(struct amdgpu_device *adev)
>>   }
>>   nv_grbm_select(adev, 0, 0, 0, 0);
>>   mutex_unlock(&adev->srbm_mutex);
>> +}
>>   -    /* Initialize all compute VMIDs to have no GDS, GWS, or OA
>> -   acccess. These should be enabled by FW for target VMIDs. */
>> -    for (i = FIRST_COMPUTE_VMID; i < LAST_COMPUTE_VMID; i++) {
>> -    WREG32_SOC15_OFFSET(GC, 0, mmGDS_VMID0_BASE, 2 * i, 0);
>> -    WREG32_SOC15_OFFSET(GC, 0, mmGDS_VMID0_SIZE, 2 * i, 0);
>> -    WREG32_SOC15_OFFSET(GC, 0, mmGDS_GWS_VMID0, i, 0);
>> -    WREG32_SOC15_OFFSET(GC, 0, mmGDS_OA_VMID0, i, 0);
>> +static void gfx_v10_0_init_gds_vmid(struct amdgpu_device *adev)
>> +{
>> +    int vmid;
>> +
>> +    /*
>> + * Initialize all compute and user-gfx VMIDs to have no GDS, GWS, or OA
>> + * access. Compute VMIDs should be enabled by FW for target VMIDs,
>> + * the driver can enable them for graphics. VMID0 should maintain
>> + * access so that HWS firmware can save/restore entries.
>> + */
>> +    for (vmid = 1; vmid < 16; vmid++) {
>> +    WREG32_SOC15_OFFSET(GC, 0, mmGDS_VMID0_BASE, 2 * vmid, 0);
>> +    WREG32_SOC15_OFFSET(GC, 0, mmGDS_VMID0_SIZE, 2 * vmid, 0);
>> +    WREG32_SOC15_OFFSET(GC, 0, mmGDS_GWS_VMID0, vmid, 0);
>> +    WREG32_SOC15_OFFSET(GC, 0, mmGDS_OA_VMID0, vmid, 0);
>>   }
>>   }
>>   +
>>   static void gfx_v10_0_tcp_harvest(struct amdgpu_device *adev)
>>   {
>>   int i, j, k;
>> @@ -1629,6 +1639,7 @@ static void gfx_v10_0_constants_init(struct
>> amdgpu_device *adev)
>>   mutex_unlock(&adev->srbm_mutex);
>>     gfx_v10_0_init_compute_vmid(adev);
>> +    gfx_v10_0_init_gds_vmid(adev);
>>     }
>>   diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
>

Re: [PATCH v2] drm/amdgpu: Default disable GDS for compute+gfx

2019-08-28 Thread zhoucm1


On 2019/7/23 3:08 AM, Christian König wrote:

On 22.07.19 at 17:34, Greathouse, Joseph wrote:

Units in the GDS block default to allowing all VMIDs access to all
entries. Disable shader access to the GDS, GWS, and OA blocks from all
compute and gfx VMIDs by default. For compute, HWS firmware will set
up the access bits for the appropriate VMID when a compute queue
requires access to these blocks.
The driver will handle enabling access on-demand for graphics VMIDs.


gds_switch depends on job->gds/gws/oa _base/size.

With "[PATCH] drm/amdgpu: remove static GDS, GWS and OA allocation", the
default allocations in the kernel were removed. If some UMD stacks don't
pass a gds/gws/oa allocation to the bo_list, then the kernel will not enable
access to them, and that will break previous drivers.

Do we need to revert "[PATCH] drm/amdgpu: remove static GDS, GWS and OA
allocation"?


-David



Leaving VMID0 with full access because otherwise HWS cannot save or
restore values during task switch.

v2: Fixed code and comment styling.

Change-Id: I3d768a96935d2ed1dff09b02c995090f4fbfa539
Signed-off-by: Joseph Greathouse 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 25 ++---
  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c  | 24 +---
  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  | 24 +---
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 24 +---
  4 files changed, 69 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c

index 73dcb632a3ce..2a9692bc34b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -1516,17 +1516,27 @@ static void 
gfx_v10_0_init_compute_vmid(struct amdgpu_device *adev)

  }
  nv_grbm_select(adev, 0, 0, 0, 0);
  mutex_unlock(&adev->srbm_mutex);
+}
  -    /* Initialize all compute VMIDs to have no GDS, GWS, or OA
-   acccess. These should be enabled by FW for target VMIDs. */
-    for (i = FIRST_COMPUTE_VMID; i < LAST_COMPUTE_VMID; i++) {
-    WREG32_SOC15_OFFSET(GC, 0, mmGDS_VMID0_BASE, 2 * i, 0);
-    WREG32_SOC15_OFFSET(GC, 0, mmGDS_VMID0_SIZE, 2 * i, 0);
-    WREG32_SOC15_OFFSET(GC, 0, mmGDS_GWS_VMID0, i, 0);
-    WREG32_SOC15_OFFSET(GC, 0, mmGDS_OA_VMID0, i, 0);
+static void gfx_v10_0_init_gds_vmid(struct amdgpu_device *adev)
+{
+    int vmid;
+
+    /*
+ * Initialize all compute and user-gfx VMIDs to have no GDS, GWS, or OA
+ * access. Compute VMIDs should be enabled by FW for target VMIDs,
+ * the driver can enable them for graphics. VMID0 should maintain
+ * access so that HWS firmware can save/restore entries.
+ */
+    for (vmid = 1; vmid < 16; vmid++) {
+    WREG32_SOC15_OFFSET(GC, 0, mmGDS_VMID0_BASE, 2 * vmid, 0);
+    WREG32_SOC15_OFFSET(GC, 0, mmGDS_VMID0_SIZE, 2 * vmid, 0);
+    WREG32_SOC15_OFFSET(GC, 0, mmGDS_GWS_VMID0, vmid, 0);
+    WREG32_SOC15_OFFSET(GC, 0, mmGDS_OA_VMID0, vmid, 0);
  }
  }
  +
  static void gfx_v10_0_tcp_harvest(struct amdgpu_device *adev)
  {
  int i, j, k;
@@ -1629,6 +1639,7 @@ static void gfx_v10_0_constants_init(struct 
amdgpu_device *adev)

  mutex_unlock(&adev->srbm_mutex);
    gfx_v10_0_init_compute_vmid(adev);
+    gfx_v10_0_init_gds_vmid(adev);
    }
  diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c

index 3f98624772a4..48796b6824cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -1877,14 +1877,23 @@ static void gfx_v7_0_init_compute_vmid(struct 
amdgpu_device *adev)

  }
  cik_srbm_select(adev, 0, 0, 0, 0);
  mutex_unlock(&adev->srbm_mutex);
+}
  -    /* Initialize all compute VMIDs to have no GDS, GWS, or OA
-   acccess. These should be enabled by FW for target VMIDs. */
-    for (i = FIRST_COMPUTE_VMID; i < LAST_COMPUTE_VMID; i++) {
-    WREG32(amdgpu_gds_reg_offset[i].mem_base, 0);
-    WREG32(amdgpu_gds_reg_offset[i].mem_size, 0);
-    WREG32(amdgpu_gds_reg_offset[i].gws, 0);
-    WREG32(amdgpu_gds_reg_offset[i].oa, 0);
+static void gfx_v7_0_init_gds_vmid(struct amdgpu_device *adev)
+{
+    int vmid;
+
+    /*
+ * Initialize all compute and user-gfx VMIDs to have no GDS, GWS, or OA
+ * access. Compute VMIDs should be enabled by FW for target VMIDs,
+ * the driver can enable them for graphics. VMID0 should maintain
+ * access so that HWS firmware can save/restore entries.
+ */
+    for (vmid = 1; vmid < 16; vmid++) {
+    WREG32(amdgpu_gds_reg_offset[vmid].mem_base, 0);
+    WREG32(amdgpu_gds_reg_offset[vmid].mem_size, 0);
+    WREG32(amdgpu_gds_reg_offset[vmid].gws, 0);
+    WREG32(amdgpu_gds_reg_offset[vmid].oa, 0);
  }
  }
  @@ -1966,6 +1975,7 @@ static void gfx_v7_0_constants_init(struct 
amdgpu_device *adev)

  mutex_unlock(&adev->srbm_mutex);
    gfx_v7_0_init_compute_vmid(adev);
+    

Re: [PATCH] drm/amdgpu: fix error handling in amdgpu_cs_process_fence_dep

2019-07-30 Thread zhoucm1

Looks very clean, Reviewed-by: Chunming Zhou 


On 2019/07/30 17:18, Christian König wrote:

We always need to drop the ctx reference and should check
for errors first and then dereference the fence pointer.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 26 --
  1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index c691df6f7a57..def029ab5657 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1042,29 +1042,27 @@ static int amdgpu_cs_process_fence_dep(struct 
amdgpu_cs_parser *p,
return r;
}
  
-		fence = amdgpu_ctx_get_fence(ctx, entity,

-deps[i].handle);
+   fence = amdgpu_ctx_get_fence(ctx, entity, deps[i].handle);
+   amdgpu_ctx_put(ctx);
+
+   if (IS_ERR(fence))
+   return PTR_ERR(fence);
+   else if (!fence)
+   continue;
  
  		if (chunk->chunk_id == AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES) {

-   struct drm_sched_fence *s_fence = to_drm_sched_fence(fence);
+   struct drm_sched_fence *s_fence;
struct dma_fence *old = fence;
  
+			s_fence = to_drm_sched_fence(fence);

fence = dma_fence_get(&s_fence->scheduled);
dma_fence_put(old);
}
  
-		if (IS_ERR(fence)) {

-   r = PTR_ERR(fence);
-   amdgpu_ctx_put(ctx);
+   r = amdgpu_sync_fence(p->adev, &p->job->sync, fence, true);
+   dma_fence_put(fence);
+   if (r)
return r;
-   } else if (fence) {
-   r = amdgpu_sync_fence(p->adev, &p->job->sync, fence,
-   true);
-   dma_fence_put(fence);
-   amdgpu_ctx_put(ctx);
-   if (r)
-   return r;
-   }
}
return 0;
  }



Re: amdgpf: BUG: NULL pointer dereference and memory leak

2019-07-30 Thread zhoucm1



On 2019/07/30 17:04, Koenig, Christian wrote:

On 30.07.19 at 10:47, 亿一 wrote:

Hi all,
   While analyzing the source code, I noticed that the function
amdgpu_cs_process_fence_dep() may have a NULL pointer dereference and a
memory leak in the following code fragments:


fence = amdgpu_ctx_get_fence(ctx, entity,
  deps[i].handle);

if (chunk->chunk_id == AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES) {
  struct drm_sched_fence *s_fence = to_drm_sched_fence(fence);
  struct dma_fence *old = fence;

  fence = dma_fence_get(&s_fence->scheduled);
  dma_fence_put(old);
}

if (IS_ERR(fence)) {
   r = PTR_ERR(fence);
   amdgpu_ctx_put(ctx);
   return r;
} else if (fence) {
r = amdgpu_sync_fence(p->adev, &p->job->sync, fence,
   true);
dma_fence_put(fence);
 amdgpu_ctx_put(ctx);
 if (r)
 return r;
 }

The function amdgpu_ctx_get_fence() may return a NULL pointer, which will
cause a NULL pointer dereference. What's more, IS_ERR() does not return
true when the pointer is NULL, which will cause the ctx reference to be
leaked.

That handling is actually correct.

The problem is the "if (chunk->chunk_id ==
AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES)" stuff above.

That comes too early and needs to be moved below checking the fence for
errors. Going to send a fix for this to the mailing list in a minute.

I think Lin Yi is right; we leak the ctx reference when the fence is NULL.

-David
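
(For context: the subtlety is that IS_ERR() only matches pointers in the
ERR_PTR() range, it never matches NULL, so a NULL fence skips both of the
branches that call amdgpu_ctx_put(). A minimal illustration of the
linux/err.h semantics:

	struct dma_fence *fence = NULL;

	IS_ERR(fence);          /* false: NULL is not an ERR_PTR() value */
	IS_ERR_OR_NULL(fence);  /* true: this variant catches both cases */

In the buggy flow above, to_drm_sched_fence() also dereferences the fence
before the error check, which is the NULL pointer dereference Lin Yi
spotted.)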


Thanks for the notice,
Christian.


But I don't know how to fix it, so I am reporting it to you all.

Best Regards.
Lin Yi.



Re: [PATCH 2/2] drm/amdgpu: handle AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID on gfx10

2019-06-27 Thread zhoucm1

Any reason for not taking care of .emit_ib_size in this one?

-David
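
(Background for the question: .emit_ib_size in the amdgpu_ring_funcs table
declares the worst-case dword count of the emit_ib callback so that ring
space can be reserved up front, and the new path below emits three extra
dwords — PACKET3 header, register offset, value. A hypothetical adjustment,
assuming the current compute value is 7 as on gfx9, might look like:)

	.emit_ib_size = 7 + 3, /* gfx_v10_0_ring_emit_ib_compute + GDS reset */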


On 2019/06/27 06:35, Marek Olšák wrote:

From: Marek Olšák 

Signed-off-by: Marek Olšák 
---
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 17 +
  1 file changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 6baaa65a1daa..5b807a19bbbf 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4257,20 +4257,36 @@ static void gfx_v10_0_ring_emit_ib_gfx(struct 
amdgpu_ring *ring,
  }
  
  static void gfx_v10_0_ring_emit_ib_compute(struct amdgpu_ring *ring,

   struct amdgpu_job *job,
   struct amdgpu_ib *ib,
   uint32_t flags)
  {
unsigned vmid = AMDGPU_JOB_GET_VMID(job);
u32 control = INDIRECT_BUFFER_VALID | ib->length_dw | (vmid << 24);
  
+	/* Currently, there is a high possibility to get wave ID mismatch
+	 * between ME and GDS, leading to a hw deadlock, because ME generates
+	 * different wave IDs than the GDS expects. This situation happens
+	 * randomly when at least 5 compute pipes use GDS ordered append.
+	 * The wave IDs generated by ME are also wrong after suspend/resume.
+	 * Those are probably bugs somewhere else in the kernel driver.
+	 *
+	 * Writing GDS_COMPUTE_MAX_WAVE_ID resets wave ID counters in ME and
+	 * GDS to 0 for this ring (me/pipe).
+	 */
+   if (ib->flags & AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID) {
+   amdgpu_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG, 1));
+   amdgpu_ring_write(ring, mmGDS_COMPUTE_MAX_WAVE_ID);
+   amdgpu_ring_write(ring, ring->adev->gds.gds_compute_max_wave_id);
+   }
+
amdgpu_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2));
BUG_ON(ib->gpu_addr & 0x3); /* Dword align */
amdgpu_ring_write(ring,
  #ifdef __BIG_ENDIAN
(2 << 0) |
  #endif
lower_32_bits(ib->gpu_addr));
amdgpu_ring_write(ring, upper_32_bits(ib->gpu_addr));
amdgpu_ring_write(ring, control);
  }
@@ -5103,20 +5119,21 @@ static void gfx_v10_0_set_rlc_funcs(struct 
amdgpu_device *adev)
}
  }
  
  static void gfx_v10_0_set_gds_init(struct amdgpu_device *adev)

  {
/* init asic gds info */
switch (adev->asic_type) {
case CHIP_NAVI10:
default:
adev->gds.gds_size = 0x1;
+   adev->gds.gds_compute_max_wave_id = 0x4ff;
adev->gds.vgt_gs_max_wave_id = 0x3ff;
break;
}
  
  	adev->gds.gws_size = 64;

adev->gds.oa_size = 16;
  }
  
  static void gfx_v10_0_set_user_wgp_inactive_bitmap_per_sh(struct amdgpu_device *adev,

  u32 bitmap)



Re: [PATCH 06/10] drm/ttm: fix busy memory to fail other user v10

2019-05-23 Thread zhoucm1



On 2019/05/22 20:59, Christian König wrote:


BOs on the LRU might be blocked during command submission
and cause OOM situations.

Avoid this by blocking for the first busy BO not locked by
the same ticket as the BO we are searching space for.

v10: completely start over with the patch since we didn't
  handle a whole bunch of corner cases.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/ttm/ttm_bo.c | 77 ++--
  1 file changed, 66 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 4c6389d849ed..861facac33d4 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -771,32 +771,72 @@ EXPORT_SYMBOL(ttm_bo_eviction_valuable);
   * b. Otherwise, trylock it.
   */
  static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
-   struct ttm_operation_ctx *ctx, bool *locked)
+   struct ttm_operation_ctx *ctx, bool *locked, bool *busy)
  {
 bool ret = false;

-   *locked = false;
 if (bo->resv == ctx->resv) {
 reservation_object_assert_held(bo->resv);
 if (ctx->flags & TTM_OPT_FLAG_ALLOW_RES_EVICT
 || !list_empty(>ddestroy))
 ret = true;
+   *locked = false;
+   if (busy)
+   *busy = false;
 } else {
-   *locked = reservation_object_trylock(bo->resv);
-   ret = *locked;
+   ret = reservation_object_trylock(bo->resv);
+   *locked = ret;
+   if (busy)
+   *busy = !ret;
 }

 return ret;
  }

+/**
+ * ttm_mem_evict_wait_busy - wait for a busy BO to become available
+ *
+ * @busy_bo: BO which couldn't be locked with trylock
+ * @ctx: operation context
+ * @ticket: acquire ticket
+ *
+ * Try to lock a busy buffer object to avoid failing eviction.
+ */
+static int ttm_mem_evict_wait_busy(struct ttm_buffer_object *busy_bo,
+  struct ttm_operation_ctx *ctx,
+  struct ww_acquire_ctx *ticket)
+{
+   int r;
+
+   if (!busy_bo || !ticket)
+   return -EBUSY;
+
+   if (ctx->interruptible)
+   r = reservation_object_lock_interruptible(busy_bo->resv,
+ ticket);
+   else
+   r = reservation_object_lock(busy_bo->resv, ticket);
+
+   /*
+* TODO: It would be better to keep the BO locked until allocation is at
+* least tried one more time, but that would mean a much larger rework
+* of TTM.
+*/
+   if (!r)
+   reservation_object_unlock(busy_bo->resv);
+
+   return r == -EDEADLK ? -EAGAIN : r;
+}
+
  static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
uint32_t mem_type,
const struct ttm_place *place,
-  struct ttm_operation_ctx *ctx)
+  struct ttm_operation_ctx *ctx,
+  struct ww_acquire_ctx *ticket)
  {
+   struct ttm_buffer_object *bo = NULL, *busy_bo = NULL;
 struct ttm_bo_global *glob = bdev->glob;
 struct ttm_mem_type_manager *man = &bdev->man[mem_type];
-   struct ttm_buffer_object *bo = NULL;
 bool locked = false;
 unsigned i;
 int ret;
@@ -804,8 +844,15 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
 spin_lock(&glob->lru_lock);
 for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
 list_for_each_entry(bo, &man->lru[i], lru) {
-   if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked))
+   bool busy;
+
+   if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
+   &busy)) {
+   if (busy && !busy_bo &&
+   bo->resv->lock.ctx != ticket)
+   busy_bo = bo;
 continue;
+   }

 if (place && !bdev->driver->eviction_valuable(bo,
   place)) {
@@ -824,8 +871,13 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
 }

 if (!bo) {
+   if (busy_bo)
+   ttm_bo_get(busy_bo);
 spin_unlock(&glob->lru_lock);
-   return -EBUSY;
+   ret = ttm_mem_evict_wait_busy(busy_bo, ctx, ticket);
If you rely on -EAGAIN, why do you still try to lock busy_bo? Is there any
negative effect if we directly return -EAGAIN without trying the lock?


-David

+   if (busy_bo)
+   ttm_bo_put(busy_bo);
+   return ret;
 }

 

Re: [PATCH 01/10] drm/ttm: Make LRU removal optional.

2019-05-23 Thread zhoucm1



On 2019/05/22 20:59, Christian König wrote:


We are already doing this for DMA-buf imports and also for
amdgpu VM BOs for quite a while now.

If this doesn't run into any problems we are probably going
to stop removing BOs from the LRU altogether.

Signed-off-by: Christian König 
---
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  9 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c|  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c   |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  4 ++--
  drivers/gpu/drm/qxl/qxl_release.c |  2 +-
  drivers/gpu/drm/radeon/radeon_gem.c   |  2 +-
  drivers/gpu/drm/radeon/radeon_object.c|  2 +-
  drivers/gpu/drm/ttm/ttm_execbuf_util.c| 20 +++
  drivers/gpu/drm/virtio/virtgpu_ioctl.c|  2 +-
  drivers/gpu/drm/vmwgfx/vmwgfx_resource.c  |  3 ++-
  drivers/gpu/drm/vmwgfx/vmwgfx_validation.h|  2 +-
  include/drm/ttm/ttm_bo_driver.h   |  5 -
  include/drm/ttm/ttm_execbuf_util.h|  3 ++-
  13 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index e1cae4a37113..647e18f9e136 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -574,7 +574,7 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
 amdgpu_vm_get_pd_bo(vm, &ctx->list, &ctx->vm_pd[0]);

 ret = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->list,
-false, &ctx->duplicates);
+false, &ctx->duplicates, true);
 if (!ret)
 ctx->reserved = true;
 else {
@@ -647,7 +647,7 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
 }

 ret = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->list,
-false, &ctx->duplicates);
+false, &ctx->duplicates, true);
 if (!ret)
 ctx->reserved = true;
 else
@@ -1800,7 +1800,8 @@ static int validate_invalid_user_pages(struct 
amdkfd_process_info *process_info)
 }

 /* Reserve all BOs and page tables for validation */
-   ret = ttm_eu_reserve_buffers(&ticket, &resv_list, false, &duplicates);
+   ret = ttm_eu_reserve_buffers(&ticket, &resv_list, false, &duplicates,
+true);
 WARN(!list_empty(&duplicates), "Duplicates should be empty");
 if (ret)
 goto out_free;
@@ -2006,7 +2007,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, 
struct dma_fence **ef)
 }

 ret = ttm_eu_reserve_buffers(&ctx.ticket, &ctx.list,
-false, &duplicate_save);
+false, &duplicate_save, true);
 if (ret) {
 pr_debug("Memory eviction: TTM Reserve Failed. Try again\n");
 goto ttm_reserve_fail;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d72cc583ebd1..fff558cf385b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 }

 r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
-  &duplicates);
+  &duplicates, true);
 if (unlikely(r != 0)) {
 if (r != -ERESTARTSYS)
 DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 54dd02a898b9..06f83cac0d3a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -79,7 +79,7 @@ int amdgpu_map_static_csa(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
 list_add(&csa_tv.head, &list);
 amdgpu_vm_get_pd_bo(vm, &list, &pd);

-   r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL);
+   r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL, true);
 if (r) {
 DRM_ERROR("failed to reserve CSA,PD BOs: err=%d\n", r);
 return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 7b840367004c..d513a5ad03dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -171,7 +171,7 @@ void amdgpu_gem_object_close(struct drm_gem_object *obj,

 amdgpu_vm_get_pd_bo(vm, &list, &vm_pd);

-   r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates);
+   r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates, true);
 if (r) {
 dev_err(adev->dev, "leaking bo va because "
 "we fail to reserve bo (%d)\n", r);
@@ -608,7 +608,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,

 amdgpu_vm_get_pd_bo(&fpriv->vm, &list, &vm_pd);

-   r = ttm_eu_reserve_buffers(&ticket, &list, true, &duplicates);
+   r = ttm_eu_reserve_buffers(&ticket, &list, true, &duplicates, true);
   

Re: [PATCH libdrm 1/7] addr cs chunk for syncobj timeline

2019-05-14 Thread zhoucm1

Thank you, Lionel.

-David


On 2019/05/14 17:49, Lionel Landwerlin wrote:


With the small nits, patches 2 & 4 are: Reviewed-by: Lionel Landwerlin

The other patches are a bit amdgpu-specific, so you might want someone
more familiar with amdgpu to review them.
Still, I didn't see anything wrong with them, so the remaining patches are:
Acked-by: Lionel Landwerlin 

I'll send the IGT stuff shortly.

Thanks,

-Lionel



Re: [PATCH libdrm 1/7] addr cs chunk for syncobj timeline

2019-05-13 Thread zhoucm1

ping... for patch set.


On 2019/05/13 17:52, Chunming Zhou wrote:


Signed-off-by: Chunming Zhou 
---
  include/drm/amdgpu_drm.h | 9 +
  1 file changed, 9 insertions(+)

diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
index d0701ffc..3d0318e6 100644
--- a/include/drm/amdgpu_drm.h
+++ b/include/drm/amdgpu_drm.h
@@ -528,6 +528,8 @@ struct drm_amdgpu_gem_va {
  #define AMDGPU_CHUNK_ID_SYNCOBJ_OUT 0x05
  #define AMDGPU_CHUNK_ID_BO_HANDLES  0x06
  #define AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES 0x07
+#define AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT    0x08
+#define AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL  0x09

  struct drm_amdgpu_cs_chunk {
 __u32   chunk_id;
@@ -608,6 +610,13 @@ struct drm_amdgpu_cs_chunk_sem {
 __u32 handle;
  };

+struct drm_amdgpu_cs_chunk_syncobj {
+   __u32 handle;
+   __u32 flags;
+   __u64 point;
+};
+
+
  #define AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ 0
  #define AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ_FD  1
   #define AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD    2
--
2.17.1
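
(A hypothetical userspace sketch of how the new chunk would be attached
to a CS through the existing drm_amdgpu_cs_chunk plumbing; the handle and
point values here are made up:)

	struct drm_amdgpu_cs_chunk_syncobj syncobj_wait = {
		.handle = timeline_syncobj_handle, /* hypothetical handle */
		.flags  = DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
		.point  = 42, /* timeline point to wait on */
	};
	struct drm_amdgpu_cs_chunk chunk = {
		.chunk_id   = AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT,
		.length_dw  = sizeof(syncobj_wait) / 4,
		.chunk_data = (__u64)(uintptr_t)&syncobj_wait,
	};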


Re: [PATCH 1/2] drm/ttm: fix busy memory to fail other user v6

2019-05-07 Thread zhoucm1



On 2019/05/07 19:13, Koenig, Christian wrote:

On 07.05.19 at 13:08, zhoucm1 wrote:


On 2019年05月07日 18:53, Koenig, Christian wrote:

On 07.05.19 at 11:36, Chunming Zhou wrote:

A heavy GPU job could occupy memory for a long time, which leads other
users to fail to get memory.

basically pick up Christian idea:

1. Reserve the BO in DC using a ww_mutex ticket (trivial).
2. If we then run into this EBUSY condition in TTM check if the BO
we need memory for (or rather the ww_mutex of its reservation
object) has a ticket assigned.
3. If we have a ticket we grab a reference to the first BO on the
LRU, drop the LRU lock and try to grab the reservation lock with the
ticket.
4. If getting the reservation lock with the ticket succeeded we
check if the BO is still the first one on the LRU in question (the
BO could have moved).
5. If the BO is still the first one on the LRU in question we try to
evict it as we would evict any other BO.
6. If any of the "If's" above fail we just back off and return -EBUSY.

v2: fix some minor check
v3: address Christian v2 comments.
v4: fix some missing
v5: handle first_bo unlock and bo_get/put
v6: abstract unified iterate function, and handle all possible
usecase not only pinned bo.

Change-Id: I21423fb922f885465f13833c41df1e134364a8e7
Signed-off-by: Chunming Zhou 
---
    drivers/gpu/drm/ttm/ttm_bo.c | 113 ++-
    1 file changed, 97 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
b/drivers/gpu/drm/ttm/ttm_bo.c
index 8502b3ed2d88..bbf1d14d00a7 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -766,11 +766,13 @@ EXPORT_SYMBOL(ttm_bo_eviction_valuable);
     * b. Otherwise, trylock it.
     */
    static bool ttm_bo_evict_swapout_allowable(struct
ttm_buffer_object *bo,
-    struct ttm_operation_ctx *ctx, bool *locked)
+    struct ttm_operation_ctx *ctx, bool *locked, bool *busy)
    {
    bool ret = false;
       *locked = false;
+    if (busy)
+    *busy = false;
    if (bo->resv == ctx->resv) {
    reservation_object_assert_held(bo->resv);
    if (ctx->flags & TTM_OPT_FLAG_ALLOW_RES_EVICT
@@ -779,35 +781,45 @@ static bool
ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
    } else {
    *locked = reservation_object_trylock(bo->resv);
    ret = *locked;
+    if (!ret && busy)
+    *busy = true;
    }
       return ret;
    }
    -static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
-   uint32_t mem_type,
-   const struct ttm_place *place,
-   struct ttm_operation_ctx *ctx)
+static struct ttm_buffer_object*
+ttm_mem_find_evitable_bo(struct ttm_bo_device *bdev,
+ struct ttm_mem_type_manager *man,
+ const struct ttm_place *place,
+ struct ttm_operation_ctx *ctx,
+ struct ttm_buffer_object **first_bo,
+ bool *locked)
    {
-    struct ttm_bo_global *glob = bdev->glob;
-    struct ttm_mem_type_manager *man = &bdev->man[mem_type];
    struct ttm_buffer_object *bo = NULL;
-    bool locked = false;
-    unsigned i;
-    int ret;
+    int i;
    -    spin_lock(&glob->lru_lock);
+    if (first_bo)
+    *first_bo = NULL;
    for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
    list_for_each_entry(bo, &man->lru[i], lru) {
-    if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked))
+    bool busy = false;
+    if (!ttm_bo_evict_swapout_allowable(bo, ctx, locked,
+    &busy)) {

A newline between declaration and code please.


+    if (first_bo && !(*first_bo) && busy) {
+    ttm_bo_get(bo);
+    *first_bo = bo;
+    }
    continue;
+    }
       if (place && !bdev->driver->eviction_valuable(bo,
  place)) {
-    if (locked)
+    if (*locked)
    reservation_object_unlock(bo->resv);
    continue;
    }
+
    break;
    }
    @@ -818,9 +830,66 @@ static int ttm_mem_evict_first(struct
ttm_bo_device *bdev,
    bo = NULL;
    }
    +    return bo;
+}
+
+static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
+   uint32_t mem_type,
+   const struct ttm_place *place,
+   struct ttm_operation_ctx *ctx)
+{
+    struct ttm_bo_global *glob = bdev->glob;
+    struct ttm_mem_type_manager *man = &bdev->man[mem_type];
+    struct ttm_buffer_object *bo = NULL, *first_bo = NULL;
+    bool locked = false;
+    int ret;
+
+    spin_lock(&glob->lru_lock);
+    bo = ttm_mem_find_evitable_bo(bdev, man, place, ctx, &first_bo,
+  &locked);
    if (!bo) {
+    struct ttm_operation_ctx busy_ctx;
+
    spin_unl

Re: [PATCH 1/2] drm/ttm: fix busy memory to fail other user v6

2019-05-07 Thread zhoucm1



On 2019/05/07 18:53, Koenig, Christian wrote:

On 07.05.19 at 11:36, Chunming Zhou wrote:

A heavy GPU job could occupy memory for a long time, which leads other users
to fail to get memory.

basically pick up Christian idea:

1. Reserve the BO in DC using a ww_mutex ticket (trivial).
2. If we then run into this EBUSY condition in TTM check if the BO we need 
memory for (or rather the ww_mutex of its reservation object) has a ticket 
assigned.
3. If we have a ticket we grab a reference to the first BO on the LRU, drop the 
LRU lock and try to grab the reservation lock with the ticket.
4. If getting the reservation lock with the ticket succeeded we check if the BO 
is still the first one on the LRU in question (the BO could have moved).
5. If the BO is still the first one on the LRU in question we try to evict it 
as we would evict any other BO.
6. If any of the "If's" above fail we just back off and return -EBUSY.

v2: fix some minor check
v3: address Christian v2 comments.
v4: fix some missing
v5: handle first_bo unlock and bo_get/put
v6: abstract unified iterate function, and handle all possible usecase not only 
pinned bo.

Change-Id: I21423fb922f885465f13833c41df1e134364a8e7
Signed-off-by: Chunming Zhou 
---
   drivers/gpu/drm/ttm/ttm_bo.c | 113 ++-
   1 file changed, 97 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 8502b3ed2d88..bbf1d14d00a7 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -766,11 +766,13 @@ EXPORT_SYMBOL(ttm_bo_eviction_valuable);
* b. Otherwise, trylock it.
*/
   static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
-   struct ttm_operation_ctx *ctx, bool *locked)
+   struct ttm_operation_ctx *ctx, bool *locked, bool *busy)
   {
bool ret = false;
   
   	*locked = false;

+   if (busy)
+   *busy = false;
if (bo->resv == ctx->resv) {
reservation_object_assert_held(bo->resv);
if (ctx->flags & TTM_OPT_FLAG_ALLOW_RES_EVICT
@@ -779,35 +781,45 @@ static bool ttm_bo_evict_swapout_allowable(struct 
ttm_buffer_object *bo,
} else {
*locked = reservation_object_trylock(bo->resv);
ret = *locked;
+   if (!ret && busy)
+   *busy = true;
}
   
   	return ret;

   }
   
-static int ttm_mem_evict_first(struct ttm_bo_device *bdev,

-  uint32_t mem_type,
-  const struct ttm_place *place,
-  struct ttm_operation_ctx *ctx)
+static struct ttm_buffer_object*
+ttm_mem_find_evitable_bo(struct ttm_bo_device *bdev,
+struct ttm_mem_type_manager *man,
+const struct ttm_place *place,
+struct ttm_operation_ctx *ctx,
+struct ttm_buffer_object **first_bo,
+bool *locked)
   {
-   struct ttm_bo_global *glob = bdev->glob;
-   struct ttm_mem_type_manager *man = &bdev->man[mem_type];
struct ttm_buffer_object *bo = NULL;
-   bool locked = false;
-   unsigned i;
-   int ret;
+   int i;
   
-	spin_lock(&glob->lru_lock);

+   if (first_bo)
+   *first_bo = NULL;
for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
		list_for_each_entry(bo, &man->lru[i], lru) {
-   if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked))
+   bool busy = false;
+   if (!ttm_bo_evict_swapout_allowable(bo, ctx, locked,
+   &busy)) {

A newline between declaration and code please.


+   if (first_bo && !(*first_bo) && busy) {
+   ttm_bo_get(bo);
+   *first_bo = bo;
+   }
continue;
+   }
   
   			if (place && !bdev->driver->eviction_valuable(bo,

  place)) {
-   if (locked)
+   if (*locked)
reservation_object_unlock(bo->resv);
continue;
}
+
break;
}
   
@@ -818,9 +830,66 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,

bo = NULL;
}
   
+	return bo;

+}
+
+static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
+  uint32_t mem_type,
+  const struct ttm_place *place,
+  struct ttm_operation_ctx *ctx)
+{
+   struct ttm_bo_global *glob = bdev->glob;
+   struct ttm_mem_type_manager *man = &bdev->man[mem_type];
+   struct 

Re: [PATCH 2/9] drm/syncobj: add new drm_syncobj_add_point interface v4

2019-04-01 Thread zhoucm1



On 2019/04/01 16:19, Lionel Landwerlin wrote:

On 01/04/2019 06:54, Zhou, David(ChunMing) wrote:



-Original Message-
From: Lionel Landwerlin 
Sent: Saturday, March 30, 2019 10:09 PM
To: Koenig, Christian ; Zhou, David(ChunMing)
; dri-de...@lists.freedesktop.org; amd-
g...@lists.freedesktop.org; ja...@jlekstrand.net; Hector, Tobias

Subject: Re: [PATCH 2/9] drm/syncobj: add new drm_syncobj_add_point
interface v4

On 28/03/2019 15:18, Christian König wrote:

On 28.03.19 at 14:50, Lionel Landwerlin wrote:

On 25/03/2019 08:32, Chunming Zhou wrote:

From: Christian König 

Use the dma_fence_chain object to create a timeline of fence objects
instead of just replacing the existing fence.

v2: rebase and cleanup
v3: fix garbage collection parameters
v4: add unordered point check, print a warning calltrace

Signed-off-by: Christian König 
Cc: Lionel Landwerlin 
---
   drivers/gpu/drm/drm_syncobj.c | 39 +++
   include/drm/drm_syncobj.h |  5 +
   2 files changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/drm_syncobj.c
b/drivers/gpu/drm/drm_syncobj.c index 5329e66598c6..19a9ce638119
100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -122,6 +122,45 @@ static void drm_syncobj_remove_wait(struct
drm_syncobj *syncobj,
    spin_unlock(&syncobj->lock);
   }
   +/**
+ * drm_syncobj_add_point - add new timeline point to the syncobj
+ * @syncobj: sync object to add timeline point do
+ * @chain: chain node to use to add the point
+ * @fence: fence to encapsulate in the chain node
+ * @point: sequence number to use for the point
+ *
+ * Add the chain node as new timeline point to the syncobj.
+ */
+void drm_syncobj_add_point(struct drm_syncobj *syncobj,
+   struct dma_fence_chain *chain,
+   struct dma_fence *fence,
+   uint64_t point)
+{
+    struct syncobj_wait_entry *cur, *tmp;
+    struct dma_fence *prev;
+
+    dma_fence_get(fence);
+
+    spin_lock(&syncobj->lock);
+
+    prev = drm_syncobj_fence_get(syncobj);
+    /* You are adding an unorder point to timeline, which could
cause payload returned from query_ioctl is 0! */
+    WARN_ON_ONCE(prev && prev->seqno >= point);


I think the WARN/BUG macros should only fire when there is an issue
with programming from within the kernel.

But this particular warning can be triggered by an application.


Probably best to just remove it?

Yeah, that was also my argument against it.

The key point here is that we still want to note somehow that userspace
did something wrong, and returning an error is not an option.

Maybe just use DRM_ERROR with a static variable to print the message
only once.

Christian.
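
(A minimal sketch of the print-once idea, for illustration only — not
what was eventually merged:)

	if (prev && prev->seqno >= point) {
		static bool warned; /* static, so we complain only once */

		if (!warned) {
			DRM_ERROR("adding an unordered timeline point\n");
			warned = true;
		}
	}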

I don't really see any point in printing an error once. If you run your
application twice, you end up thinking there was an issue just on the
first run, but it's actually always wrong.

Apart from this nitpick, is there any other concern about pushing the whole
patch set? Is it time to push the whole patch set?


-David



Looks good to me.
Does that mean we can add your RB on patch set so that we can submit the 
patch set to drm-misc-next branch?




I have an additional change to make drm_syncobj_find_fence() also
return the drm_syncobj:
https://github.com/djdeath/linux/commit/0b7732b267b931339d71fe6f493ea6fa4eab453e


This is needed in i915 to avoid looking up the drm_syncobj handle twice.

Our driver allows waiting on the syncobj's dma_fence that we're then
going to replace, so we need to get both the fence & syncobj at the same
time.


I guess it can go in a follow up series.

Yes, agree.

Thanks for your effort as well,
-David



-Lionel




Unless we're willing to take the syncobj lock for longer periods of 
time when
adding points, I guess we'll have to defer validation to validation 
layers.



-Lionel



-Lionel



+    dma_fence_chain_init(chain, prev, fence, point);
+    rcu_assign_pointer(syncobj->fence, &chain->base);
+
+    list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
+    list_del_init(&cur->node);
+    syncobj_wait_syncobj_func(syncobj, cur);
+    }
+    spin_unlock(&syncobj->lock);
+
+    /* Walk the chain once to trigger garbage collection */
+    dma_fence_chain_for_each(fence, prev);
+    dma_fence_put(prev);
+}
+EXPORT_SYMBOL(drm_syncobj_add_point);
+
   /**
    * drm_syncobj_replace_fence - replace fence in a sync object.
    * @syncobj: Sync object to replace fence in diff --git
a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h index
0311c9fdbd2f..6cf7243a1dc5 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -27,6 +27,7 @@
   #define __DRM_SYNCOBJ_H__
     #include 
+#include 
     struct drm_file;
   @@ -112,6 +113,10 @@ drm_syncobj_fence_get(struct drm_syncobj
*syncobj)
     struct drm_syncobj *drm_syncobj_find(struct drm_file
*file_private,
    u32 handle);
+void drm_syncobj_add_point(struct drm_syncobj *syncobj,
+   struct dma_fence_chain *chain,
+   struct dma_fence *fence,
+   

Re: [PATCH 1/9] dma-buf: add new dma_fence_chain container v6

2019-03-22 Thread zhoucm1

How about the attached?

If OK, I will merge it into patch #1.


-David
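
(The attachment is not preserved in this archive; a sketch of the
explicit-cast approach Christian suggested, which keeps the hardware
cmpxchg and merely drops the __rcu annotation for sparse, might look
like:)

	tmp = cmpxchg((struct dma_fence __force **)&chain->prev,
		      prev, replacement);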


On 2019/03/21 22:40, Christian König wrote:
No, atomic cmpxchg is a hardware operation. If you want to replace 
that you need a lock again.


Maybe just add a comment and use an explicit cast to void* ? Not sure 
if that silences the warning.


Christian.

On 21.03.19 at 15:13, Zhou, David(ChunMing) wrote:

Can cmpxchg be replaced by some simple C statement?
Otherwise we have to remove __rcu from chain->prev.

-David

 Original Message 
Subject: Re: [PATCH 1/9] dma-buf: add new dma_fence_chain container v6
From: Christian König
To: "Zhou, David(ChunMing)" ,kbuild test robot ,"Zhou, David(ChunMing)"
CC: 
kbuild-...@01.org,dri-de...@lists.freedesktop.org,amd-gfx@lists.freedesktop.org,lionel.g.landwer...@intel.com,ja...@jlekstrand.net,"Koenig, 
Christian" ,"Hector, Tobias"


Hi David,

For the cmpxchg() case I don't know off hand either. Looks like so far
nobody has used cmpxchg() with RCU-protected structures.

The other cases should be replaced by RCU_INIT_POINTER() or
rcu_dereference_protected(.., true);

Regards,
Christian.

On 21.03.19 at 07:34, zhoucm1 wrote:
> Hi Lionel and Christian,
>
> Below is the robot report for chain->prev, which had __rcu added as you
> suggested.
>
> How to fix this line "tmp = cmpxchg(&chain->prev, prev,
replacement);"?

> I checked kernel header file, seems it has no cmpxchg for rcu.
>
> Any suggestion to fix this robot report?
>
> Thanks,
> -David
>
> On 2019/03/21 08:24, kbuild test robot wrote:
>> Hi Chunming,
>>
>> I love your patch! Perhaps something to improve:
>>
>> [auto build test WARNING on linus/master]
>> [also build test WARNING on v5.1-rc1 next-20190320]
>> [if your patch is applied to the wrong git tree, please drop us a
>> note to help improve the system]
>>
>> url:
>> 
https://github.com/0day-ci/linux/commits/Chunming-Zhou/dma-buf-add-new-dma_fence_chain-container-v6/20190320-223607

>> reproduce:
>>  # apt-get install sparse
>>  make ARCH=x86_64 allmodconfig
>>  make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'
>>
>>
>> sparse warnings: (new ones prefixed by >>)
>>
>>>> drivers/dma-buf/dma-fence-chain.c:73:23: sparse: incorrect type in
>>>> initializer (different address spaces) @@    expected struct
>>>> dma_fence [noderef] *__old @@    got  dma_fence [noderef]
>>>> *__old @@
>>     drivers/dma-buf/dma-fence-chain.c:73:23: expected struct
>> dma_fence [noderef] *__old
>>     drivers/dma-buf/dma-fence-chain.c:73:23: got struct dma_fence
>> *[assigned] prev
>>>> drivers/dma-buf/dma-fence-chain.c:73:23: sparse: incorrect type in
>>>> initializer (different address spaces) @@    expected struct
>>>> dma_fence [noderef] *__new @@    got  dma_fence [noderef]
>>>> *__new @@
>>     drivers/dma-buf/dma-fence-chain.c:73:23: expected struct
>> dma_fence [noderef] *__new
>>     drivers/dma-buf/dma-fence-chain.c:73:23: got struct dma_fence
>> *[assigned] replacement
>>>> drivers/dma-buf/dma-fence-chain.c:73:21: sparse: incorrect type in
>>>> assignment (different address spaces) @@    expected struct
>>>> dma_fence *tmp @@    got struct dma_fence [noderef] >>> dma_fence *tmp @@
>>     drivers/dma-buf/dma-fence-chain.c:73:21: expected struct
>> dma_fence *tmp
>>     drivers/dma-buf/dma-fence-chain.c:73:21: got struct dma_fence
>> [noderef] *[assigned] __ret
>>>> drivers/dma-buf/dma-fence-chain.c:190:28: sparse: incorrect type in
>>>> argument 1 (different address spaces) @@    expected struct
>>>> dma_fence *fence @@    got struct dma_fence struct dma_fence 
*fence @@

>>     drivers/dma-buf/dma-fence-chain.c:190:28: expected struct
>> dma_fence *fence
>>     drivers/dma-buf/dma-fence-chain.c:190:28: got struct dma_fence
>> [noderef] *prev
>>>> drivers/dma-buf/dma-fence-chain.c:222:21: sparse: incorrect type in
>>>> assignment (different address spaces) @@    expected struct
>>>> dma_fence [noderef] *prev @@    got [noderef] *prev @@
>>     drivers/dma-buf/dma-fence-chain.c:222:21: expected struct
>> dma_fence [noderef] *prev
>>     drivers/dma-buf/dma-fence-chain.c:222:21: got struct dma_fence
>> *prev
>>     drivers/dma-buf/dma-fence-chain.c:235:33: sparse: expression
>> using sizeof(void)
>>     drivers/dma-buf/dma-fence-chain.c:235:33: sparse: expression
>> using sizeof(void)
>>
>> vim +73 drivers/dma-buf/dma-fence-chain.c
>>
>>  38
>>  39    /**

Re: [PATCH 1/9] dma-buf: add new dma_fence_chain container v6

2019-03-21 Thread zhoucm1

Hi Lionel and Christian,

Below is the robot report for chain->prev, which had __rcu added as you
suggested.


How to fix this line "tmp = cmpxchg(&chain->prev, prev, replacement);"?
I checked the kernel header files; it seems there is no cmpxchg for RCU.

Any suggestion to fix this robot report?

Thanks,
-David

On 2019/03/21 08:24, kbuild test robot wrote:

Hi Chunming,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.1-rc1 next-20190320]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Chunming-Zhou/dma-buf-add-new-dma_fence_chain-container-v6/20190320-223607
reproduce:
 # apt-get install sparse
 make ARCH=x86_64 allmodconfig
 make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'


sparse warnings: (new ones prefixed by >>)


drivers/dma-buf/dma-fence-chain.c:73:23: sparse: incorrect type in initializer (different 
address spaces) @@expected struct dma_fence [noderef] *__old @@got  
dma_fence [noderef] *__old @@

drivers/dma-buf/dma-fence-chain.c:73:23:expected struct dma_fence [noderef] 
*__old
drivers/dma-buf/dma-fence-chain.c:73:23:got struct dma_fence 
*[assigned] prev

drivers/dma-buf/dma-fence-chain.c:73:23: sparse: incorrect type in initializer (different 
address spaces) @@expected struct dma_fence [noderef] *__new @@got  
dma_fence [noderef] *__new @@

drivers/dma-buf/dma-fence-chain.c:73:23:expected struct dma_fence [noderef] 
*__new
drivers/dma-buf/dma-fence-chain.c:73:23:got struct dma_fence 
*[assigned] replacement

drivers/dma-buf/dma-fence-chain.c:73:21: sparse: incorrect type in assignment 
(different address spaces) @@expected struct dma_fence *tmp @@got struct 
dma_fence [noderef] 
drivers/dma-buf/dma-fence-chain.c:73:21:expected struct dma_fence *tmp
drivers/dma-buf/dma-fence-chain.c:73:21:got struct dma_fence [noderef] 
*[assigned] __ret

drivers/dma-buf/dma-fence-chain.c:190:28: sparse: incorrect type in argument 1 
(different address spaces) @@expected struct dma_fence *fence @@got 
struct dma_fence struct dma_fence *fence @@

drivers/dma-buf/dma-fence-chain.c:190:28:expected struct dma_fence 
*fence
drivers/dma-buf/dma-fence-chain.c:190:28:got struct dma_fence [noderef] 
*prev

drivers/dma-buf/dma-fence-chain.c:222:21: sparse: incorrect type in assignment (different 
address spaces) @@expected struct dma_fence [noderef] *prev @@got 
[noderef] *prev @@

drivers/dma-buf/dma-fence-chain.c:222:21:expected struct dma_fence [noderef] 
*prev
drivers/dma-buf/dma-fence-chain.c:222:21:got struct dma_fence *prev
drivers/dma-buf/dma-fence-chain.c:235:33: sparse: expression using 
sizeof(void)
drivers/dma-buf/dma-fence-chain.c:235:33: sparse: expression using 
sizeof(void)

vim +73 drivers/dma-buf/dma-fence-chain.c

 38 
 39 /**
 40  * dma_fence_chain_walk - chain walking function
 41  * @fence: current chain node
 42  *
 43  * Walk the chain to the next node. Returns the next fence or NULL if 
we are at
 44  * the end of the chain. Garbage collects chain nodes which are already
 45  * signaled.
 46  */
 47 struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
 48 {
 49 struct dma_fence_chain *chain, *prev_chain;
 50 struct dma_fence *prev, *replacement, *tmp;
 51 
 52 chain = to_dma_fence_chain(fence);
 53 if (!chain) {
 54 dma_fence_put(fence);
 55 return NULL;
 56 }
 57 
 58 while ((prev = dma_fence_chain_get_prev(chain))) {
 59 
 60 prev_chain = to_dma_fence_chain(prev);
 61 if (prev_chain) {
 62 if (!dma_fence_is_signaled(prev_chain->fence))
 63 break;
 64 
 65 replacement = 
dma_fence_chain_get_prev(prev_chain);
 66 } else {
 67 if (!dma_fence_is_signaled(prev))
 68 break;
 69 
 70 replacement = NULL;
 71 }
 72 
   > 73  tmp = cmpxchg(&chain->prev, prev, replacement);
 74 if (tmp == prev)
 75 dma_fence_put(tmp);
 76 else
 77 dma_fence_put(replacement);
 78 dma_fence_put(prev);
 79 }
 80 
 81 dma_fence_put(fence);
 82 return prev;
 83 }
 84 EXPORT_SYMBOL(dma_fence_chain_walk);
 85 
 86 /**
 87  * dma_fence_chain_find_seqno - find fence chain node by seqno
 88  * @pfence: pointer to the chain node where to start
 89  * @seqno: the sequence 

Re: [PATCH 8/8] drm/amdgpu: use the new VM backend for clears

2019-03-19 Thread zhoucm1

Patch #2 and patch #4 are Acked-by: Chunming Zhou

Patches #1, #3, #5~#8 are Reviewed-by: Chunming Zhou



On 2019/03/19 20:44, Christian König wrote:

And remove the existing code when it is unused.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 89 +-
  1 file changed, 32 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 729da1c486cd..af1a7020c3ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -711,11 +711,9 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev,
  {
struct ttm_operation_ctx ctx = { true, false };
unsigned level = adev->vm_manager.root_level;
+   struct amdgpu_vm_update_params params;
struct amdgpu_bo *ancestor = bo;
-   struct dma_fence *fence = NULL;
unsigned entries, ats_entries;
-   struct amdgpu_ring *ring;
-   struct amdgpu_job *job;
uint64_t addr;
int r;
  
@@ -750,8 +748,6 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev,

}
}
  
-	ring = container_of(vm->entity.rq->sched, struct amdgpu_ring, sched);

-
	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
if (r)
return r;
@@ -772,60 +768,45 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev,
  
  	}
  
-	r = amdgpu_job_alloc_with_ib(adev, 64, &job);

+   memset(&params, 0, sizeof(params));
+   params.adev = adev;
+   params.vm = vm;
+
+   r = vm->update_funcs->prepare(&params, AMDGPU_FENCE_OWNER_KFD, NULL);
if (r)
return r;
  
-	do {

-   addr = amdgpu_bo_gpu_offset(bo);
-   if (ats_entries) {
-   uint64_t ats_value;
-
-   ats_value = AMDGPU_PTE_DEFAULT_ATC;
-   if (level != AMDGPU_VM_PTB)
-   ats_value |= AMDGPU_PDE_PTE;
-
-   amdgpu_vm_set_pte_pde(adev, &job->ibs[0], addr, 0,
- ats_entries, 0, ats_value);
-   addr += ats_entries * 8;
-   }
-
-   if (entries) {
-   uint64_t value = 0;
-
-   /* Workaround for fault priority problem on GMC9 */
-   if (level == AMDGPU_VM_PTB &&
-   adev->asic_type >= CHIP_VEGA10)
-   value = AMDGPU_PTE_EXECUTABLE;
-
-   amdgpu_vm_set_pte_pde(adev, &job->ibs[0], addr, 0,
- entries, 0, value);
-   }
+   addr = 0;
+   if (ats_entries) {
+   uint64_t ats_value;
  
-		bo = bo->shadow;

-   } while (bo);
+   ats_value = AMDGPU_PTE_DEFAULT_ATC;
+   if (level != AMDGPU_VM_PTB)
+   ats_value |= AMDGPU_PDE_PTE;
  
-	amdgpu_ring_pad_ib(ring, &job->ibs[0]);

+   r = vm->update_funcs->update(&params, bo, addr, 0, ats_entries,
+0, ats_value);
+   if (r)
+   return r;
  
-	WARN_ON(job->ibs[0].length_dw > 64);

-   r = amdgpu_sync_resv(adev, &job->sync, vm->root.base.bo->tbo.resv,
-AMDGPU_FENCE_OWNER_KFD, false);
-   if (r)
-   goto error_free;
+   addr += ats_entries * 8;
+   }
  
-	r = amdgpu_job_submit(job, &vm->entity, AMDGPU_FENCE_OWNER_UNDEFINED,

- &fence);
-   if (r)
-   goto error_free;
+   if (entries) {
+   uint64_t value = 0;
  
-	amdgpu_bo_fence(vm->root.base.bo, fence, true);

-   dma_fence_put(fence);
+   /* Workaround for fault priority problem on GMC9 */
+   if (level == AMDGPU_VM_PTB &&
+   adev->asic_type >= CHIP_VEGA10)
+   value = AMDGPU_PTE_EXECUTABLE;
  
-	return 0;

+   r = vm->update_funcs->update(&params, bo, addr, 0, entries,
+0, value);
+   if (r)
+   return r;
+   }
  
-error_free:

-   amdgpu_job_free(job);
-   return r;
+   return vm->update_funcs->commit(&params, NULL);
  }
  
  /**

@@ -913,7 +894,7 @@ static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
if (r)
goto error_free_pt;
  
-	return 1;

+   return 0;
  
  error_free_pt:

amdgpu_bo_unref(&pt->shadow);
@@ -1421,12 +1402,10 @@ static int amdgpu_vm_update_ptes(struct 
amdgpu_vm_update_params *params,
unsigned shift, parent_shift, mask;
uint64_t incr, entry_end, pe_start;
struct amdgpu_bo *pt;
-   bool need_to_sync;
  
		r = amdgpu_vm_alloc_pts(params->adev, params->vm, &cursor);

-   if (r < 0)
+   if (r)
return r;
-   need_to_sync = (r && 

Re: [PATCH 8/9] drm/syncobj: add timeline signal ioctl for syncobj v3

2019-03-19 Thread zhoucm1



On 2019年03月19日 19:54, Lionel Landwerlin wrote:

On 15/03/2019 12:09, Chunming Zhou wrote:
v2: individually allocate the chain array, since chain nodes are freed 
independently.
v3: all existing points must already be signaled before the CPU performs a 
signal operation, so add a check for that.

Signed-off-by: Chunming Zhou 
---
  drivers/gpu/drm/drm_internal.h |   2 +
  drivers/gpu/drm/drm_ioctl.c    |   2 +
  drivers/gpu/drm/drm_syncobj.c  | 103 +
  include/uapi/drm/drm.h |   1 +
  4 files changed, 108 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h 
b/drivers/gpu/drm/drm_internal.h

index dd11ae5f1eef..d9a483a5fce0 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -190,6 +190,8 @@ int drm_syncobj_reset_ioctl(struct drm_device 
*dev, void *data,

  struct drm_file *file_private);
  int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
+int drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void 
*data,

+  struct drm_file *file_private);
  int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
  struct drm_file *file_private);
  diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 92b3b7b2fd81..d337f161909c 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -696,6 +696,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
    DRM_UNLOCKED|DRM_RENDER_ALLOW),
  DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
    DRM_UNLOCKED|DRM_RENDER_ALLOW),
+    DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL, 
drm_syncobj_timeline_signal_ioctl,

+  DRM_UNLOCKED|DRM_RENDER_ALLOW),
  DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
    DRM_UNLOCKED|DRM_RENDER_ALLOW),
  DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, 
drm_crtc_get_sequence_ioctl, DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index 306c7b7e2770..eaeb038f97d7 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1183,6 +1183,109 @@ drm_syncobj_signal_ioctl(struct drm_device 
*dev, void *data,

  return ret;
  }
  +int
+drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
+  struct drm_file *file_private)
+{
+    struct drm_syncobj_timeline_array *args = data;
+    struct drm_syncobj **syncobjs;
+    struct dma_fence_chain **chains;
+    uint64_t *points;
+    uint32_t i, j, timeline_count = 0;
+    int ret;
+
+    if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+    return -EOPNOTSUPP;
+
+    if (args->pad != 0)
+    return -EINVAL;
+
+    if (args->count_handles == 0)
+    return -EINVAL;
+
+    ret = drm_syncobj_array_find(file_private,
+ u64_to_user_ptr(args->handles),
+ args->count_handles,
+ &syncobjs);
+    if (ret < 0)
+    return ret;
+
+    for (i = 0; i < args->count_handles; i++) {
+    struct dma_fence_chain *chain;
+    struct dma_fence *fence;
+
+    fence = drm_syncobj_fence_get(syncobjs[i]);
+    chain = to_dma_fence_chain(fence);
+    if (chain) {
+    struct dma_fence *iter;
+
+    dma_fence_chain_for_each(iter, fence) {
+    if (!iter)
+    break;
+    if (!dma_fence_is_signaled(iter)) {
+    dma_fence_put(iter);
+    DRM_ERROR("Client must guarantee all existing 
timeline points signaled before performing host signal operation!");

+    ret = -EPERM;
+    goto out;



Sorry if I'm failing to remember whether we discussed this before.


Signaling a point from the host should be fine even if the previous 
points in the timeline are not signaled.

OK, will remove that check.



After all this can happen on the device side as well (out of order 
signaling).



I thought the thing we didn't want was out-of-order submission.

Just checking the last chain node seqno against the host signal point 
should be enough.



What about simply returning -EPERM, so we can warn the application from 
userspace?

OK, will add that.
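
A minimal sketch of the check being agreed on here, assuming the target
points have already been copied from userspace and using the same locals
(fence, chain, syncobjs[i]) as the patch; not the final upstream code:

	/* Only the last chain node needs checking: reject a host signal
	 * that would move the timeline backwards. */
	fence = drm_syncobj_fence_get(syncobjs[i]);
	chain = to_dma_fence_chain(fence);
	if (chain && points[i] <= fence->seqno) {
		dma_fence_put(fence);
		ret = -EPERM;
		goto out;
	}
	dma_fence_put(fence);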





+    }
+    }
+    }
+    }
+
+    points = kmalloc_array(args->count_handles, sizeof(*points),
+   GFP_KERNEL);
+    if (!points) {
+    ret = -ENOMEM;
+    goto out;
+    }
+    if (!u64_to_user_ptr(args->points)) {
+    memset(points, 0, args->count_handles * sizeof(uint64_t));
+    } else if (copy_from_user(points, u64_to_user_ptr(args->points),
+  sizeof(uint64_t) * args->count_handles)) {
+    ret = -EFAULT;
+    goto err_points;
+    }
+
+
+    for (i = 0; i < 

Re: [PATCH] drm/amdgpu: enable bo priority setting from user space

2019-03-07 Thread zhoucm1



On 2019年03月07日 17:55, Michel Dänzer wrote:

On 2019-03-07 10:15 a.m., Chunming Zhou wrote:

Signed-off-by: Chunming Zhou 

Please provide corresponding UMD patches showing how this is to be used.

spec is here:
https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html, 
please search "VkMemoryPriorityAllocateInfoEXT".


Fortunately, the Windows guys already implemented it before; otherwise I 
could not find any ready code in open source. I hate this chicken-and-egg 
question :
https://github.com/GPUOpen-Drivers/pal/blob/dev/src/core/gpuMemory.cpp, 
please search "createInfo.priority".
https://github.com/GPUOpen-Drivers/pal/blob/dev/inc/core/palGpuMemory.h, 
priority definition is here.






@@ -229,6 +231,14 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void 
*data,
if (args->in.domains & ~AMDGPU_GEM_DOMAIN_MASK)
return -EINVAL;
  
+	/* check priority */

+   if (args->in.priority == 0) {

Did you verify that this is 0 with old userspace compiled against struct
drm_amdgpu_gem_create_in without the priority field?
Without the priority field, I don't think we can check that here. Do you 
mean we need to add a new args struct?






+   /* default is normal */
+   args->in.priority = TTM_BO_PRIORITY_NORMAL;
+   } else if (args->in.priority > TTM_MAX_BO_PRIORITY) {
+   args->in.priority = TTM_MAX_BO_PRIORITY;
+   DRM_ERROR("priority specified from user space is over MAX 
priority\n");

This must be DRM_DEBUG, or buggy/malicious userspace can spam dmesg.

Will change.





@@ -252,6 +262,7 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void 
*data,
  
  	r = amdgpu_gem_object_create(adev, size, args->in.alignment,

 (u32)(0x & args->in.domains),
+args->in.priority - 1,
 flags, ttm_bo_type_device, resv, );

It might be less confusing to subtract 1 after checking against
TTM_MAX_BO_PRIORITY instead of here. Still kind of confusing though. How
about this instead:

Make the priority field of struct drm_amdgpu_gem_create_in signed. In
amdgpu_gem_create_ioctl, clamp the priority to the supported range:

args->in.priority += TTM_BO_PRIORITY_NORMAL;
args->in.priority = max(args->in.priority, 0);
args->in.priority = min(args->in.priority,
TTM_MAX_BO_PRIORITY - 1);

This way userspace doesn't need to do a weird mapping of the priority
values (where 0 and 2 have the same meaning), and the range of supported
priorities could at least theoretically be extended without breaking
userspace.

First, I want to explain a bit the priority value from vulkan:
"    From Vulkan Spec, 0.0 <= value <= 1.0, and the granularity of the 
priorities is implementation-dependent.
 One thing the spec does force is that if VkMemoryPriority is not 
specified, the default behavior is as if the
 priority value were 0.5. Our strategy is to map 0.5 to 
GpuMemPriority::Normal-GpuMemPriorityOffset::Offset0,
 which is consistent with MemoryPriorityDefault. We adopt 
GpuMemPriority::VeryLow, GpuMemPriority::Low,
 GpuMemPriority::Normal, GpuMemPriority::High, 4 priority grades, 
 each of which contains 8 steps of offsets.
 This maps [0.0-1.0) to 32 steps in total. Finally, 1.0 maps to 
 GpuMemPriority::VeryHigh.

"

So my original purpose is directly use Priority enum defined in PAL, 
like this:

 "
/// Specifies Base Level priority per GPU memory allocation as a hint to 
the memory manager in the event it needs to

/// select allocations to page out of their preferred heaps.
enum class GpuMemPriority : uint32
{
    Unused    = 0x0,  ///< Indicates that the allocation is not 
currently being used at all, and should be the first

  ///  choice to be paged out.
    VeryLow   = 0x1,  ///< Lowest priority to keep in its preferred heap.
    Low   = 0x2,  ///< Low priority to keep in its preferred heap.
    Normal    = 0x3,  ///< Normal priority to keep in its preferred heap.
    High  = 0x4,  ///< High priority to keep in its preferred heap 
(e.g., render targets).
    VeryHigh  = 0x5,  ///< Highest priority to keep in its preferred 
heap.  Last choice to be paged out (e.g., page

  ///  tables, displayable allocations).
    Count
};
"

If we follow your idea, we will need to convert it again when hooking up 
the Linux implementation.

So what do you think, can we still use unsigned?
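
An illustrative helper for the mapping described above; the name and
placement are hypothetical, nothing like this is part of the patch:

/* Map a Vulkan float priority (0.0 <= p <= 1.0) onto the
 * 4 grades x 8 offsets = 32 steps described above. */
static unsigned int vk_priority_to_step(float p)
{
	if (p >= 1.0f)
		return 31;			/* GpuMemPriority::VeryHigh */
	return (unsigned int)(p * 32.0f);	/* [0.0, 1.0) -> steps 0..31 */
}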





@@ -304,6 +315,7 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void 
*data,
  
  	/* create a gem object to contain this object in */

r = amdgpu_gem_object_create(adev, args->size, 0, AMDGPU_GEM_DOMAIN_CPU,
+TTM_BO_PRIORITY_NORMAL,
 0, ttm_bo_type_device, NULL, );

Should the userptr ioctl also allow setting the priority?


We can.





diff --git 

Re: [PATCH 09/11] drm/syncobj: add transition iotcls between binary and timeline

2019-02-20 Thread zhoucm1



On 2019年02月20日 15:59, Koenig, Christian wrote:

Am 20.02.19 um 05:53 schrieb zhoucm1:




On 2019年02月19日 19:32, Koenig, Christian wrote:

Hi David,


Could you have a look if it's reasonable?


Patch #1 is also something I already fixed on my local branch.

But patch #2 won't work like this.

We can't return an error from drm_syncobj_add_point() because we 
already submitted work to the hardware. And just dropping the fence 
like you do in the patch is a clearly no-go as well.


Then do you have any idea how to skip the out-of-order signal point?


No, I don't think we can actually do this.
But as Lionel pointed out, user mode shouldn't see a smaller timeline 
payload than it queried last time, so we must skip out-of-order signal points!


-David



The only solution I can see would be to lock down the syncobj to 
modifications while command submission is in progress. And that in 
turn would mean a huge bunch of ww_mutex overhead we will certainly 
want to avoid.


Christian.



-David


Regards,
Christian.

Am 19.02.19 um 11:46 schrieb zhoucm1:


Hi Lionel,

the attached should fix your problem and also the messed-up signal order.

Hi Christian,

Could you have a look if it's reasonable?


btw: I pushed to change to 
https://github.com/amingriyue/timeline-syncobj-kernel, which is 
already rebased to latest drm-misc(kernel 5.0). You can directly 
use that branch.



-David


On 2019年02月19日 01:01, Koenig, Christian wrote:

Am 18.02.19 um 13:07 schrieb Lionel Landwerlin:

Thanks guys :)

You mentioned that signaling out of order is illegal.
Is this illegal with regard to the vulkan spec or to the syncobj 
implementation?


David is the expert on that, but as far as I know that is 
forbidden by the vulkan spec.


I'm not finding anything in the vulkan spec that makes out of 
order signaling illegal.
That's why I came up with this test, just verifying that the 
timeline does not go backward in term of its payload.


Well we need to handle this case gracefully in the kernel, so it 
is still a good testcase.


Christian.



-Lionel

On 18/02/2019 11:01, Koenig, Christian wrote:

Hi David,

well I think Lionel is testing the invalid signal order on 
purpose :)


Anyway we really need to handle invalid order graceful here. 
E.g. either the same way as during CS or we abort and return an 
error message.


I think just using the same approach as during CS ist the best 
we can do.


Regards,
Christian


Am 18.02.2019 11:35 schrieb "Zhou, David(ChunMing)" 
:


Hi Lionel,

I checked your igt test case,

uint64_t points[5] = { 1, 5, 3, 7, 6 };

which is illegal signal order.

I must admit we should handle it gracefully if the signaling isn't 
in order, and it shouldn't lead to a deadlock.


Hi Christian,

Can we just ignore when signal point X <= timeline Y? Or just 
give a warning?


Otherwise, unexpected use cases like Lionel's easily lead 
to deadlock.



-David


On 2019年02月15日 22:28, Lionel Landwerlin wrote:

Hi David,

Thanks a lot for pointing me to the tests you've added in IGT.
While adding a test that signals fences imported into a timeline
syncobj out of order, I ran into a deadlock.
Here is the test :
https://github.com/djdeath/intel-gpu-tools/commit/1e46cf7e7bff09b78a24367ddc2314f97eb0a1b9

Trying to kill the deadlocked process I got this backtrace :


[   33.969136] [IGT] syncobj_timeline: starting subtest signal-order
[   60.452823] watchdog: BUG: soft lockup - CPU#6 stuck for 23s!
[syncobj_timelin:2021]
[   60.452826] Modules linked in: rfcomm cmac bnep binfmt_misc
nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio sch_fq_codel ib_iser snd_hda_intel
rdma_cm iw_cm snd_hda_codec ib_cm snd_hda_core snd_hwdep intel_rapl
snd_pcm ib_core x86_pkg_temp_thermal intel_powerclamp configf
s coretemp iscsi_tcp snd_seq_midi libiscsi_tcp snd_seq_midi_event
libiscsi kvm_intel scsi_transport_iscsi kvm btusb snd_rawmidi irqbypass
btrtl intel_cstate intel_rapl_perf btbcm btintel bluetooth snd_seq
snd_seq_device snd_timer input_leds ecdh_generic snd soundcore mei_me
mei intel_pch_thermal mac_hid acpi_pad parp
ort_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_decompress
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit
ghash_clmulni_intel prime_numbers
drm_kms_helper aesni_intel syscopyarea sysfillrect
[   60.452876]  sysimgblt fb_sys_fops aes_x86_64 crypto_simd sdhci_pci
cryptd drm e1000e glue_helper cqhci sdhci wmi video
[   60.452881] CPU: 6 PID: 2021 Comm: syncobj_timelin Tainted: G
U    5.0.0-rc5+ #337
[   60.452882] Hardware name:  /NUC6i7KYB, BIOS
KYSKLi70.86A.0042.2016.0929.1933 09/29/2016
[   60.452886] RIP: 0010:dma_fence_chain_walk+0x22c/0x260
[   60.452888] Code: ff e9 93 fe ff ff 48 8b 45 08 48 8b 40 18 48 85 c0
74 0c 48 89 ef e8 33 0f 58 00 84 c0 75 23 f0 41 ff 4d 00 0f 88 99 87 2f
00 <0f>

Re: [PATCH 09/11] drm/syncobj: add transition iotcls between binary and timeline

2019-02-19 Thread zhoucm1



On 2019年02月19日 19:32, Koenig, Christian wrote:

Hi David,


Could you have a look if it's reasonable?


Patch #1 is also something I already fixed on my local branch.

But patch #2 won't work like this.

We can't return an error from drm_syncobj_add_point() because we 
already submitted work to the hardware. And just dropping the fence 
like you do in the patch is a clearly no-go as well.


Then do you have any idea how to skip the out-of-order signal point?

-David


Regards,
Christian.

Am 19.02.19 um 11:46 schrieb zhoucm1:


Hi Lionel,

the attached should fix your problem and also the messed-up signal order.

Hi Christian,

Could you have a look if it's reasonable?


btw: I pushed to change to 
https://github.com/amingriyue/timeline-syncobj-kernel, which is 
already rebased to latest drm-misc(kernel 5.0). You can directly use 
that branch.



-David


On 2019年02月19日 01:01, Koenig, Christian wrote:

Am 18.02.19 um 13:07 schrieb Lionel Landwerlin:

Thanks guys :)

You mentioned that signaling out of order is illegal.
Is this illegal with regard to the vulkan spec or to the syncobj 
implementation?


David is the expert on that, but as far as I know that is forbidden 
by the vulkan spec.


I'm not finding anything in the vulkan spec that makes out of order 
signaling illegal.
That's why I came up with this test, just verifying that the 
timeline does not go backward in term of its payload.


Well we need to handle this case gracefully in the kernel, so it is 
still a good testcase.


Christian.



-Lionel

On 18/02/2019 11:01, Koenig, Christian wrote:

Hi David,

well I think Lionel is testing the invalid signal order on purpose :)

Anyway we really need to handle invalid order graceful here. E.g. 
either the same way as during CS or we abort and return an error 
message.


I think just using the same approach as during CS ist the best we 
can do.


Regards,
Christian


Am 18.02.2019 11:35 schrieb "Zhou, David(ChunMing)" 
:


Hi Lionel,

I checked your igt test case,

uint64_t points[5] = { 1, 5, 3, 7, 6 };

which is illegal signal order.

I must admit we should handle it gracefully if the signaling isn't 
in order, and it shouldn't lead to a deadlock.


Hi Christian,

Can we just ignore when signal point X <= timeline Y? Or just give 
a warning?


Otherwise, unexpected use cases like Lionel's easily lead 
to deadlock.



-David


On 2019年02月15日 22:28, Lionel Landwerlin wrote:

Hi David,

Thanks a lot for pointing me to the tests you've added in IGT.
While adding a test that signals fences imported into a timeline
syncobj out of order, I ran into a deadlock.
Here is the test :
https://github.com/djdeath/intel-gpu-tools/commit/1e46cf7e7bff09b78a24367ddc2314f97eb0a1b9

Trying to kill the deadlocked process I got this backtrace :


[   33.969136] [IGT] syncobj_timeline: starting subtest signal-order
[   60.452823] watchdog: BUG: soft lockup - CPU#6 stuck for 23s!
[syncobj_timelin:2021]
[   60.452826] Modules linked in: rfcomm cmac bnep binfmt_misc
nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio sch_fq_codel ib_iser snd_hda_intel
rdma_cm iw_cm snd_hda_codec ib_cm snd_hda_core snd_hwdep intel_rapl
snd_pcm ib_core x86_pkg_temp_thermal intel_powerclamp configf
s coretemp iscsi_tcp snd_seq_midi libiscsi_tcp snd_seq_midi_event
libiscsi kvm_intel scsi_transport_iscsi kvm btusb snd_rawmidi irqbypass
btrtl intel_cstate intel_rapl_perf btbcm btintel bluetooth snd_seq
snd_seq_device snd_timer input_leds ecdh_generic snd soundcore mei_me
mei intel_pch_thermal mac_hid acpi_pad parp
ort_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_decompress
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit
ghash_clmulni_intel prime_numbers
drm_kms_helper aesni_intel syscopyarea sysfillrect
[   60.452876]  sysimgblt fb_sys_fops aes_x86_64 crypto_simd sdhci_pci
cryptd drm e1000e glue_helper cqhci sdhci wmi video
[   60.452881] CPU: 6 PID: 2021 Comm: syncobj_timelin Tainted: G
U    5.0.0-rc5+ #337
[   60.452882] Hardware name:  /NUC6i7KYB, BIOS
KYSKLi70.86A.0042.2016.0929.1933 09/29/2016
[   60.452886] RIP: 0010:dma_fence_chain_walk+0x22c/0x260
[   60.452888] Code: ff e9 93 fe ff ff 48 8b 45 08 48 8b 40 18 48 85 c0
74 0c 48 89 ef e8 33 0f 58 00 84 c0 75 23 f0 41 ff 4d 00 0f 88 99 87 2f
00 <0f> 85 05 fe ff ff 4c 89 ef e8 56 ea ff ff 48 89 d8 5b 5d 41 5c 41
[   60.452888] RSP: 0018:9a5804653ca8 EFLAGS: 00010296 ORIG_RAX:
ff13
[   60.452889] RAX:  RBX: 8f5690fb2480 RCX:
8f5690fb2f00
[   60.452890] RDX: 003e3730 RSI:  RDI:
8f5690fb2180
[   60.452891] RBP: 8f5690fb2180 R08:  R09:
8f5690fb2eb0
[   60.452891] R10:  R11: 8f5660469860 R12:
8f5690fb2f68
[   60.452892] R13: 8f5690fb2f00 R14: 0

Re: [PATCH 09/11] drm/syncobj: add transition iotcls between binary and timeline

2019-02-19 Thread zhoucm1

Hi Lionel,

the attached should fix your problem and also the messed-up signal order.

Hi Christian,

Could you have a look if it's reasonable?


btw: I pushed to change to 
https://github.com/amingriyue/timeline-syncobj-kernel, which is already 
rebased to latest drm-misc(kernel 5.0). You can directly use that branch.



-David


On 2019年02月19日 01:01, Koenig, Christian wrote:

Am 18.02.19 um 13:07 schrieb Lionel Landwerlin:

Thanks guys :)

You mentioned that signaling out of order is illegal.
Is this illegal with regard to the vulkan spec or to the syncobj 
implementation?


David is the expert on that, but as far as I know that is forbidden by 
the vulkan spec.


I'm not finding anything in the vulkan spec that makes out of order 
signaling illegal.
That's why I came up with this test, just verifying that the timeline 
does not go backward in term of its payload.


Well we need to handle this case gracefully in the kernel, so it is 
still a good testcase.


Christian.



-Lionel

On 18/02/2019 11:01, Koenig, Christian wrote:

Hi David,

well I think Lionel is testing the invalid signal order on purpose :)

Anyway we really need to handle invalid order graceful here. E.g. 
either the same way as during CS or we abort and return an error 
message.


I think just using the same approach as during CS ist the best we 
can do.


Regards,
Christian


Am 18.02.2019 11:35 schrieb "Zhou, David(ChunMing)" 
:


Hi Lionel,

I checked your igt test case,

uint64_t points[5] = { 1, 5, 3, 7, 6 };

which is illegal signal order.

I must admit we should handle it gracefully if the signaling isn't 
in order, and it shouldn't lead to a deadlock.


Hi Christian,

Can we just ignore when signal point X <= timeline Y? Or just give a 
warning?


Otherwise, unexpected use cases like Lionel's easily lead to 
deadlock.



-David


On 2019年02月15日 22:28, Lionel Landwerlin wrote:

Hi David,

Thanks a lot for pointing me to the tests you've added in IGT.
While adding a test that signals fences imported into a timeline
syncobj out of order, I ran into a deadlock.
Here is the test :
https://github.com/djdeath/intel-gpu-tools/commit/1e46cf7e7bff09b78a24367ddc2314f97eb0a1b9

Trying to kill the deadlocked process I got this backtrace :


[   33.969136] [IGT] syncobj_timeline: starting subtest signal-order
[   60.452823] watchdog: BUG: soft lockup - CPU#6 stuck for 23s!
[syncobj_timelin:2021]
[   60.452826] Modules linked in: rfcomm cmac bnep binfmt_misc
nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio sch_fq_codel ib_iser snd_hda_intel
rdma_cm iw_cm snd_hda_codec ib_cm snd_hda_core snd_hwdep intel_rapl
snd_pcm ib_core x86_pkg_temp_thermal intel_powerclamp configf
s coretemp iscsi_tcp snd_seq_midi libiscsi_tcp snd_seq_midi_event
libiscsi kvm_intel scsi_transport_iscsi kvm btusb snd_rawmidi irqbypass
btrtl intel_cstate intel_rapl_perf btbcm btintel bluetooth snd_seq
snd_seq_device snd_timer input_leds ecdh_generic snd soundcore mei_me
mei intel_pch_thermal mac_hid acpi_pad parp
ort_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_decompress
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit
ghash_clmulni_intel prime_numbers
drm_kms_helper aesni_intel syscopyarea sysfillrect
[   60.452876]  sysimgblt fb_sys_fops aes_x86_64 crypto_simd sdhci_pci
cryptd drm e1000e glue_helper cqhci sdhci wmi video
[   60.452881] CPU: 6 PID: 2021 Comm: syncobj_timelin Tainted: G
U    5.0.0-rc5+ #337
[   60.452882] Hardware name:  /NUC6i7KYB, BIOS
KYSKLi70.86A.0042.2016.0929.1933 09/29/2016
[   60.452886] RIP: 0010:dma_fence_chain_walk+0x22c/0x260
[   60.452888] Code: ff e9 93 fe ff ff 48 8b 45 08 48 8b 40 18 48 85 c0
74 0c 48 89 ef e8 33 0f 58 00 84 c0 75 23 f0 41 ff 4d 00 0f 88 99 87 2f
00 <0f> 85 05 fe ff ff 4c 89 ef e8 56 ea ff ff 48 89 d8 5b 5d 41 5c 41
[   60.452888] RSP: 0018:9a5804653ca8 EFLAGS: 00010296 ORIG_RAX:
ff13
[   60.452889] RAX:  RBX: 8f5690fb2480 RCX:
8f5690fb2f00
[   60.452890] RDX: 003e3730 RSI:  RDI:
8f5690fb2180
[   60.452891] RBP: 8f5690fb2180 R08:  R09:
8f5690fb2eb0
[   60.452891] R10:  R11: 8f5660469860 R12:
8f5690fb2f68
[   60.452892] R13: 8f5690fb2f00 R14: 0003 R15:
8f5655a45fc0
[   60.452913] FS:  7fdc5c459980() GS:8f569eb8()
knlGS:
[   60.452913] CS:  0010 DS:  ES:  CR0: 80050033
[   60.452914] CR2: 7f9d74336dd8 CR3: 00084a67e004 CR4:
003606e0
[   60.452915] DR0:  DR1:  DR2:

[   60.452915] DR3:  DR6: fffe0ff0 DR7:
0400
[   60.452916] Call Trace:
[   60.452958]  drm_syncobj_add_point+0x102/0x160 [drm]
[   60.452965]  ? 

Re: [PATCH 09/11] drm/syncobj: add transition iotcls between binary and timeline

2019-02-18 Thread zhoucm1

Hi Lionel,

I checked your igt test case,

uint64_t points[5] = { 1, 5, 3, 7, 6 };

which is illegal signal order.

I must admit we should handle it gracefully if the signaling isn't in order, 
and it shouldn't lead to a deadlock.


Hi Christian,

Can we just ignore when signal point X <= timeline Y? Or just give a 
warning?


Otherwise, unexpected use cases like Lionel's easily lead to 
deadlock.
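
A minimal sketch of the graceful handling being asked for, assuming a guard
at the top of drm_syncobj_add_point() where 'last' is the current last
chain node; locking and the final upstream behavior may differ:

	if (last && point <= last->base.seqno) {
		DRM_DEBUG("signal point %llu <= current payload %llu\n",
			  point, last->base.seqno);
		return;	/* ignore it, or warn / return -EPERM instead */
	}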



-David


On 2019年02月15日 22:28, Lionel Landwerlin wrote:

Hi David,

Thanks a lot for pointing me to the tests you've added in IGT.
While adding a test that signals fences imported into a timeline
syncobj out of order, I ran into a deadlock.
Here is the test :
https://github.com/djdeath/intel-gpu-tools/commit/1e46cf7e7bff09b78a24367ddc2314f97eb0a1b9

Trying to kill the deadlocked process I got this backtrace :


[   33.969136] [IGT] syncobj_timeline: starting subtest signal-order
[   60.452823] watchdog: BUG: soft lockup - CPU#6 stuck for 23s!
[syncobj_timelin:2021]
[   60.452826] Modules linked in: rfcomm cmac bnep binfmt_misc
nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio sch_fq_codel ib_iser snd_hda_intel
rdma_cm iw_cm snd_hda_codec ib_cm snd_hda_core snd_hwdep intel_rapl
snd_pcm ib_core x86_pkg_temp_thermal intel_powerclamp configf
s coretemp iscsi_tcp snd_seq_midi libiscsi_tcp snd_seq_midi_event
libiscsi kvm_intel scsi_transport_iscsi kvm btusb snd_rawmidi irqbypass
btrtl intel_cstate intel_rapl_perf btbcm btintel bluetooth snd_seq
snd_seq_device snd_timer input_leds ecdh_generic snd soundcore mei_me
mei intel_pch_thermal mac_hid acpi_pad parp
ort_pc ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_decompress
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit
ghash_clmulni_intel prime_numbers
drm_kms_helper aesni_intel syscopyarea sysfillrect
[   60.452876]  sysimgblt fb_sys_fops aes_x86_64 crypto_simd sdhci_pci
cryptd drm e1000e glue_helper cqhci sdhci wmi video
[   60.452881] CPU: 6 PID: 2021 Comm: syncobj_timelin Tainted: G
U    5.0.0-rc5+ #337
[   60.452882] Hardware name:  /NUC6i7KYB, BIOS
KYSKLi70.86A.0042.2016.0929.1933 09/29/2016
[   60.452886] RIP: 0010:dma_fence_chain_walk+0x22c/0x260
[   60.452888] Code: ff e9 93 fe ff ff 48 8b 45 08 48 8b 40 18 48 85 c0
74 0c 48 89 ef e8 33 0f 58 00 84 c0 75 23 f0 41 ff 4d 00 0f 88 99 87 2f
00 <0f> 85 05 fe ff ff 4c 89 ef e8 56 ea ff ff 48 89 d8 5b 5d 41 5c 41
[   60.452888] RSP: 0018:9a5804653ca8 EFLAGS: 00010296 ORIG_RAX:
ff13
[   60.452889] RAX:  RBX: 8f5690fb2480 RCX:
8f5690fb2f00
[   60.452890] RDX: 003e3730 RSI:  RDI:
8f5690fb2180
[   60.452891] RBP: 8f5690fb2180 R08:  R09:
8f5690fb2eb0
[   60.452891] R10:  R11: 8f5660469860 R12:
8f5690fb2f68
[   60.452892] R13: 8f5690fb2f00 R14: 0003 R15:
8f5655a45fc0
[   60.452913] FS:  7fdc5c459980() GS:8f569eb8()
knlGS:
[   60.452913] CS:  0010 DS:  ES:  CR0: 80050033
[   60.452914] CR2: 7f9d74336dd8 CR3: 00084a67e004 CR4:
003606e0
[   60.452915] DR0:  DR1:  DR2:

[   60.452915] DR3:  DR6: fffe0ff0 DR7:
0400
[   60.452916] Call Trace:
[   60.452958]  drm_syncobj_add_point+0x102/0x160 [drm]
[   60.452965]  ? drm_syncobj_fd_to_handle_ioctl+0x1b0/0x1b0 [drm]
[   60.452971]  drm_syncobj_transfer_ioctl+0x10f/0x180 [drm]
[   60.452978]  drm_ioctl_kernel+0xac/0xf0 [drm]
[   60.452984]  drm_ioctl+0x2eb/0x3b0 [drm]
[   60.452990]  ? drm_syncobj_fd_to_handle_ioctl+0x1b0/0x1b0 [drm]
[   60.452992]  ? sw_sync_ioctl+0x347/0x370
[   60.452994]  do_vfs_ioctl+0xa4/0x640
[   60.452995]  ? __fput+0x134/0x220
[   60.452997]  ? do_fcntl+0x1a5/0x650
[   60.452998]  ksys_ioctl+0x70/0x80
[   60.452999]  __x64_sys_ioctl+0x16/0x20
[   60.453002]  do_syscall_64+0x55/0x110
[   60.453004]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   60.453005] RIP: 0033:0x7fdc5b6e45d7
[   60.453006] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00
48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
[   60.453007] RSP: 002b:7fff25c4d198 EFLAGS: 0206 ORIG_RAX:
0010
[   60.453008] RAX: ffda RBX:  RCX:
7fdc5b6e45d7
[   60.453008] RDX: 7fff25c4d200 RSI: c02064cc RDI:
0003
[   60.453009] RBP: 7fff25c4d1d0 R08:  R09:
001e
[   60.453010] R10:  R11: 0206 R12:
563d3959e4d0
[   60.453010] R13: 7fff25c4d620 R14:  R15:

[   88.447359] watchdog: BUG: soft lockup - CPU#6 stuck for 22s!
[syncobj_timelin:2021]

Re: [PATCH 06/11] drm/syncobj: add timeline payload query ioctl v4

2019-02-17 Thread zhoucm1



On 2019年02月17日 03:22, Christian König wrote:

Am 15.02.19 um 20:31 schrieb Lionel Landwerlin via amd-gfx:

On 07/12/2018 09:55, Chunming Zhou wrote:

user mode can query timeline payload.
v2: check return value of copy_to_user
v3: handle querying entry by entry
v4: rebase on new chain container, simplify interface

Signed-off-by: Chunming Zhou 
Cc: Daniel Rakos 
Cc: Jason Ekstrand 
Cc: Bas Nieuwenhuizen 
Cc: Dave Airlie 
Cc: Christian König 
Cc: Chris Wilson 
---
  drivers/gpu/drm/drm_internal.h |  2 ++
  drivers/gpu/drm/drm_ioctl.c    |  2 ++
  drivers/gpu/drm/drm_syncobj.c  | 43 
++

  include/uapi/drm/drm.h | 10 
  4 files changed, 57 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h 
b/drivers/gpu/drm/drm_internal.h

index 18b41e10195c..dab4d5936441 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -184,6 +184,8 @@ int drm_syncobj_reset_ioctl(struct drm_device 
*dev, void *data,

  struct drm_file *file_private);
  int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+    struct drm_file *file_private);
    /* drm_framebuffer.c */
  void drm_framebuffer_print_info(struct drm_printer *p, unsigned 
int indent,

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index a9a17ed35cc4..7578ef6dc1d1 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -681,6 +681,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
    DRM_UNLOCKED|DRM_RENDER_ALLOW),
  DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
    DRM_UNLOCKED|DRM_RENDER_ALLOW),
+    DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
+  DRM_UNLOCKED|DRM_RENDER_ALLOW),
  DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, 
drm_crtc_get_sequence_ioctl, DRM_UNLOCKED),
  DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, 
drm_crtc_queue_sequence_ioctl, DRM_UNLOCKED),
  DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, 
drm_mode_create_lease_ioctl, DRM_MASTER|DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index 348079bb0965..f97fa00ca1d0 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1061,3 +1061,46 @@ drm_syncobj_signal_ioctl(struct drm_device 
*dev, void *data,

    return ret;
  }
+
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+    struct drm_file *file_private)
+{
+    struct drm_syncobj_timeline_array *args = data;
+    struct drm_syncobj **syncobjs;
+    uint64_t __user *points = u64_to_user_ptr(args->points);
+    uint32_t i;
+    int ret;
+
+    if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+    return -ENODEV;
+
+    if (args->pad != 0)
+    return -EINVAL;
+
+    if (args->count_handles == 0)
+    return -EINVAL;
+
+    ret = drm_syncobj_array_find(file_private,
+ u64_to_user_ptr(args->handles),
+ args->count_handles,
+ &syncobjs);
+    if (ret < 0)
+    return ret;
+
+    for (i = 0; i < args->count_handles; i++) {
+    struct dma_fence_chain *chain;
+    struct dma_fence *fence;
+    uint64_t point;
+
+    fence = drm_syncobj_fence_get(syncobjs[i]);
+    chain = to_dma_fence_chain(fence);
+    point = chain ? fence->seqno : 0;



Sorry, I don't want to sound annoying, but it looks like this 
could report values going backward.


Well please be annoying as much as you can :) But yeah all that stuff 
has been discussed before as well.




Anything adding a point X to a timeline that has reached value Y with X 
< Y would trigger that.


Yes, that can indeed happen.

Trigger what? When adding x (x < y), would the query then return 0?
Why would this happen?
No, syncobj->fence should always be there, and always be the last chain 
node, if one was ever added.


-David
But adding a timeline point X which is before the already added point 
Y is illegal in the first place :)


So when the application does something stupid and breaks it can just 
keep the pieces.


In the kernel we still do the most defensive thing and sync to 
everything in this case.


I'm just not sure if we should print an error into syslog or just 
continue silently.


Regards,
Christian.



Either through the submission or userspace signaling or importing 
another syncpoint's fence.



-Lionel



+    ret = copy_to_user(&points[i], &point, sizeof(uint64_t));
+    ret = ret ? -EFAULT : 0;
+    if (ret)
+    break;
+    }
+    drm_syncobj_array_free(syncobjs, args->count_handles);
+
+    return ret;
+}
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 0092111d002c..b2c36f2b2599 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -767,6 +767,14 @@ struct drm_syncobj_array {
  __u32 pad;
  };
  +struct 

Re: [PATCH] drm/ttm: stop always moving BOs on the LRU on page fault

2019-01-13 Thread zhoucm1



On 2019年01月11日 21:15, Christian König wrote:

Move the BO on the LRU only when it is actually moved by a DMA
operation.

Signed-off-by: Christian König 

Tested-And-Reviewed-by: Chunming Zhou 

I just sent the lru_notify v2 patches, please review them. With yours and 
mine together, the OOM issue is fixed without negative side effects.


-David

---
  drivers/gpu/drm/ttm/ttm_bo_vm.c | 19 ---
  1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index a1d977fbade5..e86a29a1e51f 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -71,7 +71,7 @@ static vm_fault_t ttm_bo_vm_fault_idle(struct 
ttm_buffer_object *bo,
ttm_bo_get(bo);
	up_read(&vmf->vma->vm_mm->mmap_sem);
(void) dma_fence_wait(bo->moving, true);
-   ttm_bo_unreserve(bo);
+   reservation_object_unlock(bo->resv);
ttm_bo_put(bo);
goto out_unlock;
}
@@ -131,11 +131,7 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 * for reserve, and if it fails, retry the fault after waiting
 * for the buffer to become unreserved.
 */
-   err = ttm_bo_reserve(bo, true, true, NULL);
-   if (unlikely(err != 0)) {
-   if (err != -EBUSY)
-   return VM_FAULT_NOPAGE;
-
+   if (unlikely(!reservation_object_trylock(bo->resv))) {
if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
ttm_bo_get(bo);
@@ -165,6 +161,8 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
}
  
  	if (bdev->driver->fault_reserve_notify) {

+   struct dma_fence *moving = dma_fence_get(bo->moving);
+
err = bdev->driver->fault_reserve_notify(bo);
switch (err) {
case 0:
@@ -177,6 +175,13 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
ret = VM_FAULT_SIGBUS;
goto out_unlock;
}
+
+   if (bo->moving != moving) {
+   spin_lock(&bdev->glob->lru_lock);
+   ttm_bo_move_to_lru_tail(bo, NULL);
+   spin_unlock(&bdev->glob->lru_lock);
+   }
+   dma_fence_put(moving);
}
  
  	/*

@@ -291,7 +296,7 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
  out_io_unlock:
ttm_mem_io_unlock(man);
  out_unlock:
-   ttm_bo_unreserve(bo);
+   reservation_object_unlock(bo->resv);
return ret;
  }
  


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/2] drm/amdgpu: disable vm fault irq during prt accessed

2019-01-03 Thread zhoucm1

Do we need a dummy page for that?


-David


On 2019年01月03日 17:01, Christian König wrote:

NAK, the problem is not the interrupt.

E.g. causing faults by accessing unmapped pages with the SDMA can 
still crash the MC.


The key point is that SDMA can't work with PRT tiles on pre-gmc9 and 
we need to forbid access on the application side.


Regards,
Christian.

Am 03.01.19 um 09:54 schrieb Chunming Zhou:

For pre-gmc9, UMD can only access unmapped PRT tiles from CB/TC without
firing a VM fault. The kernel would still receive the VM fault interrupt
and output the error message if SDMA is the mc_client.
GMC9 doesn't need the same since it handles PRT in a different way.
We cannot just skip the message for SDMA; as Christian pointed out, a VM
fault could crash the MC block, so we disable the VM fault IRQ while a
PRT range is being accessed.
The negative side is that normal VM faults can be ignored during that
period when the vm_debug kernel parameter is not enabled.

Change-Id: Ic3c62393768eca90e3e45eaf81e7f26f2e91de84
Signed-off-by: Chunming Zhou 
---
  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 6 ++
  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 6 ++
  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 6 ++
  3 files changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c

index dae73f6768c2..175c4b319559 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -486,6 +486,10 @@ static void gmc_v6_0_set_prt(struct 
amdgpu_device *adev, bool enable)

  WREG32(mmVM_PRT_APERTURE1_HIGH_ADDR, high);
  WREG32(mmVM_PRT_APERTURE2_HIGH_ADDR, high);
  WREG32(mmVM_PRT_APERTURE3_HIGH_ADDR, high);
+    /* Note: when vm_debug enabled, vm fault from SDMAx accessing
+ * PRT range is normal. */
+    if (!amdgpu_vm_debug)
+    amdgpu_irq_put(adev, &adev->gmc.vm_fault, 0);
  } else {
  WREG32(mmVM_PRT_APERTURE0_LOW_ADDR, 0xfff);
  WREG32(mmVM_PRT_APERTURE1_LOW_ADDR, 0xfff);
@@ -495,6 +499,8 @@ static void gmc_v6_0_set_prt(struct amdgpu_device 
*adev, bool enable)

  WREG32(mmVM_PRT_APERTURE1_HIGH_ADDR, 0x0);
  WREG32(mmVM_PRT_APERTURE2_HIGH_ADDR, 0x0);
  WREG32(mmVM_PRT_APERTURE3_HIGH_ADDR, 0x0);
+    if (!amdgpu_vm_debug)
+    amdgpu_irq_get(adev, &adev->gmc.vm_fault, 0);
  }
  }
  diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c

index 5bdeb358bfb5..a4d6d219f4e8 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -582,6 +582,10 @@ static void gmc_v7_0_set_prt(struct 
amdgpu_device *adev, bool enable)

  WREG32(mmVM_PRT_APERTURE1_HIGH_ADDR, high);
  WREG32(mmVM_PRT_APERTURE2_HIGH_ADDR, high);
  WREG32(mmVM_PRT_APERTURE3_HIGH_ADDR, high);
+    /* Note: when vm_debug enabled, vm fault from SDMAx accessing
+ * PRT range is normal. */
+    if (!amdgpu_vm_debug)
+    amdgpu_irq_put(adev, &adev->gmc.vm_fault, 0);
  } else {
  WREG32(mmVM_PRT_APERTURE0_LOW_ADDR, 0xfff);
  WREG32(mmVM_PRT_APERTURE1_LOW_ADDR, 0xfff);
@@ -591,6 +595,8 @@ static void gmc_v7_0_set_prt(struct amdgpu_device 
*adev, bool enable)

  WREG32(mmVM_PRT_APERTURE1_HIGH_ADDR, 0x0);
  WREG32(mmVM_PRT_APERTURE2_HIGH_ADDR, 0x0);
  WREG32(mmVM_PRT_APERTURE3_HIGH_ADDR, 0x0);
+    if (!amdgpu_vm_debug)
+    amdgpu_irq_get(adev, &adev->gmc.vm_fault, 0);
  }
  }
  diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c

index 5150ab614eaa..eea2eb7fc2f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -808,6 +808,10 @@ static void gmc_v8_0_set_prt(struct 
amdgpu_device *adev, bool enable)

  WREG32(mmVM_PRT_APERTURE1_HIGH_ADDR, high);
  WREG32(mmVM_PRT_APERTURE2_HIGH_ADDR, high);
  WREG32(mmVM_PRT_APERTURE3_HIGH_ADDR, high);
+    /* Note: when vm_debug enabled, vm fault from SDMAx accessing
+ * PRT range is normal. */
+    if (!amdgpu_vm_debug)
+    amdgpu_irq_put(adev, &adev->gmc.vm_fault, 0);
  } else {
  WREG32(mmVM_PRT_APERTURE0_LOW_ADDR, 0xfff);
  WREG32(mmVM_PRT_APERTURE1_LOW_ADDR, 0xfff);
@@ -817,6 +821,8 @@ static void gmc_v8_0_set_prt(struct amdgpu_device 
*adev, bool enable)

  WREG32(mmVM_PRT_APERTURE1_HIGH_ADDR, 0x0);
  WREG32(mmVM_PRT_APERTURE2_HIGH_ADDR, 0x0);
  WREG32(mmVM_PRT_APERTURE3_HIGH_ADDR, 0x0);
+    if (!amdgpu_vm_debug)
+    amdgpu_irq_get(adev, &adev->gmc.vm_fault, 0);
  }
  }




___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH -next] drm/amdgpu: Fix return value check in amdgpu_allocate_static_csa()

2018-12-03 Thread zhoucm1



On 2018年12月04日 14:39, Wei Yongjun wrote:

Fix the return value check which testing the wrong variable
in amdgpu_allocate_static_csa().

Fixes: 7946340fa389 ("drm/amdgpu: Move csa related code to separate file")
Signed-off-by: Wei Yongjun 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 0c590dd..a5fbc6f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -43,7 +43,7 @@ int amdgpu_allocate_static_csa(struct amdgpu_device *adev, 
struct amdgpu_bo **bo
r = amdgpu_bo_create_kernel(adev, size, PAGE_SIZE,
domain, bo,
NULL, &ptr);
-   if (!bo)
+   if (!r)
return -ENOMEM;
I guess the original is correct as well; if you want to change it, you can 
make it like below, not your 'if (!r)':

                if (r)
                        return r;

-David
  
  	memset(ptr, 0, size);






___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 02/11] dma-buf: add new dma_fence_chain container v2

2018-12-02 Thread zhoucm1



On 2018年11月28日 22:50, Christian König wrote:

Lockless container implementation similar to a dma_fence_array, but with
only two elements per node and automatic garbage collection.

v2: properly document dma_fence_chain_for_each, add dma_fence_chain_find_seqno,
 drop prev reference during garbage collection if it's not a chain fence.

Signed-off-by: Christian König 
---
  drivers/dma-buf/Makefile  |   3 +-
  drivers/dma-buf/dma-fence-chain.c | 235 ++
  include/linux/dma-fence-chain.h   |  79 +
  3 files changed, 316 insertions(+), 1 deletion(-)
  create mode 100644 drivers/dma-buf/dma-fence-chain.c
  create mode 100644 include/linux/dma-fence-chain.h

diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index 0913a6ccab5a..1f006e083eb9 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,4 +1,5 @@
-obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o seqno-fence.o
+obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
+reservation.o seqno-fence.o
  obj-$(CONFIG_SYNC_FILE)   += sync_file.o
  obj-$(CONFIG_SW_SYNC) += sw_sync.o sync_debug.o
  obj-$(CONFIG_UDMABUF) += udmabuf.o
diff --git a/drivers/dma-buf/dma-fence-chain.c 
b/drivers/dma-buf/dma-fence-chain.c
new file mode 100644
index ..de05101fc48d
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -0,0 +1,235 @@
+/*
+ * fence-chain: chain fences together in a timeline
+ *
+ * Copyright (C) 2018 Advanced Micro Devices, Inc.
+ * Authors:
+ * Christian König 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include 
+
+static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
+
+/**
+ * dma_fence_chain_get_prev - use RCU to get a reference to the previous fence
+ * @chain: chain node to get the previous node from
+ *
+ * Use dma_fence_get_rcu_safe to get a reference to the previous fence of the
+ * chain node.
+ */
+static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain 
*chain)
+{
+   struct dma_fence *prev;
+
+   rcu_read_lock();
+   prev = dma_fence_get_rcu_safe(&chain->prev);
+   rcu_read_unlock();
+   return prev;
+}
+
+/**
+ * dma_fence_chain_walk - chain walking function
+ * @fence: current chain node
+ *
+ * Walk the chain to the next node. Returns the next fence or NULL if we are at
+ * the end of the chain. Garbage collects chain nodes which are already
+ * signaled.
+ */
+struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
+{
+   struct dma_fence_chain *chain, *prev_chain;
+   struct dma_fence *prev, *replacement, *tmp;
+
+   chain = to_dma_fence_chain(fence);
+   if (!chain) {
+   dma_fence_put(fence);
+   return NULL;
+   }
+
+   while ((prev = dma_fence_chain_get_prev(chain))) {
+
+   prev_chain = to_dma_fence_chain(prev);
+   if (prev_chain) {
+   if (!dma_fence_is_signaled(prev_chain->fence))
+   break;
+
+   replacement = dma_fence_chain_get_prev(prev_chain);
+   } else {
+   if (!dma_fence_is_signaled(prev))
+   break;
+
+   replacement = NULL;
+   }
+
+   tmp = cmpxchg(&chain->prev, prev, replacement);
+   if (tmp == prev)
+   dma_fence_put(tmp);
+   else
+   dma_fence_put(replacement);
+   dma_fence_put(prev);
+   }
+
+   dma_fence_put(fence);
+   return prev;
+}
+EXPORT_SYMBOL(dma_fence_chain_walk);
+
+/**
+ * dma_fence_chain_find_seqno - find fence chain node by seqno
+ * @pfence: pointer to the chain node where to start
+ * @seqno: the sequence number to search for
+ *
+ * Advance the fence pointer to the chain node which will signal this sequence
+ * number. If no sequence number is provided then this is a no-op.
+ *
+ * Returns EINVAL if the fence is not a chain node or the sequence number has
+ * not yet advanced far enough.
+ */
+int dma_fence_chain_find_seqno(struct dma_fence **pfence, uint64_t seqno)
+{
+   struct dma_fence_chain *chain;
+
+   if (!seqno)
+   return 0;
+
+   chain = to_dma_fence_chain(*pfence);
+   if (!chain || chain->base.seqno < seqno)
+   return -EINVAL;
+
+   dma_fence_chain_for_each(*pfence) {
+   if ((*pfence)->context != chain->base.context ||
+   

Re: [PATCH 04/11] drm/syncobj: use only a single stub fence

2018-11-29 Thread zhoucm1
Could you move this one to dma-fence as you said? It will be used in 
other places as well.


-David
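
For what it's worth, a sketch of the same lazy-init singleton hosted in
dma-fence so other users can share it; the kernel later gained a helper of
this shape as dma_fence_get_stub(), and dma_fence_stub_ops is assumed here
to mirror the syncobj stub ops below:

static DEFINE_SPINLOCK(dma_fence_stub_lock);
static struct dma_fence dma_fence_stub;

/* Lazily initialize and return the shared, already signaled stub fence. */
struct dma_fence *dma_fence_get_stub(void)
{
	spin_lock(&dma_fence_stub_lock);
	if (!dma_fence_stub.ops) {
		dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops,
			       &dma_fence_stub_lock, 0, 0);
		dma_fence_signal_locked(&dma_fence_stub);
	}
	spin_unlock(&dma_fence_stub_lock);

	return dma_fence_get(&dma_fence_stub);
}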


On 2018年11月28日 22:50, Christian König wrote:

Extract of useful code from the timeline work. Let's use just a single
stub fence instance instead of allocating a new one all the time.

Signed-off-by: Chunming Zhou 
Signed-off-by: Christian König 
---
  drivers/gpu/drm/drm_syncobj.c | 67 ++-
  1 file changed, 35 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index b92e3c726229..f78321338c1f 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,10 +56,8 @@
  #include "drm_internal.h"
  #include 
  
-struct drm_syncobj_stub_fence {

-   struct dma_fence base;
-   spinlock_t lock;
-};
+static DEFINE_SPINLOCK(stub_fence_lock);
+static struct dma_fence stub_fence;
  
  static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence)

  {
@@ -71,6 +69,25 @@ static const struct dma_fence_ops drm_syncobj_stub_fence_ops 
= {
.get_timeline_name = drm_syncobj_stub_fence_get_name,
  };
  
+/**

+ * drm_syncobj_get_stub_fence - return a signaled fence
+ *
+ * Return a stub fence which is already signaled.
+ */
+static struct dma_fence *drm_syncobj_get_stub_fence(void)
+{
+   spin_lock(&stub_fence_lock);
+   if (!stub_fence.ops) {
+   dma_fence_init(&stub_fence,
+  &drm_syncobj_stub_fence_ops,
+  &stub_fence_lock,
+  0, 0);
+   dma_fence_signal_locked(&stub_fence);
+   }
+   spin_unlock(&stub_fence_lock);
+
+   return dma_fence_get(&stub_fence);
+}
  
  /**

   * drm_syncobj_find - lookup and reference a sync object.
@@ -188,23 +205,18 @@ void drm_syncobj_replace_fence(struct drm_syncobj 
*syncobj,
  }
  EXPORT_SYMBOL(drm_syncobj_replace_fence);
  
-static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)

+/**
+ * drm_syncobj_assign_null_handle - assign a stub fence to the sync object
+ * @syncobj: sync object to assign the fence on
+ *
+ * Assign a already signaled stub fence to the sync object.
+ */
+static void drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
  {
-   struct drm_syncobj_stub_fence *fence;
-   fence = kzalloc(sizeof(*fence), GFP_KERNEL);
-   if (fence == NULL)
-   return -ENOMEM;
+   struct dma_fence *fence = drm_syncobj_get_stub_fence();
  
-	spin_lock_init(&fence->lock);
-   dma_fence_init(&fence->base, &drm_syncobj_stub_fence_ops,
-  &fence->lock, 0, 0);
-   dma_fence_signal(&fence->base);
-
-   drm_syncobj_replace_fence(syncobj, &fence->base);
-
-   dma_fence_put(&fence->base);
-
-   return 0;
+   drm_syncobj_replace_fence(syncobj, fence);
+   dma_fence_put(fence);
  }
  
  /**

@@ -272,7 +284,6 @@ EXPORT_SYMBOL(drm_syncobj_free);
  int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags,
   struct dma_fence *fence)
  {
-   int ret;
struct drm_syncobj *syncobj;
  
  	syncobj = kzalloc(sizeof(struct drm_syncobj), GFP_KERNEL);

@@ -283,13 +294,8 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, 
uint32_t flags,
INIT_LIST_HEAD(>cb_list);
spin_lock_init(>lock);
  
-	if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) {

-   ret = drm_syncobj_assign_null_handle(syncobj);
-   if (ret < 0) {
-   drm_syncobj_put(syncobj);
-   return ret;
-   }
-   }
+   if (flags & DRM_SYNCOBJ_CREATE_SIGNALED)
+   drm_syncobj_assign_null_handle(syncobj);
  
  	if (fence)

drm_syncobj_replace_fence(syncobj, fence);
@@ -980,11 +986,8 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void 
*data,
if (ret < 0)
return ret;
  
-	for (i = 0; i < args->count_handles; i++) {

-   ret = drm_syncobj_assign_null_handle(syncobjs[i]);
-   if (ret < 0)
-   break;
-   }
+   for (i = 0; i < args->count_handles; i++)
+   drm_syncobj_assign_null_handle(syncobjs[i]);
  
  	drm_syncobj_array_free(syncobjs, args->count_handles);
  


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm 4/5] wrap syncobj timeline query/wait APIs for amdgpu v3

2018-11-29 Thread zhoucm1



On 2018年11月28日 22:50, Christian König wrote:

From: Chunming Zhou 

v2: symbos are stored in lexical order.
v3: drop export/import and extra query indirection

Signed-off-by: Chunming Zhou 
Signed-off-by: Christian König 
---
  amdgpu/amdgpu-symbol-check |  2 ++
  amdgpu/amdgpu.h| 39 +++
  amdgpu/amdgpu_cs.c | 23 +++
  3 files changed, 64 insertions(+)

diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
index 6f5e0f95..4553736f 100755
--- a/amdgpu/amdgpu-symbol-check
+++ b/amdgpu/amdgpu-symbol-check
@@ -49,8 +49,10 @@ amdgpu_cs_submit
  amdgpu_cs_submit_raw
  amdgpu_cs_syncobj_export_sync_file
  amdgpu_cs_syncobj_import_sync_file
+amdgpu_cs_syncobj_query
  amdgpu_cs_syncobj_reset
  amdgpu_cs_syncobj_signal
+amdgpu_cs_syncobj_timeline_wait
  amdgpu_cs_syncobj_wait
  amdgpu_cs_wait_fences
  amdgpu_cs_wait_semaphore
diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index dc51659a..330658a0 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1489,6 +1489,45 @@ int amdgpu_cs_syncobj_wait(amdgpu_device_handle dev,
   int64_t timeout_nsec, unsigned flags,
   uint32_t *first_signaled);
  
+/**

+ *  Wait for one or all sync objects on their points to signal.
+ *
+ * \param   dev- \c [in] self-explanatory
+ * \param   handles - \c [in] array of sync object handles
+ * \param   points - \c [in] array of sync points to wait
+ * \param   num_handles - \c [in] self-explanatory
+ * \param   timeout_nsec - \c [in] self-explanatory
+ * \param   flags   - \c [in] a bitmask of DRM_SYNCOBJ_WAIT_FLAGS_*
+ * \param   first_signaled - \c [out] self-explanatory
+ *
+ * \return   0 on success\n
+ *  -ETIME - Timeout
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_timeline_wait(amdgpu_device_handle dev,
+   uint32_t *handles, uint64_t *points,
+   unsigned num_handles,
+   int64_t timeout_nsec, unsigned flags,
+   uint32_t *first_signaled);
+/**
+ *  Query sync objects payloads.
+ *
+ * \param   dev- \c [in] self-explanatory
+ * \param   handles - \c [in] array of sync object handles
+ * \param   points - \c [out] array of sync points returned, which presents
+ * syncobj payload.
+ * \param   num_handles - \c [in] self-explanatory
+ *
+ * \return   0 on success\n
+ *  -ETIME - Timeout
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_query(amdgpu_device_handle dev,
+   uint32_t *handles, uint64_t *points,
+   unsigned num_handles);
+
  /**
   *  Export kernel sync object to shareable fd.
   *
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 3b8231aa..e4a547c6 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -661,6 +661,29 @@ drm_public int amdgpu_cs_syncobj_wait(amdgpu_device_handle 
dev,
  flags, first_signaled);
  }
  
+drm_public int amdgpu_cs_syncobj_timeline_wait(amdgpu_device_handle dev,

+  uint32_t *handles, uint64_t 
*points,
+  unsigned num_handles,
+  int64_t timeout_nsec, unsigned 
flags,
+  uint32_t *first_signaled)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjTimelineWait(dev->fd, handles, points, num_handles,
+ timeout_nsec, flags, first_signaled);
+}
+
+drm_public int amdgpu_cs_syncobj_query(amdgpu_device_handle dev,
+  uint32_t *handles, uint64_t *points,
This interface is public to the UMD; I think they would like "uint64_t **points" 
for batch query. I've verified this before; it works well and is more convenient.
If we remove num_handles, meaning there is only one syncobj to query, I agree 
with "uint64_t *point".


-David
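
A hypothetical usage sketch of the two wrappers above; h0/h1 stand in for
existing timeline syncobj handles and error handling is omitted:

	uint32_t handles[2] = { h0, h1 };
	uint64_t points[2]  = { 10, 42 };
	uint32_t first;

	/* Block until both timelines reach their target points. */
	amdgpu_cs_syncobj_timeline_wait(dev, handles, points, 2, INT64_MAX,
					DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL,
					&first);

	/* Read the current payload of each timeline back into points[]. */
	amdgpu_cs_syncobj_query(dev, handles, points, 2);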

+  unsigned num_handles)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjQuery(dev->fd, handles, points, num_handles);
+}
+
  drm_public int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
uint32_t handle,
int *shared_fd)


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: restart syncobj timeline changes v2

2018-11-29 Thread zhoucm1
Looks very, very good. I applied them locally and tested with 
./amdgpu_test -s 9 and IGT's syncobj_basic/wait today.


+Daniel, Chris, Eric: could you also have a look?


-David



On 2018年11月28日 22:50, Christian König wrote:

Tested this patch set more extensively in the last two weeks and fixed tons of 
additional bugs.

Still only testing with hand made DRM patches, but those are now rather 
reliable at least on amdgpu. Setting up igt is the next thing on the TODO list.

UAPI seems to be pretty solid already except for two changes:
1. Dropping an extra flag in the wait interface which was default behavior 
anyway.
2. Dropped the extra indirection in the query interface.

Additional to that I'm thinking if we shouldn't replace the flags parameter to 
find_fence() with a timeout value instead to limit how long we want to wait for 
a fence to appear.

Please test and comment,
Christian.

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




Re: [PATCH libdrm 3/5] add timeline wait/query ioctl v2

2018-11-29 Thread zhoucm1



On 2018年11月28日 22:50, Christian König wrote:

From: Chunming Zhou 

v2: drop export/import

Signed-off-by: Chunming Zhou 
---
  xf86drm.c | 44 
  xf86drm.h |  8 
  2 files changed, 52 insertions(+)

diff --git a/xf86drm.c b/xf86drm.c
index 71ad54ba..afa2f466 100644
--- a/xf86drm.c
+++ b/xf86drm.c
@@ -4277,3 +4277,47 @@ drm_public int drmSyncobjSignal(int fd, const uint32_t 
*handles,
  ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_SIGNAL, );
  return ret;
  }
+
+drm_public int drmSyncobjTimelineWait(int fd, uint32_t *handles, uint64_t 
*points,
+ unsigned num_handles,
+ int64_t timeout_nsec, unsigned flags,
+ uint32_t *first_signaled)
+{
+struct drm_syncobj_timeline_wait args;
+int ret;
+
+memclear(args);
+args.handles = (uintptr_t)handles;
+args.points = (uint64_t)(uintptr_t)points;
+args.timeout_nsec = timeout_nsec;
+args.count_handles = num_handles;
+args.flags = flags;
+
+ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, );
+if (ret < 0)
+return -errno;
+
+if (first_signaled)
+*first_signaled = args.first_signaled;
+return ret;
+}
+
+
+drm_public int drmSyncobjQuery(int fd, uint32_t *handles, uint64_t *points,
We should change 'uint64_t *points' to 'uint64_t **points'; otherwise 
userspace always needs an extra copy into its own variables.


-David
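
A sketch of the alternative signature suggested above; hypothetical, not
what this patch implements:

/* Per-handle destination pointers, so each payload is written straight
 * into the caller's own variable without an intermediate copy. */
extern int drmSyncobjQuery(int fd, uint32_t *handles, uint64_t **points,
			   uint32_t handle_count);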

+  uint32_t handle_count)
+{
+struct drm_syncobj_timeline_query args;
+int ret;
+
+memclear(args);
+args.handles = (uintptr_t)handles;
+args.points = (uint64_t)(uintptr_t)points;
+args.count_handles = handle_count;
+
+ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_QUERY, );
+if (ret)
+return ret;
+return 0;
+}
+
+
diff --git a/xf86drm.h b/xf86drm.h
index 7773d71a..2dae1694 100644
--- a/xf86drm.h
+++ b/xf86drm.h
@@ -870,11 +870,19 @@ extern int drmSyncobjFDToHandle(int fd, int obj_fd, 
uint32_t *handle);
  
  extern int drmSyncobjImportSyncFile(int fd, uint32_t handle, int sync_file_fd);

  extern int drmSyncobjExportSyncFile(int fd, uint32_t handle, int 
*sync_file_fd);
+extern int drmSyncobjImportSyncFile2(int fd, uint32_t handle, uint64_t point, 
int sync_file_fd);
+extern int drmSyncobjExportSyncFile2(int fd, uint32_t handle, uint64_t point, 
int *sync_file_fd);
  extern int drmSyncobjWait(int fd, uint32_t *handles, unsigned num_handles,
  int64_t timeout_nsec, unsigned flags,
  uint32_t *first_signaled);
  extern int drmSyncobjReset(int fd, const uint32_t *handles, uint32_t 
handle_count);
  extern int drmSyncobjSignal(int fd, const uint32_t *handles, uint32_t 
handle_count);
+extern int drmSyncobjTimelineWait(int fd, uint32_t *handles, uint64_t *points,
+ unsigned num_handles,
+ int64_t timeout_nsec, unsigned flags,
+ uint32_t *first_signaled);
+extern int drmSyncobjQuery(int fd, uint32_t *handles, uint64_t *points,
+  uint32_t handle_count);
  
  #if defined(__cplusplus)

  }
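
For illustration, a rough userspace sketch combining the two new wrappers
declared above (handle values, points and the timeout below are made up;
error handling is trimmed):

    /* Wait for two timeline syncobjs to reach given points, then read
     * back their current payloads. */
    uint64_t wait_points[2] = { 10, 42 };
    uint64_t cur[2] = { 0 };
    uint32_t first = 0;
    int r;

    r = drmSyncobjTimelineWait(fd, handles, wait_points, 2,
                               INT64_MAX /* effectively no timeout */,
                               DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL, &first);
    if (r >= 0)
        r = drmSyncobjQuery(fd, handles, cur, 2);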




Re: [PATCH 01/11] dma-buf: make fence sequence numbers 64 bit

2018-11-29 Thread zhoucm1



On 2018年11月28日 22:50, Christian König wrote:

For a lot of use cases we need 64bit sequence numbers. Currently drivers
overload the dma_fence structure to store the additional bits.

Stop doing that and make the sequence number in the dma_fence always
64bit.

For compatibility with hardware which can do only 32-bit sequences, the
comparisons in __dma_fence_is_later still take only the lower 32 bits as
significant.

Can't we compare the 64-bit variables directly? Can we do it as below?

-static inline bool __dma_fence_is_later(u32 f1, u32 f2)
+static inline bool __dma_fence_is_later(u64 f1, u64 f2)
 {
-	return (int)(f1 - f2) > 0;
+	return (f1 > f2) ? true : false;
 }

-David
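
For context, a small standalone sketch (not from the patch) of why the
wrap-around-safe form matters for 32-bit sequence numbers, and why a direct
comparison only becomes safe once the full 64 bits are significant:

    #include <stdbool.h>
    #include <stdint.h>

    /* Wrap-around-safe: the unsigned difference is reinterpreted as signed,
     * so a seqno that just wrapped past 0 still compares as "later". */
    static bool is_later32(uint32_t f1, uint32_t f2)
    {
        return (int32_t)(f1 - f2) > 0;
    }

    int main(void)
    {
        uint32_t old = 0xfffffffeu; /* just before the 32-bit wrap */
        uint32_t new = 0x00000002u; /* just after the wrap */

        bool wrapped_ok = is_later32(new, old); /* true: wrap handled */
        bool naive      = new > old;            /* false: misordered */
        return (wrapped_ok && !naive) ? 0 : 1;
    }

With 64-bit sequence numbers a wrap is no longer a practical concern, which is
why the direct comparison suggested above would also work there; per the commit
message, the lower-32-bit comparison is kept only for hardware compatibility.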



Signed-off-by: Christian König 
---
  drivers/dma-buf/dma-fence.c|  2 +-
  drivers/dma-buf/sw_sync.c  |  2 +-
  drivers/dma-buf/sync_file.c|  4 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c |  2 +-
  drivers/gpu/drm/i915/i915_sw_fence.c   |  2 +-
  drivers/gpu/drm/i915/intel_engine_cs.c |  2 +-
  drivers/gpu/drm/vgem/vgem_fence.c  |  4 ++--
  include/linux/dma-fence.h  | 14 +++---
  8 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 1551ca7df394..37e24b69e94b 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -615,7 +615,7 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
   */
  void
  dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
-  spinlock_t *lock, u64 context, unsigned seqno)
+  spinlock_t *lock, u64 context, u64 seqno)
  {
BUG_ON(!lock);
BUG_ON(!ops || !ops->get_driver_name || !ops->get_timeline_name);
diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index 53c1d6d36a64..32dcf7b4c935 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -172,7 +172,7 @@ static bool timeline_fence_enable_signaling(struct 
dma_fence *fence)
  static void timeline_fence_value_str(struct dma_fence *fence,
char *str, int size)
  {
-   snprintf(str, size, "%d", fence->seqno);
+   snprintf(str, size, "%lld", fence->seqno);
  }
  
  static void timeline_fence_timeline_value_str(struct dma_fence *fence,

diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
index 35dd06479867..4f6305ca52c8 100644
--- a/drivers/dma-buf/sync_file.c
+++ b/drivers/dma-buf/sync_file.c
@@ -144,7 +144,7 @@ char *sync_file_get_name(struct sync_file *sync_file, char 
*buf, int len)
} else {
struct dma_fence *fence = sync_file->fence;
  
-		snprintf(buf, len, "%s-%s%llu-%d",

+   snprintf(buf, len, "%s-%s%llu-%lld",
 fence->ops->get_driver_name(fence),
 fence->ops->get_timeline_name(fence),
 fence->context,
@@ -258,7 +258,7 @@ static struct sync_file *sync_file_merge(const char *name, 
struct sync_file *a,
  
  			i_b++;

} else {
-   if (pt_a->seqno - pt_b->seqno <= INT_MAX)
+   if (__dma_fence_is_later(pt_a->seqno, pt_b->seqno))
				add_fence(fences, &i, pt_a);
			else
				add_fence(fences, &i, pt_b);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
index 12f2bf97611f..bfaf5c6323be 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
@@ -388,7 +388,7 @@ void amdgpu_sa_bo_dump_debug_info(struct amdgpu_sa_manager 
*sa_manager,
   soffset, eoffset, eoffset - soffset);
  
  		if (i->fence)

-   seq_printf(m, " protected by 0x%08x on context %llu",
+   seq_printf(m, " protected by 0x%016llx on context %llu",
   i->fence->seqno, i->fence->context);
  
  		seq_printf(m, "\n");

diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
b/drivers/gpu/drm/i915/i915_sw_fence.c
index 6dbeed079ae5..11bcdabd5177 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -393,7 +393,7 @@ static void timer_i915_sw_fence_wake(struct timer_list *t)
if (!fence)
return;
  
-	pr_notice("Asynchronous wait on fence %s:%s:%x timed out (hint:%pS)\n",

+   pr_notice("Asynchronous wait on fence %s:%s:%llx timed out 
(hint:%pS)\n",
  cb->dma->ops->get_driver_name(cb->dma),
  cb->dma->ops->get_timeline_name(cb->dma),
  cb->dma->seqno,
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c 
b/drivers/gpu/drm/i915/intel_engine_cs.c
index 217ed3ee1cab..f28a66c67d34 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1236,7 +1236,7 @@ static void print_request(struct drm_printer *m,
  
  	x = 

Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-26 Thread zhoucm1

Yeah, you need another drm patch as well when you apply my patch. Attached.

-David


On 2018年11月27日 08:40, Sharma, Deepak wrote:


On 11/26/18 1:57 AM, Zhou, David(ChunMing) wrote:



-Original Message-
From: Christian König 
Sent: Monday, November 26, 2018 5:23 PM
To: Sharma, Deepak ; Zhou, David(ChunMing)
; Koenig, Christian ;
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer
dereference

On 26.11.18 at 02:59, Sharma, Deepak wrote:

On 2018/11/24 2:10, Koenig, Christian wrote:

On 23.11.18 at 15:10, Zhou, David(ChunMing) wrote:

On 2018/11/23 21:30, Koenig, Christian wrote:

On 23.11.18 at 14:27, Zhou, David(ChunMing) wrote:

On 2018/11/22 19:25, Christian König wrote:

On 22.11.18 at 07:56, Sharma, Deepak wrote:

when returned fence is not valid mostly due to userspace ignored
previous error causes NULL pointer dereference.

Again, this is clearly incorrect. The my other mails on the
earlier patch.

Sorry, I didn't get the history, but looking at the patch
itself, it is still a valid patch, isn't it?

No, the semantic of amdgpu_ctx_get_fence() is that we return NULL
when the fence is already signaled.

So this patch could totally break userspace because it changes the
behavior when we try to sync to an already signaled fence.

Ah, I got your meaning; how about the attached patch?

Yeah, something like this, but I would just use the
DRM_SYNCOBJ_CREATE_SIGNALED flag instead.

I mean, that's what this flag is good for, isn't it?

Yeah, I initially added a flag when creating the patch, but as you know, there is a

switch case that is not able to use that flag:

    case AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD:
    fd = get_unused_fd_flags(O_CLOEXEC);
    if (fd < 0) {
    dma_fence_put(fence);
    return fd;
    }

    sync_file = sync_file_create(fence);
    dma_fence_put(fence);
    if (!sync_file) {
    put_unused_fd(fd);
    return -ENOMEM;
    }

    fd_install(fd, sync_file->file);
    info->out.handle = fd;
    return 0;

So I changed to a stub fence instead.

Yeah, I've missed that case. Not sure if the sync_file can deal with a NULL
fence.

We should then probably move the stub fence function into
dma_fence_stub.c under drivers/dma-buf to keep the stuff together.

Yes, please wrap it up for review first with your stub fence; we can do that part 
separately.

-David

-David

I have not applied this patch.
The issue I was trying to address is when amdgpu_cs_ioctl() failed due to

low memory (ENOMEM) but userspace chose to proceed and called
amdgpu_cs_fence_to_handle_ioctl().

In amdgpu_cs_fence_to_handle_ioctl() the fence is NULL and later causes a
NULL pointer dereference; this patch was meant to avoid that and the system panic.

But I understand now that returning NULL is a valid case if the fence was already
signaled; we just need to handle that case to avoid the kernel panic. It seems David's
patch should fix this; I will test it tomorrow.

Mhm, but don't we bail out with an error if we ask for the fence of a failed command
submission? If not, that sounds like a bug as well.

Christian.


Where do we do that?
I see error
[drm:amdgpu_cs_ioctl] *ERROR* amdgpu_cs_list_validate(validated) failed.
[drm:amdgpu_cs_ioctl] *ERROR* Not enough memory for command submission!
BUG: unable to handle kernel NULL pointer dereference at 0008
Did some more debugging: dma_fence_is_array() is causing the NULL pointer
dereference, called through sync_file_ioctl.

Also, I think the changes in David's patch can't be applied on
amd-staging-drm-next; which patches should I take to apply it correctly?


-Deepak

Christian.


-David

If that patch was applied then please revert it immediately.

Christian.


-David

If you have already pushed the patch then please revert.

Christian.


Signed-off-by: Deepak Sharma 
---
     drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
     1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 024dfbd87f11..14166cd8a12f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1403,6 +1403,8 @@ static struct dma_fence
*amdgpu_cs_get_fence(struct amdgpu_device *adev,
       fence = amdgpu_ctx_get_fence(ctx, entity, user->seq_no);
     amdgpu_ctx_put(ctx);
+    if(!fence)
+    return ERR_PTR(-EINVAL);
       return fence;
     }
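
For reference, a rough sketch of an always-signaled stub fence along the lines
of the drm_syncobj_stub_fence mentioned above (not the exact kernel code;
locking and context handling are simplified):

    #include <linux/dma-fence.h>
    #include <linux/slab.h>

    static DEFINE_SPINLOCK(stub_lock);

    static const char *stub_name(struct dma_fence *f)
    {
        return "stub";
    }

    static const struct dma_fence_ops stub_ops = {
        .get_driver_name = stub_name,
        .get_timeline_name = stub_name,
    };

    static struct dma_fence *get_stub_fence(void)
    {
        struct dma_fence *f = kzalloc(sizeof(*f), GFP_KERNEL);

        if (!f)
            return ERR_PTR(-ENOMEM);
        dma_fence_init(f, &stub_ops, &stub_lock,
                       dma_fence_context_alloc(1), 1);
        dma_fence_signal(f); /* born signaled: waiters return immediately */
        return f;
    }

Returning such a fence keeps paths like sync_file_create() working even when
the requested fence has long been signaled and dropped.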


Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-21 Thread zhoucm1



On 2018年11月22日 14:56, Sharma, Deepak wrote:

When the returned fence is not valid, mostly because userspace ignored a
previous error, it causes a NULL pointer dereference.

Signed-off-by: Deepak Sharma 

Reviewed-by: Chunming Zhou 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 024dfbd87f11..14166cd8a12f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1403,6 +1403,8 @@ static struct dma_fence *amdgpu_cs_get_fence(struct 
amdgpu_device *adev,
  
  	fence = amdgpu_ctx_get_fence(ctx, entity, user->seq_no);

amdgpu_ctx_put(ctx);
+   if(!fence)
+   return ERR_PTR(-EINVAL);
  
  	return fence;

  }




Re: [PATCH 2/7] drm: add syncobj timeline support v8

2018-10-19 Thread zhoucm1


[snip]

Went boom:

https://bugs.freedesktop.org/show_bug.cgi?id=108490

Can we revert pls?

Sorry for the bug; please do.
In fact, the bug was already caught and fixed; the fix just isn't 
in patch #1, but in patch #2:


Have you reverted? If not, I can send that fix in one minute.


Regards,
David Zhou


Also, can we please have igts for this stuff so that intel-gfx-ci could
test this properly before it's all fireworks?

Hi Daniel V,

Could you point out which problem I'm hitting when I run syncobj_wait of igt?

jenkins@jenkins-MS-7984:~/freedesktop/igt-gpu-tools/tests$ ./syncobj_wait
IGT-Version: 1.23-g94ebd21 (x86_64) (Linux: 4.19.0-rc5-custom+ x86_64)
Test requirement not met in function igt_require_sw_sync, file 
sw_sync.c:240:

Test requirement: kernel_has_sw_sync()
Last errno: 2, No such file or directory

Thanks,
David Zhou
Seems we cannot avoid igt now and the Vulkan CTS isn't enough; I will 
find some time next week to learn IGT. Looks like a v10 is needed.


Regards,
David Zhou


Thanks, Daniel


The rest in the series looks good to me as well,

Can I get your RB on them first?

but I certainly want the radv/anv developers to take a look as 
well as

Daniel suggested.
Ping @Dave/Bas/Jason or other radv/anv developers: could any of you take
a look at the rest of the series for the u/k interface, so that we can move to
the next

step for the libdrm patches?

Thanks,
David

Christian.


Thanks,
David Zhou


-Daniel



Re: [PATCH 2/7] drm: add syncobj timeline support v8

2018-10-19 Thread zhoucm1



On 2018-10-19 17:20, zhoucm1 wrote:



On 2018-10-19 16:55, Daniel Vetter wrote:

On Fri, Oct 19, 2018 at 10:29:55AM +0800, zhoucm1 wrote:


On 2018-10-18 19:50, Christian König wrote:

On 18.10.18 at 05:11, zhoucm1 wrote:


On 2018-10-17 18:24, Daniel Vetter wrote:

On Wed, Oct 17, 2018 at 11:29 AM Koenig, Christian
 wrote:

On 17.10.18 at 11:17, zhoucm1 wrote:

[SNIP]

    +struct drm_syncobj_signal_pt {
+    struct dma_fence_array *base;
Out of curiosity, why the pointer and not embedding? base is kinda
misleading for a pointer.

Yeah, Christian doesn't like the signal_pt lifecycle being the same as the
fence's, so it's a pointer.
If you don't like the 'base' name, I can change it.

Well I never said that you can't embed the fence array into
the signal_pt.

You just need to make sure that we don't affect the drm_syncobj
lifecycle as well, e.g. that we don't also need to keep that
around.

I don't see a problem with that, as long as drm_syncobj keeps a
reference to the fence while it's on the timeline list. Which it
already does. And embedding would avoid that 2nd separate 
allocation,

aside from making base less confusing.

That's indeed what my initial implementation for signal_pt/wait_pt was,
fence based, but after many long discussions we arrived at the current
solution; as you see, the version is up to v8 :).

As for why the pointer and not embedding, two reasons:
1. Their lifecycles are not the same.
2. It is a fence array usage, which always needs a separate
allocation, so that seems mandatory.
So it is a pointer.

But the name is historical from the initial version, and indeed kinda
misleading for a pointer; I will change it to fence_array in the
coming v9.

To avoid running into a v10 I've just pushed this version upstream :)

Thanks a lot.

(This time reply to the right patch, silly me)

Went boom:

https://bugs.freedesktop.org/show_bug.cgi?id=108490

Can we revert pls?

Sorry for the bug; please do.
In fact, the bug was already caught and fixed; the fix just isn't in 
patch #1, but in patch #2:


Have you reverted? If not, I can send that fix in one minute.


Regards,
David Zhou


Also, can we please have igts for this stuff so that intel-gfx-ci could
test this properly before it's all fireworks?
Seems we cannot avoid igt now and the Vulkan CTS isn't enough; I will find 
some time next week to learn IGT. Looks like a v10 is needed.


Regards,
David Zhou


Thanks, Daniel


The rest in the series looks good to me as well,

Can I get your RB on them first?


but I certainly want the radv/anv developers to take a look as well as
Daniel suggested.
Ping @Dave/Bas/Jason or other radv/anv developers: could any of you
take a look at the rest of the series for the u/k interface, so that we can move
to the next

step for the libdrm patches?

Thanks,
David

Christian.


Thanks,
David Zhou


-Daniel



Re: [PATCH 2/7] drm: add syncobj timeline support v8

2018-10-19 Thread zhoucm1



On 2018-10-19 16:55, Daniel Vetter wrote:

On Fri, Oct 19, 2018 at 10:29:55AM +0800, zhoucm1 wrote:


On 2018-10-18 19:50, Christian König wrote:

On 18.10.18 at 05:11, zhoucm1 wrote:


On 2018-10-17 18:24, Daniel Vetter wrote:

On Wed, Oct 17, 2018 at 11:29 AM Koenig, Christian
 wrote:

On 17.10.18 at 11:17, zhoucm1 wrote:

[SNIP]

    +struct drm_syncobj_signal_pt {
+    struct dma_fence_array *base;

Out of curiosity, why the pointer and not embedding? base is kinda
misleading for a pointer.

Yeah, Christian doesn't like the signal_pt lifecycle being the same as the
fence's, so it's a pointer.
If you don't like the 'base' name, I can change it.

Well I never said that you can't embed the fence array into
the signal_pt.

You just need to make sure that we don't affect the drm_syncobj
lifecycle as well, e.g. that we don't also need to keep that around.

I don't see a problem with that, as long as drm_syncobj keeps a
reference to the fence while it's on the timeline list. Which it
already does. And embedding would avoid that 2nd separate allocation,
aside from making base less confusing.

That's indeed what my initial implementation for signal_pt/wait_pt was,
fence based, but after many long discussions we arrived at the current
solution; as you see, the version is up to v8 :).

As for why the pointer and not embedding, two reasons:
1. Their lifecycles are not the same.
2. It is a fence array usage, which always needs a separate
allocation, so that seems mandatory.
So it is a pointer.

But the name is historical from the initial version, and indeed kinda
misleading for a pointer; I will change it to fence_array in the
coming v9.

To avoid running into a v10 I've just pushed this version upstream :)

Thanks a lot.

(This time reply to the right patch, silly me)

Went boom:

https://bugs.freedesktop.org/show_bug.cgi?id=108490

Can we revert pls?

Sorry for the bug; please do.


Also, can we please have igts for this stuff so that intel-gfx-ci could
test this properly before it's all fireworks?
Seems we cannot avoid igt now and the Vulkan CTS isn't enough; I will find 
some time next week to learn IGT. Looks like a v10 is needed.


Regards,
David Zhou


Thanks, Daniel


The rest in the series looks good to me as well,

Can I get your RB on them first?


but I certainly want the radv/anv developers to take a look as well as
Daniel suggested.

Ping @Dave/Bas/Jason or other radv/anv developers: could any of you take
a look at the rest of the series for the u/k interface, so that we can move to
the next step for the libdrm patches?

Thanks,
David

Christian.


Thanks,
David Zhou


-Daniel



Re: [PATCH 2/7] drm: add syncobj timeline support v8

2018-10-18 Thread zhoucm1



On 2018-10-18 19:50, Christian König wrote:

On 18.10.18 at 05:11, zhoucm1 wrote:



On 2018-10-17 18:24, Daniel Vetter wrote:

On Wed, Oct 17, 2018 at 11:29 AM Koenig, Christian
 wrote:

On 17.10.18 at 11:17, zhoucm1 wrote:

[SNIP]

   +struct drm_syncobj_signal_pt {
+    struct dma_fence_array *base;

Out of curiosity, why the pointer and not embedding? base is kinda
misleading for a pointer.

Yeah, Christian doesn't like the signal_pt lifecycle being the same as the
fence's, so it's a pointer.
If you don't like the 'base' name, I can change it.
Well I never said that you can't embed the fence array into the 
signal_pt.


You just need to make sure that we don't affect the drm_syncobj
lifecycle as well, e.g. that we don't also need to keep that around.

I don't see a problem with that, as long as drm_syncobj keeps a
reference to the fence while it's on the timeline list. Which it
already does. And embedding would avoid that 2nd separate allocation,
aside from making base less confusing.
That's indeed what my initial implementation for signal_pt/wait_pt was, 
fence based, but after many long discussions we arrived at the current 
solution; as you see, the version is up to v8 :).


As for why the pointer and not embedding, two reasons:
1. Their lifecycles are not the same.
2. It is a fence array usage, which always needs a separate allocation, 
so that seems mandatory.

So it is a pointer.

But the name is historical from the initial version, and indeed kinda 
misleading for a pointer; I will change it to fence_array in the 
coming v9.


To avoid running into a v10 I've just pushed this version upstream :)

Thanks a lot.


The rest in the series looks good to me as well,

Can I get your RB on them first?

but I certainly want the radv/anv developers to take a look as well as 
Daniel suggested.
Ping @Dave/Bas/Jason or other radv/anv developers: could any of you 
take a look at the rest of the series for the u/k interface, so that we can 
move to the next step for the libdrm patches?


Thanks,
David


Christian.



Thanks,
David Zhou


-Daniel




Re: [PATCH] drm/amdgpu: fix amdgpu_vm_fini

2018-10-18 Thread zhoucm1



On 2018-10-18 20:31, Christian König wrote:

We should not remove mappings in rbtree_postorder_for_each_entry_safe
because that rebalances the tree.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 6904d794d60a..01d94de6a6a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -3235,7 +3235,6 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct 
amdgpu_vm *vm)
	rbtree_postorder_for_each_entry_safe(mapping, tmp,
					     &vm->va.rb_root, rb) {
		list_del(&mapping->list);
-		amdgpu_vm_it_remove(mapping, &vm->va);
kfree(mapping);
At least we should add a comment here, e.g. that the rb_root becomes 
invalid and we cannot use it any more, although that is the fact.

Or, as you suggested to me before, we can change it to while(next_node)... :)

Anyway, whichever you prefer: Reviewed-by: Chunming Zhou 


}
	list_for_each_entry_safe(mapping, tmp, &vm->freed, list) {


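
For reference, a sketch of the while(next_node)-style alternative mentioned
above, matching the earlier "while (node = rb_first(...))" idea from the
timeline series; this is not the committed fix, and it assumes vm->va is the
cached rb root used by the interval tree:

    struct rb_node *node;

    /* Always re-read the tree root, so every removal can rebalance
     * the tree safely before the next lookup. */
    while ((node = rb_first_cached(&vm->va))) {
        struct amdgpu_bo_va_mapping *mapping =
            rb_entry(node, struct amdgpu_bo_va_mapping, rb);

        amdgpu_vm_it_remove(mapping, &vm->va);
        list_del(&mapping->list);
        kfree(mapping);
    }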


Re: [PATCH 2/7] drm: add syncobj timeline support v8

2018-10-17 Thread zhoucm1



On 2018-10-17 18:24, Daniel Vetter wrote:

On Wed, Oct 17, 2018 at 11:29 AM Koenig, Christian
 wrote:

On 17.10.18 at 11:17, zhoucm1 wrote:

[SNIP]

   +struct drm_syncobj_signal_pt {
+struct dma_fence_array *base;

Out of curiosity, why the pointer and not embedding? base is kinda
misleading for a pointer.

Yeah, Christian doesn't like the signal_pt lifecycle being the same as the
fence's, so it's a pointer.
If you don't like the 'base' name, I can change it.

Well I never said that you can't embed the fence array into the signal_pt.

You just need to make sure that we don't affect the drm_syncobj
lifecycle as well, e.g. that we don't also need to keep that around.

I don't see a problem with that, as long as drm_syncobj keeps a
reference to the fence while it's on the timeline list. Which it
already does. And embedding would avoid that 2nd separate allocation,
aside from making base less confusing.
That's indeed what my initial implementation for signal_pt/wait_pt was, fence 
based, but after many long discussions we arrived at the current solution; as 
you see, the version is up to v8 :).


As for why the pointer and not embedding, two reasons:
1. Their lifecycles are not the same.
2. It is a fence array usage, which always needs a separate allocation, 
so that seems mandatory.

So it is a pointer.

But the name is historical from the initial version, and indeed kinda misleading 
for a pointer; I will change it to fence_array in the coming v9.


Thanks,
David Zhou
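
To illustrate the two layouts being debated, a sketch (field names other than
base/value/list are assumptions, not from the series):

    /* v8 variant: the signal point only holds a pointer; the fence
     * array's refcounted lifetime stays independent of the list node. */
    struct drm_syncobj_signal_pt {
        struct dma_fence_array *fence_array;
        u64 value;              /* timeline point */
        struct list_head list;
    };

    /* Embedded variant suggested by Daniel: saves the second allocation
     * and makes the member name unambiguous, but ties freeing the node
     * to the last fence reference going away; and dma_fence_array_create()
     * always allocates its own object anyway (David's point 2 above). */
    struct drm_syncobj_signal_pt_embedded {
        struct dma_fence_array fence_array;
        u64 value;
        struct list_head list;
    };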


-Daniel




Re: [PATCH 2/7] drm: add syncobj timeline support v8

2018-10-17 Thread zhoucm1

+Jason as well.


On 2018-10-17 18:22, Daniel Vetter wrote:

On Wed, Oct 17, 2018 at 11:17 AM zhoucm1  wrote:



On 2018-10-17 16:09, Daniel Vetter wrote:

On Mon, Oct 15, 2018 at 04:55:48PM +0800, Chunming Zhou wrote:

This patch is for VK_KHR_timeline_semaphore extension, semaphore is called 
syncobj in kernel side:
This extension introduces a new type of syncobj that has an integer payload
identifying a point in a timeline. Such timeline syncobjs support the
following operations:
 * CPU query - A host operation that allows querying the payload of the
   timeline syncobj.
 * CPU wait - A host operation that allows a blocking wait for a
   timeline syncobj to reach a specified value.
 * Device wait - A device operation that allows waiting for a
   timeline syncobj to reach a specified value.
 * Device signal - A device operation that allows advancing the
   timeline syncobj to a specified value.

v1:
Since it's a timeline, the earlier time point (PT) is always signaled
before the later PT.
a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation fence;
when the PT[N] fence is signaled, the timeline will increase to the value of PT[N].
b. wait PT design:
A wait PT fence is signaled by the timeline reaching its point value: when the
timeline increases, the wait PTs' values are compared with the new timeline value,
and if a PT value is lower than the timeline value the wait PT is signaled,
otherwise it is kept in the list. A syncobj wait operation can wait on any point
of the timeline, so an RB tree is needed to order them. And a wait PT could be
ahead of the signal PT, so we need a submission fence to perform that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to the .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate patch. 
(Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate patch.
5. drop the submission_fence implementation and instead use wait_event() for 
that. (Christian)
6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and Christian)
  a. normal syncobj signal op will create a signal PT to tail of signal pt 
list.
  b. normal syncobj wait op will create a wait pt with last signal point, 
and this wait PT is only signaled by related signal point PT.
2. many bug fix and clean up
3. stub fence moving is moved to other patch.

v4:
1. fix RB tree loop with while(node=rb_first(...)). (Christian)
2. fix syncobj lifecycle. (Christian)
3. only enable_signaling when there is wait_pt. (Christian)
4. fix timeline path issues.
5. write a timeline test in libdrm

v5: (Christian)
1. semaphore is called syncobj in kernel side.
2. don't need 'timeline' characters in some function name.
3. keep syncobj cb.

v6: (Christian)
1. merge syncobj_timeline to syncobj structure.
2. simplify some check sentences.
3. some misc change.
4. fix CTS failed issue.

v7: (Christian)
1. error handling when creating signal pt.
2. remove timeline naming in func.
3. export flags in find_fence.
4. allow reset timeline.

v8:
1. use wait_event_interruptible without timeout
2. rename _TYPE_INDIVIDUAL to _TYPE_BINARY

individual syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore*
timeline syncobj is tested by ./amdgpu_test -s 9

Can we please have these low-level syncobj tests as part of igt, together
with all the other syncobj tests which are there already?

Good suggestion; I'm just not familiar with igt (build, run
commands...), maybe we can add it later.


Really doesn't
make much sense imo to split things on the test suite front.

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
Reviewed-by: Christian König 
---
   drivers/gpu/drm/drm_syncobj.c  | 287 ++---
   drivers/gpu/drm/i915/i915_gem_execbuffer.c |   2 +-
   include/drm/drm_syncobj.h  |  65 ++---
   include/uapi/drm/drm.h |   1 +
   4 files changed, 281 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index f796c9fc3858..67472bd77c83 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,9 @@
   #include "drm_internal.h"
   #include 

+/* merge normal syncobj to timeline syncobj, the point interval is 1 */
+#define DRM_SYNCOBJ_BINARY_POINT 1
+
   struct drm_syncobj_stub_fence {
  struct dma_fence base;
  spinlock_t lock;
@@ -82,6 +85,11 @@ static const struct dma_fence_ops drm_syncobj_stub_fence_ops 
= {
  .release = drm_syncobj_stub_fence_release,
   };

+struct drm_syncobj_signal_pt {
+struct dma_fence_array *base;

Out of curiosity, why the pointer and not embedding? base is kinda
misleading for a pointer.

Yeah, Christian doesn't like the signal_pt lifecycle being the same as the
fence's, so it's a pointer.
If you don't like the 'base' name, I can change it.

Re: [PATCH 2/7] drm: add syncobj timeline support v8

2018-10-17 Thread zhoucm1



On 2018-10-17 16:09, Daniel Vetter wrote:

On Mon, Oct 15, 2018 at 04:55:48PM +0800, Chunming Zhou wrote:

This patch is for VK_KHR_timeline_semaphore extension, semaphore is called 
syncobj in kernel side:
This extension introduces a new type of syncobj that has an integer payload
identifying a point in a timeline. Such timeline syncobjs support the
following operations:
* CPU query - A host operation that allows querying the payload of the
  timeline syncobj.
* CPU wait - A host operation that allows a blocking wait for a
  timeline syncobj to reach a specified value.
* Device wait - A device operation that allows waiting for a
  timeline syncobj to reach a specified value.
* Device signal - A device operation that allows advancing the
  timeline syncobj to a specified value.

v1:
Since it's a timeline, the earlier time point (PT) is always signaled
before the later PT.
a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation fence;
when the PT[N] fence is signaled, the timeline will increase to the value of PT[N].
b. wait PT design:
A wait PT fence is signaled by the timeline reaching its point value: when the
timeline increases, the wait PTs' values are compared with the new timeline value,
and if a PT value is lower than the timeline value the wait PT is signaled,
otherwise it is kept in the list. A syncobj wait operation can wait on any point
of the timeline, so an RB tree is needed to order them. And a wait PT could be
ahead of the signal PT, so we need a submission fence to perform that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to the .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate patch. 
(Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate patch.
5. drop the submission_fence implementation and instead use wait_event() for 
that. (Christian)
6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and Christian)
 a. normal syncobj signal op will create a signal PT to tail of signal pt 
list.
 b. normal syncobj wait op will create a wait pt with last signal point, 
and this wait PT is only signaled by related signal point PT.
2. many bug fix and clean up
3. stub fence moving is moved to other patch.

v4:
1. fix RB tree loop with while(node=rb_first(...)). (Christian)
2. fix syncobj lifecycle. (Christian)
3. only enable_signaling when there is wait_pt. (Christian)
4. fix timeline path issues.
5. write a timeline test in libdrm

v5: (Christian)
1. semaphore is called syncobj in kernel side.
2. don't need 'timeline' characters in some function name.
3. keep syncobj cb.

v6: (Christian)
1. merge syncobj_timeline to syncobj structure.
2. simplify some check sentences.
3. some misc change.
4. fix CTS failed issue.

v7: (Christian)
1. error handling when creating signal pt.
2. remove timeline naming in func.
3. export flags in find_fence.
4. allow reset timeline.

v8:
1. use wait_event_interruptible without timeout
2. rename _TYPE_INDIVIDUAL to _TYPE_BINARY

individual syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore*
timeline syncobj is tested by ./amdgpu_test -s 9

Can we please have these low-level syncobj tests as part of igt, together
with all the other syncobj tests which are there already?
Good suggestion; I'm just not familiar with igt (build, run 
commands...), maybe we can add it later.



Really doesn't
make much sense imo to split things on the test suite front.

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
Reviewed-by: Christian König 
---
  drivers/gpu/drm/drm_syncobj.c  | 287 ++---
  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   2 +-
  include/drm/drm_syncobj.h  |  65 ++---
  include/uapi/drm/drm.h |   1 +
  4 files changed, 281 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index f796c9fc3858..67472bd77c83 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,9 @@
  #include "drm_internal.h"
  #include 
  
+/* merge normal syncobj to timeline syncobj, the point interval is 1 */

+#define DRM_SYNCOBJ_BINARY_POINT 1
+
  struct drm_syncobj_stub_fence {
struct dma_fence base;
spinlock_t lock;
@@ -82,6 +85,11 @@ static const struct dma_fence_ops drm_syncobj_stub_fence_ops 
= {
.release = drm_syncobj_stub_fence_release,
  };
  
+struct drm_syncobj_signal_pt {

+   struct dma_fence_array *base;

Out of curiosity, why the pointer and not embedding? base is kinda
misleading for a pointer.
Yeah, Christian doesn't like the signal_pt lifecycle being the same as the 
fence's, so it's a pointer.

If you don't like 'base' name, I can change it.




+	u64 value;
+   struct list_head list;
+};
  
  /**

   * drm_syncobj_find - lookup and reference a 

Re: [PATCH 7/7] drm/amdgpu: update version for timeline syncobj support in amdgpu

2018-10-17 Thread zhoucm1



On 2018-10-16 20:54, Christian König wrote:

I've added my rb to patch #1 and pushed it to drm-misc-next.

I would really like to get an rb from other people on patch #2 before 
proceeding.


Daniel, Dave and all the other usual suspects on the list, what is your 
opinion on this implementation?
Thanks for the heads-up. @Daniel, @Dave, or others, could you take a look 
at the series?


Thanks,
David


Christian.

On 15.10.2018 at 11:04, Koenig, Christian wrote:

I'm on sick leave today.

But I will see what I can do later in the afternoon,
Christian.

On 15.10.2018 at 11:01, Zhou, David(ChunMing) wrote:

Ping...
Christian, could I get your RB on the series? And could you help me push it to 
drm-misc?

After that I can rebase the libdrm header file on drm-next.

Thanks,
David Zhou


-Original Message-
From: amd-gfx  On Behalf Of
Chunming Zhou
Sent: Monday, October 15, 2018 4:56 PM
To: dri-de...@lists.freedesktop.org
Cc: Zhou, David(ChunMing) ; amd-
g...@lists.freedesktop.org
Subject: [PATCH 7/7] drm/amdgpu: update version for timeline syncobj
support in amdgpu

Signed-off-by: Chunming Zhou 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 6870909da926..58cba492ba55 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -70,9 +70,10 @@
    * - 3.25.0 - Add support for sensor query info (stable pstate 
sclk/mclk).

    * - 3.26.0 - GFX9: Process AMDGPU_IB_FLAG_TC_WB_NOT_INVALIDATE.
    * - 3.27.0 - Add new chunk to to AMDGPU_CS to enable BO_LIST 
creation.

+ * - 3.28.0 - Add syncobj timeline support to AMDGPU_CS.
    */
   #define KMS_DRIVER_MAJOR    3
-#define KMS_DRIVER_MINOR    27
+#define KMS_DRIVER_MINOR    28
   #define KMS_DRIVER_PATCHLEVEL    0

   int amdgpu_vram_limit = 0;
--
2.17.1



Re: [PATCH 1/4] [RFC]drm: add syncobj timeline support v6

2018-09-19 Thread zhoucm1



On 2018-09-19 16:07, Christian König wrote:

On 19.09.2018 at 10:03, Zhou, David(ChunMing) wrote:



-Original Message-
From: amd-gfx  On Behalf Of
Christian K?nig
Sent: Wednesday, September 19, 2018 3:45 PM
To: Zhou, David(ChunMing) ; Zhou,
David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; Daniel Vetter ; amd-
g...@lists.freedesktop.org
Subject: Re: [PATCH 1/4] [RFC]drm: add syncobj timeline support v6

On 19.09.2018 at 09:32, zhoucm1 wrote:


On 2018-09-19 15:18, Christian König wrote:

On 19.09.2018 at 06:26, Chunming Zhou wrote:

[snip]

    *fence = NULL;
    drm_syncobj_add_callback_locked(syncobj, cb, func);
@@ -164,6 +177,153 @@ void drm_syncobj_remove_callback(struct drm_syncobj *syncobj,
    spin_unlock(&syncobj->lock);
    }
    +static void drm_syncobj_timeline_init(struct drm_syncobj *syncobj)

We still have _timeline_ in the name here.

The func is relevant to the timeline members; which name would be proper?
Yeah, but we now use the timeline implementation for the individual 
syncobj

as well.

Not a big issue, but I would just name it
drm_syncobj_init()/drm_syncobj_fini.
There are already drm_syncobj_init/fini in drm_syncobj.c; can any other 
name be suggested?


Hui what? I actually checked that there is no 
drm_syncobj_init()/drm_syncobj_fini() in drm_syncobj.c before 
suggesting it. Am I missing something?

I mixed up syncobj_create/destroy in my head :(




+{
+    spin_lock(&syncobj->lock);
+    syncobj->timeline_context = dma_fence_context_alloc(1);

[snip]

+}
+
+int drm_syncobj_lookup_fence(struct drm_syncobj *syncobj, u64
+point,
+   struct dma_fence **fence) {
+
+    return drm_syncobj_search_fence(syncobj, point,
+ DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,

I still have a bad feeling about setting that flag as default because it
might change the behavior for the UAPI.

Maybe export drm_syncobj_search_fence directly? E.g. with the flags
parameter.

The previous v5 indeed did this; you let me wrap it. Do I need to change it back?

No, the problem is that drm_syncobj_find_fence() is still using
drm_syncobj_lookup_fence() which sets the flag instead of
drm_syncobj_search_fence() without the flag.

That changes the UAPI behavior because previously we would have 
returned

an error code and now we block for a fence to appear.

So I think the right solution would be to add the flags parameter to
drm_syncobj_find_fence() and let the driver decide if we need to 
block or

get -ENOENT.

Got your meaning.
Exporting the flag in the func is easy,
  but when the driver doesn't pass a flag, which flag is proper by default? We 
still need to give a default flag in the patch, don't we?


Well, the proper solution is to keep the old behavior as it is for now.

So passing 0 as the flag by default and making sure we get -ENOENT in 
that case sounds like the right approach to me.


Adding the DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT flag can happen when 
the driver starts to provide a proper point as well.

Sounds more flexible.

v7 will come soon.

thanks,
David Zhou
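
For clarity, a sketch of the shape being agreed on here (hedged: this is not
the final kernel code, it only shows the flags parameter deciding between
blocking and failing fast with -ENOENT):

    int drm_syncobj_find_fence(struct drm_file *file_private,
                               u32 handle, u64 point, u64 flags,
                               struct dma_fence **fence)
    {
        struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
        int ret;

        if (!syncobj)
            return -ENOENT;

        /* flags == 0 keeps the old UAPI behavior: fail fast when no
         * fence exists yet. DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT
         * instead waits for the point to appear. */
        ret = drm_syncobj_search_fence(syncobj, point, flags, fence);
        drm_syncobj_put(syncobj);
        return ret;
    }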


Christian.



Thanks,
David Zhou


Regards,
Christian.


Regards,
David Zhou

Regards,
Christian.


+    fence);
+}
+EXPORT_SYMBOL(drm_syncobj_lookup_fence);
+
   /**
    * drm_syncobj_find_fence - lookup and reference the fence in a
sync object
    * @file_private: drm file private pointer @@ -228,7 +443,7 @@
static int drm_syncobj_assign_null_handle(struct
drm_syncobj *syncobj)
    * @fence: out parameter for the fence
    *
    * This is just a convenience function that combines
drm_syncobj_find() and
- * drm_syncobj_fence_get().
+ * drm_syncobj_lookup_fence().
    *
    * Returns 0 on success or a negative error value on failure. On
success @fence
    * contains a reference to the fence, which must be released by
calling @@ -236,18 +451,11 @@ static int
drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
    */
   int drm_syncobj_find_fence(struct drm_file *file_private,
  u32 handle, u64 point,
-   struct dma_fence **fence) -{
+   struct dma_fence **fence) {
   struct drm_syncobj *syncobj = drm_syncobj_find(file_private,
handle);
-    int ret = 0;
-
-    if (!syncobj)
-    return -ENOENT;
+    int ret;
   -    *fence = drm_syncobj_fence_get(syncobj);
-    if (!*fence) {
-    ret = -EINVAL;
-    }
+    ret = drm_syncobj_lookup_fence(syncobj, point, fence);
   drm_syncobj_put(syncobj);
   return ret;
   }
@@ -264,7 +472,7 @@ void drm_syncobj_free(struct kref *kref)
   struct drm_syncobj *syncobj = container_of(kref,
  struct drm_syncobj,
  refcount);
-    drm_syncobj_replace_fence(syncobj, 0, NULL);
+    drm_syncobj_timeline_fini(syncobj);
   kfree(syncobj);
   }
   EXPORT_SYMBOL(drm_syncobj_free);
@@ -294,6 +502,11 @@ int drm_syncobj_create(struct drm_syncobj
**out_syncobj, uint32_t flags,
   kref_init(&syncobj->refcount);
   INIT_LIST_HEAD(&syncobj->cb_list);
   spin_lock_init(&syncobj->loc

Re: [PATCH 1/4] [RFC]drm: add syncobj timeline support v6

2018-09-19 Thread zhoucm1



On 2018-09-19 15:18, Christian König wrote:

On 19.09.2018 at 06:26, Chunming Zhou wrote:

[snip]

  *fence = NULL;
  drm_syncobj_add_callback_locked(syncobj, cb, func);
@@ -164,6 +177,153 @@ void drm_syncobj_remove_callback(struct drm_syncobj *syncobj,

  spin_unlock(&syncobj->lock);
  }
  +static void drm_syncobj_timeline_init(struct drm_syncobj *syncobj)


We still have _timeline_ in the name here.

The func is relevant to the timeline members; which name would be proper?




+{
+    spin_lock(&syncobj->lock);
+    syncobj->timeline_context = dma_fence_context_alloc(1);

[snip]

+}
+
+int drm_syncobj_lookup_fence(struct drm_syncobj *syncobj, u64 point,
+   struct dma_fence **fence) {
+
+    return drm_syncobj_search_fence(syncobj, point,
+    DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,


I still have a bad feeling about setting that flag as default because it might 
change the behavior for the UAPI.


Maybe export drm_syncobj_search_fence directly? E.g. with the flags 
parameter.

The previous v5 indeed did this; you let me wrap it. Do I need to change it back?

Regards,
David Zhou


Regards,
Christian.


+    fence);
+}
+EXPORT_SYMBOL(drm_syncobj_lookup_fence);
+
  /**
   * drm_syncobj_find_fence - lookup and reference the fence in a 
sync object

   * @file_private: drm file private pointer
@@ -228,7 +443,7 @@ static int drm_syncobj_assign_null_handle(struct 
drm_syncobj *syncobj)

   * @fence: out parameter for the fence
   *
   * This is just a convenience function that combines 
drm_syncobj_find() and

- * drm_syncobj_fence_get().
+ * drm_syncobj_lookup_fence().
   *
   * Returns 0 on success or a negative error value on failure. On 
success @fence
   * contains a reference to the fence, which must be released by 
calling
@@ -236,18 +451,11 @@ static int 
drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)

   */
  int drm_syncobj_find_fence(struct drm_file *file_private,
 u32 handle, u64 point,
-   struct dma_fence **fence)
-{
+   struct dma_fence **fence) {
  struct drm_syncobj *syncobj = drm_syncobj_find(file_private, 
handle);

-    int ret = 0;
-
-    if (!syncobj)
-    return -ENOENT;
+    int ret;
  -    *fence = drm_syncobj_fence_get(syncobj);
-    if (!*fence) {
-    ret = -EINVAL;
-    }
+    ret = drm_syncobj_lookup_fence(syncobj, point, fence);
  drm_syncobj_put(syncobj);
  return ret;
  }
@@ -264,7 +472,7 @@ void drm_syncobj_free(struct kref *kref)
  struct drm_syncobj *syncobj = container_of(kref,
 struct drm_syncobj,
 refcount);
-    drm_syncobj_replace_fence(syncobj, 0, NULL);
+    drm_syncobj_timeline_fini(syncobj);
  kfree(syncobj);
  }
  EXPORT_SYMBOL(drm_syncobj_free);
@@ -294,6 +502,11 @@ int drm_syncobj_create(struct drm_syncobj 
**out_syncobj, uint32_t flags,

  kref_init(&syncobj->refcount);
  INIT_LIST_HEAD(&syncobj->cb_list);
  spin_lock_init(&syncobj->lock);
+    if (flags & DRM_SYNCOBJ_CREATE_TYPE_TIMELINE)
+    syncobj->type = DRM_SYNCOBJ_TYPE_TIMELINE;
+    else
+    syncobj->type = DRM_SYNCOBJ_TYPE_INDIVIDUAL;
+    drm_syncobj_timeline_init(syncobj);
    if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) {
  ret = drm_syncobj_assign_null_handle(syncobj);
@@ -576,7 +789,8 @@ drm_syncobj_create_ioctl(struct drm_device *dev, 
void *data,

  return -ENODEV;
    /* no valid flags yet */
-    if (args->flags & ~DRM_SYNCOBJ_CREATE_SIGNALED)
+    if (args->flags & ~(DRM_SYNCOBJ_CREATE_SIGNALED |
+    DRM_SYNCOBJ_CREATE_TYPE_TIMELINE))
  return -EINVAL;
    return drm_syncobj_create_as_handle(file_private,
@@ -669,9 +883,8 @@ static void syncobj_wait_syncobj_func(struct 
drm_syncobj *syncobj,

  struct syncobj_wait_entry *wait =
  container_of(cb, struct syncobj_wait_entry, syncobj_cb);
  -    /* This happens inside the syncobj lock */
-    wait->fence = dma_fence_get(rcu_dereference_protected(syncobj->fence,
-                                lockdep_is_held(&syncobj->lock)));
+    drm_syncobj_search_fence(syncobj, 0, 0, &wait->fence);
+
  wake_up_process(wait->task);
  }
  @@ -698,7 +911,8 @@ static signed long 
drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,

  signaled_count = 0;
  for (i = 0; i < count; ++i) {
  entries[i].task = current;
-    entries[i].fence = drm_syncobj_fence_get(syncobjs[i]);
+    ret = drm_syncobj_search_fence(syncobjs[i], 0, 0,
+                                   &entries[i].fence);
  if (!entries[i].fence) {
  if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) {
  continue;
@@ -970,12 +1184,19 @@ drm_syncobj_reset_ioctl(struct drm_device 
*dev, void *data,

  if (ret < 0)
  return ret;
  -    for (i = 0; i < args->count_handles; i++)
-    drm_syncobj_replace_fence(syncobjs[i], 0, NULL);
-
+    for (i = 0; i < args->count_handles; i++) {
+    if (syncobjs[i]->type == DRM_SYNCOBJ_TYPE_TIMELINE) {
+    

Re: [PATCH] [RFC]drm: add syncobj timeline support v5

2018-09-18 Thread zhoucm1



On 2018-09-18 16:32, Christian König wrote:

+    for (i = 0; i < args->count_handles; i++) {
+    if (syncobjs[i]->type == DRM_SYNCOBJ_TYPE_TIMELINE) {
+    DRM_ERROR("timeline syncobj cannot reset!\n");


Why not? I mean that should still work or do I miss anything?
The timeline semaphore spec doesn't require a reset interface; it says the 
timeline value can only be changed by signal operations.


Yeah, but we don't care about the timeline spec in the kernel.

The question is rather whether it still makes sense to support that, and as 
far as I can see it should be trivial to reinitialize the object. 

Hi Daniel Rakos,

Could you comment on this question? Is it necessary to support a timeline 
reset interface? I only see that the timeline value can be changed by signal 
operations in the spec.



Thanks,
David Zhou


Re: [PATCH] [RFC]drm: add syncobj timeline support v5

2018-09-17 Thread zhoucm1



On 2018-09-17 16:37, Christian König wrote:

On 14.09.2018 at 12:37, Chunming Zhou wrote:
This patch is for VK_KHR_timeline_semaphore extension, semaphore is 
called syncobj in kernel side:
This extension introduces a new type of syncobj that has an integer 
payload

identifying a point in a timeline. Such timeline syncobjs support the
following operations:
    * CPU query - A host operation that allows querying the payload 
of the

  timeline syncobj.
    * CPU wait - A host operation that allows a blocking wait for a
  timeline syncobj to reach a specified value.
    * Device wait - A device operation that allows waiting for a
  timeline syncobj to reach a specified value.
    * Device signal - A device operation that allows advancing the
  timeline syncobj to a specified value.

Since it's a timeline, the earlier time point (PT) is always signaled 
before the later PT.

a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation fence; 
when the PT[N] fence is signaled, the timeline will increase to the value of PT[N].
b. wait PT design:
A wait PT fence is signaled by the timeline reaching its point value: when the 
timeline increases, the wait PTs' values are compared with the new timeline value, 
and if a PT value is lower than the timeline value the wait PT is signaled, 
otherwise it is kept in the list. A syncobj wait operation can wait on any point 
of the timeline, so an RB tree is needed to order them. And a wait PT could be 
ahead of the signal PT, so we need a submission fence to 
perform that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to the .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate 
patch. (Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate 
patch.
5. drop the submission_fence implementation and instead use 
wait_event() for that. (Christian)

6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and 
Christian)
 a. normal syncobj signal op will create a signal PT to tail of 
signal pt list.
 b. normal syncobj wait op will create a wait pt with last signal 
point, and this wait PT is only signaled by related signal point PT.

2. many bug fix and clean up
3. stub fence moving is moved to other patch.

v4:
1. fix RB tree loop with while(node=rb_first(...)). (Christian)
2. fix syncobj lifecycle. (Christian)
3. only enable_signaling when there is wait_pt. (Christian)
4. fix timeline path issues.
5. write a timeline test in libdrm

v5: (Christian)
1. semaphore is called syncobj in kernel side.
2. don't need 'timeline' characters in some function name.
3. keep syncobj cb

normal syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore*
timeline syncobj is tested by ./amdgpu_test -s 9

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/drm_syncobj.c  | 294 ++---
  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
  include/drm/drm_syncobj.h  |  62 +++--
  include/uapi/drm/drm.h |   1 +
  4 files changed, 292 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index e9ce623d049e..e78d076f2703 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,9 @@
  #include "drm_internal.h"
  #include 
  +/* merge normal syncobj to timeline syncobj, the point interval is 
1 */

+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
  struct drm_syncobj_stub_fence {
  struct dma_fence base;
  spinlock_t lock;
@@ -82,6 +85,11 @@ static const struct dma_fence_ops 
drm_syncobj_stub_fence_ops = {

  .release = drm_syncobj_stub_fence_release,
  };
  +struct drm_syncobj_signal_pt {
+    struct dma_fence_array *base;
+    u64    value;
+    struct list_head list;
+};
    /**
   * drm_syncobj_find - lookup and reference a sync object.
@@ -124,7 +132,7 @@ static int 
drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,

  {
  int ret;
  -    *fence = drm_syncobj_fence_get(syncobj);
+    ret = drm_syncobj_search_fence(syncobj, 0, 0, fence);
  if (*fence)


Don't we need to check ret here instead?

Both are OK; if you prefer checking ret, I will update it in v6.




  return 1;
  @@ -133,10 +141,10 @@ static int 
drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,

   * have the lock, try one more time just to be sure we don't add a
   * callback when a fence has already been set.
   */
-	if (syncobj->fence) {
-		*fence = dma_fence_get(rcu_dereference_protected(syncobj->fence,
-								 lockdep_is_held(&syncobj->lock)));
-		ret = 1;
+	if (fence) {
+		drm_syncobj_search_fence(syncobj, 0, 0, fence);
+		if (*fence)
+			ret = 1;


That doesn't look correct to me; drm_syncobj_search_fence() would try
to grab the lock once more.


That 

Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-14 Thread zhoucm1



On 2018年09月13日 18:22, Christian König wrote:

Am 13.09.2018 um 11:35 schrieb Zhou, David(ChunMing):



-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 5:20 PM
To: Zhou, David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 11:11 schrieb Zhou, David(ChunMing):

-Original Message-
From: Christian König 
Sent: Thursday, September 13, 2018 4:50 PM
To: Zhou, David(ChunMing) ; Koenig, Christian
; dri-de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing):

-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 2:56 PM
To: Zhou, David(ChunMing) ; Zhou,
David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 04:15 schrieb zhoucm1:

On 2018年09月12日 19:05, Christian König wrote:

[SNIP]
+static void drm_syncobj_find_signal_pt_for_wait_pt(struct
drm_syncobj *syncobj,
+   struct drm_syncobj_wait_pt
+*wait_pt) {

That whole approach still looks horribly complicated to me.

It's already very close to what you said before.


Especially the separation of signal and wait pt is completely
unnecessary as far as I can see.
When a wait pt is requested we just need to search for the
signal point which it will trigger.

Yeah, I tried this, but when I implement the cpu wait ioctl on a
specific point, we need an advanced wait pt fence; otherwise, we
could still need the old syncobj cb.

Why? I mean you just need to call drm_syncobj_find_fence() and when
that one returns NULL you use wait_event_*() to wait for a signal
point >= your wait point to appear and try again.
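A rough sketch of that suggestion (entirely hypothetical: the lookup helper
and the wait queue member are illustrations, not existing API):

static struct dma_fence *wait_for_point(struct drm_syncobj *syncobj, u64 point)
{
	struct dma_fence *fence;

	/* sleep until a signal point >= 'point' has been attached */
	wait_event(syncobj->signal_wq,
		   (fence = syncobj_find_point_fence(syncobj, point)) != NULL);
	return fence;
}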

e.g. when there are 3 syncobjs (A, B, C) to wait on and none of them
has a fence yet. As you said, while drm_syncobj_find_fence(A) is
working in wait_event, syncobjB and syncobjC could already be
signaled; then we don't know which one was signaled first, which is
needed when the wait ioctl returns.

I don't really see a problem with that. When you wait for the first
one you need to wait for A,B,C at the same time anyway.

So what you do is to register a fence callback on the fences you
already have and for the syncobj which doesn't yet have a fence you
make sure that they wake up your thread when they get one.

So essentially exactly what drm_syncobj_fence_get_or_add_callback()
already does today.

So do you mean we still need to use the old syncobj CB for that?

Yes, as far as I can see it should work.


    Advanced wait pt is bad?

Well it isn't bad, I just don't see any advantage in it.

The advantage is to replace old syncobj cb.


The existing mechanism
should already be able to handle that.
I thought about it a bit more; we don't need that mechanism at all. If we
use an advanced wait
pt, we can easily use a fence array to achieve it for the wait ioctl. We
should use
existing kernel features as much as possible, not invent another,
shouldn't we?

I remember you said it before.

Yeah, but the syncobj cb is an existing feature.
This is obviously a workaround for the wait ioctl. Do you see
it used in any other place?



And I absolutely don't see a
need to modify that and replace it with something far more complex.
The wait ioctl is simplified much more by the fence array, not made more
complex, and we just need to allocate a wait pt. If we keep the old
syncobj cb workaround, all the wait pt logic is still there; we only save
the allocation and wait pt handling, which in fact isn't the complex part
at all. Compared with the ugly syncobj cb, this is simpler.


I strongly disagree on that. You just need to extend the syncobj cb 
with the sequence number and you are done.


We could clean that up in the long term by adding some wait_multi 
event macro, but for now just adding the sequence number should do the 
trick.
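A sketch of what that extension might look like (the 'point' member is an
assumption; node and func match the drm_syncobj_cb that exists in the tree
at this time):

struct drm_syncobj_cb {
	struct list_head node;
	drm_syncobj_func_t func;
	u64 point;	/* new: timeline point this waiter cares about */
};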


Quote from Daniel Vetter comment when v1, "

Specifically for this stuff here having unified future fence semantics
will allow drivers to do clever stuff with them.

"
I think the advanced wait pt is a similar concept as 'future fence' what 
Daniel Vetter said before, which obviously a right direction.



Anyway, I will change the patch as you like if there are no other comments,
so that the patch can pass soon.


Thanks,
David Zhou


Regards,
Christian.



Thanks,
David Zhou

Regards,
Christian.


Thanks,
David Zhou

Christian.


Thanks,
David Zhou

Regards,
Christian.


Back to my implementation: it already fixes all your earlier concerns
and can easily be used in wait_ioctl. When you
feel it is complicated, I guess that is because we merged all the
logic into it plus much clean-up in one patch. In fact, it already
is very simple: timeline_init/fini, create signal/

Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-14 Thread zhoucm1



On 2018年09月14日 11:14, zhoucm1 wrote:



On 2018年09月13日 18:22, Christian König wrote:

Am 13.09.2018 um 11:35 schrieb Zhou, David(ChunMing):



-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 5:20 PM
To: Zhou, David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 11:11 schrieb Zhou, David(ChunMing):

-Original Message-
From: Christian König 
Sent: Thursday, September 13, 2018 4:50 PM
To: Zhou, David(ChunMing) ; Koenig, Christian
; dri-de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing):

-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 2:56 PM
To: Zhou, David(ChunMing) ; Zhou,
David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 04:15 schrieb zhoucm1:

On 2018年09月12日 19:05, Christian König wrote:

[SNIP]
+static void drm_syncobj_find_signal_pt_for_wait_pt(struct
drm_syncobj *syncobj,
+   struct drm_syncobj_wait_pt
+*wait_pt) {

That whole approach still looks horribly complicated to me.

It's already very close to what you said before.


Especially the separation of signal and wait pt is completely
unnecessary as far as I can see.
When a wait pt is requested we just need to search for the
signal point which it will trigger.

Yeah, I tried this, but when I implement the cpu wait ioctl on a
specific point, we need an advanced wait pt fence; otherwise, we
could still need the old syncobj cb.

Why? I mean you just need to call drm_syncobj_find_fence() and when
that one returns NULL you use wait_event_*() to wait for a signal
point >= your wait point to appear and try again.
e.g. when there are 3 syncobjs (A, B, C) to wait on and none of them
has a fence yet. As you said, while drm_syncobj_find_fence(A) is
working in wait_event, syncobjB and syncobjC could already be
signaled; then we don't know which one was signaled first,
which is
needed when the wait ioctl returns.
I don't really see a problem with that. When you wait for the 
first

one you need to wait for A,B,C at the same time anyway.

So what you do is to register a fence callback on the fences you
already have and for the syncobj which doesn't yet have a fence 
you

make sure that they wake up your thread when they get one.

So essentially exactly what 
drm_syncobj_fence_get_or_add_callback()

already does today.

So do you mean we still need to use the old syncobj CB for that?

Yes, as far as I can see it should work.


    Advanced wait pt is bad?

Well it isn't bad, I just don't see any advantage in it.

The advantage is to replace old syncobj cb.


The existing mechanism
should already be able to handle that.
I thought about it a bit more; we don't need that mechanism at all. If we
use an advanced wait
pt, we can easily use a fence array to achieve it for the wait ioctl. We
should use
existing kernel features as much as possible, not invent another,
shouldn't we?

I remember you said it before.

Yeah, but the syncobj cb is an existing feature.
This is obviously a workaround for the wait ioctl. Do you see
it used in any other place?



And I absolutely don't see a
need to modify that and replace it with something far more complex.
The wait ioctl is simplified much more by the fence array, not made more
complex, and we just need to allocate a wait pt. If we keep the old
syncobj cb workaround, all the wait pt logic is still there; we only save
the allocation and wait pt handling, which in fact isn't the complex part
at all. Compared with the ugly syncobj cb, this is simpler.


I strongly disagree on that. You just need to extend the syncobj cb 
with the sequence number and you are done.


We could clean that up in the long term by adding some wait_multi 
event macro, but for now just adding the sequence number should do 
the trick.


Quote from Daniel Vetter comment when v1, "

Specifically for this stuff here having unified future fence semantics
will allow drivers to do clever stuff with them.

"
I think the advanced wait pt is a similar concept as 'future fence' 
what Daniel Vetter said before, which obviously a right direction.



Anyway, I will change the patch as you like if there are no other comments,
so that the patch can pass soon.
When I tried to remove the wait pt future fence, I encountered another problem:
drm_syncobj_find_fence cannot get a fence if the signal pt has already been
collected as garbage, and then the CS will report an error. Any idea for that?
I still think the future fence is the right thing; could you give it further
thought again? Otherwise we could need various workarounds.


Thanks,
David Zhou


Thanks,
David Zhou


Regards,
Christian.



Thank

Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-12 Thread zhoucm1



On 2018年09月12日 19:05, Christian König wrote:

Am 12.09.2018 um 12:20 schrieb zhoucm1:

[SNIP]

Drop the term semaphore here, better use syncobj.
This is from the VK_KHR_timeline_semaphore extension description, not my
invention; I just quoted it. On the kernel side we call it syncobj; in UMD
they still call it semaphore.


Yeah, but we don't care about closed source UMD names in the kernel, and
the open source UMD calls it syncobj as well.




[SNIP]
+static void drm_syncobj_find_signal_pt_for_wait_pt(struct 
drm_syncobj *syncobj,

+   struct drm_syncobj_wait_pt *wait_pt)
+{


That whole approach still looks horribly complicated to me.

It's already very close to what you said before.



Especially the separation of signal and wait pt is completely 
unnecessary as far as I can see.
When a wait pt is requested we just need to search for the signal 
point which it will trigger.
Yeah, I tried this, but when I implement the cpu wait ioctl on a specific
point, we need an advanced wait pt fence; otherwise, we could still
need the old syncobj cb.


Why? I mean you just need to call drm_syncobj_find_fence() and when 
that one returns NULL you use wait_event_*() to wait for a signal 
point >= your wait point to appear and try again.
e.g. when there are 3 syncobjs (A, B, C) to wait on and none of them has a
fence yet. As you said, while drm_syncobj_find_fence(A) is working in
wait_event, syncobjB and syncobjC could already be signaled; then we
don't know which one was signaled first, which is needed when the wait ioctl
returns.


Back to my implementation: it already fixes all your earlier concerns
and can easily be used in wait_ioctl. When you feel it is
complicated, I guess that is because we merged all the logic into it plus
much clean-up in one patch. In fact, it already is very simple:
timeline_init/fini, create signal/wait_pt, find signal_pt for wait_pt,
garbage collection, just those.


Thanks,
David Zhou


Regards,
Christian.




Thanks,
David Zhou


Regards,
Christian.

+	struct drm_syncobj_timeline *timeline = &syncobj->syncobj_timeline;
+	struct drm_syncobj_signal_pt *signal_pt;
+	int ret;
+
+	if (wait_pt->signal_pt_fence) {
+		return;
+	} else if ((syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) &&
+		   (wait_pt->value <= timeline->timeline)) {
+		dma_fence_signal(&wait_pt->base.base);
+		rb_erase(&wait_pt->node,
+			 &timeline->wait_pt_tree);
+		RB_CLEAR_NODE(&wait_pt->node);
+		dma_fence_put(&wait_pt->base.base);
+		return;
+	}
+
+	list_for_each_entry(signal_pt, &timeline->signal_pt_list, list) {
+		if (wait_pt->value < signal_pt->value)
+			continue;
+		if ((syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) &&
+		    (wait_pt->value != signal_pt->value))
+			continue;
+		wait_pt->signal_pt_fence = dma_fence_get(&signal_pt->base->base);
+		ret = dma_fence_add_callback(wait_pt->signal_pt_fence,
+					     &wait_pt->wait_cb,
+					     wait_pt_func);
+		if (ret == -ENOENT) {
+			dma_fence_signal(&wait_pt->base.base);
+			dma_fence_put(wait_pt->signal_pt_fence);
+			wait_pt->signal_pt_fence = NULL;
+			rb_erase(&wait_pt->node,
+				 &timeline->wait_pt_tree);
+			RB_CLEAR_NODE(&wait_pt->node);
+			dma_fence_put(&wait_pt->base.base);
+		} else if (ret < 0) {
+			dma_fence_put(wait_pt->signal_pt_fence);
+			DRM_ERROR("add callback error!");
+		} else {
+			/* after adding callback, remove from rb tree */
+			rb_erase(&wait_pt->node,
+				 &timeline->wait_pt_tree);
+			RB_CLEAR_NODE(&wait_pt->node);
+		}
+		return;
+	}
+	/* signaled pt was released */
+	if (!wait_pt->signal_pt_fence && (wait_pt->value <=
+					  timeline->signal_point)) {
+		dma_fence_signal(&wait_pt->base.base);
+		rb_erase(&wait_pt->node,
+			 &timeline->wait_pt_tree);
+		RB_CLEAR_NODE(&wait_pt->node);
+		dma_fence_put(&wait_pt->base.base);
+	}
 }
-void drm_syncobj_add_callback(struct drm_syncobj *syncobj,
-			      struct drm_syncobj_cb *cb,
-			      drm_syncobj_func_t func)
+static int drm_syncobj_timeline_create_signal_pt(struct drm_syncobj *syncobj,
+						 struct dma_fence *fence,
+						 u64 point)
 {
+	struct drm_syncobj_signal_pt *signal_pt =
+		kzalloc(sizeof(struct drm_syncobj_signal_pt), GFP_KERNEL);
+	struct drm_syncobj_signal_pt *tail_pt;
+	struct dma_fence **fences;
+	struct rb_node *node;
+	struct drm_syncobj_wait_pt *tail_wait_pt = NULL;
+	int num_fences = 0;
+	int ret = 0, i;
+
+	if (!signal_pt)
+		return -ENOMEM;
+	if (syncobj->syncobj_timeline.signal_point >= point) {
+		DRM_WARN("A later signal is ready!");

Re: [PATCH 2/2] drm/amdgpu: use a single linked list for amdgpu_vm_bo_base

2018-09-12 Thread zhoucm1

Reviewed-by: Chunming Zhou 


On 2018年09月12日 16:55, Christian König wrote:

Instead of the double linked list. Gets the size of amdgpu_vm_pt down to
64 bytes again.

We could even reduce it down to 32 bytes, but that would require some
rather extreme hacks.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  4 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 38 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +-
  4 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index de990bdcdd6c..e6909252aefa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -448,7 +448,7 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
return -ENOMEM;
 	drm_gem_private_object_init(adev->ddev, &bo->gem_base, size);
 	INIT_LIST_HEAD(&bo->shadow_list);
-	INIT_LIST_HEAD(&bo->va);
+	bo->vm_bo = NULL;
bo->preferred_domains = bp->preferred_domain ? bp->preferred_domain :
bp->domain;
bo->allowed_domains = bo->preferred_domains;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 907fdf46d895..64337ff2ad63 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -89,8 +89,8 @@ struct amdgpu_bo {
void*metadata;
u32 metadata_size;
unsignedprime_shared_count;
-   /* list of all virtual address to which this bo is associated to */
-   struct list_headva;
+   /* per VM structure for page tables and with virtual addresses */
+   struct amdgpu_vm_bo_base*vm_bo;
/* Constant after initialization */
struct drm_gem_object   gem_base;
struct amdgpu_bo*parent;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index cb6a5114128e..fb6b16273c54 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -309,12 +309,13 @@ static void amdgpu_vm_bo_base_init(struct 
amdgpu_vm_bo_base *base,
  {
 	base->vm = vm;
 	base->bo = bo;
-	INIT_LIST_HEAD(&base->bo_list);
+	base->next = NULL;
 	INIT_LIST_HEAD(&base->vm_status);
 
 	if (!bo)
 		return;
-	list_add_tail(&base->bo_list, &bo->va);
+	base->next = bo->vm_bo;
+	bo->vm_bo = base;
  
  	if (bo->tbo.resv != vm->root.base.bo->tbo.resv)

return;
@@ -352,7 +353,7 @@ static struct amdgpu_vm_pt *amdgpu_vm_pt_parent(struct 
amdgpu_vm_pt *pt)
if (!parent)
return NULL;
  
-	return list_first_entry(&parent->va, struct amdgpu_vm_pt, base.bo_list);
+	return container_of(parent->vm_bo, struct amdgpu_vm_pt, base);
  }
  
  /**

@@ -954,7 +955,7 @@ static void amdgpu_vm_free_pts(struct amdgpu_device *adev,
for_each_amdgpu_vm_pt_dfs_safe(adev, vm, cursor, entry) {
  
  		if (entry->base.bo) {

-			list_del(&entry->base.bo_list);
+			entry->base.bo->vm_bo = NULL;
 			list_del(&entry->base.vm_status);
 			amdgpu_bo_unref(&entry->base.bo->shadow);
 			amdgpu_bo_unref(&entry->base.bo);
@@ -1162,12 +1163,13 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct 
amdgpu_job *job, bool need_
  struct amdgpu_bo_va *amdgpu_vm_bo_find(struct amdgpu_vm *vm,
   struct amdgpu_bo *bo)
  {
-   struct amdgpu_bo_va *bo_va;
+   struct amdgpu_vm_bo_base *base;
  
-	list_for_each_entry(bo_va, &bo->va, base.bo_list) {

-   if (bo_va->base.vm == vm) {
-   return bo_va;
-   }
+   for (base = bo->vm_bo; base; base = base->next) {
+   if (base->vm != vm)
+   continue;
+
+   return container_of(base, struct amdgpu_bo_va, base);
}
return NULL;
  }
@@ -2728,11 +2730,21 @@ void amdgpu_vm_bo_rmv(struct amdgpu_device *adev,
struct amdgpu_bo_va_mapping *mapping, *next;
struct amdgpu_bo *bo = bo_va->base.bo;
struct amdgpu_vm *vm = bo_va->base.vm;
+   struct amdgpu_vm_bo_base **base;
  
-	if (bo && bo->tbo.resv == vm->root.base.bo->tbo.resv)

-   vm->bulk_moveable = false;
+   if (bo) {
+   if (bo->tbo.resv == vm->root.base.bo->tbo.resv)
+   vm->bulk_moveable = false;
  
-	list_del(&bo_va->base.bo_list);
+	for (base = &bo_va->base.bo->vm_bo; *base;
+	     base = &(*base)->next) {
+		if (*base != &bo_va->base)
+			continue;
+
+		*base = bo_va->base.next;
+

Re: [PATCH 1/2] drm/amdgpu: remove amdgpu_bo_list_entry.robj

2018-09-12 Thread zhoucm1

Reviewed-by: Chunming Zhou 


On 2018年09月12日 16:55, Christian König wrote:

We can get that just by casting tv.bo.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 42 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |  1 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 58 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  |  3 +-
  4 files changed, 58 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index b80243d3972e..14d2982a47cc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -49,8 +49,11 @@ static void amdgpu_bo_list_free(struct kref *ref)
   refcount);
struct amdgpu_bo_list_entry *e;
  
-	amdgpu_bo_list_for_each_entry(e, list)
-		amdgpu_bo_unref(&e->robj);
+	amdgpu_bo_list_for_each_entry(e, list) {
+		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
+
+		amdgpu_bo_unref(&bo);
+	}
 
 	call_rcu(&list->rhead, amdgpu_bo_list_free_rcu);

  }
@@ -112,21 +115,20 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, 
struct drm_file *filp,
 			entry = &array[last_entry++];
}
  
-		entry->robj = bo;
 		entry->priority = min(info[i].bo_priority,
 				      AMDGPU_BO_LIST_MAX_PRIORITY);
-		entry->tv.bo = &entry->robj->tbo;
-		entry->tv.shared = !entry->robj->prime_shared_count;
-
-   if (entry->robj->preferred_domains == AMDGPU_GEM_DOMAIN_GDS)
-   list->gds_obj = entry->robj;
-   if (entry->robj->preferred_domains == AMDGPU_GEM_DOMAIN_GWS)
-   list->gws_obj = entry->robj;
-   if (entry->robj->preferred_domains == AMDGPU_GEM_DOMAIN_OA)
-   list->oa_obj = entry->robj;
-
-   total_size += amdgpu_bo_size(entry->robj);
-   trace_amdgpu_bo_list_set(list, entry->robj);
+		entry->tv.bo = &bo->tbo;
+		entry->tv.shared = !bo->prime_shared_count;
+
+   if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_GDS)
+   list->gds_obj = bo;
+   if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_GWS)
+   list->gws_obj = bo;
+   if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_OA)
+   list->oa_obj = bo;
+
+   total_size += amdgpu_bo_size(bo);
+   trace_amdgpu_bo_list_set(list, bo);
}
  
  	list->first_userptr = first_userptr;

@@ -138,8 +140,11 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, 
struct drm_file *filp,
return 0;
  
  error_free:

-	while (i--)
-		amdgpu_bo_unref(&array[i].robj);
+	while (i--) {
+		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(array[i].tv.bo);
+
+		amdgpu_bo_unref(&bo);
+	}
kvfree(list);
return r;
  
@@ -191,9 +196,10 @@ void amdgpu_bo_list_get_list(struct amdgpu_bo_list *list,

 * with the same priority, i.e. it must be stable.
 */
amdgpu_bo_list_for_each_entry(e, list) {
+   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
unsigned priority = e->priority;
  
-		if (!e->robj->parent)

+   if (!bo->parent)
			list_add_tail(&e->tv.head, &bucket[priority]);
  
  		e->user_pages = NULL;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
index 61b089768e1c..7c5f5d1601e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
@@ -32,7 +32,6 @@ struct amdgpu_bo_va;
  struct amdgpu_fpriv;
  
  struct amdgpu_bo_list_entry {

-   struct amdgpu_bo*robj;
struct ttm_validate_buffer  tv;
struct amdgpu_bo_va *bo_va;
uint32_tpriority;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index c5cc648a1b4e..2e488c6f9562 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -39,6 +39,7 @@ static int amdgpu_cs_user_fence_chunk(struct amdgpu_cs_parser 
*p,
  uint32_t *offset)
  {
struct drm_gem_object *gobj;
+   struct amdgpu_bo *bo;
unsigned long size;
int r;
  
@@ -46,21 +47,21 @@ static int amdgpu_cs_user_fence_chunk(struct amdgpu_cs_parser *p,

if (gobj == NULL)
return -EINVAL;
  
-	p->uf_entry.robj = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));

+   bo = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));
p->uf_entry.priority = 0;
-	p->uf_entry.tv.bo = &p->uf_entry.robj->tbo;
+	p->uf_entry.tv.bo = &bo->tbo;
p->uf_entry.tv.shared = true;

Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-12 Thread zhoucm1



On 2018年09月12日 15:22, Christian König wrote:

Ping? Have you seen my comments here?

Sorry, I didn't see this reply. Inline...



Looks like you haven't addressed any of them in your last mail.

Christian.

Am 06.09.2018 um 09:25 schrieb Christian König:

Am 06.09.2018 um 08:25 schrieb Chunming Zhou:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an 
integer payload

identifying a point in a timeline. Such timeline semaphores support the


Drop the term semaphore here, better use syncobj.
This is from the VK_KHR_timeline_semaphore extension description, not my
invention; I just quoted it. On the kernel side we call it syncobj; in UMD
they still call it semaphore.





following operations:
    * CPU query - A host operation that allows querying the payload 
of the

  timeline semaphore.
    * CPU wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.
    * Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
    * Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, the earlier time point (PT) is always
signaled before the later PT.

a. signal PT design:
Signal PT fence N depends on PT[N-1] fence and the signal operation
fence; when PT[N] fence is signaled,
the timeline will increase to the value of PT[N].
b. wait PT design:
A wait PT fence is signaled when the timeline reaches its point value;
when the timeline increases, we compare the
wait PT values with the new timeline value. If a PT value is lower than
the timeline value, then that wait PT is
signaled, otherwise it is kept in the list. A semaphore wait operation
can wait on any point of the timeline,
so we need an RB tree to order them. And a wait PT can arrive ahead of
its signal PT, so we need a submission fence to
handle that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate 
patch. (Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate 
patch.
5. drop the submission_fence implementation and instead use 
wait_event() for that. (Christian)

6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and
Christian)
 a. a normal syncobj signal op will create a signal PT at the tail of the
signal pt list.
 b. a normal syncobj wait op will create a wait pt with the last signal
point, and this wait PT is only signaled by the related signal point PT.

2. many bug fixes and clean-ups
3. stub fence moving is moved to another patch.

v4:
1. fix RB tree loop with while(node=rb_first(...)). (Christian)
2. fix syncobj lifecycle. (Christian)
3. only enable_signaling when there is wait_pt. (Christian)

Tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/drm_syncobj.c  | 516 
+

  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
  include/drm/drm_syncobj.h  |  78 ++--
  include/uapi/drm/drm.h |   1 +
  4 files changed, 448 insertions(+), 151 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index e9ce623d049e..94b31de23858 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,9 @@
  #include "drm_internal.h"
  #include 
  +/* merge normal syncobj to timeline syncobj, the point interval 
is 1 */

+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
  struct drm_syncobj_stub_fence {
  struct dma_fence base;
  spinlock_t lock;
@@ -82,6 +85,18 @@ static const struct dma_fence_ops 
drm_syncobj_stub_fence_ops = {

  .release = drm_syncobj_stub_fence_release,
  };
  +struct drm_syncobj_signal_pt {
+    struct dma_fence_array *base;
+    u64    value;
+    struct list_head list;
+};
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_pt_fence;
+    struct dma_fence_cb wait_cb;
+    struct rb_node   node;
+    u64    value;
+};
    /**
   * drm_syncobj_find - lookup and reference a sync object.
@@ -109,61 +124,238 @@ struct drm_syncobj *drm_syncobj_find(struct 
drm_file *file_private,

  }
  EXPORT_SYMBOL(drm_syncobj_find);
  -static void drm_syncobj_add_callback_locked(struct drm_syncobj 
*syncobj,

-    struct drm_syncobj_cb *cb,
-    drm_syncobj_func_t func)
+static void drm_syncobj_timeline_init(struct drm_syncobj *syncobj,
+  struct drm_syncobj_timeline *syncobj_timeline)


Since we merged timeline and singleton syncobj you can drop the extra 
_timeline_ part in the function name I think.

Will try in v5.




  {
-    cb->func = func;
-	cb->func = func;
-	list_add_tail(&cb->node, &syncobj->cb_list);

Re: [PATCH] drm/amdgpu: Fix the dead lock issue.

2018-09-10 Thread zhoucm1



On 2018年09月11日 11:37, zhoucm1 wrote:



On 2018年09月11日 11:32, Deng, Emily wrote:

-Original Message-
From: amd-gfx  On Behalf Of
zhoucm1
Sent: Tuesday, September 11, 2018 11:28 AM
To: Deng, Emily ; Zhou, David(ChunMing)
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue.



On 2018年09月11日 11:23, Deng, Emily wrote:

-Original Message-
From: Zhou, David(ChunMing)
Sent: Tuesday, September 11, 2018 11:03 AM
To: Deng, Emily ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue.



On 2018年09月11日 10:51, Emily Deng wrote:

It will randomly hit a deadlock when testing TDR:
1. amdgpu_device_handle_vram_lost takes the shadow_list_lock
2. amdgpu_bo_create locks the bo's resv lock
3. amdgpu_bo_create_shadow is waiting for the shadow_list_lock
4. amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv
lock.

v2:
  Make a local copy of the list

Signed-off-by: Emily Deng 
---
    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21

-

    1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2a21267..8c81404 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3105,6 +3105,9 @@ static int

amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)

    long r = 1;
    int i = 0;
    long tmo;
+    struct list_head local_shadow_list;
+
+	INIT_LIST_HEAD(&local_shadow_list);

    if (amdgpu_sriov_runtime(adev))
    tmo = msecs_to_jiffies(8000);
@@ -3112,8 +3115,19 @@ static int

amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)

    tmo = msecs_to_jiffies(100);

    DRM_INFO("recover vram bo from shadow start\n");
+
+	mutex_lock(&adev->shadow_list_lock);
+	list_splice_init(&adev->shadow_list, &local_shadow_list);
+	mutex_unlock(&adev->shadow_list_lock);
+
+
 	mutex_lock(&adev->shadow_list_lock);

local_shadow_list is a local variable, I think it doesn't need a lock
at all, no one changes it. Otherwise looks good to me.

The bo->shadow_list entries which are now in local_shadow_list may be
destroyed if the bo is already in amdgpu_bo_destroy; that would change
local_shadow_list, so we need to take the shadow_list_lock.
Ah, sorry for the noise, I forgot you don't reference these BOs.
Yes, I don't reference these BOs; I found that even referencing them
still couldn't avoid the case where another process is already
in amdgpu_bo_destroy.
??? That shouldn't happen; the reference belongs to the list. But back
to here, we don't need to reference them.
And since no shadow BO is added to the local list after the splice, we'd
better use list_next_entry to iterate the local shadow list instead of
list_for_each_entry_safe.


Thanks,
David Zhou

Thanks,
David Zhou

Best wishes
Emily Deng

Thanks,
David Zhou
-	list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) {
+	list_for_each_entry_safe(bo, tmp, &local_shadow_list, shadow_list) {
because the shadow list doesn't take a bo reference, we should do an
amdgpu_bo_ref(bo) with the attached patch before unlocking.

You can have a try.

Thanks,
David Zhou

+		mutex_unlock(&adev->shadow_list_lock);
+
+		if (!bo)
+			continue;
+
 		next = NULL;
 		amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);

    if (fence) {
@@ -3132,9 +3146,14 @@ static int
amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)

    dma_fence_put(fence);
    fence = next;
+		mutex_lock(&adev->shadow_list_lock);
 	}
 	mutex_unlock(&adev->shadow_list_lock);
 
+	mutex_lock(&adev->shadow_list_lock);
+	list_splice_init(&local_shadow_list, &adev->shadow_list);
+	mutex_unlock(&adev->shadow_list_lock);
+
    if (fence) {
    r = dma_fence_wait_timeout(fence, false, tmo);
    if (r == 0)



From 7676fc70738699dddca5627e21be0d82e91aea05 Mon Sep 17 00:00:00 2001
From: Chunming Zhou 
Date: Tue, 11 Sep 2018 13:37:31 +0800
Subject: [PATCH] drm/amdgpu: changing of shadow bo reference should take
 shadow lock

because shadow list doesn't reference shadow bo.

Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index b0e14a3d54ef..50651157203b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -101,11 +101,9 @@ static void amdgpu_bo_destroy(struct ttm_buffer_object *tb

Re: [PATCH] drm/amdgpu: Fix the dead lock issue.

2018-09-10 Thread zhoucm1



On 2018年09月11日 11:32, Deng, Emily wrote:

-Original Message-
From: amd-gfx  On Behalf Of
zhoucm1
Sent: Tuesday, September 11, 2018 11:28 AM
To: Deng, Emily ; Zhou, David(ChunMing)
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue.



On 2018年09月11日 11:23, Deng, Emily wrote:

-Original Message-
From: Zhou, David(ChunMing)
Sent: Tuesday, September 11, 2018 11:03 AM
To: Deng, Emily ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue.



On 2018年09月11日 10:51, Emily Deng wrote:

It will randomly hit a deadlock when testing TDR:
1. amdgpu_device_handle_vram_lost takes the shadow_list_lock
2. amdgpu_bo_create locks the bo's resv lock
3. amdgpu_bo_create_shadow is waiting for the shadow_list_lock
4. amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv
lock.

v2:
  Make a local copy of the list

Signed-off-by: Emily Deng 
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21

-

1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2a21267..8c81404 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3105,6 +3105,9 @@ static int

amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)

long r = 1;
int i = 0;
long tmo;
+   struct list_head local_shadow_list;
+
+	INIT_LIST_HEAD(&local_shadow_list);

if (amdgpu_sriov_runtime(adev))
tmo = msecs_to_jiffies(8000);
@@ -3112,8 +3115,19 @@ static int

amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)

tmo = msecs_to_jiffies(100);

DRM_INFO("recover vram bo from shadow start\n");
+
+	mutex_lock(&adev->shadow_list_lock);
+	list_splice_init(&adev->shadow_list, &local_shadow_list);
+	mutex_unlock(&adev->shadow_list_lock);
+
+
 	mutex_lock(&adev->shadow_list_lock);

local_shadow_list is a local variable, I think it doesn't need a lock
at all, no one changes it. Otherwise looks good to me.

The bo->shadow_list entries which are now in local_shadow_list may be
destroyed if the bo is already in amdgpu_bo_destroy; that would change
local_shadow_list, so we need to take the shadow_list_lock.
Ah, sorry for the noise, I forgot you don't reference these BOs.

Yes, I don't reference these BOs; I found that even referencing them
still couldn't avoid the case where another process is already
in amdgpu_bo_destroy.
??? That shouldn't happen; the reference belongs to the list. But back
to here, we don't need to reference them.
And since no shadow BO is added to the local list after the splice, we'd
better use list_next_entry to iterate the local shadow list instead of
list_for_each_entry_safe.


Thanks,
David Zhou

Thanks,
David Zhou

Best wishes
Emily Deng

Thanks,
David Zhou

-	list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) {
+	list_for_each_entry_safe(bo, tmp, &local_shadow_list, shadow_list) {
+		mutex_unlock(&adev->shadow_list_lock);
+
+		if (!bo)
+			continue;
+
 		next = NULL;
 		amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);

if (fence) {
@@ -3132,9 +3146,14 @@ static int
amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)

dma_fence_put(fence);
fence = next;
+		mutex_lock(&adev->shadow_list_lock);
 	}
 	mutex_unlock(&adev->shadow_list_lock);
 
+	mutex_lock(&adev->shadow_list_lock);
+	list_splice_init(&local_shadow_list, &adev->shadow_list);
+	mutex_unlock(&adev->shadow_list_lock);
+
if (fence) {
r = dma_fence_wait_timeout(fence, false, tmo);
if (r == 0)



Re: [PATCH] drm/amdgpu: Fix the dead lock issue.

2018-09-10 Thread zhoucm1



On 2018年09月11日 11:23, Deng, Emily wrote:

-Original Message-
From: Zhou, David(ChunMing)
Sent: Tuesday, September 11, 2018 11:03 AM
To: Deng, Emily ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue.



On 2018年09月11日 10:51, Emily Deng wrote:

It will randomly hit a deadlock when testing TDR:
1. amdgpu_device_handle_vram_lost takes the shadow_list_lock
2. amdgpu_bo_create locks the bo's resv lock
3. amdgpu_bo_create_shadow is waiting for the shadow_list_lock
4. amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv
lock.

v2:
 Make a local copy of the list

Signed-off-by: Emily Deng 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21

-

   1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2a21267..8c81404 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3105,6 +3105,9 @@ static int

amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)

long r = 1;
int i = 0;
long tmo;
+   struct list_head local_shadow_list;
+
+	INIT_LIST_HEAD(&local_shadow_list);

if (amdgpu_sriov_runtime(adev))
tmo = msecs_to_jiffies(8000);
@@ -3112,8 +3115,19 @@ static int

amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)

tmo = msecs_to_jiffies(100);

DRM_INFO("recover vram bo from shadow start\n");
+
+	mutex_lock(&adev->shadow_list_lock);
+	list_splice_init(&adev->shadow_list, &local_shadow_list);
+	mutex_unlock(&adev->shadow_list_lock);
+
+
 	mutex_lock(&adev->shadow_list_lock);

local_shadow_list is a local variable, I think it doesn't need a lock at all,
no one
changes it. Otherwise looks good to me.

The bo->shadow_list entries which are now in local_shadow_list may be destroyed
if the bo is already in amdgpu_bo_destroy; that would
change local_shadow_list, so we need to take the shadow_list_lock.

Ah, sorry for the noise, I forgot you don't reference these BOs.

Thanks,
David Zhou

Best wishes
Emily Deng

Thanks,
David Zhou

-	list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) {
+	list_for_each_entry_safe(bo, tmp, &local_shadow_list, shadow_list) {
+		mutex_unlock(&adev->shadow_list_lock);
+
+		if (!bo)
+			continue;
+
 		next = NULL;
 		amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);

if (fence) {
@@ -3132,9 +3146,14 @@ static int
amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)

dma_fence_put(fence);
fence = next;
+		mutex_lock(&adev->shadow_list_lock);
 	}
 	mutex_unlock(&adev->shadow_list_lock);
 
+	mutex_lock(&adev->shadow_list_lock);
+	list_splice_init(&local_shadow_list, &adev->shadow_list);
+	mutex_unlock(&adev->shadow_list_lock);
+
if (fence) {
r = dma_fence_wait_timeout(fence, false, tmo);
if (r == 0)




Re: [PATCH] drm/amdgpu: Fix the dead lock issue.

2018-09-10 Thread zhoucm1



On 2018年09月11日 10:51, Emily Deng wrote:

It will randomly hit a deadlock when testing TDR:
1. amdgpu_device_handle_vram_lost takes the shadow_list_lock
2. amdgpu_bo_create locks the bo's resv lock
3. amdgpu_bo_create_shadow is waiting for the shadow_list_lock
4. amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv
lock.

v2:
Make a local copy of the list

Signed-off-by: Emily Deng 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 -
  1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2a21267..8c81404 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3105,6 +3105,9 @@ static int amdgpu_device_handle_vram_lost(struct 
amdgpu_device *adev)
long r = 1;
int i = 0;
long tmo;
+   struct list_head local_shadow_list;
+
+	INIT_LIST_HEAD(&local_shadow_list);
  
  	if (amdgpu_sriov_runtime(adev))

tmo = msecs_to_jiffies(8000);
@@ -3112,8 +3115,19 @@ static int amdgpu_device_handle_vram_lost(struct 
amdgpu_device *adev)
tmo = msecs_to_jiffies(100);
  
  	DRM_INFO("recover vram bo from shadow start\n");

+
+	mutex_lock(&adev->shadow_list_lock);
+	list_splice_init(&adev->shadow_list, &local_shadow_list);
+	mutex_unlock(&adev->shadow_list_lock);
+
+
 	mutex_lock(&adev->shadow_list_lock);
local_shadow_list is a local variable, I think it doesn't need a lock at
all, no one changes it. Otherwise looks good to me.


Thanks,
David Zhou
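For context, the locking pattern this patch implements, reduced to its core
(a simplified sketch, not the patch itself):

	struct amdgpu_bo *bo;
	struct dma_fence *next = NULL;
	LIST_HEAD(local_shadow_list);

	/* detach the shared list under the lock ... */
	mutex_lock(&adev->shadow_list_lock);
	list_splice_init(&adev->shadow_list, &local_shadow_list);
	mutex_unlock(&adev->shadow_list_lock);

	/* ... work on the private copy without holding the lock ... */
	list_for_each_entry(bo, &local_shadow_list, shadow_list)
		amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);

	/* ... and splice everything back when done */
	mutex_lock(&adev->shadow_list_lock);
	list_splice_init(&local_shadow_list, &adev->shadow_list);
	mutex_unlock(&adev->shadow_list_lock);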

-	list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) {
+	list_for_each_entry_safe(bo, tmp, &local_shadow_list, shadow_list) {
+		mutex_unlock(&adev->shadow_list_lock);
+
+		if (!bo)
+			continue;
+
 		next = NULL;
 		amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);
if (fence) {
@@ -3132,9 +3146,14 @@ static int amdgpu_device_handle_vram_lost(struct 
amdgpu_device *adev)
  
  		dma_fence_put(fence);

fence = next;
+		mutex_lock(&adev->shadow_list_lock);
 	}
 	mutex_unlock(&adev->shadow_list_lock);
 
+	mutex_lock(&adev->shadow_list_lock);
+	list_splice_init(&local_shadow_list, &adev->shadow_list);
+	mutex_unlock(&adev->shadow_list_lock);
+
if (fence) {
r = dma_fence_wait_timeout(fence, false, tmo);
if (r == 0)




Re: [PATCH 1/1] drm/amdgpu: Fix compute VM BO params after rebase

2018-09-05 Thread zhoucm1



On 2018年09月06日 08:28, Felix Kuehling wrote:

The intent of two commits was lost in the last rebase:

810955b drm/amdgpu: Fix acquiring VM on large-BAR systems
b5d21aa drm/amdgpu: Don't use shadow BO for compute context

This commit restores the original behaviour:
* Don't set AMDGPU_GEM_CREATE_NO_CPU_ACCESS for page directories
   to allow them to be reused for compute VMs
* Don't create shadow BOs for page tables in compute VMs

Signed-off-by: Felix Kuehling 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 17 +++--
  1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index ea5e277..5e7a3de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -577,10 +577,13 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev,
   *
   * @adev: amdgpu_device pointer
   * @vm: requesting vm
+ * @level: level in the page table hierarchy
+ * @no_shadow: disable creation of shadow BO for this VM
   * @bp: resulting BO allocation parameters
   */
  static void amdgpu_vm_bo_param(struct amdgpu_device *adev, struct amdgpu_vm 
*vm,
-  int level, struct amdgpu_bo_param *bp)
+  int level, bool no_shadow,
+  struct amdgpu_bo_param *bp)

How about adding no_shadow to bp? It could also be considered a parameter of the BO; see the sketch below.

Regards,
David Zhou
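A sketch of this suggestion (the no_shadow member is hypothetical; the
existing members of amdgpu_bo_param are abbreviated):

struct amdgpu_bo_param {
	/* ... existing members: size, byte_align, domain, flags, type, resv ... */
	bool	no_shadow;	/* new: skip shadow BO creation for this BO */
};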

  {
memset(bp, 0, sizeof(*bp));
  
@@ -595,9 +598,8 @@ static void amdgpu_vm_bo_param(struct amdgpu_device *adev, struct amdgpu_vm *vm,

AMDGPU_GEM_CREATE_CPU_GTT_USWC;
if (vm->use_cpu_for_update)
bp->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
-   else
-   bp->flags |= AMDGPU_GEM_CREATE_SHADOW |
-   AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
+   else if (!no_shadow)
+   bp->flags |= AMDGPU_GEM_CREATE_SHADOW;
bp->type = ttm_bo_type_kernel;
if (vm->root.base.bo)
bp->resv = vm->root.base.bo->tbo.resv;
@@ -626,6 +628,7 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device 
*adev,
  unsigned level, bool ats)
  {
unsigned shift = amdgpu_vm_level_shift(adev, level);
+   bool no_shadow = !vm->root.base.bo->shadow;
struct amdgpu_bo_param bp;
unsigned pt_idx, from, to;
int r;
@@ -650,7 +653,7 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device 
*adev,
saddr = saddr & ((1 << shift) - 1);
eaddr = eaddr & ((1 << shift) - 1);
  
-	amdgpu_vm_bo_param(adev, vm, level, &bp);
+	amdgpu_vm_bo_param(adev, vm, level, no_shadow, &bp);
  
  	/* walk over the address space and allocate the page tables */

for (pt_idx = from; pt_idx <= to; ++pt_idx) {
@@ -2709,6 +2712,7 @@ void amdgpu_vm_adjust_size(struct amdgpu_device *adev, 
uint32_t min_vm_size,
  int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
   int vm_context, unsigned int pasid)
  {
+   bool no_shadow = (vm_context == AMDGPU_VM_CONTEXT_COMPUTE);
struct amdgpu_bo_param bp;
struct amdgpu_bo *root;
int r, i;
@@ -2748,7 +2752,8 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
  "CPU update of VM recommended only for large BAR system\n");
vm->last_update = NULL;
  
-	amdgpu_vm_bo_param(adev, vm, adev->vm_manager.root_level, &bp);
+	amdgpu_vm_bo_param(adev, vm, adev->vm_manager.root_level, no_shadow,
+			   &bp);
 	r = amdgpu_bo_create(adev, &bp, &root);
if (r)
goto error_free_sched_entity;




Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread zhoucm1



On 2018年09月04日 17:20, Christian König wrote:

Am 04.09.2018 um 11:00 schrieb zhoucm1:



On 2018年09月04日 16:42, Christian König wrote:

Am 04.09.2018 um 10:27 schrieb zhoucm1:



On 2018年09月04日 16:05, Christian König wrote:

Am 04.09.2018 um 09:53 schrieb zhoucm1:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb 
node.
2. Each node keeps a reference to the last previously inserted 
node.

3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage 
collection and remove the first node from the tree as long as 
it is signaled.


5. When enable_signaling is requested for a node we cascade 
that to the left using rb_prev.
    This ensures that signaling is enabled for the current 
fence as well as all previous fences.


6. A wait just looks into the tree for the signal point lower 
or equal of the requested sequence number.
After re-thinking your idea, I think it doesn't work since there
is no timeline value acting as a line:
signal pt values don't have to be contiguous; they can jump via the
signal operation, like 1, 4, 8, 15, 19. E.g. there are five
signal_pts,
signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19; if
a wait pt is 7, do you mean this wait only needs signal_pt1 and
signal_pt4??? That's certainly not right, we need to make sure
the timeline value is bigger than the wait pt value, which means
signal_pt8 is needed for wait_pt7.
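To make the lookup concrete, a sketch of resolving a wait point against an
RB tree of signal points, following Christian's proposal (the struct layout
and helper are assumptions; with points 1, 4, 8, 15, 19, a wait for 7
resolves to signal_pt8):

static struct drm_syncobj_signal_pt *
find_signal_pt_geq(struct rb_root *root, u64 wait_value)
{
	struct rb_node *node = root->rb_node;
	struct drm_syncobj_signal_pt *best = NULL;

	while (node) {
		struct drm_syncobj_signal_pt *pt =
			rb_entry(node, struct drm_syncobj_signal_pt, node);

		if (pt->value >= wait_value) {
			best = pt;		/* candidate; look for a smaller one */
			node = node->rb_left;
		} else {
			node = node->rb_right;
		}
	}
	return best;	/* NULL if every signal point is below wait_value */
}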


That can be defined as we like it, e.g. when a wait operation asks 
for 7 we can return 8 as well.
If we define it that way, then the problem comes again: if 8 is removed by
garbage collection, will you return 15?


The garbage collection is only done for signaled nodes. So when 8 is 
already garbage collected and 7 is asked we know that we don't need 
to return anything.
8 is a signaled node; a waitA/signal operation does garbage collection, so
how does waitB(7) know the garbage history?


Well we of course keep what the last garbage collected number is, 
don't we?


Since there is no timeline as a line, I think this is not right 
direction.


That is actually intended. There is no infinite timeline here, just 
a windows of the last not yet signaled fences.
No one said it's an infinite timeline; the timeline will stop
increasing when the syncobj is released.


Yeah, but the syncobj can live for a very very long time. Not having 
some form of shrinking it when fences are signaled is certainly not 
going to fly very far.

I will try to fix this problem.
Btw, when I tried your suggestion, I found it will be difficult to
implement drm_syncobj_array_wait_timeout with your idea, since it needs
first_signaled. If there is an un-signaled syncobj, we will still register
a cb on the timeline value change, and then we are back to needing
enable_signaling.


Thanks,
David Zhou


Regards,
Christian.



Anyway, kref is a good way to solve the 'free' problem. I will try to
use it to improve my patch; of course, I will refer to your idea. :)


Thanks,
David Zhou


Otherwise you will never be able to release nodes from the tree 
since you always need to keep them around just in case somebody asks 
for a lower number.


Regards,
Christian.





The key is that as soon as a signal point is added adding a 
previous point is no longer allowed.

That's the intention.

Regards,
David Zhou




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra 
reference to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root 
(e.g. the sync object). It only destructs itself when the 
looked up references to the nodes are dropped.
And here, who will destroy the rb node, since no one does
enable_signaling and there is no callback for them to free themselves?


The node will be destroyed when the last reference drops, not when 
enable_signaling is called.


In other words the sync_obj keeps the references to each tree 
object to provide the wait operation, as soon as the sync_obj is 
destroyed we don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way 
we can drop all unused signal points as soon as the sync_obj is 
destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should 
work fine.
Yeah, it could work; a simple timeline reference can also solve the
'free' problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circular dependency problems, it needs
less memory, and each node always has the same size, which means
we can use a kmem_cache for it.


Regards,
Christian.



Thanks,
David Zhou















Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread zhoucm1



On 2018年09月04日 16:42, Christian König wrote:

Am 04.09.2018 um 10:27 schrieb zhoucm1:



On 2018年09月04日 16:05, Christian König wrote:

Am 04.09.2018 um 09:53 schrieb zhoucm1:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted 
node.

3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection 
and remove the first node from the tree as long as it is signaled.


5. When enable_signaling is requested for a node we cascade that 
to the left using rb_prev.
    This ensures that signaling is enabled for the current fence 
as well as all previous fences.


6. A wait just looks into the tree for the signal point lower or 
equal of the requested sequence number.
After re-thinking your idea, I think it doesn't work since there is
no timeline value acting as a line:
signal pt values don't have to be contiguous; they can jump via the
signal operation, like 1, 4, 8, 15, 19. E.g. there are five
signal_pts,
signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19; if a
wait pt is 7, do you mean this wait only needs signal_pt1 and
signal_pt4??? That's certainly not right, we need to make sure the
timeline value is bigger than the wait pt value, which means signal_pt8
is needed for wait_pt7.


That can be defined as we like it, e.g. when a wait operation asks 
for 7 we can return 8 as well.
If we define it that way, then the problem comes again: if 8 is removed by
garbage collection, will you return 15?


The garbage collection is only done for signaled nodes. So when 8 is 
already garbage collected and 7 is asked we know that we don't need to 
return anything.
8 is a signaled node; a waitA/signal operation does garbage collection, so
how does waitB(7) know the garbage history?




Since there is no timeline as a line, I think this is not right 
direction.


That is actually intended. There is no infinite timeline here, just a 
windows of the last not yet signaled fences.
No one said it's an infinite timeline; the timeline will stop increasing
when the syncobj is released.


Anyway, kref is a good way to solve the 'free' problem. I will try to use
it to improve my patch; of course, I will refer to your idea. :)


Thanks,
David Zhou


Otherwise you will never be able to release nodes from the tree since 
you always need to keep them around just in case somebody asks for a 
lower number.


Regards,
Christian.





The key is that as soon as a signal point is added adding a previous 
point is no longer allowed.

That's the intention.

Regards,
David Zhou




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra 
reference to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root 
(e.g. the sync object). It only destructs itself when the looked 
up references to the nodes are dropped.
And here, who will destroy the rb node, since no one does
enable_signaling and there is no callback for them to free themselves?


The node will be destroyed when the last reference drops, not when 
enable_signaling is called.


In other words the sync_obj keeps the references to each tree object 
to provide the wait operation, as soon as the sync_obj is destroyed 
we don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way we 
can drop all unused signal points as soon as the sync_obj is destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should 
work fine.
Yeah, it could work; a simple timeline reference can also solve the
'free' problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circular dependency problems, it needs less
memory, and each node always has the same size, which means we can
use a kmem_cache for it.


Regards,
Christian.



Thanks,
David Zhou








Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread zhoucm1



On 2018年09月04日 16:05, Christian König wrote:

Am 04.09.2018 um 09:53 schrieb zhoucm1:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted node.
3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection 
and remove the first node from the tree as long as it is signaled.


5. When enable_signaling is requested for a node we cascade that 
to the left using rb_prev.
    This ensures that signaling is enabled for the current fence 
as well as all previous fences.


6. A wait just looks into the tree for the signal point lower or 
equal of the requested sequence number.
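A minimal sketch of the garbage-collection step in point 4 above (the node
layout is an assumption; rb_first/rb_erase and the dma_fence calls are
existing kernel API):

struct signal_node {
	struct dma_fence fence;
	struct rb_node node;
};

static void syncobj_garbage_collect(struct rb_root *tree)
{
	struct rb_node *node;

	while ((node = rb_first(tree))) {
		struct signal_node *sn =
			rb_entry(node, struct signal_node, node);

		if (!dma_fence_is_signaled(&sn->fence))
			break;			/* stop at the first unsignaled node */

		rb_erase(node, tree);
		dma_fence_put(&sn->fence);	/* drop the sync object's reference */
	}
}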
After re-thinking your idea, I think it doesn't work since there is no
timeline value acting as a line:
signal pt values don't have to be contiguous; they can jump via the
signal operation, like 1, 4, 8, 15, 19. E.g. there are five
signal_pts,
signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19; if a
wait pt is 7, do you mean this wait only needs signal_pt1 and
signal_pt4??? That's certainly not right, we need to make sure the
timeline value is bigger than the wait pt value, which means signal_pt8 is
needed for wait_pt7.


That can be defined as we like it, e.g. when a wait operation asks for 
7 we can return 8 as well.
If we define it that way, then the problem comes again: if 8 is removed by
garbage collection, will you return 15? Since there is no timeline acting as
a line, I think this is not the right direction.




The key is that as soon as a signal point is added adding a previous 
point is no longer allowed.

That's the intention.

Regards,
David Zhou




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra 
reference to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root (e.g. 
the sync object). It only destructs itself when the looked up 
references to the nodes are dropped.
And here, who will destroy the rb node, since no one does enable_signaling
and there is no callback for the nodes to free themselves?


The node will be destroyed when the last reference drops, not when 
enable_signaling is called.


In other words the sync_obj keeps the references to each tree object 
to provide the wait operation, as soon as the sync_obj is destroyed we 
don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way we 
can drop all unused signal points as soon as the sync_obj is destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should work 
fine.
Yeah, it could work; a simple timeline reference can also solve the
'free' problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circle dependency problems, it needs less 
memory and each node has always the same size which means we can use 
a kmem_cache for it.


Regards,
Christian.



Thanks,
David Zhou








Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread zhoucm1



On 2018-09-04 15:00, Christian König wrote:

On 04.09.2018 at 06:04, zhoucm1 wrote:



On 2018-09-03 19:19, Christian König wrote:

On 03.09.2018 at 12:07, Chunming Zhou wrote:



On 2018/9/3 16:50, Christian König wrote:

On 03.09.2018 at 06:13, Chunming Zhou wrote:



On 2018/8/30 19:32, Christian König wrote:

[SNIP]



+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for?

They are used to record wait-op points and signal-op points.
For timeline, they are connected by the timeline value; it works like:
    a. a signal pt increases the timeline value to the signal_pt value
when the signal_pt is signaled. The signal_pt depends on the
previous pt fence and its own signal fence from the signal op.

    b. the wait pt tree checks whether the timeline value is over its
own value.

For normal, it works like:
    a. a signal pt increases syncobj_timeline->signal_point by 1
every time a signal op is performed.
    b. when a wait op comes in, the wait pt fetches the last
signal pt value above as its wait point. The wait pt will only be
signaled by the signal_pt with the equal point value.
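
Condensed into code, the two modes differ mainly in where the wait value comes from; the timeline/signal_point fields below are from the patch, the helper names are invented:

static u64 wait_pt_value(struct drm_syncobj_timeline *tl, u64 user_point,
			 bool timeline_mode)
{
	if (timeline_mode)
		return user_point;	/* the caller names the point */
	return tl->signal_point;	/* latch the last signal op */
}

static bool wait_pt_done(struct drm_syncobj_timeline *tl, u64 wait_value)
{
	/* both modes release the wait once the timeline has advanced far enough */
	return tl->timeline >= wait_value;
}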




and why is the stub fence their base?
Good question. I tried to kzalloc them as well when I
debugged them, and I encountered a problem:
I could look up/find a wait_pt or signal_pt successfully, but when I
tried to use them, they were sometimes already freed, resulting in a
NULL pointer.
And generally, when we look them up, we often need their stub
fence as well, and in the meantime their lifecycles are the same.

For those reasons, I made the stub fence their base.


That sounds like you only did this because you messed up the 
lifecycle.


Additional to that I don't think you correctly considered the 
lifecycle of the waits and the sync object itself. E.g. 
blocking in drm_syncobj_timeline_fini() until all waits are 
done is not a good idea.


What you should do instead is to create a fence_array object 
with all the fence we need to wait for when a wait point is 
requested.
Yeah, this was the result of our initial discussion, but when I tried
to do that, I found it cannot meet the advance-signal
requirement:
    a. We need to consider that the wait and signal pt values are not
a one-to-one match; it's difficult to find the dependent point, or at
least there is some overhead.


As far as I can see that is independent of using a fence array 
here. See you can either use a ring buffer or an rb-tree, but 
when you want to wait for a specific point we need to condense 
the not yet signaled fences into an array.
Again, we need to find the range the specific point falls in; we
should stay close to timeline semantics. I also referred to the sw_sync.c
timeline; usually a wait_pt is signalled by a timeline point. And I
agree we can implement it with several methods, but I don't think
there are fundamental differences.


The problem is that with your current approach you need the 
sync_obj alive for the synchronization to work. That is most 
likely not a good idea.
Indeed, I will fix that. How about only creating a fence array for
every wait pt when the syncobj is released? At syncobj release, the wait pt
must already have waited for the signal operation, so we can easily condense
fences for every wait pt. And in the meantime, we can keep the advantage of
timeline-based wait pts.


That could work, but I don't see how you want to replace the already 
issued fence with a fence_array when the sync object is destroyed.


Additional to that I would rather prefer a consistent handling, e.g. 
without extra rarely used code paths.
Ah, I found an easy way: we just need to make the syncobj_timeline
structure reference-counted. This way the syncobj itself can be released
first; wait_pt/signal_pt don't need the syncobj at all.

Every wait_pt/signal_pt keeps a reference to the syncobj_timeline.


I've thought about that as well, but came to the conclusion that you 
run into problems because of circle dependencies.


E.g. sync_obj references sync_point and sync_point references sync_obj.
The sync_obj can be freed first; only the sync points reference the
syncobj_timeline structure, and the syncobj_timeline doesn't reference the
sync_pts, so there is no circular dependency.




Additional to that it is quite a bit larger memory footprint because 
you need to keep the sync_obj around as well.
All signaled sync_pts are freed immediately except for the syncobj_timeline
structure, so where does the extra memory footprint come from?
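
A sketch of the refcounting described above (invented names): each point holds a reference on a small timeline-state object, and that object references nothing back, so there is no cycle and the syncobj can be freed first.

#include <linux/kref.h>
#include <linux/slab.h>

struct syncobj_timeline_state {
	struct kref refcount;
	u64 timeline;
	u64 signal_point;
};

static void timeline_state_release(struct kref *kref)
{
	kfree(container_of(kref, struct syncobj_timeline_state, refcount));
}

static void timeline_state_get(struct syncobj_timeline_state *state)
{
	kref_get(&state->refcount);	/* taken by each wait_pt/signal_pt */
}

static void timeline_state_put(struct syncobj_timeline_state *state)
{
	kref_put(&state->refcount, timeline_state_release);
}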












Additional to that you enable signaling without a need from the 
waiting side. That is rather bad for implementations which need 
that optimization.
Do you mean increasing the timeline based on the signal fence is not
better? Only update the timeline value when requested by a wait pt?


Yes, exactly.

This way, we will not update timeline va

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-03 Thread zhoucm1



On 2018-09-03 19:19, Christian König wrote:

On 03.09.2018 at 12:07, Chunming Zhou wrote:



On 2018/9/3 16:50, Christian König wrote:

On 03.09.2018 at 06:13, Chunming Zhou wrote:



On 2018/8/30 19:32, Christian König wrote:

[SNIP]



+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for?

They are used to record wait-op points and signal-op points.
For timeline, they are connected by the timeline value; it works like:
    a. a signal pt increases the timeline value to the signal_pt value
when the signal_pt is signaled. The signal_pt depends on the previous
pt fence and its own signal fence from the signal op.

    b. the wait pt tree checks whether the timeline value is over its own value.

For normal, it works like:
    a. a signal pt increases syncobj_timeline->signal_point by 1
every time a signal op is performed.
    b. when a wait op comes in, the wait pt fetches the last
signal pt value above as its wait point. The wait pt will only be
signaled by the signal_pt with the equal point value.




and why is the stub fence their base?
Good question. I tried to kzalloc them as well when I debugged
them, and I encountered a problem:
I could look up/find a wait_pt or signal_pt successfully, but when I
tried to use them, they were sometimes already freed, resulting in a
NULL pointer.
And generally, when we look them up, we often need their stub fence
as well, and in the meantime their lifecycles are the same.

For those reasons, I made the stub fence their base.


That sounds like you only did this because you messed up the 
lifecycle.


Additional to that I don't think you correctly considered the 
lifecycle of the waits and the sync object itself. E.g. blocking 
in drm_syncobj_timeline_fini() until all waits are done is not a 
good idea.


What you should do instead is to create a fence_array object 
with all the fence we need to wait for when a wait point is 
requested.
Yeah, this was the result of our initial discussion, but when I tried to
do that, I found it cannot meet the advance-signal requirement:
    a. We need to consider that the wait and signal pt values are not
a one-to-one match; it's difficult to find the dependent point, or at
least there is some overhead.


As far as I can see that is independent of using a fence array 
here. See you can either use a ring buffer or an rb-tree, but when 
you want to wait for a specific point we need to condense the not 
yet signaled fences into an array.
Again, we need to find the range the specific point falls in; we
should stay close to timeline semantics. I also referred to the sw_sync.c
timeline; usually a wait_pt is signalled by a timeline point. And I
agree we can implement it with several methods, but I don't think
there are fundamental differences.


The problem is that with your current approach you need the sync_obj 
alive for the synchronization to work. That is most likely not a 
good idea.
Indeed, I will fix that. How about only creating a fence array for
every wait pt when the syncobj is released? At syncobj release, the wait pt
must already have waited for the signal operation, so we can easily condense
fences for every wait pt. And in the meantime, we can keep the advantage of
timeline-based wait pts.


That could work, but I don't see how you want to replace the already 
issued fence with a fence_array when the sync object is destroyed.


Additional to that I would rather prefer a consistent handling, e.g. 
without extra rarely used code paths.
Ah, I found an easy way: we just need to make the syncobj_timeline structure
reference-counted. This way the syncobj itself can be released first;
wait_pt/signal_pt don't need the syncobj at all.

Every wait_pt/signal_pt keeps a reference to the syncobj_timeline.







Additional to that you enable signaling without a need from the 
waiting side. That is rather bad for implementations which need that 
optimization.
Do you mean increasing the timeline based on the signal fence is not better?
Only update the timeline value when requested by a wait pt?


Yes, exactly.

This way, we will not update the timeline value immediately and cannot
free signal pts immediately, and we also need to consider how this interacts
with CPU query and wait.


That is actually the better coding style. We usually try to avoid 
doing things in interrupt handlers as much as possible.
OK, I see your concern. How about delaying the handling to a workqueue? This
way, we only increase the timeline value and wake up the workqueue in the fence cb.
Is that acceptable?
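
A sketch of that deferral (invented names, locking elided): the fence callback only bumps the counter and kicks a worker, and the wait-pt tree walk happens later in process context.

#include <linux/dma-fence.h>
#include <linux/workqueue.h>

struct timeline_ctx {
	u64 timeline;
	struct work_struct signal_work;
};

struct signal_pt_cb {
	struct dma_fence_cb cb;
	struct timeline_ctx *ctx;
};

static void timeline_signal_work(struct work_struct *work)
{
	struct timeline_ctx *ctx =
		container_of(work, struct timeline_ctx, signal_work);

	/* process context: walk the wait_pt tree here and signal every
	 * point whose value is <= ctx->timeline */
	pr_debug("timeline advanced to %llu\n", ctx->timeline);
}

static void signal_pt_fence_cb(struct dma_fence *fence,
			       struct dma_fence_cb *cb)
{
	struct signal_pt_cb *spc = container_of(cb, struct signal_pt_cb, cb);

	/* fence-callback context: do the minimum and defer the rest */
	spc->ctx->timeline++;
	schedule_work(&spc->ctx->signal_work);
}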





How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted node.
3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage 

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-08-30 Thread zhoucm1



On 2018-08-30 15:25, Christian König wrote:

On 30.08.2018 at 05:50, zhoucm1 wrote:



On 2018-08-29 19:42, Christian König wrote:

On 29.08.2018 at 12:44, Chunming Zhou wrote:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an 
integer payload
identifying a point in a timeline. Such timeline semaphores support 
the

following operations:
    * CPU query - A host operation that allows querying the payload 
of the

  timeline semaphore.
    * CPU wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.
    * Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
    * Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, that means the front time point (PT) is always
signaled before the later PT.

a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation
fence; when the PT[N] fence is signaled,
the timeline will increase to the value of PT[N].
b. wait PT design:
A wait PT fence is signaled by reaching the timeline point value. When the
timeline is increasing, we compare
the wait PTs' values with the new timeline value; if a PT value is lower than
the timeline value, then the wait PT will be
signaled, otherwise it is kept in the list. A semaphore wait operation can wait
on any point of the timeline,
so we need an RB tree to order them. And a wait PT can be ahead of its signal
PT, so we need a submission fence to
perform that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate 
patch. (Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate 
patch.
5. drop the submission_fence implementation and instead use 
wait_event() for that. (Christian)

6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and 
Christian)
 a. normal syncobj signal op will create a signal PT to tail of 
signal pt list.
 b. normal syncobj wait op will create a wait pt with last 
signal point, and this wait PT is only signaled by related signal 
point PT.

2. some bug fix and clean up
3. tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj

TODO:
1. CPU query and wait on timeline semaphore.

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/drm_syncobj.c  | 593 
-

  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
  include/drm/drm_syncobj.h  |  78 +--
  include/uapi/drm/drm.h |   1 +
  4 files changed, 505 insertions(+), 171 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index ab43559398d0..f701d9ef1b81 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,50 @@
  #include "drm_internal.h"
  #include <drm/drm_syncobj.h>
  +/* merge normal syncobj to timeline syncobj, the point interval 
is 1 */

+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
+struct drm_syncobj_stub_fence {
+    struct dma_fence base;
+    spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct 
dma_fence *fence)

+{
+    return "syncobjstub";
+}
+
+static bool drm_syncobj_stub_fence_enable_signaling(struct 
dma_fence *fence)

+{
+    return !dma_fence_is_signaled(fence);
+}
+static void drm_syncobj_stub_fence_release(struct dma_fence *f)
+{
+    kfree(f);
+}
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+    .get_driver_name = drm_syncobj_stub_fence_get_name,
+    .get_timeline_name = drm_syncobj_stub_fence_get_name,
+    .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+    .release = drm_syncobj_stub_fence_release,
+};


Do we really need to move that around? Could reduce the size of the 
patch quite a bit if we don't.


The stub fence is used widely in both normal and timeline syncobj; if you
think it increases the patch size, I can do a separate patch for that.





+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for?

They are used to record wait-op points and signal-op points.
For timeline, they are connected by the timeline value; it works like:
    a. a signal pt increases the timeline value to the signal_pt value when
the signal_pt is signaled. The signal_pt depends on the previous pt fence
and its own signal fence from the signal op.

    b.

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-08-29 Thread zhoucm1



On 2018-08-29 19:42, Christian König wrote:

On 29.08.2018 at 12:44, Chunming Zhou wrote:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer 
payload

identifying a point in a timeline. Such timeline semaphores support the
following operations:
    * CPU query - A host operation that allows querying the payload 
of the

  timeline semaphore.
    * CPU wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.
    * Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
    * Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, that means the front time point (PT) is always
signaled before the later PT.

a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation
fence; when the PT[N] fence is signaled,
the timeline will increase to the value of PT[N].
b. wait PT design:
A wait PT fence is signaled by reaching the timeline point value. When the
timeline is increasing, we compare
the wait PTs' values with the new timeline value; if a PT value is lower than
the timeline value, then the wait PT will be
signaled, otherwise it is kept in the list. A semaphore wait operation can wait
on any point of the timeline,
so we need an RB tree to order them. And a wait PT can be ahead of its signal
PT, so we need a submission fence to
perform that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate 
patch. (Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate 
patch.
5. drop the submission_fence implementation and instead use 
wait_event() for that. (Christian)

6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and 
Christian)
 a. normal syncobj signal op will create a signal PT to tail of 
signal pt list.
 b. normal syncobj wait op will create a wait pt with last signal 
point, and this wait PT is only signaled by related signal point PT.

2. some bug fix and clean up
3. tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj

TODO:
1. CPU query and wait on timeline semaphore.

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/drm_syncobj.c  | 593 -
  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
  include/drm/drm_syncobj.h  |  78 +--
  include/uapi/drm/drm.h |   1 +
  4 files changed, 505 insertions(+), 171 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index ab43559398d0..f701d9ef1b81 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,50 @@
  #include "drm_internal.h"
  #include <drm/drm_syncobj.h>
  +/* merge normal syncobj to timeline syncobj, the point interval is 
1 */

+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
+struct drm_syncobj_stub_fence {
+    struct dma_fence base;
+    spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence 
*fence)

+{
+    return "syncobjstub";
+}
+
+static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence 
*fence)

+{
+    return !dma_fence_is_signaled(fence);
+}
+static void drm_syncobj_stub_fence_release(struct dma_fence *f)
+{
+    kfree(f);
+}
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+    .get_driver_name = drm_syncobj_stub_fence_get_name,
+    .get_timeline_name = drm_syncobj_stub_fence_get_name,
+    .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+    .release = drm_syncobj_stub_fence_release,
+};


Do we really need to move that around? Could reduce the size of the 
patch quite a bit if we don't.


The stub fence is used widely in both normal and timeline syncobj; if you
think it increases the patch size, I can do a separate patch for that.





+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for?

They are used to record wait-op points and signal-op points.
For timeline, they are connected by the timeline value; it works like:
    a. a signal pt increases the timeline value to the signal_pt value when
the signal_pt is signaled. The signal_pt depends on the previous pt fence and
its own signal fence from the signal op.

    b. the wait pt tree checks whether the timeline value is over its own value.

For normal, it works like:
    a. signal pt increase 1 for 

Re: [PATCH] drm/amdgpu: Need to set moved to true when evict bo

2018-08-28 Thread zhoucm1



On 2018-08-28 20:47, Christian König wrote:

On 28.08.2018 at 14:40, Emily Deng wrote:

Fix the VMC page fault when the running sequence is as below:
1.amdgpu_gem_create_ioctl
2.ttm_bo_swapout->amdgpu_vm_bo_invalidate: since amdgpu_vm_bo_base_init was
not called, list_add_tail(&base->bo_list, &bo->va) was never
called either. So even though the bo was evicted,
bo_base->moved was not set.
3.drm_gem_open_ioctl->amdgpu_vm_bo_base_init: here only
list_move_tail(&base->vm_status, &vm->evicted) is called, but
bo_base->moved is not set.
4.amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map: as bo_base->moved is
not set to true, amdgpu_vm_bo_insert_map will call
list_move(&bo_va->base.vm_status, &vm->moved).
5.amdgpu_cs_ioctl won't validate the swapout bo, as it is only in the
moved list, not in the evict list. So VMC page fault occurs.

Signed-off-by: Emily Deng 


Good catch, patch is Reviewed-by: Christian König 

Really good debugging, Emily. You can add my Reviewed-by: Chunming Zhou
as well if you haven't pushed it yet.


Regards,
David Zhou



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index 1f4b8df..015e20e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -172,6 +172,7 @@ static void amdgpu_vm_bo_base_init(struct 
amdgpu_vm_bo_base *base,

   * is validated on next vm use to avoid fault.
   * */
  list_move_tail(&base->vm_status, &vm->evicted);
+    base->moved = true;
  }
    /**




Re: [PATCH 01/10] drm/amdgpu: use only the lower address space on GMC9

2018-08-27 Thread zhoucm1



On 2018-08-28 03:03, Felix Kuehling wrote:

The point of this series seems to be to allow access to small system
memory BOs (one page) without a GART mapping. I'm guessing that reduces
pressure on the GART and removes the need for HDP and TLB flushes.
I think it would be better to add this explanation/reasoning to the comments
when enabling the AGP aperture. If that's true, it's really a clever idea.


Regards,
David Zhou


Why
does Patch 10 only enable that on GFXv9? Is there less benefit on older
chips?

Is this related to your recent changes to allow page tables in system
memory?

See my replies to patch 6 and 8. Other than that, the series is
Acked-by: Felix Kuehling 

Regards,
   Felix


On 2018-08-27 12:53 PM, Christian König wrote:

Only use the lower address space on GMC9 for the system domain.
Otherwise we would need to sign extend GMC addresses.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 7 +++
  1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index e44b5191735d..d982956c8329 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -938,11 +938,10 @@ static int gmc_v9_0_sw_init(void *handle)
if (r)
return r;
  
-	/* Set the internal MC address mask

-* This is the max address of the GPU's
-* internal address space.
+   /* Use only the lower range for the internal MC address mask. This is
+* the max address of the GPU's internal address space.
 */
-   adev->gmc.mc_mask = 0xffffffffffffULL; /* 48 bit MC */
+   adev->gmc.mc_mask = 0x7fffffffffffULL;
  
  	/* set DMA mask + need_dma32 flags.

 * PCIE - can handle 44-bits.



Re: [PATCH 5/5] drm: add syncobj timeline support v2

2018-08-23 Thread zhoucm1



On 2018-08-23 17:15, Christian König wrote:

On 23.08.2018 at 10:25, Chunming Zhou wrote:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer 
payload

identifying a point in a timeline. Such timeline semaphores support the
following operations:
    * Host query - A host operation that allows querying the payload 
of the

  timeline semaphore.
    * Host wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.


I think I have an idea what "Host" means in this context, but it would 
probably be better to describe it.


How about "CPU"?


    * Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
    * Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, that means the front time point (PT) is always
signaled before the later PT.

a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation
fence; when the PT[N] fence is signaled,
the timeline will increase to the value of PT[N].
b. wait PT design:
A wait PT fence is signaled by reaching the timeline point value. When the
timeline is increasing, we compare
the wait PTs' values with the new timeline value; if a PT value is lower than
the timeline value, then the wait PT will be
signaled, otherwise it is kept in the list. A semaphore wait operation can wait
on any point of the timeline,
so we need an RB tree to order them. And a wait PT can be ahead of its signal
PT, so we need a submission fence to
perform that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate 
patch. (Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate 
patch.
5. drop the submission_fence implementation and instead use 
wait_event() for that. (Christian)

6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)


I really liked Daniel's idea to handle the classic syncobj like a
timeline syncobj with just 1 entry. That can probably simplify the
implementation quite a bit.
Yeah, after the timeline lands, it seems we can remove the old
syncobj->fence, right? I will try to unify them in an additional patch.


Thanks,
David Zhou


Additional to that an amdgpu patch which shows how the interface is to 
be used is probably something Daniel will want to see as well.


Christian.



TODO:
1. CPU query and wait on timeline semaphore.
2. test application (Daniel Vetter)

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/drm_syncobj.c | 383 
+++---

  include/drm/drm_syncobj.h |  28 +++
  include/uapi/drm/drm.h    |   1 +
  3 files changed, 389 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index 6227df2cc0a4..f738d78edf65 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,44 @@
  #include "drm_internal.h"
  #include <drm/drm_syncobj.h>
  +struct drm_syncobj_stub_fence {
+    struct dma_fence base;
+    spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence 
*fence)

+{
+    return "syncobjstub";
+}
+
+static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence 
*fence)

+{
+    return !dma_fence_is_signaled(fence);
+}
+
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+    .get_driver_name = drm_syncobj_stub_fence_get_name,
+    .get_timeline_name = drm_syncobj_stub_fence_get_name,
+    .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+    .release = NULL,
+};
+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};
+
  /**
   * drm_syncobj_find - lookup and reference a sync object.
   * @file_private: drm file private pointer
@@ -137,6 +175,150 @@ void drm_syncobj_remove_callback(struct 
drm_syncobj *syncobj,

  spin_unlock(&syncobj->lock);
  }
  +static void drm_syncobj_timeline_signal_wait_pts(struct 
drm_syncobj *syncobj)

+{
+    struct rb_node *node = NULL;
+    struct drm_syncobj_wait_pt *wait_pt = NULL;
+
+    spin_lock(&syncobj->lock);
+    for(node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree);
+    node != NULL; ) {
+    wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+    node = rb_next(node);
+    if (wait_pt->value <= syncobj->syncobj_timeline.timeline) {
+    dma_fence_signal(&wait_pt->base.base);
+    rb_erase(&wait_pt->node,
+ &syncobj->syncobj_timeline.wait_pt_tree);
+ 

Re: [PATCH 5/5] drm: add syncobj timeline support v2

2018-08-23 Thread zhoucm1



On 2018-08-23 17:08, Daniel Vetter wrote:

On Thu, Aug 23, 2018 at 04:25:42PM +0800, Chunming Zhou wrote:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer payload
identifying a point in a timeline. Such timeline semaphores support the
following operations:
* Host query - A host operation that allows querying the payload of the
  timeline semaphore.
* Host wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.
* Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
* Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, that means the front time point (PT) is always
signaled before the later PT.
a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation
fence; when the PT[N] fence is signaled,
the timeline will increase to the value of PT[N].
b. wait PT design:
A wait PT fence is signaled by reaching the timeline point value. When the
timeline is increasing, we compare
the wait PTs' values with the new timeline value; if a PT value is lower than
the timeline value, then the wait PT will be
signaled, otherwise it is kept in the list. A semaphore wait operation can wait
on any point of the timeline,
so we need an RB tree to order them. And a wait PT can be ahead of its signal
PT, so we need a submission fence to
perform that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate patch. 
(Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate patch.
5. drop the submission_fence implementation and instead use wait_event() for 
that. (Christian)
6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

Depending upon how it's going to be used, this is the wrong thing to do.


TODO:
1. CPU query and wait on timeline semaphore.
2. test application (Daniel Vetter)

I also had some more suggestions, around aligning the two concepts of
future fences
The submission fence is replaced by wait_event(), so I didn't address your
future-fence suggestion. You're welcome to explain the future-fence status.

and at least trying to merge the timeline and the other
fence (which really is just a special case of a timeline with only 1
slot).

Could you detail that? Do you mean merging syncobj->fence into a timeline point?

Thanks,
David Zhou

-Daniel


Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/drm_syncobj.c | 383 +++---
  include/drm/drm_syncobj.h |  28 +++
  include/uapi/drm/drm.h|   1 +
  3 files changed, 389 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 6227df2cc0a4..f738d78edf65 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,44 @@
  #include "drm_internal.h"
  #include <drm/drm_syncobj.h>
  
+struct drm_syncobj_stub_fence {

+   struct dma_fence base;
+   spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence)
+{
+return "syncobjstub";
+}
+
+static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence)
+{
+return !dma_fence_is_signaled(fence);
+}
+
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+   .get_driver_name = drm_syncobj_stub_fence_get_name,
+   .get_timeline_name = drm_syncobj_stub_fence_get_name,
+   .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+   .release = NULL,
+};
+
+struct drm_syncobj_wait_pt {
+   struct drm_syncobj_stub_fence base;
+   u64value;
+   struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+   struct drm_syncobj_stub_fence base;
+   struct dma_fence *signal_fence;
+   struct dma_fence *pre_pt_base;
+   struct dma_fence_cb signal_cb;
+   struct dma_fence_cb pre_pt_cb;
+   struct drm_syncobj *syncobj;
+   u64value;
+   struct list_head list;
+};
+
  /**
   * drm_syncobj_find - lookup and reference a sync object.
   * @file_private: drm file private pointer
@@ -137,6 +175,150 @@ void drm_syncobj_remove_callback(struct drm_syncobj 
*syncobj,
	spin_unlock(&syncobj->lock);
  }
  
+static void drm_syncobj_timeline_signal_wait_pts(struct drm_syncobj *syncobj)

+{
+   struct rb_node *node = NULL;
+   struct drm_syncobj_wait_pt *wait_pt = NULL;
+
+   spin_lock(&syncobj->lock);
+   for(node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree);
+   node != NULL; ) {
+   wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+   node = rb_next(node);
+   if (wait_pt->value <= syncobj->syncobj_timeline.timeline) {
+   dma_fence_signal(&wait_pt->base.base);
+

Re: [PATCH 2/2] [RFC]drm: add syncobj timeline support

2018-08-22 Thread zhoucm1



On 2018-08-22 17:31, Daniel Vetter wrote:

On Wed, Aug 22, 2018 at 05:28:17PM +0800, zhoucm1 wrote:


On 2018-08-22 17:24, Daniel Vetter wrote:

On Wed, Aug 22, 2018 at 04:49:28PM +0800, Chunming Zhou wrote:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer payload
identifying a point in a timeline. Such timeline semaphores support the
following operations:
* Host query - A host operation that allows querying the payload of the
  timeline semaphore.
* Host wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.
* Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
* Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, that means the front time point (PT) is always
signaled before the later PT.
a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation
fence; when the PT[N] fence is signaled,
the timeline will increase to the value of PT[N].
b. wait PT design:
A wait PT fence is signaled by reaching the timeline point value. When the
timeline is increasing, we compare
the wait PTs' values with the new timeline value; if a PT value is lower than
the timeline value, then the wait PT will be
signaled, otherwise it is kept in the list. A semaphore wait operation can wait
on any point of the timeline,
so we need an RB tree to order them. And a wait PT can be ahead of its signal
PT, so we need a submission fence to
perform that.

TODO:
CPU query and wait on timeline semaphore.

Another TODO: igt testcase for all the cornercases. We already have
other syncobj tests in there.

Yes, I'm also trying to find where the test should be written. Could you give a
directory?

There's already tests/syncobj_basic.c and tests/syncobj_wait.c. Either
extend those, or probably better to start a new tests/syncobj_timeline.c
since I expect this will have a lot of corner-cases we need to check.
I failed to find them in both the kernel and libdrm. Could you point me to
the tests you meant?


Thanks,
David Zhou

-Daniel
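
For reference, the tests Daniel mentions live in the igt-gpu-tools repository (tests/syncobj_basic.c, tests/syncobj_wait.c), not in the kernel or libdrm trees. A hypothetical skeleton for a new tests/syncobj_timeline.c in that style (the subtest body is a placeholder):

#include <unistd.h>
#include "igt.h"

igt_main
{
	int fd = -1;

	igt_fixture
		fd = drm_open_driver(DRIVER_ANY);

	igt_subtest("timeline-signal-order") {
		uint32_t syncobj;

		igt_assert_eq(drmSyncobjCreate(fd, 0, &syncobj), 0);
		/* exercise out-of-order timeline signal/wait points here */
		igt_assert_eq(drmSyncobjDestroy(fd, syncobj), 0);
	}

	igt_fixture
		close(fd);
}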


Thanks,
David Zhou

That would also help with understanding how this is supposed to be used,
since I'm a bit too dense to immediately get your algorithm by just
staring at the code.



Change-Id: I9f09aae225e268442c30451badac40406f24262c
Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   7 +-
   drivers/gpu/drm/drm_syncobj.c  | 385 
-
   drivers/gpu/drm/v3d/v3d_gem.c  |   4 +-
   drivers/gpu/drm/vc4/vc4_gem.c  |   2 +-
   include/drm/drm_syncobj.h  |  45 +++-
   include/uapi/drm/drm.h |   3 +-
   6 files changed, 435 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d42d1c8f78f6..463cc8960723 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1105,7 +1105,7 @@ static int amdgpu_syncobj_lookup_and_add_to_sync(struct 
amdgpu_cs_parser *p,
   {
int r;
struct dma_fence *fence;
-   r = drm_syncobj_find_fence(p->filp, handle, &fence);
+   r = drm_syncobj_find_fence(p->filp, handle, &fence, 0);
if (r)
return r;
@@ -1193,8 +1193,9 @@ static void amdgpu_cs_post_dependencies(struct 
amdgpu_cs_parser *p)
   {
int i;
-   for (i = 0; i < p->num_post_dep_syncobjs; ++i)
-   drm_syncobj_replace_fence(p->post_dep_syncobjs[i], p->fence);
+   for (i = 0; i < p->num_post_dep_syncobjs; ++i) {
+   drm_syncobj_signal_fence(p->post_dep_syncobjs[i], p->fence, 0);
+   }
   }
   static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 70af32d0def1..3709f36c901e 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -187,6 +187,191 @@ void drm_syncobj_replace_fence(struct drm_syncobj 
*syncobj,
   }
   EXPORT_SYMBOL(drm_syncobj_replace_fence);
+static void drm_syncobj_timeline_signal_submission_fences(struct drm_syncobj 
*syncobj)
+{
+   struct rb_node *node = NULL;
+   struct drm_syncobj_wait_pt *wait_pt = NULL;
+
+   spin_lock(&syncobj->lock);
+   for(node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree);
+   node != NULL; node = rb_next(node)) {
+   wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+   if (wait_pt->value <= syncobj->syncobj_timeline.signal_point) {
+   if (wait_pt->submission_fence)
+   dma_fence_signal(&wait_pt->submission_fence->base);
+   } else {
+   /* the loop is from left to right, the later entry value is
+* bigger, so don't need to check any more */
+

Re: [PATCH 1/2] drm: rename null fence to stub fence in syncobj

2018-08-22 Thread zhoucm1



On 2018-08-22 17:34, Daniel Vetter wrote:

On Wed, Aug 22, 2018 at 04:38:56PM +0800, Chunming Zhou wrote:

stub fence will be used by timeline syncobj as well.

Change-Id: Ia4252f03c07a8105491d2791dc7c8c6976682285
Signed-off-by: Chunming Zhou 
Cc: Jason Ekstrand 

Please don't expose stuff only used by the drm_syncobj implementation to
drivers. Gives us a very unclean driver interface. Imo this should all be
left within drm_syncobj.h.

.c? will fix that.


See also my comments for patch 2, you leak all the implemenation details
to drivers. We need to fix that and have a clear interface.

Yes, I will address them when I do v2.

Thanks,
David Zhou

-Daniel


---
  drivers/gpu/drm/drm_syncobj.c | 28 ++--
  include/drm/drm_syncobj.h | 24 
  2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index d4f4ce484529..70af32d0def1 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -187,39 +187,15 @@ void drm_syncobj_replace_fence(struct drm_syncobj 
*syncobj,
  }
  EXPORT_SYMBOL(drm_syncobj_replace_fence);
  
-struct drm_syncobj_null_fence {

-   struct dma_fence base;
-   spinlock_t lock;
-};
-
-static const char *drm_syncobj_null_fence_get_name(struct dma_fence *fence)
-{
-return "syncobjnull";
-}
-
-static bool drm_syncobj_null_fence_enable_signaling(struct dma_fence *fence)
-{
-dma_fence_enable_sw_signaling(fence);
-return !dma_fence_is_signaled(fence);
-}
-
-static const struct dma_fence_ops drm_syncobj_null_fence_ops = {
-   .get_driver_name = drm_syncobj_null_fence_get_name,
-   .get_timeline_name = drm_syncobj_null_fence_get_name,
-   .enable_signaling = drm_syncobj_null_fence_enable_signaling,
-   .wait = dma_fence_default_wait,
-   .release = NULL,
-};
-
  static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
  {
-   struct drm_syncobj_null_fence *fence;
+   struct drm_syncobj_stub_fence *fence;
fence = kzalloc(sizeof(*fence), GFP_KERNEL);
if (fence == NULL)
return -ENOMEM;
  
  	spin_lock_init(&fence->lock);

-   dma_fence_init(&fence->base, &drm_syncobj_null_fence_ops,
+   dma_fence_init(&fence->base, &drm_syncobj_stub_fence_ops,
   &fence->lock, 0, 0);
	dma_fence_signal(&fence->base);
  
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h

index 3980602472c0..b04c492ddbb5 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -30,6 +30,30 @@
  
  struct drm_syncobj_cb;
  
+struct drm_syncobj_stub_fence {

+   struct dma_fence base;
+   spinlock_t lock;
+};
+
+const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence)
+{
+return "syncobjstub";
+}
+
+bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence)
+{
+dma_fence_enable_sw_signaling(fence);
+return !dma_fence_is_signaled(fence);
+}
+
+const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+   .get_driver_name = drm_syncobj_stub_fence_get_name,
+   .get_timeline_name = drm_syncobj_stub_fence_get_name,
+   .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+   .wait = dma_fence_default_wait,
+   .release = NULL,
+};
+
  /**
   * struct drm_syncobj - sync object.
   *
--
2.14.1



Re: [PATCH 2/2] [RFC]drm: add syncobj timeline support

2018-08-22 Thread zhoucm1



On 2018-08-22 17:24, Daniel Vetter wrote:

On Wed, Aug 22, 2018 at 04:49:28PM +0800, Chunming Zhou wrote:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer payload
identifying a point in a timeline. Such timeline semaphores support the
following operations:
   * Host query - A host operation that allows querying the payload of the
 timeline semaphore.
   * Host wait - A host operation that allows a blocking wait for a
 timeline semaphore to reach a specified value.
   * Device wait - A device operation that allows waiting for a
 timeline semaphore to reach a specified value.
   * Device signal - A device operation that allows advancing the
 timeline semaphore to a specified value.

Since it's a timeline, that means the front time point(PT) always is signaled 
before the late PT.
a. signal PT design:
Signal PT fence N depends on PT[N-1] fence and signal opertion fence, when 
PT[N] fence is signaled,
the timeline will increase to value of PT[N].
b. wait PT design:
Wait PT fence is signaled by reaching timeline point value, when timeline is 
increasing, will compare
wait PTs value with new timeline value, if PT value is lower than timeline 
value, then wait PT will be
signaled, otherwise keep in list. semaphore wait operation can wait on any 
point of timeline,
so need a RB tree to order them. And wait PT could ahead of signal PT, we need 
a sumission fence to
perform that.

TODO:
CPU query and wait on timeline semaphore.

Another TODO: igt testcase for all the cornercases. We already have
other syncobj tests in there.
Yes, I'm also trying to find where the test should be written. Could you give
a directory?


Thanks,
David Zhou


That would also help with understanding how this is supposed to be used,
since I'm a bit too dense to immediately get your algorithm by just
staring at the code.



Change-Id: I9f09aae225e268442c30451badac40406f24262c
Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   7 +-
  drivers/gpu/drm/drm_syncobj.c  | 385 -
  drivers/gpu/drm/v3d/v3d_gem.c  |   4 +-
  drivers/gpu/drm/vc4/vc4_gem.c  |   2 +-
  include/drm/drm_syncobj.h  |  45 +++-
  include/uapi/drm/drm.h |   3 +-
  6 files changed, 435 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d42d1c8f78f6..463cc8960723 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1105,7 +1105,7 @@ static int amdgpu_syncobj_lookup_and_add_to_sync(struct 
amdgpu_cs_parser *p,
  {
int r;
struct dma_fence *fence;
-   r = drm_syncobj_find_fence(p->filp, handle, &fence);
+   r = drm_syncobj_find_fence(p->filp, handle, &fence, 0);
if (r)
return r;
  
@@ -1193,8 +1193,9 @@ static void amdgpu_cs_post_dependencies(struct amdgpu_cs_parser *p)

  {
int i;
  
-	for (i = 0; i < p->num_post_dep_syncobjs; ++i)

-   drm_syncobj_replace_fence(p->post_dep_syncobjs[i], p->fence);
+   for (i = 0; i < p->num_post_dep_syncobjs; ++i) {
+   drm_syncobj_signal_fence(p->post_dep_syncobjs[i], p->fence, 0);
+   }
  }
  
  static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 70af32d0def1..3709f36c901e 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -187,6 +187,191 @@ void drm_syncobj_replace_fence(struct drm_syncobj 
*syncobj,
  }
  EXPORT_SYMBOL(drm_syncobj_replace_fence);
  
+static void drm_syncobj_timeline_signal_submission_fences(struct drm_syncobj *syncobj)

+{
+   struct rb_node *node = NULL;
+   struct drm_syncobj_wait_pt *wait_pt = NULL;
+
+   spin_lock(&syncobj->lock);
+   for(node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree);
+   node != NULL; node = rb_next(node)) {
+   wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+   if (wait_pt->value <= syncobj->syncobj_timeline.signal_point) {
+   if (wait_pt->submission_fence)
+   dma_fence_signal(&wait_pt->submission_fence->base);
+   } else {
+   /* the loop is from left to right, the later entry value is
+* bigger, so don't need to check any more */
+   break;
+   }

This seems to reinvent syncobj->cb_list. Since this is the exact same
"future fence that doesn't even exist yet" problem I think those two code
path should be unified. In general I think it'd be much better if we treat
the old syncobj as a timeline with a limit of 1 slot only.

Or there needs to be a really good reason why all new code.

Specifically for this stuff here having unified future fence semantics
will allow drivers to do clever stuff with them.


+   }
+ 

Re: [PATCH 1/2] drm: rename null fence to stub fence in syncobj

2018-08-22 Thread zhoucm1



On 2018-08-22 17:05, Christian König wrote:

On 22.08.2018 at 11:02, zhoucm1 wrote:



On 2018-08-22 16:52, Christian König wrote:

On 22.08.2018 at 10:38, Chunming Zhou wrote:

stub fence will be used by timeline syncobj as well.

Change-Id: Ia4252f03c07a8105491d2791dc7c8c6976682285
Signed-off-by: Chunming Zhou 
Cc: Jason Ekstrand 
---
  drivers/gpu/drm/drm_syncobj.c | 28 ++--
  include/drm/drm_syncobj.h | 24 
  2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index d4f4ce484529..70af32d0def1 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -187,39 +187,15 @@ void drm_syncobj_replace_fence(struct 
drm_syncobj *syncobj,

  }
  EXPORT_SYMBOL(drm_syncobj_replace_fence);
  -struct drm_syncobj_null_fence {
-    struct dma_fence base;
-    spinlock_t lock;
-};
-
-static const char *drm_syncobj_null_fence_get_name(struct 
dma_fence *fence)

-{
-    return "syncobjnull";
-}
-
-static bool drm_syncobj_null_fence_enable_signaling(struct 
dma_fence *fence)

-{
-    dma_fence_enable_sw_signaling(fence);
-    return !dma_fence_is_signaled(fence);
-}
-
-static const struct dma_fence_ops drm_syncobj_null_fence_ops = {
-    .get_driver_name = drm_syncobj_null_fence_get_name,
-    .get_timeline_name = drm_syncobj_null_fence_get_name,
-    .enable_signaling = drm_syncobj_null_fence_enable_signaling,
-    .wait = dma_fence_default_wait,
-    .release = NULL,
-};
-
  static int drm_syncobj_assign_null_handle(struct drm_syncobj 
*syncobj)

  {
-    struct drm_syncobj_null_fence *fence;
+    struct drm_syncobj_stub_fence *fence;
  fence = kzalloc(sizeof(*fence), GFP_KERNEL);
  if (fence == NULL)
  return -ENOMEM;
    spin_lock_init(&fence->lock);
-    dma_fence_init(&fence->base, &drm_syncobj_null_fence_ops,
+    dma_fence_init(&fence->base, &drm_syncobj_stub_fence_ops,
 &fence->lock, 0, 0);
  dma_fence_signal(&fence->base);
  diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index 3980602472c0..b04c492ddbb5 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -30,6 +30,30 @@
    struct drm_syncobj_cb;
  +struct drm_syncobj_stub_fence {
+    struct dma_fence base;
+    spinlock_t lock;
+};
+
+const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence)
+{
+    return "syncobjstub";
+}
+
+bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence)
+{
+    dma_fence_enable_sw_signaling(fence);


Copy from the old implementation, but that is certainly 
totally nonsense.


dma_fence_enable_sw_signaling() is the function that calls this
callback.

indeed, will fix.



+    return !dma_fence_is_signaled(fence);
+}
+
+const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+    .get_driver_name = drm_syncobj_stub_fence_get_name,
+    .get_timeline_name = drm_syncobj_stub_fence_get_name,
+    .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+    .wait = dma_fence_default_wait,


The .wait callback should be dropped.

why?

fence->ops->wait(fence, intr, timeout) is called by dma_fence_wait(). 
If dropped, how does dma_fence_wait() work?


You are working on an older code base, fence->ops->wait is optional by 
now.
Sorry, I still don't get it. My code is synced today from 
amd-staging-drm-next, and it's 4.18-rc1.

I still see that dma_fence_wait_timeout is:
signed long
dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long 
timeout)

{
    signed long ret;

    if (WARN_ON(timeout < 0))
    return -EINVAL;

    trace_dma_fence_wait_start(fence);
    ret = fence->ops->wait(fence, intr, timeout);
    trace_dma_fence_wait_end(fence);
    return ret;
}

The .wait callback still seems to be a must-have?

Thanks,
David Zhou
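
For reference, after the upstream change that made the callback optional, dma_fence_wait_timeout() falls back to the default wait when .wait is absent; roughly:

signed long
dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
{
	signed long ret;

	if (WARN_ON(timeout < 0))
		return -EINVAL;

	trace_dma_fence_wait_start(fence);
	if (fence->ops->wait)
		ret = fence->ops->wait(fence, intr, timeout);
	else
		ret = dma_fence_default_wait(fence, intr, timeout);
	trace_dma_fence_wait_end(fence);
	return ret;
}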




Christian.



Thanks,
David


Apart from that looks good to me.

Christian.


+    .release = NULL,
+};
+
  /**
   * struct drm_syncobj - sync object.
   *






Re: [PATCH 1/2] drm: rename null fence to stub fence in syncobj

2018-08-22 Thread zhoucm1



On 2018-08-22 16:52, Christian König wrote:

On 22.08.2018 at 10:38, Chunming Zhou wrote:

stub fence will be used by timeline syncobj as well.

Change-Id: Ia4252f03c07a8105491d2791dc7c8c6976682285
Signed-off-by: Chunming Zhou 
Cc: Jason Ekstrand 
---
  drivers/gpu/drm/drm_syncobj.c | 28 ++--
  include/drm/drm_syncobj.h | 24 
  2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index d4f4ce484529..70af32d0def1 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -187,39 +187,15 @@ void drm_syncobj_replace_fence(struct 
drm_syncobj *syncobj,

  }
  EXPORT_SYMBOL(drm_syncobj_replace_fence);
  -struct drm_syncobj_null_fence {
-    struct dma_fence base;
-    spinlock_t lock;
-};
-
-static const char *drm_syncobj_null_fence_get_name(struct dma_fence 
*fence)

-{
-    return "syncobjnull";
-}
-
-static bool drm_syncobj_null_fence_enable_signaling(struct dma_fence 
*fence)

-{
-    dma_fence_enable_sw_signaling(fence);
-    return !dma_fence_is_signaled(fence);
-}
-
-static const struct dma_fence_ops drm_syncobj_null_fence_ops = {
-    .get_driver_name = drm_syncobj_null_fence_get_name,
-    .get_timeline_name = drm_syncobj_null_fence_get_name,
-    .enable_signaling = drm_syncobj_null_fence_enable_signaling,
-    .wait = dma_fence_default_wait,
-    .release = NULL,
-};
-
  static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
  {
-    struct drm_syncobj_null_fence *fence;
+    struct drm_syncobj_stub_fence *fence;
  fence = kzalloc(sizeof(*fence), GFP_KERNEL);
  if (fence == NULL)
  return -ENOMEM;
    spin_lock_init(&fence->lock);
-    dma_fence_init(&fence->base, &drm_syncobj_null_fence_ops,
+    dma_fence_init(&fence->base, &drm_syncobj_stub_fence_ops,
 &fence->lock, 0, 0);
  dma_fence_signal(&fence->base);
  diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index 3980602472c0..b04c492ddbb5 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -30,6 +30,30 @@
    struct drm_syncobj_cb;
  +struct drm_syncobj_stub_fence {
+    struct dma_fence base;
+    spinlock_t lock;
+};
+
+const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence)
+{
+    return "syncobjstub";
+}
+
+bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence)
+{
+    dma_fence_enable_sw_signaling(fence);


Copy from the old implementation, but that is certainly totally 
nonsense.


dma_fence_enable_sw_signaling() is the function that calls this
callback.

indeed, will fix.



+    return !dma_fence_is_signaled(fence);
+}
+
+const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+    .get_driver_name = drm_syncobj_stub_fence_get_name,
+    .get_timeline_name = drm_syncobj_stub_fence_get_name,
+    .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+    .wait = dma_fence_default_wait,


The .wait callback should be dropped.

why?

fence->ops->wait(fence, intr, timeout) is called by dma_fence_wait(). If 
dropped, how does dma_fence_wait() work?


Thanks,
David


Apart from that looks good to me.

Christian.


+    .release = NULL,
+};
+
  /**
   * struct drm_syncobj - sync object.
   *






Re: [PATCH 2/2] drm/amdgpu: rework ctx entity creation

2018-08-16 Thread zhoucm1



On 2018-08-16 16:11, Christian König wrote:

On 16.08.2018 at 04:07, zhoucm1 wrote:



On 2018-08-15 18:59, Christian König wrote:

Use a fixed number of entities for each hardware IP.

The number of compute entities is reduced to four, SDMA keeps it two
entities and all other engines just expose one entity.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 291 


  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h |  30 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  36 ++--
  3 files changed, 190 insertions(+), 167 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c

index 0a6cd1202ee5..987b7f256463 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -27,8 +27,29 @@
  #include "amdgpu.h"
  #include "amdgpu_sched.h"
  -#define to_amdgpu_ctx_ring(e)    \
-    container_of((e), struct amdgpu_ctx_ring, entity)
+#define to_amdgpu_ctx_entity(e)    \
+    container_of((e), struct amdgpu_ctx_entity, entity)
+
+const unsigned int amdgpu_ctx_num_entities[AMDGPU_HW_IP_NUM] = {
+    [AMDGPU_HW_IP_GFX]    =    1,
+    [AMDGPU_HW_IP_COMPUTE]    =    4,

Could you explain why it was reduced to four? Otherwise it looks good to me.


Currently we change the priority of the compute queues on the fly, but 
the idea is that we will have fixed high priority and low priority 
compute queues in the future.
Yeah, I see that, feel free to add my RB: Reviewed-by: Chunming Zhou 



Regards,
David Zhou


We could as well say we have only 2 or 3 if the closed stack is fine 
with that.


Regards,
Christian.



Thanks,
David Zhou

+    [AMDGPU_HW_IP_DMA]    =    2,
+    [AMDGPU_HW_IP_UVD]    =    1,
+    [AMDGPU_HW_IP_VCE]    =    1,
+    [AMDGPU_HW_IP_UVD_ENC]    =    1,
+    [AMDGPU_HW_IP_VCN_DEC]    =    1,
+    [AMDGPU_HW_IP_VCN_ENC]    =    1,
+};
+
+static int amdgput_ctx_total_num_entities(void)
+{
+    unsigned i, num_entities = 0;
+
+    for (i = 0; i < AMDGPU_HW_IP_NUM; ++i)
+    num_entities += amdgpu_ctx_num_entities[i];
+
+    return num_entities;
+}
    static int amdgpu_ctx_priority_permit(struct drm_file *filp,
    enum drm_sched_priority priority)
@@ -51,9 +72,8 @@ static int amdgpu_ctx_init(struct amdgpu_device 
*adev,

 struct drm_file *filp,
 struct amdgpu_ctx *ctx)
  {
-    struct drm_sched_rq *sdma_rqs[AMDGPU_MAX_RINGS];
-    struct drm_sched_rq *comp_rqs[AMDGPU_MAX_RINGS];
-    unsigned i, j, num_sdma_rqs, num_comp_rqs;
+    unsigned num_entities = amdgput_ctx_total_num_entities();
+    unsigned i, j;
  int r;
    if (priority < 0 || priority >= DRM_SCHED_PRIORITY_MAX)
@@ -65,19 +85,33 @@ static int amdgpu_ctx_init(struct amdgpu_device 
*adev,

    memset(ctx, 0, sizeof(*ctx));
  ctx->adev = adev;
-    kref_init(&ctx->refcount);
-    spin_lock_init(&ctx->ring_lock);
-    ctx->fences = kcalloc(amdgpu_sched_jobs * AMDGPU_MAX_RINGS,
+
+    ctx->fences = kcalloc(amdgpu_sched_jobs * num_entities,
    sizeof(struct dma_fence*), GFP_KERNEL);
  if (!ctx->fences)
  return -ENOMEM;

-    mutex_init(&ctx->lock);
+    ctx->entities[0] = kcalloc(num_entities,
+   sizeof(struct amdgpu_ctx_entity),
+   GFP_KERNEL);
+    if (!ctx->entities[0]) {
+    r = -ENOMEM;
+    goto error_free_fences;
+    }
-    for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
-    ctx->rings[i].sequence = 1;
-    ctx->rings[i].fences = &ctx->fences[amdgpu_sched_jobs * i];
+    for (i = 0; i < num_entities; ++i) {
+    struct amdgpu_ctx_entity *entity = &ctx->entities[0][i];
+
+    entity->sequence = 1;
+    entity->fences = &ctx->fences[amdgpu_sched_jobs * i];
  }
+    for (i = 1; i < AMDGPU_HW_IP_NUM; ++i)
+    ctx->entities[i] = ctx->entities[i - 1] +
+    amdgpu_ctx_num_entities[i - 1];
+
+    kref_init(&ctx->refcount);
+    spin_lock_init(&ctx->ring_lock);
+    mutex_init(&ctx->lock);
    ctx->reset_counter = atomic_read(&adev->gpu_reset_counter);
  ctx->reset_counter_query = ctx->reset_counter;
@@ -85,50 +119,70 @@ static int amdgpu_ctx_init(struct amdgpu_device 
*adev,

  ctx->init_priority = priority;
  ctx->override_priority = DRM_SCHED_PRIORITY_UNSET;
  -    num_sdma_rqs = 0;
-    num_comp_rqs = 0;
-    for (i = 0; i < adev->num_rings; i++) {
-    struct amdgpu_ring *ring = adev->rings[i];
-    struct drm_sched_rq *rq;
-
-    rq = &ring->sched.sched_rq[priority];
-    if (ring->funcs->type == AMDGPU_RING_TYPE_SDMA)
-    sdma_rqs[num_sdma_rqs++] = rq;
-    else if (ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE)
-    comp_rqs[num_comp_rqs++] = rq;
-    }
-
-    /* create context entity for each ring */
-    for (i = 0; i < adev->num_rings; i++) {
-    struct amdgpu_ring *ri
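
To make the entities[] indexing above concrete, a small worked sketch (numbers assume the amdgpu_ctx_num_entities table quoted earlier):

    /* One flat kcalloc() of 1 + 4 + 2 + 1 + ... entities; entities[ip]
     * points at the first entity of each IP type inside that array:
     *
     *   entities[AMDGPU_HW_IP_GFX]     -> &flat[0]   (1 entity)
     *   entities[AMDGPU_HW_IP_COMPUTE] -> &flat[1]   (4 entities)
     *   entities[AMDGPU_HW_IP_DMA]     -> &flat[5]   (2 entities)
     *   entities[AMDGPU_HW_IP_UVD]     -> &flat[7]   (1 entity), and so on.
     *
     * Each entity i also owns its own amdgpu_sched_jobs-sized fence ring
     * at &ctx->fences[amdgpu_sched_jobs * i].
     */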

Re: [PATCH] drm/amdgpu: fix VM size reporting on Raven

2018-08-15 Thread zhoucm1



On 2018-08-15 20:05, Christian König wrote:

Raven doesn't have a VCE block and so also no buggy VCE firmware.

Signed-off-by: Christian König 

Acked-by: Chunming Zhou 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 7d3d76a1a9dd..2ecb883bdd41 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -619,7 +619,8 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void 
*data, struct drm_file
vm_size -= AMDGPU_VA_RESERVED_SIZE;
  
  		/* Older VCE FW versions are buggy and can handle only 40bits */

-   if (adev->vce.fw_version < AMDGPU_VCE_FW_53_45)
+   if (adev->vce.fw_version &&
+   adev->vce.fw_version < AMDGPU_VCE_FW_53_45)
vm_size = min(vm_size, 1ULL << 40);
  
  		dev_info.virtual_address_offset = AMDGPU_VA_RESERVED_SIZE;




Re: [PATCH 2/2] drm/amdgpu: rework ctx entity creation

2018-08-15 Thread zhoucm1



On 2018-08-15 18:59, Christian König wrote:

Use a fixed number of entities for each hardware IP.

The number of compute entities is reduced to four, SDMA keeps it two
entities and all other engines just expose one entity.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 291 
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h |  30 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  36 ++--
  3 files changed, 190 insertions(+), 167 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 0a6cd1202ee5..987b7f256463 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -27,8 +27,29 @@
  #include "amdgpu.h"
  #include "amdgpu_sched.h"
  
-#define to_amdgpu_ctx_ring(e)	\

-   container_of((e), struct amdgpu_ctx_ring, entity)
+#define to_amdgpu_ctx_entity(e)\
+   container_of((e), struct amdgpu_ctx_entity, entity)
+
+const unsigned int amdgpu_ctx_num_entities[AMDGPU_HW_IP_NUM] = {
+   [AMDGPU_HW_IP_GFX]  =   1,
+   [AMDGPU_HW_IP_COMPUTE]  =   4,

Could you explain why it is reduced to four? Otherwise it looks good to me.

Thanks,
David Zhou

+   [AMDGPU_HW_IP_DMA]  =   2,
+   [AMDGPU_HW_IP_UVD]  =   1,
+   [AMDGPU_HW_IP_VCE]  =   1,
+   [AMDGPU_HW_IP_UVD_ENC]  =   1,
+   [AMDGPU_HW_IP_VCN_DEC]  =   1,
+   [AMDGPU_HW_IP_VCN_ENC]  =   1,
+};
+
+static int amdgput_ctx_total_num_entities(void)
+{
+   unsigned i, num_entities = 0;
+
+   for (i = 0; i < AMDGPU_HW_IP_NUM; ++i)
+   num_entities += amdgpu_ctx_num_entities[i];
+
+   return num_entities;
+}
  
  static int amdgpu_ctx_priority_permit(struct drm_file *filp,

  enum drm_sched_priority priority)
@@ -51,9 +72,8 @@ static int amdgpu_ctx_init(struct amdgpu_device *adev,
   struct drm_file *filp,
   struct amdgpu_ctx *ctx)
  {
-   struct drm_sched_rq *sdma_rqs[AMDGPU_MAX_RINGS];
-   struct drm_sched_rq *comp_rqs[AMDGPU_MAX_RINGS];
-   unsigned i, j, num_sdma_rqs, num_comp_rqs;
+   unsigned num_entities = amdgput_ctx_total_num_entities();
+   unsigned i, j;
int r;
  
  	if (priority < 0 || priority >= DRM_SCHED_PRIORITY_MAX)

@@ -65,19 +85,33 @@ static int amdgpu_ctx_init(struct amdgpu_device *adev,
  
  	memset(ctx, 0, sizeof(*ctx));

ctx->adev = adev;
-   kref_init(&ctx->refcount);
-   spin_lock_init(&ctx->ring_lock);
-   ctx->fences = kcalloc(amdgpu_sched_jobs * AMDGPU_MAX_RINGS,
+
+   ctx->fences = kcalloc(amdgpu_sched_jobs * num_entities,
  sizeof(struct dma_fence*), GFP_KERNEL);
if (!ctx->fences)
return -ENOMEM;
  
-	mutex_init(&ctx->lock);

+   ctx->entities[0] = kcalloc(num_entities,
+  sizeof(struct amdgpu_ctx_entity),
+  GFP_KERNEL);
+   if (!ctx->entities[0]) {
+   r = -ENOMEM;
+   goto error_free_fences;
+   }
  
-	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {

-   ctx->rings[i].sequence = 1;
-   ctx->rings[i].fences = &ctx->fences[amdgpu_sched_jobs * i];
+   for (i = 0; i < num_entities; ++i) {
+   struct amdgpu_ctx_entity *entity = &ctx->entities[0][i];
+
+   entity->sequence = 1;
+   entity->fences = &ctx->fences[amdgpu_sched_jobs * i];
}
+   for (i = 1; i < AMDGPU_HW_IP_NUM; ++i)
+   ctx->entities[i] = ctx->entities[i - 1] +
+   amdgpu_ctx_num_entities[i - 1];
+
+   kref_init(&ctx->refcount);
+   spin_lock_init(&ctx->ring_lock);
+   mutex_init(&ctx->lock);
  
	ctx->reset_counter = atomic_read(&adev->gpu_reset_counter);

ctx->reset_counter_query = ctx->reset_counter;
@@ -85,50 +119,70 @@ static int amdgpu_ctx_init(struct amdgpu_device *adev,
ctx->init_priority = priority;
ctx->override_priority = DRM_SCHED_PRIORITY_UNSET;
  
-	num_sdma_rqs = 0;

-   num_comp_rqs = 0;
-   for (i = 0; i < adev->num_rings; i++) {
-   struct amdgpu_ring *ring = adev->rings[i];
-   struct drm_sched_rq *rq;
-
-   rq = &ring->sched.sched_rq[priority];
-   if (ring->funcs->type == AMDGPU_RING_TYPE_SDMA)
-   sdma_rqs[num_sdma_rqs++] = rq;
-   else if (ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE)
-   comp_rqs[num_comp_rqs++] = rq;
-   }
-
-   /* create context entity for each ring */
-   for (i = 0; i < adev->num_rings; i++) {
-   struct amdgpu_ring *ring = adev->rings[i];
+   for (i = 0; i < AMDGPU_HW_IP_NUM; ++i) {
+   struct amdgpu_ring *rings[AMDGPU_MAX_RINGS];
+   struct drm_sched_rq *rqs[AMDGPU_MAX_RINGS];
+   unsigned num_rings;
+
+   switch (i) {
+ 

Re: [PATCH 1/2] drm/amdgpu: cleanup HW_IP query

2018-08-15 Thread zhoucm1



On 2018-08-15 18:59, Christian König wrote:

Move the code into a separate function.

Signed-off-by: Christian König 

Reviewed-by: Chunming Zhou 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 204 +---
  1 file changed, 110 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 7d3d76a1a9dd..40fd591c9980 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -263,6 +263,109 @@ static int amdgpu_firmware_info(struct 
drm_amdgpu_info_firmware *fw_info,
return 0;
  }
  
+static int amdgpu_hw_ip_info(struct amdgpu_device *adev,

+struct drm_amdgpu_info *info,
+struct drm_amdgpu_info_hw_ip *result)
+{
+   uint32_t ib_start_alignment = 0;
+   uint32_t ib_size_alignment = 0;
+   enum amd_ip_block_type type;
+   uint32_t ring_mask = 0;
+   unsigned int i, j;
+
+   if (info->query_hw_ip.ip_instance >= AMDGPU_HW_IP_INSTANCE_MAX_COUNT)
+   return -EINVAL;
+
+   switch (info->query_hw_ip.type) {
+   case AMDGPU_HW_IP_GFX:
+   type = AMD_IP_BLOCK_TYPE_GFX;
+   for (i = 0; i < adev->gfx.num_gfx_rings; i++)
+   ring_mask |= adev->gfx.gfx_ring[i].ready << i;
+   ib_start_alignment = 32;
+   ib_size_alignment = 32;
+   break;
+   case AMDGPU_HW_IP_COMPUTE:
+   type = AMD_IP_BLOCK_TYPE_GFX;
+   for (i = 0; i < adev->gfx.num_compute_rings; i++)
+   ring_mask |= adev->gfx.compute_ring[i].ready << i;
+   ib_start_alignment = 32;
+   ib_size_alignment = 32;
+   break;
+   case AMDGPU_HW_IP_DMA:
+   type = AMD_IP_BLOCK_TYPE_SDMA;
+   for (i = 0; i < adev->sdma.num_instances; i++)
+   ring_mask |= adev->sdma.instance[i].ring.ready << i;
+   ib_start_alignment = 256;
+   ib_size_alignment = 4;
+   break;
+   case AMDGPU_HW_IP_UVD:
+   type = AMD_IP_BLOCK_TYPE_UVD;
+   for (i = 0; i < adev->uvd.num_uvd_inst; i++) {
+   if (adev->uvd.harvest_config & (1 << i))
+   continue;
+   ring_mask |= adev->uvd.inst[i].ring.ready;
+   }
+   ib_start_alignment = 64;
+   ib_size_alignment = 64;
+   break;
+   case AMDGPU_HW_IP_VCE:
+   type = AMD_IP_BLOCK_TYPE_VCE;
+   for (i = 0; i < adev->vce.num_rings; i++)
+   ring_mask |= adev->vce.ring[i].ready << i;
+   ib_start_alignment = 4;
+   ib_size_alignment = 1;
+   break;
+   case AMDGPU_HW_IP_UVD_ENC:
+   type = AMD_IP_BLOCK_TYPE_UVD;
+   for (i = 0; i < adev->uvd.num_uvd_inst; i++) {
+   if (adev->uvd.harvest_config & (1 << i))
+   continue;
+   for (j = 0; j < adev->uvd.num_enc_rings; j++)
+   ring_mask |= adev->uvd.inst[i].ring_enc[j].ready 
<< j;
+   }
+   ib_start_alignment = 64;
+   ib_size_alignment = 64;
+   break;
+   case AMDGPU_HW_IP_VCN_DEC:
+   type = AMD_IP_BLOCK_TYPE_VCN;
+   ring_mask = adev->vcn.ring_dec.ready;
+   ib_start_alignment = 16;
+   ib_size_alignment = 16;
+   break;
+   case AMDGPU_HW_IP_VCN_ENC:
+   type = AMD_IP_BLOCK_TYPE_VCN;
+   for (i = 0; i < adev->vcn.num_enc_rings; i++)
+   ring_mask |= adev->vcn.ring_enc[i].ready << i;
+   ib_start_alignment = 64;
+   ib_size_alignment = 1;
+   break;
+   case AMDGPU_HW_IP_VCN_JPEG:
+   type = AMD_IP_BLOCK_TYPE_VCN;
+   ring_mask = adev->vcn.ring_jpeg.ready;
+   ib_start_alignment = 16;
+   ib_size_alignment = 16;
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   for (i = 0; i < adev->num_ip_blocks; i++)
+   if (adev->ip_blocks[i].version->type == type &&
+   adev->ip_blocks[i].status.valid)
+   break;
+
+   if (i == adev->num_ip_blocks)
+   return 0;
+
+   result->hw_ip_version_major = adev->ip_blocks[i].version->major;
+   result->hw_ip_version_minor = adev->ip_blocks[i].version->minor;
+   result->capabilities_flags = 0;
+   result->available_rings = ring_mask;
+   result->ib_start_alignment = ib_start_alignment;
+   result->ib_size_alignment = ib_size_alignment;
+   return 0;
+}
+
  /*
   * Userspace get information ioctl
   */
@@ -288,7 +391,7 @@ static int amdgpu_info_ioctl(struct 
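
On the userspace side this is consumed through the existing query path; a usage sketch (assuming libdrm's amdgpu_query_hw_ip_info() wrapper; dev, ring and submit_with_alignment() are made-up placeholders):

    struct drm_amdgpu_info_hw_ip info = {0};
    int r;

    /* available_rings is the ring_mask built in amdgpu_hw_ip_info() above */
    r = amdgpu_query_hw_ip_info(dev, AMDGPU_HW_IP_COMPUTE, 0, &info);
    if (r == 0 && (info.available_rings & (1 << ring)))
            /* the ring is usable; respect the reported IB alignments */
            submit_with_alignment(info.ib_start_alignment,
                                  info.ib_size_alignment);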

Re: [PATCH v3 1/5] drm/ttm: add helper structures for bulk moves on lru list

2018-08-13 Thread zhoucm1



On 2018-08-13 18:16, Christian König wrote:

On 13.08.2018 11:58, Huang Rui wrote:

From: Christian König 

Add bulk move pos to store the pointer of first and last buffer object.
The list in between will be bulk moved on lru list.

Signed-off-by: Christian König 
Signed-off-by: Huang Rui 
Tested-by: Mike Lothian 


If you ask me that looks like it should work now, but I'm biased 
because I helped create this.


Alex, David or Jerry, can somebody else take a look as well?

remember position, list ops...
Acked-by: Chunming Zhou 



Thanks,
Christian.


---
  include/drm/ttm/ttm_bo_driver.h | 28 
  1 file changed, 28 insertions(+)

diff --git a/include/drm/ttm/ttm_bo_driver.h 
b/include/drm/ttm/ttm_bo_driver.h

index 3234cc3..e4fee8e 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -491,6 +491,34 @@ struct ttm_bo_device {
  };
    /**
+ * struct ttm_lru_bulk_move_pos
+ *
+ * @first: first BO in the bulk move range
+ * @last: last BO in the bulk move range
+ *
+ * Positions for a lru bulk move.
+ */
+struct ttm_lru_bulk_move_pos {
+    struct ttm_buffer_object *first;
+    struct ttm_buffer_object *last;
+};
+
+/**
+ * struct ttm_lru_bulk_move
+ *
+ * @tt: first/last lru entry for BOs in the TT domain
+ * @vram: first/last lru entry for BOs in the VRAM domain
+ * @swap: first/last lru entry for BOs on the swap list
+ *
+ * Helper structure for bulk moves on the LRU list.
+ */
+struct ttm_lru_bulk_move {
+    struct ttm_lru_bulk_move_pos tt[TTM_MAX_BO_PRIORITY];
+    struct ttm_lru_bulk_move_pos vram[TTM_MAX_BO_PRIORITY];
+    struct ttm_lru_bulk_move_pos swap[TTM_MAX_BO_PRIORITY];
+};
+
+/**
   * ttm_flag_masked
   *
   * @old: Pointer to the result and original value.
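
For anyone wondering how these positions get consumed, a sketch of the splice (roughly what the series' ttm_bo_bulk_move_lru_tail() does, built on list_bulk_move_tail()):

    void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move *bulk)
    {
            unsigned i;

            for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
                    struct ttm_lru_bulk_move_pos *pos = &bulk->tt[i];
                    struct ttm_mem_type_manager *man;

                    if (!pos->first)
                            continue;

                    man = &pos->first->bdev->man[TTM_PL_TT];
                    /* splice [first, last] to the LRU tail in one step
                     * instead of touching every BO individually */
                    list_bulk_move_tail(&man->lru[i], &pos->first->lru,
                                        &pos->last->lru);
            }
            /* the same pattern repeats for bulk->vram and bulk->swap */
    }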




Re: [PATCH libdrm 2/4] amdgpu: add a function to find bo by cpu mapping (v2)

2018-08-08 Thread zhoucm1



On 2018-08-08 14:48, Christian König wrote:

On 08.08.2018 06:23, zhoucm1 wrote:



On 2018-08-08 12:08, Junwei Zhang wrote:

Userspace needs to know if the user memory is from BO or malloc.

v2: update mutex range and rebase

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu.h    | 23 +++
  amdgpu/amdgpu_bo.c | 34 ++
  2 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index be83b45..a8c353c 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -678,6 +678,29 @@ int 
amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,

  amdgpu_bo_handle *buf_handle);
    /**
+ * Validate if the user memory comes from BO
+ *
+ * \param dev - [in] Device handle. See #amdgpu_device_initialize()
+ * \param cpu - [in] CPU address of user allocated memory which we
+ * want to map to GPU address space (make GPU accessible)
+ * (This address must be correctly aligned).
+ * \param size - [in] Size of allocation (must be correctly aligned)
+ * \param buf_handle - [out] Buffer handle for the userptr memory
+ * if the user memory is not from BO, the buf_handle will be NULL.
+ * \param offset_in_bo - [out] offset in this BO for this user memory
+ *
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo);
+
+/**
   * Free previosuly allocated memory
   *
   * \param   dev   - \c [in] Device handle. See 
#amdgpu_device_initialize()

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index b24e698..a7f0662 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -529,6 +529,40 @@ int amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,
  }
  }
  +int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo)
+{
+    int i;
+    struct amdgpu_bo *bo;
+
+    if (cpu == NULL || size == 0)
+    return -EINVAL;
+
+    pthread_mutex_lock(&dev->bo_table_mutex);
+    for (i = 0; i < dev->bo_handles.max_key; i++) {

Hi Jerry,

As Christian caught before, iterating over all BOs of the device will 
introduce much CPU overhead; this isn't a good direction.
Since the CPU virtual address is per-process, you should go to the kernel 
and look it up in the VM tree, which obviously takes less time.


Yeah, but is also much more overhead to maintain.

Since this is only to fix the behavior of a single buggy application 
at least I'm fine to keep the workaround as simple as this.
I like the 'workaround' expression; if Jerry adds 'workaround' comments 
here, I'm OK as well.


Regards,
David Zhou
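
For reference, the intended call pattern on the user side (a sketch; ptr and size are placeholders and error handling is elided):

    amdgpu_bo_handle bo = NULL;
    uint64_t offset = 0;

    /* ptr may come from amdgpu_bo_cpu_map() or from plain malloc() */
    if (!amdgpu_find_bo_by_cpu_mapping(dev, ptr, size, &bo, &offset) && bo) {
            /* ptr lies inside an existing BO mapping at 'offset'; the
             * lookup took a reference, so drop it when done */
            amdgpu_bo_free(bo);
    } else {
            /* not BO-backed: fall back to a userptr import */
            amdgpu_create_bo_from_user_mem(dev, ptr, size, &bo);
    }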


If we find a wider use we can still start to use the kernel 
implementation again.


Regards,
Christian.



Regards,
David Zhou

+    bo = handle_table_lookup(&dev->bo_handles, i);
+    if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
+    continue;
+    if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + 
bo->alloc_size))

+    break;
+    }
+
+    if (i < dev->bo_handles.max_key) {
+    atomic_inc(&bo->refcount);
+    *buf_handle = bo;
+    *offset_in_bo = cpu - bo->cpu_ptr;
+    } else {
+    *buf_handle = NULL;
+    *offset_in_bo = 0;
+    }
+    pthread_mutex_unlock(&dev->bo_table_mutex);
+
+    return 0;
+}
+
  int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
  void *cpu,
  uint64_t size,








Re: [PATCH libdrm 2/4] amdgpu: add a function to find bo by cpu mapping (v2)

2018-08-07 Thread zhoucm1



On 2018-08-08 12:08, Junwei Zhang wrote:

Userspace needs to know if the user memory is from BO or malloc.

v2: update mutex range and rebase

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu.h| 23 +++
  amdgpu/amdgpu_bo.c | 34 ++
  2 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index be83b45..a8c353c 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -678,6 +678,29 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle 
dev,
amdgpu_bo_handle *buf_handle);
  
  /**

+ * Validate if the user memory comes from BO
+ *
+ * \param dev - [in] Device handle. See #amdgpu_device_initialize()
+ * \param cpu - [in] CPU address of user allocated memory which we
+ * want to map to GPU address space (make GPU accessible)
+ * (This address must be correctly aligned).
+ * \param size - [in] Size of allocation (must be correctly aligned)
+ * \param buf_handle - [out] Buffer handle for the userptr memory
+ * if the user memory is not from BO, the buf_handle will be NULL.
+ * \param offset_in_bo - [out] offset in this BO for this user memory
+ *
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+ void *cpu,
+ uint64_t size,
+ amdgpu_bo_handle *buf_handle,
+ uint64_t *offset_in_bo);
+
+/**
   * Free previosuly allocated memory
   *
   * \param   dev  - \c [in] Device handle. See 
#amdgpu_device_initialize()
diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index b24e698..a7f0662 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -529,6 +529,40 @@ int amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,
}
  }
  
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,

+ void *cpu,
+ uint64_t size,
+ amdgpu_bo_handle *buf_handle,
+ uint64_t *offset_in_bo)
+{
+   int i;
+   struct amdgpu_bo *bo;
+
+   if (cpu == NULL || size == 0)
+   return -EINVAL;
+
+   pthread_mutex_lock(&dev->bo_table_mutex);
+   for (i = 0; i < dev->bo_handles.max_key; i++) {

Hi Jerry,

As Christian caught before, iterating over all BOs of the device will introduce 
much CPU overhead; this isn't a good direction.
Since the CPU virtual address is per-process, you should go to the kernel and 
look it up in the VM tree, which obviously takes less time.


Regards,
David Zhou

+   bo = handle_table_lookup(&dev->bo_handles, i);
+   if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
+   continue;
+   if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + bo->alloc_size))
+   break;
+   }
+
+   if (i < dev->bo_handles.max_key) {
+   atomic_inc(&bo->refcount);
+   *buf_handle = bo;
+   *offset_in_bo = cpu - bo->cpu_ptr;
+   } else {
+   *buf_handle = NULL;
+   *offset_in_bo = 0;
+   }
+   pthread_mutex_unlock(&dev->bo_table_mutex);
+
+   return 0;
+}
+
  int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
void *cpu,
uint64_t size,




Re: [PATCH libdrm 3/4] amdgpu: add a function to find bo by cpu mapping

2018-08-07 Thread zhoucm1



On 2018-08-07 17:30, Christian König wrote:

On 07.08.2018 09:55, zhoucm1 wrote:



On 2018-08-07 15:26, Junwei Zhang wrote:

Userspace needs to know if the user memory is from BO or malloc.

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu.h    | 23 +++
  amdgpu/amdgpu_bo.c | 34 ++
  2 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index be83b45..a8c353c 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -678,6 +678,29 @@ int 
amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,

  amdgpu_bo_handle *buf_handle);
    /**
+ * Validate if the user memory comes from BO
+ *
+ * \param dev - [in] Device handle. See #amdgpu_device_initialize()
+ * \param cpu - [in] CPU address of user allocated memory which we
+ * want to map to GPU address space (make GPU accessible)
+ * (This address must be correctly aligned).
+ * \param size - [in] Size of allocation (must be correctly aligned)
+ * \param buf_handle - [out] Buffer handle for the userptr memory
+ * if the user memory is not from BO, the buf_handle will be NULL.
+ * \param offset_in_bo - [out] offset in this BO for this user memory
+ *
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo);
+
+/**
   * Free previosuly allocated memory
   *
   * \param   dev   - \c [in] Device handle. See 
#amdgpu_device_initialize()

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index b24e698..a631050 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -529,6 +529,40 @@ int amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,
  }
  }
  +int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+  void *cpu,
+  uint64_t size,
+  amdgpu_bo_handle *buf_handle,
+  uint64_t *offset_in_bo)
+{
+    int i;
+    struct amdgpu_bo *bo;
+
+    if (cpu == NULL || size == 0)
+    return -EINVAL;
+
+    pthread_mutex_lock(&dev->bo_table_mutex);
+    for (i = 0; i <= dev->bo_handles.max_key; i++) {
+    bo = handle_table_lookup(&dev->bo_handles, i);

explicit cast is encouraged, like "

bo = (struct amdgpu_bo *)handle_table_lookup(&dev->bo_handles, i);


Actually it isn't. We use kernel coding style here, so explicit casts 
from "void*" should be avoided:
Casting the return value which is a void pointer is redundant. The 
conversion from void pointer to any other pointer type is guaranteed 
by the C programming language.

Understood; personally, I still like explicit casts, which read easily.

David


I already had to remove quite a bunch of explicit casts because of 
this, so please stop adding new ones.


Regards,
Christian.




"

otherwise, the series looks good to me.

Regards,
David Zhou

+    if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
+    continue;
+    if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + 
bo->alloc_size))

+    break;
+    }
+    pthread_mutex_unlock(&dev->bo_table_mutex);
+
+    if (i <= dev->bo_handles.max_key) {
+    atomic_inc(&bo->refcount);
+    *buf_handle = bo;
+    *offset_in_bo = cpu - bo->cpu_ptr;
+    } else {
+    *buf_handle = NULL;
+    *offset_in_bo = 0;
+    }
+
+    return 0;
+}
+
  int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
  void *cpu,
  uint64_t size,




Re: [PATCH libdrm 3/4] amdgpu: add a function to find bo by cpu mapping

2018-08-07 Thread zhoucm1



On 2018-08-07 15:26, Junwei Zhang wrote:

Userspace needs to know if the user memory is from BO or malloc.

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu.h| 23 +++
  amdgpu/amdgpu_bo.c | 34 ++
  2 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index be83b45..a8c353c 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -678,6 +678,29 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle 
dev,
amdgpu_bo_handle *buf_handle);
  
  /**

+ * Validate if the user memory comes from BO
+ *
+ * \param dev - [in] Device handle. See #amdgpu_device_initialize()
+ * \param cpu - [in] CPU address of user allocated memory which we
+ * want to map to GPU address space (make GPU accessible)
+ * (This address must be correctly aligned).
+ * \param size - [in] Size of allocation (must be correctly aligned)
+ * \param buf_handle - [out] Buffer handle for the userptr memory
+ * if the user memory is not from BO, the buf_handle will be NULL.
+ * \param offset_in_bo - [out] offset in this BO for this user memory
+ *
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
+ void *cpu,
+ uint64_t size,
+ amdgpu_bo_handle *buf_handle,
+ uint64_t *offset_in_bo);
+
+/**
   * Free previosuly allocated memory
   *
   * \param   dev  - \c [in] Device handle. See 
#amdgpu_device_initialize()
diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index b24e698..a631050 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -529,6 +529,40 @@ int amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,
}
  }
  
+int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,

+ void *cpu,
+ uint64_t size,
+ amdgpu_bo_handle *buf_handle,
+ uint64_t *offset_in_bo)
+{
+   int i;
+   struct amdgpu_bo *bo;
+
+   if (cpu == NULL || size == 0)
+   return -EINVAL;
+
+   pthread_mutex_lock(&dev->bo_table_mutex);
+   for (i = 0; i <= dev->bo_handles.max_key; i++) {
+   bo = handle_table_lookup(&dev->bo_handles, i);

explicit cast is encouraged, like "

bo = (struct amdgpu_bo *)handle_table_lookup(&dev->bo_handles, i);

"

otherwise, the series looks good to me.

Regards,
David Zhou

+   if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
+   continue;
+   if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + bo->alloc_size))
+   break;
+   }
+   pthread_mutex_unlock(&dev->bo_table_mutex);
+
+   if (i <= dev->bo_handles.max_key) {
+   atomic_inc(&bo->refcount);
+   *buf_handle = bo;
+   *offset_in_bo = cpu - bo->cpu_ptr;
+   } else {
+   *buf_handle = NULL;
+   *offset_in_bo = 0;
+   }
+
+   return 0;
+}
+
  int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
void *cpu,
uint64_t size,




Re: [PATCH libdrm 1/4] amdgpu: add bo from user memory to handle table

2018-08-07 Thread zhoucm1



On 2018-08-07 15:26, Junwei Zhang wrote:

When create bo from user memory, add it to handle table
for future query.

Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu_bo.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index 422c7c9..b24e698 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -556,7 +556,16 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle 
dev,
bo->alloc_size = size;
bo->handle = args.handle;
  
-	*buf_handle = bo;

+   pthread_mutex_lock(&bo->dev->bo_table_mutex);
+   r = handle_table_insert(&bo->dev->bo_handles, bo->handle, bo);
+   pthread_mutex_unlock(&bo->dev->bo_table_mutex);
+
+   pthread_mutex_init(&bo->cpu_access_mutex, NULL);

This line has nothing to do with the patch itself, please separate it out.

Regards,
David Zhou

+
+   if (r)
+   amdgpu_bo_free(bo);
+   else
+   *buf_handle = bo;
  
  	return r;

  }




Re: [PATCH 3/4] drm/scheduler: add new function to get least loaded sched v2

2018-08-02 Thread zhoucm1



On 2018-08-02 14:01, Nayan Deshmukh wrote:

Hi David,

On Thu, Aug 2, 2018 at 8:22 AM Zhou, David(ChunMing) 
<david1.z...@amd.com> wrote:


Another big question:

I agree the general idea is good to balance scheduler load for
same ring family.

But when jobs from the same entity run on different schedulers, that means
the later job could be completed ahead of the earlier one, right?

Really good question. To avoid this scenario we do not move an entity 
which already has a job in the hardware queue. We only move entities 
whose last_scheduled fence has been signalled which means that the 
last submitted job of this entity has finished executing.

Good handling, which I missed when reviewing them.

Cheers,
David Zhou


Moving an entity which already has a job in the hardware queue will 
hinder the dependency optimization that we are using and hence will 
not lead to better performance anyway. I have talked about the issue 
in more detail here [1]. Please let me know if you have any more 
doubts regarding this.


Cheers,
Nayan

[1] 
http://ndesh26.github.io/gsoc/2018/06/14/GSoC-Update-A-Curious-Case-of-Dependency-Handling/
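
A sketch of the guard being described (roughly the drm_sched_entity_select_rq() helper in this series):

    static void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
    {
            struct dma_fence *fence;
            struct drm_sched_rq *rq;

            /* never move an entity that still has jobs queued or only
             * has a single run queue to choose from */
            if (spsc_queue_count(&entity->job_queue) || entity->num_rq_list <= 1)
                    return;

            /* the last scheduled job must have finished on the hardware,
             * otherwise reordering across rings could break fence order */
            fence = READ_ONCE(entity->last_scheduled);
            if (fence && !dma_fence_is_signaled(fence))
                    return;

            rq = drm_sched_entity_get_free_sched(entity);
            if (rq == entity->rq)
                    return;

            spin_lock(&entity->rq_lock);
            drm_sched_rq_remove_entity(entity->rq, entity);
            entity->rq = rq;
            spin_unlock(&entity->rq_lock);
    }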


That will break the fence design; a later fence must be signaled after
the earlier fence in the same fence context.

Anything I missed?

Regards,

David Zhou

*From:* dri-devel <dri-devel-boun...@lists.freedesktop.org> *On Behalf Of
*Nayan Deshmukh
*Sent:* Thursday, August 02, 2018 12:07 AM
*To:* Grodzovsky, Andrey <andrey.grodzov...@amd.com>
*Cc:* amd-gfx@lists.freedesktop.org; Mailing list - DRI
developers <dri-de...@lists.freedesktop.org>; Koenig, Christian
<christian.koe...@amd.com>
*Subject:* Re: [PATCH 3/4] drm/scheduler: add new function to get
least loaded sched v2

Yes, that is correct.

Nayan

On Wed, Aug 1, 2018, 9:05 PM Andrey Grodzovsky
<andrey.grodzov...@amd.com> wrote:

Clarification question -  if the run queues belong to different
schedulers they effectively point to different rings,

it means we allow moving (rescheduling) a drm_sched_entity from
one ring
to another - I assume that is the idea in the first place, that

you have a set of HW rings and you can utilize any of them for
your jobs
(like compute rings). Correct?

Andrey


On 08/01/2018 04:20 AM, Nayan Deshmukh wrote:
> The function selects the run queue from the rq_list with the
> least load. The load is decided by the number of jobs in a
> scheduler.
>
> v2: avoid using atomic read twice consecutively, instead store
>      it locally
>
> Signed-off-by: Nayan Deshmukh <nayan26deshm...@gmail.com>
> ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 25
+
>   1 file changed, 25 insertions(+)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 375f6f7f6a93..fb4e542660b0 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -255,6 +255,31 @@ static bool
drm_sched_entity_is_ready(struct drm_sched_entity *entity)
>       return true;
>   }
>
> +/**
> + * drm_sched_entity_get_free_sched - Get the rq from
rq_list with least load
> + *
> + * @entity: scheduler entity
> + *
> + * Return the pointer to the rq with least load.
> + */
> +static struct drm_sched_rq *
> +drm_sched_entity_get_free_sched(struct drm_sched_entity
*entity)
> +{
> +     struct drm_sched_rq *rq = NULL;
> +     unsigned int min_jobs = UINT_MAX, num_jobs;
> +     int i;
> +
> +     for (i = 0; i < entity->num_rq_list; ++i) {
> +             num_jobs =
atomic_read(&entity->rq_list[i]->sched->num_jobs);
> +             if (num_jobs < min_jobs) {
> +                     min_jobs = num_jobs;
> +                     rq = entity->rq_list[i];
> +             }
> +     }
> +
> +     return rq;
> +}
> +
>   static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
>                                   struct dma_fence_cb *cb)
>   {





Re: [PATCH 1/7] drm/amdgpu: use new scheduler load balancing for VMs

2018-08-02 Thread zhoucm1

Reviewed-by: Chunming Zhou  for series.


Thanks,

David Zhou


On 2018年08月01日 19:31, Christian König wrote:

Instead of the fixed round robin use let the scheduler balance the load
of page table updates.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 12 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  7 +++
  drivers/gpu/drm/amd/amdgpu/cik_sdma.c  | 12 +++-
  drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 12 +++-
  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 12 +++-
  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 12 +++-
  drivers/gpu/drm/amd/amdgpu/si_dma.c| 12 +++-
  8 files changed, 41 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 745f760b8df9..971ab128f277 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2335,7 +2335,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
adev->mman.buffer_funcs = NULL;
adev->mman.buffer_funcs_ring = NULL;
adev->vm_manager.vm_pte_funcs = NULL;
-   adev->vm_manager.vm_pte_num_rings = 0;
+   adev->vm_manager.vm_pte_num_rqs = 0;
adev->gmc.gmc_funcs = NULL;
adev->fence_context = dma_fence_context_alloc(AMDGPU_MAX_RINGS);
bitmap_zero(adev->gfx.pipe_reserve_bitmap, AMDGPU_MAX_COMPUTE_QUEUES);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 015613b4f98b..662e8a34d52c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2568,9 +2568,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
struct amdgpu_bo *root;
const unsigned align = min(AMDGPU_VM_PTB_ALIGN_SIZE,
AMDGPU_VM_PTE_COUNT(adev) * 8);
-   unsigned ring_instance;
-   struct amdgpu_ring *ring;
-   struct drm_sched_rq *rq;
unsigned long size;
uint64_t flags;
int r, i;
@@ -2586,12 +2583,8 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
INIT_LIST_HEAD(>freed);
  
  	/* create scheduler entity for page table updates */

-
-   ring_instance = atomic_inc_return(&adev->vm_manager.vm_pte_next_ring);
-   ring_instance %= adev->vm_manager.vm_pte_num_rings;
-   ring = adev->vm_manager.vm_pte_rings[ring_instance];
-   rq = &ring->sched.sched_rq[DRM_SCHED_PRIORITY_KERNEL];
-   r = drm_sched_entity_init(&vm->entity, &rq, 1, NULL);
+   r = drm_sched_entity_init(&vm->entity, adev->vm_manager.vm_pte_rqs,
+ adev->vm_manager.vm_pte_num_rqs, NULL);
if (r)
return r;
  
@@ -2898,7 +2891,6 @@ void amdgpu_vm_manager_init(struct amdgpu_device *adev)

for (i = 0; i < AMDGPU_MAX_RINGS; ++i)
adev->vm_manager.seqno[i] = 0;
  
-	atomic_set(&adev->vm_manager.vm_pte_next_ring, 0);

spin_lock_init(&adev->vm_manager.prt_lock);
atomic_set(&adev->vm_manager.num_prt_users, 0);
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h

index 67a15d439ac0..034f8c399c2d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -244,10 +244,9 @@ struct amdgpu_vm_manager {
/* vram base address for page table entry  */
u64 vram_base_offset;
/* vm pte handling */
-   const struct amdgpu_vm_pte_funcs*vm_pte_funcs;
-   struct amdgpu_ring  *vm_pte_rings[AMDGPU_MAX_RINGS];
-   unsignedvm_pte_num_rings;
-   atomic_tvm_pte_next_ring;
+   const struct amdgpu_vm_pte_funcs*vm_pte_funcs;
+   struct drm_sched_rq *vm_pte_rqs[AMDGPU_MAX_RINGS];
+   unsignedvm_pte_num_rqs;
  
  	/* partial resident texture handling */

spinlock_t  prt_lock;
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c 
b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
index d0fa2aac2388..154b1499b07e 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
@@ -1386,15 +1386,17 @@ static const struct amdgpu_vm_pte_funcs 
cik_sdma_vm_pte_funcs = {
  
  static void cik_sdma_set_vm_pte_funcs(struct amdgpu_device *adev)

  {
+   struct drm_gpu_scheduler *sched;
unsigned i;
  
  	if (adev->vm_manager.vm_pte_funcs == NULL) {

adev->vm_manager.vm_pte_funcs = &cik_sdma_vm_pte_funcs;
-   for (i = 0; i < adev->sdma.num_instances; i++)
-   adev->vm_manager.vm_pte_rings[i] =
-   &adev->sdma.instance[i].ring;
-
-   adev->vm_manager.vm_pte_num_rings = adev->sdma.num_instances;
+   for 

Re: [PATCH] drm/amdgpu: always initialize job->base.sched

2018-07-17 Thread zhoucm1



On 2018-07-17 15:26, Christian König wrote:

On 17.07.2018 09:16, Zhou, David(ChunMing) wrote:
Acked-by: Chunming Zhou , but I think it isn't a 
nice approach, although there is a comment in the code.


Yeah, I didn't think about the possibility that we need to free the 
job before it is submitted (in other words before the scheduler is 
determined).


Alternatively we could provide the adev manually to amdgpu_job_free() 
and amdgpu_job_free_resources().

not a big deal, you can still go ahead with this patch.

David


Regards,
Christian.




-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On 
Behalf Of Christian K?nig

Sent: Tuesday, July 17, 2018 3:05 PM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH] drm/amdgpu: always initialize job->base.sched

Otherwise we can't clean up the job if we run into an error before it 
is pushed to the scheduler.


Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c

index 024efb7ea6d6..42a4764d728e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -54,6 +54,11 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, 
unsigned num_ibs,

  if (!*job)
  return -ENOMEM;
  +    /*
+ * Initialize the scheduler to at least some ring so that we always
+ * have a pointer to adev.
+ */
+    (*job)->base.sched = &adev->rings[0]->sched;
  (*job)->vm = vm;
  (*job)->ibs = (void *)&(*job)[1];
  (*job)->num_ibs = num_ibs;
--
2.14.1
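
For reference, the free path that relies on this pointer (a sketch of amdgpu_job_free_resources() after the job->adev removal; to_amdgpu_ring() is the container_of() helper around the embedded drm_gpu_scheduler):

    static void amdgpu_job_free_resources(struct amdgpu_job *job)
    {
            /* base.sched is embedded in an amdgpu_ring; without the
             * initialization above this would oops for jobs freed
             * before ever being submitted */
            struct amdgpu_ring *ring = to_amdgpu_ring(job->base.sched);
            struct dma_fence *f;
            unsigned i;

            /* use the scheduler fence if available */
            f = job->base.s_fence ? &job->base.s_fence->finished : job->fence;
            for (i = 0; i < job->num_ibs; ++i)
                    amdgpu_ib_free(ring->adev, &job->ibs[i], f);
    }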



Re: [PATCH v2] drm/amdgpu: Allow to create BO lists in CS ioctl v2

2018-07-12 Thread zhoucm1



On 2018-07-12 15:56, Christian König wrote:

On 12.07.2018 06:21, zhoucm1 wrote:
Thinking more about your performance reason: can we go further and not 
create a temp bo list at all, and directly add them to the 
parser->validated list?


You still need something which is added to the parser->validated list, 
so creating the array of BOs is unavoidable.


In fact, if the bo array is very long, the overhead of bo list creation 
in the CS is considerable, since it iterates all BOs twice compared to 
the original path.


From the UMD perspective, they don't create a bo list for every CS; they 
can reuse a previously created bo_list for the next several CSes, and if 
there is a new bo, they just add it.


And exactly that is the failed concept of bo_lists, it is complete 
nonsense to do this.


Either you create the list of BOs from scratch for each command 
submission like Mesa does, which is exactly the case we try to 
support efficiently here.

@Kai, do you have comments for what Christian said?



Or you use per process BOs which are always valid. Something which we 
have already implemented as well.

Yes, Vulkan already uses it since 4.15. But pro-OGL still uses the bo list.


Regards,
Christian.



Thanks,
David Zhou
On 2018-07-12 12:02, zhoucm1 wrote:



On 2018-07-12 11:09, Zhou, David(ChunMing) wrote:

Hi Andrey,

Could you add a compatibility flag or increase the KMS driver version, so 
that user space can keep the old path when using the new one?

Sorry for the noise, it's already at the end of the patch.

Regards,
David Zhou


Regards,
David Zhou

-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On 
Behalf Of zhoucm1

Sent: Thursday, July 12, 2018 10:31 AM
To: Grodzovsky, Andrey ; 
amd-gfx@lists.freedesktop.org
Cc: Olsak, Marek ; Koenig, Christian 

Subject: Re: [PATCH v2] drm/amdgpu: Allow to create BO lists in CS 
ioctl v2




On 2018-07-12 04:57, Andrey Grodzovsky wrote:

This change is to support MESA performance optimization.
Modify CS IOCTL to allow its input as command buffer and an array of
buffer handles to create a temporary bo list and then destroy it when
IOCTL completes.
This saves on calling the BO_LIST create and destroy IOCTLs in MESA 
and

by this improves performance.

v2: Avoid inserting the temp list into idr struct.

Signed-off-by: Andrey Grodzovsky 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu.h | 11 
   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 86 
++---

   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 51 +++--
   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  3 +-
   include/uapi/drm/amdgpu_drm.h   |  1 +
   5 files changed, 114 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 8eaba0f..9b472b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -732,6 +732,17 @@ void amdgpu_bo_list_get_list(struct 
amdgpu_bo_list *list,

    struct list_head *validated);
   void amdgpu_bo_list_put(struct amdgpu_bo_list *list);
   void amdgpu_bo_list_free(struct amdgpu_bo_list *list);
+int amdgpu_bo_create_list_entry_array(struct 
drm_amdgpu_bo_list_in *in,

+  struct drm_amdgpu_bo_list_entry **info_param);
+
+int amdgpu_bo_list_create(struct amdgpu_device *adev,
+ struct drm_file *filp,
+ struct drm_amdgpu_bo_list_entry *info,
+ unsigned num_entries,
+ int *id,
+ struct amdgpu_bo_list **list);
+
+void amdgpu_bo_list_destroy(struct amdgpu_fpriv *fpriv, int id);
      /*
    * GFX stuff
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index 92be7f6..14c7c59 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -55,11 +55,12 @@ static void amdgpu_bo_list_release_rcu(struct 
kref *ref)

   kfree_rcu(list, rhead);
   }
   -static int amdgpu_bo_list_create(struct amdgpu_device *adev,
+int amdgpu_bo_list_create(struct amdgpu_device *adev,
    struct drm_file *filp,
    struct drm_amdgpu_bo_list_entry *info,
    unsigned num_entries,
- int *id)
+ int *id,
+ struct amdgpu_bo_list **list_out)
   {
   int r;
   struct amdgpu_fpriv *fpriv = filp->driver_priv; @@ -78,20 
+79,25 @@

static int amdgpu_bo_list_create(struct amdgpu_device *adev,
   return r;
   }
   +    if (id) {
   /* idr alloc should be called only after initialization of 
bo list. */

-    mutex_lock(&fpriv->bo_list_lock);
-    r = idr_alloc(&fpriv->bo_list_handles, list, 1, 0, GFP_KERNEL);
-    mutex_unlock(&fpriv->bo_list_lock);
-    if (r < 0) {
-    amdgpu_bo_list_free(list);
-    return r;
+    mutex_lock(&fpriv->bo_list_lock);
+    r = idr_alloc(&fpriv->bo_list_handles, list, 1, 0, 
GFP_KERNEL);

+    mutex_unlock(&fpriv->bo_

Re: [PATCH v2] drm/amdgpu: Allow to create BO lists in CS ioctl v2

2018-07-11 Thread zhoucm1
Thinking more about your performance reason: can we go further and not 
create a temp bo list at all, and directly add them to the 
parser->validated list?


In fact, if the bo array is very long, the overhead of bo list creation in 
the CS is considerable, since it iterates all BOs twice compared to the original path.


From the UMD perspective, they don't create a bo list for every CS; they 
can reuse a previously created bo_list for the next several CSes, and if there is a new bo, 
they just add it.


Thanks,
David Zhou
On 2018-07-12 12:02, zhoucm1 wrote:



On 2018-07-12 11:09, Zhou, David(ChunMing) wrote:

Hi Andrey,

Could you add a compatibility flag or increase the KMS driver version, so 
that user space can keep the old path when using the new one?

Sorry for the noise, it's already at the end of the patch.

Regards,
David Zhou


Regards,
David Zhou

-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On 
Behalf Of zhoucm1

Sent: Thursday, July 12, 2018 10:31 AM
To: Grodzovsky, Andrey ; 
amd-gfx@lists.freedesktop.org
Cc: Olsak, Marek ; Koenig, Christian 

Subject: Re: [PATCH v2] drm/amdgpu: Allow to create BO lists in CS 
ioctl v2




On 2018-07-12 04:57, Andrey Grodzovsky wrote:

This change is to support MESA performance optimization.
Modify CS IOCTL to allow its input as command buffer and an array of
buffer handles to create a temporary bo list and then destroy it when
IOCTL completes.
This saves on calling the BO_LIST create and destroy IOCTLs in MESA and
by this improves performance.

v2: Avoid inserting the temp list into idr struct.

Signed-off-by: Andrey Grodzovsky 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu.h | 11 
   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 86 
++---

   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 51 +++--
   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  3 +-
   include/uapi/drm/amdgpu_drm.h   |  1 +
   5 files changed, 114 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 8eaba0f..9b472b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -732,6 +732,17 @@ void amdgpu_bo_list_get_list(struct 
amdgpu_bo_list *list,

    struct list_head *validated);
   void amdgpu_bo_list_put(struct amdgpu_bo_list *list);
   void amdgpu_bo_list_free(struct amdgpu_bo_list *list);
+int amdgpu_bo_create_list_entry_array(struct drm_amdgpu_bo_list_in 
*in,

+  struct drm_amdgpu_bo_list_entry **info_param);
+
+int amdgpu_bo_list_create(struct amdgpu_device *adev,
+ struct drm_file *filp,
+ struct drm_amdgpu_bo_list_entry *info,
+ unsigned num_entries,
+ int *id,
+ struct amdgpu_bo_list **list);
+
+void amdgpu_bo_list_destroy(struct amdgpu_fpriv *fpriv, int id);
      /*
    * GFX stuff
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index 92be7f6..14c7c59 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -55,11 +55,12 @@ static void amdgpu_bo_list_release_rcu(struct 
kref *ref)

   kfree_rcu(list, rhead);
   }
   -static int amdgpu_bo_list_create(struct amdgpu_device *adev,
+int amdgpu_bo_list_create(struct amdgpu_device *adev,
    struct drm_file *filp,
    struct drm_amdgpu_bo_list_entry *info,
    unsigned num_entries,
- int *id)
+ int *id,
+ struct amdgpu_bo_list **list_out)
   {
   int r;
   struct amdgpu_fpriv *fpriv = filp->driver_priv; @@ -78,20 
+79,25 @@

static int amdgpu_bo_list_create(struct amdgpu_device *adev,
   return r;
   }
   +    if (id) {
   /* idr alloc should be called only after initialization of bo 
list. */

-    mutex_lock(&fpriv->bo_list_lock);
-    r = idr_alloc(&fpriv->bo_list_handles, list, 1, 0, GFP_KERNEL);
-    mutex_unlock(&fpriv->bo_list_lock);
-    if (r < 0) {
-    amdgpu_bo_list_free(list);
-    return r;
+    mutex_lock(&fpriv->bo_list_lock);
+    r = idr_alloc(&fpriv->bo_list_handles, list, 1, 0, 
GFP_KERNEL);

+    mutex_unlock(&fpriv->bo_list_lock);
+    if (r < 0) {
+    amdgpu_bo_list_free(list);
+    return r;
+    }
+    *id = r;
   }
-    *id = r;
+
+    if (list_out)
+    *list_out = list;
      return 0;
   }
   -static void amdgpu_bo_list_destroy(struct amdgpu_fpriv *fpriv, int
id)
+void amdgpu_bo_list_destroy(struct amdgpu_fpriv *fpriv, int id)
   {
   struct amdgpu_bo_list *list;
   @@ -263,53 +269,68 @@ void amdgpu_bo_list_free(struct 
amdgpu_bo_list *list)

   kfree(list);
   }
   -int amdgpu_bo_list_ioctl(struct drm_device *dev, void *data,
-    struct drm_file *filp)
+int amdgpu_bo_create_list_entry_array(struct drm_amdgpu_bo_list_in 
*in,

+  s

Re: [PATCH v2] drm/amdgpu: Allow to create BO lists in CS ioctl v2

2018-07-11 Thread zhoucm1



On 2018-07-12 11:09, Zhou, David(ChunMing) wrote:

Hi Andrey,

Could you add a compatibility flag or increase the KMS driver version, so that user 
space can keep the old path when using the new one?

Sorry for the noise, it's already at the end of the patch.

Regards,
David Zhou


Regards,
David Zhou

-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of 
zhoucm1
Sent: Thursday, July 12, 2018 10:31 AM
To: Grodzovsky, Andrey ; 
amd-gfx@lists.freedesktop.org
Cc: Olsak, Marek ; Koenig, Christian 

Subject: Re: [PATCH v2] drm/amdgpu: Allow to create BO lists in CS ioctl v2



On 2018-07-12 04:57, Andrey Grodzovsky wrote:

This change is to support MESA performance optimization.
Modify CS IOCTL to allow its input as command buffer and an array of
buffer handles to create a temporary bo list and then destroy it when
IOCTL completes.
This saves on calling the BO_LIST create and destroy IOCTLs in MESA and
by this improves performance.

v2: Avoid inserting the temp list into idr struct.

Signed-off-by: Andrey Grodzovsky 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu.h | 11 
   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 86 
++---
   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 51 +++--
   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  3 +-
   include/uapi/drm/amdgpu_drm.h   |  1 +
   5 files changed, 114 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 8eaba0f..9b472b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -732,6 +732,17 @@ void amdgpu_bo_list_get_list(struct amdgpu_bo_list *list,
 struct list_head *validated);
   void amdgpu_bo_list_put(struct amdgpu_bo_list *list);
   void amdgpu_bo_list_free(struct amdgpu_bo_list *list);
+int amdgpu_bo_create_list_entry_array(struct drm_amdgpu_bo_list_in *in,
+ struct drm_amdgpu_bo_list_entry 
**info_param);
+
+int amdgpu_bo_list_create(struct amdgpu_device *adev,
+struct drm_file *filp,
+struct drm_amdgpu_bo_list_entry *info,
+unsigned num_entries,
+int *id,
+struct amdgpu_bo_list **list);
+
+void amdgpu_bo_list_destroy(struct amdgpu_fpriv *fpriv, int id);
   
   /*

* GFX stuff
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index 92be7f6..14c7c59 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -55,11 +55,12 @@ static void amdgpu_bo_list_release_rcu(struct kref *ref)
kfree_rcu(list, rhead);
   }
   
-static int amdgpu_bo_list_create(struct amdgpu_device *adev,

+int amdgpu_bo_list_create(struct amdgpu_device *adev,
 struct drm_file *filp,
 struct drm_amdgpu_bo_list_entry *info,
 unsigned num_entries,
-int *id)
+int *id,
+struct amdgpu_bo_list **list_out)
   {
int r;
struct amdgpu_fpriv *fpriv = filp->driver_priv; @@ -78,20 +79,25 @@
static int amdgpu_bo_list_create(struct amdgpu_device *adev,
return r;
}
   
+	if (id) {

/* idr alloc should be called only after initialization of bo list. */
-   mutex_lock(&fpriv->bo_list_lock);
-   r = idr_alloc(&fpriv->bo_list_handles, list, 1, 0, GFP_KERNEL);
-   mutex_unlock(&fpriv->bo_list_lock);
-   if (r < 0) {
-   amdgpu_bo_list_free(list);
-   return r;
+   mutex_lock(&fpriv->bo_list_lock);
+   r = idr_alloc(&fpriv->bo_list_handles, list, 1, 0, GFP_KERNEL);
+   mutex_unlock(&fpriv->bo_list_lock);
+   if (r < 0) {
+   amdgpu_bo_list_free(list);
+   return r;
+   }
+   *id = r;
}
-   *id = r;
+
+   if (list_out)
+   *list_out = list;
   
   	return 0;

   }
   
-static void amdgpu_bo_list_destroy(struct amdgpu_fpriv *fpriv, int

id)
+void amdgpu_bo_list_destroy(struct amdgpu_fpriv *fpriv, int id)
   {
struct amdgpu_bo_list *list;
   
@@ -263,53 +269,68 @@ void amdgpu_bo_list_free(struct amdgpu_bo_list *list)

kfree(list);
   }
   
-int amdgpu_bo_list_ioctl(struct drm_device *dev, void *data,

-   struct drm_file *filp)
+int amdgpu_bo_create_list_entry_array(struct drm_amdgpu_bo_list_in *in,
+ struct drm_amdgpu_bo_list_entry 
**info_param)
   {
-   const uint32_t info_size = sizeof(struct drm_amdgpu_bo_list_entry);
-
-   struct amdgpu_device *adev = dev->dev_private;
-   struct amdgpu_fp

Re: [PATCH v2] drm/amdgpu: Allow to create BO lists in CS ioctl v2

2018-07-11 Thread zhoucm1



On 2018-07-12 04:57, Andrey Grodzovsky wrote:

This change is to support MESA performance optimization.
Modify CS IOCTL to allow its input as command buffer and an array of
buffer handles to create a temporary bo list and then destroy it
when IOCTL completes.
This saves on calling the BO_LIST create and destroy IOCTLs in MESA
and by this improves performance.

v2: Avoid inserting the temp list into idr struct.

Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h | 11 
  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 86 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 51 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  3 +-
  include/uapi/drm/amdgpu_drm.h   |  1 +
  5 files changed, 114 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 8eaba0f..9b472b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -732,6 +732,17 @@ void amdgpu_bo_list_get_list(struct amdgpu_bo_list *list,
 struct list_head *validated);
  void amdgpu_bo_list_put(struct amdgpu_bo_list *list);
  void amdgpu_bo_list_free(struct amdgpu_bo_list *list);
+int amdgpu_bo_create_list_entry_array(struct drm_amdgpu_bo_list_in *in,
+ struct drm_amdgpu_bo_list_entry 
**info_param);
+
+int amdgpu_bo_list_create(struct amdgpu_device *adev,
+struct drm_file *filp,
+struct drm_amdgpu_bo_list_entry *info,
+unsigned num_entries,
+int *id,
+struct amdgpu_bo_list **list);
+
+void amdgpu_bo_list_destroy(struct amdgpu_fpriv *fpriv, int id);
  
  /*

   * GFX stuff
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index 92be7f6..14c7c59 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -55,11 +55,12 @@ static void amdgpu_bo_list_release_rcu(struct kref *ref)
kfree_rcu(list, rhead);
  }
  
-static int amdgpu_bo_list_create(struct amdgpu_device *adev,

+int amdgpu_bo_list_create(struct amdgpu_device *adev,
 struct drm_file *filp,
 struct drm_amdgpu_bo_list_entry *info,
 unsigned num_entries,
-int *id)
+int *id,
+struct amdgpu_bo_list **list_out)
  {
int r;
struct amdgpu_fpriv *fpriv = filp->driver_priv;
@@ -78,20 +79,25 @@ static int amdgpu_bo_list_create(struct amdgpu_device *adev,
return r;
}
  
+	if (id) {

/* idr alloc should be called only after initialization of bo list. */
-   mutex_lock(&fpriv->bo_list_lock);
-   r = idr_alloc(&fpriv->bo_list_handles, list, 1, 0, GFP_KERNEL);
-   mutex_unlock(&fpriv->bo_list_lock);
-   if (r < 0) {
-   amdgpu_bo_list_free(list);
-   return r;
+   mutex_lock(&fpriv->bo_list_lock);
+   r = idr_alloc(&fpriv->bo_list_handles, list, 1, 0, GFP_KERNEL);
+   mutex_unlock(&fpriv->bo_list_lock);
+   if (r < 0) {
+   amdgpu_bo_list_free(list);
+   return r;
+   }
+   *id = r;
}
-   *id = r;
+
+   if (list_out)
+   *list_out = list;
  
  	return 0;

  }
  
-static void amdgpu_bo_list_destroy(struct amdgpu_fpriv *fpriv, int id)

+void amdgpu_bo_list_destroy(struct amdgpu_fpriv *fpriv, int id)
  {
struct amdgpu_bo_list *list;
  
@@ -263,53 +269,68 @@ void amdgpu_bo_list_free(struct amdgpu_bo_list *list)

kfree(list);
  }
  
-int amdgpu_bo_list_ioctl(struct drm_device *dev, void *data,

-   struct drm_file *filp)
+int amdgpu_bo_create_list_entry_array(struct drm_amdgpu_bo_list_in *in,
+ struct drm_amdgpu_bo_list_entry 
**info_param)
  {
-   const uint32_t info_size = sizeof(struct drm_amdgpu_bo_list_entry);
-
-   struct amdgpu_device *adev = dev->dev_private;
-   struct amdgpu_fpriv *fpriv = filp->driver_priv;
-   union drm_amdgpu_bo_list *args = data;
-   uint32_t handle = args->in.list_handle;
-   const void __user *uptr = u64_to_user_ptr(args->in.bo_info_ptr);
-
struct drm_amdgpu_bo_list_entry *info;
-   struct amdgpu_bo_list *list;
-
int r;
+   const void __user *uptr = u64_to_user_ptr(in->bo_info_ptr);
+   const uint32_t info_size = sizeof(struct drm_amdgpu_bo_list_entry);
  
-	info = kvmalloc_array(args->in.bo_number,

+   info = kvmalloc_array(in->bo_number,
			      sizeof(struct drm_amdgpu_bo_list_entry), GFP_KERNEL);
if (!info)
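
For reference, a rough sketch of the intended userspace usage. The uapi
addition itself is not visible in the quoted hunks, so the chunk id and
struct layout below are assumed to match the AMDGPU_CHUNK_ID_BO_HANDLES
interface this series adds to amdgpu_drm.h:

	/* Pass the BO handles inline with the CS submission instead of
	 * calling the BO_LIST create/destroy ioctls beforehand.
	 * vbo_handle/ibo_handle are hypothetical GEM handles. */
	struct drm_amdgpu_bo_list_entry entries[2] = {
		{ .bo_handle = vbo_handle, .bo_priority = 0 },
		{ .bo_handle = ibo_handle, .bo_priority = 0 },
	};
	struct drm_amdgpu_bo_list_in bo_list_in = {
		.operation = ~0,	/* no list operation, plain handle array */
		.list_handle = ~0,
		.bo_number = 2,
		.bo_info_size = sizeof(entries[0]),
		.bo_info_ptr = (uintptr_t)entries,
	};
	struct drm_amdgpu_cs_chunk chunk = {
		.chunk_id = AMDGPU_CHUNK_ID_BO_HANDLES,
		.length_dw = sizeof(bo_list_in) / 4,
		.chunk_data = (uintptr_t)&bo_list_in,
	};
	/* The chunk is submitted together with the IB chunks; the kernel
	 * builds a temporary bo list for this submission and frees it
	 * when the ioctl completes. */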

Re: [PATCH libdrm] amdgpu: add amdgpu_bo_handle_type_kms_noimport

2018-07-11 Thread zhoucm1



On 2018-07-12 08:47, Marek Olšák wrote:

From: Marek Olšák 
The patch is missing a commit message describing why
amdgpu_bo_handle_type_kms doesn't meet the requirement and what the
patch does.

It is also missing a Signed-off-by.



---
  amdgpu/amdgpu.h| 7 ++-
  amdgpu/amdgpu_bo.c | 4 
  2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 36f91058..be83b457 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -77,21 +77,26 @@ struct drm_amdgpu_info_hw_ip;
   *
  */
  enum amdgpu_bo_handle_type {
/** GEM flink name (needs DRM authentication, used by DRI2) */
amdgpu_bo_handle_type_gem_flink_name = 0,
  
  	/** KMS handle which is used by all driver ioctls */

amdgpu_bo_handle_type_kms = 1,
  
  	/** DMA-buf fd handle */

-   amdgpu_bo_handle_type_dma_buf_fd = 2
+   amdgpu_bo_handle_type_dma_buf_fd = 2,
+
+   /** KMS handle, but re-importing as a DMABUF handle through
+*  drmPrimeHandleToFD is forbidden. (Glamor does that)
+*/
+   amdgpu_bo_handle_type_kms_noimport = 3,
I've always found it curious that these enum members are lowercase;
could we change them to uppercase this time?


Thanks,
David Zhou

  };
  
  /** Define known types of GPU VM VA ranges */

  enum amdgpu_gpu_va_range
  {
/** Allocate from "normal"/general range */
amdgpu_gpu_va_range_general = 0
  };
  
  enum amdgpu_sw_info {

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index 9e37b149..d29be244 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -234,20 +234,22 @@ int amdgpu_bo_export(amdgpu_bo_handle bo,
case amdgpu_bo_handle_type_gem_flink_name:
r = amdgpu_bo_export_flink(bo);
if (r)
return r;
  
  		*shared_handle = bo->flink_name;

return 0;
  
  	case amdgpu_bo_handle_type_kms:

amdgpu_add_handle_to_table(bo);
+   /* fall through */
+   case amdgpu_bo_handle_type_kms_noimport:
*shared_handle = bo->handle;
return 0;
  
  	case amdgpu_bo_handle_type_dma_buf_fd:

amdgpu_add_handle_to_table(bo);
return drmPrimeHandleToFD(bo->dev->fd, bo->handle,
  DRM_CLOEXEC | DRM_RDWR,
  (int*)shared_handle);
}
return -EINVAL;
@@ -299,20 +301,21 @@ int amdgpu_bo_import(amdgpu_device_handle dev,
bo = util_hash_table_get(dev->bo_flink_names,
 (void*)(uintptr_t)shared_handle);
break;
  
  	case amdgpu_bo_handle_type_dma_buf_fd:

bo = util_hash_table_get(dev->bo_handles,
 (void*)(uintptr_t)shared_handle);
break;
  
  	case amdgpu_bo_handle_type_kms:

+   case amdgpu_bo_handle_type_kms_noimport:
		/* Importing a KMS handle is not allowed. */
		pthread_mutex_unlock(&dev->bo_table_mutex);
return -EPERM;
  
  	default:

		pthread_mutex_unlock(&dev->bo_table_mutex);
return -EINVAL;
}
  
  	if (bo) {

@@ -368,20 +371,21 @@ int amdgpu_bo_import(amdgpu_device_handle dev,
util_hash_table_set(dev->bo_flink_names,
(void*)(uintptr_t)bo->flink_name, bo);
break;
  
  	case amdgpu_bo_handle_type_dma_buf_fd:

bo->handle = shared_handle;
bo->alloc_size = dma_buf_size;
break;
  
  	case amdgpu_bo_handle_type_kms:

+   case amdgpu_bo_handle_type_kms_noimport:
assert(0); /* unreachable */
}
  
  	/* Initialize it. */

	atomic_set(&bo->refcount, 1);
bo->dev = dev;
	pthread_mutex_init(&bo->cpu_access_mutex, NULL);
  
  	util_hash_table_set(dev->bo_handles, (void*)(uintptr_t)bo->handle, bo);

	pthread_mutex_unlock(&dev->bo_table_mutex);
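
A minimal usage sketch (caller code assumed, not part of the patch):

	/* Export a KMS handle that must never be re-imported through
	 * drmPrimeHandleToFD(); unlike amdgpu_bo_handle_type_kms, the BO
	 * is not added to the dev->bo_handles hash table. */
	uint32_t kms_handle;
	int r = amdgpu_bo_export(bo, amdgpu_bo_handle_type_kms_noimport,
				 &kms_handle);
	if (r)
		return r;
	/* kms_handle may be used with driver ioctls, but passing it to
	 * amdgpu_bo_import() returns -EPERM, as in the hunks above. */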




Re: [PATCH] drm/amdgpu: Verify root PD is mapped into kernel address space.

2018-07-04 Thread zhoucm1



On 2018-07-05 03:49, Andrey Grodzovsky wrote:

Problem: When PD/PT updates are made by the CPU, the root PD may not yet
be mapped, causing a page fault.

Fix: Move amdgpu_bo_kmap into amdgpu_vm_bo_base_init to cover
all cases and avoid code duplication with amdgpu_vm_alloc_levels.

Link: https://bugs.freedesktop.org/show_bug.cgi?id=107065
Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 47 ++
  1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 845f73a..f546afa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -143,25 +143,36 @@ struct amdgpu_prt_cb {
   * Initialize a bo_va_base structure and add it to the appropriate lists
   *
   */
-static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
+static int amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
   struct amdgpu_vm *vm,
   struct amdgpu_bo *bo)
  {
+   int r = 0;
base->vm = vm;
base->bo = bo;
INIT_LIST_HEAD(>bo_list);
INIT_LIST_HEAD(>vm_status);
  
  	if (!bo)

-   return;
+   return r;
+
	list_add_tail(&base->bo_list, &bo->va);
  
+	if (vm->use_cpu_for_update && bo->tbo.type == ttm_bo_type_kernel) {

+   r = amdgpu_bo_kmap(bo, NULL);
+   if (r) {
+			amdgpu_bo_unref(&bo->shadow);
+			amdgpu_bo_unref(&bo);
I feel these two lines should move out of the helper function; ref/unref
should appear in pairs at the places where they are used.


Regards,
David Zhou

+   return r;
+   }
+   }
+
if (bo->tbo.resv != vm->root.base.bo->tbo.resv)
-   return;
+   return r;
  
  	if (bo->preferred_domains &

amdgpu_mem_type_to_domain(bo->tbo.mem.mem_type))
-   return;
+   return r;
  
  	/*

 * we checked all the prerequisites, but it looks like this per vm bo
@@ -169,6 +180,8 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base 
*base,
 * is validated on next vm use to avoid fault.
 * */
	list_move_tail(&base->vm_status, &vm->evicted);
+
+   return r;
  }
  
  /**

@@ -525,21 +538,15 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device 
*adev,
return r;
}
  
-			if (vm->use_cpu_for_update) {

-   r = amdgpu_bo_kmap(pt, NULL);
-   if (r) {
-				amdgpu_bo_unref(&pt->shadow);
-				amdgpu_bo_unref(&pt);
-   return r;
-   }
-   }
-
/* Keep a reference to the root directory to avoid
* freeing them up in the wrong order.
*/
pt->parent = amdgpu_bo_ref(parent->base.bo);
  
-			amdgpu_vm_bo_base_init(&pt->base, vm, pt);
+			r = amdgpu_vm_bo_base_init(&pt->base, vm, pt);
+   if (r)
+   return r;
+
			list_move(&pt->base.vm_status, &vm->relocated);
}
  
@@ -1992,12 +1999,17 @@ struct amdgpu_bo_va *amdgpu_vm_bo_add(struct amdgpu_device *adev,

  struct amdgpu_bo *bo)
  {
struct amdgpu_bo_va *bo_va;
+   int r;
  
  	bo_va = kzalloc(sizeof(struct amdgpu_bo_va), GFP_KERNEL);

if (bo_va == NULL) {
return NULL;
}
-	amdgpu_vm_bo_base_init(&bo_va->base, vm, bo);
+
+	if ((r = amdgpu_vm_bo_base_init(&bo_va->base, vm, bo))) {
+   WARN_ONCE(1,"r = %d\n", r);
+   return NULL;
+   }
  
  	bo_va->ref_count = 1;

	INIT_LIST_HEAD(&bo_va->valids);
@@ -2613,7 +2625,10 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
if (r)
goto error_unreserve;
  
-	amdgpu_vm_bo_base_init(&vm->root.base, vm, root);
+	r = amdgpu_vm_bo_base_init(&vm->root.base, vm, root);
+   if (r)
+   goto error_unreserve;
+
amdgpu_bo_unreserve(vm->root.base.bo);
  
  	if (pasid) {
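
A minimal sketch of the pairing David suggests (an assumed
rearrangement, not part of this patch): keep the kmap inside the
helper, but let the caller that took the references drop them on
failure:

	r = amdgpu_vm_bo_base_init(&pt->base, vm, pt);
	if (r) {
		/* this caller did the ref/alloc, so it does the unref */
		amdgpu_bo_unref(&pt->shadow);
		amdgpu_bo_unref(&pt);
		return r;
	}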




Re: [PATCH v2 2/2] drm/admgpu: Present amdgpu_task_info in VM_FAULTS.

2018-07-04 Thread zhoucm1



On 2018-07-04 23:04, Andrey Grodzovsky wrote:

Extract and present the responsible process and thread when a
VM_FAULT happens.

Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 
  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  | 10 --
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  9 +++--
  3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 7a625f3..1c483ea 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -187,6 +187,18 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser 
*p, void *data)
if (p->uf_entry.robj)
p->job->uf_addr = uf_offset;
kfree(chunk_array);
+
+   /* Use this opportunity to fill in task info for the vm */
+   if (!vm->task_info.pid) {
+   vm->task_info.pid = current->pid;
+   get_task_comm(vm->task_info.task_name, current);
+
+   if (current->group_leader->mm == current->mm) {
+   vm->task_info.tgid = current->group_leader->pid;
+   get_task_comm(vm->task_info.process_name, 
current->group_leader);
+   }
+   }
+

You can wrap this segment in a function like amdgpu_vm_set_task_info;
see the sketch after the diff below.



return 0;
  
  free_all_kdata:

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 08753e7..7ad19f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -46,6 +46,7 @@
  
  #include "ivsrcid/ivsrcid_vislands30.h"
  
+#include "amdgpu_vm.h"
  
  static void gmc_v8_0_set_gmc_funcs(struct amdgpu_device *adev);

  static void gmc_v8_0_set_irq_funcs(struct amdgpu_device *adev);
@@ -1449,8 +1450,13 @@ static int gmc_v8_0_process_interrupt(struct 
amdgpu_device *adev,
gmc_v8_0_set_fault_enable_default(adev, false);
  
  	if (printk_ratelimit()) {

-   dev_err(adev->dev, "GPU fault detected: %d 0x%08x\n",
-   entry->src_id, entry->src_data[0]);
+   struct amdgpu_task_info task_info = { 0 };
+
+		amdgpu_vm_task_info(adev, entry->pasid, &task_info);

You can rename this function to amdgpu_vm_get_task_info.

In general, it looks very good to me and does what I wanted to do before.

Thanks,
David Zhou

+
+   dev_err(adev->dev, "GPU fault detected: %d 0x%08x for process %s pid 
%d thread %s pid %d\n",
+   entry->src_id, entry->src_data[0], 
task_info.process_name,
+   task_info.tgid, task_info.task_name, task_info.pid);
dev_err(adev->dev, "  VM_CONTEXT1_PROTECTION_FAULT_ADDR   
0x%08X\n",
addr);
dev_err(adev->dev, "  VM_CONTEXT1_PROTECTION_FAULT_STATUS 
0x%08X\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 691a659..384a89c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -259,11 +259,16 @@ static int gmc_v9_0_process_interrupt(struct 
amdgpu_device *adev,
}
  
  	if (printk_ratelimit()) {

+   struct amdgpu_task_info task_info = { 0 };
+
+		amdgpu_vm_task_info(adev, entry->pasid, &task_info);
+
dev_err(adev->dev,
-   "[%s] VMC page fault (src_id:%u ring:%u vmid:%u 
pasid:%u)\n",
+   "[%s] VMC page fault (src_id:%u ring:%u vmid:%u pasid:%u, 
for process %s pid %d thread %s pid %d\n)\n",
entry->vmid_src ? "mmhub" : "gfxhub",
entry->src_id, entry->ring_id, entry->vmid,
-   entry->pasid);
+   entry->pasid, task_info.process_name, task_info.tgid,
+   task_info.task_name, task_info.pid);
dev_err(adev->dev, "  at page 0x%016llx from %d\n",
addr, entry->client_id);
if (!amdgpu_sriov_vf(adev))
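
A rough sketch of the two helpers suggested in the comments above (the
names follow the review; the bodies are just this patch's code moved
into functions):

	/* Fill in the task info for the vm once, at first CS time. */
	void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
	{
		if (vm->task_info.pid)
			return;

		vm->task_info.pid = current->pid;
		get_task_comm(vm->task_info.task_name, current);

		if (current->group_leader->mm == current->mm) {
			vm->task_info.tgid = current->group_leader->pid;
			get_task_comm(vm->task_info.process_name,
				      current->group_leader);
		}
	}

	/* Interrupt handlers then call amdgpu_vm_get_task_info(adev,
	 * entry->pasid, &task_info) to look the info up by pasid. */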




Re: [PATCH] drm/amdgpu: Add AMDGPU_GPU_PAGES_IN_CPU_PAGE define

2018-06-24 Thread zhoucm1

One question for you:

Did you consider the case where GPU_PAGE_SIZE > CPU_PAGE_SIZE? What
happens if that case is true?



Regards,

David Zhou
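
To make the question concrete, the macro's arithmetic under a couple of
illustrative configurations (AMDGPU_GPU_PAGE_SIZE is fixed at 4096):

	/* PAGE_SIZE = 4096  (e.g. x86)   -> AMDGPU_GPU_PAGES_IN_CPU_PAGE = 1
	 * PAGE_SIZE = 65536 (e.g. ppc64) -> AMDGPU_GPU_PAGES_IN_CPU_PAGE = 16
	 *
	 * If a GPU page were ever larger than a CPU page, the integer
	 * division would truncate to 0, and every later
	 * "t / AMDGPU_GPU_PAGES_IN_CPU_PAGE" would divide by zero --
	 * which is the case being asked about. */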


On 2018-06-23 03:50, Alex Deucher wrote:

On Fri, Jun 22, 2018 at 12:54 PM, Michel Dänzer  wrote:

From: Michel Dänzer 

To hopefully make the code dealing with GPU vs CPU pages a little
clearer.

Suggested-by: Christian König 
Signed-off-by: Michel Dänzer 

Reviewed-by: Alex Deucher 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 8 
  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h | 2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   | 8 
  3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index 17d6b9fb6d77..e3bf0e7bfad2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -234,7 +234,7 @@ int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t 
offset,
 }

 t = offset / AMDGPU_GPU_PAGE_SIZE;
-   p = t / (PAGE_SIZE / AMDGPU_GPU_PAGE_SIZE);
+   p = t / AMDGPU_GPU_PAGES_IN_CPU_PAGE;
 for (i = 0; i < pages; i++, p++) {
  #ifdef CONFIG_DRM_AMDGPU_GART_DEBUGFS
 adev->gart.pages[p] = NULL;
@@ -243,7 +243,7 @@ int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t 
offset,
 if (!adev->gart.ptr)
 continue;

-   for (j = 0; j < (PAGE_SIZE / AMDGPU_GPU_PAGE_SIZE); j++, t++) {
+   for (j = 0; j < AMDGPU_GPU_PAGES_IN_CPU_PAGE; j++, t++) {
 amdgpu_gmc_set_pte_pde(adev, adev->gart.ptr,
t, page_base, flags);
 page_base += AMDGPU_GPU_PAGE_SIZE;
@@ -282,7 +282,7 @@ int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t 
offset,

 for (i = 0; i < pages; i++) {
 page_base = dma_addr[i];
-   for (j = 0; j < (PAGE_SIZE / AMDGPU_GPU_PAGE_SIZE); j++, t++) {
+   for (j = 0; j < AMDGPU_GPU_PAGES_IN_CPU_PAGE; j++, t++) {
 amdgpu_gmc_set_pte_pde(adev, dst, t, page_base, flags);
 page_base += AMDGPU_GPU_PAGE_SIZE;
 }
@@ -319,7 +319,7 @@ int amdgpu_gart_bind(struct amdgpu_device *adev, uint64_t 
offset,

  #ifdef CONFIG_DRM_AMDGPU_GART_DEBUGFS
 t = offset / AMDGPU_GPU_PAGE_SIZE;
-   p = t / (PAGE_SIZE / AMDGPU_GPU_PAGE_SIZE);
+   p = t / AMDGPU_GPU_PAGES_IN_CPU_PAGE;
 for (i = 0; i < pages; i++, p++)
 adev->gart.pages[p] = pagelist ? pagelist[i] : NULL;
  #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
index 456295c00291..9f9e9dc87da1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
@@ -37,6 +37,8 @@ struct amdgpu_bo;
  #define AMDGPU_GPU_PAGE_SHIFT 12
  #define AMDGPU_GPU_PAGE_ALIGN(a) (((a) + AMDGPU_GPU_PAGE_MASK) & 
~AMDGPU_GPU_PAGE_MASK)

+#define AMDGPU_GPU_PAGES_IN_CPU_PAGE (PAGE_SIZE / AMDGPU_GPU_PAGE_SIZE)
+
  struct amdgpu_gart {
 u64 table_addr;
 struct amdgpu_bo*robj;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 75579200f4a6..0f6d287f54c3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1571,7 +1571,7 @@ static int amdgpu_vm_bo_split_mapping(struct 
amdgpu_device *adev,
 if (nodes) {
 addr = nodes->start << PAGE_SHIFT;
 max_entries = (nodes->size - pfn) *
-   (PAGE_SIZE / AMDGPU_GPU_PAGE_SIZE);
+   AMDGPU_GPU_PAGES_IN_CPU_PAGE;
 } else {
 addr = 0;
 max_entries = S64_MAX;
@@ -1582,7 +1582,7 @@ static int amdgpu_vm_bo_split_mapping(struct 
amdgpu_device *adev,

 max_entries = min(max_entries, 16ull * 1024ull);
 for (count = 1;
-count < max_entries / (PAGE_SIZE / 
AMDGPU_GPU_PAGE_SIZE);
+count < max_entries / AMDGPU_GPU_PAGES_IN_CPU_PAGE;
  ++count) {
 uint64_t idx = pfn + count;

@@ -1596,7 +1596,7 @@ static int amdgpu_vm_bo_split_mapping(struct 
amdgpu_device *adev,
 dma_addr = pages_addr;
 } else {
 addr = pages_addr[pfn];
-   max_entries = count * (PAGE_SIZE / 
AMDGPU_GPU_PAGE_SIZE);
+   max_entries = count * 
AMDGPU_GPU_PAGES_IN_CPU_PAGE;
 }

 } else if (flags & AMDGPU_PTE_VALID) {
@@ -1611,7 +1611,7 @@ static int amdgpu_vm_bo_split_mapping(struct 

Re: [PATCH] drm/amdgpu: update ib_start/size_alignment same as windows used

2018-06-15 Thread zhoucm1
Marek, can I get your RB or Ack on these patches, since this info is
reported to UMD?



Thanks,

David Zhou


On 2018-06-15 15:22, zhoucm1 wrote:



On 2018-06-15 15:16, Zhang, Jerry wrote:

-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On 
Behalf Of

Christian K?nig
Sent: Friday, June 15, 2018 15:09
To: Zhou, David(ChunMing) ; amd-
g...@lists.freedesktop.org
Cc: Olsak, Marek ; Ryan, Sean 
Subject: Re: [PATCH] drm/amdgpu: update ib_start/size_alignment same as
windows used

Am 15.06.2018 um 08:45 schrieb Chunming Zhou:

PAGE_SIZE for start_alignment is far more than the hw requirement.
Update to the experience values from the Windows side.

Change-Id: I08a7e72076386c32faf36ec4812b30e68dde23e5
Signed-off-by: Chunming Zhou 

Acked-by: Christian König 

Acked-by: Junwei Zhang 

BTW, any issue it fixes?
Yes, as discussed on the internal brahma list ("whether
ib_start_alignment is proper"); it fixes some PAL assert checks.


Regards,
David Zhou


Jerry


---
   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 28 
++-

-

   1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 392dd24e83f5..d041dddaad0c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -329,35 +329,35 @@ static int amdgpu_info_ioctl(struct 
drm_device *dev,

void *data, struct drm_file

   type = AMD_IP_BLOCK_TYPE_GFX;
   for (i = 0; i < adev->gfx.num_gfx_rings; i++)
   ring_mask |= ((adev->gfx.gfx_ring[i].ready ? 1 : 0)

<< i);

-    ib_start_alignment = AMDGPU_GPU_PAGE_SIZE;
-    ib_size_alignment = 8;
+    ib_start_alignment = 32;
+    ib_size_alignment = 32;
   break;
   case AMDGPU_HW_IP_COMPUTE:
   type = AMD_IP_BLOCK_TYPE_GFX;
   for (i = 0; i < adev->gfx.num_compute_rings; i++)
   ring_mask |= ((adev->gfx.compute_ring[i].ready ?

1 : 0) << i);

-    ib_start_alignment = AMDGPU_GPU_PAGE_SIZE;
-    ib_size_alignment = 8;
+    ib_start_alignment = 32;
+    ib_size_alignment = 32;
   break;
   case AMDGPU_HW_IP_DMA:
   type = AMD_IP_BLOCK_TYPE_SDMA;
   for (i = 0; i < adev->sdma.num_instances; i++)
	ring_mask |= ((adev->sdma.instance[i].ring.ready ? 1 : 0) << i);
-    ib_start_alignment = AMDGPU_GPU_PAGE_SIZE;
-    ib_size_alignment = 1;
+    ib_start_alignment = 256;
+    ib_size_alignment = 4;
   break;
   case AMDGPU_HW_IP_UVD:
   type = AMD_IP_BLOCK_TYPE_UVD;
   for (i = 0; i < adev->uvd.num_uvd_inst; i++)
   ring_mask |= ((adev->uvd.inst[i].ring.ready ? 1 
: 0)

<< i);

-    ib_start_alignment = AMDGPU_GPU_PAGE_SIZE;
-    ib_size_alignment = 16;
+    ib_start_alignment = 64;
+    ib_size_alignment = 64;
   break;
   case AMDGPU_HW_IP_VCE:
   type = AMD_IP_BLOCK_TYPE_VCE;
   for (i = 0; i < adev->vce.num_rings; i++)
   ring_mask |= ((adev->vce.ring[i].ready ? 1 : 0) <<

i);

-    ib_start_alignment = AMDGPU_GPU_PAGE_SIZE;
+    ib_start_alignment = 4;
   ib_size_alignment = 1;
   break;
   case AMDGPU_HW_IP_UVD_ENC:
@@ -367,26 +367,26 @@ static int amdgpu_info_ioctl(struct 
drm_device *dev,

void *data, struct drm_file

   ring_mask |=
((adev->uvd.inst[i].ring_enc[j].ready ? 1 :

0) <<

   (j + i * adev->uvd.num_enc_rings));
-    ib_start_alignment = AMDGPU_GPU_PAGE_SIZE;
-    ib_size_alignment = 1;
+    ib_start_alignment = 64;
+    ib_size_alignment = 64;
   break;
   case AMDGPU_HW_IP_VCN_DEC:
   type = AMD_IP_BLOCK_TYPE_VCN;
   ring_mask = adev->vcn.ring_dec.ready ? 1 : 0;
-    ib_start_alignment = AMDGPU_GPU_PAGE_SIZE;
+    ib_start_alignment = 16;
   ib_size_alignment = 16;
   break;
   case AMDGPU_HW_IP_VCN_ENC:
   type = AMD_IP_BLOCK_TYPE_VCN;
   for (i = 0; i < adev->vcn.num_enc_rings; i++)
   ring_mask |= ((adev->vcn.ring_enc[i].ready ? 1 : 0)

<< i);

-    ib_start_alignment = AMDGPU_GPU_PAGE_SIZE;
+    ib_start_alignment = 64;
   ib_size_alignment = 1;
   break;
   case AMDGPU_HW_IP_VCN_JPEG:
   type = AMD_IP_BLOCK_TYPE_VCN;
   ring_mask = adev->vcn.ring_jpeg.ready ? 1 : 0;
-    ib_start_alignment = AMDGPU_GPU_PAGE_SIZE;
+    ib_start_alignment = 16;
   ib_
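
For context, a small sketch of how a UMD consumes these reported values
(assumed usage, not from this patch; units as reported by the kernel):

	struct drm_amdgpu_info_hw_ip info;

	/* info filled in via amdgpu_query_hw_ip_info(); error handling
	 * omitted */
	uint64_t ib_va   = ALIGN(ib_base_va, info.ib_start_alignment);
	uint32_t ib_size = ALIGN(raw_ib_size, info.ib_size_alignment);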

Re: [PATCH] drm/amdgpu: update ib_start/size_alignment same as windows used

2018-06-15 Thread zhoucm1



On 2018-06-15 15:16, Zhang, Jerry wrote:

-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of
Christian K?nig
Sent: Friday, June 15, 2018 15:09
To: Zhou, David(ChunMing) ; amd-
g...@lists.freedesktop.org
Cc: Olsak, Marek ; Ryan, Sean 
Subject: Re: [PATCH] drm/amdgpu: update ib_start/size_alignment same as
windows used

Am 15.06.2018 um 08:45 schrieb Chunming Zhou:

PAGE_SIZE for start_alignment is far more than the hw requirement.
Update to the experience values from the Windows side.

Change-Id: I08a7e72076386c32faf36ec4812b30e68dde23e5
Signed-off-by: Chunming Zhou 

Acked-by: Christian König 

Acked-by: Junwei Zhang 

BTW, any issue it fixes?
Yes, as discussed on the internal brahma list ("whether
ib_start_alignment is proper"); it fixes some PAL assert checks.


Regards,
David Zhou


Jerry


---
   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 28 ++-

-

   1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 392dd24e83f5..d041dddaad0c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -329,35 +329,35 @@ static int amdgpu_info_ioctl(struct drm_device *dev,

void *data, struct drm_file

type = AMD_IP_BLOCK_TYPE_GFX;
for (i = 0; i < adev->gfx.num_gfx_rings; i++)
ring_mask |= ((adev->gfx.gfx_ring[i].ready ? 1 
: 0)

<< i);

-   ib_start_alignment = AMDGPU_GPU_PAGE_SIZE;
-   ib_size_alignment = 8;
+   ib_start_alignment = 32;
+   ib_size_alignment = 32;
break;
case AMDGPU_HW_IP_COMPUTE:
type = AMD_IP_BLOCK_TYPE_GFX;
for (i = 0; i < adev->gfx.num_compute_rings; i++)
ring_mask |= ((adev->gfx.compute_ring[i].ready ?

1 : 0) << i);

-   ib_start_alignment = AMDGPU_GPU_PAGE_SIZE;
-   ib_size_alignment = 8;
+   ib_start_alignment = 32;
+   ib_size_alignment = 32;
break;
case AMDGPU_HW_IP_DMA:
type = AMD_IP_BLOCK_TYPE_SDMA;
for (i = 0; i < adev->sdma.num_instances; i++)
				ring_mask |= ((adev->sdma.instance[i].ring.ready ? 1 : 0) << i);
-   ib_start_alignment = AMDGPU_GPU_PAGE_SIZE;
-   ib_size_alignment = 1;
+   ib_start_alignment = 256;
+   ib_size_alignment = 4;
break;
case AMDGPU_HW_IP_UVD:
type = AMD_IP_BLOCK_TYPE_UVD;
for (i = 0; i < adev->uvd.num_uvd_inst; i++)
ring_mask |= ((adev->uvd.inst[i].ring.ready ? 1 
: 0)

<< i);

-   ib_start_alignment = AMDGPU_GPU_PAGE_SIZE;
-   ib_size_alignment = 16;
+   ib_start_alignment = 64;
+   ib_size_alignment = 64;
break;
case AMDGPU_HW_IP_VCE:
type = AMD_IP_BLOCK_TYPE_VCE;
for (i = 0; i < adev->vce.num_rings; i++)
ring_mask |= ((adev->vce.ring[i].ready ? 1 : 0) 
<<

i);

-   ib_start_alignment = AMDGPU_GPU_PAGE_SIZE;
+   ib_start_alignment = 4;
ib_size_alignment = 1;
break;
case AMDGPU_HW_IP_UVD_ENC:
@@ -367,26 +367,26 @@ static int amdgpu_info_ioctl(struct drm_device *dev,

void *data, struct drm_file

ring_mask |=
((adev->uvd.inst[i].ring_enc[j].ready ? 
1 :

0) <<

(j + i * adev->uvd.num_enc_rings));
-   ib_start_alignment = AMDGPU_GPU_PAGE_SIZE;
-   ib_size_alignment = 1;
+   ib_start_alignment = 64;
+   ib_size_alignment = 64;
break;
case AMDGPU_HW_IP_VCN_DEC:
type = AMD_IP_BLOCK_TYPE_VCN;
ring_mask = adev->vcn.ring_dec.ready ? 1 : 0;
-   ib_start_alignment = AMDGPU_GPU_PAGE_SIZE;
+   ib_start_alignment = 16;
ib_size_alignment = 16;
break;
case AMDGPU_HW_IP_VCN_ENC:
type = AMD_IP_BLOCK_TYPE_VCN;
for (i = 0; i < adev->vcn.num_enc_rings; i++)
ring_mask |= ((adev->vcn.ring_enc[i].ready ? 1 
: 0)

<< i);

-   

Re: [PATCH] drm/amdgpu: remove unused parameter for va update

2018-06-12 Thread zhoucm1



On 2018-06-12 14:01, Junwei Zhang wrote:

The validation list is no longer needed.

Signed-off-by: Junwei Zhang 

Reviewed-by: Chunming Zhou 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 5fb156a..eff716d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -510,7 +510,6 @@ int amdgpu_gem_metadata_ioctl(struct drm_device *dev, void 
*data,
   * @adev: amdgpu_device pointer
   * @vm: vm to update
   * @bo_va: bo_va to update
- * @list: validation list
   * @operation: map, unmap or clear
   *
   * Update the bo_va directly after setting its address. Errors are not
@@ -519,7 +518,6 @@ int amdgpu_gem_metadata_ioctl(struct drm_device *dev, void 
*data,
  static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
struct amdgpu_vm *vm,
struct amdgpu_bo_va *bo_va,
-   struct list_head *list,
uint32_t operation)
  {
int r;
@@ -673,7 +671,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
break;
}
if (!r && !(args->flags & AMDGPU_VM_DELAY_UPDATE) && !amdgpu_vm_debug)
-		amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va, &list,
+		amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
					args->operation);
  
  error_backoff:



