RE: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

2023-02-02 Thread Koenig, Christian
Big NAK to this! This warning is not related in any way to the hw state.

It's simply illegal to free up memory during suspend.

Regards,
Christian.


From: Xiao, Jack 
Sent: Thursday, 2 February 2023 10:54
To: amd-gfx@lists.freedesktop.org ; Deucher, 
Alexander ; Koenig, Christian 

Cc: Xiao, Jack 
Subject: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Reduce warnings; only warn when DMA is unavailable.

Signed-off-by: Jack Xiao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..e3e3764ea697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 
*gpu_addr,
 if (*bo == NULL)
 return;

-   WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
+	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
+		!amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);

 if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
 if (cpu_addr)
--
2.37.3



RE: [PATCH] drm/amdgpu: Make sure ttm delayed work finished

2022-04-13 Thread Koenig, Christian
We don't need that.

TTM only reschedules when the BOs are still busy.

And if the BOs are still busy when you unload the driver we have much bigger 
problems than this TTM worker :)
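
For reference, a rough sketch of the behaviour being referred to, modelled on
the TTM code of that era (illustrative, not the exact upstream source):

static void ttm_bo_delayed_workqueue(struct work_struct *work)
{
	struct ttm_bo_device *bdev =
		container_of(work, struct ttm_bo_device, wq.work);

	/* ttm_bo_delayed_delete() returns true once the ddestroy list is empty,
	 * so the worker only re-arms itself while busy BOs are still pending. */
	if (!ttm_bo_delayed_delete(bdev, false))
		schedule_delayed_work(&bdev->wq,
				      ((HZ / 100) < 1) ? 1 : HZ / 100);
}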

Regards,
Christian


From: Pan, Xinhui 
Sent: Wednesday, 13 April 2022 05:08
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Koenig, Christian 
; Pan, Xinhui 
Subject: [PATCH] drm/amdgpu: Make sure ttm delayed work finished

ttm_device_delayed_workqueue would reschedule itself if there are pending
BOs to be destroyed. So just one flush + cancel_sync is not enough. We
still see the "lru_list not empty" warning.

Fix it by waiting for all BOs to be destroyed.

Signed-off-by: xinhui pan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6f47726f1765..e249923eb9a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3957,11 +3957,17 @@ static void amdgpu_device_unmap_mmio(struct 
amdgpu_device *adev)
  */
 void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 {
+   int pending = 1;
+
 	dev_info(adev->dev, "amdgpu: finishing device.\n");
 	flush_delayed_work(&adev->delayed_init_work);
-	if (adev->mman.initialized) {
+	while (adev->mman.initialized && pending) {
 		flush_delayed_work(&adev->mman.bdev.wq);
-		ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
+		pending = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
+		if (pending) {
+			ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, true);
+			msleep(((HZ / 100) < 1) ? 1 : HZ / 100);
+		}
 }
 adev->shutdown = true;

--
2.25.1



RE: [PATCH] drm/amdgpu: fix some repeated includings

2021-09-30 Thread Koenig, Christian
Seconded, there is one include for each hardware version.

At least offhand I don't see a duplicate.

From: Simon Ser 
Sent: Thursday, 30 September 2021 12:17
To: Guo Zhengkui 
Cc: Deucher, Alexander ; Koenig, Christian 
; Pan, Xinhui ; David Airlie 
; Daniel Vetter ; Chen, Guchun 
; Zhou, Peng Ju ; Zhang, Bokun 
; Gao, Likun ; 
amd-gfx@lists.freedesktop.org ; 
dri-de...@lists.freedesktop.org ; 
linux-ker...@vger.kernel.org ; ker...@vivo.com 

Subject: Re: [PATCH] drm/amdgpu: fix some repeated includings

One include is v2, the other is v3, or am I missing something?


RE: [PATCH 2/2] drm/amdgpu: Use mod_delayed_work in JPEG/UVD/VCE/VCN ring_end_use hooks

2021-08-11 Thread Koenig, Christian
Hi James,

Evan seems to have understood how this all works together.

See, while any begin/end use critical section is active, the work should not be 
active.

When you handle only one ring you can just call cancel in begin use and 
schedule in end use. But when you have more than one ring you need a lock or 
counter to prevent concurrent work items from being started.
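
A minimal sketch of such a counter, modelled on the existing JPEG code; the
"foo" block and FOO_IDLE_TIMEOUT are placeholders, not real driver symbols:

void foo_ring_begin_use(struct amdgpu_ring *ring)
{
	atomic_inc(&ring->adev->foo.total_submission_cnt);
	cancel_delayed_work_sync(&ring->adev->foo.idle_work);
	/* ... power/clock the block up ... */
}

void foo_ring_end_use(struct amdgpu_ring *ring)
{
	atomic_dec(&ring->adev->foo.total_submission_cnt);
	schedule_delayed_work(&ring->adev->foo.idle_work, FOO_IDLE_TIMEOUT);
}

static void foo_idle_work_handler(struct work_struct *work)
{
	struct amdgpu_device *adev =
		container_of(work, struct amdgpu_device, foo.idle_work.work);

	/* Some other ring is still between begin_use and end_use: keep power on. */
	if (atomic_read(&adev->foo.total_submission_cnt))
		return;
	/* ... power the block down ... */
}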

Michel's idea to use mod_delayed_work is a bad one because it assumes that 
the delayed work is still running.

Something similar applies to the first patch I think, so when this makes a 
difference it is actually a bug.

Regards,
Christian.

From: Quan, Evan 
Sent: Thursday, 12 August 2021 04:42
To: Koenig, Christian ; Michel Dänzer 
; Deucher, Alexander 
Cc: Liu, Leo ; Zhu, James ; 
amd-gfx@lists.freedesktop.org ; 
dri-de...@lists.freedesktop.org 
Subject: RE: [PATCH 2/2] drm/amdgpu: Use mod_delayed_work in JPEG/UVD/VCE/VCN 
ring_end_use hooks





Different from the 1st patch (for amdgpu_gfx_off_ctrl) of the series, 
“cancel_delayed_work_sync(&adev->uvd.idle_work)” will be called in e.g. 
amdgpu_uvd_ring_begin_use(). In that case, does it make any difference from the 
previous implementation, ”schedule_delayed_work”?

Suppose the sequence is as below:

  *   Ring begin use
  *   Ring end use -->  mod_delayed_work() : queue a new delayed work, right?
  *   Ring begin use (within 1s) --> cancel_delayed_work_sync() will cancel the 
work submitted above, right?
  *   Ring end use  --> mod_delayed_work(): queue another new scheduled work, 
same as previous “schedule_delayed_work”?



BR

Evan

From: amd-gfx  On Behalf Of Koenig, 
Christian
Sent: Thursday, August 12, 2021 5:34 AM
To: Michel Dänzer ; Deucher, Alexander 

Cc: Liu, Leo ; Zhu, James ; 
amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org
Subject: AW: [PATCH 2/2] drm/amdgpu: Use mod_delayed_work in JPEG/UVD/VCE/VCN 
ring_end_use hooks



NAK to at least this patch.



Since activating power management while submitting work is problematic, 
cancel_delayed_work() must have been called during begin use, or otherwise we 
have a serious coding problem in the first place.



So this change shouldn't make a difference and I suggest to really stick with 
schedule_delayed_work().



Maybe add a comment how this works?



Need to take a closer look at the first patch when I'm back from vacation, but 
it could be that this applies there as well.



Regards,

Christian.





From: Michel Dänzer <mic...@daenzer.net>
Sent: Wednesday, 11 August 2021 18:52
To: Deucher, Alexander <alexander.deuc...@amd.com>; Koenig, Christian <christian.koe...@amd.com>
Cc: Liu, Leo <leo@amd.com>; Zhu, James <james@amd.com>; amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org
Subject: [PATCH 2/2] drm/amdgpu: Use mod_delayed_work in JPEG/UVD/VCE/VCN 
ring_end_use hooks



From: Michel Dänzer <mdaen...@redhat.com>

In contrast to schedule_delayed_work, this pushes back the work if it
was already scheduled before. Specific behaviour change:

Before:

The scheduled work ran ~1 second after the first time ring_end_use was
called, even if the ring was used again during that second.

After:

The scheduled work runs ~1 second after the last time ring_end_use is
called.

Inspired by the corresponding change in amdgpu_gfx_off_ctrl. While I
haven't run into specific issues in this case, the new behaviour makes
more sense to me.

Signed-off-by: Michel Dänzer <mdaen...@redhat.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c| 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
index 8996cb4ed57a..2c0040153f6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
@@ -110,7 +110,7 @@ void amdgpu_jpeg_ring_begin_use(struct amdgpu_ring *ring)
 void amdgpu_jpeg_ring_end_use(struct amdgpu_ring *ring)
 {
 	atomic_dec(&ring->adev->jpeg.total_submission_cnt);
-	schedule_delayed_work(&ring->adev->jpeg.idle_work, JPEG_IDLE_TIMEOUT);
+	mod_delayed_work(system_wq, &ring->adev->jpeg.idle_work, JPEG_IDLE_TIMEOUT);
 }

 int amdgpu_jpeg_dec_ring_test_ring(struct amdgpu_ring *ring)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index 0f576f294d8a..b6b1d7eeb8e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -1283,7 +1283,7 @@ void amdg

RE: [PATCH 2/2] drm/amdgpu: Use mod_delayed_work in JPEG/UVD/VCE/VCN ring_end_use hooks

2021-08-11 Thread Koenig, Christian
NAK to at least this patch.

Since activating power management while submitting work is problematic, 
cancel_delayed_work() must have been called during begin use, or otherwise we 
have a serious coding problem in the first place.

So this change shouldn't make a difference and I suggest to really stick with 
schedule_delayed_work().
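
To spell that out, an illustrative timeline, assuming begin_use really does
cancel the work as described:

/*
 *   begin_use()  ->  cancel_delayed_work_sync(&idle_work)   no work pending
 *   end_use()    ->  schedule_delayed_work(&idle_work, 1s)  arms the timer
 *   begin_use()  ->  cancel_delayed_work_sync(&idle_work)   disarms it again
 *   end_use()    ->  schedule_delayed_work(&idle_work, 1s)  re-arms from "now"
 *
 * mod_delayed_work() only behaves differently when the work is still queued
 * at end_use() time, which the cancel in begin_use() is supposed to rule out.
 */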

Maybe add a comment how this works?

Need to take a closer look at the first patch when I'm back from vacation, but 
it could be that this applies there as well.

Regards,
Christian.


From: Michel Dänzer 
Sent: Wednesday, 11 August 2021 18:52
To: Deucher, Alexander ; Koenig, Christian 

Cc: Liu, Leo ; Zhu, James ; 
amd-gfx@lists.freedesktop.org ; 
dri-de...@lists.freedesktop.org 
Subject: [PATCH 2/2] drm/amdgpu: Use mod_delayed_work in JPEG/UVD/VCE/VCN 
ring_end_use hooks

From: Michel Dänzer 

In contrast to schedule_delayed_work, this pushes back the work if it
was already scheduled before. Specific behaviour change:

Before:

The scheduled work ran ~1 second after the first time ring_end_use was
called, even if the ring was used again during that second.

After:

The scheduled work runs ~1 second after the last time ring_end_use is
called.

Inspired by the corresponding change in amdgpu_gfx_off_ctrl. While I
haven't run into specific issues in this case, the new behaviour makes
more sense to me.

Signed-off-by: Michel Dänzer 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c| 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
index 8996cb4ed57a..2c0040153f6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
@@ -110,7 +110,7 @@ void amdgpu_jpeg_ring_begin_use(struct amdgpu_ring *ring)
 void amdgpu_jpeg_ring_end_use(struct amdgpu_ring *ring)
 {
 	atomic_dec(&ring->adev->jpeg.total_submission_cnt);
-	schedule_delayed_work(&ring->adev->jpeg.idle_work, JPEG_IDLE_TIMEOUT);
+	mod_delayed_work(system_wq, &ring->adev->jpeg.idle_work, JPEG_IDLE_TIMEOUT);
 }

 int amdgpu_jpeg_dec_ring_test_ring(struct amdgpu_ring *ring)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index 0f576f294d8a..b6b1d7eeb8e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -1283,7 +1283,7 @@ void amdgpu_uvd_ring_begin_use(struct amdgpu_ring *ring)
 void amdgpu_uvd_ring_end_use(struct amdgpu_ring *ring)
 {
 if (!amdgpu_sriov_vf(ring->adev))
-		schedule_delayed_work(&ring->adev->uvd.idle_work, UVD_IDLE_TIMEOUT);
+		mod_delayed_work(system_wq, &ring->adev->uvd.idle_work, UVD_IDLE_TIMEOUT);
 }

 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 1ae7f824adc7..2253c18a6688 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -401,7 +401,7 @@ void amdgpu_vce_ring_begin_use(struct amdgpu_ring *ring)
 void amdgpu_vce_ring_end_use(struct amdgpu_ring *ring)
 {
 if (!amdgpu_sriov_vf(ring->adev))
-		schedule_delayed_work(&ring->adev->vce.idle_work, VCE_IDLE_TIMEOUT);
+		mod_delayed_work(system_wq, &ring->adev->vce.idle_work, VCE_IDLE_TIMEOUT);
 }

 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
index 284bb42d6c86..d5937ab5ac80 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -1874,7 +1874,7 @@ void vcn_v1_0_set_pg_for_begin_use(struct amdgpu_ring 
*ring, bool set_clocks)

 void vcn_v1_0_ring_end_use(struct amdgpu_ring *ring)
 {
-	schedule_delayed_work(&ring->adev->vcn.idle_work, VCN_IDLE_TIMEOUT);
+	mod_delayed_work(system_wq, &ring->adev->vcn.idle_work, VCN_IDLE_TIMEOUT);
 mutex_unlock(>adev->vcn.vcn1_jpeg1_workaround);
 }

--
2.32.0



RE: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages

2021-03-18 Thread Koenig, Christian
Reviewed-by: Christian König 

From: Daniel Gomez 
Sent: Thursday, 18 March 2021 09:32
Cc: dag...@gmail.com ; Daniel Gomez ; 
Deucher, Alexander ; Koenig, Christian 
; David Airlie ; Daniel Vetter 
; amd-gfx@lists.freedesktop.org 
; dri-de...@lists.freedesktop.org 
; linux-ker...@vger.kernel.org 

Subject: [PATCH] drm/radeon/ttm: Fix memory leak userptr pages

If userptr pages have been pinned but never bound,
they never get unpinned, leaking the pages.

Signed-off-by: Daniel Gomez 
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index e8c66d10478f..bbcc6264d48f 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -485,13 +485,14 @@ static void radeon_ttm_backend_unbind(struct 
ttm_bo_device *bdev, struct ttm_tt
 struct radeon_ttm_tt *gtt = (void *)ttm;
 struct radeon_device *rdev = radeon_get_rdev(bdev);

+   if (gtt->userptr)
+   radeon_ttm_tt_unpin_userptr(bdev, ttm);
+
 if (!gtt->bound)
 return;

 radeon_gart_unbind(rdev, gtt->offset, ttm->num_pages);

-   if (gtt->userptr)
-   radeon_ttm_tt_unpin_userptr(bdev, ttm);
 gtt->bound = false;
 }

--
2.30.2



RE: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

2021-03-18 Thread Koenig, Christian
Exactly that's what you don't seem to understand.

The GPU reset doesn't complete the fences we wait for. It only completes the 
hardware fences as part of the reset.

So waiting for a fence while holding the reset lock is illegal and needs to be 
avoided.

Lockdep also complains about this when it is used correctly. The only reason it 
doesn't complain here is because you use an atomic+wait_event instead of a 
locking primitive.
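
A small illustration of why lockdep stays silent here (reset_readers and
reset_wq are made-up names for the example):

/* What lockdep can analyse: a real lock held across the fence wait. */
down_read(&adev->reset_sem);
dma_fence_wait(fence, false);	/* dependency reset_sem -> fence gets recorded */
up_read(&adev->reset_sem);

/* What it cannot analyse: an open-coded atomic + wait_event "read lock". */
atomic_inc(&adev->reset_readers);
dma_fence_wait(fence, false);	/* no lock is held, so nothing is recorded */
atomic_dec(&adev->reset_readers);
wake_up(&adev->reset_wq);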

Regards,
Christian.


From: Li, Dennis 
Sent: Thursday, 18 March 2021 09:28
To: Koenig, Christian ; amd-gfx@lists.freedesktop.org 
; Deucher, Alexander 
; Kuehling, Felix ; Zhang, 
Hawking 
Subject: RE: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

>>> Those two steps need to be exchanged or otherwise it is possible that new 
>>> delayed work items etc are started before the lock is taken.
What about adding a check for adev->in_gpu_reset in the work item? If we exchange the 
two steps, it may introduce a deadlock. For example, the user thread holds 
the read lock and waits for the fence; if the recovery thread tries to take the write 
lock and only then completes the fences, the recovery thread will always be 
blocked.
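
The scenario described above as a rough timeline (purely illustrative):

/*
 *   user thread                         recovery thread
 *   -----------                         ---------------
 *   down_read(&adev->reset_sem)
 *   dma_fence_wait(fence)  ...blocks    down_write(&adev->reset_sem)  ...blocks
 *                                        (the fences would only be force-completed
 *                                         in the step after taking the write lock,
 *                                         so neither side can make progress)
 */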

Best Regards
Dennis Li
-Original Message-
From: Koenig, Christian 
Sent: Thursday, March 18, 2021 3:54 PM
To: Li, Dennis ; amd-gfx@lists.freedesktop.org; Deucher, 
Alexander ; Kuehling, Felix 
; Zhang, Hawking 
Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

Am 18.03.21 um 08:23 schrieb Dennis Li:
> We have defined two variables, in_gpu_reset and reset_sem, in the adev object. The 
> atomic variable in_gpu_reset is used to avoid recovery thread reentry 
> and to make lower-level functions return earlier when recovery starts, but it 
> couldn't block the recovery thread when it accesses hardware. The r/w semaphore 
> reset_sem is used to solve these synchronization issues between the recovery 
> thread and other threads.
>
> The original solution locked registers' access in lower functions, which will 
> introduce following issues:
>
> 1) many lower-level functions are used in both the recovery thread and others. Firstly 
> we must harvest these functions, and it is easy to miss some. Secondly these 
> functions need to select which lock (read lock or write lock) will be used, 
> according to the thread they are running in. If the thread context isn't 
> considered, the added lock will easily introduce a deadlock. Besides that, most 
> of the time developers easily forget to add locks for new functions.
>
> 2) performance drop. More lower functions are more frequently called.
>
> 3) easily introduces false positive lockdep complaints, because the write lock has 
> a big range in the recovery thread, while the low level functions holding the read lock may 
> be protected by other locks in other threads.
>
> Therefore the new solution will try to add lock protection for ioctls of kfd. 
> Its goal is that there are no threads except for recovery thread or its 
> children (for xgmi) to access hardware when doing GPU reset and resume. So 
> refine recovery thread as the following:
>
> Step 0: atomic_cmpxchg(&adev->in_gpu_reset, 0, 1)
> 1). if failed, it means system had a recovery thread running, current 
> thread exit directly;
> 2). if success, enter recovery thread;
>
> Step 1: cancel all delay works, stop drm schedule, complete all unreceived 
> fences and so on. It try to stop or pause other threads.
>
> Step 2: call down_write(&adev->reset_sem) to hold write lock, which will 
> block recovery thread until other threads release read locks.

Those two steps need to be exchanged or otherwise it is possible that new 
delayed work items etc are started before the lock is taken.

Just to make it clear until this is fixed the whole patch set is a NAK.

Regards,
Christian.

>
> Step 3: normally, there is only recovery threads running to access hardware, 
> it is safe to do gpu reset now.
>
> Step 4: do post gpu reset, such as call all ips' resume functions;
>
> Step 5: atomic set adev->in_gpu_reset as 0, wake up other threads and release 
> write lock. Recovery thread exit normally.
>
> Other threads call amdgpu_read_lock to synchronize with the recovery thread. 
> If a thread finds that in_gpu_reset is 1, it should release the read lock if it 
> holds one, and then block itself to wait for the recovery-finished event. If the 
> thread successfully holds the read lock and in_gpu_reset is 0, it continues. It 
> will exit normally or be stopped by the recovery thread in step 1.
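
A sketch of what the described amdgpu_read_lock() helper could look like (the
wait queue name below is illustrative, not taken from the patches):

static int amdgpu_read_lock(struct amdgpu_device *adev)
{
	while (true) {
		if (!atomic_read(&adev->in_gpu_reset) &&
		    down_read_trylock(&adev->reset_sem))
			return 0;	/* no recovery running, read side taken */

		/* A recovery is in flight: block until it signals completion. */
		wait_event(adev->recovery_done_wq,
			   !atomic_read(&adev->in_gpu_reset));
	}
}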
>
> Dennis Li (4):
>drm/amdgpu: remove reset lock from low level functions
>drm/amdgpu: refine the GPU recovery sequence
>drm/amdgpu: instead of using down/up_read directly
>drm/amdkfd: add reset lock protection for kfd entry functions
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu

RE: [PATCH 2/2] drm/amdgpu: clean-up unused variable

2021-02-13 Thread Koenig, Christian
Reviewed-by: Christian König 

From: Sharma, Shashank 
Sent: Saturday, 13 February 2021 17:37
To: amd-gfx@lists.freedesktop.org 
Cc: Sharma, Shashank ; Koenig, Christian 
; Deucher, Alexander 
Subject: [PATCH 2/2] drm/amdgpu: clean-up unused variable

Variable 'bp' seems to be unused residue from previous
logic, and is not required anymore.

Cc: Koenig Christian 
Cc: Deucher Alexander 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index db62f3c9d6a5..d3e4d6a06bbd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -435,17 +435,9 @@ amdgpu_dma_buf_create_obj(struct drm_device *dev, struct 
dma_buf *dma_buf)
 struct dma_resv *resv = dma_buf->resv;
 struct amdgpu_device *adev = drm_to_adev(dev);
 struct amdgpu_bo *bo;
-   struct amdgpu_bo_param bp;
 struct drm_gem_object *gobj;
 int ret;

-	memset(&bp, 0, sizeof(bp));
-   bp.size = dma_buf->size;
-   bp.byte_align = PAGE_SIZE;
-   bp.domain = AMDGPU_GEM_DOMAIN_CPU;
-   bp.flags = 0;
-   bp.type = ttm_bo_type_sg;
-   bp.resv = resv;
 dma_resv_lock(resv, NULL);
 ret = amdgpu_gem_object_create(adev, dma_buf->size, PAGE_SIZE,
 AMDGPU_GEM_DOMAIN_CPU, AMDGPU_GEM_CREATE_CPU_GTT_USWC,
--
2.25.1



RE: [PATCH 1/2] drm/amdgpu: Set GTT_USWC flag to enable freesync

2021-02-13 Thread Koenig, Christian
Well that's unfortunately a NAK.

We currently can't communicate via DMA-buf whether USWC is possible or not.

For the short term we could add something like a special handling for A+A 
configurations here. E.g. check if the imported BO is also an amdgpu BO and set 
the flag if it is also set on the exported BO.
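
A sketch of that A+A special case in amdgpu_dma_buf_create_obj(); the exporter
check below is one plausible way to do it, not necessarily the final form:

	uint64_t flags = 0;

	/* Only trust USWC when the exporter is amdgpu as well and already
	 * created its BO with the USWC flag set. */
	if (dma_buf->ops == &amdgpu_dmabuf_ops) {
		struct drm_gem_object *obj = dma_buf->priv;
		struct amdgpu_bo *other = gem_to_amdgpu_bo(obj);

		if (other->flags & AMDGPU_GEM_CREATE_CPU_GTT_USWC)
			flags |= AMDGPU_GEM_CREATE_CPU_GTT_USWC;
	}

	ret = amdgpu_gem_object_create(adev, dma_buf->size, PAGE_SIZE,
				       AMDGPU_GEM_DOMAIN_CPU, flags,
				       ttm_bo_type_sg, resv, &gobj);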

Regards,
Christian.


From: Sharma, Shashank 
Sent: Saturday, 13 February 2021 17:37
To: amd-gfx@lists.freedesktop.org 
Cc: Sharma, Shashank ; Koenig, Christian 
; Deucher, Alexander 
Subject: [PATCH 1/2] drm/amdgpu: Set GTT_USWC flag to enable freesync

This patch sets 'AMDGPU_GEM_CREATE_CPU_GTT_USWC' as input
parameter flag, during object creation of an imported DMA
buffer.

In absence of this flag:
1. Function amdgpu_display_supported_domains() doesn't add
   AMDGPU_GEM_DOMAIN_GTT as supported domain.
2. Due to which, Function amdgpu_display_user_framebuffer_create()
   refuses to create framebuffer for imported DMA buffers.
3. Due to which, AddFB() IOCTL fails.
4. Due to which, amdgpu_present_check_flip() check fails in DDX
5. Due to which DDX driver doesn't allow flips (goes to blitting)
6. Due to which setting Freesync/VRR property fails for PRIME buffers.

So, this patch finally enables Freesync with PRIME buffer offloading.
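
For context, the check this chain starts from is roughly the following
(simplified from amdgpu_display_supported_domains(); the exact per-ASIC
conditions differ):

	uint32_t domains = AMDGPU_GEM_DOMAIN_VRAM;

	/* On APUs, scanout from GTT is only considered when the BO is
	 * write-combined, hence the dependency on the USWC flag. */
	if ((adev->flags & AMD_IS_APU) &&
	    (flags & AMDGPU_GEM_CREATE_CPU_GTT_USWC))
		domains |= AMDGPU_GEM_DOMAIN_GTT;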

Cc: Koenig Christian 
Cc: Deucher Alexander 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index 47e0b48dc26f..db62f3c9d6a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -448,8 +448,8 @@ amdgpu_dma_buf_create_obj(struct drm_device *dev, struct 
dma_buf *dma_buf)
 bp.resv = resv;
 dma_resv_lock(resv, NULL);
 ret = amdgpu_gem_object_create(adev, dma_buf->size, PAGE_SIZE,
-				       AMDGPU_GEM_DOMAIN_CPU,
-				       0, ttm_bo_type_sg, resv, &gobj);
+				       AMDGPU_GEM_DOMAIN_CPU, AMDGPU_GEM_CREATE_CPU_GTT_USWC,
+				       ttm_bo_type_sg, resv, &gobj);
 if (ret)
 goto error;

--
2.25.1



Re: first bad commit: [fc8c70526bd30733ea8667adb8b8ffebea30a8ed] drm/radeon: Prefer lower feedback dividers

2020-09-12 Thread Koenig, Christian
Yes, that's a known issue. Patch for the revert is already underway.

Christian.

On 12.09.2020 10:43, Borislav Petkov wrote:
Hi,

this patch breaks X on my box - it is fully responsive and I can log in
into it from another machine but both monitors are black and show this:

"The current input timing is not supported by the monitor display. Please

change your input timing to 1920x1200@60Hz or any other monitor

listed timing as per the monitor specifications."

Reverting it fixes the issue.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH 1/2] drm/radeon: Don't use WC for VRAM if !RADEON_GEM_GTT_WC

2020-09-08 Thread Koenig, Christian


On 09.09.2020 07:29, Huacai Chen wrote:
Though RADEON_GEM_GTT_WC is initially used for GTT, this flag is
bound to drm_arch_can_wc_memory(), and if the arch doesn't support WC, then
VRAM should not use WC either.

NAK. Whether system memory supports WC is completely independent from the VRAM BAR.

Christian.


Signed-off-by: Huacai Chen 
---
 drivers/gpu/drm/radeon/radeon_object.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_object.c 
b/drivers/gpu/drm/radeon/radeon_object.c
index f3dee01..07b82d9 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -117,10 +117,16 @@ void radeon_ttm_placement_from_domain(struct radeon_bo 
*rbo, u32 domain)
  TTM_PL_FLAG_VRAM;
 }

-   rbo->placements[c].fpfn = 0;
-   rbo->placements[c++].flags = TTM_PL_FLAG_WC |
-TTM_PL_FLAG_UNCACHED |
-TTM_PL_FLAG_VRAM;
+   if (rbo->flags & RADEON_GEM_GTT_WC) {
+   rbo->placements[c].fpfn = 0;
+   rbo->placements[c++].flags = TTM_PL_FLAG_WC |
+TTM_PL_FLAG_UNCACHED |
+TTM_PL_FLAG_VRAM;
+   } else {
+   rbo->placements[c].fpfn = 0;
+   rbo->placements[c++].flags = TTM_PL_FLAG_UNCACHED |
+TTM_PL_FLAG_VRAM;
+   }
 }

 if (domain & RADEON_GEM_DOMAIN_GTT) {
--
2.7.0




Re: TTM/nouveau conflict in drm-misc-next

2020-08-14 Thread Koenig, Christian


On 14.08.2020 17:53, Alex Deucher wrote:
On Fri, Aug 14, 2020 at 11:22 AM Christian König
 wrote:
>
> Hey Thomas & Alex,
>
> well the TTM and Nouveau changes look good to me, but this completely
> broke amdgpu.
>
> Alex any idea what is going on here?

What's broken in amdgpu?  There shouldn't be any ttm changes in amdgpu
for drm-next.  Those all go through drm-misc.

It's not a TTM change.

The backmerge of drm-next into drm-misc-next broke amdgpu so that even glxgears 
doesn't work anymore.

But each individual merge head still works fine as far as I can say.

Any idea how to track that down?

Christian.


Alex

>
> Regards,
> Christian.
>
> > On 12.08.20 at 21:10, Thomas Zimmermann wrote:
> > Hi Christian and Ben,
> >
> > I backmerged drm-next into drm-misc-next and had a conflict between ttm
> > and nouveau. struct ttm_mem_res got renamed to struct ttm_resource. I
> > updated nouveau to the new name, test-built, and pushed the result to
> > drm-misc-next. If either of you has a minute, you may want to double
> > check the merge.
> >
> > Best regards
> > Thomas
> >
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



Re: [PATCH] drm/amdgpu: don't create entity when use cpu to update page table

2020-08-06 Thread Koenig, Christian
NAK, we also use the entity context number for debugging.

In addition to this, the entities should not need any additional resources. So 
the functions are only initializing fields.

Regards,
Christian.


On 06.08.2020 18:06, "Wang, Kevin(Yang)" wrote:
the entity isn't needed when the VM uses the CPU to update page tables.

Signed-off-by: Kevin Wang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 45 ++
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 71e005cf2952..e15c29d613d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2802,20 +2802,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
 spin_lock_init(>invalidated_lock);
 INIT_LIST_HEAD(>freed);

-
-   /* create scheduler entities for page table updates */
-   r = drm_sched_entity_init(>immediate, DRM_SCHED_PRIORITY_NORMAL,
- adev->vm_manager.vm_pte_scheds,
- adev->vm_manager.vm_pte_num_scheds, NULL);
-   if (r)
-   return r;
-
-   r = drm_sched_entity_init(>delayed, DRM_SCHED_PRIORITY_NORMAL,
- adev->vm_manager.vm_pte_scheds,
- adev->vm_manager.vm_pte_num_scheds, NULL);
-   if (r)
-   goto error_free_immediate;
-
 vm->pte_support_ats = false;
 vm->is_compute_context = false;

@@ -2835,10 +2821,25 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
!amdgpu_gmc_vram_full_visible(>gmc)),
   "CPU update of VM recommended only for large BAR system\n");

-   if (vm->use_cpu_for_update)
+   if (vm->use_cpu_for_update) {
 vm->update_funcs = _vm_cpu_funcs;
-   else
+   } else {
+   /* create scheduler entities for page table updates */
+   r = drm_sched_entity_init(>immediate, 
DRM_SCHED_PRIORITY_NORMAL,
+ adev->vm_manager.vm_pte_scheds,
+ adev->vm_manager.vm_pte_num_scheds, 
NULL);
+   if (r)
+   return r;
+
+   r = drm_sched_entity_init(>delayed, 
DRM_SCHED_PRIORITY_NORMAL,
+ adev->vm_manager.vm_pte_scheds,
+ adev->vm_manager.vm_pte_num_scheds, 
NULL);
+   if (r)
+   goto error_free_immediate;
+
 vm->update_funcs = _vm_sdma_funcs;
+   }
+
 vm->last_update = NULL;
 vm->last_unlocked = dma_fence_get_stub();

@@ -2895,10 +2896,12 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,

 error_free_delayed:
 dma_fence_put(vm->last_unlocked);
-   drm_sched_entity_destroy(>delayed);
+   if (!vm->use_cpu_for_update)
+   drm_sched_entity_destroy(>delayed);

 error_free_immediate:
-   drm_sched_entity_destroy(>immediate);
+   if (!vm->use_cpu_for_update)
+   drm_sched_entity_destroy(>immediate);

 return r;
 }
@@ -3120,8 +3123,10 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct 
amdgpu_vm *vm)
 amdgpu_bo_unref();
 WARN_ON(vm->root.base.bo);

-   drm_sched_entity_destroy(>immediate);
-   drm_sched_entity_destroy(>delayed);
+   if (!vm->use_cpu_for_update) {
+   drm_sched_entity_destroy(>immediate);
+   drm_sched_entity_destroy(>delayed);
+   }

 if (!RB_EMPTY_ROOT(>va.rb_root)) {
 dev_err(adev->dev, "still active bo inside vm\n");
--
2.27.0




Re: [PATCH 1/1] drm/ttm: fix offset in VMAs with a pg_offs in ttm_bo_vm_access

2020-07-29 Thread Koenig, Christian
Sure.

Christian.

On 29.07.2020 17:30, "Deucher, Alexander" wrote:


Christian, Can you cc stable when you apply it to drm-misc?

Alex

From: Kuehling, Felix 
Sent: Wednesday, July 29, 2020 10:15 AM
To: Koenig, Christian ; 
dri-de...@lists.freedesktop.org ; 
amd-gfx@lists.freedesktop.org ; Deucher, 
Alexander 
Cc: Morichetti, Laurent 
Subject: Re: [PATCH 1/1] drm/ttm: fix offset in VMAs with a pg_offs in 
ttm_bo_vm_access

On 2020-07-29 at 4:08 a.m., Christian König wrote:
> On 28.07.20 at 20:27, Felix Kuehling wrote:
>> VMAs with a pg_offs that's offset from the start of the vma_node need
>> to adjust the offset within the BO accordingly. This matches the
>> offset calculation in ttm_bo_vm_fault_reserved.
>>
>> Signed-off-by: Felix Kuehling 
>> Tested-by: Laurent Morichetti 
>
> Reviewed-by: Christian König 
>
> Going to pick that up for inclusion in drm-misc-next.

Thanks. I'll submit it to amd-staging-drm-next so it makes its way into
our DKMS branch quickly.

Alex, would you push this to drm-fixes?

Regards,
  Felix


>
>> ---
>>   drivers/gpu/drm/ttm/ttm_bo_vm.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> index 389128b8c4dd..60b41447bec8 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> @@ -405,8 +405,10 @@ static int ttm_bo_vm_access_kmap(struct
>> ttm_buffer_object *bo,
>>   int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
>>void *buf, int len, int write)
>>   {
>> -unsigned long offset = (addr) - vma->vm_start;
>>   struct ttm_buffer_object *bo = vma->vm_private_data;
>> +unsigned long offset = (addr) - vma->vm_start +
>> +((vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node))
>> + << PAGE_SHIFT);
>>   int ret;
>> if (len < 1 || (offset + len) >> PAGE_SHIFT > bo->num_pages)
>


Re: [PATCH 1/6] drm/ttm: Add unampping of the entire device address space

2020-06-09 Thread Koenig, Christian


On 09.06.2020 18:37, "Grodzovsky, Andrey" wrote:

On 6/5/20 2:40 PM, Christian König wrote:
> On 05.06.20 at 16:29, Andrey Grodzovsky wrote:
>>
>> On 5/11/20 2:45 AM, Christian König wrote:
>>> On 09.05.20 at 20:51, Andrey Grodzovsky wrote:
 Signed-off-by: Andrey Grodzovsky 
 ---
   drivers/gpu/drm/ttm/ttm_bo.c| 22 +-
   include/drm/ttm/ttm_bo_driver.h |  2 ++
   2 files changed, 23 insertions(+), 1 deletion(-)

 diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
 b/drivers/gpu/drm/ttm/ttm_bo.c
 index c5b516f..eae61cc 100644
 --- a/drivers/gpu/drm/ttm/ttm_bo.c
 +++ b/drivers/gpu/drm/ttm/ttm_bo.c
 @@ -1750,9 +1750,29 @@ void ttm_bo_unmap_virtual(struct
 ttm_buffer_object *bo)
   ttm_bo_unmap_virtual_locked(bo);
   ttm_mem_io_unlock(man);
   }
 +EXPORT_SYMBOL(ttm_bo_unmap_virtual);
   +void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev)
 +{
 +struct ttm_mem_type_manager *man;
 +int i;
   -EXPORT_SYMBOL(ttm_bo_unmap_virtual);
>>>
 +for (i = 0; i < TTM_NUM_MEM_TYPES; i++) {
 +man = >man[i];
 +if (man->has_type && man->use_type)
 +ttm_mem_io_lock(man, false);
 +}
>>>
>>> You should drop that it will just result in a deadlock warning for
>>> Nouveau and has no effect at all.
>>>
>>> Apart from that looks good to me,
>>> Christian.
>>
>>
>> As I am considering to re-include this in V2 of the patchsets, can
>> you clarify please why this will have no effect at all ?
>
> The locks are exclusive for Nouveau to allocate/free the io address
> space.
>
> Since we don't do this here we don't need the locks.
>
> Christian.


So basically calling unmap_mapping_range doesn't require any extra
locking around it and whatever locks are taken within the function
should be enough ?


I think so, yes.

Christian.


Andrey


>
>>
>> Andrey
>>
>>
>>>
 +
 +unmap_mapping_range(bdev->dev_mapping, 0, 0 , 1);
 +/*TODO What about ttm_mem_io_free_vm(bo) ? */
 +
 +for (i = TTM_NUM_MEM_TYPES - 1; i >= 0; i--) {
 +man = >man[i];
 +if (man->has_type && man->use_type)
 +ttm_mem_io_unlock(man);
 +}
 +}
 +EXPORT_SYMBOL(ttm_bo_unmap_virtual_address_space);
 int ttm_bo_wait(struct ttm_buffer_object *bo,
   bool interruptible, bool no_wait)
 diff --git a/include/drm/ttm/ttm_bo_driver.h
 b/include/drm/ttm/ttm_bo_driver.h
 index c9e0fd0..3133463 100644
 --- a/include/drm/ttm/ttm_bo_driver.h
 +++ b/include/drm/ttm/ttm_bo_driver.h
 @@ -600,6 +600,8 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
*/
   void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo);
   +void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device
 *bdev);
 +
   /**
* ttm_bo_unmap_virtual
*
>>>
>



Re: [PATCH 00/14] drm/radeon: remove comparison to bool

2020-05-06 Thread Koenig, Christian


On 06.05.2020 18:00, Alex Deucher wrote:
On Wed, May 6, 2020 at 10:27 AM Zheng Bin  wrote:
>
> Zheng Bin (14):
>   drm/radeon: remove comparison to bool in btc_dpm.c
>   drm/radeon: remove comparison to bool in ci_dpm.c
>   drm/radeon: remove comparison to bool in ni_dpm.c
>   drm/radeon: remove comparison to bool in radeon_atpx_handler.c
>   drm/radeon: remove comparison to bool in radeon_object.c
>   drm/radeon: remove comparison to bool in radeon_ttm.c
>   drm/radeon: remove comparison to bool in r100.c
>   drm/radeon: remove comparison to bool in r300.c
>   drm/radeon: remove comparison to bool in r600.c
>   drm/radeon: remove comparison to bool in rs600.c
>   drm/radeon: remove comparison to bool in rs690.c
>   drm/radeon: remove comparison to bool in rv6xx_dpm.c
>   drm/radeon: remove comparison to bool in rv515.c
>   drm/radeon: remove comparison to bool in si_dpm.c

Does the checker need to be fixed?  All of these are comparing boolean
variables to true/false.  Seems like needless code churn to me.

We should probably make sure that no new code like this leaks in, but I also 
don't see that this is necessary for the old driver stack.

Christian.


Alex

>
>  drivers/gpu/drm/radeon/btc_dpm.c | 2 +-
>  drivers/gpu/drm/radeon/ci_dpm.c  | 4 ++--
>  drivers/gpu/drm/radeon/ni_dpm.c  | 6 +++---
>  drivers/gpu/drm/radeon/r100.c| 2 +-
>  drivers/gpu/drm/radeon/r300.c| 2 +-
>  drivers/gpu/drm/radeon/r600.c| 3 ++-
>  drivers/gpu/drm/radeon/radeon_atpx_handler.c | 4 ++--
>  drivers/gpu/drm/radeon/radeon_object.c   | 2 +-
>  drivers/gpu/drm/radeon/radeon_ttm.c  | 2 +-
>  drivers/gpu/drm/radeon/rs600.c   | 2 +-
>  drivers/gpu/drm/radeon/rs690.c   | 3 ++-
>  drivers/gpu/drm/radeon/rv515.c   | 2 +-
>  drivers/gpu/drm/radeon/rv6xx_dpm.c   | 2 +-
>  drivers/gpu/drm/radeon/si_dpm.c  | 6 +++---
>  14 files changed, 22 insertions(+), 20 deletions(-)
>
> --
> 2.26.0.106.g9fadedd
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



RE: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)

2020-04-29 Thread Koenig, Christian
Hi Monk,

because some parallel execution could load the GL2C.

See you need to insert cache invalidations before you start reading something 
which another engine has written.

And you need cache flushes to make sure that something your engine has written 
has reached memory before you signal finished execution.

That's perfectly normal cache handling what Marek is doing here.
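
As a sketch, the resulting ordering for one SDMA submission looks like this
(the emit helpers are placeholders, not real functions):

	/* 1. Before the IB: invalidate GL2C so the engine does not read stale
	 *    data written by another engine (or the CPU). */
	sdma_emit_gcr(ring, GCR_GL2_INV);

	/* 2. Execute the IB against a clean cache. */
	sdma_emit_ib(ring, ib);

	/* 3. At the fence: flush GL2C so everything the IB wrote has reached
	 *    memory before the fence becomes visible to anyone else. */
	sdma_emit_gcr(ring, GCR_GL2_WB);
	sdma_emit_fence(ring, fence_addr, fence_seq);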

Regards,
Christian.

On 29.04.2020 13:24, "Liu, Monk" wrote:

>> Well from my understanding I think that a G2LC invalidation is still 
>> necessary before an IB executes.

Agree, I think before an IB executes the only thing we need on GL2C is the 
invalidation, not the flush .

>> The problem is that the memory of the IB could also be cached because of 
>> some activity of the GFX or Compute rings.

If we always insert a GL2C invalidate at every EOP of every IB from every 
engine, why do we need a GL2C invalidate before the IB executes?

_

Monk Liu|GPU Virtualization Team |AMD




From: Koenig, Christian 
Sent: Wednesday, April 29, 2020 5:38 PM
To: Liu, Monk ; Marek Olšák ; amd-gfx 
mailing list 
Subject: Re: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)



Well from my understanding I think that a G2LC invalidation is still necessary 
before an IB executes.

The problem is that the memory of the IB could also be cached because of some 
activity of the GFX or Compute rings.

Regards,
Christian.

On 29.04.20 at 11:35, Liu, Monk wrote:

Here is the reason we should always insert a “sync mem” packet at the FENCE 
place of SDMA, not before IB emit.



By always inserting “sync mem” in the FENCE place we can make sure:

  1.  data is really flushed to system memory before the CPU tries to read it
  2.  all of G2LC is invalidated by “sync mem”, thus in the next round the SDMA 
IB won’t get stale data from the G2LC cache



By inserting “sync mem” prior to the IB we could only achieve: avoiding stale 
data in G2LC during IB execution



for the GFX/COMPUTE rings, since they have the release_mem packet, the G2LC 
flush and invalidate is inherently done upon a fence being signaled



_

Monk Liu|GPU Virtualization Team |AMD




From: Liu, Monk
Sent: Wednesday, April 29, 2020 5:06 PM
To: 'Marek Olšák' <mar...@gmail.com>; amd-gfx mailing 
list <amd-gfx@lists.freedesktop.org>; 
Koenig, Christian <christian.koe...@amd.com>
Subject: RE: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)



Hi @Koenig, Christian <christian.koe...@amd.com> & Marek



I still have some concerns regarding Marek’s patch, correct me if I’m wrong



See that Marek put a SDMA_OP_GCR_REQ before emitting IB, to make sure SDMA 
won’t get stale cache data during the IB execution.



But that “SDMA_OP_GCR_REQ” only invalidates/flushes the GFXHUB’s G2LC cache, right?
What if the memory is changed by the MM or CPU (outside of GFXHUB)?



Can this “ SDMA_OP_GCR_REQ” force MMHUB or even CPU to flush their operation 
result from their cache to memory ??



Besides, to my understanding the “EOP” of the gfx ring does the 
“invalidate/flush” of the L2 cache upon a fence being signaled, so what we should do on 
SDMA5 is to insert this “SDMA_OP_GCR_REQ”

right before the “emit_fence” of SDMA (this is what the windows KMD does)



thanks

_

Monk Liu|GPU Virtualization Team |AMD




From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of Marek Olšák
Sent: Saturday, April 25, 2020 4:52 PM
To: amd-gfx mailing list <amd-gfx@lists.freedesktop.org>
Subject: drm/amdgpu: invalidate L2 before SDMA IBs (on gfx10)



This should fix SDMA hangs on gfx10.



Marek





Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"

2020-04-14 Thread Koenig, Christian
That's exactly my concern as well.

This looks a bit like the test creates erroneous data somehow, but there 
doesn't seem to be a RAS check in the MM data path.

And now that we use the BAR path it goes up in flames.

I just don't see how we can create erroneous data in a test case?

Christian.

On 14.04.2020 16:35, "Deucher, Alexander" wrote:


If this causes an issue, any access to vram via the BAR could cause an issue.

Alex

From: amd-gfx  on behalf of Russell, 
Kent 
Sent: Tuesday, April 14, 2020 10:19 AM
To: Koenig, Christian ; amd-gfx@lists.freedesktop.org 

Cc: Kuehling, Felix ; Kim, Jonathan 

Subject: RE: [PATCH] Revert "drm/amdgpu: use the BAR if possible in 
amdgpu_device_vram_access v2"


On VG20 or MI100, as soon as we run the subtest, we get the dmesg output below, 
and then the kernel ends up hanging. I don't know enough about the test itself 
to know why this is occurring, but Jon Kim and Felix were discussing it on a 
separate thread when the issue was first reported, so they can hopefully 
provide some additional information.

 Kent

> -Original Message-
> From: Christian König 
> Sent: Tuesday, April 14, 2020 9:52 AM
> To: Russell, Kent ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible in
> amdgpu_device_vram_access v2"
>
> On 13.04.20 at 20:20, Kent Russell wrote:
> > This reverts commit c12b84d6e0d70f1185e6daddfd12afb671791b6e.
> > The original patch causes a RAS event and subsequent kernel hard-hang
> > when running the KFDMemoryTest.PtraceAccessInvisibleVram on VG20 and
> > Arcturus
> >
> > dmesg output at hang time:
> > [drm] RAS event of type ERREVENT_ATHUB_INTERRUPT detected!
> > amdgpu :67:00.0: GPU reset begin!
> > Evicting PASID 0x8000 queues
> > Started evicting pasid 0x8000
> > qcm fence wait loop timeout expired
> > The cp might be in an unrecoverable state due to an unsuccessful
> > queues preemption Failed to evict process queues Failed to suspend
> > process 0x8000 Finished evicting pasid 0x8000 Started restoring pasid
> > 0x8000 Finished restoring pasid 0x8000 [drm] UVD VCPU state may lost
> > due to RAS ERREVENT_ATHUB_INTERRUPT
> > amdgpu: [powerplay] Failed to send message 0x26, response 0x0
> > amdgpu: [powerplay] Failed to set soft min gfxclk !
> > amdgpu: [powerplay] Failed to upload DPM Bootup Levels!
> > amdgpu: [powerplay] Failed to send message 0x7, response 0x0
> > amdgpu: [powerplay] [DisableAllSMUFeatures] Failed to disable all smu
> features!
> > amdgpu: [powerplay] [DisableDpmTasks] Failed to disable all smu features!
> > amdgpu: [powerplay] [PowerOffAsic] Failed to disable DPM!
> > [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP
> > block  failed -5
>
> Do you have more information on what's going wrong here since this is a really
> important patch for KFD debugging.
>
> >
> > Signed-off-by: Kent Russell 
>
> Reviewed-by: Christian König 
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 26 --
> >   1 file changed, 26 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index cf5d6e585634..a3f997f84020 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -254,32 +254,6 @@ void amdgpu_device_vram_access(struct
> amdgpu_device *adev, loff_t pos,
> >  uint32_t hi = ~0;
> >  uint64_t last;
> >
> > -
> > -#ifdef CONFIG_64BIT
> > -   last = min(pos + size, adev->gmc.visible_vram_size);
> > -   if (last > pos) {
> > -   void __iomem *addr = adev->mman.aper_base_kaddr + pos;
> > -   size_t count = last - pos;
> > -
> > -   if (write) {
> > -   memcpy_toio(addr, buf, count);
> > -   mb();
> > -   amdgpu_asic_flush_hdp(adev, NULL);
> > -   } else {
> > -   amdgpu_asic_invalidate_hdp(adev, NULL);
> > -   mb();
> > -   memcpy_fromio(buf, addr, count);
> > -   }
> > -
> > -   if (count == size)
> > -   return;
> > -
> > -   pos += count;
> > -   buf += count / 4;
> > -   size -= count;
> > -   }
> > -#endif
> > -
> >  spin_lock_irqsave(&adev->mmio_idx_lock, flags);
> >  for (last = pos + size; pos < last; pos += 4) 

Re: [PATCH] drm/ttm: Break out the loops if need_resched in bo delayed delete worker

2020-04-10 Thread Koenig, Christian


On 10.04.2020 12:58, "Pan, Xinhui" wrote:
The delayed delete list is per device and might be very large. And in
a heavy workload test, the list might never be empty. That can
trigger RCU stall warnings or soft lockups in non-preemptible kernels.
Let's break out of the loop in that case.

Signed-off-by: xinhui pan 

Reviewed-by: Christian König 

---
 drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 9e07c3f75156..c5b516fa4eae 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -518,7 +518,7 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device 
*bdev, bool remove_all)
 	INIT_LIST_HEAD(&removed);
 
 	spin_lock(&glob->lru_lock);
-	while (!list_empty(&bdev->ddestroy)) {
+	while (!list_empty(&bdev->ddestroy) && !need_resched()) {
 		struct ttm_buffer_object *bo;
 
 		bo = list_first_entry(&bdev->ddestroy, struct ttm_buffer_object,
--
2.17.1




Re: [RFC PATCH 1/2] drm/amdgpu: add direct ib pool

2020-03-26 Thread Koenig, Christian


On 26.03.2020 07:45, "Pan, Xinhui" wrote:


> On 26 March 2020 at 14:36, Koenig, Christian wrote:
>
>
>
> On 26.03.2020 07:15, "Pan, Xinhui" wrote:
>
>
> > On 26 March 2020 at 13:38, Koenig, Christian wrote:
> >
> > Yeah that's on my TODO list for quite a while as well.
> >
> > But we even need three IB pools. One very small for the IB tests, one for 
> > direct VM updates and one for the rest.
> >
> > So please make the pool a parameter to ib_get() and not the hack you have 
> > below.
>
> yep, I will make IB pool  a parameter.
>
IB tests for gfx need many IBs; PAGE_SIZE for the IB pool is still not enough.
But the default size for the IB pool is 2MB now, just one hugepage, and today we have 
memory in TB, so there is no need to make a different size for the IB tests pool.
>
> 2MB is probably a bit much and we don't have huge page optimisation for 
> kernel allocations at the moment anyway. Keep in mind that we have only 
> limited space in the GART.
gart table is just 512MB.
do you mean every entry in gart table just points to one 4KB page? and need 5 
gart table entries for one 2M hugepage?

Yes, we need 512 * 4KB entries for a 2MB page in GART. The table for the system 
VM is flat because of hardware restrictions.

IIRC we tried 256MB for the GART initially and in general we try to keep that 
as small as possible because it eats up visible VRAM space.
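
To put rough numbers on that (assuming the usual 8 bytes per GART entry): a
512 MB GART maps 512 MB / 4 KB = 131072 pages, so the flat table itself costs
131072 * 8 B = 1 MB of visible VRAM, and a single 2 MB buffer always consumes
512 of those 4 KB entries.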

Christian.


>
> Maybe make this 4*PAGE_SIZE for the new IB pool for now and test if that 
> works or not.
>
> Christian.
>
>
>
>
> >
> > Thanks,
> > Christian.
> >
> > On 26.03.2020 03:02, "Pan, Xinhui" wrote:
> > Another ib poll for direct submit.
> > Any jobs schedule IBs without dependence on gpu scheduler should use
> > this pool firstly.
> >
> > Signed-off-by: xinhui pan 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu.h |  1 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  2 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  | 12 ++--
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 +++-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  3 ++-
> >  5 files changed, 21 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index 7dd74253e7b6..c01423ffb8ed 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -849,6 +849,7 @@ struct amdgpu_device {
> >  struct amdgpu_ring  *rings[AMDGPU_MAX_RINGS];
> >  boolib_pool_ready;
> >  struct amdgpu_sa_managerring_tmp_bo;
> > +   struct amdgpu_sa_managerring_tmp_bo_direct;
> >
> >  /* interrupts */
> >  struct amdgpu_irq   irq;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > index 8304d0c87899..28be4efb3d5b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > @@ -920,7 +920,7 @@ static int amdgpu_cs_ib_fill(struct amdgpu_device *adev,
> >  parser->entity = entity;
> >
> >  ring = to_amdgpu_ring(entity->rq->sched);
> > -   r =  amdgpu_ib_get(adev, vm, ring->funcs->parse_cs ?
> > +   r =  amdgpu_ib_get(adev, (unsigned long )vm|0x1, 
> > ring->funcs->parse_cs ?
> > chunk_ib->ib_bytes : 0, ib);
> >  if (r) {
> >  DRM_ERROR("Failed to get ib !\n");
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> > index bece01f1cf09..f2e08c372d57 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> > @@ -66,7 +66,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct 
> > amdgpu_vm *vm,
> >  int r;
> >
> >  if (size) {
> > -   r = amdgpu_sa_bo_new(>ring_tmp_bo,
> > +   r = amdgpu_sa_bo_new(vm ? >ring_tmp_bo : 
> > >ring_tmp_bo_direct,
> >>sa_bo, size, 256);
> >  if (r) {
> >  dev_err(adev->dev, "failed to get a new IB 
> > (%d)\n", r);
> > @@ -75,7 +75,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct 
> > amdgpu_vm *vm,
> >
> >  ib->ptr = amdgpu_sa_bo_cpu_addr(ib->sa_bo

Re: [RFC PATCH 1/2] drm/amdgpu: add direct ib pool

2020-03-26 Thread Koenig, Christian


On 26.03.2020 07:15, "Pan, Xinhui" wrote:


> On 26 March 2020 at 13:38, Koenig, Christian wrote:
>
> Yeah that's on my TODO list for quite a while as well.
>
> But we even need three IB pools. One very small for the IB tests, one for 
> direct VM updates and one for the rest.
>
> So please make the pool a parameter to ib_get() and not the hack you have 
> below.

yep, I will make IB pool  a parameter.

IB tests for gfx need many IBs, PAGE_SIZE for ib pool is still not enough.
but the default size for ib pool is 2MB now, just one hugepage, today we have 
memory in TB.
so no need make a different size for IB tests pool.

2MB is probably a bit much and we don't have huge page optimisation for kernel 
allocations at the moment anyway. Keep in mind that we have only limited space 
in the GART.

Maybe make this 4*PAGE_SIZE for the new IB pool for now and test if that works 
or not.

Christian.




>
> Thanks,
> Christian.
>
> On 26.03.2020 03:02, "Pan, Xinhui" wrote:
> Another ib poll for direct submit.
> Any jobs schedule IBs without dependence on gpu scheduler should use
> this pool firstly.
>
> Signed-off-by: xinhui pan 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  | 12 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 +++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  3 ++-
>  5 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 7dd74253e7b6..c01423ffb8ed 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -849,6 +849,7 @@ struct amdgpu_device {
>  struct amdgpu_ring  *rings[AMDGPU_MAX_RINGS];
>  boolib_pool_ready;
>  struct amdgpu_sa_managerring_tmp_bo;
> +   struct amdgpu_sa_managerring_tmp_bo_direct;
>
>  /* interrupts */
>  struct amdgpu_irq   irq;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 8304d0c87899..28be4efb3d5b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -920,7 +920,7 @@ static int amdgpu_cs_ib_fill(struct amdgpu_device *adev,
>  parser->entity = entity;
>
>  ring = to_amdgpu_ring(entity->rq->sched);
> -   r =  amdgpu_ib_get(adev, vm, ring->funcs->parse_cs ?
> +   r =  amdgpu_ib_get(adev, (unsigned long )vm|0x1, 
> ring->funcs->parse_cs ?
> chunk_ib->ib_bytes : 0, ib);
>  if (r) {
>  DRM_ERROR("Failed to get ib !\n");
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index bece01f1cf09..f2e08c372d57 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -66,7 +66,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct 
> amdgpu_vm *vm,
>  int r;
>
>  if (size) {
> -   r = amdgpu_sa_bo_new(>ring_tmp_bo,
> +   r = amdgpu_sa_bo_new(vm ? >ring_tmp_bo : 
> >ring_tmp_bo_direct,
>>sa_bo, size, 256);
>  if (r) {
>  dev_err(adev->dev, "failed to get a new IB (%d)\n", 
> r);
> @@ -75,7 +75,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct 
> amdgpu_vm *vm,
>
>  ib->ptr = amdgpu_sa_bo_cpu_addr(ib->sa_bo);
>
> -   if (!vm)
> +   if (!((unsigned long)vm & ~0x1))
>  ib->gpu_addr = amdgpu_sa_bo_gpu_addr(ib->sa_bo);
>  }
>
> @@ -310,6 +310,13 @@ int amdgpu_ib_pool_init(struct amdgpu_device *adev)
>  return r;
>  }
>
> +   r = amdgpu_sa_bo_manager_init(adev, >ring_tmp_bo_direct,
> + AMDGPU_IB_POOL_SIZE*64*1024,
> + AMDGPU_GPU_PAGE_SIZE,
> + AMDGPU_GEM_DOMAIN_GTT);
> +   if (r) {
> +   return r;
> +   }
>  adev->ib_pool_ready = true;
>
>  return 0;
> @@ -327,6 +334,7 @@ void amdgpu_ib_pool_fini(struct amdgpu_device *adev)
>  {
>  if (adev->ib_pool_ready) {
>  amdgpu_sa_bo_manager_fini(adev, >ring_tmp_bo);
> +   amdgpu_sa_bo_manager_fini(adev, >ring_tmp_bo_direct);
>  

Re: [RFC PATCH 1/2] drm/amdgpu: add direct ib pool

2020-03-25 Thread Koenig, Christian
Yeah that's on my TODO list for quite a while as well.

But we even need three IB pools. One very small for the IB tests, one for 
direct VM updates and one for the rest.

So please make the pool a parameter to ib_get() and not the hack you have below.

Thanks,
Christian.
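
One way to express that as an explicit parameter, roughly along the lines of
what was later merged (names are illustrative):

enum amdgpu_ib_pool_type {
	AMDGPU_IB_POOL_DELAYED,		/* normal submissions through the scheduler */
	AMDGPU_IB_POOL_IMMEDIATE,	/* direct VM updates */
	AMDGPU_IB_POOL_DIRECT,		/* IB tests, kept very small */
	AMDGPU_IB_POOL_MAX
};

int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
		  unsigned size, enum amdgpu_ib_pool_type pool_type,
		  struct amdgpu_ib *ib);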

On 26.03.2020 03:02, "Pan, Xinhui" wrote:
Another IB pool for direct submit.
Any job that schedules IBs without depending on the gpu scheduler should use
this pool first.

Signed-off-by: xinhui pan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  | 12 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  3 ++-
 5 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 7dd74253e7b6..c01423ffb8ed 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -849,6 +849,7 @@ struct amdgpu_device {
 struct amdgpu_ring  *rings[AMDGPU_MAX_RINGS];
 boolib_pool_ready;
 struct amdgpu_sa_managerring_tmp_bo;
+   struct amdgpu_sa_managerring_tmp_bo_direct;

 /* interrupts */
 struct amdgpu_irq   irq;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 8304d0c87899..28be4efb3d5b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -920,7 +920,7 @@ static int amdgpu_cs_ib_fill(struct amdgpu_device *adev,
 parser->entity = entity;

 ring = to_amdgpu_ring(entity->rq->sched);
-   r =  amdgpu_ib_get(adev, vm, ring->funcs->parse_cs ?
+   r =  amdgpu_ib_get(adev, (unsigned long )vm|0x1, 
ring->funcs->parse_cs ?
chunk_ib->ib_bytes : 0, ib);
 if (r) {
 DRM_ERROR("Failed to get ib !\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index bece01f1cf09..f2e08c372d57 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -66,7 +66,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
 int r;

 if (size) {
-   r = amdgpu_sa_bo_new(&adev->ring_tmp_bo,
+   r = amdgpu_sa_bo_new(vm ? &adev->ring_tmp_bo : 
&adev->ring_tmp_bo_direct,
   &ib->sa_bo, size, 256);
 if (r) {
 dev_err(adev->dev, "failed to get a new IB (%d)\n", r);
@@ -75,7 +75,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,

 ib->ptr = amdgpu_sa_bo_cpu_addr(ib->sa_bo);

-   if (!vm)
+   if (!((unsigned long)vm & ~0x1))
 ib->gpu_addr = amdgpu_sa_bo_gpu_addr(ib->sa_bo);
 }

@@ -310,6 +310,13 @@ int amdgpu_ib_pool_init(struct amdgpu_device *adev)
 return r;
 }

+   r = amdgpu_sa_bo_manager_init(adev, &adev->ring_tmp_bo_direct,
+ AMDGPU_IB_POOL_SIZE*64*1024,
+ AMDGPU_GPU_PAGE_SIZE,
+ AMDGPU_GEM_DOMAIN_GTT);
+   if (r) {
+   return r;
+   }
 adev->ib_pool_ready = true;

 return 0;
@@ -327,6 +334,7 @@ void amdgpu_ib_pool_fini(struct amdgpu_device *adev)
 {
 if (adev->ib_pool_ready) {
 amdgpu_sa_bo_manager_fini(adev, &adev->ring_tmp_bo);
+   amdgpu_sa_bo_manager_fini(adev, &adev->ring_tmp_bo_direct);
 adev->ib_pool_ready = false;
 }
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 4981e443a884..6a63826c6760 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -88,6 +88,12 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned 
num_ibs,

 int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
  struct amdgpu_job **job)
+{
+   return amdgpu_job_alloc_with_ib_direct(adev, size, job, 0);
+}
+
+int amdgpu_job_alloc_with_ib_direct(struct amdgpu_device *adev, unsigned size,
+struct amdgpu_job **job, int direct)
 {
 int r;

@@ -95,7 +101,7 @@ int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, 
unsigned size,
 if (r)
 return r;

-   r = amdgpu_ib_get(adev, NULL, size, &(*job)->ibs[0]);
+   r = amdgpu_ib_get(adev, direct ? NULL : 0x1, size, &(*job)->ibs[0]);
 if (r)
 kfree(*job);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index 2e2110dddb76..be9dd72b9912 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ 

Re: [PATCH] drm/amdgpu: Check entity rq

2020-03-25 Thread Koenig, Christian
Hi guys,

thanks for pointing this out Nirmoy.

Yeah, could be that I forgot to commit the patch. Currently I don't know at 
which end of the chaos I should start to clean up.

Christian.

Am 25.03.2020 12:09 schrieb "Das, Nirmoy" :
Hi Xinhui,


Can you please check if you can reproduce the crash with
https://lists.freedesktop.org/archives/amd-gfx/2020-February/046414.html

Christian fixed it earlier; I think he forgot to push it.


Regards,

Nirmoy

On 3/25/20 12:07 PM, xinhui pan wrote:
> gpu recover will call sdma suspend/resume. In this period, ring will be
> disabled. So the vm_pte_scheds(sdma.instance[X].ring.sched)->ready will
> be false.
>
> If we submit any jobs in this ring-disabled period. We fail to pick up
> a rq for vm entity and entity->rq will set to NULL.
> amdgpu_vm_sdma_commit did not check the entity->rq, so fix it. Otherwise
> hit panic.
>
> Cc: Christian König 
> Cc: Alex Deucher 
> Cc: Felix Kuehling 
> Signed-off-by: xinhui pan 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> index cf96c335b258..d30d103e48a2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> @@ -95,6 +95,8 @@ static int amdgpu_vm_sdma_commit(struct 
> amdgpu_vm_update_params *p,
>int r;
>
>entity = p->direct ? &p->vm->direct : &p->vm->delayed;
> + if (!entity->rq)
> + return -ENOENT;
>ring = container_of(entity->rq->sched, struct amdgpu_ring, sched);
>
>WARN_ON(ib->length_dw == 0);




Re: [PATCH] drm/amdgpu: fix hpd bo size calculation error

2020-03-25 Thread Koenig, Christian
Good catch! mem.size is actually the backing store size (usually in pages).

Patch is Acked-by: Christian König 
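
For illustration only, the safe pattern is to remember the size that was requested from amdgpu_bo_create_reserved()/amdgpu_bo_create_kernel() and reuse exactly that value, rather than reading a size back from TTM bookkeeping. The helper below is hypothetical and only demonstrates the pattern:

/* Hypothetical helper: clear exactly the number of bytes that were
 * requested, not whatever TTM reports for the backing store.
 */
static int create_and_clear_kernel_bo(struct amdgpu_device *adev, u32 size,
				      struct amdgpu_bo **bo, u64 *gpu_addr)
{
	void *cpu_ptr;
	int r;

	r = amdgpu_bo_create_reserved(adev, size, PAGE_SIZE,
				      AMDGPU_GEM_DOMAIN_GTT,
				      bo, gpu_addr, &cpu_ptr);
	if (r)
		return r;

	memset(cpu_ptr, 0, size);	/* the requested size, not tbo.mem.size */

	amdgpu_bo_kunmap(*bo);
	amdgpu_bo_unreserve(*bo);
	return 0;
}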

Am 25.03.2020 11:19 schrieb "Wang, Kevin(Yang)" :
Fix the HPD BO size calculation error.
The "mem.size" field does not always represent the actual BO size.

Signed-off-by: Kevin Wang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 7f9ac1a14e6f..91c82383b016 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -1113,7 +1113,7 @@ static int gfx_v10_0_mec_init(struct amdgpu_device *adev)
 return r;
 }

-   memset(hpd, 0, adev->gfx.mec.hpd_eop_obj->tbo.mem.size);
+   memset(hpd, 0, mec_hpd_size);

 amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
 amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index fb567cf5671b..01b22dad52fd 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1946,7 +1946,7 @@ static int gfx_v9_0_mec_init(struct amdgpu_device *adev)
 return r;
 }

-   memset(hpd, 0, adev->gfx.mec.hpd_eop_obj->tbo.mem.size);
+   memset(hpd, 0, mec_hpd_size);

 amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
 amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
--
2.17.1



Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer creation

2020-03-21 Thread Koenig, Christian
Correct, yes.

For example if you have a 16GB VRAM Vega10 in a system with just 4GB RAM you 
can only allocate < 4GB VRAM (actually more like ~3GB) in a single BO.

Otherwise we wouldn't be able to evacuate VRAM to system memory and disk during 
suspend/resume or during memory pressure.

Regards,
Christian.
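
A sketch of the rule described above (illustrative only, not the actual driver function): a buffer that may be placed in VRAM must also fit into GTT so that it can be evacuated, while a GTT-only request just has to fit into GTT.

/* Illustrative helper; the name and exact comparisons are assumptions. */
static bool bo_size_allows_eviction(u64 size, u64 vram_size, u64 gtt_size,
				    u32 domain)
{
	if (domain & AMDGPU_GEM_DOMAIN_VRAM)
		/* must also fit in GTT, or it can never be evacuated */
		return size < vram_size && size < gtt_size;

	if (domain & AMDGPU_GEM_DOMAIN_GTT)
		return size < gtt_size;

	return true;
}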

Am 21.03.2020 09:32 schrieb "Yin, Tianci (Rico)" :

[AMD Official Use Only - Internal Distribution Only]

Hi Christian,
You mean amdgpu_bo_validate_size() returning false is the expected behavior when
GTT < request < VRAM, even if the VRAM size can meet the requirement, right?

Thanks!
Rico

From: Christian König 
Sent: Saturday, March 21, 2020 2:27
To: Yin, Tianci (Rico) ; amd-gfx@lists.freedesktop.org 

Cc: Xu, Feifei ; Li, Pauline ; Long, 
Gang ; Zhang, Hawking 
Subject: Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer 
creation

Am 20.03.20 um 10:46 schrieb Tianci Yin:
> From: "Tianci.Yin" 
>
> [why]
> When the GTT domain size is smaller than VRAM, if an application requests a
> very large buffer whose size is larger than GTT but smaller than VRAM, the
> size validation will fail.
>
> [how]
> Validate the VRAM domain size first, then the GTT domain.

NAK, this is intended behavior. VRAM allocations larger than GTT
allocations are illegal and can crash the memory management.

Regards,
Christian.

>
> Change-Id: Ic1d31b9b0a4939e6bba0241ff79ae9aa2225ee05
> Signed-off-by: Tianci.Yin 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 18 +-
>   1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 84745f9e7408..bab134b6369f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -464,21 +464,21 @@ static bool amdgpu_bo_validate_size(struct 
> amdgpu_device *adev,
>   {
>struct ttm_mem_type_manager *man = NULL;
>
> - /*
> -  * If GTT is part of requested domains the check must succeed to
> -  * allow fall back to GTT
> -  */
> - if (domain & AMDGPU_GEM_DOMAIN_GTT) {
> - man = &adev->mman.bdev.man[TTM_PL_TT];
> + if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
> + man = &adev->mman.bdev.man[TTM_PL_VRAM];
>
>if (size < (man->size << PAGE_SHIFT))
>return true;
> - else
> + else if (!(domain & AMDGPU_GEM_DOMAIN_GTT))
>goto fail;
>}
>
> - if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
> - man = &adev->mman.bdev.man[TTM_PL_VRAM];
> + /*
> +  * If GTT is part of requested domains the check must succeed to
> +  * allow fall back to GTT
> +  */
> + if (domain & AMDGPU_GEM_DOMAIN_GTT) {
> + man = &adev->mman.bdev.man[TTM_PL_TT];
>
>if (size < (man->size << PAGE_SHIFT))
>return true;



Re: [RFC PATCH 1/1] drm/amdgpu: wait for sched to become ready on job submit

2020-02-24 Thread Koenig, Christian
Hi Nirmoy,

Am 24.02.2020 17:48 schrieb Nirmoy Das :
On reset, amdgpu can temporarily set a drm sched's ready status to false. drm
job init will fail if all of the drm scheds are not ready for a HW IP. This
patch tries to make the kernel's internal drm job submit helper,
amdgpu_job_submit(), a bit more fault tolerant.

I don't think that this approach makes sense. Since it is a front end property 
we should rather stop setting the scheduler ready status to false during reset.

Instead we should only set it to false when the ring/IB test fails and we can't 
bring the ring back to life again.

Christian.
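
As a rough sketch of that direction (illustrative, not a proposed patch): re-run the ring test after the reset and let that single place decide the ready flag; amdgpu_ring_test_helper() already sets ring->sched.ready from the test result.

/* Illustrative sketch: the ready flag only goes false when the ring really
 * could not be brought back to life after the reset.
 */
static void mark_ring_ready_after_reset(struct amdgpu_ring *ring)
{
	int r = amdgpu_ring_test_helper(ring);

	if (r)
		dev_err(ring->adev->dev,
			"ring %s did not come back after reset (%d)\n",
			ring->name, r);
}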


Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 35 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  5 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  6 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  |  2 +-
 7 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index d42be880a236..0745df80112f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -139,7 +139,38 @@ void amdgpu_job_free(struct amdgpu_job *job)
 kfree(job);
 }

-int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
+static int amdgpu_job_try_init(struct amdgpu_device *adev,
+  struct drm_sched_job *base,
+  struct drm_sched_entity *entity,
+  void *owner)
+{
+   int r, i;
+
+   r = drm_sched_job_init(base, entity, owner);
+   if (r == -ENOENT) {
+   /* retry till we come out of reset phase */
+   while (!mutex_trylock(&adev->lock_reset))
+   msleep(10);
+   /* retry for a second for the sched to get ready*/
+   for (i = 0; i < 100; i++) {
+   msleep(10);
+   r = drm_sched_job_init(base, entity, owner);
+   if (r == -ENOENT)
+   continue;
+   }
+
+   mutex_unlock(&adev->lock_reset);
+   /* If after all these we failed to initialize a job
+* it means the IP is unrecoverable */
+   if (r == -ENOENT)
+   return -ENODEV;
+   }
+
+   return r;
+}
+
+int amdgpu_job_submit(struct amdgpu_device *adev,struct amdgpu_job *job,
+ struct drm_sched_entity *entity,
   void *owner, struct dma_fence **f)
 {
 enum drm_sched_priority priority;
@@ -149,7 +180,7 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct 
drm_sched_entity *entity,
 if (!f)
 return -EINVAL;

-   r = drm_sched_job_init(&job->base, entity, owner);
+   r = amdgpu_job_try_init(adev, &job->base, entity, owner);
 if (r)
 return r;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index 2e2110dddb76..fed87e96cacc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -70,8 +70,9 @@ int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, 
unsigned size,

 void amdgpu_job_free_resources(struct amdgpu_job *job);
 void amdgpu_job_free(struct amdgpu_job *job);
-int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
- void *owner, struct dma_fence **f);
+int amdgpu_job_submit(struct amdgpu_device *adev, struct amdgpu_job *job,
+ struct drm_sched_entity *entity, void *owner,
+ struct dma_fence **f);
 int amdgpu_job_submit_direct(struct amdgpu_job *job, struct amdgpu_ring *ring,
  struct dma_fence **fence);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 660867cf2597..adfde07eb75f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2066,7 +2066,7 @@ static int amdgpu_map_buffer(struct ttm_buffer_object *bo,
 if (r)
 goto error_free;

-   r = amdgpu_job_submit(job, &adev->mman.entity,
+   r = amdgpu_job_submit(adev, job, &adev->mman.entity,
   AMDGPU_FENCE_OWNER_UNDEFINED, &fence);
 if (r)
 goto error_free;
@@ -2137,7 +2137,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t 
src_offset,
 if (direct_submit)
 r = amdgpu_job_submit_direct(job, ring, fence);
 else
-   r = amdgpu_job_submit(job, &adev->mman.entity,
+   r = amdgpu_job_submit(adev, job, &adev->mman.entity,
   AMDGPU_FENCE_OWNER_UNDEFINED, fence);
 if (r)
 

Re: [PATCH] drm/radeon: avoid page fault during gpu reset

2020-01-25 Thread Koenig, Christian


Am 25.01.2020 19:47 schrieb Andreas Messer :
When backing up a ring, validate the pointer to avoid a page fault.

When the driver attempts to handle a gpu lockup, a page fault might occur
during the call of radeon_ring_backup() since (*ring->next_rptr_cpu_addr) could
have invalid content:

  [ 3790.348267] radeon :01:00.0: ring 0 stalled for more than 10150msec
  [ 3790.348276] radeon :01:00.0: GPU lockup (current fence id 
0x000699e4 last fence id 0x000699f9 on ring 0)
  [ 3791.504484] BUG: unable to handle page fault for address: ba5602800ffc
  [ 3791.504485] #PF: supervisor read access in kernel mode
  [ 3791.504486] #PF: error_code(0x) - not-present page
  [ 3791.504487] PGD 851d3b067 P4D 851d3b067 PUD 0
  [ 3791.504488] Oops:  [#1] SMP PTI
  [ 3791.504490] CPU: 5 PID: 268 Comm: kworker/5:1H Tainted: GE 
5.4.8-amesser #3
  [ 3791.504491] Hardware name: Gigabyte Technology Co., Ltd. X170-WS 
ECC/X170-WS ECC-CF, BIOS F2 06/20/2016
  [ 3791.504507] Workqueue: radeon-crtc radeon_flip_work_func [radeon]
  [ 3791.504520] RIP: 0010:radeon_ring_backup+0xb9/0x130 [radeon]

It seems that my HD7750 enters such a state during thermal shutdown. Here is
the kernel message with the added debug print and fix:

  [ 2930.783094] radeon :01:00.0: ring 3 stalled for more than 10280msec
  [ 2930.783104] radeon :01:00.0: GPU lockup (current fence id 
0x0011194b last fence id 0x0011196a on ring 3)
  [ 2931.936653] radeon :01:00.0: Bad ptr 0x [   -1] for backup
  [ 2931.937704] radeon :01:00.0: GPU softreset: 0x0BFD
  [ 2931.937705] radeon :01:00.0:   GRBM_STATUS   = 0x
  [ 2931.937707] radeon :01:00.0:   GRBM_STATUS_SE0   = 0x

NAK, that was suggested multiple times now and is essentially the wrong 
approach.

The problem is that the value is invalid because the hardware is not functional 
any more. Returning here without backing up the ring just papers over the real 
problem.

This is just the first occurrence of this, and you would need to fix a couple of 
hundred register accesses (both inside and outside of the driver) to make that 
really work reliably.

The only advice I can give you is to replace the hardware. From experience 
those symptoms mean that your GPU will die rather soon.

Regards,
Christian.



Signed-off-by: Andreas Messer 
---
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c 
b/drivers/gpu/drm/radeon/radeon_ring.c
index 37093cea24c5..bf55a682442a 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -309,6 +309,12 @@ unsigned radeon_ring_backup(struct radeon_device *rdev, 
struct radeon_ring *ring
 return 0;
 }

+   /* ptr could be invalid after thermal shutdown */
+   if (ptr >= (ring->ring_size / 4)) {
+   mutex_unlock(&rdev->ring_lock);
+   return 0;
+   }
+
 size = ring->wptr + (ring->ring_size / 4);
 size -= ptr;
 size &= ring->ptr_mask;



Re: [PATCH 01/12] amdgpu: add UAPI for creating encrypted buffers

2019-11-15 Thread Koenig, Christian
I know, that's the usual chicken and egg problem with updating libdrm.

But we should update the file with the kernel version and not pick all changes 
line by line.

Christian.
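
Once the flag has landed upstream and been synced into libdrm, userspace would request an encrypted buffer roughly like this (hedged sketch; it assumes the synced amdgpu_drm.h provides AMDGPU_GEM_CREATE_ENCRYPTED and uses the existing libdrm amdgpu_bo_alloc() API):

#include <amdgpu.h>
#include <amdgpu_drm.h>

static int alloc_encrypted_bo(amdgpu_device_handle dev, uint64_t size,
			      amdgpu_bo_handle *bo)
{
	struct amdgpu_bo_alloc_request req = {
		.alloc_size = size,
		.phys_alignment = 4096,
		.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM,
		.flags = AMDGPU_GEM_CREATE_ENCRYPTED,	/* request a TMZ buffer */
	};

	return amdgpu_bo_alloc(dev, &req, bo);
}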

Am 15.11.2019 15:49 schrieb "Deucher, Alexander" :
We can't land the kernel side until we have real userspace (e.g., Mesa) that 
uses the TMZ interfaces.  The unit tests are not enough.

Alex



From: Christian König 
Sent: Friday, November 15, 2019 7:56 AM
To: Liu, Aaron ; amd-gfx@lists.freedesktop.org 

Cc: Olsak, Marek ; Huang, Ray ; Tuikov, 
Luben ; Deucher, Alexander ; 
Liu, Leo ; Koenig, Christian 
Subject: Re: [PATCH 01/12] amdgpu: add UAPI for creating encrypted buffers

Am 15.11.19 um 04:34 schrieb Aaron Liu:
> From: Huang Rui 
>
> To align the kernel uapi change from Alex:
>
> "Add a flag to the GEM_CREATE ioctl to create encrypted buffers. Buffers with
> this flag set will be created with the TMZ bit set in the PTEs or engines
> accessing them. This is required in order to properly access the data from the
> engines."
>
> We will use GEM_CREATE_ENCRYPTED flag for secure buffer allocation.
>
> Signed-off-by: Huang Rui 
> Reviewed-by: Alex Deucher 

Please read up on how amdgpu_drm.h is updated. The change must first land
upstream and then the file is synced up semi-automatically.

Christian.

> ---
>   include/drm/amdgpu_drm.h | 5 +
>   1 file changed, 5 insertions(+)
>
> diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
> index 5c28aa7..1a95e37 100644
> --- a/include/drm/amdgpu_drm.h
> +++ b/include/drm/amdgpu_drm.h
> @@ -141,6 +141,11 @@ extern "C" {
>* releasing the memory
>*/
>   #define AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE  (1 << 9)
> +/* Flag that BO will be encrypted and that the TMZ bit should be
> + * set in the PTEs when mapping this buffer via GPUVM or
> + * accessing it with various hw blocks
> + */
> +#define AMDGPU_GEM_CREATE_ENCRYPTED  (1 << 10)
>
>   /* Hybrid specific */
>   /* Flag that the memory allocation should be from top of domain */


Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr

2019-11-08 Thread Koenig, Christian
Hi Emily,

exactly that can't happen. See here:

>     /* Don't destroy jobs while the timeout worker is running */
>     if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>     !cancel_delayed_work(&sched->work_tdr))
>     return NULL;

We never free jobs while the timeout worker is running, to prevent 
exactly that issue.

Regards,
Christian.
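
For context, a rough sketch of how the cleanup path enforces this (field names roughly as in the scheduler of that time; treat the snippet as illustrative only):

/* The scheduler thread only frees a job if it could cancel the timeout work
 * first, so amdgpu_job_timedout() never races with the job being destroyed.
 */
static struct drm_sched_job *
get_cleanup_job_sketch(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_job *job;

	/* Don't destroy jobs while the timeout worker is running */
	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
	    !cancel_delayed_work(&sched->work_tdr))
		return NULL;

	spin_lock(&sched->job_list_lock);
	job = list_first_entry_or_null(&sched->ring_mirror_list,
				       struct drm_sched_job, node);
	if (job && dma_fence_is_signaled(&job->s_fence->finished))
		list_del_init(&job->node);
	else
		job = NULL;
	spin_unlock(&sched->job_list_lock);

	return job;
}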

Am 08.11.19 um 11:32 schrieb Deng, Emily:
> Hi Christian,
>   The drm_sched_job_timedout-> amdgpu_job_timedout call 
> amdgpu_device_gpu_recover. I mean the main scheduler free the jobs while in 
> amdgpu_device_gpu_recover, and before calling drm_sched_stop.
>
> Best wishes
> Emily Deng
>
>
>
>> -Original Message-
>> From: Koenig, Christian 
>> Sent: Friday, November 8, 2019 6:26 PM
>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>
>> Hi Emily,
>>
>> well who is calling amdgpu_device_gpu_recover() in this case?
>>
>> When it's not the scheduler we shouldn't have a guilty job in the first 
>> place.
>>
>> Regards,
>> Christian.
>>
>> Am 08.11.19 um 11:22 schrieb Deng, Emily:
>>> Hi Chrisitan,
>>>No, I am with the new branch and also has the patch. Even it are 
>>> freed by
>> main scheduler, how we could avoid main scheduler to free jobs while enter
>> to function amdgpu_device_gpu_recover?
>>> Best wishes
>>> Emily Deng
>>>
>>>
>>>
>>>> -Original Message-
>>>> From: Koenig, Christian 
>>>> Sent: Friday, November 8, 2019 6:15 PM
>>>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>>>
>>>> Hi Emily,
>>>>
>>>> in this case you are on an old code branch.
>>>>
>>>> Jobs are freed now by the main scheduler thread and only if no
>>>> timeout handler is running.
>>>>
>>>> See this patch here:
>>>>> commit 5918045c4ed492fb5813f980dcf89a90fefd0a4e
>>>>> Author: Christian König 
>>>>> Date:   Thu Apr 18 11:00:21 2019 -0400
>>>>>
>>>>>       drm/scheduler: rework job destruction
>>>> Regards,
>>>> Christian.
>>>>
>>>> Am 08.11.19 um 11:11 schrieb Deng, Emily:
>>>>> Hi Christian,
>>>>> Please refer to follow log, when it enter to
>>>>> amdgpu_device_gpu_recover
>>>> function, the bad job 5086879e is freeing in function
>>>> amdgpu_job_free_cb  at the same time, because of the hardware fence
>> signal.
>>>> But amdgpu_device_gpu_recover goes faster, at this case, the s_fence
>>>> is already freed, but job is not freed in time. Then this issue occurs.
>>>>> [  449.792189] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
>> sdma0
>>>>> timeout, signaled seq=2481, emitted seq=2483 [  449.793202]
>>>>> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
>>>> process  pid 0 thread  pid 0, s_job:5086879e [  449.794163]
>>>> amdgpu
>>>> :00:08.0: GPU reset begin!
>>>>> [  449.794175] Emily:amdgpu_job_free_cb,Process information: process
>>>>> pid 0 thread  pid 0, s_job:5086879e [  449.794221]
>>>>> Emily:amdgpu_job_free_cb,Process information: process  pid 0 thread
>>>>> pid 0, s_job:66eb74ab [  449.794222]
>>>>> Emily:amdgpu_job_free_cb,Process information: process  pid 0 thread
>>>>> pid 0, s_job:d4438ad9 [  449.794255]
>>>>> Emily:amdgpu_job_free_cb,Process information: process  pid 0 thread
>>>>> pid 0, s_job:b6d69c65 [  449.794257]
>>>>> Emily:amdgpu_job_free_cb,Process information: process  pid 0 thread
>>>>> pid 0,
>>>> s_job:ea85e922 [  449.794287]
>>>> Emily:amdgpu_job_free_cb,Process
>>>> information: process  pid 0 thread  pid 0, s_job:ed3a5ac6 [
>>>> 449.794366] BUG: unable to handle kernel NULL pointer dereference at
>>>> 00c0 [  449.800818] PGD 0 P4D 0 [  449.801040] Oops: 
>>>> [#1] SMP PTI
>>>>> [  449.801338] CPU: 3 PID: 55 Comm: kworker/3:1 Tainted: G   OE
>>>> 4.18.0-15-generic #16~18.04.1-Ubuntu
>>>>> [  449.802157] Hardware name: QEMU Standard PC (i440FX + PIIX,
>>>>> 19

Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr

2019-11-08 Thread Koenig, Christian
Hi Emily,

well who is calling amdgpu_device_gpu_recover() in this case?

When it's not the scheduler we shouldn't have a guilty job in the first 
place.

Regards,
Christian.

Am 08.11.19 um 11:22 schrieb Deng, Emily:
> Hi Chrisitan,
>   No, I am with the new branch and also has the patch. Even it are freed 
> by main scheduler, how we could avoid main scheduler to free jobs while enter 
> to function amdgpu_device_gpu_recover?
>
> Best wishes
> Emily Deng
>
>
>
>> -Original Message-
>> From: Koenig, Christian 
>> Sent: Friday, November 8, 2019 6:15 PM
>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>
>> Hi Emily,
>>
>> in this case you are on an old code branch.
>>
>> Jobs are freed now by the main scheduler thread and only if no timeout
>> handler is running.
>>
>> See this patch here:
>>> commit 5918045c4ed492fb5813f980dcf89a90fefd0a4e
>>> Author: Christian König 
>>> Date:   Thu Apr 18 11:00:21 2019 -0400
>>>
>>>      drm/scheduler: rework job destruction
>> Regards,
>> Christian.
>>
>> Am 08.11.19 um 11:11 schrieb Deng, Emily:
>>> Hi Christian,
>>>Please refer to follow log, when it enter to 
>>> amdgpu_device_gpu_recover
>> function, the bad job 5086879e is freeing in function
>> amdgpu_job_free_cb  at the same time, because of the hardware fence signal.
>> But amdgpu_device_gpu_recover goes faster, at this case, the s_fence is
>> already freed, but job is not freed in time. Then this issue occurs.
>>> [  449.792189] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
>>> timeout, signaled seq=2481, emitted seq=2483 [  449.793202]
>>> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
>> process  pid 0 thread  pid 0, s_job:5086879e [  449.794163] amdgpu
>> :00:08.0: GPU reset begin!
>>> [  449.794175] Emily:amdgpu_job_free_cb,Process information: process
>>> pid 0 thread  pid 0, s_job:5086879e [  449.794221]
>>> Emily:amdgpu_job_free_cb,Process information: process  pid 0 thread
>>> pid 0, s_job:66eb74ab [  449.794222]
>>> Emily:amdgpu_job_free_cb,Process information: process  pid 0 thread
>>> pid 0, s_job:d4438ad9 [  449.794255]
>>> Emily:amdgpu_job_free_cb,Process information: process  pid 0 thread
>>> pid 0, s_job:b6d69c65 [  449.794257]
>>> Emily:amdgpu_job_free_cb,Process information: process  pid 0 thread  pid 0,
>> s_job:ea85e922 [  449.794287] Emily:amdgpu_job_free_cb,Process
>> information: process  pid 0 thread  pid 0, s_job:ed3a5ac6
>> [  449.794366] BUG: unable to handle kernel NULL pointer dereference at
>> 00c0 [  449.800818] PGD 0 P4D 0 [  449.801040] Oops: 
>> [#1] SMP PTI
>>> [  449.801338] CPU: 3 PID: 55 Comm: kworker/3:1 Tainted: G   OE
>> 4.18.0-15-generic #16~18.04.1-Ubuntu
>>> [  449.802157] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>>> BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [  449.802944] Workqueue: events
>>> drm_sched_job_timedout [amd_sched] [  449.803488] RIP:
>> 0010:amdgpu_device_gpu_recover+0x1da/0xb60 [amdgpu]
>>> [  449.804020] Code: dd ff ff 49 39 c5 48 89 55 a8 0f 85 56 ff ff ff 45 85 
>>> e4 0f
>> 85 a1 00 00 00 48 8b 45 b0 48 85 c0 0f 84 60 01 00 00 48 8b 40 10 <48> 8b 98
>> c0 00 00 00 48 85 db 0f 84 4c 01 00 00 48 8b 43 48 a8 01
>>> [  449.805593] RSP: 0018:b4c7c08f7d68 EFLAGS: 00010286 [
>>> 449.806032] RAX:  RBX:  RCX:
>>>  [  449.806625] RDX: b4c7c08f5ac0 RSI:
>>> 000fffe0 RDI: 0246 [  449.807224] RBP:
>>> b4c7c08f7de0 R08: 0068b9d54000 R09:  [
>>> 449.807818] R10:  R11: 0148 R12:
>>>  [  449.808411] R13: b4c7c08f7da0 R14:
>>> 8d82b8525d40 R15: 8d82b8525d40 [  449.809004] FS:
>>> () GS:8d82bfd8()
>>> knlGS: [  449.809674] CS:  0010 DS:  ES:  CR0:
>>> 80050033 [  449.810153] CR2: 00c0 CR3:
>>> 3cc0a001 CR4: 003606e0 [  449.810747] DR0:
>>  DR1:  DR2: 
>> [  449.811344] DR3:  DR6: fffe0ff0 DR7:
>> 0400 [  449.811937] Call Trace:
>>> [  449.812206]  amdgpu_job_timedout+0x114/0x140 [amdgpu] [
>>> 4

Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr

2019-11-08 Thread Koenig, Christian
Hi Emily,

in this case you are on an old code branch.

Jobs are freed now by the main scheduler thread and only if no timeout 
handler is running.

See this patch here:
> commit 5918045c4ed492fb5813f980dcf89a90fefd0a4e
> Author: Christian König 
> Date:   Thu Apr 18 11:00:21 2019 -0400
>
>     drm/scheduler: rework job destruction

Regards,
Christian.

Am 08.11.19 um 11:11 schrieb Deng, Emily:
> Hi Christian,
>   Please refer to the following log: when it enters amdgpu_device_gpu_recover, 
> the bad job 5086879e is being freed in 
> amdgpu_job_free_cb at the same time, because of the hardware fence signal. 
> But amdgpu_device_gpu_recover goes faster; in this case, the s_fence is 
> already freed, but the job is not freed in time. Then this issue occurs.
>
> [  449.792189] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, 
> signaled seq=2481, emitted seq=2483
> [  449.793202] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process 
> information: process  pid 0 thread  pid 0, s_job:5086879e
> [  449.794163] amdgpu :00:08.0: GPU reset begin!
> [  449.794175] Emily:amdgpu_job_free_cb,Process information: process  pid 0 
> thread  pid 0, s_job:5086879e
> [  449.794221] Emily:amdgpu_job_free_cb,Process information: process  pid 0 
> thread  pid 0, s_job:66eb74ab
> [  449.794222] Emily:amdgpu_job_free_cb,Process information: process  pid 0 
> thread  pid 0, s_job:d4438ad9
> [  449.794255] Emily:amdgpu_job_free_cb,Process information: process  pid 0 
> thread  pid 0, s_job:b6d69c65
> [  449.794257] Emily:amdgpu_job_free_cb,Process information: process  pid 0 
> thread  pid 0, s_job:ea85e922
> [  449.794287] Emily:amdgpu_job_free_cb,Process information: process  pid 0 
> thread  pid 0, s_job:ed3a5ac6
> [  449.794366] BUG: unable to handle kernel NULL pointer dereference at 
> 00c0
> [  449.800818] PGD 0 P4D 0
> [  449.801040] Oops:  [#1] SMP PTI
> [  449.801338] CPU: 3 PID: 55 Comm: kworker/3:1 Tainted: G   OE 
> 4.18.0-15-generic #16~18.04.1-Ubuntu
> [  449.802157] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> [  449.802944] Workqueue: events drm_sched_job_timedout [amd_sched]
> [  449.803488] RIP: 0010:amdgpu_device_gpu_recover+0x1da/0xb60 [amdgpu]
> [  449.804020] Code: dd ff ff 49 39 c5 48 89 55 a8 0f 85 56 ff ff ff 45 85 e4 
> 0f 85 a1 00 00 00 48 8b 45 b0 48 85 c0 0f 84 60 01 00 00 48 8b 40 10 <48> 8b 
> 98 c0 00 00 00 48 85 db 0f 84 4c 01 00 00 48 8b 43 48 a8 01
> [  449.805593] RSP: 0018:b4c7c08f7d68 EFLAGS: 00010286
> [  449.806032] RAX:  RBX:  RCX: 
> 
> [  449.806625] RDX: b4c7c08f5ac0 RSI: 000fffe0 RDI: 
> 0246
> [  449.807224] RBP: b4c7c08f7de0 R08: 0068b9d54000 R09: 
> 
> [  449.807818] R10:  R11: 0148 R12: 
> 
> [  449.808411] R13: b4c7c08f7da0 R14: 8d82b8525d40 R15: 
> 8d82b8525d40
> [  449.809004] FS:  () GS:8d82bfd8() 
> knlGS:
> [  449.809674] CS:  0010 DS:  ES:  CR0: 80050033
> [  449.810153] CR2: 00c0 CR3: 3cc0a001 CR4: 
> 003606e0
> [  449.810747] DR0:  DR1:  DR2: 
> 
> [  449.811344] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [  449.811937] Call Trace:
> [  449.812206]  amdgpu_job_timedout+0x114/0x140 [amdgpu]
> [  449.812635]  drm_sched_job_timedout+0x44/0x90 [amd_sched]
> [  449.813139]  ? amdgpu_cgs_destroy_device+0x10/0x10 [amdgpu]
> [  449.813609]  ? drm_sched_job_timedout+0x44/0x90 [amd_sched]
> [  449.814077]  process_one_work+0x1fd/0x3f0
> [  449.814417]  worker_thread+0x34/0x410
> [  449.814728]  kthread+0x121/0x140
> [  449.815004]  ? process_one_work+0x3f0/0x3f0
> [  449.815374]  ? kthread_create_worker_on_cpu+0x70/0x70
> [  449.815799]  ret_from_fork+0x35/0x40
>
>> -Original Message-
>> From: Koenig, Christian 
>> Sent: Friday, November 8, 2019 5:43 PM
>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>
>> Am 08.11.19 um 10:39 schrieb Deng, Emily:
>>> Sorry, please take your time.
>> Have you seen my other response a bit below?
>>
>> I can't follow how it would be possible for job->s_fence to be NULL without
>> the job also being freed.
>>
>> So it looks like this patch is just papering over some bigger issues.
>>
>> Regards,
>> Christian.
>>
>>> Best wishes

Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr

2019-11-08 Thread Koenig, Christian
Am 08.11.19 um 10:39 schrieb Deng, Emily:
> Sorry, please take your time.

Have you seen my other response a bit below?

I can't follow how it would be possible for job->s_fence to be NULL 
without the job also being freed.

So it looks like this patch is just papering over some bigger issues.

Regards,
Christian.

>
> Best wishes
> Emily Deng
>
>
>
>> -Original Message-----
>> From: Koenig, Christian 
>> Sent: Friday, November 8, 2019 5:08 PM
>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>
>> Am 08.11.19 um 09:52 schrieb Deng, Emily:
>>> Ping.
>> You need to give me at least enough time to wake up :)
>>
>>>
>>> Best wishes
>>> Emily Deng
>>>
>>>
>>>
>>>> -Original Message-
>>>> From: amd-gfx  On Behalf Of
>>>> Deng, Emily
>>>> Sent: Friday, November 8, 2019 10:56 AM
>>>> To: Koenig, Christian ; amd-
>>>> g...@lists.freedesktop.org
>>>> Subject: RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>>>
>>>>> -Original Message-
>>>>> From: Christian König 
>>>>> Sent: Thursday, November 7, 2019 7:28 PM
>>>>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>>>>
>>>>> Am 07.11.19 um 11:25 schrieb Emily Deng:
>>>>>> When the job is already signaled, the s_fence is freed. Then it
>>>>>> will has null pointer in amdgpu_device_gpu_recover.
>>>>> NAK, the s_fence is only set to NULL when the job is destroyed. See
>>>>> drm_sched_job_cleanup().
>>>> I know it is set to NULL in drm_sched_job_cleanup. But in one case,
>>>> when it enter into the amdgpu_device_gpu_recover, it already in
>>>> drm_sched_job_cleanup, and at this time, it will go to free job. But
>>>> the amdgpu_device_gpu_recover sometimes is faster. At that time, job
>>>> is not freed, but s_fence is already NULL.
>> No, that case can't happen. See here:
>>
>>>      drm_sched_job_cleanup(s_job);
>>>
>>>      amdgpu_ring_priority_put(ring, s_job->s_priority);
>>>      dma_fence_put(job->fence);
>>>      amdgpu_sync_free(&job->sync);
>>>      amdgpu_sync_free(&job->sched_sync);
>>>      kfree(job);
>> The job itself is freed up directly after freeing the reference to the 
>> s_fence.
>>
>> So you are just papering over a much bigger problem here. This patch is a
>> clear NAK.
>>
>> Regards,
>> Christian.
>>
>>>>> When you see a job without an s_fence then that means the problem is
>>>>> somewhere else.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> Signed-off-by: Emily Deng 
>>>>>> ---
>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 +-
>>>>>> drivers/gpu/drm/scheduler/sched_main.c | 11 ++-
>>>>>> 2 files changed, 7 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> index e6ce949..5a8f08e 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> @@ -4075,7 +4075,7 @@ int amdgpu_device_gpu_recover(struct
>>>>> amdgpu_device *adev,
>>>>>>   *
>>>>>>   * job->base holds a reference to parent fence
>>>>>>   */
>>>>>> -if (job && job->base.s_fence->parent &&
>>>>>> +if (job && job->base.s_fence && job->base.s_fence->parent &&
>>>>>>  dma_fence_is_signaled(job->base.s_fence->parent))
>>>>>>  job_signaled = true;
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> index 31809ca..56cc10e 100644
>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>

Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr

2019-11-08 Thread Koenig, Christian
Am 08.11.19 um 09:52 schrieb Deng, Emily:
> Ping.

You need to give me at least enough time to wake up :)

>
>
> Best wishes
> Emily Deng
>
>
>
>> -Original Message-
>> From: amd-gfx  On Behalf Of Deng,
>> Emily
>> Sent: Friday, November 8, 2019 10:56 AM
>> To: Koenig, Christian ; amd-
>> g...@lists.freedesktop.org
>> Subject: RE: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>
>>> -Original Message-
>>> From: Christian König 
>>> Sent: Thursday, November 7, 2019 7:28 PM
>>> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>>
>>> Am 07.11.19 um 11:25 schrieb Emily Deng:
>>>> When the job is already signaled, the s_fence is freed. Then it will
>>>> has null pointer in amdgpu_device_gpu_recover.
>>> NAK, the s_fence is only set to NULL when the job is destroyed. See
>>> drm_sched_job_cleanup().
>> I know it is set to NULL in drm_sched_job_cleanup. But in one case, when it
>> enter into the amdgpu_device_gpu_recover, it already in
>> drm_sched_job_cleanup, and at this time, it will go to free job. But the
>> amdgpu_device_gpu_recover sometimes is faster. At that time, job is not
>> freed, but s_fence is already NULL.

No, that case can't happen. See here:

>     drm_sched_job_cleanup(s_job);
>
>     amdgpu_ring_priority_put(ring, s_job->s_priority);
>     dma_fence_put(job->fence);
>     amdgpu_sync_free(&job->sync);
>     amdgpu_sync_free(&job->sched_sync);
>     kfree(job);

The job itself is freed up directly after freeing the reference to the 
s_fence.

So you are just papering over a much bigger problem here. This patch is 
a clear NAK.

Regards,
Christian.

>>> When you see a job without an s_fence then that means the problem is
>>> somewhere else.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> Signed-off-by: Emily Deng 
>>>> ---
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 +-
>>>>drivers/gpu/drm/scheduler/sched_main.c | 11 ++-
>>>>2 files changed, 7 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index e6ce949..5a8f08e 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -4075,7 +4075,7 @@ int amdgpu_device_gpu_recover(struct
>>> amdgpu_device *adev,
>>>> *
>>>> * job->base holds a reference to parent fence
>>>> */
>>>> -  if (job && job->base.s_fence->parent &&
>>>> +  if (job && job->base.s_fence && job->base.s_fence->parent &&
>>>>dma_fence_is_signaled(job->base.s_fence->parent))
>>>>job_signaled = true;
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>> index 31809ca..56cc10e 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>> @@ -334,8 +334,8 @@ void drm_sched_increase_karma(struct
>>> drm_sched_job
>>>> *bad)
>>>>
>>>>spin_lock(>lock);
>>>>list_for_each_entry_safe(entity, tmp, 
>>>> >entities,
>>> list) {
>>>> -  if (bad->s_fence->scheduled.context ==
>>>> -  entity->fence_context) {
>>>> +  if (bad->s_fence && (bad->s_fence-
>>>> scheduled.context ==
>>>> +  entity->fence_context)) {
>>>>if (atomic_read(>karma) >
>>>>bad->sched->hang_limit)
>>>>if (entity->guilty)
>>>> @@ -376,7 +376,7 @@ void drm_sched_stop(struct drm_gpu_scheduler
>>> *sched, struct drm_sched_job *bad)
>>>> * This iteration is thread safe as sched thread is stopped.
>>>> */
>>>>list_for_each_entry_safe_reverse(s_job, tmp, 
>>>> ring_mirror_lis

Re: [PATCH 1/4] Revert "drm/amdgpu: dont schedule jobs while in reset"

2019-11-07 Thread Koenig, Christian
Am 06.11.19 um 18:51 schrieb Andrey Grodzovsky:
> This reverts commit 3cdf9bd0089723c468d5f6240e54d1afa52e9a04.
>
> We will do a proper fix in next patch.
>
> Signed-off-by: Andrey Grodzovsky 

The order of this one and patch #2 needs to be swapped, otherwise we 
have the bug in between those two commits again.

Apart from that the series is Reviewed-by: Christian König 
.

Thanks,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 +
>   1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 2cdaf3b..6614d8a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -604,11 +604,8 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr 
> *mgr)
>   continue;
>   }
>   
> - for (i = 0; i < num_entities; i++) {
> - mutex_lock(>adev->lock_reset);
> + for (i = 0; i < num_entities; i++)
>   drm_sched_entity_fini(>entities[0][i].entity);
> - mutex_unlock(>adev->lock_reset);
> - }
>   }
>   }
>   


Re: [PATCH] drm/amdgpu: Add comments to gmc structure

2019-11-07 Thread Koenig, Christian
Am 06.11.19 um 21:05 schrieb Zeng, Oak:
> Thanks Alex.
>
>> AGP is also used for page tables in system memory.
> I am not aware of this usage. I thought page tables are all in the frame buffer 
> today. Was this a use case in old ASICs?

No, that is pretty new and only works for Renoir. But we have disabled 
it currently because of bad interactions with IOMMU.

Christian.
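
A small worked example of the CPU-view vs GPU-view split discussed in this thread (illustrative only): the GPU addresses VRAM through vram_start..vram_end, while the CPU reaches the same bytes through the PCI BAR at aper_base.

static u64 vram_mc_to_cpu_phys(struct amdgpu_device *adev, u64 mc_addr)
{
	u64 offset = mc_addr - adev->gmc.vram_start;

	/* only the CPU-visible part of VRAM is reachable through the BAR */
	if (mc_addr < adev->gmc.vram_start ||
	    offset >= adev->gmc.visible_vram_size)
		return 0;

	return adev->gmc.aper_base + offset;
}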

>
> Oak
>
> -Original Message-
> From: Alex Deucher 
> Sent: Wednesday, November 6, 2019 12:37 PM
> To: Zeng, Oak 
> Cc: amd-gfx@lists.freedesktop.org; Kuehling, Felix ; 
> Koenig, Christian 
> Subject: Re: [PATCH] drm/amdgpu: Add comments to gmc structure
>
> On Wed, Nov 6, 2019 at 12:21 PM Zeng, Oak  wrote:
>> Explain fields like aper_base, agp_start etc. The definition of those
>> fields are confusing as they are from different view (CPU or GPU). Add
>> comments for easier understand.
>>
>> Change-Id: I02c2a27cd0dbc205498eb86aafa722f2e0c25fe6
>> Signed-off-by: Oak Zeng 
> A few comments below, otherwise looks good to me.
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 25
>> +
>>   1 file changed, 25 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
>> index 555d8e5..8003201 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
>> @@ -127,18 +127,43 @@ struct amdgpu_xgmi {  };
>>
>>   struct amdgpu_gmc {
>> +   /* FB's physical address in MMIO space (for CPU to
>> +* map FB). This is different compared to the apg/
> apg -> agp
>
>> +* gart/vram_start/end field as the later is from
>> +* GPU's view and aper_base is from CPU's view.
>> +*/
>>  resource_size_t aper_size;
>>  resource_size_t aper_base;
>>  /* for some chips with <= 32MB we need to lie
>>   * about vram size near mc fb location */
>>  u64 mc_vram_size;
>>  u64 visible_vram_size;
>> +   /* APG aperture start and end in MC address space
> APG -> AGP
>
>> +* Driver find a hole in the MC address space
>> +* to place AGP by setting MC_VM_AGP_BOT/TOP registers
>> +* Under VMID0, logical address == MC address
>> +* AGP aperture is used to simulate FB in ZFB case
>> +*/
> You may want to add a comment that the AGP aperture just maps to physical bus 
> or IOVA addresses on the platform.  It's also used for page tables in system 
> memory.
>
>>  u64 agp_size;
>>  u64 agp_start;
>>  u64 agp_end;
>> +   /* GART aperture start and end in MC address space
>> +* Driver find a hole in the MC address space
>> +* to place GART by setting VM_CONTEXT0_PAGE_TABLE_START/END_ADDR
>> +* registers
>> +* Under VMID0, logical address inside GART aperture will
>> +* be translated through gpuvm gart page table to access
>> +* paged system memory
>> +*/
>>  u64 gart_size;
>>  u64 gart_start;
>>  u64 gart_end;
>> +   /* Frame buffer aperture of this GPU device. Different from
>> +* fb_start (see below), this only covers the local GPU device.
>> +* Driver get fb_start from MC_VM_FB_LOCATION_BASE (set by vbios)
>> +* and calculate vram_start of this local device by adding an
>> +* offset inside the XGMI hive.
>> +*/
>>  u64 vram_start;
>>  u64 vram_end;
>>  /* FB region , it's same as local vram region in single GPU,
>> in XGMI
>> --
>> 2.7.4
>>


Re: [PATCH v2] drm/amdgpu: fix double reference dropping

2019-11-06 Thread Koenig, Christian
Am 06.11.19 um 12:35 schrieb Pan Bian:
> The reference to object fence is dropped at the end of the loop.
> However, it is dropped again outside the loop. The reference can be
> dropped immediately after calling dma_fence_wait() in the loop and
> thus the dropping operation outside the loop can be removed.
>
> Signed-off-by: Pan Bian 

Reviewed-by: Christian König 

> ---
> v2: fix the bug in a more concise way
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 6 ++
>   1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> index 649e68c4479b..d1495e1c9289 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> @@ -33,7 +33,7 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device 
> *adev, unsigned size,
>   {
>   unsigned long start_jiffies;
>   unsigned long end_jiffies;
> - struct dma_fence *fence = NULL;
> + struct dma_fence *fence;
>   int i, r;
>   
>   start_jiffies = jiffies;
> @@ -44,16 +44,14 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device 
> *adev, unsigned size,
>   if (r)
>   goto exit_do_move;
>   r = dma_fence_wait(fence, false);
> + dma_fence_put(fence);
>   if (r)
>   goto exit_do_move;
> - dma_fence_put(fence);
>   }
>   end_jiffies = jiffies;
>   r = jiffies_to_msecs(end_jiffies - start_jiffies);
>   
>   exit_do_move:
> - if (fence)
> - dma_fence_put(fence);
>   return r;
>   }
>   


Re: [PATCH 1/2] drm/amdgpu: add dummy read by engines for some GCVM status registers in gfx10

2019-11-06 Thread Koenig, Christian
Am 06.11.19 um 11:52 schrieb Zhu, Changfeng:
> From: changzhu 
>
> The GRBM register interface is now capable of bursting 1 cycle per
> register for wr->wr and wr->rd, much faster than the previous multicycle
> per transaction done interface.  This has caused a problem where
> status registers requiring HW to update have a 1 cycle delay, due
> to the register update having to go through GRBM.
>
> For cp ucode, the dummy read is already implemented in the cp firmware. It
> covers the use of the WAIT_REG_MEM operation 1 case only, so gfx10 needs to
> call gfx_v10_0_wait_reg_mem. Besides, it also needs to add a warning to
> update the firmware in case the firmware is too old to implement the
> dummy read in the cp firmware.
>
> For sdma ucode, the dummy read hasn't been implemented in the sdma firmware.
> sdma is moved to gfxhub in gfx10, so the dummy read needs to be added in the
> driver between amdgpu_ring_emit_wreg and amdgpu_ring_emit_reg_wait for sdma_v5_0.
>
> Change-Id: Ie028f37eb789966d4593984bd661b248ebeb1ac3
> Signed-off-by: changzhu 

Reviewed-by: Christian König 

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |  1 +
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 48 +
>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  |  8 ++---
>   drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c  | 13 ++-
>   4 files changed, 64 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index 459aa9059542..a74ecd449775 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -267,6 +267,7 @@ struct amdgpu_gfx {
>   uint32_tmec2_feature_version;
>   boolmec_fw_write_wait;
>   boolme_fw_write_wait;
> + boolcp_fw_write_wait;
>   struct amdgpu_ring  gfx_ring[AMDGPU_MAX_GFX_RINGS];
>   unsignednum_gfx_rings;
>   struct amdgpu_ring  compute_ring[AMDGPU_MAX_COMPUTE_RINGS];
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 17a5cbfd0024..c7a6f98bf6b8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -561,6 +561,32 @@ static void gfx_v10_0_free_microcode(struct 
> amdgpu_device *adev)
>   kfree(adev->gfx.rlc.register_list_format);
>   }
>   
> +static void gfx_v10_0_check_fw_write_wait(struct amdgpu_device *adev)
> +{
> + adev->gfx.cp_fw_write_wait = false;
> +
> + switch (adev->asic_type) {
> + case CHIP_NAVI10:
> + case CHIP_NAVI12:
> + case CHIP_NAVI14:
> + if ((adev->gfx.me_fw_version >= 0x0046) &&
> + (adev->gfx.me_feature_version >= 27) &&
> + (adev->gfx.pfp_fw_version >= 0x0068) &&
> + (adev->gfx.pfp_feature_version >= 27) &&
> + (adev->gfx.mec_fw_version >= 0x005b) &&
> + (adev->gfx.mec_feature_version >= 27))
> + adev->gfx.cp_fw_write_wait = true;
> + break;
> + default:
> + break;
> + }
> +
> + if (adev->gfx.cp_fw_write_wait == false)
> + DRM_WARN_ONCE("Warning: check cp_fw_version and update it to 
> realize \
> +   GRBM requires 1-cycle delay in cp firmware\n");
> +}
> +
> +
>   static void gfx_v10_0_init_rlc_ext_microcode(struct amdgpu_device *adev)
>   {
>   const struct rlc_firmware_header_v2_1 *rlc_hdr;
> @@ -829,6 +855,7 @@ static int gfx_v10_0_init_microcode(struct amdgpu_device 
> *adev)
>   }
>   }
>   
> + gfx_v10_0_check_fw_write_wait(adev);
>   out:
>   if (err) {
>   dev_err(adev->dev,
> @@ -4768,6 +4795,24 @@ static void gfx_v10_0_ring_emit_reg_wait(struct 
> amdgpu_ring *ring, uint32_t reg,
>   gfx_v10_0_wait_reg_mem(ring, 0, 0, 0, reg, 0, val, mask, 0x20);
>   }
>   
> +static void gfx_v10_0_ring_emit_reg_write_reg_wait(struct amdgpu_ring *ring,
> +uint32_t reg0, uint32_t reg1,
> +uint32_t ref, uint32_t mask)
> +{
> + int usepfp = (ring->funcs->type == AMDGPU_RING_TYPE_GFX);
> + struct amdgpu_device *adev = ring->adev;
> + bool fw_version_ok = false;
> +
> + fw_version_ok = adev->gfx.cp_fw_write_wait;
> +
> + if (fw_version_ok)
> + gfx_v10_0_wait_reg_mem(ring, usepfp, 0, 1, reg0, reg1,
> +ref, mask, 0x20);
> + else
> + amdgpu_ring_emit_reg_write_reg_wait_helper(ring, reg0, reg1,
> +ref, mask);
> +}
> +
>   static void
>   gfx_v10_0_set_gfx_eop_interrupt_state(struct amdgpu_device *adev,
> uint32_t me, uint32_t pipe,
> @@ -5158,6 +5203,7 @@ static const struct amdgpu_ring_funcs 
> 

Re: [PATCH] drm/amdgpu: fix double reference dropping

2019-11-06 Thread Koenig, Christian
Am 06.11.19 um 10:53 schrieb Pan Bian:
> After dropping the reference of object fence in the loop, it should be
> set to NULL to protecting dropping its reference again outside the loop.

NAK, the actual bug is that we shouldn't drop the fence outside the loop 
in the first place.

Just move the dma_fence_put(fence); two lines up and drop initializing 
fence to NULL.

Regards,
Christian.

>
> Signed-off-by: Pan Bian 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> index 649e68c4479b..3174093f35f3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
> @@ -47,6 +47,7 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device 
> *adev, unsigned size,
>   if (r)
>   goto exit_do_move;
>   dma_fence_put(fence);
> + fence = NULL;
>   }
>   end_jiffies = jiffies;
>   r = jiffies_to_msecs(end_jiffies - start_jiffies);

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: fix potential double drop fence reference

2019-11-06 Thread Koenig, Christian
Am 06.11.19 um 10:14 schrieb Pan Bian:
> The object fence is not set to NULL after its reference is dropped. As a
> result, its reference may be dropped again if error occurs after that,
> which may lead to a use after free bug. To avoid the issue, fence is
> explicitly set to NULL after dropping its reference.
>
> Signed-off-by: Pan Bian 

Acked-by: Christian König 

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_test.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_test.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_test.c
> index b66d29d5ffa2..b158230af8db 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_test.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_test.c
> @@ -138,6 +138,7 @@ static void amdgpu_do_test_moves(struct amdgpu_device 
> *adev)
>   }
>   
>   dma_fence_put(fence);
> + fence = NULL;
>   
>   r = amdgpu_bo_kmap(vram_obj, _map);
>   if (r) {
> @@ -183,6 +184,7 @@ static void amdgpu_do_test_moves(struct amdgpu_device 
> *adev)
>   }
>   
>   dma_fence_put(fence);
> + fence = NULL;
>   
>   r = amdgpu_bo_kmap(gtt_obj[i], _map);
>   if (r) {


Re: [PATCH 1/2] drm/amdgpu: add dummy read by engines for some GCVM status registers in gfx10

2019-11-06 Thread Koenig, Christian
Am 06.11.19 um 09:21 schrieb Zhu, Changfeng:
> From: changzhu 
>
> The GRBM register interface is now capable of bursting 1 cycle per
> register for wr->wr and wr->rd, much faster than the previous multicycle
> per transaction done interface.  This has caused a problem where
> status registers requiring HW to update have a 1 cycle delay, due
> to the register update having to go through GRBM.
>
> For cp ucode, the dummy read is already implemented in the cp firmware. It
> covers the use of the WAIT_REG_MEM operation 1 case only, so gfx10 needs to
> call gfx_v10_0_wait_reg_mem. Besides, it also needs to add a warning to
> update the firmware in case the firmware is too old to implement the
> dummy read in the cp firmware.
>
> For sdma ucode, the dummy read hasn't been implemented in the sdma firmware.
> sdma is moved to gfxhub in gfx10, so the dummy read needs to be added in the
> driver between amdgpu_ring_emit_wreg and amdgpu_ring_emit_reg_wait for sdma_v5_0.
>
> Change-Id: Ie028f37eb789966d4593984bd661b248ebeb1ac3
> Signed-off-by: changzhu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |  1 +
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 48 +
>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  |  8 ++---
>   drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c  | 13 ++-
>   4 files changed, 64 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index 459aa9059542..a74ecd449775 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -267,6 +267,7 @@ struct amdgpu_gfx {
>   uint32_tmec2_feature_version;
>   boolmec_fw_write_wait;
>   boolme_fw_write_wait;
> + boolcp_fw_write_wait;
>   struct amdgpu_ring  gfx_ring[AMDGPU_MAX_GFX_RINGS];
>   unsignednum_gfx_rings;
>   struct amdgpu_ring  compute_ring[AMDGPU_MAX_COMPUTE_RINGS];
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 17a5cbfd0024..acdb0e4df9b4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -561,6 +561,32 @@ static void gfx_v10_0_free_microcode(struct 
> amdgpu_device *adev)
>   kfree(adev->gfx.rlc.register_list_format);
>   }
>   
> +static void gfx_v10_0_check_fw_write_wait(struct amdgpu_device *adev)
> +{
> + adev->gfx.cp_fw_write_wait = false;
> +
> + switch (adev->asic_type) {
> + case CHIP_NAVI10:
> + case CHIP_NAVI12:
> + case CHIP_NAVI14:
> + if ((adev->gfx.me_fw_version >= 0x0046) &&
> + (adev->gfx.me_feature_version >= 27) &&
> + (adev->gfx.pfp_fw_version >= 0x0068) &&
> + (adev->gfx.pfp_feature_version >= 27) &&
> + (adev->gfx.mec_fw_version >= 0x005b) &&
> + (adev->gfx.mec_feature_version >= 27))
> + adev->gfx.cp_fw_write_wait = true;
> + break;
> + default:
> + break;
> + }
> +
> + if (adev->gfx.cp_fw_write_wait == false)
> + DRM_WARN_ONCE("Warning: check cp_fw_version and update it to 
> realize \
> +   GRBM requires 1-cycle delay in cp 
> firmware\n");
> +}
> +
> +
>   static void gfx_v10_0_init_rlc_ext_microcode(struct amdgpu_device *adev)
>   {
>   const struct rlc_firmware_header_v2_1 *rlc_hdr;
> @@ -4768,6 +4794,25 @@ static void gfx_v10_0_ring_emit_reg_wait(struct 
> amdgpu_ring *ring, uint32_t reg,
>   gfx_v10_0_wait_reg_mem(ring, 0, 0, 0, reg, 0, val, mask, 0x20);
>   }
>   
> +static void gfx_v10_0_ring_emit_reg_write_reg_wait(struct amdgpu_ring *ring,
> +   uint32_t reg0, uint32_t reg1,
> +   uint32_t ref, uint32_t mask)
> +{
> + int usepfp = (ring->funcs->type == AMDGPU_RING_TYPE_GFX);
> + struct amdgpu_device *adev = ring->adev;
> + bool fw_version_ok = false;
> +
> + gfx_v10_0_check_fw_write_wait(adev);

Doing that here is invalid, the function can be called concurrently from 
multiple threads.

You either need to do this only once from some of the _init() callbacks 
or drop the adev->gfx.cp_fw_write_wait variable and instead return the 
result here.
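
For illustration, a minimal sketch of that second variant (the helper name is
made up here and the version thresholds are simply copied from the patch);
making the check side-effect free means it can safely be called from any
context:

static bool gfx_v10_0_cp_fw_can_write_wait(struct amdgpu_device *adev)
{
	/* no writes to adev, so concurrent callers are harmless */
	switch (adev->asic_type) {
	case CHIP_NAVI10:
	case CHIP_NAVI12:
	case CHIP_NAVI14:
		return (adev->gfx.me_fw_version >= 0x0046) &&
		       (adev->gfx.me_feature_version >= 27) &&
		       (adev->gfx.pfp_fw_version >= 0x0068) &&
		       (adev->gfx.pfp_feature_version >= 27) &&
		       (adev->gfx.mec_fw_version >= 0x005b) &&
		       (adev->gfx.mec_feature_version >= 27);
	default:
		return false;
	}
}

The first variant would instead keep the adev->gfx.cp_fw_write_wait flag, but
fill it exactly once from an _init() callback and only ever read it in the
emit path.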

> + fw_version_ok = adev->gfx.cp_fw_write_wait;
> +
> + if (fw_version_ok)
> + gfx_v10_0_wait_reg_mem(ring, usepfp, 0, 1, reg0, reg1,
> +   ref, mask, 0x20);
> + else
> + amdgpu_ring_emit_reg_write_reg_wait_helper(ring, reg0, reg1,
> +ref, mask);
> +}
> +
>   static void
>   gfx_v10_0_set_gfx_eop_interrupt_state(struct amdgpu_device *adev,
> uint32_t me, uint32_t pipe,
> @@ -5158,6 +5203,7 @@ static const 

Re: [PATCH] drm/amdgpu: remove PT BOs when unmapping

2019-11-05 Thread Koenig, Christian
Hi Eric,

Ah! Yeah that is a well known issue.

Basic problem is that for releasing the BOs we need to reserve them to 
check if they are idle or not.

I've got a branch with a TTM change to avoid that, but essentially that 
is a huge problem which needs a rather big change in memory management 
to fix.
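
Just to illustrate the pattern (the helper name is invented and this is not
the TTM change from that branch), the check roughly looks like this:

static bool amdgpu_bo_is_idle(struct amdgpu_bo *bo)
{
	bool idle;

	/* we must hold the reservation before looking at the fences */
	if (amdgpu_bo_reserve(bo, true))
		return false;

	/* all fences signaled -> nobody is using the BO anymore */
	idle = dma_resv_test_signaled_rcu(bo->tbo.base.resv, true);

	amdgpu_bo_unreserve(bo);
	return idle;
}

Having to take the reservation of every page table BO just to tear it down is
the part the TTM change on that branch is meant to avoid.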

Regards,
Christian.

Am 05.11.19 um 17:27 schrieb Huang, JinHuiEric:
> Hi Christian,
>
> I found the reason why page tables are not freed when unmapping. All the
> pts are reserved, then they are not freed until vm fini. So the
> consequences are old pts and new pts for the same VAs will exist till vm
> fini. In the KFD big buffer stress test, repeatedly mapping and
> unmapping a big range of system memory causes huge vram page-table usage
> accumulation.
>
> I tried to avoid generating duplicated pts during unmapping in
> amdgpu_vm_update_ptes() by skipping amdgpu_vm_free_pts() and not
> reserving the lowest pts, but those attempts didn't work and triggered VM
> faults. The only way that works is skipping the whole function
> amdgpu_vm_update_ptes(), but that seems wrong, because we have to update the
> GPU VM MMU.
>
> So there is no bug in amdgpu_vm_update_ptes(), but the accumulation of
> pts vram usage is an overhead. What do you think we can do to get a better
> solution?
>
> Regards,
>
> Eric
>
> On 2019-10-31 10:33 a.m., Huang, JinHuiEric wrote:
>> The hardware is vega10 and test is KFDMemoryTest.BigBufferStressTest.
>> More detail is on Jira SWDEV-201443.
>>
>> Regards,
>>
>> Eric
>>
>> On 2019-10-31 10:08 a.m., StDenis, Tom wrote:
>>> I could try it on my carrizo/polaris setup.  Is there a test procedure I
>>> could follow to trigger the changed code paths?
>>>
>>>
>>> Tom
>>>
>>> On 2019-10-31 6:41 a.m., Koenig, Christian wrote:
>>>> Just tested this and amdgpu_vm_update_ptes() indeed works as expected.
>>>>
>>>> When you free at least 2MB, the lowest level of page tables is freed
>>>> up again.
>>>>
>>>> BTW: What hardware have you tested this on? On gfx8 and older it is
>>>> expected that page tables are never freed.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> Am 30.10.19 um 19:11 schrieb Christian König:
>>>>> Then I don't see how this patch actually changes anything.
>>>>>
>>>>> Could only be a bug in amdgpu_vm_update_ptes(). Going to investigate
>>>>> this, but I won't have time to look into the ticket in detail.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> Am 30.10.19 um 19:00 schrieb Huang, JinHuiEric:
>>>>>> Actually I do prevent removing in-use pts with this:
>>>>>>
>>>>>> +   r = amdgpu_vm_remove_ptes(adev, vm,
>>>>>> +   (mapping->start + 0x1ff) & (~0x1ffll),
>>>>>> +   (mapping->last + 1) & (~0x1ffll));
>>>>>>
>>>>>> This only removes page tables aligned to 2M, and I have tested
>>>>>> it at least with the KFD tests without anything breaking.
>>>>>>
>>>>>> By the way, I am not familiar with the memory stuff. This patch is the
>>>>>> best I can do for now. Could you take a look at the Jira ticket
>>>>>> SWDEV-201443 and find a better solution? Thanks!
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Eric
>>>>>>
>>>>>> On 2019-10-30 1:57 p.m., Christian König wrote:
>>>>>>> One thing I've forgotten:
>>>>>>>
>>>>>>> What you could maybe do to improve the situation is to join
>>>>>>> adjacent ranges in amdgpu_vm_clear_freed(), but I'm not sure how
>>>>>>> the chances are that the ranges are freed all together.
>>>>>>>
>>>>>>> The only other alternative I can see would be to check the mappings
>>>>>>> of a range in amdgpu_update_ptes() and see if you could walk the
>>>>>>> tree up if the valid flag is not set and there are no mappings left
>>>>>>> for a page table.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 30.10.19 um 18:42 schrieb Koenig, Christian:
>>>>>>>>> The vaild flag doesn't take effect in this function.
>>>&g

Re: [PATCH 1/2] drm/amdgpu: add dummy read by engines for some GCVM status registers in gfx10

2019-11-05 Thread Koenig, Christian
Am 05.11.19 um 12:42 schrieb Zhu, Changfeng:
> From: changzhu 
>
> The GRBM register interface is now capable of bursting 1 cycle per
> register wr->wr, wr->rd much faster than previous muticycle per
> transaction done interface.  This has caused a problem where
> status registers requiring HW to update have a 1 cycle delay, due
> to the register update having to go through GRBM.
>
> For cp ucode, it has realized dummy read in cp firmware. It covers
> the use of WAIT_REG_MEM operation 1 case only. So it needs to call
> gfx_v10_0_wait_reg_mem in gfx10. Besides it also needs to add warning to
> update firmware in case firmware is too old to have function to realize
> dummy read in cp firmware.
>
> For sdma ucode, it hasn't realized dummy read in sdma firmware. sdma is
> moved to gfxhub in gfx10. So it needs to add dummy read in driver
> between amdgpu_ring_emit_wreg and amdgpu_ring_emit_reg_wait for sdma_v5_0.
>
> Change-Id: Ie028f37eb789966d4593984bd661b248ebeb1ac3
> Signed-off-by: changzhu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |  1 +
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 47 +
>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  |  8 ++---
>   drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c  | 13 ++-
>   4 files changed, 63 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index 459aa9059542..a74ecd449775 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -267,6 +267,7 @@ struct amdgpu_gfx {
>   uint32_tmec2_feature_version;
>   boolmec_fw_write_wait;
>   boolme_fw_write_wait;
> + boolcp_fw_write_wait;
>   struct amdgpu_ring  gfx_ring[AMDGPU_MAX_GFX_RINGS];
>   unsignednum_gfx_rings;
>   struct amdgpu_ring  compute_ring[AMDGPU_MAX_COMPUTE_RINGS];
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 17a5cbfd0024..e82b6d796b69 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -561,6 +561,32 @@ static void gfx_v10_0_free_microcode(struct 
> amdgpu_device *adev)
>   kfree(adev->gfx.rlc.register_list_format);
>   }
>   
> +static void gfx_v10_0_check_fw_write_wait(struct amdgpu_device *adev)
> +{
> + adev->gfx.cp_fw_write_wait = false;
> +
> + switch (adev->asic_type) {
> + case CHIP_NAVI10:
> + case CHIP_NAVI12:
> + case CHIP_NAVI14:
> + if ((adev->gfx.me_fw_version >= 0x0046) &&
> + (adev->gfx.me_feature_version >= 27) &&
> + (adev->gfx.pfp_fw_version >= 0x0068) &&
> + (adev->gfx.pfp_feature_version >= 27) &&
> + (adev->gfx.mec_fw_version >= 0x005b) &&
> + (adev->gfx.mec_feature_version >= 27))
> + adev->gfx.cp_fw_write_wait = true;
> + break;
> + default:
> + break;
> + }
> +
> + if (adev->gfx.cp_fw_write_wait == false)
> + DRM_WARN_ONCE("Warning: check cp_fw_version and update it to 
> realize \
> +   GRBM requires 1-cycle delay in cp 
> firmware\n");
> +}
> +
> +
>   static void gfx_v10_0_init_rlc_ext_microcode(struct amdgpu_device *adev)
>   {
>   const struct rlc_firmware_header_v2_1 *rlc_hdr;
> @@ -4768,6 +4794,25 @@ static void gfx_v10_0_ring_emit_reg_wait(struct 
> amdgpu_ring *ring, uint32_t reg,
>   gfx_v10_0_wait_reg_mem(ring, 0, 0, 0, reg, 0, val, mask, 0x20);
>   }
>   
> +static void gfx_v10_0_ring_emit_reg_write_reg_wait(struct amdgpu_ring *ring,
> +   uint32_t reg0, uint32_t reg1,
> +   uint32_t ref, uint32_t mask)
> +{
> + int usepfp = (ring->funcs->type == AMDGPU_RING_TYPE_GFX);
> + struct amdgpu_device *adev = ring->adev;
> + bool fw_version_ok = false;
> +
> + gfx_v10_0_check_fw_write_wait(adev);
> + fw_version_ok = adev->gfx.cp_fw_write_wait;
> +
> + if (fw_version_ok)
> + gfx_v10_0_wait_reg_mem(ring, usepfp, 0, 1, reg0, reg1,
> +   ref, mask, 0x20);
> + else
> + amdgpu_ring_emit_reg_write_reg_wait_helper(ring, reg0, reg1,
> +ref, mask);
> +}
> +
>   static void
>   gfx_v10_0_set_gfx_eop_interrupt_state(struct amdgpu_device *adev,
> uint32_t me, uint32_t pipe,
> @@ -5158,6 +5203,7 @@ static const struct amdgpu_ring_funcs 
> gfx_v10_0_ring_funcs_gfx = {
>   .emit_tmz = gfx_v10_0_ring_emit_tmz,
>   .emit_wreg = gfx_v10_0_ring_emit_wreg,
>   .emit_reg_wait = gfx_v10_0_ring_emit_reg_wait,
> + .emit_reg_write_reg_wait = 

Re: [PATCH 2/2] drm/amdgpu: add warning for GRBM 1-cycle delay issue in gfx9

2019-11-05 Thread Koenig, Christian
Am 05.11.19 um 12:42 schrieb Zhu, Changfeng:
> From: changzhu 
>
> It needs to add warning to update firmware in gfx9
> in case that firmware is too old to have function to
> realize dummy read in cp firmware.
>
> Change-Id: I6aef94f0823138f244f1eedb62fde833dd697023
> Signed-off-by: changzhu 

Reviewed-by: Christian König  for this one.

> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 7 +++
>   1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 9d5f900e3e1c..f2deb225c8a9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -982,6 +982,13 @@ static void gfx_v9_0_check_fw_write_wait(struct 
> amdgpu_device *adev)
>   adev->gfx.me_fw_write_wait = false;
>   adev->gfx.mec_fw_write_wait = false;
>   
> + if ((adev->gfx.mec_fw_version < 0x01a5) ||
> + (adev->gfx.mec_feature_version < 46) ||
> + (adev->gfx.pfp_fw_version < 0x00b7) ||
> + (adev->gfx.pfp_feature_version < 46))
> + DRM_WARN_ONCE("Warning: check cp_fw_version and update it to 
> realize \
> + GRBM requires 1-cycle delay in 
> cp firmware\n");
> +
>   switch (adev->asic_type) {
>   case CHIP_VEGA10:
>   if ((adev->gfx.me_fw_version >= 0x009c) &&


Re: [PATCH] drm/amdgpu: add dummy read by engines for some GCVM status registers

2019-11-05 Thread Koenig, Christian
Am 05.11.19 um 11:21 schrieb Zhu, Changfeng:
> Hi Chris,
>
> Maybe it's better to use amdgpu_ring_emit_reg_wait(ring, reg0, 0, 0); to 
> replace
> amdgpu_ring_emit_reg_wait(ring, reg1, 0, 0); ?

Good point. I've mixed up request and acknowledge register.

The important thing is that you need 0 as both mask and value, otherwise we
could potentially wait forever.
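
To make that concrete, a rough sketch of the dummy read being discussed
(register and helper names just follow the existing gmc/sdma emit path and
are placeholders, not the final patch):

static void gmc_v10_0_emit_flush_with_dummy_read(struct amdgpu_ring *ring,
						 unsigned int vmid, uint32_t req)
{
	struct amdgpu_vmhub *hub = &ring->adev->vmhub[ring->funcs->vmhub];
	unsigned int eng = ring->vm_inv_eng;

	amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_req + eng, req);

	/* dummy read of the REQ register: value 0 with mask 0 always matches,
	 * so this forces one read back through GRBM but can never hang
	 */
	amdgpu_ring_emit_reg_wait(ring, hub->vm_inv_eng0_req + eng, 0, 0);

	/* the real wait on the acknowledge register keeps its proper mask */
	amdgpu_ring_emit_reg_wait(ring, hub->vm_inv_eng0_ack + eng,
				  1 << vmid, 1 << vmid);
}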

Regards,
Christian.

>
> http://ontrack-internal.amd.com/browse/SWDEV-192660
> Jira ticket recommends to read VM_INVALIDATE_ENG*_REQ.
>
> BR,
> Changfeng.
>
> -----Original Message-
> From: Koenig, Christian 
> Sent: Tuesday, November 5, 2019 5:13 PM
> To: Zhu, Changfeng ; amd-gfx@lists.freedesktop.org; 
> Tuikov, Luben ; Huang, Ray ; Huang, 
> Shimmer 
> Subject: Re: [PATCH] drm/amdgpu: add dummy read by engines for some GCVM 
> status registers
>
> Am 05.11.19 um 07:32 schrieb Zhu, Changfeng:
>> From: changzhu 
>>
>> The GRBM register interface is now capable of bursting 1 cycle per
>> register wr->wr, wr->rd much faster than previous muticycle per
>> transaction done interface.  This has caused a problem where status
>> registers requiring HW to update have a 1 cycle delay, due to the
>> register update having to go through GRBM.
>>
>> For cp ucode, it has realized dummy read in cp firmware.It covers the
>> use of WAIT_REG_MEM operation 1 case only.So it needs to call
>> gfx_v10_0_wait_reg_mem in gfx10. Besides it also needs to add warning
>> to update firmware in case firmware is too old to have function to
>> realize dummy read in cp firmware.
>>
>> For sdma ucode, it hasn't realized dummy read in sdma firmware. sdma
>> is moved to gfxhub in gfx10. So it needs to add dummy read in driver
>> between amdgpu_ring_emit_wreg and amdgpu_ring_emit_reg_wait for sdma_v5_0.
> First of all, thanks for getting your environment set up properly; we are
> finally making progress with that issue.
>
> A bunch of nice-to-have comments below and two major bugs/typos which really
> need to be fixed.
>
>> Change-Id: Ie028f37eb789966d4593984bd661b248ebeb1ac3
>> Signed-off-by: changzhu 
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |  1 +
>>drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 50 +
>>drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   |  7 
>>drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  |  8 ++--
>>drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c  | 13 ++-
>>5 files changed, 73 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>> index 459aa9059542..a74ecd449775 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>> @@ -267,6 +267,7 @@ struct amdgpu_gfx {
>>  uint32_tmec2_feature_version;
>>  boolmec_fw_write_wait;
>>  boolme_fw_write_wait;
>> +boolcp_fw_write_wait;
>>  struct amdgpu_ring  gfx_ring[AMDGPU_MAX_GFX_RINGS];
>>  unsignednum_gfx_rings;
>>  struct amdgpu_ring  compute_ring[AMDGPU_MAX_COMPUTE_RINGS];
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> index 17a5cbfd0024..814764723c26 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> @@ -561,6 +561,32 @@ static void gfx_v10_0_free_microcode(struct 
>> amdgpu_device *adev)
>>  kfree(adev->gfx.rlc.register_list_format);
>>}
>>
>> +static void gfx_v10_0_check_fw_write_wait(struct amdgpu_device *adev)
>> +{
>> +adev->gfx.cp_fw_write_wait = false;
>> +
>> +switch (adev->asic_type) {
>> +case CHIP_NAVI10:
>> +case CHIP_NAVI12:
>> +case CHIP_NAVI14:
>> +if ((adev->gfx.me_fw_version >= 0x0046) &&
>> +(adev->gfx.me_feature_version >= 27) &&
>> +(adev->gfx.pfp_fw_version >= 0x0068) &&
>> +(adev->gfx.pfp_feature_version >= 27) &&
>> +(adev->gfx.mec_fw_version >= 0x005b) &&
>> +(adev->gfx.mec_feature_version >= 27))
>> +adev->gfx.cp_fw_write_wait = true;
>> +break;
>> +default:
>> +break;
>> +}
>> +
>> +if (adev->gfx.cp_fw_write_wait == false)
&g

Re: [PATCH] drm/amdgpu: add dummy read by engines for some GCVM status registers

2019-11-05 Thread Koenig, Christian
Am 05.11.19 um 07:32 schrieb Zhu, Changfeng:
> From: changzhu 
>
> The GRBM register interface is now capable of bursting 1 cycle per
> register wr->wr, wr->rd much faster than previous muticycle per
> transaction done interface.  This has caused a problem where
> status registers requiring HW to update have a 1 cycle delay, due
> to the register update having to go through GRBM.
>
> For cp ucode, it has realized dummy read in cp firmware. It covers the
> use of WAIT_REG_MEM operation 1 case only. So it needs to call
> gfx_v10_0_wait_reg_mem in gfx10. Besides it also needs to add warning to
> update firmware in case firmware is too old to have function to realize
> dummy read in cp firmware.
>
> For sdma ucode, it hasn't realized dummy read in sdma firmware. sdma is
> moved to gfxhub in gfx10. So it needs to add dummy read in driver
> between amdgpu_ring_emit_wreg and amdgpu_ring_emit_reg_wait for sdma_v5_0.

First of all, thanks for getting your environment set up properly; we are
finally making progress with that issue.

A bunch of nice-to-have comments below and two major bugs/typos which
really need to be fixed.

>
> Change-Id: Ie028f37eb789966d4593984bd661b248ebeb1ac3
> Signed-off-by: changzhu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |  1 +
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 50 +
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   |  7 
>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  |  8 ++--
>   drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c  | 13 ++-
>   5 files changed, 73 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index 459aa9059542..a74ecd449775 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -267,6 +267,7 @@ struct amdgpu_gfx {
>   uint32_tmec2_feature_version;
>   boolmec_fw_write_wait;
>   boolme_fw_write_wait;
> + boolcp_fw_write_wait;
>   struct amdgpu_ring  gfx_ring[AMDGPU_MAX_GFX_RINGS];
>   unsignednum_gfx_rings;
>   struct amdgpu_ring  compute_ring[AMDGPU_MAX_COMPUTE_RINGS];
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 17a5cbfd0024..814764723c26 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -561,6 +561,32 @@ static void gfx_v10_0_free_microcode(struct 
> amdgpu_device *adev)
>   kfree(adev->gfx.rlc.register_list_format);
>   }
>   
> +static void gfx_v10_0_check_fw_write_wait(struct amdgpu_device *adev)
> +{
> + adev->gfx.cp_fw_write_wait = false;
> +
> + switch (adev->asic_type) {
> + case CHIP_NAVI10:
> + case CHIP_NAVI12:
> + case CHIP_NAVI14:
> + if ((adev->gfx.me_fw_version >= 0x0046) &&
> + (adev->gfx.me_feature_version >= 27) &&
> + (adev->gfx.pfp_fw_version >= 0x0068) &&
> + (adev->gfx.pfp_feature_version >= 27) &&
> + (adev->gfx.mec_fw_version >= 0x005b) &&
> + (adev->gfx.mec_feature_version >= 27))
> + adev->gfx.cp_fw_write_wait = true;
> + break;
> + default:
> + break;
> + }
> +
> + if (adev->gfx.cp_fw_write_wait == false)
> + DRM_WARN_ONCE("Warning: check cp_fw_version and update it to 
> realize \
> +   GRBM requires 1-cycle delay in cp 
> firmware\n");
> +}
> +
> +
>   static void gfx_v10_0_init_rlc_ext_microcode(struct amdgpu_device *adev)
>   {
>   const struct rlc_firmware_header_v2_1 *rlc_hdr;
> @@ -4768,6 +4794,28 @@ static void gfx_v10_0_ring_emit_reg_wait(struct 
> amdgpu_ring *ring, uint32_t reg,
>   gfx_v10_0_wait_reg_mem(ring, 0, 0, 0, reg, 0, val, mask, 0x20);
>   }
>   
> +static void gfx_v10_0_ring_emit_reg_write_reg_wait(struct amdgpu_ring *ring,
> +   uint32_t reg0, uint32_t reg1,
> +   uint32_t ref, uint32_t mask)
> +{
> + int usepfp = (ring->funcs->type == AMDGPU_RING_TYPE_GFX);
> + struct amdgpu_device *adev = ring->adev;
> + bool fw_version_ok = false;
> +
> + gfx_v10_0_check_fw_write_wait(adev);
> +
> + if (ring->funcs->type == AMDGPU_RING_TYPE_GFX ||
> + ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE)

That check is probably superfluous. A few lines below you are using the 
function in the gfx_v10_0_ring_funcs_gfx and 
gfx_v10_0_ring_funcs_compute, so the ring->funcs->type is always constant.

> + fw_version_ok = adev->gfx.cp_fw_write_wait;
> +
> + if (fw_version_ok)
> + gfx_v10_0_wait_reg_mem(ring, usepfp, 0, 1, reg0, reg1,
> +   ref, mask, 0x20);
> + else
> + 

Re: [PATCH] drm/amdgpu: remove PT BOs when unmapping

2019-10-31 Thread Koenig, Christian
Just tested this and amdgpu_vm_update_ptes() indeed works as expected.

When you free at least 2MB, the lowest level of page tables is freed up again.

BTW: What hardware have you tested this on? On gfx8 and older it is expected 
that page tables are never freed.

Regards,
Christian.

Am 30.10.19 um 19:11 schrieb Christian König:
Then I don't see how this patch actually changes anything.

Could only be a bug in amdgpu_vm_update_ptes(). Going to investigate this, but 
I won't have time to look into the ticket in detail.

Regards,
Christian.

Am 30.10.19 um 19:00 schrieb Huang, JinHuiEric:

Actually I do prevent removing in-use pts with this:

+   r = amdgpu_vm_remove_ptes(adev, vm,
+   (mapping->start + 0x1ff) & (~0x1ffll),
+   (mapping->last + 1) & (~0x1ffll));

This only removes page tables aligned to 2M, and I have tested it at least
on the KFD tests without anything breaking.
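
For the record, what that 0x1ff masking works out to (mapping->start/last are
in GPU pages, assuming the usual 4 KiB page size and 512 PTEs per last-level
page table):

	0x1ff = 511, i.e. the mask aligns to 512 pages
	512 pages * 4 KiB = 2 MiB, the range covered by one PTB

	aligned start = (mapping->start + 511) & ~511   /* rounded up   */
	aligned end   = (mapping->last  + 1)   & ~511   /* rounded down */

So only page tables whose full 2 MiB window lies inside the unmapped range
get freed; partially covered ones are left alone.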

By the way, I am not familiar with the memory stuff. This patch is the best I
can do for now. Could you take a look at the Jira ticket SWDEV-201443 and find
a better solution? Thanks!

Regards,

Eric

On 2019-10-30 1:57 p.m., Christian König wrote:
One thing I've forgotten:

What you could maybe do to improve the situation is to join adjacent ranges in 
amdgpu_vm_clear_freed(), but I'm not sure how the chances are that the ranges 
are freed all together.

The only other alternative I can see would be to check the mappings of a range 
in amdgpu_update_ptes() and see if you could walk the tree up if the valid flag 
is not set and there are no mappings left for a page table.

Regards,
Christian.

Am 30.10.19 um 18:42 schrieb Koenig, Christian:
The valid flag doesn't take effect in this function.
That's irrelevant.

What amdgpu_vm_update_ptes() does is first determine the fragment size:
amdgpu_vm_fragment(params, frag_start, end, flags, &frag, &frag_end);

Then we walk down the tree:
amdgpu_vm_pt_start(adev, params->vm, start, &cursor);
while (cursor.pfn < end) {

And make sure that the page tables covering the address range are actually 
allocated:
r = amdgpu_vm_alloc_pts(params->adev, params->vm, &cursor);

Then we update the tables with the flags and addresses and free up subsequent 
tables in the case of huge pages or freed up areas:
/* Free all child entries */
while (cursor.pfn < frag_start) {
amdgpu_vm_free_pts(adev, params->vm, &cursor);
amdgpu_vm_pt_next(adev, &cursor);
}

This is the maximum you can free, cause all other page tables are not 
completely covered by the range and so potentially still in use.

And I have the strong suspicion that this is what your patch is actually doing 
wrong. In other words you are also freeing page tables which are only partially 
covered by the range and so potentially still in use.

Since we don't have any tracking how many entries in a page table are currently 
valid and how many are invalid we actually can't implement what you are trying 
to do here. So the patch is definitely somehow broken.

Regards,
Christian.

Am 30.10.19 um 17:55 schrieb Huang, JinHuiEric:

The valid flag doesn't take effect in this function. amdgpu_vm_alloc_pts() is
always executed, depending only on "cursor.pfn < end". The valid flag is only
checked here for ASICs below GMC v9:

if (adev->asic_type < CHIP_VEGA10 &&
(flags & AMDGPU_PTE_VALID))...

Regards,

Eric

On 2019-10-30 12:30 p.m., Koenig, Christian wrote:


Am 30.10.2019 17:19 schrieb "Huang, JinHuiEric" 
<mailto:jinhuieric.hu...@amd.com>:

I tested it, and it saves a lot of vram on the KFD big buffer stress test. I think
there are two reasons:

1. Calling amdgpu_vm_update_ptes() during unmapping will allocate unnecessary
pts, because there is no flag in amdgpu_vm_update_ptes() to determine whether
the VA is being mapped or unmapped. Avoiding that saves the most memory.

That's not correct. The valid flag is used for this.


2. Intentionally removing those unmapped pts is the logical expectation,
although it does not remove that many pts.

Well I actually don't see a change to what update_ptes is doing and have the 
strong suspicion that the patch is simply broken.

You either free page tables which are potentially still in use or update_pte 
doesn't free page tables when the valid bit is not set.

Regards,
Christian.




Regards,

Eric

On 2019-10-30 11:57 a.m., Koenig, Christian wrote:


Am 30.10.2019 16:47 schrieb "Kuehling, Felix" 
<mailto:felix.kuehl...@amd.com>:
On 2019-10-30 9:52 a.m., Christian König wrote:
> Am 29.10.19 um 21:06 schrieb Huang, JinHuiEric:
>> The issue is PT BOs are not freed when unmapping VA,
>> which causes vram usage accumulated is huge in some
>> memory stress test, such as kfd big buffer stress 

Re: [PATCH] drm/amdgpu: remove PT BOs when unmapping

2019-10-30 Thread Koenig, Christian
The valid flag doesn't take effect in this function.
That's irrelevant.

What amdgpu_vm_update_ptes() does is first determine the fragment size:
amdgpu_vm_fragment(params, frag_start, end, flags, &frag, &frag_end);

Then we walk down the tree:
amdgpu_vm_pt_start(adev, params->vm, start, &cursor);
while (cursor.pfn < end) {

And make sure that the page tables covering the address range are actually 
allocated:
r = amdgpu_vm_alloc_pts(params->adev, params->vm, &cursor);

Then we update the tables with the flags and addresses and free up subsequent 
tables in the case of huge pages or freed up areas:
/* Free all child entries */
while (cursor.pfn < frag_start) {
amdgpu_vm_free_pts(adev, params->vm, &cursor);
amdgpu_vm_pt_next(adev, &cursor);
}

This is the maximum you can free, cause all other page tables are not 
completely covered by the range and so potentially still in use.

And I have the strong suspicion that this is what your patch is actually doing 
wrong. In other words you are also freeing page tables which are only partially 
covered by the range and so potentially still in use.

Since we don't have any tracking how many entries in a page table are currently 
valid and how many are invalid we actually can't implement what you are trying 
to do here. So the patch is definitely somehow broken.

Regards,
Christian.

Am 30.10.19 um 17:55 schrieb Huang, JinHuiEric:

The valid flag doesn't take effect in this function. amdgpu_vm_alloc_pts() is
always executed, depending only on "cursor.pfn < end". The valid flag is only
checked here for ASICs below GMC v9:

if (adev->asic_type < CHIP_VEGA10 &&
(flags & AMDGPU_PTE_VALID))...

Regards,

Eric

On 2019-10-30 12:30 p.m., Koenig, Christian wrote:


Am 30.10.2019 17:19 schrieb "Huang, JinHuiEric" 
<mailto:jinhuieric.hu...@amd.com>:

I tested it, and it saves a lot of vram on the KFD big buffer stress test. I think
there are two reasons:

1. Calling amdgpu_vm_update_ptes() during unmapping will allocate unnecessary
pts, because there is no flag in amdgpu_vm_update_ptes() to determine whether
the VA is being mapped or unmapped. Avoiding that saves the most memory.

That's not correct. The valid flag is used for this.


2. Intentionally removing those unmapped pts is the logical expectation,
although it does not remove that many pts.

Well I actually don't see a change to what update_ptes is doing and have the 
strong suspicion that the patch is simply broken.

You either free page tables which are potentially still in use or update_pte 
doesn't free page tables when the valid bit is not set.

Regards,
Christian.




Regards,

Eric

On 2019-10-30 11:57 a.m., Koenig, Christian wrote:


Am 30.10.2019 16:47 schrieb "Kuehling, Felix" 
<mailto:felix.kuehl...@amd.com>:
On 2019-10-30 9:52 a.m., Christian König wrote:
> Am 29.10.19 um 21:06 schrieb Huang, JinHuiEric:
>> The issue is PT BOs are not freed when unmapping VA,
>> which causes vram usage accumulated is huge in some
>> memory stress test, such as kfd big buffer stress test.
>> Function amdgpu_vm_bo_update_mapping() is called by both
>> amdgpu_vm_bo_update() and amdgpu_vm_clear_freed(). The
>> solution is replacing amdgpu_vm_bo_update_mapping() in
>> amdgpu_vm_clear_freed() with removing PT BOs function
>> to save vram usage.
>
> NAK, that is intentional behavior.
>
> Otherwise we can run into out of memory situations when page tables
> need to be allocated again under stress.

That's a bit arbitrary and inconsistent. We are freeing page tables in
other situations, when a mapping uses huge pages in
amdgpu_vm_update_ptes. Why not when a mapping is destroyed completely?

I'm actually a bit surprised that the huge-page handling in
amdgpu_vm_update_ptes isn't kicking in to free up lower-level page
tables when a BO is unmapped.

Well it does free the lower level, and that is already causing problems (that's 
why I added the reserved space).

What we don't do is freeing the higher levels.

E.g. when you free a 2MB BO we free the lowest level, if we free a 1GB BO we 
free the two lowest levels etc...

The problem with freeing the higher levels is that you don't know who is also 
using this. E.g. we would need to check all entries when we unmap one.

It's simply not worth it for a maximum saving of 2MB per VM.

Writing this, I'm actually wondering how you ran into this issue? There
shouldn't be much saving from this.

Regards,
Christian.


Regards,
   Felix


>
> Regards,
> Christian.
>
>>
>> Change-Id: Ic24e35bff8ca85265b418a642373f189d972a924
>> Signed-off-by: Eric Huang 
>> <mailto:jinhuieric.hu...@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 56
&

Re: [PATCH] drm/amdgpu: remove PT BOs when unmapping

2019-10-30 Thread Koenig, Christian


Am 30.10.2019 17:19 schrieb "Huang, JinHuiEric" :

I tested it, and it saves a lot of vram on the KFD big buffer stress test. I think
there are two reasons:

1. Calling amdgpu_vm_update_ptes() during unmapping will allocate unnecessary
pts, because there is no flag in amdgpu_vm_update_ptes() to determine whether
the VA is being mapped or unmapped. Avoiding that saves the most memory.

That's not correct. The valid flag is used for this.


2. Intentionally removing those unmapped pts is the logical expectation,
although it does not remove that many pts.

Well I actually don't see a change to what update_ptes is doing and have the 
strong suspicion that the patch is simply broken.

You either free page tables which are potentially still in use or update_pte 
doesn't free page tables when the valid bit is not set.

Regards,
Christian.




Regards,

Eric

On 2019-10-30 11:57 a.m., Koenig, Christian wrote:


Am 30.10.2019 16:47 schrieb "Kuehling, Felix" 
<mailto:felix.kuehl...@amd.com>:
On 2019-10-30 9:52 a.m., Christian König wrote:
> Am 29.10.19 um 21:06 schrieb Huang, JinHuiEric:
>> The issue is PT BOs are not freed when unmapping VA,
>> which causes vram usage accumulated is huge in some
>> memory stress test, such as kfd big buffer stress test.
>> Function amdgpu_vm_bo_update_mapping() is called by both
>> amdgpu_vm_bo_update() and amdgpu_vm_clear_freed(). The
>> solution is replacing amdgpu_vm_bo_update_mapping() in
>> amdgpu_vm_clear_freed() with removing PT BOs function
>> to save vram usage.
>
> NAK, that is intentional behavior.
>
> Otherwise we can run into out of memory situations when page tables
> need to be allocated again under stress.

That's a bit arbitrary and inconsistent. We are freeing page tables in
other situations, when a mapping uses huge pages in
amdgpu_vm_update_ptes. Why not when a mapping is destroyed completely?

I'm actually a bit surprised that the huge-page handling in
amdgpu_vm_update_ptes isn't kicking in to free up lower-level page
tables when a BO is unmapped.

Well it does free the lower level, and that is already causing problems (that's 
why I added the reserved space).

What we don't do is freeing the higher levels.

E.g. when you free a 2MB BO we free the lowest level, if we free a 1GB BO we 
free the two lowest levels etc...

The problem with freeing the higher levels is that you don't know who is also 
using this. E.g. we would need to check all entries when we unmap one.

It's simply not worth it for a maximum saving of 2MB per VM.

Writing this, I'm actually wondering how you ran into this issue? There
shouldn't be much saving from this.

Regards,
Christian.


Regards,
   Felix


>
> Regards,
> Christian.
>
>>
>> Change-Id: Ic24e35bff8ca85265b418a642373f189d972a924
>> Signed-off-by: Eric Huang 
>> <mailto:jinhuieric.hu...@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 56
>> +-
>>   1 file changed, 48 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 0f4c3b2..8a480c7 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -1930,6 +1930,51 @@ static void amdgpu_vm_prt_fini(struct
>> amdgpu_device *adev, struct amdgpu_vm *vm)
>>   }
>> /**
>> + * amdgpu_vm_remove_ptes - free PT BOs
>> + *
>> + * @adev: amdgpu device structure
>> + * @vm: amdgpu vm structure
>> + * @start: start of mapped range
>> + * @end: end of mapped entry
>> + *
>> + * Free the page table level.
>> + */
>> +static int amdgpu_vm_remove_ptes(struct amdgpu_device *adev,
>> +struct amdgpu_vm *vm, uint64_t start, uint64_t end)
>> +{
>> +struct amdgpu_vm_pt_cursor cursor;
>> +unsigned shift, num_entries;
>> +
>> +amdgpu_vm_pt_start(adev, vm, start, );
>> +while (cursor.level < AMDGPU_VM_PTB) {
>> +if (!amdgpu_vm_pt_descendant(adev, ))
>> +return -ENOENT;
>> +}
>> +
>> +while (cursor.pfn < end) {
>> +amdgpu_vm_free_table(cursor.entry);
>> +num_entries = amdgpu_vm_num_entries(adev, cursor.level - 1);
>> +
>> +if (cursor.entry != >entries[num_entries - 1]) {
>> +/* Next ptb entry */
>> +shift = amdgpu_vm_level_shift(adev, cursor.level - 1);
>> +cursor.pfn += 1ULL << shift;
>> +cursor.pfn &= ~((1ULL << shift) - 1);
>> +cursor.entry++;
>> +} else {
>> +/* Next ptb entry in next pd0 entry */
>> +amdgpu_vm_pt_ancestor()

Re: [PATCH] drm/amdgpu: remove PT BOs when unmapping

2019-10-30 Thread Koenig, Christian


Am 30.10.2019 16:47 schrieb "Kuehling, Felix" :
On 2019-10-30 9:52 a.m., Christian König wrote:
> Am 29.10.19 um 21:06 schrieb Huang, JinHuiEric:
>> The issue is PT BOs are not freed when unmapping VA,
>> which causes vram usage accumulated is huge in some
>> memory stress test, such as kfd big buffer stress test.
>> Function amdgpu_vm_bo_update_mapping() is called by both
>> amdgpu_vm_bo_update() and amdgpu_vm_clear_freed(). The
>> solution is replacing amdgpu_vm_bo_update_mapping() in
>> amdgpu_vm_clear_freed() with removing PT BOs function
>> to save vram usage.
>
> NAK, that is intentional behavior.
>
> Otherwise we can run into out of memory situations when page tables
> need to be allocated again under stress.

That's a bit arbitrary and inconsistent. We are freeing page tables in
other situations, when a mapping uses huge pages in
amdgpu_vm_update_ptes. Why not when a mapping is destroyed completely?

I'm actually a bit surprised that the huge-page handling in
amdgpu_vm_update_ptes isn't kicking in to free up lower-level page
tables when a BO is unmapped.

Well it does free the lower level, and that is already causing problems (that's 
why I added the reserved space).

What we don't do is freeing the higher levels.

E.g. when you free a 2MB BO we free the lowest level, if we free a 1GB BO we 
free the two lowest levels etc...

The problem with freeing the higher levels is that you don't know who is also 
using this. E.g. we would need to check all entries when we unmap one.

It's simply not worth it for a maximum saving of 2MB per VM.
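
To put rough numbers on the levels (assuming the usual 4-level layout with 9
bits translated per level and 4 KiB pages):

	one PTB        covers 512 * 4 KiB = 2 MiB
	one PDB0 entry covers 512 * 2 MiB = 1 GiB
	one PDB1 entry covers 512 * 1 GiB = 512 GiB

and each non-root table is itself only 512 entries * 8 bytes = 4 KiB of VRAM,
so even 512 of those tables add up to no more than 2 MiB.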

Writing this, I'm actually wondering how you ran into this issue? There
shouldn't be much saving from this.

Regards,
Christian.


Regards,
   Felix


>
> Regards,
> Christian.
>
>>
>> Change-Id: Ic24e35bff8ca85265b418a642373f189d972a924
>> Signed-off-by: Eric Huang 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 56
>> +-
>>   1 file changed, 48 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 0f4c3b2..8a480c7 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -1930,6 +1930,51 @@ static void amdgpu_vm_prt_fini(struct
>> amdgpu_device *adev, struct amdgpu_vm *vm)
>>   }
>> /**
>> + * amdgpu_vm_remove_ptes - free PT BOs
>> + *
>> + * @adev: amdgpu device structure
>> + * @vm: amdgpu vm structure
>> + * @start: start of mapped range
>> + * @end: end of mapped entry
>> + *
>> + * Free the page table level.
>> + */
>> +static int amdgpu_vm_remove_ptes(struct amdgpu_device *adev,
>> +struct amdgpu_vm *vm, uint64_t start, uint64_t end)
>> +{
>> +struct amdgpu_vm_pt_cursor cursor;
>> +unsigned shift, num_entries;
>> +
>> +amdgpu_vm_pt_start(adev, vm, start, );
>> +while (cursor.level < AMDGPU_VM_PTB) {
>> +if (!amdgpu_vm_pt_descendant(adev, ))
>> +return -ENOENT;
>> +}
>> +
>> +while (cursor.pfn < end) {
>> +amdgpu_vm_free_table(cursor.entry);
>> +num_entries = amdgpu_vm_num_entries(adev, cursor.level - 1);
>> +
>> +if (cursor.entry != >entries[num_entries - 1]) {
>> +/* Next ptb entry */
>> +shift = amdgpu_vm_level_shift(adev, cursor.level - 1);
>> +cursor.pfn += 1ULL << shift;
>> +cursor.pfn &= ~((1ULL << shift) - 1);
>> +cursor.entry++;
>> +} else {
>> +/* Next ptb entry in next pd0 entry */
>> +amdgpu_vm_pt_ancestor();
>> +shift = amdgpu_vm_level_shift(adev, cursor.level - 1);
>> +cursor.pfn += 1ULL << shift;
>> +cursor.pfn &= ~((1ULL << shift) - 1);
>> +amdgpu_vm_pt_descendant(adev, );
>> +}
>> +}
>> +
>> +return 0;
>> +}
>> +
>> +/**
>>* amdgpu_vm_clear_freed - clear freed BOs in the PT
>>*
>>* @adev: amdgpu_device pointer
>> @@ -1949,7 +1994,6 @@ int amdgpu_vm_clear_freed(struct amdgpu_device
>> *adev,
>> struct dma_fence **fence)
>>   {
>>   struct amdgpu_bo_va_mapping *mapping;
>> -uint64_t init_pte_value = 0;
>>   struct dma_fence *f = NULL;
>>   int r;
>>   @@ -1958,13 +2002,10 @@ int amdgpu_vm_clear_freed(struct
>> amdgpu_device *adev,
>>   struct amdgpu_bo_va_mapping, list);
>>   list_del(>list);
>>   -if (vm->pte_support_ats &&
>> -mapping->start < AMDGPU_GMC_HOLE_START)
>> -init_pte_value = AMDGPU_PTE_DEFAULT_ATC;
>> +r = amdgpu_vm_remove_ptes(adev, vm,
>> +(mapping->start + 0x1ff) & (~0x1ffll),
>> +(mapping->last + 1) & (~0x1ffll));
>>   -r = amdgpu_vm_bo_update_mapping(adev, vm, false, NULL,
>> -mapping->start, mapping->last,
>> -init_pte_value, 0, NULL, );
>>   amdgpu_vm_free_mapping(adev, vm, mapping, f);
>>   if (r) {
>>  

Re: [PATCH] drm/amdgpu: guard ib scheduling while in reset

2019-10-30 Thread Koenig, Christian
Yeah, and exactly that's the problem :) You need a global lock covering 
all schedulers.

Otherwise you end up in hell's kitchen again with taking all those locks 
in the right order.

Christian.

Am 30.10.19 um 15:56 schrieb Grodzovsky, Andrey:
> Can you elaborate on what the tricky part with the lock is? I assumed
> we would just use a per-scheduler lock.
>
> Andrey
>
> On 10/30/19 10:50 AM, Christian König wrote:
>> A lock inside the scheduler is rather tricky to implement.
>>
>> What you need to do is to get rid of the park()/unpark() hack in
>> drm_sched_entity_fini().
>>
>> We could do this with a struct completion or convert the scheduler
>> from a thread to a work item.
>>
>> Regards,
>> Christian.
>>
>> Am 30.10.19 um 15:44 schrieb Grodzovsky, Andrey:
>>> That's good as proof of the RCA, but I still think we should grab a dedicated
>>> lock inside the scheduler, since the race is internal to the scheduler code,
>>> so it's better to handle it inside the scheduler code to make the fix apply
>>> for all drivers using it.
>>>
>>> Andrey
>>>
>>> On 10/30/19 4:44 AM, S, Shirish wrote:
>>> We still have it, and doesn't doing kthread_park()/unpark() from
>>> drm_sched_entity_fini while a GPU reset is in progress defeat the whole
>>> purpose of drm_sched_stop->kthread_park? If
>>> drm_sched_entity_fini->kthread_unpark happens AFTER
>>> drm_sched_stop->kthread_park, nothing prevents another (third)
>>> thread from continuing to submit jobs to the HW, which will be picked up
>>> by the unparked scheduler thread and submitted to the HW only to fail
>>> because the HW ring is deactivated.
>>>
>>> If so maybe we should serialize calls to
>>> kthread_park/unpark(sched->thread) ?
>>>
>> Yeah, that was my thinking as well. Probably best to just grab the
>> reset lock before calling drm_sched_entity_fini().
> Shirish - please try locking &adev->lock_reset around calls to
> drm_sched_entity_fini as Christian suggests and see if this actually
> helps the issue.
>
 Yes that also works.

 Regards,


Re: [PATCH] drm/amdgpu: dont schedule jobs while in reset

2019-10-30 Thread Koenig, Christian
Am 30.10.19 um 10:13 schrieb S, Shirish:
> [Why]
>
> doing kthread_park()/unpark() from drm_sched_entity_fini
> while GPU reset is in progress defeats all the purpose of
> drm_sched_stop->kthread_park.
> If drm_sched_entity_fini->kthread_unpark() happens AFTER
> drm_sched_stop->kthread_park nothing prevents from another
> (third) thread to keep submitting job to HW which will be
> picked up by the unparked scheduler thread and try to submit
> to HW but fail because the HW ring is deactivated.
>
> [How]
> grab the reset lock before calling drm_sched_entity_fini()
>
> Signed-off-by: Shirish S 
> Suggested-by: Christian König 

Patch itself is Reviewed-by: Christian König 

Does that also fix the problems you have been seeing?

Thanks,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 5 -
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 6614d8a..2cdaf3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -604,8 +604,11 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr 
> *mgr)
>   continue;
>   }
>   
> - for (i = 0; i < num_entities; i++)
> + for (i = 0; i < num_entities; i++) {
> + mutex_lock(&ctx->adev->lock_reset);
>   drm_sched_entity_fini(&ctx->entities[0][i].entity);
> + mutex_unlock(&ctx->adev->lock_reset);
> + }
>   }
>   }
>   


Re: [PATCH v2 13/15] drm/amdgpu: Use mmu_range_insert instead of hmm_mirror

2019-10-29 Thread Koenig, Christian
Am 28.10.19 um 21:10 schrieb Jason Gunthorpe:
> From: Jason Gunthorpe 
>
> Remove the interval tree in the driver and rely on the tree maintained by
> the mmu_notifier for delivering mmu_notifier invalidation callbacks.
>
> For some reason amdgpu has a very complicated arrangement where it tries
> to prevent duplicate entries in the interval_tree; this is not necessary,
> each amdgpu_bo can be its own stand-alone entry. interval_tree already
> allows duplicates and overlaps in the tree.
>
> Also, there is no need to remove entries upon a release callback, the
> mmu_range API safely allows objects to remain registered beyond the
> lifetime of the mm. The driver only has to stop touching the pages during
> release.
>
> Cc: Alex Deucher 
> Cc: Christian König 
> Cc: David (ChunMing) Zhou 
> Cc: amd-gfx@lists.freedesktop.org
> Signed-off-by: Jason Gunthorpe 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   2 +
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   5 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|   1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c| 341 --
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h|   4 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h|  13 +-
>   6 files changed, 84 insertions(+), 282 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index bd37df5dd6d048..60591a5d420021 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1006,6 +1006,8 @@ struct amdgpu_device {
>   struct mutex  lock_reset;
>   struct amdgpu_doorbell_index doorbell_index;
>   
> + struct mutexnotifier_lock;
> +
>   int asic_reset_res;
>   struct work_struct  xgmi_reset_work;
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 6d021ecc8d598f..47700302a08b7f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -481,8 +481,7 @@ static void remove_kgd_mem_from_kfd_bo_list(struct 
> kgd_mem *mem,
>*
>* Returns 0 for success, negative errno for errors.
>*/
> -static int init_user_pages(struct kgd_mem *mem, struct mm_struct *mm,
> -uint64_t user_addr)
> +static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr)
>   {
>   struct amdkfd_process_info *process_info = mem->process_info;
>   struct amdgpu_bo *bo = mem->bo;
> @@ -1195,7 +1194,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>   add_kgd_mem_to_kfd_bo_list(*mem, avm->process_info, user_addr);
>   
>   if (user_addr) {
> - ret = init_user_pages(*mem, current->mm, user_addr);
> + ret = init_user_pages(*mem, user_addr);
>   if (ret)
>   goto allocate_init_user_pages_failed;
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 5a1939dbd4e3e6..38f97998aaddb2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2633,6 +2633,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   mutex_init(>virt.vf_errors.lock);
>   hash_init(adev->mn_hash);
>   mutex_init(>lock_reset);
> + mutex_init(>notifier_lock);
>   mutex_init(>virt.dpm_mutex);
>   mutex_init(>psp.mutex);
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> index 31d4deb5d29484..4ffd7b90f4d907 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> @@ -50,66 +50,6 @@
>   #include "amdgpu.h"
>   #include "amdgpu_amdkfd.h"
>   
> -/**
> - * struct amdgpu_mn_node
> - *
> - * @it: interval node defining start-last of the affected address range
> - * @bos: list of all BOs in the affected address range
> - *
> - * Manages all BOs which are affected of a certain range of address space.
> - */
> -struct amdgpu_mn_node {
> - struct interval_tree_node   it;
> - struct list_headbos;
> -};
> -
> -/**
> - * amdgpu_mn_destroy - destroy the HMM mirror
> - *
> - * @work: previously sheduled work item
> - *
> - * Lazy destroys the notifier from a work item
> - */
> -static void amdgpu_mn_destroy(struct work_struct *work)
> -{
> - struct amdgpu_mn *amn = container_of(work, struct amdgpu_mn, work);
> - struct amdgpu_device *adev = amn->adev;
> - struct amdgpu_mn_node *node, *next_node;
> - struct amdgpu_bo *bo, *next_bo;
> -
> - mutex_lock(>mn_lock);
> - down_write(>lock);
> - hash_del(>node);
> - rbtree_postorder_for_each_entry_safe(node, next_node,
> -  >objects.rb_root, it.rb) {
> - list_for_each_entry_safe(bo, next_bo, >bos, mn_list) {
> - bo->mn = NULL;
> -   

Re: [PATCH v2 12/15] drm/amdgpu: Call find_vma under mmap_sem

2019-10-29 Thread Koenig, Christian
Am 28.10.19 um 21:10 schrieb Jason Gunthorpe:
> From: Jason Gunthorpe 
>
> find_vma() must be called under the mmap_sem, reorganize this code to
> do the vma check after entering the lock.
>
> Further, fix the unlocked use of struct task_struct's mm, instead use
> the mm from hmm_mirror which has an active mm_grab. Also the mm_grab
> must be converted to a mm_get before acquiring mmap_sem or calling
> find_vma().
>
> Fixes: 66c45500bfdc ("drm/amdgpu: use new HMM APIs and helpers")
> Fixes: 0919195f2b0d ("drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in 
> worker threads")
> Cc: Alex Deucher 
> Cc: Christian König 
> Cc: David (ChunMing) Zhou 
> Cc: amd-gfx@lists.freedesktop.org
> Signed-off-by: Jason Gunthorpe 

Acked-by: Christian König 

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 37 ++---
>   1 file changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index dff41d0a85fe96..c0e41f1f0c2365 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -35,6 +35,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
>   #include 
>   #include 
>   #include 
> @@ -788,7 +789,7 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
> struct page **pages)
>   struct hmm_mirror *mirror = bo->mn ? >mn->mirror : NULL;
>   struct ttm_tt *ttm = bo->tbo.ttm;
>   struct amdgpu_ttm_tt *gtt = (void *)ttm;
> - struct mm_struct *mm = gtt->usertask->mm;
> + struct mm_struct *mm;
>   unsigned long start = gtt->userptr;
>   struct vm_area_struct *vma;
>   struct hmm_range *range;
> @@ -796,25 +797,14 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
> struct page **pages)
>   uint64_t *pfns;
>   int r = 0;
>   
> - if (!mm) /* Happens during process shutdown */
> - return -ESRCH;
> -
>   if (unlikely(!mirror)) {
>   DRM_DEBUG_DRIVER("Failed to get hmm_mirror\n");
> - r = -EFAULT;
> - goto out;
> + return -EFAULT;
>   }
>   
> - vma = find_vma(mm, start);
> - if (unlikely(!vma || start < vma->vm_start)) {
> - r = -EFAULT;
> - goto out;
> - }
> - if (unlikely((gtt->userflags & AMDGPU_GEM_USERPTR_ANONONLY) &&
> - vma->vm_file)) {
> - r = -EPERM;
> - goto out;
> - }
> + mm = mirror->hmm->mmu_notifier.mm;
> + if (!mmget_not_zero(mm)) /* Happens during process shutdown */
> + return -ESRCH;
>   
>   range = kzalloc(sizeof(*range), GFP_KERNEL);
>   if (unlikely(!range)) {
> @@ -847,6 +837,17 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
> struct page **pages)
>   hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT);
>   
>   down_read(>mmap_sem);
> + vma = find_vma(mm, start);
> + if (unlikely(!vma || start < vma->vm_start)) {
> + r = -EFAULT;
> + goto out_unlock;
> + }
> + if (unlikely((gtt->userflags & AMDGPU_GEM_USERPTR_ANONONLY) &&
> + vma->vm_file)) {
> + r = -EPERM;
> + goto out_unlock;
> + }
> +
>   r = hmm_range_fault(range, 0);
>   up_read(>mmap_sem);
>   
> @@ -865,15 +866,19 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
> struct page **pages)
>   }
>   
>   gtt->range = range;
> + mmput(mm);
>   
>   return 0;
>   
> +out_unlock:
> + up_read(>mmap_sem);
>   out_free_pfns:
>   hmm_range_unregister(range);
>   kvfree(pfns);
>   out_free_ranges:
>   kfree(range);
>   out:
> + mmput(mm);
>   return r;
>   }
>   


Re: [PATCH v2 07/15] drm/radeon: use mmu_range_notifier_insert

2019-10-29 Thread Koenig, Christian
Am 28.10.19 um 21:10 schrieb Jason Gunthorpe:
> From: Jason Gunthorpe 
>
> The new API is an exact match for the needs of radeon.
>
> For some reason radeon tries to remove overlapping ranges from the
> interval tree, but interval trees (and mmu_range_notifier_insert)
> support overlapping ranges directly. Simply delete all this code.
>
> Since this driver is missing a invalidate_range_end callback, but
> still calls get_user_pages(), it cannot be correct against all races.
>
> Cc: Alex Deucher 
> Cc: Christian König 
> Cc: David (ChunMing) Zhou 
> Cc: amd-gfx@lists.freedesktop.org
> Cc: Petr Cvek 
> Signed-off-by: Jason Gunthorpe 

Reviewed-by: Christian König 

> ---
>   drivers/gpu/drm/radeon/radeon.h|   9 +-
>   drivers/gpu/drm/radeon/radeon_mn.c | 219 ++---
>   2 files changed, 52 insertions(+), 176 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index d59b004f669583..27959f3ace1152 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -68,6 +68,10 @@
>   #include 
>   #include 
>   
> +#ifdef CONFIG_MMU_NOTIFIER
> +#include 
> +#endif
> +
>   #include 
>   #include 
>   #include 
> @@ -509,8 +513,9 @@ struct radeon_bo {
>   struct ttm_bo_kmap_obj  dma_buf_vmap;
>   pid_t   pid;
>   
> - struct radeon_mn*mn;
> - struct list_headmn_list;
> +#ifdef CONFIG_MMU_NOTIFIER
> + struct mmu_range_notifier   notifier;
> +#endif
>   };
>   #define gem_to_radeon_bo(gobj) container_of((gobj), struct radeon_bo, 
> tbo.base)
>   
> diff --git a/drivers/gpu/drm/radeon/radeon_mn.c 
> b/drivers/gpu/drm/radeon/radeon_mn.c
> index dbab9a3a969b9e..d3d41e20a64922 100644
> --- a/drivers/gpu/drm/radeon/radeon_mn.c
> +++ b/drivers/gpu/drm/radeon/radeon_mn.c
> @@ -36,131 +36,51 @@
>   
>   #include "radeon.h"
>   
> -struct radeon_mn {
> - struct mmu_notifier mn;
> -
> - /* objects protected by lock */
> - struct mutexlock;
> - struct rb_root_cached   objects;
> -};
> -
> -struct radeon_mn_node {
> - struct interval_tree_node   it;
> - struct list_headbos;
> -};
> -
>   /**
> - * radeon_mn_invalidate_range_start - callback to notify about mm change
> + * radeon_mn_invalidate - callback to notify about mm change
>*
>* @mn: our notifier
> - * @mn: the mm this callback is about
> - * @start: start of updated range
> - * @end: end of updated range
> + * @range: the VMA under invalidation
>*
>* We block for all BOs between start and end to be idle and
>* unmap them by move them into system domain again.
>*/
> -static int radeon_mn_invalidate_range_start(struct mmu_notifier *mn,
> - const struct mmu_notifier_range *range)
> +static bool radeon_mn_invalidate(struct mmu_range_notifier *mn,
> +  const struct mmu_notifier_range *range,
> +  unsigned long cur_seq)
>   {
> - struct radeon_mn *rmn = container_of(mn, struct radeon_mn, mn);
> + struct radeon_bo *bo = container_of(mn, struct radeon_bo, notifier);
>   struct ttm_operation_ctx ctx = { false, false };
> - struct interval_tree_node *it;
> - unsigned long end;
> - int ret = 0;
> -
> - /* notification is exclusive, but interval is inclusive */
> - end = range->end - 1;
> -
> - /* TODO we should be able to split locking for interval tree and
> -  * the tear down.
> -  */
> - if (mmu_notifier_range_blockable(range))
> - mutex_lock(>lock);
> - else if (!mutex_trylock(>lock))
> - return -EAGAIN;
> -
> - it = interval_tree_iter_first(>objects, range->start, end);
> - while (it) {
> - struct radeon_mn_node *node;
> - struct radeon_bo *bo;
> - long r;
> -
> - if (!mmu_notifier_range_blockable(range)) {
> - ret = -EAGAIN;
> - goto out_unlock;
> - }
> -
> - node = container_of(it, struct radeon_mn_node, it);
> - it = interval_tree_iter_next(it, range->start, end);
> + long r;
>   
> - list_for_each_entry(bo, >bos, mn_list) {
> + if (!bo->tbo.ttm || bo->tbo.ttm->state != tt_bound)
> + return true;
>   
> - if (!bo->tbo.ttm || bo->tbo.ttm->state != tt_bound)
> - continue;
> + if (!mmu_notifier_range_blockable(range))
> + return false;
>   
> - r = radeon_bo_reserve(bo, true);
> - if (r) {
> - DRM_ERROR("(%ld) failed to reserve user bo\n", 
> r);
> - continue;
> - }
> -
> - r = dma_resv_wait_timeout_rcu(bo->tbo.base.resv,
> - true, false, MAX_SCHEDULE_TIMEOUT);
> -  

RE: [PATCH] drm/amdgpu: GFX9, GFX10: GRBM requires 1-cycle delay

2019-10-28 Thread Koenig, Christian
I think we should implement the write/wait combined command in gfx10.

Did we ever release any firmware which couldn't do this?

Christian.

Am 28.10.2019 13:07 schrieb "Zhu, Changfeng" :
Hi Christian,

Should we also implement the gfx_v9_0_wait_reg_mem function in gfx10 like gfx9,
since gfx10 also realizes the write/wait command in a single packet after CL#1761300?

Or should we add a dummy read in gmc10 by using emit_wait, like Luben's approach?

BR,
Changfeng.

-Original Message-----
From: Koenig, Christian 
Sent: Monday, October 28, 2019 6:47 PM
To: Zhu, Changfeng ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Pelloux-prayer, Pierre-eric 
; Huang, Ray ; Tuikov, 
Luben 
Subject: Re: [PATCH] drm/amdgpu: GFX9, GFX10: GRBM requires 1-cycle delay

Hi Changfeng,

> So how can we deal with the firmware between mec version(402) and mec 
> version(421)?
Well, off hand I see only two options: either print a warning or completely
reject loading the driver.

Completely rejecting loading the driver is probably not a good idea and the 
issue is actually extremely unlikely to cause any problems.

So printing a warning that the user should update their firmware is probably 
the best approach.

Regards,
Christian.

Am 28.10.19 um 04:01 schrieb Zhu, Changfeng:
> Hi Christian,
>
> Re- that won't work, you can't add this to
> amdgpu_ring_emit_reg_write_reg_wait_helper or break all read triggered 
> registers (like the semaphore ones).
>
> Do you mean that I should use reg_wait registers(wait_reg_mem) like Luben to 
> replace read triggered registers for adding dummy read?
>
> Re-Additional to that it will never work on GFX9, since the CP firmware there 
> uses the integrated write/wait command and you can't add an additional dummy 
> read there.
>
> Yes, I see the integrated write/wait command and they are realized in 
> gfx_v9_0_wait_reg_mem:
> Emily's patch:
> drm/amdgpu: Remove the sriov checking and add firmware checking
> decides when to go into gfx_v9_0_wait_reg_mem and when to go into 
> amdgpu_ring_emit_reg_write_reg_wait_helper.
>
> However there are two problems now.
> 1.Before the fw_version_ok fw version, the code goes into 
> amdgpu_ring_emit_reg_write_reg_wait_helper. In this case, shouldn't we add a 
> dummy read in amdgpu_ring_emit_reg_write_reg_wait_helper?
> 2.After the fw_version_ok fw version, the code goes into 
> gfx_v9_0_wait_reg_mem. However, it realizes write/wait command in firmware. 
> Then how can we add this dummy read? According to Yang, Zilong, the CP 
> firmware has implemented the dummy read in firmware in CL:
> Vega20 CL#1762470 @3/27/2019
> Navi10 CL#1761300 @3/25/2019
> According to CL#1762470,
> The firmware which realized dummy read is(Raven for example):
> Mec version:
> #define F32_MEC_UCODE_VERSION "#421"
> #define F32_MEC_FEATURE_VERSION 46
> Pfp version:
> #define F32_PFP_UCODE_VERSION "#183"
> #define F32_PFP_FEATURE_VERSION 46
> In Emily's patch:
> The CP firmware which uses the integrated write/wait command begins from 
> version:
> +   case CHIP_RAVEN:
> +   if ((adev->gfx.me_fw_version >= 0x009c) &&
> +   (adev->gfx.me_feature_version >= 42) &&
> +   (adev->gfx.pfp_fw_version >=  0x00b1(177)) &&
> +   (adev->gfx.pfp_feature_version >= 42))
> +   adev->gfx.me_fw_write_wait = true;
> +
> +   if ((adev->gfx.mec_fw_version >=  0x0192(402)) &&
> +   (adev->gfx.mec_feature_version >= 42))
> +   adev->gfx.mec_fw_write_wait = true;
> +   break;
>
> So how can we deal with the firmware between mec version(402) and mec 
> version(421)?
> It will use the write/wait command in CP firmware but it doesn't have the dummy 
> read.
>
> BR,
> Changfeng.
>
> -Original Message-
> From: Koenig, Christian 
> Sent: Friday, October 25, 2019 11:54 PM
> To: Zhu, Changfeng ;
> amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Pelloux-prayer,
> Pierre-eric ; Huang, Ray
> ; Tuikov, Luben 
> Subject: Re: [PATCH] drm/amdgpu: GFX9, GFX10: GRBM requires 1-cycle
> delay
>
> Hi Changfeng,
>
> that won't work, you can't add this to
> amdgpu_ring_emit_reg_write_reg_wait_helper or break all read triggered 
> registers (like the semaphore ones).
>
> Additional to that it will never work on GFX9, since the CP firmware there 
> uses the integrated write/wait command and you can't add an additional dummy 
> read there.
>
> Regards,
> Christian.
>
> On 25.10.19 at 16:22, Zhu, Changfeng wrote:
>> I try to write a patch based on the patch of Tuikov,Luben.
>>
>> Inspired by Luben, here is the patch:

Re: [PATCH] drm/amdgpu: simplify padding calculations (v2)

2019-10-28 Thread Koenig, Christian
On 26.10.19 at 00:41, Tuikov, Luben wrote:
> Simplify padding calculations.
>
> v2: Comment update and spacing.
>
> Signed-off-by: Luben Tuikov 

Reviewed-by: Christian König 

> ---
>   drivers/gpu/drm/amd/amdgpu/cik_sdma.c  |  4 ++--
>   drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c |  4 ++--
>   drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c |  4 ++--
>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |  4 ++--
>   drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 17 -
>   5 files changed, 20 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c 
> b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> index c45304f1047c..4af9acc2dc4f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> @@ -228,7 +228,7 @@ static void cik_sdma_ring_emit_ib(struct amdgpu_ring 
> *ring,
>   u32 extra_bits = vmid & 0xf;
>   
>   /* IB packet must end on a 8 DW boundary */
> - cik_sdma_ring_insert_nop(ring, (12 - (lower_32_bits(ring->wptr) & 7)) % 
> 8);
> + cik_sdma_ring_insert_nop(ring, (4 - lower_32_bits(ring->wptr)) & 7);
>   
>   amdgpu_ring_write(ring, SDMA_PACKET(SDMA_OPCODE_INDIRECT_BUFFER, 0, 
> extra_bits));
>   amdgpu_ring_write(ring, ib->gpu_addr & 0xffe0); /* base must be 32 
> byte aligned */
> @@ -811,7 +811,7 @@ static void cik_sdma_ring_pad_ib(struct amdgpu_ring 
> *ring, struct amdgpu_ib *ib)
>   u32 pad_count;
>   int i;
>   
> - pad_count = (8 - (ib->length_dw & 0x7)) % 8;
> + pad_count = (-ib->length_dw) & 7;
>   for (i = 0; i < pad_count; i++)
>   if (sdma && sdma->burst_nop && (i == 0))
>   ib->ptr[ib->length_dw++] =
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c 
> b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> index a10175838013..b6af67f6f214 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> @@ -255,7 +255,7 @@ static void sdma_v2_4_ring_emit_ib(struct amdgpu_ring 
> *ring,
>   unsigned vmid = AMDGPU_JOB_GET_VMID(job);
>   
>   /* IB packet must end on a 8 DW boundary */
> - sdma_v2_4_ring_insert_nop(ring, (10 - (lower_32_bits(ring->wptr) & 7)) 
> % 8);
> + sdma_v2_4_ring_insert_nop(ring, (2 - lower_32_bits(ring->wptr)) & 7);
>   
>   amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_INDIRECT) |
> SDMA_PKT_INDIRECT_HEADER_VMID(vmid & 0xf));
> @@ -750,7 +750,7 @@ static void sdma_v2_4_ring_pad_ib(struct amdgpu_ring 
> *ring, struct amdgpu_ib *ib
>   u32 pad_count;
>   int i;
>   
> - pad_count = (8 - (ib->length_dw & 0x7)) % 8;
> + pad_count = (-ib->length_dw) & 7;
>   for (i = 0; i < pad_count; i++)
>   if (sdma && sdma->burst_nop && (i == 0))
>   ib->ptr[ib->length_dw++] =
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c 
> b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> index 5f4e2c616241..cd3ebed46d05 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> @@ -429,7 +429,7 @@ static void sdma_v3_0_ring_emit_ib(struct amdgpu_ring 
> *ring,
>   unsigned vmid = AMDGPU_JOB_GET_VMID(job);
>   
>   /* IB packet must end on a 8 DW boundary */
> - sdma_v3_0_ring_insert_nop(ring, (10 - (lower_32_bits(ring->wptr) & 7)) 
> % 8);
> + sdma_v3_0_ring_insert_nop(ring, (2 - lower_32_bits(ring->wptr)) & 7);
>   
>   amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_INDIRECT) |
> SDMA_PKT_INDIRECT_HEADER_VMID(vmid & 0xf));
> @@ -1021,7 +1021,7 @@ static void sdma_v3_0_ring_pad_ib(struct amdgpu_ring 
> *ring, struct amdgpu_ib *ib
>   u32 pad_count;
>   int i;
>   
> - pad_count = (8 - (ib->length_dw & 0x7)) % 8;
> + pad_count = (-ib->length_dw) & 7;
>   for (i = 0; i < pad_count; i++)
>   if (sdma && sdma->burst_nop && (i == 0))
>   ib->ptr[ib->length_dw++] =
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index 45bd538ba97e..8ce15056ee4f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -698,7 +698,7 @@ static void sdma_v4_0_ring_emit_ib(struct amdgpu_ring 
> *ring,
>   unsigned vmid = AMDGPU_JOB_GET_VMID(job);
>   
>   /* IB packet must end on a 8 DW boundary */
> - sdma_v4_0_ring_insert_nop(ring, (10 - (lower_32_bits(ring->wptr) & 7)) 
> % 8);
> + sdma_v4_0_ring_insert_nop(ring, (2 - lower_32_bits(ring->wptr)) & 7);
>   
>   amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_INDIRECT) |
> SDMA_PKT_INDIRECT_HEADER_VMID(vmid & 0xf));
> @@ -1580,7 +1580,7 @@ static void sdma_v4_0_ring_pad_ib(struct amdgpu_ring 
> *ring, struct amdgpu_ib *ib
>   u32 pad_count;
>   int i;
>   
> - pad_count = (8 - (ib->length_dw & 0x7)) % 8;
> + pad_count = (-ib->length_dw) & 7;
>   for (i = 0; i < pad_count; i++)
>   if (sdma && 
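
The cleanup in this patch relies on a small arithmetic identity: for unsigned x, (8 - (x & 7)) % 8 and (-x) & 7 yield the same value, and likewise (12 - (x & 7)) % 8 equals (4 - x) & 7 once the subtraction is allowed to wrap modulo 8. A self-contained check in plain user-space C (independent of the kernel sources) that exercises the first identity:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Compare the old and the new padding formulas.  Only the low three bits
 * of the input matter, but loop a bit further for good measure.
 */
int main(void)
{
	uint32_t x;

	for (x = 0; x < 64; x++) {
		uint32_t old_pad = (8 - (x & 0x7)) % 8;	/* original expression */
		uint32_t new_pad = (-x) & 7;		/* simplified expression */

		assert(old_pad == new_pad);
	}

	printf("old and new padding formulas agree\n");
	return 0;
}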

Re: [PATCH] drm/amdgpu: GFX10: GRBM requires 1-cycle delay (v2)

2019-10-28 Thread Koenig, Christian
On 26.10.19 at 00:59, Tuikov, Luben wrote:
> The GRBM interface is now capable of bursting
> 1-cycle op per register, a WRITE followed by
> another WRITE, or a WRITE followed by a READ--much
> faster than previous multi-cycle per
> completed-transaction interface. This causes a
> problem, whereby status registers requiring a
> read/write by hardware, have a 1-cycle delay, due
> to the register update having to go through GRBM
> interface.
>
> This patch adds this delay.
>
> A one cycle read op is added after updating the
> invalidate request and before reading the
> invalidate-ACK status.
>
> See also commit
> 534991731cb5fa94b5519957646cf849ca10d17d.
>
> v2: Remove GFX9 and apply only to SDMA ring.
>
> Signed-off-by: Luben Tuikov 
> ---
>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 7 +++
>   drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 2 +-
>   2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> index 6e1b25bd1fe7..dedd7e1ab2fb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> @@ -346,6 +346,13 @@ static uint64_t gmc_v10_0_emit_flush_gpu_tlb(struct 
> amdgpu_ring *ring,
>   
>   amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_req + eng, req);
>   
> + /* Insert a dummy read to delay one cycle after the write REQ,
> +  * and before the ACK inquiry.
> +  */
> + if (ring->funcs->type == AMDGPU_RING_TYPE_SDMA)
> + amdgpu_ring_emit_reg_wait(ring,
> +   hub->vm_inv_eng0_req + eng, 0, 0);
> +
>   /* wait for the invalidate to complete */
>   amdgpu_ring_emit_reg_wait(ring, hub->vm_inv_eng0_ack + eng,
> 1 << vmid, 1 << vmid);

Looks like we indeed don't use the integrated write/wait CP command on 
Navi.

Anyway that is a completely separate issue, this patch is Reviewed-by: 
Christian König 

Regards,
Christian.

> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c 
> b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> index b8fdb192f6d6..0c41b4fdc58b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> @@ -1588,7 +1588,7 @@ static const struct amdgpu_ring_funcs 
> sdma_v5_0_ring_funcs = {
>   6 + /* sdma_v5_0_ring_emit_pipeline_sync */
>   /* sdma_v5_0_ring_emit_vm_flush */
>   SOC15_FLUSH_GPU_TLB_NUM_WREG * 3 +
> - SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 6 +
> + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 6 * 2 +
>   10 + 10 + 10, /* sdma_v5_0_ring_emit_fence x3 for user fence, 
> vm fence */
>   .emit_ib_size = 7 + 6, /* sdma_v5_0_ring_emit_ib */
>   .emit_ib = sdma_v5_0_ring_emit_ib,
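
Condensed to the ring commands involved, the flush sequence after this patch boils down to the ordering sketched below; the helper is a simplified stand-in for gmc_v10_0_emit_flush_gpu_tlb() with everything unrelated to the REQ/ACK handling stripped out.

/* Condensed sketch of the patched flush sequence; function name and
 * parameter list are simplified for illustration.
 */
static void emit_tlb_flush_sketch(struct amdgpu_ring *ring,
				  struct amdgpu_vmhub *hub,
				  unsigned int vmid, unsigned int eng,
				  uint32_t req)
{
	amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_req + eng, req);

	/* Dummy read: a zero-mask wait on the REQ register costs one cycle
	 * and lets the remote GRBM copy of the ACK register settle.
	 */
	if (ring->funcs->type == AMDGPU_RING_TYPE_SDMA)
		amdgpu_ring_emit_reg_wait(ring, hub->vm_inv_eng0_req + eng,
					  0, 0);

	/* Only now poll for the real acknowledgement of the invalidation. */
	amdgpu_ring_emit_reg_wait(ring, hub->vm_inv_eng0_ack + eng,
				  1 << vmid, 1 << vmid);
}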


Re: [PATCH] drm/amdgpu: GFX9, GFX10: GRBM requires 1-cycle delay

2019-10-28 Thread Koenig, Christian
Hi Changfeng,

> So how can we deal with the firmware between mec version(402) and mec 
> version(421)?
Well, off hand I see only two options: either print a warning or 
completely reject loading the driver.

Completely rejecting loading the driver is probably not a good idea and 
the issue is actually extremely unlikely to cause any problems.

So printing a warning that the user should update their firmware is 
probably the best approach.

Regards,
Christian.

On 28.10.19 at 04:01, Zhu, Changfeng wrote:
> Hi Christian,
>
> Re- that won't work, you can't add this to
> amdgpu_ring_emit_reg_write_reg_wait_helper or break all read triggered 
> registers (like the semaphore ones).
>
> Do you mean that I should use reg_wait registers(wait_reg_mem) like Luben to 
> replace read triggered registers for adding dummy read?
>
> Re-Additional to that it will never work on GFX9, since the CP firmware there 
> uses the integrated write/wait command and you can't add an additional dummy 
> read there.
>
> Yes, I see the integrated write/wait command and they are realized in 
> gfx_v9_0_wait_reg_mem:
> Emily's patch:
> drm/amdgpu: Remove the sriov checking and add firmware checking
> decides when to go into gfx_v9_0_wait_reg_mem and when to go into 
> amdgpu_ring_emit_reg_write_reg_wait_helper.
>
> However there are two problems now.
> 1.Before the fw_version_ok fw version, the code goes into 
> amdgpu_ring_emit_reg_write_reg_wait_helper. In this case, shouldn't we add a 
> dummy read in amdgpu_ring_emit_reg_write_reg_wait_helper?
> 2.After the fw_version_ok fw version, the code goes into 
> gfx_v9_0_wait_reg_mem. However, it realizes write/wait command in firmware. 
> Then how can we add this dummy read? According to Yang, Zilong, the CP 
> firmware has implemented the dummy read in firmware in CL:
> Vega20 CL#1762470 @3/27/2019
> Navi10 CL#1761300 @3/25/2019
> According to CL#1762470,
> The firmware which realized dummy read is(Raven for example):
> Mec version:
> #define F32_MEC_UCODE_VERSION "#421"
> #define F32_MEC_FEATURE_VERSION 46
> Pfp version:
> #define F32_PFP_UCODE_VERSION "#183"
> #define F32_PFP_FEATURE_VERSION 46
> In Emily's patch:
> The CP firmware which uses the integrated write/wait command begins from 
> version:
> +   case CHIP_RAVEN:
> +   if ((adev->gfx.me_fw_version >= 0x009c) &&
> +   (adev->gfx.me_feature_version >= 42) &&
> +   (adev->gfx.pfp_fw_version >=  0x00b1(177)) &&
> +   (adev->gfx.pfp_feature_version >= 42))
> +   adev->gfx.me_fw_write_wait = true;
> +
> +   if ((adev->gfx.mec_fw_version >=  0x0192(402)) &&
> +   (adev->gfx.mec_feature_version >= 42))
> +   adev->gfx.mec_fw_write_wait = true;
> +       break;
>
> So how can we deal with the firmware between mec version(402) and mec 
> version(421)?
> It will use the write/wait command in CP firmware but it doesn't have the dummy 
> read.
>
> BR,
> Changfeng.
>
> -Original Message-
> From: Koenig, Christian 
> Sent: Friday, October 25, 2019 11:54 PM
> To: Zhu, Changfeng ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Pelloux-prayer, 
> Pierre-eric ; Huang, Ray 
> ; Tuikov, Luben 
> Subject: Re: [PATCH] drm/amdgpu: GFX9, GFX10: GRBM requires 1-cycle delay
>
> Hi Changfeng,
>
> that won't work, you can't add this to
> amdgpu_ring_emit_reg_write_reg_wait_helper or break all read triggered 
> registers (like the semaphore ones).
>
> Additional to that it will never work on GFX9, since the CP firmware there 
> uses the integrated write/wait command and you can't add an additional dummy 
> read there.
>
> Regards,
> Christian.
>
> On 25.10.19 at 16:22, Zhu, Changfeng wrote:
>> I try to write a patch based on the patch of Tuikov,Luben.
>>
>> Inspired by Luben, here is the patch:
>>
>>   From 1980d8f1ed44fb9a84a5ea1f6e2edd2bc25c629a Mon Sep 17 00:00:00
>> 2001
>> From: changzhu 
>> Date: Thu, 10 Oct 2019 11:02:33 +0800
>> Subject: [PATCH] drm/amdgpu: add dummy read by engines for some GCVM status
>>registers
>>
>> The GRBM register interface is now capable of bursting 1 cycle per
>> register wr->wr, wr->rd much faster than previous multicycle per
>> transaction done interface.  This has caused a problem where status
>> registers requiring HW to update have a 1 cycle delay, due to the
>> register update having to go through GRBM.
>>
>> SW may operate on an incorrect value if they write a register and
>> immediately check the corresponding status register.

Re: [PATCH] drm/amdgpu: GFX9, GFX10: GRBM requires 1-cycle delay

2019-10-26 Thread Koenig, Christian
On 26.10.19 at 00:45, Tuikov, Luben wrote:
> On 2019-10-25 12:19 p.m., Koenig, Christian wrote:
>> On 25.10.19 at 18:05, Alex Deucher wrote:
>>> On Fri, Oct 25, 2019 at 2:49 AM Koenig, Christian
>>>  wrote:
>>>> On 24.10.19 at 23:16, Tuikov, Luben wrote:
>>>>> The GRBM interface is now capable of bursting
>>>>> 1-cycle op per register, a WRITE followed by
>>>>> another WRITE, or a WRITE followed by a READ--much
>>>>> faster than previous multi-cycle per
>>>>> completed-transaction interface. This causes a
>>>>> problem, whereby status registers requiring a
>>>>> read/write by hardware, have a 1-cycle delay, due
>>>>> to the register update having to go through GRBM
>>>>> interface.
>>>>>
>>>>> This patch adds this delay.
>>>>>
>>>>> A one cycle read op is added after updating the
>>>>> invalidate request and before reading the
>>>>> invalidate-ACK status.
>>>> Please completely drop all changes for GFX9 since this patch will most
>>>> likely break SRIOV.
>>>>
>>>> Additional to that please apply the workaround only to SDMA since the CP
>>>> driven engines should handle that in firmware.
> Thank you Christian for reviewing this patch.
>
> This patch stirred quite a bit of noise. So, then, I'll go by
> your last comment above--I suppose this is the desired way to go forward then?

You most likely broke the SRIOV use case on GFX9 with that, no wonder 
that this raised eyebrows.

As far as I can see this manual workaround is only applicable to the 
SDMA on Navi.

But we should double check that the CP firmware interface with the 
combined write/wait command is correctly used on Navi/GFX10 as well. 
IIRC that came in rather late for GFX9, could be that the Navi bringup 
branch never had that.

Regards,
Christian.

>
> Regards,
> Luben
>
>
>>> I think the CP only handles this in firmware if we use the new TLB
>>> invalidation packet.  I don't think it applies it to general register
>>> writes like we do.
>> No, on the CP we should use the combined write/wait command even if we
>> don't use the new specialized VM invalidate command. Everything else
>> won't work with SRIOV.
>>
>> Even if we want to we can't insert an extra read in this combined
>> write/wait command. And if we split up the commands we would break SRIOV
>> once more.
>>
>> So applying this workaround to the CP code doesn't make any sense at all.
>>
>> The only TODO which I can see is that we maybe don't use the combined
>> write/wait command on Navi yet.
>>
>> Christian.
>>
>>> Alex
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> See also commit
>>>>> 534991731cb5fa94b5519957646cf849ca10d17d.
>>>>>
>>>>> Signed-off-by: Luben Tuikov 
>>>>> ---
>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 4 ++--
>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 4 ++--
>>>>> drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 9 +
>>>>> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 8 
>>>>> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 2 +-
>>>>> 5 files changed, 22 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
>>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>>>> index ac43b1af69e3..0042868dbd53 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>>>> @@ -5129,7 +5129,7 @@ static const struct amdgpu_ring_funcs 
>>>>> gfx_v10_0_ring_funcs_gfx = {
>>>>> 5 + /* COND_EXEC */
>>>>> 7 + /* PIPELINE_SYNC */
>>>>> SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
>>>>> - SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
>>>>> + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 * 2 +
>>>>> 2 + /* VM_FLUSH */
>>>>> 8 + /* FENCE for VM_FLUSH */
>>>>> 20 + /* GDS switch */
>>>>> @@ -5182,7 +5182,7 @@ static const struct amdgpu_ring_funcs 
>>>>> gfx_v10_0_ring_funcs_compute = {
>>>>> 5 + /* hdp invalidate */
>>>>> 7 + /* gfx_v10_0_ring

Re: [PATCH] drm/amdgpu: simplify padding calculations

2019-10-26 Thread Koenig, Christian
On 25.10.19 at 23:51, Tuikov, Luben wrote:
> On 2019-10-25 2:54 a.m., Koenig, Christian wrote:
>> On 25.10.19 at 01:44, Tuikov, Luben wrote:
>>> Simplify padding calculations.
>>>
>>> Signed-off-by: Luben Tuikov 
>>> ---
>>>drivers/gpu/drm/amd/amdgpu/cik_sdma.c  |  4 ++--
>>>drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c |  4 ++--
>>>drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c |  4 ++--
>>>drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |  4 ++--
>>>drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 14 +-
>>>5 files changed, 17 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c 
>>> b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
>>> index c45304f1047c..1ea9e18d6f08 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
>>> @@ -228,7 +228,7 @@ static void cik_sdma_ring_emit_ib(struct amdgpu_ring 
>>> *ring,
>>> u32 extra_bits = vmid & 0xf;
>>>
>>> /* IB packet must end on a 8 DW boundary */
>>> -   cik_sdma_ring_insert_nop(ring, (12 - (lower_32_bits(ring->wptr) & 7)) % 
>>> 8);
>>> +   cik_sdma_ring_insert_nop(ring, (4-lower_32_bits(ring->wptr)) & 7);
>>>
>>> amdgpu_ring_write(ring, SDMA_PACKET(SDMA_OPCODE_INDIRECT_BUFFER, 0, 
>>> extra_bits));
>>> amdgpu_ring_write(ring, ib->gpu_addr & 0xffe0); /* base must be 32 
>>> byte aligned */
>>> @@ -811,7 +811,7 @@ static void cik_sdma_ring_pad_ib(struct amdgpu_ring 
>>> *ring, struct amdgpu_ib *ib)
>>> u32 pad_count;
>>> int i;
>>>
>>> -   pad_count = (8 - (ib->length_dw & 0x7)) % 8;
>>> +   pad_count = (-ib->length_dw) & 7;
>>> for (i = 0; i < pad_count; i++)
>>> if (sdma && sdma->burst_nop && (i == 0))
>>> ib->ptr[ib->length_dw++] =
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c 
>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
>>> index a10175838013..d340f179401a 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
>>> @@ -255,7 +255,7 @@ static void sdma_v2_4_ring_emit_ib(struct amdgpu_ring 
>>> *ring,
>>> unsigned vmid = AMDGPU_JOB_GET_VMID(job);
>>>
>>> /* IB packet must end on a 8 DW boundary */
>>> -   sdma_v2_4_ring_insert_nop(ring, (10 - (lower_32_bits(ring->wptr) & 7)) 
>>> % 8);
>>> +   sdma_v2_4_ring_insert_nop(ring, (2-lower_32_bits(ring->wptr)) & 7);
>>>
>>> amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_INDIRECT) |
>>>   SDMA_PKT_INDIRECT_HEADER_VMID(vmid & 0xf));
>>> @@ -750,7 +750,7 @@ static void sdma_v2_4_ring_pad_ib(struct amdgpu_ring 
>>> *ring, struct amdgpu_ib *ib
>>> u32 pad_count;
>>> int i;
>>>
>>> -   pad_count = (8 - (ib->length_dw & 0x7)) % 8;
>>> +   pad_count = (-ib->length_dw) & 7;
>>> for (i = 0; i < pad_count; i++)
>>> if (sdma && sdma->burst_nop && (i == 0))
>>> ib->ptr[ib->length_dw++] =
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
>>> index 5f4e2c616241..5c3c310188b6 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
>>> @@ -429,7 +429,7 @@ static void sdma_v3_0_ring_emit_ib(struct amdgpu_ring 
>>> *ring,
>>> unsigned vmid = AMDGPU_JOB_GET_VMID(job);
>>>
>>> /* IB packet must end on a 8 DW boundary */
>>> -   sdma_v3_0_ring_insert_nop(ring, (10 - (lower_32_bits(ring->wptr) & 7)) 
>>> % 8);
>>> +   sdma_v3_0_ring_insert_nop(ring, (2-lower_32_bits(ring->wptr)) & 7);
>>>
>>> amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_INDIRECT) |
>>>   SDMA_PKT_INDIRECT_HEADER_VMID(vmid & 0xf));
>>> @@ -1021,7 +1021,7 @@ static void sdma_v3_0_ring_pad_ib(struct amdgpu_ring 
>>> *ring, struct amdgpu_ib *ib
>>> u32 pad_count;
>>> int i;
>>>
>>> -   pad_count = (8 - (ib->length_dw & 0x7)) % 8;
>>> +   pad_count = (-ib->length_dw) & 7;
>>> for (i = 0; i < pad_count; i++)
>>> if (sdma &&

Re: [PATCH] drm/amdgpu: GFX9, GFX10: GRBM requires 1-cycle delay

2019-10-25 Thread Koenig, Christian
On 25.10.19 at 18:05, Alex Deucher wrote:
> On Fri, Oct 25, 2019 at 2:49 AM Koenig, Christian
>  wrote:
>> On 24.10.19 at 23:16, Tuikov, Luben wrote:
>>> The GRBM interface is now capable of bursting
>>> 1-cycle op per register, a WRITE followed by
>>> another WRITE, or a WRITE followed by a READ--much
>>> faster than previous multi-cycle per
>>> completed-transaction interface. This causes a
>>> problem, whereby status registers requiring a
>>> read/write by hardware, have a 1-cycle delay, due
>>> to the register update having to go through GRBM
>>> interface.
>>>
>>> This patch adds this delay.
>>>
>>> A one cycle read op is added after updating the
>>> invalidate request and before reading the
>>> invalidate-ACK status.
>> Please completely drop all changes for GFX9 since this patch will most
>> likely break SRIOV.
>>
>> Additional to that please apply the workaround only to SDMA since the CP
>> driven engines should handle that in firmware.
> I think the CP only handles this in firmware if we use the new TLB
> invalidation packet.  I don't think it applies it to general register
> writes like we do.

No, on the CP we should use the combined write/wait command even if we 
don't use the new specialized VM invalidate command. Everything else 
won't work with SRIOV.

Even if we want to we can't insert an extra read in this combined 
write/wait command. And if we split up the commands we would break SRIOV 
once more.

So applying this workaround to the CP code doesn't make any sense at all.

The only TODO which I can see is that we maybe don't use the combined 
write/wait command on Navi yet.

Christian.

>
> Alex
>
>> Regards,
>> Christian.
>>
>>> See also commit
>>> 534991731cb5fa94b5519957646cf849ca10d17d.
>>>
>>> Signed-off-by: Luben Tuikov 
>>> ---
>>>drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 4 ++--
>>>drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 4 ++--
>>>drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 9 +
>>>drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 8 
>>>drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 2 +-
>>>5 files changed, 22 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>> index ac43b1af69e3..0042868dbd53 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>> @@ -5129,7 +5129,7 @@ static const struct amdgpu_ring_funcs 
>>> gfx_v10_0_ring_funcs_gfx = {
>>>5 + /* COND_EXEC */
>>>7 + /* PIPELINE_SYNC */
>>>SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
>>> - SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
>>> + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 * 2 +
>>>2 + /* VM_FLUSH */
>>>8 + /* FENCE for VM_FLUSH */
>>>20 + /* GDS switch */
>>> @@ -5182,7 +5182,7 @@ static const struct amdgpu_ring_funcs 
>>> gfx_v10_0_ring_funcs_compute = {
>>>5 + /* hdp invalidate */
>>>7 + /* gfx_v10_0_ring_emit_pipeline_sync */
>>>SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
>>> - SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
>>> + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 * 2 +
>>>2 + /* gfx_v10_0_ring_emit_vm_flush */
>>>8 + 8 + 8, /* gfx_v10_0_ring_emit_fence x3 for user fence, 
>>> vm fence */
>>>.emit_ib_size = 7, /* gfx_v10_0_ring_emit_ib_compute */
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> index 9fe95e7693d5..9a7a717208de 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> @@ -6218,7 +6218,7 @@ static const struct amdgpu_ring_funcs 
>>> gfx_v9_0_ring_funcs_gfx = {
>>>5 +  /* COND_EXEC */
>>>7 +  /* PIPELINE_SYNC */
>>>SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
>>> - SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
>>> + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 * 2 +
>>>2 + /* VM_FLUSH */
>>>8 +  /* FENCE for VM_FLUSH */
>>>20 + /* GDS switch */
>>> @@ -6271,7 +6271,7 @@ static const struct amdgpu_ring_funcs 
>>> gfx_v9_0_ring_funcs_compute = {
>>>

Re: [PATCH 1/2] drm/sched: Set error to s_fence if HW job submission failed.

2019-10-25 Thread Koenig, Christian
On 25.10.19 at 17:56, Grodzovsky, Andrey wrote:
> On 10/25/19 11:55 AM, Koenig, Christian wrote:
>> On 25.10.19 at 16:57, Grodzovsky, Andrey wrote:
>>> On 10/25/19 4:44 AM, Christian König wrote:
>>>> On 24.10.19 at 21:57, Andrey Grodzovsky wrote:
>>>>> Problem:
>>>>> When run_job fails and HW fence returned is NULL we still signal
>>>>> the s_fence to avoid hangs but the user has no way of knowing if
>>>>> the actual HW job was run and finished.
>>>>>
>>>>> Fix:
>>>>> Allow .run_job implementations to return ERR_PTR in the fence pointer
>>>>> returned and then set this error for s_fence->finished fence so whoever
>>>>> waits on this fence can inspect the signaled fence for an error.
>>>>>
>>>>> Signed-off-by: Andrey Grodzovsky 
>>>>> ---
>>>>>  drivers/gpu/drm/scheduler/sched_main.c | 19 ---
>>>>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index 9a0ee74..f39b97e 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -479,6 +479,7 @@ void drm_sched_resubmit_jobs(struct
>>>>> drm_gpu_scheduler *sched)
>>>>>  struct drm_sched_job *s_job, *tmp;
>>>>>  uint64_t guilty_context;
>>>>>  bool found_guilty = false;
>>>>> +    struct dma_fence *fence;
>>>>>    list_for_each_entry_safe(s_job, tmp,
>>>>> >ring_mirror_list, node) {
>>>>>  struct drm_sched_fence *s_fence = s_job->s_fence;
>>>>> @@ -492,7 +493,16 @@ void drm_sched_resubmit_jobs(struct
>>>>> drm_gpu_scheduler *sched)
>>>>>  dma_fence_set_error(_fence->finished, -ECANCELED);
>>>>>    dma_fence_put(s_job->s_fence->parent);
>>>>> -    s_job->s_fence->parent = sched->ops->run_job(s_job);
>>>>> +    fence = sched->ops->run_job(s_job);
>>>>> +
>>>>> +    if (IS_ERR_OR_NULL(fence)) {
>>>>> +    s_job->s_fence->parent = NULL;
>>>>> +    dma_fence_set_error(_fence->finished, PTR_ERR(fence));
>>>>> +    } else {
>>>>> +    s_job->s_fence->parent = fence;
>>>>> +    }
>>>>> +
>>>>> +
>>>> Maybe time for a drm_sched_run_job() function which does that
>>>> handling? And why don't we need to install the callback here?
>>> What code do you want to put in drm_sched_run_job ?
>>>
>>> We reinstall the callback later in drm_sched_start,
>>> drm_sched_resubmit_jobs is conditional on whether the guilty fence did
>>> signal by this time or not and so the split of the logic into
>>> drm_sched_start and drm_sched_resubmit_jobs.
>> Ah, yes of course. In this case the patch is Reviewed-by: Christian
>> König .
>>
>> Regards,
>> Christian.
>
> Thanks, there is also 2/2 patch for amdgpu, please take a look.

Seen that, feel free to add my rb as well.

Christian.

>
> Andrey
>
>
>>> Andrey
>>>
>>>
>>>> Apart from that looks good to me,
>>>> Christian.
>>>>
>>>>>  }
>>>>>  }
>>>>>  EXPORT_SYMBOL(drm_sched_resubmit_jobs);
>>>>> @@ -720,7 +730,7 @@ static int drm_sched_main(void *param)
>>>>>  fence = sched->ops->run_job(sched_job);
>>>>>  drm_sched_fence_scheduled(s_fence);
>>>>>  -    if (fence) {
>>>>> +    if (!IS_ERR_OR_NULL(fence)) {
>>>>>  s_fence->parent = dma_fence_get(fence);
>>>>>  r = dma_fence_add_callback(fence, _job->cb,
>>>>> drm_sched_process_job);
>>>>> @@ -730,8 +740,11 @@ static int drm_sched_main(void *param)
>>>>>  DRM_ERROR("fence add callback failed (%d)\n",
>>>>>    r);
>>>>>  dma_fence_put(fence);
>>>>> -    } else
>>>>> +    } else {
>>>>> +
>>>>> +    dma_fence_set_error(_fence->finished, PTR_ERR(fence));
>>>>>  drm_sched_process_job(NULL, _job->cb);
>>>>> +    }
>>>>>    wake_up(>job_scheduled);
>>>>>  }
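
For a driver backend, the contract after this change is roughly the sketch below: return the HW fence on success and an ERR_PTR() on a submission failure, so the scheduler can record the error on s_fence->finished. This is a hypothetical backend with made-up helper names, not amdgpu's actual implementation.

/* Hypothetical .run_job implementation illustrating the new contract:
 * mydrv_job, to_mydrv_job() and mydrv_submit() are made-up names.
 */
static struct dma_fence *mydrv_run_job(struct drm_sched_job *sched_job)
{
	struct mydrv_job *job = to_mydrv_job(sched_job);
	struct dma_fence *fence = NULL;
	int r;

	r = mydrv_submit(job, &fence);	/* hand the job to the hardware */
	if (r)
		return ERR_PTR(r);	/* e.g. -EINVAL when the ring is not ready */

	return fence;			/* HW fence, as before */
}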


Re: [PATCH] drm/amdgpu: guard ib scheduling while in reset

2019-10-25 Thread Koenig, Christian
On 25.10.19 at 17:35, Grodzovsky, Andrey wrote:


On 10/25/19 5:26 AM, Koenig, Christian wrote:
On 25.10.19 at 11:22, S, Shirish wrote:


On 10/25/2019 2:23 PM, Koenig, Christian wrote:

amdgpu_do_asic_reset starting to resume blocks

...

amdgpu :03:00.0: couldn't schedule ib on ring 
[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)
...

amdgpu_device_ip_resume_phase2 resumed gfx_v9_0

amdgpu_device_ip_resume_phase2 resumed sdma_v4_0

amdgpu_device_ip_resume_phase2 resumed powerplay

This is the root of the problem.

The scheduler should never be resumed before we are done with bringing back the 
hardware into a usable state.

I don't see the scheduler being resumed when the ib is scheduled, it's done way 
after the hardware is ready in the reset code path.

Below are the logs:

amdgpu :03:00.0: GPU reset begin!
amdgpu_device_gpu_recover calling drm_sched_stop <==
...
amdgpu :03:00.0: couldn't schedule ib on ring 
[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)
...
amdgpu_device_ip_resume_phase2 resumed sdma_v4_0
amdgpu_device_ip_resume_phase2 resumed powerplay
amdgpu_device_ip_resume_phase2 resumed dm
...
[drm] recover vram bo from shadow done
amdgpu_device_gpu_recover calling  drm_sched_start <==
...

As mentioned in the call trace, drm_sched_main() is responsible for this 
job_run which seems to be called during cleanup.

Then the scheduler isn't stopped for some reason and we need to investigate why.

We used to have another kthread_park()/unpark() in drm_sched_entity_fini(), 
maybe an application is crashing while we are trying to reset the GPU?


We still have it, and doesn't doing kthread_park()/unpark() from 
drm_sched_entity_fini() while a GPU reset is in progress defeat the whole purpose of 
drm_sched_stop->kthread_park? If drm_sched_entity_fini->kthread_unpark 
happens AFTER drm_sched_stop->kthread_park, nothing prevents another 
(third) thread from submitting jobs, which will be picked up by the 
unparked scheduler thread and submitted to the HW, only to fail because the HW ring is 
deactivated.

If so maybe we should serialize calls to kthread_park/unpark(sched->thread) ?

Yeah, that was my thinking as well. Probably best to just grab the reset lock 
before calling drm_sched_entity_fini().

Alternatively, I think we could change the kthread_park/unpark to a 
wait_event_ in drm_sched_entity_fini().

Regards,
Christian.
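
A rough sketch of the first option, serializing the entity teardown against the reset path by taking the same mutex the reset holds (adev->lock_reset); the wrapper name is made up and the real integration point in amdgpu may differ.

/* Rough sketch only: make entity teardown mutually exclusive with GPU
 * reset by taking the reset mutex around drm_sched_entity_fini().
 */
static void amdgpu_entity_fini_locked(struct amdgpu_device *adev,
				      struct drm_sched_entity *entity)
{
	mutex_lock(&adev->lock_reset);
	drm_sched_entity_fini(entity);
	mutex_unlock(&adev->lock_reset);
}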


Andrey


Would be rather unlikely, and especially hard to reproduce, but 
currently it's my best bet for what's going wrong here.

Regards,
Christian.


Regards,

Shirish S

Regards,
Christian.

On 25.10.19 at 10:50, S, Shirish wrote:

Here is the call trace:

Call Trace:
 dump_stack+0x4d/0x63
 amdgpu_ib_schedule+0x86/0x4b7
 ? __mod_timer+0x21e/0x244
 amdgpu_job_run+0x108/0x178
 drm_sched_main+0x253/0x2fa
 ? remove_wait_queue+0x51/0x51
 ? drm_sched_cleanup_jobs.part.12+0xda/0xda
 kthread+0x14f/0x157
 ? kthread_park+0x86/0x86
 ret_from_fork+0x22/0x40
amdgpu :03:00.0: couldn't schedule ib on ring 
[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)

printed via below change:

@@ -151,6 +152,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
}

if (!ring->sched.ready) {
+  dump_stack();
dev_err(adev->dev, "couldn't schedule ib on ring <%s>\n", 
ring->name);
return -EINVAL;

On 10/24/2019 10:00 PM, Christian König wrote:
On 24.10.19 at 17:06, Grodzovsky, Andrey wrote:


On 10/24/19 7:01 AM, Christian König wrote:
On 24.10.19 at 12:58, S, Shirish wrote:
[Why]
Upon GPU reset, kernel cleans up already submitted jobs
via drm_sched_cleanup_jobs.
This schedules ib's via drm_sched_main()->run_job, leading to
a race condition on whether the rings are ready, since during reset
the rings may be suspended.

NAK, exactly that's what should not happen.

The scheduler should be suspended while a GPU reset is in progress.

So you are running into a completely different race here.

Below is the series of events when the issue occurs.

(Note that as you & Andrey mentioned the scheduler has been suspended but the 
job is scheduled nonetheless.)

amdgpu :03:00.0: GPU reset begin!

...

amdgpu_device_gpu_recover stopping ring sdma0 via drm_sched_stop

...

amdgpu :03:00.0: GPU reset succeeded, trying to resume

amdgpu_do_asic_reset starting to resume blocks

...

amdgpu :03:00.0: couldn't schedule ib on ring 
[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)
...

amdgpu_device_ip_resume_phase2 resumed gfx_v9_0

amdgpu_device_ip_resume_phase2 resumed sdma_v4_0

amdgpu_device_ip_resume_phase2 resumed powerplay

...

FWIW, since the job is always NULL in "drm_sched_stop(>sched, job ? 
>base : NULL);" when called during reset, all drm_sched_stop() does

is cancel delayed work and park the sched->thread. There is no job list to be 
iterated to deactivate, remove or update fences.

Re: [PATCH 1/2] drm/sched: Set error to s_fence if HW job submission failed.

2019-10-25 Thread Koenig, Christian
On 25.10.19 at 16:57, Grodzovsky, Andrey wrote:
> On 10/25/19 4:44 AM, Christian König wrote:
>> On 24.10.19 at 21:57, Andrey Grodzovsky wrote:
>>> Problem:
>>> When run_job fails and HW fence returned is NULL we still signal
>>> the s_fence to avoid hangs but the user has no way of knowing if
>>> the actual HW job was run and finished.
>>>
>>> Fix:
>>> Allow .run_job implementations to return ERR_PTR in the fence pointer
>>> returned and then set this error for s_fence->finished fence so whoever
>>> waits on this fence can inspect the signaled fence for an error.
>>>
>>> Signed-off-by: Andrey Grodzovsky 
>>> ---
>>>    drivers/gpu/drm/scheduler/sched_main.c | 19 ---
>>>    1 file changed, 16 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 9a0ee74..f39b97e 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -479,6 +479,7 @@ void drm_sched_resubmit_jobs(struct
>>> drm_gpu_scheduler *sched)
>>>    struct drm_sched_job *s_job, *tmp;
>>>    uint64_t guilty_context;
>>>    bool found_guilty = false;
>>> +    struct dma_fence *fence;
>>>      list_for_each_entry_safe(s_job, tmp,
>>> >ring_mirror_list, node) {
>>>    struct drm_sched_fence *s_fence = s_job->s_fence;
>>> @@ -492,7 +493,16 @@ void drm_sched_resubmit_jobs(struct
>>> drm_gpu_scheduler *sched)
>>>    dma_fence_set_error(_fence->finished, -ECANCELED);
>>>      dma_fence_put(s_job->s_fence->parent);
>>> -    s_job->s_fence->parent = sched->ops->run_job(s_job);
>>> +    fence = sched->ops->run_job(s_job);
>>> +
>>> +    if (IS_ERR_OR_NULL(fence)) {
>>> +    s_job->s_fence->parent = NULL;
>>> +    dma_fence_set_error(_fence->finished, PTR_ERR(fence));
>>> +    } else {
>>> +    s_job->s_fence->parent = fence;
>>> +    }
>>> +
>>> +
>> Maybe time for a drm_sched_run_job() function which does that
>> handling? And why don't we need to install the callback here?
>
> What code do you want to put in drm_sched_run_job ?
>
> We reinstall the callback later in drm_sched_start,
> drm_sched_resubmit_jobs is conditional on whether the guilty fence did
> signal by this time or not and so the split of the logic into
> drm_sched_start and drm_sched_resubmit_jobs.

Ah, yes of course. In this case the patch is Reviewed-by: Christian 
König .

Regards,
Christian.

>
> Andrey
>
>
>> Apart from that looks good to me,
>> Christian.
>>
>>>    }
>>>    }
>>>    EXPORT_SYMBOL(drm_sched_resubmit_jobs);
>>> @@ -720,7 +730,7 @@ static int drm_sched_main(void *param)
>>>    fence = sched->ops->run_job(sched_job);
>>>    drm_sched_fence_scheduled(s_fence);
>>>    -    if (fence) {
>>> +    if (!IS_ERR_OR_NULL(fence)) {
>>>    s_fence->parent = dma_fence_get(fence);
>>>    r = dma_fence_add_callback(fence, _job->cb,
>>>   drm_sched_process_job);
>>> @@ -730,8 +740,11 @@ static int drm_sched_main(void *param)
>>>    DRM_ERROR("fence add callback failed (%d)\n",
>>>      r);
>>>    dma_fence_put(fence);
>>> -    } else
>>> +    } else {
>>> +
>>> +    dma_fence_set_error(_fence->finished, PTR_ERR(fence));
>>>    drm_sched_process_job(NULL, _job->cb);
>>> +    }
>>>      wake_up(>job_scheduled);
>>>    }


Re: [PATCH] drm/amdgpu: GFX9, GFX10: GRBM requires 1-cycle delay

2019-10-25 Thread Koenig, Christian
Hi Changfeng,

that won't work, you can't add this to 
amdgpu_ring_emit_reg_write_reg_wait_helper or break all read triggered 
registers (like the semaphore ones).

Additional to that it will never work on GFX9, since the CP firmware 
there uses the integrated write/wait command and you can't add an 
additional dummy read there.

Regards,
Christian.

On 25.10.19 at 16:22, Zhu, Changfeng wrote:
> I try to write a patch based on the patch of Tuikov,Luben.
>
> Inspired by Luben, here is the patch:
>
>  From 1980d8f1ed44fb9a84a5ea1f6e2edd2bc25c629a Mon Sep 17 00:00:00 2001
> From: changzhu 
> Date: Thu, 10 Oct 2019 11:02:33 +0800
> Subject: [PATCH] drm/amdgpu: add dummy read by engines for some GCVM status
>   registers
>
> The GRBM register interface is now capable of bursting 1 cycle per
> register wr->wr, wr->rd much faster than previous multicycle per
> transaction done interface.  This has caused a problem where
> status registers requiring HW to update have a 1 cycle delay, due
> to the register update having to go through GRBM.
>
> SW may operate on an incorrect value if they write a register and
> immediately check the corresponding status register.
>
> Registers requiring HW to clear or set fields may be delayed by 1 cycle.
> For example,
>
> 1. write VM_INVALIDATE_ENG0_REQ mask = 5a
> 2. read VM_INVALIDATE_ENG0_ACK till the ack is same as the request mask = 5a
>   a. HW will reset VM_INVALIDATE_ENG0_ACK = 0 until invalidation is 
> complete
> 3. write VM_INVALIDATE_ENG0_REQ mask = 5a
> 4. read VM_INVALIDATE_ENG0_ACK till the ack is same as the request mask = 5a
>   a. First read of VM_INVALIDATE_ENG0_ACK = 5a instead of 0
>   b. Second read of VM_INVALIDATE_ENG0_ACK = 0 because the remote GRBM h/w
>  register takes one extra cycle to be cleared
>   c. In this case, SW will see a false ACK if they exit on the first read
>
> Affected registers (only GC variant)  | Recommended Dummy Read
> --+
> VM_INVALIDATE_ENG*_ACK  |  VM_INVALIDATE_ENG*_REQ
> VM_L2_STATUS|  VM_L2_STATUS
> VM_L2_PROTECTION_FAULT_STATUS   |  VM_L2_PROTECTION_FAULT_STATUS
> VM_L2_PROTECTION_FAULT_ADDR_HI/LO32   |  VM_L2_PROTECTION_FAULT_ADDR_HI/LO32
> VM_L2_IH_LOG_BUSY   |  VM_L2_IH_LOG_BUSY
> MC_VM_L2_PERFCOUNTER_HI/LO  |  MC_VM_L2_PERFCOUNTER_HI/LO
> ATC_L2_PERFCOUNTER_HI/LO|  ATC_L2_PERFCOUNTER_HI/LO
> ATC_L2_PERFCOUNTER2_HI/LO   |  ATC_L2_PERFCOUNTER2_HI/LO
>
> It also needs dummy read by engines for these gc registers.
>
> Change-Id: Ie028f37eb789966d4593984bd661b248ebeb1ac3
> Signed-off-by: changzhu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  5 +
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c   |  2 ++
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c|  2 ++
>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c   |  4 
>   drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c   | 18 ++
>   5 files changed, 31 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index 4b3f58dbf36f..c2fbf6087ecf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -392,6 +392,11 @@ void amdgpu_ring_emit_reg_write_reg_wait_helper(struct 
> amdgpu_ring *ring,
>   uint32_t ref, uint32_t mask)
>   {
>   amdgpu_ring_emit_wreg(ring, reg0, ref);
> +
> + /* wait for a cycle to reset vm_inv_eng0_ack */
> + if (ring->funcs->vmhub == AMDGPU_GFXHUB_0)
> + amdgpu_ring_emit_rreg(ring, reg0);
> +
>   amdgpu_ring_emit_reg_wait(ring, reg1, mask, mask);
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index ef1975a5323a..104c47734316 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -5155,6 +5155,7 @@ static const struct amdgpu_ring_funcs 
> gfx_v10_0_ring_funcs_gfx = {
>   .patch_cond_exec = gfx_v10_0_ring_emit_patch_cond_exec,
>   .preempt_ib = gfx_v10_0_ring_preempt_ib,
>   .emit_tmz = gfx_v10_0_ring_emit_tmz,
> + .emit_rreg = gfx_v10_0_ring_emit_rreg,
>   .emit_wreg = gfx_v10_0_ring_emit_wreg,
>   .emit_reg_wait = gfx_v10_0_ring_emit_reg_wait,
>   };
> @@ -5188,6 +5189,7 @@ static const struct amdgpu_ring_funcs 
> gfx_v10_0_ring_funcs_compute = {
>   .test_ib = gfx_v10_0_ring_test_ib,
>   .insert_nop = amdgpu_ring_insert_nop,
>   .pad_ib = amdgpu_ring_generic_pad_ib,
> + .emit_rreg = gfx_v10_0_ring_emit_rreg,
>   .emit_wreg = gfx_v10_0_ring_emit_wreg,
>   .emit_reg_wait = gfx_v10_0_ring_emit_reg_wait,
>   };
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 2f03bf533d41..d00b53de0fdc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c

Re: [PATCH] drm/amdgpu: guard ib scheduling while in reset

2019-10-25 Thread Koenig, Christian
On 25.10.19 at 12:08, S, Shirish wrote:


On 10/25/2019 2:56 PM, Koenig, Christian wrote:
On 25.10.19 at 11:22, S, Shirish wrote:


On 10/25/2019 2:23 PM, Koenig, Christian wrote:

amdgpu_do_asic_reset starting to resume blocks

...

amdgpu :03:00.0: couldn't schedule ib on ring 
[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)
...

amdgpu_device_ip_resume_phase2 resumed gfx_v9_0

amdgpu_device_ip_resume_phase2 resumed sdma_v4_0

amdgpu_device_ip_resume_phase2 resumed powerplay

This is the root of the problem.

The scheduler should never be resumed before we are done with bringing back the 
hardware into a usable state.

I don't see the scheduler being resumed when the ib is scheduled, it's done way 
after the hardware is ready in the reset code path.

Below are the logs:

amdgpu :03:00.0: GPU reset begin!
amdgpu_device_gpu_recover calling drm_sched_stop <==
...
amdgpu :03:00.0: couldn't schedule ib on ring 
[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)
...
amdgpu_device_ip_resume_phase2 resumed sdma_v4_0
amdgpu_device_ip_resume_phase2 resumed powerplay
amdgpu_device_ip_resume_phase2 resumed dm
...
[drm] recover vram bo from shadow done
amdgpu_device_gpu_recover calling  drm_sched_start <==
...

As mentioned in the call trace, drm_sched_main() is responsible for this 
job_run which seems to be called during cleanup.

Then the scheduler isn't stopped for some reason and we need to investigate why.


The drm_sched_stop(), as I mentioned, only parks the thread

That should be sufficient for the thread to be halted and not submit more jobs 
to the hardware.


and cancels work and nothing else; I'm not sure why you think it hasn't stopped or 
done what it is supposed to do.

Since it works 3/5 times.

We used to have another kthread_park()/unpark() in drm_sched_entity_fini(), 
maybe an application is crashing while we are trying to reset the GPU?

Would be rather unlikely, and especially hard to reproduce, but 
currently it's my best bet for what's going wrong here.

It's sometimes triggered from drm_sched_entity_fini(), as I can see in the prints, 
but not always.

I believe an application crashing while the GPU resets is anticipated, depending upon 
the workload and the state of the gfx renderer when the reset happened.

Since reset is something that is not a usual/routine/regular event, such 
anomalies are to be expected when it happens,

so we need to have failsafe methods like this patch, and maybe some more based 
on system behavior upon reset.

Sorry but your patch is complete nonsense from the design point of view.

It is absolutely mandatory that the scheduler is stopped and the thread 
correctly parked for the GPU reset to work properly.

See, you not only need to prevent running new jobs, but also job preparation 
(e.g. grabbing VMIDs); otherwise you will immediately run into the next GPU 
reset after the first one finished.

Regards,
Christian.
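
Put differently, the expected shape of the recovery path is the outline below: the scheduler threads stay parked from before the hardware is suspended until after it is fully usable again, so neither job preparation nor submission can race with the reset. This is a simplified outline, not the literal amdgpu_device_gpu_recover() code (locking, error handling and the multi-device XGMI case are omitted).

/* Simplified outline of the ordering being argued for here. */
static void gpu_recover_outline(struct amdgpu_device *adev,
				struct amdgpu_job *job)
{
	int i;

	/* 1. Park every scheduler thread before touching the hardware. */
	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
		struct amdgpu_ring *ring = adev->rings[i];

		if (ring && ring->sched.thread)
			drm_sched_stop(&ring->sched, job ? &job->base : NULL);
	}

	/* 2. Only now quiesce and reset the hardware. */
	amdgpu_device_ip_suspend(adev);
	/* ... ASIC reset and amdgpu_device_ip_resume_phase1/2 ... */

	/* 3. Unpark the schedulers only once the HW is fully usable again. */
	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
		struct amdgpu_ring *ring = adev->rings[i];

		if (ring && ring->sched.thread) {
			drm_sched_resubmit_jobs(&ring->sched);
			drm_sched_start(&ring->sched, true);
		}
	}
}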


Regards,

Shirish S

Regards,
Christian.


Regards,

Shirish S

Regards,
Christian.

On 25.10.19 at 10:50, S, Shirish wrote:

Here is the call trace:

Call Trace:
 dump_stack+0x4d/0x63
 amdgpu_ib_schedule+0x86/0x4b7
 ? __mod_timer+0x21e/0x244
 amdgpu_job_run+0x108/0x178
 drm_sched_main+0x253/0x2fa
 ? remove_wait_queue+0x51/0x51
 ? drm_sched_cleanup_jobs.part.12+0xda/0xda
 kthread+0x14f/0x157
 ? kthread_park+0x86/0x86
 ret_from_fork+0x22/0x40
amdgpu :03:00.0: couldn't schedule ib on ring 
[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)

printed via below change:

@@ -151,6 +152,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
}

if (!ring->sched.ready) {
+  dump_stack();
dev_err(adev->dev, "couldn't schedule ib on ring <%s>\n", 
ring->name);
return -EINVAL;

On 10/24/2019 10:00 PM, Christian König wrote:
On 24.10.19 at 17:06, Grodzovsky, Andrey wrote:


On 10/24/19 7:01 AM, Christian König wrote:
On 24.10.19 at 12:58, S, Shirish wrote:
[Why]
Upon GPU reset, kernel cleans up already submitted jobs
via drm_sched_cleanup_jobs.
This schedules ib's via drm_sched_main()->run_job, leading to
a race condition on whether the rings are ready, since during reset
the rings may be suspended.

NAK, exactly that's what should not happen.

The scheduler should be suspended while a GPU reset is in progress.

So you are running into a completely different race here.

Below is the series of events when the issue occurs.

(Note that as you & Andrey mentioned the scheduler has been suspended but the 
job is scheduled nonetheless.)

amdgpu :03:00.0: GPU reset begin!

...

amdgpu_device_gpu_recover stopping ring sdma0 via drm_sched_stop

...

amdgpu :03:00.0: GPU reset succeeded, trying to resume

amdgpu_do_asic_reset starting to resume blocks

...

amdgpu :03:00.0: couldn't schedule ib on ring 
[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)
...

amdgpu_device_ip_resume_phase2 resumed gfx_v9_0


Re: [PATCH] drm/amdgpu: guard ib scheduling while in reset

2019-10-25 Thread Koenig, Christian
On 25.10.19 at 11:22, S, Shirish wrote:


On 10/25/2019 2:23 PM, Koenig, Christian wrote:

amdgpu_do_asic_reset starting to resume blocks

...

amdgpu :03:00.0: couldn't schedule ib on ring 
[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)
...

amdgpu_device_ip_resume_phase2 resumed gfx_v9_0

amdgpu_device_ip_resume_phase2 resumed sdma_v4_0

amdgpu_device_ip_resume_phase2 resumed powerplay

This is the root of the problem.

The scheduler should never be resumed before we are done with bringing back the 
hardware into a usable state.

I don't see the scheduler being resumed when the ib is scheduled, it's done way 
after the hardware is ready in the reset code path.

Below are the logs:

amdgpu :03:00.0: GPU reset begin!
amdgpu_device_gpu_recover calling drm_sched_stop <==
...
amdgpu :03:00.0: couldn't schedule ib on ring 
[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)
...
amdgpu_device_ip_resume_phase2 resumed sdma_v4_0
amdgpu_device_ip_resume_phase2 resumed powerplay
amdgpu_device_ip_resume_phase2 resumed dm
...
[drm] recover vram bo from shadow done
amdgpu_device_gpu_recover calling  drm_sched_start <==
...

As mentioned in the call trace, drm_sched_main() is responsible for this 
job_run which seems to be called during cleanup.

Then the scheduler isn't stopped for some reason and we need to investigate why.

We used to have another kthread_park()/unpark() in drm_sched_entity_fini(), 
maybe an application is crashing while we are trying to reset the GPU?

Would be rather unlikely, and especially hard to reproduce, but 
currently it's my best bet for what's going wrong here.

Regards,
Christian.


Regards,

Shirish S

Regards,
Christian.

On 25.10.19 at 10:50, S, Shirish wrote:

Here is the call trace:

Call Trace:
 dump_stack+0x4d/0x63
 amdgpu_ib_schedule+0x86/0x4b7
 ? __mod_timer+0x21e/0x244
 amdgpu_job_run+0x108/0x178
 drm_sched_main+0x253/0x2fa
 ? remove_wait_queue+0x51/0x51
 ? drm_sched_cleanup_jobs.part.12+0xda/0xda
 kthread+0x14f/0x157
 ? kthread_park+0x86/0x86
 ret_from_fork+0x22/0x40
amdgpu :03:00.0: couldn't schedule ib on ring 
[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)

printed via below change:

@@ -151,6 +152,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
}

if (!ring->sched.ready) {
+  dump_stack();
dev_err(adev->dev, "couldn't schedule ib on ring <%s>\n", 
ring->name);
return -EINVAL;

On 10/24/2019 10:00 PM, Christian König wrote:
On 24.10.19 at 17:06, Grodzovsky, Andrey wrote:


On 10/24/19 7:01 AM, Christian König wrote:
On 24.10.19 at 12:58, S, Shirish wrote:
[Why]
Upon GPU reset, kernel cleans up already submitted jobs
via drm_sched_cleanup_jobs.
This schedules ib's via drm_sched_main()->run_job, leading to
a race condition on whether the rings are ready, since during reset
the rings may be suspended.

NAK, exactly that's what should not happen.

The scheduler should be suspended while a GPU reset is in progress.

So you are running into a completely different race here.

Below is the series of events when the issue occurs.

(Note that as you & Andrey mentioned the scheduler has been suspended but the 
job is scheduled nonetheless.)

amdgpu :03:00.0: GPU reset begin!

...

amdgpu_device_gpu_recover stopping ring sdma0 via drm_sched_stop

...

amdgpu :03:00.0: GPU reset succeeded, trying to resume

amdgpu_do_asic_reset starting to resume blocks

...

amdgpu :03:00.0: couldn't schedule ib on ring 
[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)
...

amdgpu_device_ip_resume_phase2 resumed gfx_v9_0

amdgpu_device_ip_resume_phase2 resumed sdma_v4_0

amdgpu_device_ip_resume_phase2 resumed powerplay

...

FWIW, since the job is always NULL in "drm_sched_stop(>sched, job ? 
>base : NULL);" when called during reset, all drm_sched_stop() does

is cancel delayed work and park the sched->thread. There is no job list to be 
iterated to deactivate, remove or update fences.

Based on all this analysis, adding a mutex is more failsafe and less intrusive 
in the current code flow, and it also seems logical, hence I 
devised this approach.


Please sync up with Andrey how this was able to happen.

Regards,
Christian.


Shirish - Christian makes a good point - note that in amdgpu_device_gpu_recover 
drm_sched_stop, which stops all the scheduler threads, is called way before we 
suspend the HW in amdgpu_device_pre_asic_reset->amdgpu_device_ip_suspend, where 
SDMA suspension happens and where the HW ring is marked as not ready - please 
provide the call stack for when you hit [drm:amdgpu_job_run] *ERROR* Error 
scheduling IBs (-22) to identify the code path which tried to submit the SDMA IB.

Well the most likely cause of this is that the hardware failed to resume after 
the reset.

In fact, hardware resume has not yet started when the job is scheduled, which is the race I am trying to address with this patch.

Re: [PATCH] drm/amdgpu: guard ib scheduling while in reset

2019-10-25 Thread Koenig, Christian
amdgpu_do_asic_reset starting to resume blocks

...

amdgpu :03:00.0: couldn't schedule ib on ring 
[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)
...

amdgpu_device_ip_resume_phase2 resumed gfx_v9_0

amdgpu_device_ip_resume_phase2 resumed sdma_v4_0

amdgpu_device_ip_resume_phase2 resumed powerplay

This is the root of the problem.

The scheduler should never be resumed before we are done with bringing back the 
hardware into a usable state.

Regards,
Christian.

On 25.10.19 at 10:50, S, Shirish wrote:

Here is the call trace:

Call Trace:
 dump_stack+0x4d/0x63
 amdgpu_ib_schedule+0x86/0x4b7
 ? __mod_timer+0x21e/0x244
 amdgpu_job_run+0x108/0x178
 drm_sched_main+0x253/0x2fa
 ? remove_wait_queue+0x51/0x51
 ? drm_sched_cleanup_jobs.part.12+0xda/0xda
 kthread+0x14f/0x157
 ? kthread_park+0x86/0x86
 ret_from_fork+0x22/0x40
amdgpu :03:00.0: couldn't schedule ib on ring 
[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)

printed via below change:

@@ -151,6 +152,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
}

if (!ring->sched.ready) {
+  dump_stack();
dev_err(adev->dev, "couldn't schedule ib on ring <%s>\n", 
ring->name);
return -EINVAL;

On 10/24/2019 10:00 PM, Christian König wrote:
On 24.10.19 at 17:06, Grodzovsky, Andrey wrote:


On 10/24/19 7:01 AM, Christian König wrote:
On 24.10.19 at 12:58, S, Shirish wrote:
[Why]
Upon GPU reset, kernel cleans up already submitted jobs
via drm_sched_cleanup_jobs.
This schedules ib's via drm_sched_main()->run_job, leading to
a race condition on whether the rings are ready, since during reset
the rings may be suspended.

NAK, exactly that's what should not happen.

The scheduler should be suspended while a GPU reset is in progress.

So you are running into a completely different race here.

Below is the series of events when the issue occurs.

(Note that as you & Andrey mentioned the scheduler has been suspended but the 
job is scheduled nonetheless.)

amdgpu :03:00.0: GPU reset begin!

...

amdgpu_device_gpu_recover stopping ring sdma0 via drm_sched_stop

...

amdgpu :03:00.0: GPU reset succeeded, trying to resume

amdgpu_do_asic_reset starting to resume blocks

...

amdgpu :03:00.0: couldn't schedule ib on ring 
[drm:amdgpu_job_run] *ERROR* Error scheduling IBs (-22)
...

amdgpu_device_ip_resume_phase2 resumed gfx_v9_0

amdgpu_device_ip_resume_phase2 resumed sdma_v4_0

amdgpu_device_ip_resume_phase2 resumed powerplay

...

FWIW, since the job is always NULL in "drm_sched_stop(>sched, job ? 
>base : NULL);" when called during reset, all drm_sched_stop() does

is cancel delayed work and park the sched->thread. There is no job list to be 
iterated to deactivate, remove or update fences.

Based on all this analysis, adding a mutex is more failsafe and less intrusive 
in the current code flow, and it also seems logical, hence I 
devised this approach.


Please sync up with Andrey how this was able to happen.

Regards,
Christian.


Shirish - Christian makes a good point - note that in amdgpu_device_gpu_recover 
drm_sched_stop, which stops all the scheduler threads, is called way before we 
suspend the HW in amdgpu_device_pre_asic_reset->amdgpu_device_ip_suspend, where 
SDMA suspension happens and where the HW ring is marked as not ready - please 
provide the call stack for when you hit [drm:amdgpu_job_run] *ERROR* Error 
scheduling IBs (-22) to identify the code path which tried to submit the SDMA IB.

Well, the most likely cause of this is that the hardware failed to resume after 
the reset.

In fact, hardware resume has not yet started when the job is scheduled, which is 
the race I am trying to address with this patch.

Regards,

Shirish S

Christian.


Andrey



[How]
make GPU reset's amdgpu_device_ip_resume_phase2() &
amdgpu_ib_schedule() in amdgpu_job_run() mutually exclusive.

Signed-off-by: Shirish S 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 2 ++
  3 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index f4d9041..7b07a47b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -973,6 +973,7 @@ struct amdgpu_device {
  boolin_gpu_reset;
  enum pp_mp1_state   mp1_state;
  struct mutex  lock_reset;
+struct mutex  lock_ib_sched;
  struct amdgpu_doorbell_index doorbell_index;
int asic_reset_res;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 676cad1..63cad74 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2759,6 +2759,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
  mutex_init(>virt.vf_errors.lock);
 

Re: [PATCH] drm/amdgpu: simplify padding calculations

2019-10-25 Thread Koenig, Christian
Am 25.10.19 um 01:44 schrieb Tuikov, Luben:
> Simplify padding calculations.
>
> Signed-off-by: Luben Tuikov 
> ---
>   drivers/gpu/drm/amd/amdgpu/cik_sdma.c  |  4 ++--
>   drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c |  4 ++--
>   drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c |  4 ++--
>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |  4 ++--
>   drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 14 +-
>   5 files changed, 17 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c 
> b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> index c45304f1047c..1ea9e18d6f08 100644
> --- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
> @@ -228,7 +228,7 @@ static void cik_sdma_ring_emit_ib(struct amdgpu_ring 
> *ring,
>   u32 extra_bits = vmid & 0xf;
>   
>   /* IB packet must end on a 8 DW boundary */
> - cik_sdma_ring_insert_nop(ring, (12 - (lower_32_bits(ring->wptr) & 7)) % 
> 8);
> + cik_sdma_ring_insert_nop(ring, (4-lower_32_bits(ring->wptr)) & 7);
>   
>   amdgpu_ring_write(ring, SDMA_PACKET(SDMA_OPCODE_INDIRECT_BUFFER, 0, 
> extra_bits));
>   amdgpu_ring_write(ring, ib->gpu_addr & 0xffe0); /* base must be 32 
> byte aligned */
> @@ -811,7 +811,7 @@ static void cik_sdma_ring_pad_ib(struct amdgpu_ring 
> *ring, struct amdgpu_ib *ib)
>   u32 pad_count;
>   int i;
>   
> - pad_count = (8 - (ib->length_dw & 0x7)) % 8;
> + pad_count = (-ib->length_dw) & 7;
>   for (i = 0; i < pad_count; i++)
>   if (sdma && sdma->burst_nop && (i == 0))
>   ib->ptr[ib->length_dw++] =
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c 
> b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> index a10175838013..d340f179401a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
> @@ -255,7 +255,7 @@ static void sdma_v2_4_ring_emit_ib(struct amdgpu_ring 
> *ring,
>   unsigned vmid = AMDGPU_JOB_GET_VMID(job);
>   
>   /* IB packet must end on a 8 DW boundary */
> - sdma_v2_4_ring_insert_nop(ring, (10 - (lower_32_bits(ring->wptr) & 7)) 
> % 8);
> + sdma_v2_4_ring_insert_nop(ring, (2-lower_32_bits(ring->wptr)) & 7);
>   
>   amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_INDIRECT) |
> SDMA_PKT_INDIRECT_HEADER_VMID(vmid & 0xf));
> @@ -750,7 +750,7 @@ static void sdma_v2_4_ring_pad_ib(struct amdgpu_ring 
> *ring, struct amdgpu_ib *ib
>   u32 pad_count;
>   int i;
>   
> - pad_count = (8 - (ib->length_dw & 0x7)) % 8;
> + pad_count = (-ib->length_dw) & 7;
>   for (i = 0; i < pad_count; i++)
>   if (sdma && sdma->burst_nop && (i == 0))
>   ib->ptr[ib->length_dw++] =
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c 
> b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> index 5f4e2c616241..5c3c310188b6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> @@ -429,7 +429,7 @@ static void sdma_v3_0_ring_emit_ib(struct amdgpu_ring 
> *ring,
>   unsigned vmid = AMDGPU_JOB_GET_VMID(job);
>   
>   /* IB packet must end on a 8 DW boundary */
> - sdma_v3_0_ring_insert_nop(ring, (10 - (lower_32_bits(ring->wptr) & 7)) 
> % 8);
> + sdma_v3_0_ring_insert_nop(ring, (2-lower_32_bits(ring->wptr)) & 7);
>   
>   amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_INDIRECT) |
> SDMA_PKT_INDIRECT_HEADER_VMID(vmid & 0xf));
> @@ -1021,7 +1021,7 @@ static void sdma_v3_0_ring_pad_ib(struct amdgpu_ring 
> *ring, struct amdgpu_ib *ib
>   u32 pad_count;
>   int i;
>   
> - pad_count = (8 - (ib->length_dw & 0x7)) % 8;
> + pad_count = (-ib->length_dw) & 7;
>   for (i = 0; i < pad_count; i++)
>   if (sdma && sdma->burst_nop && (i == 0))
>   ib->ptr[ib->length_dw++] =
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index 45bd538ba97e..7c71c88e38a4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -698,7 +698,7 @@ static void sdma_v4_0_ring_emit_ib(struct amdgpu_ring 
> *ring,
>   unsigned vmid = AMDGPU_JOB_GET_VMID(job);
>   
>   /* IB packet must end on a 8 DW boundary */
> - sdma_v4_0_ring_insert_nop(ring, (10 - (lower_32_bits(ring->wptr) & 7)) 
> % 8);
> + sdma_v4_0_ring_insert_nop(ring, (2-lower_32_bits(ring->wptr)) & 7);
>   
>   amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_INDIRECT) |
> SDMA_PKT_INDIRECT_HEADER_VMID(vmid & 0xf));
> @@ -1580,7 +1580,7 @@ static void sdma_v4_0_ring_pad_ib(struct amdgpu_ring 
> *ring, struct amdgpu_ib *ib
>   u32 pad_count;
>   int i;
>   
> - pad_count = (8 - (ib->length_dw & 0x7)) % 8;
> + pad_count = (-ib->length_dw) & 7;
>   for (i = 0; i < pad_count; i++)
>   if (sdma && sdma->burst_nop && (i == 0))
>   ib->ptr[ib->length_dw++] =
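
For reference, the old and new padding expressions are equivalent: both compute
the number of NOP dwords needed to reach the next multiple of 8. A quick worked
example with ib->length_dw = 13:

    (8 - (13 & 0x7)) % 8  =  (8 - 5) % 8  =  3
    (-13) & 7             =  3            (two's complement: ...11110011b & 111b)

and for a length that is already 8-dword aligned both give 0. The same holds for
the emit_ib changes, since 10 = 2 and 12 = 4 (mod 8), so e.g.
(10 - (wptr & 7)) % 8 == (2 - wptr) & 7.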

Re: [PATCH] drm/amdgpu: GFX9, GFX10: GRBM requires 1-cycle delay

2019-10-25 Thread Koenig, Christian
Am 24.10.19 um 23:16 schrieb Tuikov, Luben:
> The GRBM interface is now capable of bursting
> 1-cycle op per register, a WRITE followed by
> another WRITE, or a WRITE followed by a READ--much
> faster than previous multi-cycle per
> completed-transaction interface. This causes a
> problem, whereby status registers requiring a
> read/write by hardware, have a 1-cycle delay, due
> to the register update having to go through GRBM
> interface.
>
> This patch adds this delay.
>
> A one cycle read op is added after updating the
> invalidate request and before reading the
> invalidate-ACK status.

Please completely drop all changes for GFX9 since this patch will most 
likely break SRIOV.

Additionally, please apply the workaround only to SDMA, since the CP-driven 
engines should handle that in firmware.
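
Restricting it to SDMA would mean narrowing the ring-type check in the hunk
below to something like this (sketch, not a reviewed change):

    /* gmc_v10_0_emit_flush_gpu_tlb(): dummy read only for SDMA; the
     * CP-driven rings (GFX/compute) handle the delay in firmware */
    if (ring->funcs->type == AMDGPU_RING_TYPE_SDMA)
            amdgpu_ring_emit_reg_wait(ring,
                                      hub->vm_inv_eng0_req + eng, 0, 0);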

Regards,
Christian.

>
> See also commit
> 534991731cb5fa94b5519957646cf849ca10d17d.
>
> Signed-off-by: Luben Tuikov 
> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 4 ++--
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 4 ++--
>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 9 +
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 8 
>   drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 2 +-
>   5 files changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index ac43b1af69e3..0042868dbd53 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -5129,7 +5129,7 @@ static const struct amdgpu_ring_funcs 
> gfx_v10_0_ring_funcs_gfx = {
>   5 + /* COND_EXEC */
>   7 + /* PIPELINE_SYNC */
>   SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> - SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 * 2 +
>   2 + /* VM_FLUSH */
>   8 + /* FENCE for VM_FLUSH */
>   20 + /* GDS switch */
> @@ -5182,7 +5182,7 @@ static const struct amdgpu_ring_funcs 
> gfx_v10_0_ring_funcs_compute = {
>   5 + /* hdp invalidate */
>   7 + /* gfx_v10_0_ring_emit_pipeline_sync */
>   SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> - SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 * 2 +
>   2 + /* gfx_v10_0_ring_emit_vm_flush */
>   8 + 8 + 8, /* gfx_v10_0_ring_emit_fence x3 for user fence, vm 
> fence */
>   .emit_ib_size = 7, /* gfx_v10_0_ring_emit_ib_compute */
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 9fe95e7693d5..9a7a717208de 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -6218,7 +6218,7 @@ static const struct amdgpu_ring_funcs 
> gfx_v9_0_ring_funcs_gfx = {
>   5 +  /* COND_EXEC */
>   7 +  /* PIPELINE_SYNC */
>   SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> - SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 * 2 +
>   2 + /* VM_FLUSH */
>   8 +  /* FENCE for VM_FLUSH */
>   20 + /* GDS switch */
> @@ -6271,7 +6271,7 @@ static const struct amdgpu_ring_funcs 
> gfx_v9_0_ring_funcs_compute = {
>   5 + /* hdp invalidate */
>   7 + /* gfx_v9_0_ring_emit_pipeline_sync */
>   SOC15_FLUSH_GPU_TLB_NUM_WREG * 5 +
> - SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 +
> + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 7 * 2 +
>   2 + /* gfx_v9_0_ring_emit_vm_flush */
>   8 + 8 + 8, /* gfx_v9_0_ring_emit_fence x3 for user fence, vm 
> fence */
>   .emit_ib_size = 7, /* gfx_v9_0_ring_emit_ib_compute */
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> index 6e1b25bd1fe7..100d526e9a42 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> @@ -346,6 +346,15 @@ static uint64_t gmc_v10_0_emit_flush_gpu_tlb(struct 
> amdgpu_ring *ring,
>   
>   amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_req + eng, req);
>   
> + /* Insert a dummy read to delay one cycle before the ACK
> +  * inquiry.
> +  */
> + if (ring->funcs->type == AMDGPU_RING_TYPE_SDMA ||
> + ring->funcs->type == AMDGPU_RING_TYPE_GFX  ||
> + ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE)
> + amdgpu_ring_emit_reg_wait(ring,
> +   hub->vm_inv_eng0_req + eng, 0, 0);
> +
>   /* wait for the invalidate to complete */
>   amdgpu_ring_emit_reg_wait(ring, hub->vm_inv_eng0_ack + eng,
> 1 << vmid, 1 << vmid);
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 9f2a893871ec..8f3097e45299 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ 

Re: [PATCH] drm/amdgpu: remove unused variable in amdgpu_gfx_kiq_free_ring

2019-10-23 Thread Koenig, Christian
Maybe say parameter instead of variable in the subject.

Am 23.10.19 um 16:35 schrieb Nirmoy Das:
> Signed-off-by: Nirmoy Das 

Apart from that Acked-by: Christian König 

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 3 +--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 3 +--
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 2 +-
>   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 2 +-
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 2 +-
>   5 files changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 069515f57c2a..c9d1fada6188 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -319,8 +319,7 @@ int amdgpu_gfx_kiq_init_ring(struct amdgpu_device *adev,
>   return r;
>   }
>   
> -void amdgpu_gfx_kiq_free_ring(struct amdgpu_ring *ring,
> -   struct amdgpu_irq_src *irq)
> +void amdgpu_gfx_kiq_free_ring(struct amdgpu_ring *ring)
>   {
>   amdgpu_device_wb_free(ring->adev, ring->adev->virt.reg_val_offs);
>   amdgpu_ring_fini(ring);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index 35eff9e6ce16..459aa9059542 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -330,8 +330,7 @@ int amdgpu_gfx_kiq_init_ring(struct amdgpu_device *adev,
>struct amdgpu_ring *ring,
>struct amdgpu_irq_src *irq);
>   
> -void amdgpu_gfx_kiq_free_ring(struct amdgpu_ring *ring,
> -   struct amdgpu_irq_src *irq);
> +void amdgpu_gfx_kiq_free_ring(struct amdgpu_ring *ring);
>   
>   void amdgpu_gfx_kiq_fini(struct amdgpu_device *adev);
>   int amdgpu_gfx_kiq_init(struct amdgpu_device *adev,
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 8fca6ab5fa8f..ac43b1af69e3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -1443,7 +1443,7 @@ static int gfx_v10_0_sw_fini(void *handle)
>   amdgpu_ring_fini(>gfx.compute_ring[i]);
>   
>   amdgpu_gfx_mqd_sw_fini(adev);
> - amdgpu_gfx_kiq_free_ring(>gfx.kiq.ring, >gfx.kiq.irq);
> + amdgpu_gfx_kiq_free_ring(>gfx.kiq.ring);
>   amdgpu_gfx_kiq_fini(adev);
>   
>   gfx_v10_0_pfp_fini(adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index a7fe0ea24d1f..e4c645da4e28 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -2103,7 +2103,7 @@ static int gfx_v8_0_sw_fini(void *handle)
>   amdgpu_ring_fini(>gfx.compute_ring[i]);
>   
>   amdgpu_gfx_mqd_sw_fini(adev);
> - amdgpu_gfx_kiq_free_ring(>gfx.kiq.ring, >gfx.kiq.irq);
> + amdgpu_gfx_kiq_free_ring(>gfx.kiq.ring);
>   amdgpu_gfx_kiq_fini(adev);
>   
>   gfx_v8_0_mec_fini(adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index dd345fcedb97..9fe95e7693d5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -2153,7 +2153,7 @@ static int gfx_v9_0_sw_fini(void *handle)
>   amdgpu_ring_fini(>gfx.compute_ring[i]);
>   
>   amdgpu_gfx_mqd_sw_fini(adev);
> - amdgpu_gfx_kiq_free_ring(>gfx.kiq.ring, >gfx.kiq.irq);
> + amdgpu_gfx_kiq_free_ring(>gfx.kiq.ring);
>   amdgpu_gfx_kiq_fini(adev);
>   
>   gfx_v9_0_mec_fini(adev);


Re: [PATCH] drm/amdgpu: refine reboot debugfs operation in ras case (v3)

2019-10-22 Thread Koenig, Christian
Am 22.10.19 um 04:28 schrieb Chen, Guchun:
> Ras reboot debugfs node allows user one easy control to avoid
> gpu recovery hang problem and directly reboot system per card
> basis, after ras uncorrectable error happens. However, it is
> one common entry, which should get rid of ras_ctrl node and
> remove ip dependence when inputting by user. So add one new
> auto_reboot node in ras debugfs dir to achieve this.
>
> v2: in commit message, add justification why ras reboot debugfs
> node is needed.
> v3: use debugfs_create_bool to create debugfs file for boolean value
>
> Signed-off-by: Guchun Chen 

Nice cleanup, patch is Reviewed-by: Christian König 
.
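
As a usage note, with this change the automatic reboot is armed through debugfs,
e.g. (the exact path depends on the card index):

    # echo 1 > /sys/kernel/debug/dri/0/ras/auto_reboot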

Thanks,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 19 ---
>   1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 6220394521e4..2d9e13d2a71a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -153,8 +153,6 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
> *f,
>   op = 1;
>   else if (sscanf(str, "inject %32s %8s", block_name, err) == 2)
>   op = 2;
> - else if (sscanf(str, "reboot %32s", block_name) == 1)
> - op = 3;
>   else if (str[0] && str[1] && str[2] && str[3])
>   /* ascii string, but commands are not matched. */
>   return -EINVAL;
> @@ -218,12 +216,11 @@ static struct ras_manager *amdgpu_ras_find_obj(struct 
> amdgpu_device *adev,
>* value to the address.
>*
>* Second member: struct ras_debug_if::op.
> - * It has four kinds of operations.
> + * It has three kinds of operations.
>*
>* - 0: disable RAS on the block. Take ::head as its data.
>* - 1: enable RAS on the block. Take ::head as its data.
>* - 2: inject errors on the block. Take ::inject as its data.
> - * - 3: reboot on unrecoverable error
>*
>* How to use the interface?
>* programs:
> @@ -305,9 +302,6 @@ static ssize_t amdgpu_ras_debugfs_ctrl_write(struct file 
> *f, const char __user *
>   /* data.inject.address is offset instead of absolute gpu 
> address */
>   ret = amdgpu_ras_error_inject(adev, );
>   break;
> - case 3:
> - amdgpu_ras_get_context(adev)->reboot = true;
> - break;
>   default:
>   ret = -EINVAL;
>   break;
> @@ -1037,6 +1031,17 @@ static void amdgpu_ras_debugfs_create_ctrl_node(struct 
> amdgpu_device *adev)
>   adev, _ras_debugfs_ctrl_ops);
>   debugfs_create_file("ras_eeprom_reset", S_IWUGO | S_IRUGO, con->dir,
>   adev, _ras_debugfs_eeprom_ops);
> +
> + /*
> +  * After one uncorrectable error happens, usually GPU recovery will
> +  * be scheduled. But due to the known problem in GPU recovery failing
> +  * to bring GPU back, below interface provides one direct way to
> +  * user to reboot system automatically in such case within
> +  * ERREVENT_ATHUB_INTERRUPT generated. Normal GPU recovery routine
> +  * will never be called.
> +  */
> + debugfs_create_bool("auto_reboot", S_IWUGO | S_IRUGO, con->dir,
> + >reboot);
>   }
>   
>   void amdgpu_ras_debugfs_create(struct amdgpu_device *adev,


Re: [PATCH] drm/amdgpu: Fix memory leak in amdgpu_fence_emit

2019-10-22 Thread Koenig, Christian
Am 21.10.19 um 20:09 schrieb Navid Emamdoost:
> In the implementation of amdgpu_fence_emit(), if dma_fence_wait() fails
> and returns an errno, before returning the allocated memory for fence
> should be released.
>
> Fixes: 3d2aca8c8620 ("drm/amdgpu: fix old fence check in amdgpu_fence_emit")
> Signed-off-by: Navid Emamdoost 

Reviewed-by: Christian König 

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 23085b352cf2..2f59c9270a7e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -166,8 +166,10 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct 
> dma_fence **f,
>   if (old) {
>   r = dma_fence_wait(old, false);
>   dma_fence_put(old);
> - if (r)
> + if (r) {
> + dma_fence_put(fence);
>   return r;
> + }
>   }
>   }
>   


Re: [PATCH hmm 00/15] Consolidate the mmu notifier interval_tree and locking

2019-10-21 Thread Koenig, Christian
Am 21.10.19 um 15:57 schrieb Jason Gunthorpe:
> On Sun, Oct 20, 2019 at 02:21:42PM +0000, Koenig, Christian wrote:
>> Am 18.10.19 um 22:36 schrieb Jason Gunthorpe:
>>> On Thu, Oct 17, 2019 at 04:47:20PM +, Koenig, Christian wrote:
>>> [SNIP]
>>>
>>>> So again how are they serialized?
>>> The 'driver lock' thing does it, read the hmm documentation, the hmm
>>> approach is basically the only approach that was correct of all the
>>> drivers..
>> Well, that's what I did, but what HMM does still doesn't look correct
>> to me.
> It has a bug, but the basic flow seems to work.
>
> https://patchwork.kernel.org/patch/11191

Maybe wrong link? That link looks like an unrelated discussion on kernel 
image relocation.

>>> So long as the 'driver lock' is held the range cannot become
>>> invalidated as the 'driver lock' prevents progress of invalidation.
>> Correct, but the problem is it doesn't wait for ongoing operations to
>> complete.
>>
>> See I'm talking about the following case:
>>
>> Thread A    Thread B
>> invalidate_range_start()
>>   mmu_range_read_begin()
>>   get_user_pages()/hmm_range_fault()
>>   grab_driver_lock()
>> Updating the ptes
>> invalidate_range_end()
>>
>> As far as I can see in invalidate_range_start() the driver lock is taken
>> to make sure that we can't start any invalidation while the driver is
>> using the pages for a command submission.
> Again, this uses the seqlock like scheme *and* the driver lock.
>
> In this case after grab_driver_lock() mmu_range_read_retry() will
> return false if Thread A has progressed to 'updating the ptes.
>
> For instance here is how the concurrency resolves for retry:
>
> CPU1CPU2
>seq = mmu_range_read_begin()
> invalidate_range_start()
>invalidate_seq++

How that was ordered was what was confusing me. But I've read up on the code 
in mmu_range_read_begin() and found the lines I was looking for:

+    if (is_invalidating)
+        wait_event(mmn_mm->wq,
+               READ_ONCE(mmn_mm->invalidate_seq) != seq);

[SNIP]

> For the above I've simplified the mechanics of the invalidate_seq, you
> need to look through the patch to see how it actually works.

Yea, that you also allow multiple write sides is pretty neat.
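
For reference, the driver-side pattern Jason outlines further down in the thread
is roughly the following (pseudocode paraphrase; driver->update is a placeholder
for whatever lock the driver uses):

    again:
            seq = mmu_range_read_begin(&range);

            /* speculative part, pages not yet trustworthy */
            get_user_pages()/hmm_range_fault(&range);

            take_lock(driver->update);
            if (mmu_range_read_retry(&range, seq)) {
                    /* an invalidation happened in between, discard and retry */
                    unlock(driver->update);
                    goto again;
            }

            /* results are valid while driver->update is held; the
             * invalidate callback synchronizes against further changes */
            program_hw();
            unlock(driver->update);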

>> Well we don't update the seqlock after the update to the protected data
>> structure (the page table) happened, but rather before that.
> ??? This is what mn_itree_inv_end() does, it is called by
> invalidate_range_end
>
>> That doesn't looks like the normal patter for a seqlock to me and as far
>> as I can see that is quite a bug in the HMM design/logic.
> Well, hmm has a bug because it doesn't use a seqlock pattern, see the
> above URL.
>
> One of the motivations for this work is to squash that bug by adding a
> seqlock like pattern. But the basic hmm flow and collision-retry
> approach seems sound.
>
> Do you see a problem with this patch?

No, not any more.

Essentially you are doing the same thing I've tried to do with the 
original amdgpu implementation. The difference is that you don't try to 
use a per range sequence (which is a good idea, we never got that fully 
working) and you allow multiple writers at the same time.

Feel free to stitch an Acked-by: Christian König 
 on patch #2, but you're still doing a bunch of 
things in there which are way beyond my understanding (e.g. where are 
all the SMP barriers?).

Cheers,
Christian.

>
> Jason


Re: [PATCH hmm 00/15] Consolidate the mmu notifier interval_tree and locking

2019-10-20 Thread Koenig, Christian
Am 18.10.19 um 22:36 schrieb Jason Gunthorpe:
> On Thu, Oct 17, 2019 at 04:47:20PM +0000, Koenig, Christian wrote:
>
>>> get_user_pages/hmm_range_fault() and invalidate_range_start() both are
>>> called while holding mm->map_sem, so they are always serialized.
>> Not even remotely.
>>
>> For calling get_user_pages()/hmm_range_fault() you only need to hold the
>> mmap_sem in read mode.
> Right
>   
>> And IIRC invalidate_range_start() is sometimes called without holding
>> the mmap_sem at all.
> Yep
>   
>> So again how are they serialized?
> The 'driver lock' thing does it, read the hmm documentation, the hmm
> approach is basically the only approach that was correct of all the
> drivers..

Well, that's what I did, but what HMM does still doesn't look correct 
to me.

> So long as the 'driver lock' is held the range cannot become
> invalidated as the 'driver lock' prevents progress of invalidation.

Correct, but the problem is it doesn't wait for ongoing operations to 
complete.

See I'm talking about the following case:

Thread A    Thread B
invalidate_range_start()
                     mmu_range_read_begin()
                     get_user_pages()/hmm_range_fault()
                     grab_driver_lock()
Updating the ptes
invalidate_range_end()

As far as I can see in invalidate_range_start() the driver lock is taken 
to make sure that we can't start any invalidation while the driver is 
using the pages for a command submission.

But the pages we got from get_user_pages()/hmm_range_fault() might not 
be up to date because the driver lock is also dropped again in 
invalidate_range_start() and not in invalidate_range_end().

> Holding the driver lock and using the seq based mmu_range_read_retry()
> tells if the previous unlocked get_user_pages() is still valid or
> needs to be discard.
>
> So it doesn't matter if get_user_pages() races or not, the result is not
> to be used until the driver lock is held and mmu_range_read_retry()
> called, which provides the coherence.
>
> It is the usual seqlock pattern.

Well we don't update the seqlock after the update to the protected data 
structure (the page table) happened, but rather before that.

That doesn't look like the normal pattern for a seqlock to me, and as far 
as I can see that is quite a bug in the HMM design/logic.

Regards,
Christian.

>
> Jason


Re: Spontaneous reboots when using RX 560

2019-10-18 Thread Koenig, Christian
Am 17.10.19 um 22:12 schrieb Sylvain Munaut:
> So a bit more testing.
>
> I was using a bit of "unusual" config I guess, having 2 GPUs and some
> other pcie cards (10G, ..).
> So I simplified and went to the most standard thing I could think of,
> _just_ the RX 560 card plugged into the main PCIe 16x slot directly
> connected to the CPU.
>
> And exact same results, no change in behavior.
>
> So on one hand I'm happy that the other cards and having the AMD GPU
> in the second slot isn't the issue (because I really need that config
> that way), but on the other, I'm no closer to finding the issue :/

At least you tested quite a bunch of things which I would have suggested 
as well.

I would also test whether disabling power features helps; try adding 
amdgpu.pg_mask=0 and amdgpu.cg_mask=0 to the kernel command line, for 
example.
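
One minimal way to do that, assuming GRUB, is to extend the command line in
/etc/default/grub and regenerate the config:

    GRUB_CMDLINE_LINUX_DEFAULT="quiet amdgpu.pg_mask=0 amdgpu.cg_mask=0"
    # then run update-grub (or grub2-mkconfig -o /boot/grub2/grub.cfg) and reboot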

Regards,
Christian.

>
> Cheers,
>
>   Sylvain Munaut


Re: Spontaneous reboots when using RX 560

2019-10-17 Thread Koenig, Christian
Am 17.10.19 um 16:29 schrieb Sylvain Munaut:
> Hi,
>
>
>>> I have RX 560 2G card. It's plugged into a 16x physical / 4x
>>> electrical slot of a X570 chipset motherboard with a Ryzen 3700X CPU.
>>> The hardware works fine and is stable under Windows (tested with
>>> games, benchmarks, stress-tests, ...)
>> Does booting with pci=noats on the kernel command line in grub fix the issue?
> It doesn't :/
>
> Message is slightly different but same idea :
>
> [   83.704035] amdgpu :06:00.0: AMD-Vi: Event logged
> [IO_PAGE_FAULT domain=0x address=0x0 flags=0x0020]
> [   88.732685] [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]]
> *ERROR* Waiting for fences timed out or interrupted!
> [   92.074379] ixgbe :04:00.1: Adapter removed
> [   93.480989] igb :07:00.0 enp7s0: PCIe link lost
>
> So it screws up the PCIe very badly :/
> Specifically seems to be everything connected to the X570 chipset.

From the hardware point of view, the only thing which comes to mind is 
that you somehow triggered the ESD protection.

I assume you can rule out an unstable physical connection (because it 
works on windows), so the only thing left is that there is something 
very very badly going wrong with power management.

Have you "tuned" the power tables on the board somehow? Or maybe 
multiple GPUs connected to the same power supply?

Regards,
Christian.

>
> Cheers,
>
>  Sylvain
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH hmm 00/15] Consolidate the mmu notifier interval_tree and locking

2019-10-17 Thread Koenig, Christian
Sending once more as text.

Am 17.10.19 um 18:26 schrieb Yang, Philip:
> On 2019-10-17 4:54 a.m., Christian König wrote:
>> Am 16.10.19 um 18:04 schrieb Jason Gunthorpe:
>>> On Wed, Oct 16, 2019 at 10:58:02AM +0200, Christian König wrote:
 Am 15.10.19 um 20:12 schrieb Jason Gunthorpe:
> From: Jason Gunthorpe 
>
> 8 of the mmu_notifier using drivers (i915_gem, radeon_mn, umem_odp,
> hfi1,
> scif_dma, vhost, gntdev, hmm) drivers are using a common pattern where
> they only use invalidate_range_start/end and immediately check the
> invalidating range against some driver data structure to tell if the
> driver is interested. Half of them use an interval_tree, the others are
> simple linear search lists.
>
> Of the ones I checked they largely seem to have various kinds of races,
> bugs and poor implementation. This is a result of the complexity in how
> the notifier interacts with get_user_pages(). It is extremely
> difficult to
> use it correctly.
>
> Consolidate all of this code together into the core mmu_notifier and
> provide a locking scheme similar to hmm_mirror that allows the user to
> safely use get_user_pages() and reliably know if the page list still
> matches the mm.
 That sounds really good, but could you outline for a moment how that is
 archived?
>>> It uses the same basic scheme as hmm and rdma odp, outlined in the
>>> revisions to hmm.rst later on.
>>>
>>> Basically,
>>>
>>>    seq = mmu_range_read_begin();
>>>
>>>    // This is a speculative region
>>>    .. get_user_pages()/hmm_range_fault() ..
>> How do we enforce that this get_user_pages()/hmm_range_fault() doesn't
>> see outdated page table information?
>>
>> In other words how the the following race prevented:
>>
>> CPU A CPU B
>> invalidate_range_start()
>>         mmu_range_read_begin()
>>         get_user_pages()/hmm_range_fault()
>> Updating the ptes
>> invalidate_range_end()
>>
>>
>> I mean get_user_pages() tries to circumvent this issue by grabbing a
>> reference to the pages in question, but that isn't sufficient for the
>> SVM use case.
>>
>> That's the reason why we had this horrible solution with a r/w lock and
>> a linked list of BOs in an interval tree.
>>
>> Regards,
>> Christian.
> get_user_pages/hmm_range_fault() and invalidate_range_start() both are
> called while holding mm->map_sem, so they are always serialized.

Not even remotely.

For calling get_user_pages()/hmm_range_fault() you only need to hold the 
mmap_sem in read mode.

And IIRC invalidate_range_start() is sometimes called without holding 
the mmap_sem at all.

So again how are they serialized?

Regards,
Christian.

>
> Philip
>>>    // Result cannot be derferenced
>>>
>>>    take_lock(driver->update);
>>>    if (mmu_range_read_retry(, range.notifier_seq) {
>>>   // collision! The results are not correct
>>>   goto again
>>>    }
>>>
>>>    // no collision, and now under lock. Now we can de-reference the
>>> pages/etc
>>>    // program HW
>>>    // Now the invalidate callback is responsible to synchronize against
>>> changes
>>>    unlock(driver->update)
>>>
>>> Basically, anything that was using hmm_mirror correctly transisions
>>> over fairly trivially, just with the modification to store a sequence
>>> number to close that race described in the hmm commit.
>>>
>>> For something like AMD gpu I expect it to transition to use dma_fence
>>> from the notifier for coherency right before it unlocks driver->update.
>>>
>>> Jason
>>> ___
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH hmm 00/15] Consolidate the mmu notifier interval_tree and locking

2019-10-17 Thread Koenig, Christian


Am 17.10.2019 18:26 schrieb "Yang, Philip" :


On 2019-10-17 4:54 a.m., Christian König wrote:
> Am 16.10.19 um 18:04 schrieb Jason Gunthorpe:
>> On Wed, Oct 16, 2019 at 10:58:02AM +0200, Christian König wrote:
>>> Am 15.10.19 um 20:12 schrieb Jason Gunthorpe:
 From: Jason Gunthorpe 

 8 of the mmu_notifier using drivers (i915_gem, radeon_mn, umem_odp,
 hfi1,
 scif_dma, vhost, gntdev, hmm) drivers are using a common pattern where
 they only use invalidate_range_start/end and immediately check the
 invalidating range against some driver data structure to tell if the
 driver is interested. Half of them use an interval_tree, the others are
 simple linear search lists.

 Of the ones I checked they largely seem to have various kinds of races,
 bugs and poor implementation. This is a result of the complexity in how
 the notifier interacts with get_user_pages(). It is extremely
 difficult to
 use it correctly.

 Consolidate all of this code together into the core mmu_notifier and
 provide a locking scheme similar to hmm_mirror that allows the user to
 safely use get_user_pages() and reliably know if the page list still
 matches the mm.
>>> That sounds really good, but could you outline for a moment how that is
>>> archived?
>> It uses the same basic scheme as hmm and rdma odp, outlined in the
>> revisions to hmm.rst later on.
>>
>> Basically,
>>
>>   seq = mmu_range_read_begin();
>>
>>   // This is a speculative region
>>   .. get_user_pages()/hmm_range_fault() ..
>
> How do we enforce that this get_user_pages()/hmm_range_fault() doesn't
> see outdated page table information?
>
> In other words how the the following race prevented:
>
> CPU A CPU B
> invalidate_range_start()
>mmu_range_read_begin()
>get_user_pages()/hmm_range_fault()
> Updating the ptes
> invalidate_range_end()
>
>
> I mean get_user_pages() tries to circumvent this issue by grabbing a
> reference to the pages in question, but that isn't sufficient for the
> SVM use case.
>
> That's the reason why we had this horrible solution with a r/w lock and
> a linked list of BOs in an interval tree.
>
> Regards,
> Christian.
get_user_pages/hmm_range_fault() and invalidate_range_start() both are
called while holding mm->map_sem, so they are always serialized.

Not even remotely.

For calling get_user_pages()/hmm_range_fault() you only need to hold the 
mmap_sem in read mode.

And IIRC invalidate_range_start() is sometimes called without holding the 
mmap_sem at all.

So again how are they serialized?

Regards,
Christian.


Philip
>
>>   // Result cannot be derferenced
>>
>>   take_lock(driver->update);
>>   if (mmu_range_read_retry(, range.notifier_seq) {
>>  // collision! The results are not correct
>>  goto again
>>   }
>>
>>   // no collision, and now under lock. Now we can de-reference the
>> pages/etc
>>   // program HW
>>   // Now the invalidate callback is responsible to synchronize against
>> changes
>>   unlock(driver->update)
>>
>> Basically, anything that was using hmm_mirror correctly transisions
>> over fairly trivially, just with the modification to store a sequence
>> number to close that race described in the hmm commit.
>>
>> For something like AMD gpu I expect it to transition to use dma_fence
>> from the notifier for coherency right before it unlocks driver->update.
>>
>> Jason
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: fix amdgpu trace event print string format error

2019-10-16 Thread Koenig, Christian
Hi Kevin,

do you have other better way to do it?
Not offhand, but maybe check the trace documentation for a better 
approach.

If you can't find anything the patch is Reviewed-by: Christian König 
<mailto:christian.koe...@amd.com>.

Regards,
Christian.

Am 16.10.19 um 15:30 schrieb Wang, Kevin(Yang):
Hi Chris,

You said that this kind of scenario also exists in other source code, using the 
same method. In the amdgpu_trace.h file, this usage already exists in the amdgpu 
driver, e.g. TRACE_EVENT(amdgpu_cs_ioctl) -> timeline:
TP_printk("sched_job=%llu, timeline=%s, context=%u, seqno=%u, 
ring_name=%s, num_ibs=%u",
  __entry->sched_job_id, __get_str(timeline), 
__entry->context,
  __entry->seqno, __get_str(ring), __entry->num_ibs)
Do you have a better way to do it?
Thanks.

Best Regards,
Kevin

________
From: Koenig, Christian 
<mailto:christian.koe...@amd.com>
Sent: Wednesday, October 16, 2019 8:15 PM
To: Wang, Kevin(Yang) <mailto:kevin1.w...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] drm/amdgpu: fix amdgpu trace event print string format 
error

Hi Kevin,

well, that copies the string into the ring buffer every time the trace event is 
called, which is not necessarily a good idea for a constant string.

Can't we avoid that somehow?

Thanks,
Christian.

Am 16.10.19 um 14:01 schrieb Wang, Kevin(Yang):
add @Koenig, Christian<mailto:christian.koe...@amd.com>,
could you help me review it?

Best Regards,
Kevin


From: Wang, Kevin(Yang) <mailto:kevin1.w...@amd.com>
Sent: Wednesday, October 16, 2019 11:06 AM
To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
<mailto:amd-gfx@lists.freedesktop.org>
Cc: Wang, Kevin(Yang) <mailto:kevin1.w...@amd.com>
Subject: [PATCH] drm/amdgpu: fix amdgpu trace event print string format error

the trace event print string format error.
(use integer type to handle string)

before:
amdgpu_test_kev-1556  [002]   138.508781: amdgpu_cs_ioctl:
sched_job=8, timeline=gfx_0.0.0, context=177, seqno=1,
ring_name=94d01c207bf0, num_ibs=2

after:
amdgpu_test_kev-1506  [004]   370.703783: amdgpu_cs_ioctl:
sched_job=12, timeline=gfx_0.0.0, context=234, seqno=2,
ring_name=gfx_0.0.0, num_ibs=1

change trace event list:
1.amdgpu_cs_ioctl
2.amdgpu_sched_run_job
3.amdgpu_ib_pipe_sync

Signed-off-by: Kevin Wang <mailto:kevin1.w...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index 8227ebd0f511..f940526c5889 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -170,7 +170,7 @@ TRACE_EVENT(amdgpu_cs_ioctl,
  __field(unsigned int, context)
  __field(unsigned int, seqno)
  __field(struct dma_fence *, fence)
-__field(char *, ring_name)
+__string(ring, 
to_amdgpu_ring(job->base.sched)->name)
  __field(u32, num_ibs)
  ),

@@ -179,12 +179,12 @@ TRACE_EVENT(amdgpu_cs_ioctl,
__assign_str(timeline, 
AMDGPU_JOB_GET_TIMELINE_NAME(job))
__entry->context = 
job->base.s_fence->finished.context;
__entry->seqno = job->base.s_fence->finished.seqno;
-  __entry->ring_name = 
to_amdgpu_ring(job->base.sched)->name;
+  __assign_str(ring, 
to_amdgpu_ring(job->base.sched)->name)
__entry->num_ibs = job->num_ibs;
),
 TP_printk("sched_job=%llu, timeline=%s, context=%u, seqno=%u, 
ring_name=%s, num_ibs=%u",
   __entry->sched_job_id, __get_str(timeline), 
__entry->context,
- __entry->seqno, __entry->ring_name, __entry->num_ibs)
+ __entry->seqno, __get_str(ring), __entry->num_ibs)
 );

 TRACE_EVENT(amdgpu_sched_run_job,
@@ -195,7 +195,7 @@ TRACE_EVENT(amdgpu_sched_run_job,
  __string(timeline, 
AMDGPU_JOB_GET_TIMELINE_NAME(job))
  __field(unsigned int, context)
  __field(unsigned int, seqno)
-__field(char *, ring_name)
+__string(ring, 
to_amdgpu_ring(job->base.sched)->name)
  __field(u32, num_ibs)
  ),

@@ -204,12 +204,12 @@ TRACE_EVENT(amdgpu_sched_run_jo

Re: [PATCH] drm/amdgpu: fix amdgpu trace event print string format error

2019-10-16 Thread Koenig, Christian
Hi Kevin,

well, that copies the string into the ring buffer every time the trace event is 
called, which is not necessarily a good idea for a constant string.

Can't we avoid that somehow?

Thanks,
Christian.

Am 16.10.19 um 14:01 schrieb Wang, Kevin(Yang):
add @Koenig, Christian<mailto:christian.koe...@amd.com>,
could you help me review it?

Best Regards,
Kevin


From: Wang, Kevin(Yang) <mailto:kevin1.w...@amd.com>
Sent: Wednesday, October 16, 2019 11:06 AM
To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
<mailto:amd-gfx@lists.freedesktop.org>
Cc: Wang, Kevin(Yang) <mailto:kevin1.w...@amd.com>
Subject: [PATCH] drm/amdgpu: fix amdgpu trace event print string format error

the trace event print string format error.
(use integer type to handle string)

before:
amdgpu_test_kev-1556  [002]   138.508781: amdgpu_cs_ioctl:
sched_job=8, timeline=gfx_0.0.0, context=177, seqno=1,
ring_name=94d01c207bf0, num_ibs=2

after:
amdgpu_test_kev-1506  [004]   370.703783: amdgpu_cs_ioctl:
sched_job=12, timeline=gfx_0.0.0, context=234, seqno=2,
ring_name=gfx_0.0.0, num_ibs=1

change trace event list:
1.amdgpu_cs_ioctl
2.amdgpu_sched_run_job
3.amdgpu_ib_pipe_sync

Signed-off-by: Kevin Wang <mailto:kevin1.w...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index 8227ebd0f511..f940526c5889 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -170,7 +170,7 @@ TRACE_EVENT(amdgpu_cs_ioctl,
  __field(unsigned int, context)
  __field(unsigned int, seqno)
  __field(struct dma_fence *, fence)
-__field(char *, ring_name)
+__string(ring, 
to_amdgpu_ring(job->base.sched)->name)
  __field(u32, num_ibs)
  ),

@@ -179,12 +179,12 @@ TRACE_EVENT(amdgpu_cs_ioctl,
__assign_str(timeline, 
AMDGPU_JOB_GET_TIMELINE_NAME(job))
__entry->context = 
job->base.s_fence->finished.context;
__entry->seqno = job->base.s_fence->finished.seqno;
-  __entry->ring_name = 
to_amdgpu_ring(job->base.sched)->name;
+  __assign_str(ring, 
to_amdgpu_ring(job->base.sched)->name)
__entry->num_ibs = job->num_ibs;
),
 TP_printk("sched_job=%llu, timeline=%s, context=%u, seqno=%u, 
ring_name=%s, num_ibs=%u",
   __entry->sched_job_id, __get_str(timeline), 
__entry->context,
- __entry->seqno, __entry->ring_name, __entry->num_ibs)
+ __entry->seqno, __get_str(ring), __entry->num_ibs)
 );

 TRACE_EVENT(amdgpu_sched_run_job,
@@ -195,7 +195,7 @@ TRACE_EVENT(amdgpu_sched_run_job,
  __string(timeline, 
AMDGPU_JOB_GET_TIMELINE_NAME(job))
  __field(unsigned int, context)
  __field(unsigned int, seqno)
-__field(char *, ring_name)
+__string(ring, 
to_amdgpu_ring(job->base.sched)->name)
  __field(u32, num_ibs)
  ),

@@ -204,12 +204,12 @@ TRACE_EVENT(amdgpu_sched_run_job,
__assign_str(timeline, 
AMDGPU_JOB_GET_TIMELINE_NAME(job))
__entry->context = 
job->base.s_fence->finished.context;
__entry->seqno = job->base.s_fence->finished.seqno;
-  __entry->ring_name = 
to_amdgpu_ring(job->base.sched)->name;
+  __assign_str(ring, 
to_amdgpu_ring(job->base.sched)->name)
__entry->num_ibs = job->num_ibs;
),
 TP_printk("sched_job=%llu, timeline=%s, context=%u, seqno=%u, 
ring_name=%s, num_ibs=%u",
   __entry->sched_job_id, __get_str(timeline), 
__entry->context,
- __entry->seqno, __entry->ring_name, __entry->num_ibs)
+ __entry->seqno, __get_str(ring), __entry->num_ibs)
 );


@@ -473,7 +473,7 @@ TRACE_EVENT(amdgpu_ib_pipe_sync,
 TP_PROTO(struct amdgpu_job *sched_job, struct dma_fence *fence),
 TP_ARGS(sched_job, fence),
 TP_STRUCT__entry(
-__field(const char *,name)
+__string(ring, sched_job->base.sched->name);
 

Re: [PATCH 1/2] drm/amdgpu/uvd6: fix allocation size in enc ring test

2019-10-14 Thread Koenig, Christian
Am 14.10.19 um 15:01 schrieb Alex Deucher:
> On Mon, Oct 14, 2019 at 5:06 AM Christian König
>  wrote:
>> Am 11.10.19 um 22:50 schrieb Alex Deucher:
>>> We need to allocate a large enough buffer for the
>>> session info, otherwise the IB test can overwrite
>>> other memory.
>>>
>>> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=204241
>>> Signed-off-by: Alex Deucher 
>> Acked-by: Christian König  for the series.
> + Leo, James
>
> Seems like we still overwrite the buffer.  Do you know how big the
> session buffer needs to be?  Is it different for UVD and VCN?

At least originally we allocated a separate 4KB BO in VRAM for this. The 
message was quite large IIRC.

Christian.

>
> Alex
>
>>> ---
>>>drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 8 
>>>1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
>>> index 670784a78512..909bc2ce791f 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
>>> @@ -215,12 +215,12 @@ static int uvd_v6_0_enc_get_create_msg(struct 
>>> amdgpu_ring *ring, uint32_t handle
>>>uint64_t dummy;
>>>int i, r;
>>>
>>> - r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, );
>>> + r = amdgpu_job_alloc_with_ib(ring->adev, (ib_size_dw * 4) + 1024, 
>>> );
>>>if (r)
>>>return r;
>>>
>>>ib = >ibs[0];
>>> - dummy = ib->gpu_addr + 1024;
>>> + dummy = ib->gpu_addr + (ib_size_dw * 4);
>>>
>>>ib->length_dw = 0;
>>>ib->ptr[ib->length_dw++] = 0x0018;
>>> @@ -277,12 +277,12 @@ static int uvd_v6_0_enc_get_destroy_msg(struct 
>>> amdgpu_ring *ring,
>>>uint64_t dummy;
>>>int i, r;
>>>
>>> - r = amdgpu_job_alloc_with_ib(ring->adev, ib_size_dw * 4, );
>>> + r = amdgpu_job_alloc_with_ib(ring->adev, (ib_size_dw * 4) + 1024, 
>>> );
>>>if (r)
>>>return r;
>>>
>>>ib = >ibs[0];
>>> - dummy = ib->gpu_addr + 1024;
>>> + dummy = ib->gpu_addr + (ib_size_dw * 4);
>>>
>>>ib->length_dw = 0;
>>>ib->ptr[ib->length_dw++] = 0x0018;


Re: [PATCH 7/8] drm/amdgpu: reserve vram for memory training

2019-10-14 Thread Koenig, Christian
Am 12.10.19 um 01:23 schrieb Tuikov, Luben:
> On 2019-10-10 11:50 p.m., Tianci Yin wrote:
>> From: "Tianci.Yin" 
>>
>> memory training using specific fixed vram segment, reserve these
>> segments before anyone may allocate it.
>>
>> Change-Id: I1436755813a565608a2857a683f535377620a637
>> Reviewed-by: Alex Deucher 
>> Signed-off-by: Tianci.Yin 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 96 +
>>   1 file changed, 96 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> index 9da6350a4ba2..42d0fcb98382 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> @@ -1667,6 +1667,93 @@ static int amdgpu_ttm_fw_reserve_vram_init(struct 
>> amdgpu_device *adev)
>>>fw_vram_usage.va);
>>   }
>>   
>> +/*
>> + * Memoy training reservation functions
>> + */
>> +
>> +/**
>> + * amdgpu_ttm_training_reserve_vram_fini - free memory training reserved 
>> vram
>> + *
>> + * @adev: amdgpu_device pointer
>> + *
>> + * free memory training reserved vram if it has been reserved.
>> + */
>> +static int amdgpu_ttm_training_reserve_vram_fini(struct amdgpu_device *adev)
>> +{
>> +struct psp_memory_training_context *ctx = >psp.mem_train_ctx;
>> +
>> +ctx->init = PSP_MEM_TRAIN_NOT_SUPPORT;
>> +if (ctx->c2p_bo) {
>> +amdgpu_bo_free_kernel(>c2p_bo, NULL, NULL);
>> +ctx->c2p_bo = NULL;
>> +}
> Generally it is a good idea to paragraph your code.
> So an empty line between if-statements is a good idea.
> However, there is no need in:
>
> ret = f(x);
> if (ret) {
>   
> }
>
> if (blah) {
>   
> }
>
> The above are two (2) well-formed paragraphs.

Additionally, amdgpu_bo_free_kernel(), just like kfree(), is NULL-safe, 
so you shouldn't need "if"s like that one.

E.g. just write:

amdgpu_bo_free_kernel(>c2p_bo...);

and you are done.
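
Put together, the fini helper from the hunk above could then shrink to something
like this (a sketch of the suggestion; amdgpu_bo_free_kernel() returns early on a
NULL BO and drops the reference otherwise):

    static int amdgpu_ttm_training_reserve_vram_fini(struct amdgpu_device *adev)
    {
            struct psp_memory_training_context *ctx = &adev->psp.mem_train_ctx;

            ctx->init = PSP_MEM_TRAIN_NOT_SUPPORT;
            amdgpu_bo_free_kernel(&ctx->c2p_bo, NULL, NULL);
            amdgpu_bo_free_kernel(&ctx->p2c_bo, NULL, NULL);

            return 0;
    }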

Regards,
Christian.

>
>> +if (ctx->p2c_bo) {
>> +amdgpu_bo_free_kernel(>p2c_bo, NULL, NULL);
>> +ctx->p2c_bo = NULL;
>> +}
>> +
>> +return 0;
>> +}
>> +
>> +/**
>> + * amdgpu_ttm_training_reserve_vram_init - create bo vram reservation from 
>> memory training
>> + *
>> + * @adev: amdgpu_device pointer
>> + *
>> + * create bo vram reservation from memory training.
>> + */
>> +static int amdgpu_ttm_training_reserve_vram_init(struct amdgpu_device *adev)
>> +{
>> +int ret;
>> +struct psp_memory_training_context *ctx = >psp.mem_train_ctx;
>> +
>> +memset(ctx, 0, sizeof(*ctx));
>> +if (!adev->fw_vram_usage.mem_train_support) {
>> +DRM_DEBUG("memory training does not support!\n");
>> +return 0;
>> +}
>> +
>> +ctx->c2p_train_data_offset = adev->fw_vram_usage.mem_train_fb_loc;
>> +ctx->p2c_train_data_offset = (adev->gmc.mc_vram_size - 
>> GDDR6_MEM_TRAINING_OFFSET);
>> +ctx->train_data_size = GDDR6_MEM_TRAINING_DATA_SIZE_IN_BYTES;
>> +
>> +
>> DRM_DEBUG("train_data_size:%llx,p2c_train_data_offset:%llx,c2p_train_data_offset:%llx.\n",
>> +  ctx->train_data_size,
>> +  ctx->p2c_train_data_offset,
>> +  ctx->c2p_train_data_offset);
>> +
>> +ret = amdgpu_bo_create_kernel_at(adev,
>> + ctx->p2c_train_data_offset,
>> + ctx->train_data_size,
>> + AMDGPU_GEM_DOMAIN_VRAM,
>> + >p2c_bo,
>> + NULL);
>> +if (ret) {
>> +DRM_ERROR("alloc p2c_bo failed(%d)!\n", ret);
>> +ret = -ENOMEM;
>> +goto err_out;
>> +}
> NAK!
> Why are you re-writing the error code from "amdgpu_bo_create_kenrel_at()"?
> Pass the error as is.
>
>> +
>> +ret = amdgpu_bo_create_kernel_at(adev,
>> + ctx->c2p_train_data_offset,
>> + ctx->train_data_size,
>> + AMDGPU_GEM_DOMAIN_VRAM,
>> + >c2p_bo,
>> + NULL);
>> +if (ret) {
>> +DRM_ERROR("alloc c2p_bo failed(%d)!\n", ret);
>> +ret = -ENOMEM;
>> +goto err_out;
>> +}
> NAK!
> Why are you re-writing the error code from "amdgpu_bo_create_kenrel_at()"?
> Pass the error as is.
>
>> +
>> +ctx->init = PSP_MEM_TRAIN_RESERVE_SUCCESS;
>> +return 0;
>> +
>> +err_out:
> Yes... well "err_out" could be any identifier, including
> a variable, as our variables follow snake-notation, all lowercase.
>
> Back at the turn of this century, Linux followed capitalized
> goto labels to distinguish them from anything else around
> in the kernel code:
>
>   goto Bad_err;
>   ...
>
>   return 0;
> Bad_err:
>   return bad_gift;
> }
>
> To distinguish that a capitalized identifier is a goto 

Re: [amdgpu] ASSERT()'s in write_i2c*retimer_setting() functions

2019-10-11 Thread Koenig, Christian
Forwarding to the appropriate display folks.

Can you guys take a look?

Christian.

Am 11.10.19 um 01:34 schrieb Gabriel C:
> Hello,
>
> I've built recently a new box with a Ryzen3 2200G APU.
>
> Each time I plug in an HDMI cable ( to a TV or Monitor ),
> or boot with HDMI connected a lot ASSERT()'s from
> write_i2c*retimer_setting() functions are triggered.
>
> I see the same on a Laptop with a Ryzen7 3750H with
> hybrid GPU configuration.
>
> Besides the noise in dmesg and the delay on boot,
> everything is working fine. I cannot find anything wrong
> or broken.
>
> Since the write errors seem to not be fatal, I think a friendly message
> would help more instead of flooding the dmesg with dumps while
> everything is working properly.
>
> Why is ASSERT() used there?
>
> I have a dmesg from the Ryzen3 box with drm.debug and a snipped
> from the Laptop ( not near me right now ) uploaded there:
>
> https://crazy.dev.frugalware.org/amdgpu/
>
> Please let me know if you need more information,
> If needed I can upload a dmesg from the Laptop with drm.debug too.
>
>
> Best Regards,
>
> Gabriel C


Re: [pull] amdgpu/kfd, radeon, ttm drm-next-5.5

2019-10-10 Thread Koenig, Christian
Am 10.10.19 um 16:34 schrieb Alex Deucher:
> On Thu, Oct 10, 2019 at 5:54 AM Daniel Vetter  wrote:
>> On Thu, Oct 10, 2019 at 6:17 AM Alex Deucher  wrote:
>>> [SNIP]
>>> Christian König (22):
>>>drm/amdgpu: use moving fence instead of exclusive for VM updates
>>>drm/amdgpu: reserve at least 4MB of VRAM for page tables v2
>>>drm/amdgpu: remove amdgpu_cs_try_evict
>> Patch no handy for a direct reply, so asking here (but this is totally
>> unrelated to the pull):
>>
>> Do you have other stuff than scanout and pagetables that need to be in
>> vram? I was kinda assume this is needed for big vram-only objects to
>> fit, making space by throwing stuff out that could also be put into
>> system memory. But sounds like it was only for making pagetables fit.
> Yes, basically making page tables fit.  If you push a bunch of stuff
> to system ram, your page table requirements go up too.  See the
> discussion here:
> https://www.spinics.net/lists/amd-gfx/msg38640.html

Yeah, typical chicken-and-egg problem.

When you evict things to system memory because you don't have enough 
VRAM, you need more VRAM for page tables, so you need to evict even more 
things to system memory...

Additionally, we have a few other cases where we really need VRAM 
for correct operation (firmware, old MM engines etc.), but nothing 
major like page tables.
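
As a rough back-of-the-envelope illustration of that chicken-and-egg effect
(assuming 4 KiB pages and 8-byte PTEs, ignoring the upper page-directory levels):

    1 GiB mapped / 4 KiB per page  =  262,144 PTEs
    262,144 PTEs * 8 bytes         ~  2 MiB of page tables per mapped GiB

and those page tables themselves live in VRAM, so every GiB evicted to system
memory adds to the VRAM demand for page tables.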

Regards,
Christian.

>
> Alex
>
>> -Daniel
>>
>>


Re: [RFC PATCH] drm/amdgpu: fix amdgpu_vm_handle_fault return value

2019-10-10 Thread Koenig, Christian
Am 10.10.19 um 12:42 schrieb Nirmoy Das:
> amdgpu_vm_handle_fault should return true on success

NAK, that is intentional.

There is a follow-up patch implementing fault handling which didn't make 
it into our server branch yet.

We could actually change the return value to void until that one lands,
Christian.

>
> Signed-off-by: Nirmoy Das 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index d9bece987e60..6f468c6ffef2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -3215,5 +3215,5 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, 
> unsigned int pasid,
>   error_unref:
>   amdgpu_bo_unref();
>   
> - return false;
> + return (r == 0) ? true : false;
>   }


Re: Potential NULL pointer deference in drm/amdgpu

2019-10-10 Thread Koenig, Christian
Hi Yizhuo,

Am 10.10.19 um 07:09 schrieb Yizhuo Zhai:
> Hi All:
> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:
> The function to_amdgpu_fence() could return NULL, but callers
> in this file do not check the return value but directly dereference it,
> which seems potentially unsafe.
> Such callers include amdgpu_fence_get_timeline_name(),
> amdgpu_fence_enable_signaling() and amdgpu_fence_free().

That is expected behavior and no need to worry.

The functions in amdgpu_fence.c are the callbacks to implement 
amdgpu_fence_ops. The function to_amdgpu_fence() checks if the ops of 
the fence are amdgpu_fence_ops, so it is guaranteed that the functions 
are called with an amdgpu_fence structure.
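
For reference, the helper being described looks roughly like this (simplified
sketch of the pattern in amdgpu_fence.c):

    static inline struct amdgpu_fence *to_amdgpu_fence(struct dma_fence *f)
    {
            struct amdgpu_fence *fence =
                    container_of(f, struct amdgpu_fence, base);

            /* only return a valid pointer when the fence really is ours */
            if (fence->base.ops == &amdgpu_fence_ops)
                    return fence;

            return NULL;
    }

Since the dma_fence core only invokes amdgpu_fence_ops callbacks on fences
created with exactly these ops, the NULL branch cannot be hit from those
callers.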

Regards,
Christian.

Re: [PATCH] drm/amdgpu: user pages array memory leak fix

2019-10-04 Thread Koenig, Christian
Hi Philip,

Am 04.10.19 um 15:40 schrieb Yang, Philip:
> Thanks Joe for the test, I will add your Tested-by.
>
> Hi Christian,
>
> Could you help review? The change removes the get_user_pages call from
> gem_userptr_ioctl (this was done when the AMDGPU_GEM_USERPTR_VALIDATE flag is
> set), delays the get_user_pages to amdgpu_cs_parser_bos, and checks whether
> user pages were invalidated at amdgpu_cs_submit. I didn't find issues in an
> overnight test, but I am not sure whether there are potential side effects.

Yeah, seen that.

The AMDGPU_GEM_USERPTR_VALIDATE flag was explicitly added to cause a 
validation during BO creation because of some very stupid applications.

I didn't want to reject that without offering an alternative, but we 
seriously can't do this or it would break Vulkan/OpenGL.

I need more time to take a closer look,
Christian.

>
> Thanks,
> Philip
>
> On 2019-10-03 3:44 p.m., Yang, Philip wrote:
>> user_pages array should always be freed after validation regardless if
>> user pages are changed after bo is created because with HMM change parse
>> bo always allocate user pages array to get user pages for userptr bo.
>>
>> Don't need to get user pages while creating uerptr bo because user pages
>> will only be used while validating after parsing userptr bo.
>>
>> Bugzilla: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1844962
>>
>> v2: remove unused local variable and amend commit
>>
>> Signed-off-by: Philip Yang 
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  4 +---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 23 +--
>>2 files changed, 2 insertions(+), 25 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 49b767b7238f..961186e7113e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -474,7 +474,6 @@ static int amdgpu_cs_list_validate(struct 
>> amdgpu_cs_parser *p,
>>
>>  list_for_each_entry(lobj, validated, tv.head) {
>>  struct amdgpu_bo *bo = ttm_to_amdgpu_bo(lobj->tv.bo);
>> -bool binding_userptr = false;
>>  struct mm_struct *usermm;
>>
>>  usermm = amdgpu_ttm_tt_get_usermm(bo->tbo.ttm);
>> @@ -491,14 +490,13 @@ static int amdgpu_cs_list_validate(struct 
>> amdgpu_cs_parser *p,
>>
>>  amdgpu_ttm_tt_set_user_pages(bo->tbo.ttm,
>>   lobj->user_pages);
>> -binding_userptr = true;
>>  }
>>
>>  r = amdgpu_cs_validate(p, bo);
>>  if (r)
>>  return r;
>>
>> -if (binding_userptr) {
>> +if (lobj->user_pages) {
>>  kvfree(lobj->user_pages);
>>  lobj->user_pages = NULL;
>>  }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> index a828e3d0bfbd..3ccd61d69964 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> @@ -283,7 +283,6 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void 
>> *data,
>>int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void *data,
>>   struct drm_file *filp)
>>{
>> -struct ttm_operation_ctx ctx = { true, false };
>>  struct amdgpu_device *adev = dev->dev_private;
>>  struct drm_amdgpu_gem_userptr *args = data;
>>  struct drm_gem_object *gobj;
>> @@ -326,32 +325,12 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, 
>> void *data,
>>  goto release_object;
>>  }
>>
>> -if (args->flags & AMDGPU_GEM_USERPTR_VALIDATE) {
>> -r = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages);
>> -if (r)
>> -goto release_object;
>> -
>> -r = amdgpu_bo_reserve(bo, true);
>> -if (r)
>> -goto user_pages_done;
>> -
>> -amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
>> -r = ttm_bo_validate(>tbo, >placement, );
>> -amdgpu_bo_unreserve(bo);
>> -if (r)
>> -goto user_pages_done;
>> -}
>> -
>>  r = drm_gem_handle_create(filp, gobj, );
>>  if (r)
>> -goto user_pages_done;
>> +goto release_object;
>>
>>  args->handle = handle;
>>
>> -user_pages_done:
>> -if (args->flags & AMDGPU_GEM_USERPTR_VALIDATE)
>> -amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
>> -
>>release_object:
>>  drm_gem_object_put_unlocked(gobj);
>>
>>


Re: [PATCH] drm/amdgpu: fix memory leak

2019-10-04 Thread Koenig, Christian
On 04.10.19 at 15:51, Nirmoy Das wrote:
> Clean up the error handling code and make sure the temporary info array
> with the handles is freed by amdgpu_bo_list_put() on
> idr_replace()'s failure.
>
> Signed-off-by: Nirmoy Das 

Reviewed-by: Christian König 

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 14 +++---
>   1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
> index 7bcf86c61999..61e38e43ad1d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
> @@ -270,7 +270,7 @@ int amdgpu_bo_list_ioctl(struct drm_device *dev, void 
> *data,
>   
>   r = amdgpu_bo_create_list_entry_array(>in, );
>   if (r)
> - goto error_free;
> + return r;
>   
>   switch (args->in.operation) {
>   case AMDGPU_BO_LIST_OP_CREATE:
> @@ -283,8 +283,7 @@ int amdgpu_bo_list_ioctl(struct drm_device *dev, void 
> *data,
>   r = idr_alloc(>bo_list_handles, list, 1, 0, GFP_KERNEL);
>   mutex_unlock(>bo_list_lock);
>   if (r < 0) {
> - amdgpu_bo_list_put(list);
> - return r;
> + goto error_put_list;
>   }
>   
>   handle = r;
> @@ -306,9 +305,8 @@ int amdgpu_bo_list_ioctl(struct drm_device *dev, void 
> *data,
>   mutex_unlock(>bo_list_lock);
>   
>   if (IS_ERR(old)) {
> - amdgpu_bo_list_put(list);
>   r = PTR_ERR(old);
> - goto error_free;
> + goto error_put_list;
>   }
>   
>   amdgpu_bo_list_put(old);
> @@ -325,8 +323,10 @@ int amdgpu_bo_list_ioctl(struct drm_device *dev, void 
> *data,
>   
>   return 0;
>   
> +error_put_list:
> + amdgpu_bo_list_put(list);
> +
>   error_free:
> - if (info)
> - kvfree(info);
> + kvfree(info);
>   return r;
>   }


Re: [PATCH][next] drm/amdgpu: remove redundant variable r and redundant return statement

2019-10-04 Thread Koenig, Christian
On 03.10.19 at 23:40, Colin King wrote:
> From: Colin Ian King 
>
> There is a return statement that is not reachable and a variable that
> is not used.  Remove them.
>
> Addresses-Coverity: ("Structurally dead code")
> Fixes: de7b45babd9b ("drm/amdgpu: cleanup creating BOs at fixed location 
> (v2)")
> Signed-off-by: Colin Ian King 

Reviewed-by: Christian König 

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 --
>   1 file changed, 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 481e4c381083..814159f15633 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1636,7 +1636,6 @@ static void amdgpu_ttm_fw_reserve_vram_fini(struct 
> amdgpu_device *adev)
>   static int amdgpu_ttm_fw_reserve_vram_init(struct amdgpu_device *adev)
>   {
>   uint64_t vram_size = adev->gmc.visible_vram_size;
> - int r;
>   
>   adev->fw_vram_usage.va = NULL;
>   adev->fw_vram_usage.reserved_bo = NULL;
> @@ -1651,7 +1650,6 @@ static int amdgpu_ttm_fw_reserve_vram_init(struct 
> amdgpu_device *adev)
> AMDGPU_GEM_DOMAIN_VRAM,
> >fw_vram_usage.reserved_bo,
> >fw_vram_usage.va);
> - return r;
>   }
>   
>   /**



Re: [PATCH][next] drm/amdgpu: fix uninitialized variable pasid_mapping_needed

2019-10-04 Thread Koenig, Christian
On 03.10.19 at 23:52, Colin King wrote:
> From: Colin Ian King 
>
> The boolean variable pasid_mapping_needed is not initialized and
> there are code paths that do not assign it any value before it is
> read later.  Fix this by initializing pasid_mapping_needed to
> false.
>
> Addresses-Coverity: ("Uninitialized scalar variable")
> Fixes: 6817bf283b2b ("drm/amdgpu: grab the id mgr lock while accessing 
> passid_mapping")
> Signed-off-by: Colin Ian King 

Reviewed-by: Christian König 

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index a2c797e34a29..be10e4b9a94d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -1055,7 +1055,7 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct 
> amdgpu_job *job,
>   id->oa_size != job->oa_size);
>   bool vm_flush_needed = job->vm_needs_flush;
>   struct dma_fence *fence = NULL;
> - bool pasid_mapping_needed;
> + bool pasid_mapping_needed = false;
>   unsigned patch_offset = 0;
>   int r;
>   


Re: [PATCH 03/11] drm/amdgpu: convert amdgpu_vm_it to half closed intervals

2019-10-04 Thread Koenig, Christian
On 03.10.19 at 22:18, Davidlohr Bueso wrote:
> The amdgpu_vm interval tree really wants [a, b) intervals,

NAK, we explicitly do need an [a, b[ interval here.

Regards,
Christian.

> not fully closed ones. As such convert it to use the new
> interval_tree_gen.h, and also rename the 'last' endpoint
> in the node to 'end', which is both a more suitable name
> for the half closed interval and also reduces the chances
> of missing a conversion when doing insertion or lookup.
>
> Cc: Jerome Glisse 
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: Daniel Vetter 
> Cc: amd-gfx@lists.freedesktop.org
> Signed-off-by: Davidlohr Bueso 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h  | 18 ++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c|  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c|  3 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 46 
> +++---
>   6 files changed, 36 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 49b767b7238f..290bfe820890 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -756,7 +756,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser 
> *p)
>   }
>   
>   if ((va_start + chunk_ib->ib_bytes) >
> - (m->last + 1) * AMDGPU_GPU_PAGE_SIZE) {
> + m->end * AMDGPU_GPU_PAGE_SIZE) {
>   DRM_ERROR("IB va_start+ib_bytes is invalid\n");
>   return -EINVAL;
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> index 7e99f6c58c48..60b73bc4d11a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> @@ -51,7 +51,7 @@ struct amdgpu_bo_va_mapping {
>   struct list_headlist;
>   struct rb_node  rb;
>   uint64_tstart;
> - uint64_tlast;
> + uint64_tend;
>   uint64_t__subtree_last;
>   uint64_toffset;
>   uint64_tflags;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> index 8227ebd0f511..c5b0e88d019c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> @@ -247,7 +247,7 @@ TRACE_EVENT(amdgpu_vm_bo_map,
>   TP_STRUCT__entry(
>__field(struct amdgpu_bo *, bo)
>__field(long, start)
> -  __field(long, last)
> +  __field(long, end)
>__field(u64, offset)
>__field(u64, flags)
>),
> @@ -255,12 +255,12 @@ TRACE_EVENT(amdgpu_vm_bo_map,
>   TP_fast_assign(
>  __entry->bo = bo_va ? bo_va->base.bo : NULL;
>  __entry->start = mapping->start;
> -__entry->last = mapping->last;
> +__entry->end = mapping->end;
>  __entry->offset = mapping->offset;
>  __entry->flags = mapping->flags;
>  ),
> - TP_printk("bo=%p, start=%lx, last=%lx, offset=%010llx, flags=%llx",
> -   __entry->bo, __entry->start, __entry->last,
> + TP_printk("bo=%p, start=%lx, end=%lx, offset=%010llx, flags=%llx",
> +   __entry->bo, __entry->start, __entry->end,
> __entry->offset, __entry->flags)
>   );
>   
> @@ -271,7 +271,7 @@ TRACE_EVENT(amdgpu_vm_bo_unmap,
>   TP_STRUCT__entry(
>__field(struct amdgpu_bo *, bo)
>__field(long, start)
> -  __field(long, last)
> +  __field(long, end)
>__field(u64, offset)
>__field(u64, flags)
>),
> @@ -279,12 +279,12 @@ TRACE_EVENT(amdgpu_vm_bo_unmap,
>   TP_fast_assign(
>  __entry->bo = bo_va ? bo_va->base.bo : NULL;
>  __entry->start = mapping->start;
> -__entry->last = mapping->last;
> +__entry->end = mapping->end;
>  __entry->offset = mapping->offset;
>  __entry->flags = mapping->flags;
>  ),
> - TP_printk("bo=%p, start=%lx, last=%lx, offset=%010llx, flags=%llx",
> -   __entry->bo, __entry->start, 

Re: [PATCH] drm/amdgpu: do not execute 0-sized IBs

2019-10-03 Thread Koenig, Christian


On 03.10.2019 10:25, "Pelloux-prayer, Pierre-eric" wrote:

On 03/10/2019 10:09, Christian König wrote:
> On 03.10.19 at 10:03, Pelloux-prayer, Pierre-eric wrote:
>> This can be safely skipped entirely.
>> This seems to help with https://bugs.freedesktop.org/show_bug.cgi?id=111481.
>
> NAK, please instead fix gmc_v10_0_flush_gpu_tlb to include at least some NOP 
> in the submitted IBs.

Is there any interest in executing an empty (or only filled with NOPs) IB?

Yeah, we used to have some dummy zero sized IBs for the MM engines which 
otherwise couldn't execute a fence command.

It shouldn't matter for modern firmware/hardware, but you could actually
silently break something somewhere else with this, so better not to do it.

Sorry should have mentioned that directly,
Christian.
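
For illustration, a minimal sketch of the alternative suggested above: pad the
otherwise empty flush IB with a NOP instead of skipping 0-sized IBs at
schedule time. This is only a sketch; 'ib' and 'ring' stand for the job's IB
and the ring used by gmc_v10_0_flush_gpu_tlb(), and ring->funcs->nop is the
per-ring NOP packet.

        /* before submitting the flush job: make sure the IB carries at
         * least one valid packet so the engine never fetches a 0-sized IB */
        if (ib->length_dw == 0)
                ib->ptr[ib->length_dw++] = ring->funcs->nop;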


Anyway I can modify the patch to do this.

Thanks,
Pierre-Eric

>
> Christian.
>
>>
>> Signed-off-by: Pierre-Eric Pelloux-Prayer 
>> 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 5 +
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> index 60655834d649..aa163e679f1f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> @@ -227,6 +227,11 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, 
>> unsigned num_ibs,
>>   !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE 
>> ib must be inserted anyway */
>>   continue;
>>   +if (ib->length_dw == 0) {
>> +/* On Navi gmc_v10_0_flush_gpu_tlb emits 0 sized IB */
>> +continue;
>> +}
>> +
>>   amdgpu_ring_emit_ib(ring, job, ib, status);
>>   status &= ~AMDGPU_HAVE_CTX_SWITCH;
>>   }
>


Re: [PATCH v4] drm/amdgpu: fix multiple memory leaks in acp_hw_init

2019-10-02 Thread Koenig, Christian
On 02.10.19 at 05:46, Navid Emamdoost wrote:
> In acp_hw_init there are some allocations that need to be released in
> case of failure:
>
> 1- adev->acp.acp_genpd should be released if any allocation attempt for
> adev->acp.acp_cell, adev->acp.acp_res or i2s_pdata fails.
> 2- all of those allocations should be released if
> mfd_add_hotplug_devices or pm_genpd_add_device fail.
> 3- a release is also needed in case the timeout values expire.
>
> Signed-off-by: Navid Emamdoost 

Reviewed-by: Christian König 

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 34 -
>   1 file changed, 22 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
> index eba42c752bca..82155ac3288a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
> @@ -189,7 +189,7 @@ static int acp_hw_init(void *handle)
>   u32 val = 0;
>   u32 count = 0;
>   struct device *dev;
> - struct i2s_platform_data *i2s_pdata;
> + struct i2s_platform_data *i2s_pdata = NULL;
>   
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>   
> @@ -231,20 +231,21 @@ static int acp_hw_init(void *handle)
>   adev->acp.acp_cell = kcalloc(ACP_DEVS, sizeof(struct mfd_cell),
>   GFP_KERNEL);
>   
> - if (adev->acp.acp_cell == NULL)
> - return -ENOMEM;
> + if (adev->acp.acp_cell == NULL) {
> + r = -ENOMEM;
> + goto failure;
> + }
>   
>   adev->acp.acp_res = kcalloc(5, sizeof(struct resource), GFP_KERNEL);
>   if (adev->acp.acp_res == NULL) {
> - kfree(adev->acp.acp_cell);
> - return -ENOMEM;
> + r = -ENOMEM;
> + goto failure;
>   }
>   
>   i2s_pdata = kcalloc(3, sizeof(struct i2s_platform_data), GFP_KERNEL);
>   if (i2s_pdata == NULL) {
> - kfree(adev->acp.acp_res);
> - kfree(adev->acp.acp_cell);
> - return -ENOMEM;
> + r = -ENOMEM;
> + goto failure;
>   }
>   
>   switch (adev->asic_type) {
> @@ -341,14 +342,14 @@ static int acp_hw_init(void *handle)
>   r = mfd_add_hotplug_devices(adev->acp.parent, adev->acp.acp_cell,
>   ACP_DEVS);
>   if (r)
> - return r;
> + goto failure;
>   
>   for (i = 0; i < ACP_DEVS ; i++) {
>   dev = get_mfd_cell_dev(adev->acp.acp_cell[i].name, i);
>   r = pm_genpd_add_device(>acp.acp_genpd->gpd, dev);
>   if (r) {
>   dev_err(dev, "Failed to add dev to genpd\n");
> - return r;
> + goto failure;
>   }
>   }
>   
> @@ -367,7 +368,8 @@ static int acp_hw_init(void *handle)
>   break;
>   if (--count == 0) {
>   dev_err(>pdev->dev, "Failed to reset ACP\n");
> - return -ETIMEDOUT;
> + r = -ETIMEDOUT;
> + goto failure;
>   }
>   udelay(100);
>   }
> @@ -384,7 +386,8 @@ static int acp_hw_init(void *handle)
>   break;
>   if (--count == 0) {
>   dev_err(>pdev->dev, "Failed to reset ACP\n");
> - return -ETIMEDOUT;
> + r = -ETIMEDOUT;
> + goto failure;
>   }
>   udelay(100);
>   }
> @@ -393,6 +396,13 @@ static int acp_hw_init(void *handle)
>   val &= ~ACP_SOFT_RESET__SoftResetAud_MASK;
>   cgs_write_register(adev->acp.cgs_device, mmACP_SOFT_RESET, val);
>   return 0;
> +
> +failure:
> + kfree(i2s_pdata);
> + kfree(adev->acp.acp_res);
> + kfree(adev->acp.acp_cell);
> + kfree(adev->acp.acp_genpd);
> + return r;
>   }
>   
>   /**



Re: [PATCH v3] drm/amdgpu: fix multiple memory leaks in acp_hw_init

2019-10-01 Thread Koenig, Christian
On 30.09.19 at 23:26, Navid Emamdoost wrote:
> In acp_hw_init there are some allocations that need to be released in
> case of failure:
>
> 1- adev->acp.acp_genpd should be released if any allocation attempt for
> adev->acp.acp_cell, adev->acp.acp_res or i2s_pdata fails.
> 2- all of those allocations should be released if
> mfd_add_hotplug_devices or pm_genpd_add_device fail.
> 3- a release is also needed in case the timeout values expire.
>
> Signed-off-by: Navid Emamdoost 
> ---
> Changes in v2:
>   -- moved the releases under goto
>
> Changes in v3:
>   -- fixed multiple goto issue
>   -- added goto for 3 other failure cases: one when
> mfd_add_hotplug_devices fails, and two when time out values expires.
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 41 -
>   1 file changed, 27 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
> index eba42c752bca..7809745ec0f1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
> @@ -184,12 +184,12 @@ static struct device *get_mfd_cell_dev(const char 
> *device_name, int r)
>*/
>   static int acp_hw_init(void *handle)
>   {
> - int r, i;
> + int r, i, ret;

Please don't add another "ret" variable, instead always use "r" here.

Apart from that looks good to me,
Christian.

>   uint64_t acp_base;
>   u32 val = 0;
>   u32 count = 0;
>   struct device *dev;
> - struct i2s_platform_data *i2s_pdata;
> + struct i2s_platform_data *i2s_pdata = NULL;
>   
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>   
> @@ -231,20 +231,21 @@ static int acp_hw_init(void *handle)
>   adev->acp.acp_cell = kcalloc(ACP_DEVS, sizeof(struct mfd_cell),
>   GFP_KERNEL);
>   
> - if (adev->acp.acp_cell == NULL)
> - return -ENOMEM;
> + if (adev->acp.acp_cell == NULL) {
> + ret = -ENOMEM;
> + goto failure;
> + }
>   
>   adev->acp.acp_res = kcalloc(5, sizeof(struct resource), GFP_KERNEL);
>   if (adev->acp.acp_res == NULL) {
> - kfree(adev->acp.acp_cell);
> - return -ENOMEM;
> + ret = -ENOMEM;
> + goto failure;
>   }
>   
>   i2s_pdata = kcalloc(3, sizeof(struct i2s_platform_data), GFP_KERNEL);
>   if (i2s_pdata == NULL) {
> - kfree(adev->acp.acp_res);
> - kfree(adev->acp.acp_cell);
> - return -ENOMEM;
> + ret = -ENOMEM;
> + goto failure;
>   }
>   
>   switch (adev->asic_type) {
> @@ -340,15 +341,18 @@ static int acp_hw_init(void *handle)
>   
>   r = mfd_add_hotplug_devices(adev->acp.parent, adev->acp.acp_cell,
>   ACP_DEVS);
> - if (r)
> - return r;
> + if (r) {
> + ret = r;
> + goto failure;
> + }
>   
>   for (i = 0; i < ACP_DEVS ; i++) {
>   dev = get_mfd_cell_dev(adev->acp.acp_cell[i].name, i);
>   r = pm_genpd_add_device(>acp.acp_genpd->gpd, dev);
>   if (r) {
>   dev_err(dev, "Failed to add dev to genpd\n");
> - return r;
> + ret = r;
> + goto failure;
>   }
>   }
>   
> @@ -367,7 +371,8 @@ static int acp_hw_init(void *handle)
>   break;
>   if (--count == 0) {
>   dev_err(>pdev->dev, "Failed to reset ACP\n");
> - return -ETIMEDOUT;
> + ret = -ETIMEDOUT;
> + goto failure;
>   }
>   udelay(100);
>   }
> @@ -384,7 +389,8 @@ static int acp_hw_init(void *handle)
>   break;
>   if (--count == 0) {
>   dev_err(>pdev->dev, "Failed to reset ACP\n");
> - return -ETIMEDOUT;
> + ret = -ETIMEDOUT;
> + goto failure;
>   }
>   udelay(100);
>   }
> @@ -393,6 +399,13 @@ static int acp_hw_init(void *handle)
>   val &= ~ACP_SOFT_RESET__SoftResetAud_MASK;
>   cgs_write_register(adev->acp.cgs_device, mmACP_SOFT_RESET, val);
>   return 0;
> +
> +failure:
> + kfree(i2s_pdata);
> + kfree(adev->acp.acp_res);
> + kfree(adev->acp.acp_cell);
> + kfree(adev->acp.acp_genpd);
> + return ret;
>   }
>   
>   /**


Re: [PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' workaround for navi

2019-09-26 Thread Koenig, Christian
That was the same bug. What we need to do is to prevent VMID invalidation while 
the SDMA is using it.

The first part was to disallow concurrent VMID flushes, the second part was 
doing VMID0 flushes through the SDMA block itself.

Both workarounds were added to avoid corruption; that GFXOFF hangs
without this is certainly a completely different issue.

I suspect that we have a similar issue as with Vega/Raven, where we need to grab a
semaphore to keep the block from being gated while an invalidation is in
progress.

Regards,
Christian.
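
For context, a rough sketch of the kind of semaphore handshake meant here,
modeled on the Vega/Raven flush path. The vm_inv_eng0_sem/req/ack offsets and
the usec_timeout loop are illustrative assumptions, not the final Navi fix:

        /* acquire the per-engine invalidation semaphore so the block cannot
         * be gated while the request is in flight */
        for (i = 0; i < adev->usec_timeout; i++) {
                if (RREG32_NO_KIQ(hub->vm_inv_eng0_sem + eng) & 0x1)
                        break;
                udelay(1);
        }

        WREG32_NO_KIQ(hub->vm_inv_eng0_req + eng, inv_req);

        /* wait for the ack, then release the semaphore again */
        for (i = 0; i < adev->usec_timeout; i++) {
                if (RREG32_NO_KIQ(hub->vm_inv_eng0_ack + eng) & (1 << vmid))
                        break;
                udelay(1);
        }
        WREG32_NO_KIQ(hub->vm_inv_eng0_sem + eng, 0);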


On 26.09.2019 20:07, "Deucher, Alexander" wrote:

Maybe I'm mixing up issues.  The navi10/14 issue that was fixed on navi12 was 
fixed in amdgpu_ids.c in

commit a2bd77bbde791202267c25478bbcbe71bb4ecdd5
Author: Christian König 
Date:   Thu Feb 7 12:10:29 2019 +0100

drm/amdgpu: disable concurrent flushes for Navi10 v2

Navi10 have a bug in the SDMA which can theoretically cause memory
corruption with concurrent VMID flushes

v2: explicitely check Navi10

Signed-off-by: Christian König 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 

This is a different issue and may apply to all navi parts, so maybe the patch
is fine as is.

Alex

____
From: Koenig, Christian 
Sent: Thursday, September 26, 2019 2:02 PM
To: Yuan, Xiaojie 
Cc: Alex Deucher ; Deucher, Alexander 
; amd-gfx@lists.freedesktop.org 

Subject: Re: [PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' 
workaround for navi

Well then you didn't figure out the root cause correctly.

This is to avoid data corruption with the SDMA on Navi 10/14 and should
definitely not be applied to Navi 12.

The hardware team went through quite some work to avoid this.

Christian.

On 26.09.2019 19:38, "Yuan, Xiaojie" wrote:
 Hi Alex / Christian,

When gfxoff is enabled for Navi12, I observed an sdma0 hang while launching
the desktop. When this workaround is applied, the issue goes away.
That's why I included this workaround for Navi12 as well.

BR,
Xiaojie
________
From: Koenig, Christian 
Sent: Thursday, September 26, 2019 10:20 PM
To: Alex Deucher 
Cc: Deucher, Alexander ; Yuan, Xiaojie 
; amd-gfx@lists.freedesktop.org 

Subject: Re: [PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' 
workaround for navi



On 26.09.2019 15:51, Alex Deucher wrote:
On Thu, Sep 26, 2019 at 9:47 AM Koenig, Christian
 wrote:
>
> On 26.09.19 at 15:40, Alex Deucher wrote:
> > On Thu, Sep 26, 2019 at 8:29 AM Christian König
> >  wrote:
> >> Stop, wait a second guys!
> >>
> >> This will disable the workaround for Navi10 and that is certainly not 
> >> correct:
> >>
> >> !(adev->asic_type >= CHIP_NAVI10 && adev->asic_type <= CHIP_NAVI12)
> >>
> > Actually, I think it's correct. When I merged the baco patch, I
> > accidentally dropped the navi checks.  E.g.,
> > @@ -245,8 +245,9 @@ static void gmc_v10_0_flush_gpu_tlb(struct
> > amdgpu_device *adev,
> >  mutex_lock(>mman.gtt_window_lock);
> >
> >  gmc_v10_0_flush_vm_hub(adev, vmid, AMDGPU_MMHUB, 0);
> > -   if (!adev->mman.buffer_funcs_enabled || !adev->ib_pool_ready ||
> > -   adev->asic_type != CHIP_NAVI10) {
> > +   if (!adev->mman.buffer_funcs_enabled ||
> > +   !adev->ib_pool_ready ||
> > +   adev->in_gpu_reset) {
> >  gmc_v10_0_flush_vm_hub(adev, vmid, AMDGPU_GFXHUB, 0);
> >  mutex_unlock(>mman.gtt_window_lock);
> >  return;
> > I think it should have been
> > adev->asic_type != CHIP_NAVI10 && adev->asic_type != CHIP_NAVI14 &&
> > adev->asic_type != CHIP_NAVI12
>
> My last status is that Navi12 is not supposed to need that workaround,
> that's why we checked Navi10 and later Navi14 separately.
>
> It's possible that I misread the !(adev->asic_type >= CHIP_NAVI10 &&
> adev->asic_type <= CHIP_NAVI12) check here, but either way that looks too
> complicated to me.
>
> We should rather mention every affected asic type separately here.

Good point.  navi12 should be dropped from the check.  How about the following?

I would rather test explicitly for Navi 10 and 14, cause we can't be sure if 
there won't be another variant in the future.

Christian.


diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index 241a4e57cf4a..280bbd7ca8a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -292,7 +292,8 @@ static void gmc_v10_0_flush_gpu_tlb(struct
amdgpu_device *adev, uint32_t vmid,

if (!adev->mman.buffer_funcs_enabled ||
!adev->ib_pool_ready ||
-   adev->

Re: [PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' workaround for navi

2019-09-26 Thread Koenig, Christian
Well then you didn't figure out the root cause correctly.

This is to avoid data corruption with the SDMA on Navi 10/14 and should
definitely not be applied to Navi 12.

The hardware team went through quite some work to avoid this.

Christian.

On 26.09.2019 19:38, "Yuan, Xiaojie" wrote:
 Hi Alex / Christian,

When gfxoff is enabled for Navi12, I observed an sdma0 hang while launching
the desktop. When this workaround is applied, the issue goes away.
That's why I included this workaround for Navi12 as well.

BR,
Xiaojie
____
From: Koenig, Christian 
Sent: Thursday, September 26, 2019 10:20 PM
To: Alex Deucher 
Cc: Deucher, Alexander ; Yuan, Xiaojie 
; amd-gfx@lists.freedesktop.org 

Subject: Re: [PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' 
workaround for navi



On 26.09.2019 15:51, Alex Deucher wrote:
On Thu, Sep 26, 2019 at 9:47 AM Koenig, Christian
 wrote:
>
> On 26.09.19 at 15:40, Alex Deucher wrote:
> > On Thu, Sep 26, 2019 at 8:29 AM Christian König
> >  wrote:
> >> Stop, wait a second guys!
> >>
> >> This will disable the workaround for Navi10 and that is certainly not 
> >> correct:
> >>
> >> !(adev->asic_type >= CHIP_NAVI10 && adev->asic_type <= CHIP_NAVI12)
> >>
> > Actually, I think it's correct. When I merged the baco patch, I
> > accidentally dropped the navi checks.  E.g.,
> > @@ -245,8 +245,9 @@ static void gmc_v10_0_flush_gpu_tlb(struct
> > amdgpu_device *adev,
> >  mutex_lock(>mman.gtt_window_lock);
> >
> >  gmc_v10_0_flush_vm_hub(adev, vmid, AMDGPU_MMHUB, 0);
> > -   if (!adev->mman.buffer_funcs_enabled || !adev->ib_pool_ready ||
> > -   adev->asic_type != CHIP_NAVI10) {
> > +   if (!adev->mman.buffer_funcs_enabled ||
> > +   !adev->ib_pool_ready ||
> > +   adev->in_gpu_reset) {
> >  gmc_v10_0_flush_vm_hub(adev, vmid, AMDGPU_GFXHUB, 0);
> >  mutex_unlock(>mman.gtt_window_lock);
> >  return;
> > I think it should have been
> > adev->asic_type != CHIP_NAVI10 && adev->asic_type != CHIP_NAVI14 &&
> > adev->asic_type != CHIP_NAVI12
>
> My last status is that Navi12 is not supposed to need that workaround,
> that's why we checked Navi10 and later Navi14 separately.
>
> It's possible that I misread the !(adev->asic_type >= CHIP_NAVI10 &&
> adev->asic_type <= CHIP_NAVI12) check here, but either way that looks too
> complicated to me.
>
> We should rather mention every affected asic type separately here.

Good point.  navi12 should be dropped from the check.  How about the following?

I would rather test explicitly for Navi 10 and 14, cause we can't be sure if 
there won't be another variant in the future.

Christian.


diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index 241a4e57cf4a..280bbd7ca8a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -292,7 +292,8 @@ static void gmc_v10_0_flush_gpu_tlb(struct
amdgpu_device *adev, uint32_t vmid,

if (!adev->mman.buffer_funcs_enabled ||
!adev->ib_pool_ready ||
-   adev->in_gpu_reset) {
+   adev->in_gpu_reset ||
+   (adev->asic_type == CHIP_NAVI12)) {
gmc_v10_0_flush_vm_hub(adev, vmid, AMDGPU_GFXHUB_0, 0);
mutex_unlock(>mman.gtt_window_lock);
return;

Alex

>
> Regards,
> Christian.
>
> >
> > Alex
> >
> >> Christian.
> >>
> >>
> >> On 26.09.19 at 14:26, Deucher, Alexander wrote:
> >>
> >> Please add a patch description.  With that fixed:
> >> Reviewed-by: Alex Deucher 
> >> 
> >> From: amd-gfx  on behalf of Yuan, 
> >> Xiaojie 
> >> Sent: Thursday, September 26, 2019 4:09 AM
> >> To: amd-gfx@lists.freedesktop.org 
> >> Cc: alexdeuc...@gmail.com 
> >> Subject: Re: [PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' 
> >> workaround for navi
> >>
> >> Hi Alex,
> >>
> >> This patch is to add the asic_type check which is missing after drm-next 
> >> branch rebase.
> >>
> >> BR,
> >> Xiaojie
> >> 
> >> From: Yuan, Xiaojie 
> >> Sent: Thursday, September 26, 2019 4:08 PM
> >> To: amd-gfx@lists.freedesktop.org 
> >> Cc: alexdeuc...@gmail.com ; Yuan, Xiaojie 
> >> 
> >> Subject: [PATCH] drm/amdgpu/gmc10: apply the 'invalidat

Re: [PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' workaround for navi

2019-09-26 Thread Koenig, Christian


On 26.09.2019 15:51, Alex Deucher wrote:
On Thu, Sep 26, 2019 at 9:47 AM Koenig, Christian
 wrote:
>
> On 26.09.19 at 15:40, Alex Deucher wrote:
> > On Thu, Sep 26, 2019 at 8:29 AM Christian König
> >  wrote:
> >> Stop, wait a second guys!
> >>
> >> This will disable the workaround for Navi10 and that is certainly not 
> >> correct:
> >>
> >> !(adev->asic_type >= CHIP_NAVI10 && adev->asic_type <= CHIP_NAVI12)
> >>
> > Actually, I think it's correct. When I merged the baco patch, I
> > accidentally dropped the navi checks.  E.g.,
> > @@ -245,8 +245,9 @@ static void gmc_v10_0_flush_gpu_tlb(struct
> > amdgpu_device *adev,
> >  mutex_lock(>mman.gtt_window_lock);
> >
> >  gmc_v10_0_flush_vm_hub(adev, vmid, AMDGPU_MMHUB, 0);
> > -   if (!adev->mman.buffer_funcs_enabled || !adev->ib_pool_ready ||
> > -   adev->asic_type != CHIP_NAVI10) {
> > +   if (!adev->mman.buffer_funcs_enabled ||
> > +   !adev->ib_pool_ready ||
> > +   adev->in_gpu_reset) {
> >  gmc_v10_0_flush_vm_hub(adev, vmid, AMDGPU_GFXHUB, 0);
> >  mutex_unlock(>mman.gtt_window_lock);
> >  return;
> > I think it should have been
> > adev->asic_type != CHIP_NAVI10 && adev->asic_type != CHIP_NAVI14 &&
> > adev->asic_type != CHIP_NAVI12
>
> My last status is that Navi12 is not supposed to need that workaround,
> that's why we checked Navi10 and later Navi14 separately.
>
> It's possible that I misread the !(adev->asic_type >= CHIP_NAVI10 &&
> adev->asic_type <= CHIP_NAVI12) check here, but either way that looks too
> complicated to me.
>
> We should rather mention every affected asic type separately here.

Good point.  navi12 should be dropped from the check.  How about the following?

I would rather test explicitly for Navi 10 and 14, cause we can't be sure if 
there won't be another variant in the future.

Christian.
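
For illustration, the explicit per-ASIC form of the check being asked for
could look roughly like this (adapted from the hunk quoted below; a sketch
only, not the final patch):

        if (!adev->mman.buffer_funcs_enabled ||
            !adev->ib_pool_ready ||
            adev->in_gpu_reset ||
            (adev->asic_type != CHIP_NAVI10 &&
             adev->asic_type != CHIP_NAVI14)) {
                /* fall back to the register-based flush on everything that
                 * does not need the SDMA workaround */
                gmc_v10_0_flush_vm_hub(adev, vmid, AMDGPU_GFXHUB_0, 0);
                mutex_unlock(&adev->mman.gtt_window_lock);
                return;
        }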


diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index 241a4e57cf4a..280bbd7ca8a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -292,7 +292,8 @@ static void gmc_v10_0_flush_gpu_tlb(struct
amdgpu_device *adev, uint32_t vmid,

if (!adev->mman.buffer_funcs_enabled ||
!adev->ib_pool_ready ||
-   adev->in_gpu_reset) {
+   adev->in_gpu_reset ||
+   (adev->asic_type == CHIP_NAVI12)) {
gmc_v10_0_flush_vm_hub(adev, vmid, AMDGPU_GFXHUB_0, 0);
mutex_unlock(>mman.gtt_window_lock);
return;

Alex

>
> Regards,
> Christian.
>
> >
> > Alex
> >
> >> Christian.
> >>
> >>
> >> On 26.09.19 at 14:26, Deucher, Alexander wrote:
> >>
> >> Please add a patch description.  With that fixed:
> >> Reviewed-by: Alex Deucher 
> >> 
> >> From: amd-gfx  on behalf of Yuan, 
> >> Xiaojie 
> >> Sent: Thursday, September 26, 2019 4:09 AM
> >> To: amd-gfx@lists.freedesktop.org 
> >> Cc: alexdeuc...@gmail.com 
> >> Subject: Re: [PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' 
> >> workaround for navi
> >>
> >> Hi Alex,
> >>
> >> This patch is to add the asic_type check which is missing after drm-next 
> >> branch rebase.
> >>
> >> BR,
> >> Xiaojie
> >> 
> >> From: Yuan, Xiaojie 
> >> Sent: Thursday, September 26, 2019 4:08 PM
> >> To: amd-gfx@lists.freedesktop.org 
> >> Cc: alexdeuc...@gmail.com ; Yuan, Xiaojie 
> >> 
> >> Subject: [PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' 
> >> workaround for navi
> >>
> >> Fixes: 767acabdac81 ("drm/amd/powerplay: add baco smu reset function for 
> >> smu11")
> >> Signed-off-by: Xiaojie Yuan 
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 1 +
> >>   1 file changed, 1 insertion(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
> >> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> index cb3f61873baa..dc2e68e019eb 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> @@ -292,6 +292,7 @@ static void gmc_v10_0_flush_gpu_tlb(struct 
> >> amdgpu_device *adev, uint32_t vmid,
> >>
> >>   if (!adev->mman.buffer_funcs_enabled ||
> >>   !adev->ib_pool_ready ||
> >> +   !(adev->asic_type >= CHIP_NAVI10 && adev->asic_type <= 
> >> CHIP_NAVI12) ||
> >>   adev->in_gpu_reset) {
> >>   gmc_v10_0_flush_vm_hub(adev, vmid, AMDGPU_GFXHUB_0, 0);
> >>   mutex_unlock(>mman.gtt_window_lock);
> >> --
> >> 2.20.1
> >>
> >>
> >> ___
> >> amd-gfx mailing list
> >> amd-gfx@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> >>
> >>
>


Re: [PATCH v3 10/11] drm/amdgpu: job is secure iff CS is secure (v4)

2019-09-26 Thread Koenig, Christian
On 25.09.19 at 16:54, Huang, Ray wrote:
>> -Original Message-
>> From: Koenig, Christian 
>> Sent: Wednesday, September 25, 2019 10:47 PM
>> To: Huang, Ray ; amd-gfx@lists.freedesktop.org; dri-
>> de...@lists.freedesktop.org; Deucher, Alexander
>> 
>> Cc: Tuikov, Luben ; Liu, Aaron
>> 
>> Subject: Re: [PATCH v3 10/11] drm/amdgpu: job is secure iff CS is secure (v4)
>>
>> Am 25.09.19 um 16:38 schrieb Huang, Ray:
>>> Mark a job as secure, if and only if the command submission flag has
>>> the secure flag set.
>>>
>>> v2: fix the null job pointer while in vmid 0 submission.
>>> v3: Context --> Command submission.
>>> v4: filling cs parser with cs->in.flags
>>>
>>> Signed-off-by: Huang Rui 
>>> Co-developed-by: Luben Tuikov 
>>> Signed-off-by: Luben Tuikov 
>>> Reviewed-by: Alex Deucher 
>>> ---
>>>drivers/gpu/drm/amd/amdgpu/amdgpu.h |  3 +++
>>>drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 11 ++-
>>>drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  |  4 ++--
>>>drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  2 ++
>>>4 files changed, 17 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index 697e8e5..fd60695 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -485,6 +485,9 @@ struct amdgpu_cs_parser {
>>> uint64_tbytes_moved;
>>> uint64_tbytes_moved_vis;
>>>
>>> +   /* secure cs */
>>> +   boolis_secure;
>>> +
>>> /* user fence */
>>> struct amdgpu_bo_list_entry uf_entry;
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> index 51f3db0..9038dc1 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> @@ -133,6 +133,14 @@ static int amdgpu_cs_parser_init(struct
>> amdgpu_cs_parser *p, union drm_amdgpu_cs
>>> goto free_chunk;
>>> }
>>>
>>> +   /**
>>> +* The command submission (cs) is a union, so an assignment to
>>> +* 'out' is destructive to the cs (at least the first 8
>>> +* bytes). For this reason, inquire about the flags before the
>>> +* assignment to 'out'.
>>> +*/
>>> +   p->is_secure = cs->in.flags & AMDGPU_CS_FLAGS_SECURE;
>>> +
>>> /* get chunks */
>>> chunk_array_user = u64_to_user_ptr(cs->in.chunks);
>>> if (copy_from_user(chunk_array, chunk_array_user, @@ -1252,8
>>> +1260,9 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>>> p->ctx->preamble_presented = true;
>>> }
>>>
>>> -   cs->out.handle = seq;
>>> +   job->secure = p->is_secure;
>>> job->uf_sequence = seq;
>>> +   cs->out.handle = seq;
>> At least it is no longer accessing cs->in, but that still looks like the 
>> wrong place
>> to initialize the job.
>>
>> Why can't we fill that in directly after amdgpu_job_alloc() ?
> There is no secure-related input member in amdgpu_job_alloc(), unless we add one:
>
>  amdgpu_job_alloc(adev, num_ibs, job, vm, secure)
>
> It looks like too much, doesn't it?

You should not add a new parameter, but rather set the member in 
amdgpu_cs_parser_init() after amdgpu_job_alloc().

Or maybe even better add that into amdgpu_cs_ib_fill(), cause here is 
where we fill in most of the job description.

Regards,
Christian.
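
A minimal sketch of what that placement could look like (assuming p->is_secure
was already captured in amdgpu_cs_parser_init(); the surrounding parsing code
is elided and only illustrative):

static int amdgpu_cs_ib_fill(struct amdgpu_device *adev,
                             struct amdgpu_cs_parser *p)
{
        /* ... walk the IB chunks and fill p->job->ibs[] ... */

        /* describe the job completely here instead of at submit time */
        p->job->secure = p->is_secure;

        return 0;
}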

>
> Thanks,
> Ray
>
>> Regards,
>> Christian.
>>
>>> amdgpu_job_free_resources(job);
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> index e1dc229..cb9b650 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> @@ -210,7 +210,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring,
>> unsigned num_ibs,
>>> if (job && ring->funcs->emit_cntxcntl) {
>>> status |= job->preamble_status;
>>> status |= job->preemption_status;
>>> -   amdgpu_ring_emit_cntxcntl(ring, status, false);
>>> +   amdgpu_ring_emit_cntxcntl(ring, status, jo

Re: [PATCH v3 10/11] drm/amdgpu: job is secure iff CS is secure (v4)

2019-09-25 Thread Koenig, Christian
On 25.09.19 at 16:38, Huang, Ray wrote:
> Mark a job as secure, if and only if the command
> submission flag has the secure flag set.
>
> v2: fix the null job pointer while in vmid 0
> submission.
> v3: Context --> Command submission.
> v4: filling cs parser with cs->in.flags
>
> Signed-off-by: Huang Rui 
> Co-developed-by: Luben Tuikov 
> Signed-off-by: Luben Tuikov 
> Reviewed-by: Alex Deucher 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h |  3 +++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 11 ++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  |  4 ++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  2 ++
>   4 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 697e8e5..fd60695 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -485,6 +485,9 @@ struct amdgpu_cs_parser {
>   uint64_tbytes_moved;
>   uint64_tbytes_moved_vis;
>   
> + /* secure cs */
> + boolis_secure;
> +
>   /* user fence */
>   struct amdgpu_bo_list_entry uf_entry;
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 51f3db0..9038dc1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -133,6 +133,14 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser 
> *p, union drm_amdgpu_cs
>   goto free_chunk;
>   }
>   
> + /**
> +  * The command submission (cs) is a union, so an assignment to
> +  * 'out' is destructive to the cs (at least the first 8
> +  * bytes). For this reason, inquire about the flags before the
> +  * assignment to 'out'.
> +  */
> + p->is_secure = cs->in.flags & AMDGPU_CS_FLAGS_SECURE;
> +
>   /* get chunks */
>   chunk_array_user = u64_to_user_ptr(cs->in.chunks);
>   if (copy_from_user(chunk_array, chunk_array_user,
> @@ -1252,8 +1260,9 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>   p->ctx->preamble_presented = true;
>   }
>   
> - cs->out.handle = seq;
> + job->secure = p->is_secure;
>   job->uf_sequence = seq;
> + cs->out.handle = seq;

At least it is no longer accessing cs->in, but that still looks like the 
wrong place to initialize the job.

Why can't we fill that in directly after amdgpu_job_alloc() ?

Regards,
Christian.

>   
>   amdgpu_job_free_resources(job);
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index e1dc229..cb9b650 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -210,7 +210,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
> num_ibs,
>   if (job && ring->funcs->emit_cntxcntl) {
>   status |= job->preamble_status;
>   status |= job->preemption_status;
> - amdgpu_ring_emit_cntxcntl(ring, status, false);
> + amdgpu_ring_emit_cntxcntl(ring, status, job->secure);
>   }
>   
>   for (i = 0; i < num_ibs; ++i) {
> @@ -229,7 +229,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
> num_ibs,
>   }
>   
>   if (ring->funcs->emit_tmz)
> - amdgpu_ring_emit_tmz(ring, false, false);
> + amdgpu_ring_emit_tmz(ring, false, job ? job->secure : false);
>   
>   #ifdef CONFIG_X86_64
>   if (!(adev->flags & AMD_IS_APU))
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> index dc7ee93..aa0e375 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> @@ -63,6 +63,8 @@ struct amdgpu_job {
>   uint64_tuf_addr;
>   uint64_tuf_sequence;
>   
> + /* the job is due to a secure command submission */
> + boolsecure;
>   };
>   
>   int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,

