Re: [PATCH v2 4/5] drm/amdgpu: use bulk moves for efficient VM LRU handling (v2)

2018-08-10 Thread Huang Rui
On Fri, Aug 10, 2018 at 10:18:24PM +0800, Koenig, Christian wrote:
> On 10.08.2018 at 13:55, Huang Rui wrote:
> > I continue to work on bulk moving, based on the proposal by Christian.
> >
> > Background:
> > The amdgpu driver moves all PD/PT and per-VM BOs onto an idle list, and
> > then moves all of them to the end of the LRU list one by one. That causes
> > many BOs to be moved to the end of the LRU, which seriously impacts
> > performance.
> >
> > Christian then provided a workaround to not move PD/PT BOs on the LRU with
> > the patch below:
> > "drm/amdgpu: band aid validating VM PTs"
> > Commit 0bbf32026cf5ba41e9922b30e26e1bed1ecd38ae
> >
> > However, the final solution should bulk move all PD/PT and per-VM BOs on
> > the LRU instead of moving them one by one.
> >
> > Whenever amdgpu_vm_validate_pt_bos() is called and we have BOs which need
> > to be validated, we move all BOs together to the end of the LRU without
> > dropping the LRU lock.
> >
> > While doing so we note the beginning and end of this block in the LRU list.
> >
> > Now when amdgpu_vm_validate_pt_bos() is called and we don't have anything
> > to do, we don't move every BO one by one, but instead cut the LRU list into
> > pieces so that we bulk move everything to the end in just one operation.
> >
> > Test data:
> > +--------------+-----------------+-----------+-----------------------------------------+
> > |              |The Talos        |Clpeak(OCL)|BusSpeedReadback(OCL)                    |
> > |              |Principle(Vulkan)|           |                                         |
> > +--------------+-----------------+-----------+-----------------------------------------+
> > | Original     |  147.7 FPS      |  76.86 us |0.319 ms(1K) 0.314 ms(2K) 0.308 ms(4K)   |
> > |              |                 |           |0.307 ms(8K) 0.310 ms(16K)               |
> > +--------------+-----------------+-----------+-----------------------------------------+
> > | Original + WA|                 |           |0.254 ms(1K) 0.241 ms(2K)                |
> > |(don't move   |  162.1 FPS      |  42.15 us |0.230 ms(4K) 0.223 ms(8K) 0.204 ms(16K)  |
> > |PT BOs on LRU)|                 |           |                                         |
> > +--------------+-----------------+-----------+-----------------------------------------+
> > | Bulk move    |  163.1 FPS      |  40.52 us |0.244 ms(1K) 0.252 ms(2K) 0.213 ms(4K)   |
> > |              |                 |           |0.214 ms(8K) 0.225 ms(16K)               |
> > +--------------+-----------------+-----------+-----------------------------------------+
> >
> > After testing with the three benchmarks above, covering Vulkan and OpenCL,
> > we can see a visible improvement over the original, and even better
> > results than the original with the workaround.
> >
> > v2: move all BOs, including those on the idle, relocated, and moved lists,
> > to the end of the LRU and put them together.
> >
> > Signed-off-by: Christian König 
> > Signed-off-by: Huang Rui 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 75 ++
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  4 ++
> >   2 files changed, 63 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > index 9c84770..81fb1ac 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > @@ -268,6 +268,55 @@ void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
> >   }
> >   
> >   /**
> > + * amdgpu_vm_move_to_lru_tail_by_list - move one list of BOs to end of LRU
> > + *
> > + * @adev: amdgpu device pointer
> > + * @vm: vm providing the BOs
> > + * @list: the list that stored BOs
> > + *
> > + * Move one list of BOs to the end of LRU and update the positions.
> > + */
> > +static void
> > +amdgpu_vm_move_to_lru_tail_by_list(struct amdgpu_device *adev,
> > +  struct amdgpu_vm *vm, struct list_head *list)
> 
> Am I blind or is adev not actually used in the function?
> 

No, what you see is definitely real. ;-)
That's my fault for adding an unused adev in the function. I will remove it
in the next version.

> > +{
> > +   struct amdgpu_vm_bo_base *bo_base, *tmp;
> > +
> > +   list_for_each_entry_safe(bo_base, tmp, list, vm_status) {
> 
> No need for the _safe variant as far as I can see.
> 

Yes, we won't move the vm_status entries to another list inside the loop.
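
For reference, the distinction looks like this (a minimal sketch; other_list
is hypothetical and only there to show what would require the _safe variant):

    /* Plain iteration is fine as long as the body never unlinks the
     * current entry from "list". */
    list_for_each_entry(bo_base, list, vm_status) {
            struct amdgpu_bo *bo = bo_base->bo;
            /* moving bo->tbo on the TTM LRU doesn't touch vm_status */
    }

    /* The _safe variant keeps a lookahead pointer (tmp), so the body may
     * unlink or move the current entry without breaking the walk. */
    list_for_each_entry_safe(bo_base, tmp, list, vm_status) {
            list_move(&bo_base->vm_status, &other_list);
    }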

Thanks,
Ray

> Apart from that this looks good to me.
> 
> Christian.
> 
> > +   struct amdgpu_bo *bo = bo_base->bo;
> > +
> > +   if (!bo->parent)
> > +   continue;
> > +
> > +   ttm_bo_move_to_lru_tail(&bo->tbo, &vm->lru_bulk_move);
> > +   if (bo->shadow)
> > +   ttm_bo_move_to_lru_tail(&bo->shadow->tbo,
> > +   &vm->lru_bulk_move);
> > +   }
> > +}
> > +
> > +/**
> > + * amdgpu_vm_move_to_lru_tail - move all BOs to the end of LRU
> > + *
> > + * @adev: amdgpu device pointer

[PATCH 3/3] drm/amdgpu/powerplay/vega10: enable AVFS control via ppfeaturemask

2018-08-10 Thread Alex Deucher
Allow the user to disable AVFS via ppfeaturemask for debugging.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
index 704b237ecf70..ca9be583fb62 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
@@ -129,7 +129,8 @@ static void vega10_set_default_registry_data(struct pp_hwmgr *hwmgr)
data->registry_data.thermal_support = 1;
data->registry_data.fw_ctf_enabled = 1;
 
-   data->registry_data.avfs_support = 1;
+   data->registry_data.avfs_support =
+   hwmgr->feature_mask & PP_AVFS_MASK ? true : false;
data->registry_data.led_dpm_enabled = 1;
 
data->registry_data.vr0hot_enabled = 1;
-- 
2.13.6

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/3] drm/amdgpu: add AVFS control to PP_FEATURE_MASK

2018-08-10 Thread Alex Deucher
Add a ppfeaturemask flag to disable AVFS control.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/include/amd_shared.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/include/amd_shared.h b/drivers/gpu/drm/amd/include/amd_shared.h
index 265621d8945c..86b167ec9863 100644
--- a/drivers/gpu/drm/amd/include/amd_shared.h
+++ b/drivers/gpu/drm/amd/include/amd_shared.h
@@ -129,6 +129,7 @@ enum PP_FEATURE_MASK {
PP_GFXOFF_MASK = 0x8000,
	PP_ACG_MASK = 0x10000,
	PP_STUTTER_MODE = 0x20000,
+	PP_AVFS_MASK = 0x40000,
 };
 
 /**
-- 
2.13.6
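
As a usage sketch, the value for the amdgpu.ppfeaturemask module parameter
with only AVFS cleared can be derived as below; the 0xffffffff starting point
is an assumption standing in for the driver's default mask:

    #include <stdio.h>
    #include <stdint.h>

    #define PP_AVFS_MASK 0x40000u   /* matches the enum added above */

    int main(void)
    {
            uint32_t def  = 0xffffffffu;            /* assumed default */
            uint32_t mask = def & ~PP_AVFS_MASK;    /* clear AVFS only */

            printf("amdgpu.ppfeaturemask=0x%08x\n", mask);
            return 0;
    }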

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/3] drm/amdgpu/powerplay/smu7: enable AVFS control via ppfeaturemask

2018-08-10 Thread Alex Deucher
Allow the user to disable AVFS via ppfeaturemask for debugging.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/powerplay/smumgr/smu7_smumgr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/smu7_smumgr.c b/drivers/gpu/drm/amd/powerplay/smumgr/smu7_smumgr.c
index a029e47c2319..186dafc7f166 100644
--- a/drivers/gpu/drm/amd/powerplay/smumgr/smu7_smumgr.c
+++ b/drivers/gpu/drm/amd/powerplay/smumgr/smu7_smumgr.c
@@ -620,7 +620,8 @@ int smu7_init(struct pp_hwmgr *hwmgr)
return -EINVAL;
}
 
-   if (smum_is_hw_avfs_present(hwmgr))
+   if (smum_is_hw_avfs_present(hwmgr) &&
+   (hwmgr->feature_mask & PP_AVFS_MASK))
hwmgr->avfs_supported = true;
 
return 0;
-- 
2.13.6

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm 6/6] amdgpu: always add all BOs to lookup table

2018-08-10 Thread Marek Olšák
OK. Thanks.

Marek

On Fri, Aug 10, 2018 at 9:06 AM, Christian König
 wrote:
> Why should it? Adding the handle is now not more than setting an array
> entry.
>
> I've tested with allocating 250k BOs of 4k size each and there weren't any
> measurable performance differences.
>
> Christian.
>
>
> On 09.08.2018 at 18:56, Marek Olšák wrote:
>>
>> I don't think this is a good idea. Can you please explain why this
>> won't cause performance regressions?
>>
>> Thanks
>> Marek
>>
>> On Fri, Aug 3, 2018 at 7:34 AM, Christian König
>>  wrote:
>>>
>>> This way we can always find a BO structure by its handle.
>>>
>>> Signed-off-by: Christian König 
>>> ---
>>>   amdgpu/amdgpu_bo.c | 14 --
>>>   1 file changed, 4 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
>>> index 02592377..422c7c99 100644
>>> --- a/amdgpu/amdgpu_bo.c
>>> +++ b/amdgpu/amdgpu_bo.c
>>> @@ -87,6 +87,10 @@ int amdgpu_bo_alloc(amdgpu_device_handle dev,
>>>
>>>  bo->handle = args.out.handle;
>>>
>>> +   pthread_mutex_lock(&bo->dev->bo_table_mutex);
>>> +   r = handle_table_insert(&bo->dev->bo_handles, bo->handle, bo);
>>> +   pthread_mutex_unlock(&bo->dev->bo_table_mutex);
>>> +
>>>  pthread_mutex_init(&bo->cpu_access_mutex, NULL);
>>>
>>>  if (r)
>>> @@ -171,13 +175,6 @@ int amdgpu_bo_query_info(amdgpu_bo_handle bo,
>>>  return 0;
>>>   }
>>>
>>> -static void amdgpu_add_handle_to_table(amdgpu_bo_handle bo)
>>> -{
>>> -   pthread_mutex_lock(&bo->dev->bo_table_mutex);
>>> -   handle_table_insert(&bo->dev->bo_handles, bo->handle, bo);
>>> -   pthread_mutex_unlock(&bo->dev->bo_table_mutex);
>>> -}
>>> -
>>>   static int amdgpu_bo_export_flink(amdgpu_bo_handle bo)
>>>   {
>>>  struct drm_gem_flink flink;
>>> @@ -240,14 +237,11 @@ int amdgpu_bo_export(amdgpu_bo_handle bo,
>>>  return 0;
>>>
>>>  case amdgpu_bo_handle_type_kms:
>>> -   amdgpu_add_handle_to_table(bo);
>>> -   /* fall through */
>>>  case amdgpu_bo_handle_type_kms_noimport:
>>>  *shared_handle = bo->handle;
>>>  return 0;
>>>
>>>  case amdgpu_bo_handle_type_dma_buf_fd:
>>> -   amdgpu_add_handle_to_table(bo);
>>>  return drmPrimeHandleToFD(bo->dev->fd, bo->handle,
>>>DRM_CLOEXEC | DRM_RDWR,
>>>(int*)shared_handle);
>>> --
>>> 2.14.1
>>>
>>> ___
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: RFC: Add write flag to reservation object fences

2018-08-10 Thread Christian König

On 10.08.2018 at 11:21, Daniel Vetter wrote:

[SNIP]
Then don't track _any_ of the amdgpu internal fences in the reservation object:
- 1 reservation object that you hand to ttm, for use internally within amdgpu
- 1 reservation object that you attach to the dma-buf (or get from the
imported dma-buf), where you play all the tricks to fake fences.


Well that is an interesting idea. Going to try that one out.

Regards,
Christian.


-Daniel


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: move amdgpu ttm structures to amdgpu_ttm.h

2018-08-10 Thread Christian König

Well NAK, that is intentionally kept local to the amdgpu_ttm.c file.

This way we can make sure that we don't accidentally leak the structure 
somewhere else.


What you could do is to move it to the beginning of the file.

Christian.

On 10.08.2018 at 07:44, Junwei Zhang wrote:

code cleanup for amdgpu ttm structures

Signed-off-by: Junwei Zhang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 17 +
  2 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c6611cf..87f4e8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -776,26 +776,6 @@ static unsigned long amdgpu_ttm_io_mem_pfn(struct ttm_buffer_object *bo,
(offset >> PAGE_SHIFT);
  }
  
-/*
- * TTM backend functions.
- */
-struct amdgpu_ttm_gup_task_list {
-	struct list_head	list;
-	struct task_struct	*task;
-};
-
-struct amdgpu_ttm_tt {
-	struct ttm_dma_tt	ttm;
-	u64			offset;
-	uint64_t		userptr;
-	struct task_struct	*usertask;
-	uint32_t		userflags;
-	spinlock_t		guptasklock;
-	struct list_head	guptasks;
-	atomic_t		mmu_invalidations;
-	uint32_t		last_set_pages;
-};
-
  /**
   * amdgpu_ttm_tt_get_user_pages - Pin pages of memory pointed to by a USERPTR
   * pointer to memory
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 8b3cc66..b8c391a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -66,6 +66,23 @@ struct amdgpu_copy_mem {
unsigned long   offset;
  };
  
+struct amdgpu_ttm_gup_task_list {
+	struct list_head	list;
+	struct task_struct	*task;
+};
+
+struct amdgpu_ttm_tt {
+	struct ttm_dma_tt	ttm;
+	u64			offset;
+	uint64_t		userptr;
+	struct task_struct	*usertask;
+	uint32_t		userflags;
+	spinlock_t		guptasklock;
+	struct list_head	guptasks;
+	atomic_t		mmu_invalidations;
+	uint32_t		last_set_pages;
+};
+
  extern const struct ttm_mem_type_manager_func amdgpu_gtt_mgr_func;
  extern const struct ttm_mem_type_manager_func amdgpu_vram_mgr_func;
  


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm 2/2] [RFC] amdgpu: do not free flink bo for flink_fd

2018-08-10 Thread Christian König

On 10.08.2018 at 07:05, Junwei Zhang wrote:

The flink BO is used to export, so it should not be freed for flink_fd.


Why should we do this? That makes no sense, this way we would create a 
memory leak.


Christian.



Signed-off-by: Junwei Zhang 
---
  amdgpu/amdgpu_bo.c | 6 --
  1 file changed, 6 deletions(-)

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index 6f0baf1..5b91cfc 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -219,12 +219,6 @@ static int amdgpu_bo_export_flink(amdgpu_bo_handle bo)
  
  	bo->flink_name = flink.name;
  
-	if (bo->dev->flink_fd != bo->dev->fd) {

-   struct drm_gem_close args = {};
-   args.handle = handle;
-   drmIoctl(bo->dev->flink_fd, DRM_IOCTL_GEM_CLOSE, &args);
-   }
-
	pthread_mutex_lock(&bo->dev->bo_table_mutex);
	r = handle_table_insert(&bo->dev->bo_flink_names, bo->flink_name, bo);
	pthread_mutex_unlock(&bo->dev->bo_table_mutex);


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 4/5] drm/amdgpu: use bulk moves for efficient VM LRU handling (v2)

2018-08-10 Thread Christian König

On 10.08.2018 at 13:55, Huang Rui wrote:

I continue to work on bulk moving, based on the proposal by Christian.

Background:
The amdgpu driver moves all PD/PT and per-VM BOs onto an idle list, and then
moves all of them to the end of the LRU list one by one. That causes many BOs
to be moved to the end of the LRU, which seriously impacts performance.

Christian then provided a workaround to not move PD/PT BOs on the LRU with
the patch below:
"drm/amdgpu: band aid validating VM PTs"
Commit 0bbf32026cf5ba41e9922b30e26e1bed1ecd38ae

However, the final solution should bulk move all PD/PT and per-VM BOs on the
LRU instead of moving them one by one.

Whenever amdgpu_vm_validate_pt_bos() is called and we have BOs which need to
be validated, we move all BOs together to the end of the LRU without dropping
the LRU lock.

While doing so we note the beginning and end of this block in the LRU list.

Now when amdgpu_vm_validate_pt_bos() is called and we don't have anything to
do, we don't move every BO one by one, but instead cut the LRU list into
pieces so that we bulk move everything to the end in just one operation.

Test data:
+--------------+-----------------+-----------+-----------------------------------------+
|              |The Talos        |Clpeak(OCL)|BusSpeedReadback(OCL)                    |
|              |Principle(Vulkan)|           |                                         |
+--------------+-----------------+-----------+-----------------------------------------+
| Original     |  147.7 FPS      |  76.86 us |0.319 ms(1K) 0.314 ms(2K) 0.308 ms(4K)   |
|              |                 |           |0.307 ms(8K) 0.310 ms(16K)               |
+--------------+-----------------+-----------+-----------------------------------------+
| Original + WA|                 |           |0.254 ms(1K) 0.241 ms(2K)                |
|(don't move   |  162.1 FPS      |  42.15 us |0.230 ms(4K) 0.223 ms(8K) 0.204 ms(16K)  |
|PT BOs on LRU)|                 |           |                                         |
+--------------+-----------------+-----------+-----------------------------------------+
| Bulk move    |  163.1 FPS      |  40.52 us |0.244 ms(1K) 0.252 ms(2K) 0.213 ms(4K)   |
|              |                 |           |0.214 ms(8K) 0.225 ms(16K)               |
+--------------+-----------------+-----------+-----------------------------------------+

After testing with the three benchmarks above, covering Vulkan and OpenCL, we
can see a visible improvement over the original, and even better results than
the original with the workaround.

v2: move all BOs, including those on the idle, relocated, and moved lists, to
the end of the LRU and put them together.

Signed-off-by: Christian König 
Signed-off-by: Huang Rui 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 75 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  4 ++
  2 files changed, 63 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 9c84770..81fb1ac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -268,6 +268,55 @@ void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
  }
  
/**
+ * amdgpu_vm_move_to_lru_tail_by_list - move one list of BOs to end of LRU
+ *
+ * @adev: amdgpu device pointer
+ * @vm: vm providing the BOs
+ * @list: the list that stored BOs
+ *
+ * Move one list of BOs to the end of LRU and update the positions.
+ */
+static void
+amdgpu_vm_move_to_lru_tail_by_list(struct amdgpu_device *adev,
+  struct amdgpu_vm *vm, struct list_head *list)


Am I blind or is adev not actually used in the function?


+{
+   struct amdgpu_vm_bo_base *bo_base, *tmp;
+
+   list_for_each_entry_safe(bo_base, tmp, list, vm_status) {


No need for the _safe variant as far as I can see.

Apart from that this looks good to me.

Christian.


+   struct amdgpu_bo *bo = bo_base->bo;
+
+   if (!bo->parent)
+   continue;
+
+   ttm_bo_move_to_lru_tail(&bo->tbo, &vm->lru_bulk_move);
+   if (bo->shadow)
+   ttm_bo_move_to_lru_tail(&bo->shadow->tbo,
+   &vm->lru_bulk_move);
+   }
+}
+
+/**
+ * amdgpu_vm_move_to_lru_tail - move all BOs to the end of LRU
+ *
+ * @adev: amdgpu device pointer
+ * @vm: vm providing the BOs
+ *
+ * Move all BOs to the end of LRU and remember their positions to put them
+ * together.
+ */
+static void
+amdgpu_vm_move_to_lru_tail(struct amdgpu_device *adev, struct amdgpu_vm *vm)
+{
+   struct ttm_bo_global *glob = adev->mman.bdev.glob;
+
+   spin_lock(&glob->lru_lock);
+   amdgpu_vm_move_to_lru_tail_by_list(adev, vm, &vm->idle);
+   amdgpu_vm_move_to_lru_tail_by_list(adev, vm, &vm->relocated);
+   amdgpu_vm_move_to_lru_tail_by_list(adev, vm, &vm->moved);
+   spin_unlock(&glob->lru_lock);
+}
+
+/**
   * amdgpu_vm_validate_pt_bos - validate the page table BOs
   *
   * @adev: amdgpu device pointer
@@ -286,6 +335,7 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,

Re: [PATCH libdrm 6/6] amdgpu: always add all BOs to lookup table

2018-08-10 Thread Christian König
Why should it? Adding the handle is now not more than setting an array 
entry.


I've tested with allocating 250k BOs of 4k size each and there weren't
any measurable performance differences.
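
To illustrate why the insert is cheap, a stand-in sketch of an array-backed
handle table follows; this is illustrative only, not the actual libdrm
handle_table implementation:

    #include <errno.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    struct sketch_table {
            uint32_t max_key;
            void **values;
    };

    /* O(1) array store in the common case; growth is amortized. */
    static int sketch_table_insert(struct sketch_table *t, uint32_t key,
                                   void *value)
    {
            if (key >= t->max_key) {
                    uint32_t alloc = (key + 512) & ~511u; /* grow in chunks */
                    void **values = realloc(t->values,
                                            alloc * sizeof(*values));

                    if (!values)
                            return -ENOMEM;
                    memset(values + t->max_key, 0,
                           (alloc - t->max_key) * sizeof(*values));
                    t->values = values;
                    t->max_key = alloc;
            }
            t->values[key] = value;
            return 0;
    }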


Christian.

On 09.08.2018 at 18:56, Marek Olšák wrote:

I don't think this is a good idea. Can you please explain why this
won't cause performance regressions?

Thanks
Marek

On Fri, Aug 3, 2018 at 7:34 AM, Christian König
 wrote:

This way we can always find a BO structure by its handle.

Signed-off-by: Christian König 
---
  amdgpu/amdgpu_bo.c | 14 --
  1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index 02592377..422c7c99 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -87,6 +87,10 @@ int amdgpu_bo_alloc(amdgpu_device_handle dev,

 bo->handle = args.out.handle;

+   pthread_mutex_lock(&bo->dev->bo_table_mutex);
+   r = handle_table_insert(&bo->dev->bo_handles, bo->handle, bo);
+   pthread_mutex_unlock(&bo->dev->bo_table_mutex);
+
 pthread_mutex_init(&bo->cpu_access_mutex, NULL);

 if (r)
@@ -171,13 +175,6 @@ int amdgpu_bo_query_info(amdgpu_bo_handle bo,
 return 0;
  }

-static void amdgpu_add_handle_to_table(amdgpu_bo_handle bo)
-{
-   pthread_mutex_lock(&bo->dev->bo_table_mutex);
-   handle_table_insert(&bo->dev->bo_handles, bo->handle, bo);
-   pthread_mutex_unlock(&bo->dev->bo_table_mutex);
-}
-
  static int amdgpu_bo_export_flink(amdgpu_bo_handle bo)
  {
 struct drm_gem_flink flink;
@@ -240,14 +237,11 @@ int amdgpu_bo_export(amdgpu_bo_handle bo,
 return 0;

 case amdgpu_bo_handle_type_kms:
-   amdgpu_add_handle_to_table(bo);
-   /* fall through */
 case amdgpu_bo_handle_type_kms_noimport:
 *shared_handle = bo->handle;
 return 0;

 case amdgpu_bo_handle_type_dma_buf_fd:
-   amdgpu_add_handle_to_table(bo);
 return drmPrimeHandleToFD(bo->dev->fd, bo->handle,
   DRM_CLOEXEC | DRM_RDWR,
   (int*)shared_handle);
--
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2 5/5] drm/amdgpu: move PD/PT bos on LRU again

2018-08-10 Thread Huang Rui
The new bulk moving functionality is ready, and the overhead of moving PD/PT
BOs to the LRU is fixed. So move them on the LRU again.

Signed-off-by: Huang Rui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 81fb1ac..9f44514 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1127,7 +1127,7 @@ int amdgpu_vm_update_directories(struct amdgpu_device 
*adev,
   struct amdgpu_vm_bo_base,
   vm_status);
bo_base->moved = false;
-   list_del_init(&bo_base->vm_status);
+   list_move(&bo_base->vm_status, &vm->idle);
 
bo = bo_base->bo->parent;
if (!bo)
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2 4/5] drm/amdgpu: use bulk moves for efficient VM LRU handling (v2)

2018-08-10 Thread Huang Rui
I continue to work on bulk moving, based on the proposal by Christian.

Background:
The amdgpu driver moves all PD/PT and per-VM BOs onto an idle list, and then
moves all of them to the end of the LRU list one by one. That causes many BOs
to be moved to the end of the LRU, which seriously impacts performance.

Christian then provided a workaround to not move PD/PT BOs on the LRU with
the patch below:
"drm/amdgpu: band aid validating VM PTs"
Commit 0bbf32026cf5ba41e9922b30e26e1bed1ecd38ae

However, the final solution should bulk move all PD/PT and per-VM BOs on the
LRU instead of moving them one by one.

Whenever amdgpu_vm_validate_pt_bos() is called and we have BOs which need to
be validated, we move all BOs together to the end of the LRU without dropping
the LRU lock.

While doing so we note the beginning and end of this block in the LRU list.

Now when amdgpu_vm_validate_pt_bos() is called and we don't have anything to
do, we don't move every BO one by one, but instead cut the LRU list into
pieces so that we bulk move everything to the end in just one operation.
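
Condensed into a sketch, the fast path described above looks roughly like
this; it assumes the positions were recorded earlier by passing
&vm->lru_bulk_move to ttm_bo_move_to_lru_tail(), and it illustrates the idea
rather than the exact code in this patch:

    /* Nothing to validate: splice the previously recorded block to the
     * LRU tail in one operation instead of touching every BO. */
    if (list_empty(&vm->evicted)) {
            struct ttm_bo_global *glob = adev->mman.bdev.glob;

            spin_lock(&glob->lru_lock);
            ttm_bo_bulk_move_lru_tail(&vm->lru_bulk_move);
            spin_unlock(&glob->lru_lock);
            return 0;
    }
    /* otherwise validate, then re-record first/last positions */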

Test data:
+--------------+-----------------+-----------+-----------------------------------------+
|              |The Talos        |Clpeak(OCL)|BusSpeedReadback(OCL)                    |
|              |Principle(Vulkan)|           |                                         |
+--------------+-----------------+-----------+-----------------------------------------+
| Original     |  147.7 FPS      |  76.86 us |0.319 ms(1K) 0.314 ms(2K) 0.308 ms(4K)   |
|              |                 |           |0.307 ms(8K) 0.310 ms(16K)               |
+--------------+-----------------+-----------+-----------------------------------------+
| Original + WA|                 |           |0.254 ms(1K) 0.241 ms(2K)                |
|(don't move   |  162.1 FPS      |  42.15 us |0.230 ms(4K) 0.223 ms(8K) 0.204 ms(16K)  |
|PT BOs on LRU)|                 |           |                                         |
+--------------+-----------------+-----------+-----------------------------------------+
| Bulk move    |  163.1 FPS      |  40.52 us |0.244 ms(1K) 0.252 ms(2K) 0.213 ms(4K)   |
|              |                 |           |0.214 ms(8K) 0.225 ms(16K)               |
+--------------+-----------------+-----------+-----------------------------------------+

After testing with the three benchmarks above, covering Vulkan and OpenCL, we
can see a visible improvement over the original, and even better results than
the original with the workaround.

v2: move all BOs, including those on the idle, relocated, and moved lists, to
the end of the LRU and put them together.

Signed-off-by: Christian König 
Signed-off-by: Huang Rui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 75 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  4 ++
 2 files changed, 63 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 9c84770..81fb1ac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -268,6 +268,55 @@ void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
 }
 
 /**
+ * amdgpu_vm_move_to_lru_tail_by_list - move one list of BOs to end of LRU
+ *
+ * @adev: amdgpu device pointer
+ * @vm: vm providing the BOs
+ * @list: the list that stored BOs
+ *
+ * Move one list of BOs to the end of LRU and update the positions.
+ */
+static void
+amdgpu_vm_move_to_lru_tail_by_list(struct amdgpu_device *adev,
+  struct amdgpu_vm *vm, struct list_head *list)
+{
+   struct amdgpu_vm_bo_base *bo_base, *tmp;
+
+   list_for_each_entry_safe(bo_base, tmp, list, vm_status) {
+   struct amdgpu_bo *bo = bo_base->bo;
+
+   if (!bo->parent)
+   continue;
+
+   ttm_bo_move_to_lru_tail(&bo->tbo, &vm->lru_bulk_move);
+   if (bo->shadow)
+   ttm_bo_move_to_lru_tail(&bo->shadow->tbo,
+   &vm->lru_bulk_move);
+   }
+}
+
+/**
+ * amdgpu_vm_move_to_lru_tail - move all BOs to the end of LRU
+ *
+ * @adev: amdgpu device pointer
+ * @vm: vm providing the BOs
+ *
+ * Move all BOs to the end of LRU and remember their positions to put them
+ * together.
+ */
+static void
+amdgpu_vm_move_to_lru_tail(struct amdgpu_device *adev, struct amdgpu_vm *vm)
+{
+   struct ttm_bo_global *glob = adev->mman.bdev.glob;
+
+   spin_lock(&glob->lru_lock);
+   amdgpu_vm_move_to_lru_tail_by_list(adev, vm, &vm->idle);
+   amdgpu_vm_move_to_lru_tail_by_list(adev, vm, &vm->relocated);
+   amdgpu_vm_move_to_lru_tail_by_list(adev, vm, &vm->moved);
+   spin_unlock(&glob->lru_lock);
+}
+
+/**
  * amdgpu_vm_validate_pt_bos - validate the page table BOs
  *
  * @adev: amdgpu device pointer
@@ -286,6 +335,7 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 {
struct ttm_bo_global *glob = adev->mman.bdev.glob;
struct amdgpu_vm_bo_base *bo_base, *tmp;
+   bool validated = false;
int r = 0;

[PATCH v2 3/5] drm/ttm: add bulk move function on LRU

2018-08-10 Thread Huang Rui
This function allows us to bulk move a group of BOs to the tail of their LRU.
The positions of the group of BOs are stored in the (first, last)
bulk_move_pos structure.

Signed-off-by: Christian König 
Signed-off-by: Huang Rui 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 52 
 include/drm/ttm/ttm_bo_api.h | 10 +
 2 files changed, 62 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 7117b6b..39d9d55 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -247,6 +247,58 @@ void ttm_bo_move_to_lru_tail(struct ttm_buffer_object *bo,
 }
 EXPORT_SYMBOL(ttm_bo_move_to_lru_tail);
 
+static void ttm_bo_bulk_move_helper(struct ttm_lru_bulk_move_pos *pos,
+   struct list_head *lru, bool is_swap)
+{
+   struct list_head entries, before;
+   struct list_head *list1, *list2;
+
+   list1 = is_swap ? &pos->last->swap : &pos->last->lru;
+   list2 = is_swap ? pos->first->swap.prev : pos->first->lru.prev;
+
+   list_cut_position(&entries, lru, list1);
+   list_cut_position(&before, &entries, list2);
+   list_splice(&before, lru);
+   list_splice_tail(&entries, lru);
+}
+
+void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move *bulk)
+{
+   unsigned i;
+
+   for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
+   struct ttm_mem_type_manager *man;
+
+   if (!bulk->tt[i].first)
+   continue;
+
+   man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
+   ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
+   }
+
+   for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
+   struct ttm_mem_type_manager *man;
+
+   if (!bulk->vram[i].first)
+   continue;
+
+   man = &bulk->vram[i].first->bdev->man[TTM_PL_VRAM];
+   ttm_bo_bulk_move_helper(&bulk->vram[i], &man->lru[i], false);
+   }
+
+   for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
+   struct ttm_lru_bulk_move_pos *pos = &bulk->swap[i];
+   struct list_head *lru;
+
+   if (!pos->first)
+   continue;
+
+   lru = &pos->first->bdev->glob->swap_lru[i];
+   ttm_bo_bulk_move_helper(&bulk->swap[i], lru, true);
+   }
+}
+EXPORT_SYMBOL(ttm_bo_bulk_move_lru_tail);
+
 static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
  struct ttm_mem_reg *mem, bool evict,
  struct ttm_operation_ctx *ctx)
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index 0d4eb81..8c19470 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -417,6 +417,16 @@ void ttm_bo_move_to_lru_tail(struct ttm_buffer_object *bo,
 struct ttm_lru_bulk_move *bulk);
 
 /**
+ * ttm_bo_bulk_move_lru_tail
+ *
+ * @bulk: bulk move structure
+ *
+ * Bulk move BOs to the LRU tail, only valid to use when driver makes sure that
+ * BO order never changes. Should be called with ttm_bo_global::lru_lock held.
+ */
+void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move *bulk);
+
+/**
  * ttm_bo_lock_delayed_workqueue
  *
  * Prevent the delayed workqueue from running.
-- 
2.7.4
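
The double list_cut_position() in ttm_bo_bulk_move_helper() above is the
heart of the trick. Annotated as a reading of that code, it works like this:

    /* LRU before:  head -> A -> first .. last -> B -> ...
     * list1 points at "last", list2 at the element just before "first". */
    list_cut_position(&entries, lru, list1);     /* entries = A..last, lru = B..      */
    list_cut_position(&before, &entries, list2); /* before = A, entries = first..last */
    list_splice(&before, lru);                   /* lru = A -> B -> ...               */
    list_splice_tail(&entries, lru);             /* lru = A -> B -> .. -> first..last */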

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2 2/5] drm/ttm: revise ttm_bo_move_to_lru_tail to support bulk moves

2018-08-10 Thread Huang Rui
From: Christian König 

When moving a BO to the end of the LRU, we need to remember its position.
Make sure all moved BOs are in between "first" and "last", so they can be
bulk moved together.

Signed-off-by: Christian König 
Signed-off-by: Huang Rui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  8 
 drivers/gpu/drm/ttm/ttm_bo.c   | 26 +-
 include/drm/ttm/ttm_bo_api.h   |  6 +-
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 015613b..9c84770 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -297,9 +297,9 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
break;
 
spin_lock(&glob->lru_lock);
-   ttm_bo_move_to_lru_tail(&bo->tbo);
+   ttm_bo_move_to_lru_tail(&bo->tbo, NULL);
if (bo->shadow)
-   ttm_bo_move_to_lru_tail(&bo->shadow->tbo);
+   ttm_bo_move_to_lru_tail(&bo->shadow->tbo, NULL);
spin_unlock(&glob->lru_lock);
}
 
@@ -319,9 +319,9 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
if (!bo->parent)
continue;
 
-   ttm_bo_move_to_lru_tail(&bo->tbo);
+   ttm_bo_move_to_lru_tail(&bo->tbo, NULL);
if (bo->shadow)
-   ttm_bo_move_to_lru_tail(&bo->shadow->tbo);
+   ttm_bo_move_to_lru_tail(&bo->shadow->tbo, NULL);
}
spin_unlock(&glob->lru_lock);
 
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 7c48472..7117b6b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -214,12 +214,36 @@ void ttm_bo_del_sub_from_lru(struct ttm_buffer_object *bo)
 }
 EXPORT_SYMBOL(ttm_bo_del_sub_from_lru);
 
-void ttm_bo_move_to_lru_tail(struct ttm_buffer_object *bo)
+static void ttm_bo_bulk_move_set_pos(struct ttm_lru_bulk_move_pos *pos,
+struct ttm_buffer_object *bo)
+{
+   if (!pos->first)
+   pos->first = bo;
+   pos->last = bo;
+}
+
+void ttm_bo_move_to_lru_tail(struct ttm_buffer_object *bo,
+struct ttm_lru_bulk_move *bulk)
 {
reservation_object_assert_held(bo->resv);
 
ttm_bo_del_from_lru(bo);
ttm_bo_add_to_lru(bo);
+
+   if (bulk && !(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) {
+   switch (bo->mem.mem_type) {
+   case TTM_PL_TT:
+   ttm_bo_bulk_move_set_pos(&bulk->tt[bo->priority], bo);
+   break;
+
+   case TTM_PL_VRAM:
+   ttm_bo_bulk_move_set_pos(&bulk->vram[bo->priority], bo);
+   break;
+   }
+   if (bo->ttm && !(bo->ttm->page_flags &
+(TTM_PAGE_FLAG_SG | TTM_PAGE_FLAG_SWAPPED)))
+   ttm_bo_bulk_move_set_pos(&bulk->swap[bo->priority], bo);
+   }
 }
 EXPORT_SYMBOL(ttm_bo_move_to_lru_tail);
 
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index a01ba20..0d4eb81 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -51,6 +51,8 @@ struct ttm_placement;
 
 struct ttm_place;
 
+struct ttm_lru_bulk_move;
+
 /**
  * struct ttm_bus_placement
  *
@@ -405,12 +407,14 @@ void ttm_bo_del_from_lru(struct ttm_buffer_object *bo);
  * ttm_bo_move_to_lru_tail
  *
  * @bo: The buffer object.
+ * @bulk: optional bulk move structure to remember BO positions
  *
  * Move this BO to the tail of all lru lists used to lookup and reserve an
  * object. This function must be called with struct ttm_bo_global::lru_lock
  * held, and is used to make a BO less likely to be considered for eviction.
  */
-void ttm_bo_move_to_lru_tail(struct ttm_buffer_object *bo);
+void ttm_bo_move_to_lru_tail(struct ttm_buffer_object *bo,
+struct ttm_lru_bulk_move *bulk);
 
 /**
  * ttm_bo_lock_delayed_workqueue
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2 1/5] drm/ttm: add helper structures for bulk moves on lru list

2018-08-10 Thread Huang Rui
From: Christian König 

Add a bulk move position to store pointers to the first and last buffer
object. The BOs in between will be bulk moved on the LRU list.

Signed-off-by: Christian König 
Signed-off-by: Huang Rui 
---
 include/drm/ttm/ttm_bo_driver.h | 28 
 1 file changed, 28 insertions(+)

diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 3234cc3..e4fee8e 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -491,6 +491,34 @@ struct ttm_bo_device {
 };
 
 /**
+ * struct ttm_lru_bulk_move_pos
+ *
+ * @first: first BO in the bulk move range
+ * @last: last BO in the bulk move range
+ *
+ * Positions for a lru bulk move.
+ */
+struct ttm_lru_bulk_move_pos {
+   struct ttm_buffer_object *first;
+   struct ttm_buffer_object *last;
+};
+
+/**
+ * struct ttm_lru_bulk_move
+ *
+ * @tt: first/last lru entry for BOs in the TT domain
+ * @vram: first/last lru entry for BOs in the VRAM domain
+ * @swap: first/last lru entry for BOs on the swap list
+ *
+ * Helper structure for bulk moves on the LRU list.
+ */
+struct ttm_lru_bulk_move {
+   struct ttm_lru_bulk_move_pos tt[TTM_MAX_BO_PRIORITY];
+   struct ttm_lru_bulk_move_pos vram[TTM_MAX_BO_PRIORITY];
+   struct ttm_lru_bulk_move_pos swap[TTM_MAX_BO_PRIORITY];
+};
+
+/**
  * ttm_flag_masked
  *
  * @old: Pointer to the result and original value.
-- 
2.7.4
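
To make the intended lifecycle concrete, here is a sketch of how a driver is
expected to use these structures; names such as glob, vm and bo_base are
taken from the amdgpu patches in this series:

    struct ttm_lru_bulk_move bulk;
    struct amdgpu_vm_bo_base *bo_base;

    memset(&bulk, 0, sizeof(bulk));         /* no positions recorded yet */

    spin_lock(&glob->lru_lock);
    list_for_each_entry(bo_base, &vm->idle, vm_status)
            /* records first/last in bulk.tt/vram/swap[priority] */
            ttm_bo_move_to_lru_tail(&bo_base->bo->tbo, &bulk);
    spin_unlock(&glob->lru_lock);

    /* later, while the recorded BOs are still contiguous on the LRU:
     * one splice instead of N individual moves */
    spin_lock(&glob->lru_lock);
    ttm_bo_bulk_move_lru_tail(&bulk);
    spin_unlock(&glob->lru_lock);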

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2 0/5] drm/ttm,amdgpu: Introduce LRU bulk move functionality

2018-08-10 Thread Huang Rui
The idea and proposal are originally from Christian, and I continue the work
to deliver it.

Background:
The amdgpu driver moves all PD/PT and per-VM BOs onto an idle list, and then
moves all of them to the end of the LRU list one by one. That causes many BOs
to be moved to the end of the LRU, which seriously impacts performance.

Christian then provided a workaround to not move PD/PT BOs on the LRU with
the patch below:
"drm/amdgpu: band aid validating VM PTs"
Commit 0bbf32026cf5ba41e9922b30e26e1bed1ecd38ae

However, the final solution should bulk move all PD/PT and per-VM BOs on the
LRU instead of moving them one by one.

Whenever amdgpu_vm_validate_pt_bos() is called and we have BOs which need to
be validated, we move all BOs together to the end of the LRU without dropping
the LRU lock.

While doing so we note the beginning and end of this block in the LRU list.

Now when amdgpu_vm_validate_pt_bos() is called and we don't have anything to
do, we don't move every BO one by one, but instead cut the LRU list into
pieces so that we bulk move everything to the end in just one operation.

Test data:
+--------------+-----------------+-----------+-----------------------------------------+
|              |The Talos        |Clpeak(OCL)|BusSpeedReadback(OCL)                    |
|              |Principle(Vulkan)|           |                                         |
+--------------+-----------------+-----------+-----------------------------------------+
| Original     |  147.7 FPS      |  76.86 us |0.319 ms(1K) 0.314 ms(2K) 0.308 ms(4K)   |
|              |                 |           |0.307 ms(8K) 0.310 ms(16K)               |
+--------------+-----------------+-----------+-----------------------------------------+
| Original + WA|                 |           |0.254 ms(1K) 0.241 ms(2K)                |
|(don't move   |  162.1 FPS      |  42.15 us |0.230 ms(4K) 0.223 ms(8K) 0.204 ms(16K)  |
|PT BOs on LRU)|                 |           |                                         |
+--------------+-----------------+-----------+-----------------------------------------+
| Bulk move    |  163.1 FPS      |  40.52 us |0.244 ms(1K) 0.252 ms(2K) 0.213 ms(4K)   |
|              |                 |           |0.214 ms(8K) 0.225 ms(16K)               |
+--------------+-----------------+-----------+-----------------------------------------+

After testing with the three benchmarks above, covering Vulkan and OpenCL, we
can see a visible improvement over the original, and even better results than
the original with the workaround.

Changes from V1 -> V2:
- Fix missed BOs on the relocated/moved lists that should also be moved to
  the end of the LRU.

Thanks,
Rui

Christian König (2):
  drm/ttm: add helper structures for bulk moves on lru list
  drm/ttm: revise ttm_bo_move_to_lru_tail to support bulk moves

Huang Rui (3):
  drm/ttm: add bulk move function on LRU
  drm/amdgpu: use bulk moves for efficient VM LRU handling (v2)
  drm/amdgpu: move PD/PT bos on LRU again

 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 77 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  4 ++
 drivers/gpu/drm/ttm/ttm_bo.c   | 78 +-
 include/drm/ttm/ttm_bo_api.h   | 16 ++-
 include/drm/ttm/ttm_bo_driver.h| 28 
 5 files changed, 184 insertions(+), 19 deletions(-)

-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: fix integer overflow test in amdgpu_bo_list_create()

2018-08-10 Thread Huang Rui
On Fri, Aug 10, 2018 at 06:50:32PM +0800, Dan Carpenter wrote:
> We accidentally left out the size of the amdgpu_bo_list struct.  It
> could lead to memory corruption on 32 bit systems.  You'd have to
> pick the absolute maximum and set "num_entries == 59652323" then size
> would wrap to 16 bytes.
> 
> Fixes: 920990cb080a ("drm/amdgpu: allocate the bo_list array after the list")
> Signed-off-by: Dan Carpenter 

Nice catch!
Acked-by: Huang Rui 

> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
> index d472a2c8399f..b80243d3972e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
> @@ -67,7 +67,8 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, struct drm_file *filp,
>   unsigned i;
>   int r;
>  
> - if (num_entries > SIZE_MAX / sizeof(struct amdgpu_bo_list_entry))
> + if (num_entries > (SIZE_MAX - sizeof(struct amdgpu_bo_list))
> + / sizeof(struct amdgpu_bo_list_entry))
>   return -EINVAL;
>  
>   size = sizeof(struct amdgpu_bo_list);
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: fix integer overflow test in amdgpu_bo_list_create()

2018-08-10 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 
On Fri, Aug 10, 2018 at 12:50 PM Dan Carpenter  wrote:
>
> We accidentally left out the size of the amdgpu_bo_list struct.  It
> could lead to memory corruption on 32 bit systems.  You'd have to
> pick the absolute maximum and set "num_entries == 59652323" then size
> would wrap to 16 bytes.
>
> Fixes: 920990cb080a ("drm/amdgpu: allocate the bo_list array after the list")
> Signed-off-by: Dan Carpenter 
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
> index d472a2c8399f..b80243d3972e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
> @@ -67,7 +67,8 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, struct drm_file *filp,
> unsigned i;
> int r;
>
> -   if (num_entries > SIZE_MAX / sizeof(struct amdgpu_bo_list_entry))
> +   if (num_entries > (SIZE_MAX - sizeof(struct amdgpu_bo_list))
> +   / sizeof(struct amdgpu_bo_list_entry))
> return -EINVAL;
>
> size = sizeof(struct amdgpu_bo_list);
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: fix integer overflow test in amdgpu_bo_list_create()

2018-08-10 Thread Dan Carpenter
We accidentally left out the size of the amdgpu_bo_list struct.  It
could lead to memory corruption on 32 bit systems.  You'd have to
pick the absolute maximum and set "num_entries == 59652323" then size
would wrap to 16 bytes.

Fixes: 920990cb080a ("drm/amdgpu: allocate the bo_list array after the list")
Signed-off-by: Dan Carpenter 

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index d472a2c8399f..b80243d3972e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -67,7 +67,8 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, struct drm_file *filp,
unsigned i;
int r;
 
-   if (num_entries > SIZE_MAX / sizeof(struct amdgpu_bo_list_entry))
+   if (num_entries > (SIZE_MAX - sizeof(struct amdgpu_bo_list))
+   / sizeof(struct amdgpu_bo_list_entry))
return -EINVAL;
 
size = sizeof(struct amdgpu_bo_list);
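
To see the wrap-around concretely, the commit-message numbers can be replayed
in userspace C; the two struct sizes below are assumptions picked to
reproduce the quoted "wraps to 16 bytes" on a 32-bit size_t:

    #include <stdio.h>
    #include <stdint.h>

    #define LIST_SIZE  56u  /* assumed sizeof(struct amdgpu_bo_list)       */
    #define ENTRY_SIZE 72u  /* assumed sizeof(struct amdgpu_bo_list_entry) */

    int main(void)
    {
            uint32_t size_max = UINT32_MAX;  /* SIZE_MAX on 32 bit */
            uint32_t num_entries = 59652323;

            /* old check passes: num_entries == SIZE_MAX / ENTRY_SIZE */
            printf("old check rejects: %d\n",
                   num_entries > size_max / ENTRY_SIZE);

            /* but the allocation size silently wraps around */
            uint32_t size = LIST_SIZE + num_entries * ENTRY_SIZE;
            printf("allocation size wraps to: %u bytes\n", size);

            /* new check accounts for the header, so this is rejected */
            printf("new check rejects: %d\n",
                   num_entries > (size_max - LIST_SIZE) / ENTRY_SIZE);
            return 0;
    }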
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: RFC: Add write flag to reservation object fences

2018-08-10 Thread Daniel Vetter
On Fri, Aug 10, 2018 at 11:14 AM, Christian König
 wrote:
> On 10.08.2018 at 10:29, Daniel Vetter wrote:
>>
>> [SNIP]
>> I'm only interested in the case of shared buffers. And for those you
>> _do_ pessimistically assume that all access must be implicitly synced.
>> Iirc amdgpu doesn't support EGL_ANDROID_native_fence_sync, so this
>> makes sense that you don't bother with it.
>
>
> See flag AMDGPU_GEM_CREATE_EXPLICIT_SYNC.

That's for radv. Won't be enough for EGL_ANDROID_native_fence_sync,
because you cannot know at buffer allocation time how the fencing will
be done in all cases.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-gfx] RFC: Add write flag to reservation object fences

2018-08-10 Thread Daniel Vetter
On Fri, Aug 10, 2018 at 10:51 AM, Christian König
 wrote:
> On 10.08.2018 at 10:32, Daniel Vetter wrote:
>>
>> On Fri, Aug 10, 2018 at 10:24 AM, Christian König
>>  wrote:
>>>
>>> On 10.08.2018 at 09:51, Chris Wilson wrote:

 Quoting Christian König (2018-08-09 15:54:31)
>
> On 09.08.2018 at 16:22, Daniel Vetter wrote:
>>
>> On Thu, Aug 9, 2018 at 3:58 PM, Christian König
>>  wrote:
>>>
>>> On 09.08.2018 at 15:38, Daniel Vetter wrote:

 On Thu, Aug 09, 2018 at 01:37:07PM +0200, Christian König wrote:
 [SNIP]
>>>
>>> See to me the explicit fence in the reservation object is not even
>>> remotely
>>> related to implicit or explicit synchronization.
>>
>> Hm, I guess that's the confusion then. The only reason we have the
>> exclusive fence is to implement cross-driver implicit syncing. What
>> else you do internally in your driver doesn't matter, as long as you
>> keep up that contract.
>>
>> And it's intentionally not called write_fence or anything like that,
>> because that's not what it tracks.
>>
>> Of course any buffer moves the kernel does also must be tracked in the
>> exclusive fence, because userspace cannot know about these. So you
>> might have an exclusive fence set and also an explicit fence passed in
>> through the atomic ioctl. Aside: Right now all drivers only observe
>> one or the other, not both, so will break as soon as we start moving
>> shared buffers around. At least on Android or anything else using
>> explicit fencing.
>
> Actually both radeon and nouveau use the approach that shared fences
> need to wait on as well when they don't come from the current driver.

 nouveau has write hazard tracking in its api, and is using the
 excl fence for it as well.

 As far as I can tell, this is all about fixing the lack of
 acknowledgement of the requirement for implicit fence tracking in
 amdgpu's (and its radeon predecessor) ABI.
>>>
>>>
>>> Well I agree that implicit fencing was a bad idea to begin with, but the
>>> solution you guys came up with is not going to work in all cases either.
>>>
 Which is fine so long as you
 get userspace to only use explicit fence passing to the compositor.
>>>
>>>
>>> Well exactly that's the problem.
>>>
>>> I can't pass exactly one explicit fence to the compositor because I have
>>> multiple command submissions which run asynchronously and need to finish
>>> before the compositor can start to use the surface.
>>>
>>> So the whole concept of using a single exclusive fence doesn't work in
>>> the case of amdgpu. And to be honest I think all drivers with multiple
>>> engines could benefit from that as well.
>>
>> Fences can be merged. This works, really :-) In amdgpu's implementation
>> of EGL_ANDROID_native_fence_sync you will need to do that before
>> passing the result to the caller.
>
> Yeah, but that is for the userspace API. Not for any internal handling in
> the driver.

dma_fence_array? Or how do you think this fence merging for userspace
is implemented?
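
For reference, a sketch of in-kernel fence merging with a dma_fence_array
(collection of the fences and error handling elided):

    /* "fences" is an array of n dma_fence pointers; on success the
     * array takes ownership of the references. */
    struct dma_fence_array *array;

    array = dma_fence_array_create(n, fences,
                                   dma_fence_context_alloc(1), 1,
                                   false /* signal when all signal */);
    if (!array)
            return -ENOMEM;

    /* &array->base is a single dma_fence usable as the exclusive fence */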

The only trouble is that amdgpu doesn't have an explicit hand-over
point in the uapi where an explicitly synced buffer (all of them
really) gets its fences for implicit syncing attached.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: RFC: Add write flag to reservation object fences

2018-08-10 Thread Daniel Vetter
On Fri, Aug 10, 2018 at 11:14 AM, Christian König
 wrote:
> On 10.08.2018 at 10:29, Daniel Vetter wrote:
>>
>> [SNIP]
>> I'm only interested in the case of shared buffers. And for those you
>> _do_ pessimistically assume that all access must be implicitly synced.
>> Iirc amdgpu doesn't support EGL_ANDROID_native_fence_sync, so this
>> makes sense that you don't bother with it.
>
>
> See flag AMDGPU_GEM_CREATE_EXPLICIT_SYNC.
>
>
>>
 - as a consequence, amdgpu needs to pessimistically assume that all
 writes to shared buffer need to obey implicit fencing rules.
 - for shared buffers (across process or drivers) implicit fencing does
 _not_ allow concurrent writers. That limitation is why people want to
 do explicit fencing, and it's the reason why there's only 1 slot for
 an exclusive. Note I really mean concurrent here, a queue of in-flight
 writes by different batches is perfectly fine. But it's a fully
 ordered queue of writes.
 - but as a consequence of amdgpu's lack of implicit fencing and hence
 need to pessimistically assume there's multiple write fences amdgpu
 needs to put multiple fences behind the single exclusive slot. This is
 a limitation imposed by the amdgpu stack, not something inherent to
 how implicit fencing works.
 - Chris Wilson's patch implements all this (and afaics with a bit more
 coffee, correctly).

 If you want to be less pessimistic in amdgpu for shared buffers, you
 need to start tracking which shared buffer access need implicit and
 which explicit sync. What you can't do is suddenly create more than 1
 exclusive fence, that's not how implicit fencing works. Another thing
 you cannot do is force everyone else (in non-amdgpu or core code) to
 sync against _all_ writes, because that forces implicit syncing. Which
 people very much don't want.
>>>
>>>
>>> I also do see the problem that most other hardware doesn't need that
>>> functionality, because it is driven by a single engine. That's why I
>>> tried
>>> to keep the overhead as low as possible.
>>>
>>> But at least for amdgpu (and I strongly suspect for nouveau as well) it
>>> is
>>> absolutely vital in a number of cases to allow concurrent accesses from
>>> the
>>> same client even when the BO is then later used with implicit
>>> synchronization.
>>>
>>> This is also the reason why the current workaround is so problematic for
>>> us.
>>> Cause as soon as the BO is shared with another (non-amdgpu) device all
>>> command submissions to it will be serialized even when they come from the
>>> same client.
>>>
>>> Would it be an option to extend the concept of the "owner" of the BO amdgpu
>>> uses to other drivers as well?
>>>
>>> When you already have explicit synchronization inside your client, but
>>> not
>>> between clients (e.g. still uses DRI2 or DRI3), this could also be rather
>>> beneficial for others as well.
>>
>> Again: How you synchronize your driver internal rendering is totally
>> up to you. If you see an exclusive fence by amdgpu, and submit new
>> rendering by amdgpu, you can totally ignore the exclusive fence. The
>> only api contracts for implicit fencing are between drivers for shared
>> buffers. If you submit rendering to a shared buffer in parallel, all
>> without attaching an exclusive fence that's perfectly fine, but
>> somewhen later on, depending upon protocol (glFlush or glxSwapBuffers
>> or whatever) you have to collect all those concurrent write hazards
>> and bake them into 1 single exclusive fence for implicit fencing.
>>
>> Atm (and Chris seems to concur) the amdgpu uapi doesn't allow you to
>> do that, so for anything shared you have to be super pessimistic.
>> Adding a HAND_OFF_FOR_IMPLICIT_FENCING flag/ioctl would probably fix
>> that. Only when that flag is set would you take all shared write
>> hazards and bake them into one exclusive fence for hand-off to the
>> next driver. You'd also need the same when receiving an implicitly
>> fenced buffer, to make sure that your concurrent writes do synchronize
>> with reading (aka shared fences) done by other drivers. With a bunch
>> of trickery and hacks it might be possible to infer this from current
>> ioctls even, but you need to be really careful.
>
>
> A new uapi is out of question because we need to be backward compatible.

Since when is new uapi out of the question for a performance improvement?

>> And you're right that amdgpu seems to be the only (or one of the only)
>> drivers which do truly concurrent rendering to the same buffer (not
>> just concurrent rendering to multiple buffers all suballocated from
>> the same bo). But we can't fix this in the kernel with the tricks you
>> propose, because without such an uapi extension (or inference) we
>> can't tell the implicit fencing from the explicit fencing case.
>
>
> Sure we can. As I said for amdgpu that is absolutely no problem at all.
>
> In your terminology all rendering from the same client to a BO is
> explicitly fenced, while all rendering from different clients is implicitly
> fenced.

Re: RFC: Add write flag to reservation object fences

2018-08-10 Thread Christian König

On 10.08.2018 at 10:29, Daniel Vetter wrote:

[SNIP]
I'm only interested in the case of shared buffers. And for those you
_do_ pessimistically assume that all access must be implicitly synced.
Iirc amdgpu doesn't support EGL_ANDROID_native_fence_sync, so this
makes sense that you don't bother with it.


See flag AMDGPU_GEM_CREATE_EXPLICIT_SYNC.




- as a consequence, amdgpu needs to pessimistically assume that all
writes to shared buffer need to obey implicit fencing rules.
- for shared buffers (across process or drivers) implicit fencing does
_not_ allow concurrent writers. That limitation is why people want to
do explicit fencing, and it's the reason why there's only 1 slot for
an exclusive. Note I really mean concurrent here, a queue of in-flight
writes by different batches is perfectly fine. But it's a fully
ordered queue of writes.
- but as a consequence of amdgpu's lack of implicit fencing and hence
need to pessimistically assume there's multiple write fences amdgpu
needs to put multiple fences behind the single exclusive slot. This is
a limitation imposed by the amdgpu stack, not something inherent to
how implicit fencing works.
- Chris Wilson's patch implements all this (and afaics with a bit more
coffee, correctly).

If you want to be less pessimistic in amdgpu for shared buffers, you
need to start tracking which shared buffer access need implicit and
which explicit sync. What you can't do is suddenly create more than 1
exclusive fence, that's not how implicit fencing works. Another thing
you cannot do is force everyone else (in non-amdgpu or core code) to
sync against _all_ writes, because that forces implicit syncing. Which
people very much don't want.


I also do see the problem that most other hardware doesn't need that
functionality, because it is driven by a single engine. That's why I tried
to keep the overhead as low as possible.

But at least for amdgpu (and I strongly suspect for nouveau as well) it is
absolutely vital in a number of cases to allow concurrent accesses from the
same client even when the BO is then later used with implicit
synchronization.

This is also the reason why the current workaround is so problematic for us.
Cause as soon as the BO is shared with another (non-amdgpu) device all
command submissions to it will be serialized even when they come from the
same client.

Would it be an option to extend the concept of the "owner" of the BO amdgpu
uses to other drivers as well?

When you already have explicit synchronization inside your client, but not
between clients (e.g. still uses DRI2 or DRI3), this could also be rather
beneficial for others as well.

Again: How you synchronize your driver internal rendering is totally
up to you. If you see an exclusive fence by amdgpu, and submit new
rendering by amdgpu, you can totally ignore the exclusive fence. The
only api contracts for implicit fencing are between drivers for shared
buffers. If you submit rendering to a shared buffer in parallel, all
without attaching an exclusive fence that's perfectly fine, but
somewhen later on, depending upon protocol (glFlush or glxSwapBuffers
or whatever) you have to collect all those concurrent write hazards
and bake them into 1 single exclusive fence for implicit fencing.

Atm (and Chris seems to concur) the amdgpu uapi doesn't allow you to
do that, so for anything shared you have to be super pessimistic.
Adding a HAND_OFF_FOR_IMPLICIT_FENCING flag/ioctl would probably fix
that. Only when that flag is set would you take all shared write
hazards and bake them into one exclusive fence for hand-off to the
next driver. You'd also need the same when receiving an implicitly
fenced buffer, to make sure that your concurrent writes do synchronize
with reading (aka shared fences) done by other drivers. With a bunch
of trickery and hacks it might be possible to infer this from current
ioctls even, but you need to be really careful.


A new uapi is out of question because we need to be backward compatible.


And you're right that amdgpu seems to be the only (or one of the only)
drivers which do truly concurrent rendering to the same buffer (not
just concurrent rendering to multiple buffers all suballocated from
the same bo). But we can't fix this in the kernel with the tricks you
propose, because without such an uapi extension (or inference) we
can't tell the implicit fencing from the explicit fencing case.


Sure we can. As I said for amdgpu that is absolutely no problem at all.

In your terminology all rendering from the same client to a BO is
explicitly fenced, while all rendering from different clients is
implicitly fenced.



And for shared buffers with explicit fencing we _must_ _not_ sync against
all writes. owner won't help here, because it's still not tracking
whether something is explicit or implicit synced.


Implicit syncing can be disabled by giving the
AMDGPU_GEM_CREATE_EXPLICIT_SYNC flag while creating the BO.
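
In libdrm terms that looks roughly like the sketch below when allocating the
BO; size and alignment are placeholders:

    struct amdgpu_bo_alloc_request req = {};
    amdgpu_bo_handle bo;
    int r;

    req.alloc_size = 4096;                  /* placeholder */
    req.phys_alignment = 4096;              /* placeholder */
    req.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM;
    req.flags = AMDGPU_GEM_CREATE_EXPLICIT_SYNC; /* opt out of implicit sync */

    r = amdgpu_bo_alloc(dev, &req, &bo);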



We've cheated a bit with most other drivers in this area, also 

Re: [PATCH 4/5] drm/amdgpu: use bulk moves for efficient VM LRU handling

2018-08-10 Thread Huang Rui
On Thu, Aug 09, 2018 at 08:27:10PM +0800, Koenig, Christian wrote:
> On 09.08.2018 at 14:25, Huang Rui wrote:
> > On Thu, Aug 09, 2018 at 03:18:55PM +0800, Koenig, Christian wrote:
> >> On 09.08.2018 at 08:18, Huang Rui wrote:
>  On 08.08.2018 at 11:59, Huang Rui wrote:
>  Am 08.08.2018 um 11:59 schrieb Huang Rui:
> > I continue to work for bulk moving that based on the proposal by 
> > Christian.
> >
> > Background:
> > amdgpu driver will move all PD/PT and PerVM BOs into idle list. Then 
> > move all of
> > them on the end of LRU list one by one. Thus, that cause so many BOs 
> > moved to
> > the end of the LRU, and impact performance seriously.
> >
> > Then Christian provided a workaround to not move PD/PT BOs on LRU with 
> > below
> > patch:
> > "drm/amdgpu: band aid validating VM PTs"
> > Commit 0bbf32026cf5ba41e9922b30e26e1bed1ecd38ae
> >
> > However, the final solution should bulk move all PD/PT and PerVM BOs on 
> > the LRU
> > instead of one by one.
> >
> > Whenever amdgpu_vm_validate_pt_bos() is called and we have BOs which 
> > need to be
> > validated we move all BOs together to the end of the LRU without 
> > dropping the
> > lock for the LRU.
> >
> > While doing so we note the beginning and end of this block in the LRU 
> > list.
> >
> > Now when amdgpu_vm_validate_pt_bos() is called and we don't have 
> > anything to do,
> > we don't move every BO one by one, but instead cut the LRU list into 
> > pieces so
> > that we bulk move everything to the end in just one operation.
> >
> > Test data:
> > +--+-+---+---+
> > |  |The Talos|Clpeak(OCL)|BusSpeedReadback(OCL) 
> >  |
> > |  |Principle(Vulkan)|   |  
> >  |
> > ++
> > |  | |   |0.319 ms(1k) 0.314 ms(2K) 
> > 0.308 ms(4K) |
> > | Original |  147.7 FPS  |  76.86 us |0.307 ms(8K) 0.310 
> > ms(16K) |
> > ++
> > | Orignial + WA| |   |0.254 ms(1K) 0.241 ms(2K) 
> >  |
> > |(don't move   |  162.1 FPS  |  42.15 us |0.230 ms(4K) 0.223 ms(8K) 
> > 0.204 ms(16K)|
> > |PT BOs on LRU)| |   |  
> >  |
> > ++
> > | Bulk move|  163.1 FPS  |  40.52 us |0.244 ms(1K) 0.252 ms(2K) 
> > 0.213 ms(4K) |
> > |  | |   |0.214 ms(8K) 0.225 
> > ms(16K) |
> > +--+-+---+---+
> >
> > After test them with above three benchmarks include vulkan and opencl. 
> > We can
> > see the visible improvement than original, and even better than 
> > original with
> > workaround.
> >
> > Signed-off-by: Christian König 
> > Signed-off-by: Huang Rui 
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 22 ++
> > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  4 
> > 2 files changed, 18 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > index 9c84770..eda0bb9 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > @@ -286,6 +286,7 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device 
> > *adev, struct amdgpu_vm *vm,
> > {
> > struct ttm_bo_global *glob = adev->mman.bdev.glob;
> > struct amdgpu_vm_bo_base *bo_base, *tmp;
> > +   bool validated = false;
> > int r = 0;
> > 
> > list_for_each_entry_safe(bo_base, tmp, &vm->evicted, vm_status) 
> > {
> > @@ -295,14 +296,9 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device 
> > *adev, struct amdgpu_vm *vm,
> > r = validate(param, bo);
> > if (r)
> > break;
> > -
> > -   spin_lock(&glob->lru_lock);
> > -   ttm_bo_move_to_lru_tail(&bo->tbo, NULL);
> > -   if (bo->shadow)
> > -   
> > ttm_bo_move_to_lru_tail(&bo->shadow->tbo, NULL);
> > -   spin_unlock(&glob->lru_lock);
> > }
> > 
> > +   validated = true;
> 
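
For context, later revisions of this series pair the driver change with a
generic list helper so the noted block of BOs can be spliced to the LRU tail
in one step (upstream this became list_bulk_move_tail()). A sketch of that
pointer surgery, with first/last denoting the block boundaries noted above;
treat the body as an illustration, not the verbatim merged helper:

/* Move the sublist [first, last] to the tail of "head" in O(1). */
static inline void list_bulk_move_tail(struct list_head *head,
				       struct list_head *first,
				       struct list_head *last)
{
	/* unlink the whole block from its current position */
	first->prev->next = last->next;
	last->next->prev = first->prev;

	/* splice the block in front of head, i.e. at the list tail */
	head->prev->next = first;
	first->prev = head->prev;
	last->next = head;
	head->prev = last;
}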

Re: [Intel-gfx] RFC: Add write flag to reservation object fences

2018-08-10 Thread Christian König

Am 10.08.2018 um 10:32 schrieb Daniel Vetter:

On Fri, Aug 10, 2018 at 10:24 AM, Christian König
 wrote:

Am 10.08.2018 um 09:51 schrieb Chris Wilson:

Quoting Christian König (2018-08-09 15:54:31)

Am 09.08.2018 um 16:22 schrieb Daniel Vetter:

On Thu, Aug 9, 2018 at 3:58 PM, Christian König
 wrote:

Am 09.08.2018 um 15:38 schrieb Daniel Vetter:

On Thu, Aug 09, 2018 at 01:37:07PM +0200, Christian König wrote:
[SNIP]

See to me the explicit fence in the reservation object is not even
remotely
related to implicit or explicit synchronization.

Hm, I guess that's the confusion then. The only reason we have the
exclusive fence is to implement cross-driver implicit syncing. What
else you do internally in your driver doesn't matter, as long as you
keep up that contract.

And it's intentionally not called write_fence or anything like that,
because that's not what it tracks.

Of course any buffer moves the kernel does also must be tracked in the
exclusive fence, because userspace cannot know about these. So you
might have an exclusive fence set and also an explicit fence passed in
through the atomic ioctl. Aside: Right now all drivers only observe
one or the other, not both, so will break as soon as we start moving
shared buffers around. At least on Android or anything else using
explicit fencing.

Actually both radeon and nouveau use the approach that shared fences
need to be waited on as well when they don't come from the current driver.

nouveau has write hazard tracking in its API, and is using the
excl fence for it as well.

As far as I can tell, this is all about fixing the lack of
acknowledgement of the requirement for implicit fence tracking in
amdgpu's (and its radeon predecessor) ABI.


Well I agree that implicit fencing was a bad idea to begin with, but the
solution you guys came up with is not going to work in all cases either.


Which is fine so long as you
get userspace to only use explicit fence passing to the compositor.


Well exactly that's the problem.

I can't pass exactly one explicit fence to the compositor because I have
multiple command submissions which run asynchronously and need to finish
before the compositor can start to use the surface.

So the whole concept of using a single exclusive fence doesn't work in the
case of amdgpu. And to be honest I think all drivers with multiple engines
could benefit from that as well.

Fences can be merged. This works, really :-) In amdgpu's implementation
of EGL_ANDROID_native_fence_sync you will need to do that before
passing the result to the caller.


Yeah, but that is for the userspace API. Not for any internal handling 
in the driver.


Christian.


-Daniel


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-gfx] RFC: Add write flag to reservation object fences

2018-08-10 Thread Daniel Vetter
On Fri, Aug 10, 2018 at 10:24 AM, Christian König
 wrote:
> Am 10.08.2018 um 09:51 schrieb Chris Wilson:
>>
>> Quoting Christian König (2018-08-09 15:54:31)
>>>
>>> Am 09.08.2018 um 16:22 schrieb Daniel Vetter:

 On Thu, Aug 9, 2018 at 3:58 PM, Christian König
  wrote:
>
> Am 09.08.2018 um 15:38 schrieb Daniel Vetter:
>>
>> On Thu, Aug 09, 2018 at 01:37:07PM +0200, Christian König wrote:
>> [SNIP]
>
> See to me the explicit fence in the reservation object is not even
> remotely
> related to implicit or explicit synchronization.

 Hm, I guess that's the confusion then. The only reason we have the
 exclusive fence is to implement cross-driver implicit syncing. What
 else you do internally in your driver doesn't matter, as long as you
 keep up that contract.

 And it's intentionally not called write_fence or anything like that,
 because that's not what it tracks.

 Of course any buffer moves the kernel does also must be tracked in the
 exclusive fence, because userspace cannot know about these. So you
 might have an exclusive fence set and also an explicit fence passed in
 through the atomic ioctl. Aside: Right now all drivers only observe
 one or the other, not both, so will break as soon as we start moving
 shared buffers around. At least on Android or anything else using
 explicit fencing.
>>>
>>> Actually both radeon and nouveau use the approach that shared fences
 need to be waited on as well when they don't come from the current driver.
>>
>> nouveau has write hazard tracking in its API, and is using the
>> excl fence for it as well.
>>
>> As far as I can tell, this is all about fixing the lack of
>> acknowledgement of the requirement for implicit fence tracking in
>> amdgpu's (and its radeon predecessor) ABI.
>
>
> Well I agree that implicit fencing was a bad idea to begin with, but the
> solution you guys came up with is not going to work in all cases either.
>
>> Which is fine so long as you
>> get userspace to only use explicit fence passing to the compositor.
>
>
> Well exactly that's the problem.
>
> I can't pass exactly one explicit fence to the compositor because I have
> multiple command submissions which run asynchronously and need to finish
> before the compositor can start to use the surface.
>
> So the whole concept of using a single exclusive fence doesn't work in the
> case of amdgpu. And to be honest I think all drivers with multiple engines
> could benefit from that as well.

Fences can be merged. This works, really :-) In amdgpu's implementation
of EGL_ANDROID_native_fence_sync you will need to do that before
passing the result to the caller.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
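
For the userspace side of that, merging two explicit fence fds into one is a
stock sync_file operation; a minimal sketch (error handling trimmed, the
function name is illustrative):

#include <linux/sync_file.h>
#include <string.h>
#include <sys/ioctl.h>

/* Merge two fence fds into a single fd that signals only once both
 * submissions have completed; an EGL_ANDROID_native_fence_sync
 * implementation can hand the merged fd to the caller. */
static int merge_fence_fds(int fd1, int fd2)
{
	struct sync_merge_data data = { .fd2 = fd2 };

	strncpy(data.name, "merged", sizeof(data.name) - 1);
	if (ioctl(fd1, SYNC_IOC_MERGE, &data) < 0)
		return -1;

	return data.fence; /* the new merged fence fd */
}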
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: RFC: Add write flag to reservation object fences

2018-08-10 Thread Daniel Vetter
On Thu, Aug 9, 2018 at 4:54 PM, Christian König
 wrote:
> Am 09.08.2018 um 16:22 schrieb Daniel Vetter:
>>
>> On Thu, Aug 9, 2018 at 3:58 PM, Christian König
>>  wrote:
>>>
>>> Am 09.08.2018 um 15:38 schrieb Daniel Vetter:

 On Thu, Aug 09, 2018 at 01:37:07PM +0200, Christian König wrote:
 [SNIP]
>>>
>>> See to me the explicit fence in the reservation object is not even
>>> remotely
>>> related to implicit or explicit synchronization.
>>
>> Hm, I guess that's the confusion then. The only reason we have the
>> exclusive fence is to implement cross-driver implicit syncing. What
>> else you do internally in your driver doesn't matter, as long as you
>> keep up that contract.
>>
>> And it's intentionally not called write_fence or anything like that,
>> because that's not what it tracks.
>>
>> Of course any buffer moves the kernel does also must be tracked in the
>> exclusive fence, because userspace cannot know about these. So you
>> might have an exclusive fence set and also an explicit fence passed in
>> through the atomic ioctl. Aside: Right now all drivers only observe
>> one or the other, not both, so will break as soon as we start moving
>> shared buffers around. At least on Android or anything else using
>> explicit fencing.
>
>
> Actually both radeon and nouveau use the approach that shared fences need to
> be waited on as well when they don't come from the current driver.
>
>>
> >> So here's my summary, as I understand things right now:
>> - for non-shared buffers at least, amdgpu uses explicit fencing, and
>> hence all fences caused by userspace end up as shared fences, whether
>> that's writes or reads. This means you end up with possibly multiple
>> write fences, but never any exclusive fences.
>> - for non-shared buffers the only exclusive fences amdgpu sets are for
>> buffer moves done by the kernel.
> >> - amdgpu (kernel + userspace combo here) does not seem to have a
>> concept/tracking for when a buffer is used with implicit or explicit
>> fencing. It does however track all writes.
>
>
> No, that is incorrect. It tracks all accesses to a buffer object in the form
> of shared fences; we don't care whether it is a write or not.
>
> What we also track is which client used a BO last, and as long as the same
> client uses the BO we don't add any implicit synchronization.
>
> Only when a BO is used by another client do we have implicit synchronization
> for all command submissions. This behavior can be disabled with a flag during
> BO creation.
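
A condensed sketch of that owner rule, loosely modeled on the
amdgpu_sync_resv() logic of this era; heavily simplified and not the verbatim
kernel code (AMDGPU_FENCE_OWNER_UNDEFINED is amdgpu's internal marker for
kernel-generated fences):

/* Decide whether a shared fence has to be waited on implicitly. */
static bool amdgpu_needs_implicit_sync(void *fence_owner, void *owner,
				       bool explicit_sync)
{
	/* kernel-generated fences (e.g. buffer moves) must always be obeyed */
	if (fence_owner == AMDGPU_FENCE_OWNER_UNDEFINED)
		return true;

	/* BO created with AMDGPU_GEM_CREATE_EXPLICIT_SYNC: userspace
	 * handles the ordering itself */
	if (explicit_sync)
		return false;

	/* same client: it already ordered its submissions explicitly */
	if (fence_owner == owner)
		return false;

	/* different client: fall back to implicit synchronization */
	return true;
}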

I'm only interested in the case of shared buffers. And for those you
_do_ pessimistically assume that all access must be implicitly synced.
Iirc amdgpu doesn't support EGL_ANDROID_native_fence_sync, so it
makes sense that you don't bother with it.

> >> - as a consequence, amdgpu needs to pessimistically assume that all
> >> writes to shared buffers need to obey implicit fencing rules.
> >> - for shared buffers (across processes or drivers) implicit fencing does
> >> _not_ allow concurrent writers. That limitation is why people want to
> >> do explicit fencing, and it's the reason why there's only 1 slot for
> >> an exclusive fence. Note I really mean concurrent here; a queue of in-flight
>> writes by different batches is perfectly fine. But it's a fully
>> ordered queue of writes.
> >> - but as a consequence of amdgpu's lack of implicit fencing, and the
> >> resulting need to pessimistically assume there are multiple write fences,
> >> amdgpu needs to put multiple fences behind the single exclusive slot. This is
> >> a limitation imposed by the amdgpu stack, not something inherent to
> >> how implicit fencing works.
>> - Chris Wilson's patch implements all this (and afaics with a bit more
>> coffee, correctly).
>>
>> If you want to be less pessimistic in amdgpu for shared buffers, you
> >> need to start tracking which shared buffer accesses need implicit and
>> which explicit sync. What you can't do is suddenly create more than 1
>> exclusive fence, that's not how implicit fencing works. Another thing
>> you cannot do is force everyone else (in non-amdgpu or core code) to
>> sync against _all_ writes, because that forces implicit syncing. Which
>> people very much don't want.
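
To make that contract concrete, a hedged sketch against the reservation
object API of this era (names and simplifications are mine, not from this
thread): an implicitly synced writer waits for everything already in the
object and then installs itself as the single exclusive fence.

#include <linux/reservation.h>
#include <linux/sched.h>

/* Sketch of the implicit-fencing write rule; simplified. */
static long implicit_sync_write(struct reservation_object *resv,
				struct dma_fence *job_fence)
{
	/* order against the exclusive fence and all shared fences */
	long r = reservation_object_wait_timeout_rcu(resv, true, false,
						     MAX_SCHEDULE_TIMEOUT);
	if (r < 0)
		return r;

	/* become the one exclusive fence later accesses must obey */
	reservation_object_add_excl_fence(resv, job_fence);
	return 0;
}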
>
>
> I also do see the problem that most other hardware doesn't need that
> functionality, because it is driven by a single engine. That's why I tried
> to keep the overhead as low as possible.
>
> But at least for amdgpu (and I strongly suspect for nouveau as well) it is
> absolutely vital in a number of cases to allow concurrent accesses from the
> same client even when the BO is then later used with implicit
> synchronization.
>
> This is also the reason why the current workaround is so problematic for us.
> Because as soon as the BO is shared with another (non-amdgpu) device all
> command submissions to it will be serialized even when they come from the
> same client.
>
> Would it be an option to extend the concept of the "owner" of the BO amdgpu
> uses to other drivers as well?
>
> When you already have explicit synchronization 

Re: [Intel-gfx] RFC: Add write flag to reservation object fences

2018-08-10 Thread Christian König

Am 10.08.2018 um 09:51 schrieb Chris Wilson:

Quoting Christian König (2018-08-09 15:54:31)

Am 09.08.2018 um 16:22 schrieb Daniel Vetter:

On Thu, Aug 9, 2018 at 3:58 PM, Christian König
 wrote:

Am 09.08.2018 um 15:38 schrieb Daniel Vetter:

On Thu, Aug 09, 2018 at 01:37:07PM +0200, Christian König wrote:
[SNIP]

See to me the explicit fence in the reservation object is not even remotely
related to implicit or explicit synchronization.

Hm, I guess that's the confusion then. The only reason we have the
exclusive fence is to implement cross-driver implicit syncing. What
else you do internally in your driver doesn't matter, as long as you
keep up that contract.

And it's intentionally not called write_fence or anything like that,
because that's not what it tracks.

Of course any buffer moves the kernel does also must be tracked in the
exclusive fence, because userspace cannot know about these. So you
might have an exclusive fence set and also an explicit fence passed in
through the atomic ioctl. Aside: Right now all drivers only observe
one or the other, not both, so will break as soon as we start moving
shared buffers around. At least on Android or anything else using
explicit fencing.

Actually both radeon and nouveau use the approach that shared fences
need to be waited on as well when they don't come from the current driver.

nouveau has write hazard tracking in its API, and is using the
excl fence for it as well.

As far as I can tell, this is all about fixing the lack of
acknowledgement of the requirement for implicit fence tracking in
amdgpu's (and its radeon predecessor) ABI.


Well I agree that implicit fencing was a bad idea to begin with, but the 
solution you guys came up with is not going to work in all cases either.



Which is fine so long as you
get userspace to only use explicit fence passing to the compositor.


Well exactly that's the problem.

I can't pass exactly one explicit fence to the compositor because I have 
multiple command submissions which run asynchronously and need to finish 
before the compositor can start to use the surface.


So the whole concept of using a single exclusive fence doesn't work in 
the case of amdgpu. And to be honest I think all drivers with multiple 
engines could benefit from that as well.


Christian.


-Chris


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-gfx] RFC: Add write flag to reservation object fences

2018-08-10 Thread Chris Wilson
Quoting Christian König (2018-08-09 15:54:31)
> Am 09.08.2018 um 16:22 schrieb Daniel Vetter:
> > On Thu, Aug 9, 2018 at 3:58 PM, Christian König
> >  wrote:
> >> Am 09.08.2018 um 15:38 schrieb Daniel Vetter:
> >>> On Thu, Aug 09, 2018 at 01:37:07PM +0200, Christian König wrote:
> >>> [SNIP]
> >> See to me the explicit fence in the reservation object is not even remotely
> >> related to implicit or explicit synchronization.
> > Hm, I guess that's the confusion then. The only reason we have the
> > exclusive fence is to implement cross-driver implicit syncing. What
> > else you do internally in your driver doesn't matter, as long as you
> > keep up that contract.
> >
> > And it's intentionally not called write_fence or anything like that,
> > because that's not what it tracks.
> >
> > Of course any buffer moves the kernel does also must be tracked in the
> > exclusive fence, because userspace cannot know about these. So you
> > might have an exclusive fence set and also an explicit fence passed in
> > through the atomic ioctl. Aside: Right now all drivers only observe
> > one or the other, not both, so will break as soon as we start moving
> > shared buffers around. At least on Android or anything else using
> > explicit fencing.
> 
> Actually both radeon and nouveau use the approach that shared fences 
> need to be waited on as well when they don't come from the current driver.

nouveau has write hazard tracking in its API, and is using the
excl fence for it as well.

As far as I can tell, this is all about fixing the lack of
acknowledgement of the requirement for implicit fence tracking in
amdgpu's (and its radeon predecessor) ABI. Which is fine so long as you
get userspace to only use explicit fence passing to the compositor.
-Chris
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/6] dma-buf: add reservation object shared fence accessor

2018-08-10 Thread Huang Rui
On Thu, Aug 09, 2018 at 01:37:09PM +0200, Christian König wrote:
> Add a helper to access the shared fences in a reservation object.
> 
> Signed-off-by: Christian König 

Reviewed-by: Huang Rui 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  7 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c |  3 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c  |  4 ++--
>  drivers/gpu/drm/msm/msm_gem.c|  4 ++--
>  drivers/gpu/drm/nouveau/nouveau_fence.c  |  3 +--
>  drivers/gpu/drm/radeon/radeon_sync.c |  3 +--
>  drivers/gpu/drm/ttm/ttm_bo.c |  4 +---
>  include/linux/reservation.h  | 19 +++
>  8 files changed, 29 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index fa38a960ce00..989932234160 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -238,9 +238,7 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct 
> amdgpu_bo *bo,
>   for (i = 0; i < shared_count; ++i) {
>   struct dma_fence *f;
>  
> - f = rcu_dereference_protected(fobj->shared[i],
> -   reservation_object_held(resv));
> -
> + f = reservation_object_get_shared_fence(resv, fobj, i);
>   if (ef) {
>   if (f->context == ef->base.context) {
>   dma_fence_put(f);
> @@ -273,8 +271,7 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct 
> amdgpu_bo *bo,
>   struct dma_fence *f;
>   struct amdgpu_amdkfd_fence *efence;
>  
> - f = rcu_dereference_protected(fobj->shared[i],
> - reservation_object_held(resv));
> + f = reservation_object_get_shared_fence(resv, fobj, i);
>  
>   efence = to_amdgpu_amdkfd_fence(f);
>   if (efence) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> index 2d6f5ec77a68..dbfd62ab67e4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> @@ -212,8 +212,7 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
>   return r;
>  
>   for (i = 0; i < flist->shared_count; ++i) {
> - f = rcu_dereference_protected(flist->shared[i],
> -   reservation_object_held(resv));
> + f = reservation_object_get_shared_fence(resv, flist, i);
>   /* We only want to trigger KFD eviction fences on
>* evict or move jobs. Skip KFD fences otherwise.
>*/
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index c6611cff64c8..22896a398eab 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1482,8 +1482,8 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct 
> ttm_buffer_object *bo,
>   flist = reservation_object_get_list(bo->resv);
>   if (flist) {
>   for (i = 0; i < flist->shared_count; ++i) {
> - f = rcu_dereference_protected(flist->shared[i],
> - reservation_object_held(bo->resv));
> + f = reservation_object_get_shared_fence(bo->resv,
> + flist, i);
>   if (amdkfd_fence_check_mm(f, current->mm))
>   return false;
>   }
> diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
> index f583bb4222f9..95d25dbfde2b 100644
> --- a/drivers/gpu/drm/msm/msm_gem.c
> +++ b/drivers/gpu/drm/msm/msm_gem.c
> @@ -651,8 +651,8 @@ int msm_gem_sync_object(struct drm_gem_object *obj,
>   return 0;
>  
>   for (i = 0; i < fobj->shared_count; i++) {
> - fence = rcu_dereference_protected(fobj->shared[i],
> - 
> reservation_object_held(msm_obj->resv));
> + fence = reservation_object_get_shared_fence(msm_obj->resv,
> + fobj, i);
>   if (fence->context != fctx->context) {
>   ret = dma_fence_wait(fence, true);
>   if (ret)
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c 
> b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index 412d49bc6e56..3ce921c276c1 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -376,8 +376,7 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct 
> nouveau_channel *chan, bool e
>   struct nouveau_channel *prev = NULL;
>   bool must_wait = true;
>  
> - fence = 
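
The helper itself is not visible in the quoted hunks, but judging from the
call sites it replaces, it presumably wraps the RCU dereference along these
lines (a sketch inferred from the call sites, not the verbatim patch):

static inline struct dma_fence *
reservation_object_get_shared_fence(struct reservation_object *obj,
				    struct reservation_object_list *fobj,
				    unsigned int i)
{
	/* legal because the callers hold the reservation lock */
	return rcu_dereference_protected(fobj->shared[i],
					 reservation_object_held(obj));
}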

Re: [PATCH] drm/amdgpu/powerplay: check vrefresh when changing displays

2018-08-10 Thread Huang Rui
On Thu, Aug 09, 2018 at 03:16:23PM -0500, Alex Deucher wrote:
> Compare the current vrefresh in addition to the number of displays
> when determining whether or not the SMU needs updates when changing
> modes. The SMU needs to be updated if the vbi timeout changes due
> to a different refresh rate.  Fixes flickering around mode changes
> in some cases on polaris parts.
> 
> Signed-off-by: Alex Deucher 

Reviewed-by: Huang Rui 

> ---
>  drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c| 3 +++
>  drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.h| 1 +
>  drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c| 1 +
>  drivers/gpu/drm/amd/powerplay/smumgr/fiji_smumgr.c  | 3 ++-
>  drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c   | 1 +
>  drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c | 1 +
>  drivers/gpu/drm/amd/powerplay/smumgr/tonga_smumgr.c | 1 +
>  drivers/gpu/drm/amd/powerplay/smumgr/vegam_smumgr.c | 1 +
>  8 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c 
> b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
> index 380f282a64ba..ab759e38e4ea 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c
> @@ -4132,6 +4132,9 @@ 
> smu7_check_smc_update_required_for_display_configuration(struct pp_hwmgr 
> *hwmgr)
>   if (data->display_timing.num_existing_displays != 
> hwmgr->display_config->num_display)
>   is_update_required = true;
>  
> + if (data->display_timing.vrefresh != hwmgr->display_config->vrefresh)
> + is_update_required = true;
> +
>   if (phm_cap_enabled(hwmgr->platform_descriptor.platformCaps, 
> PHM_PlatformCaps_SclkDeepSleep)) {
>   if (data->display_timing.min_clock_in_sr != 
> hwmgr->display_config->min_core_set_clock_in_sr &&
>   (data->display_timing.min_clock_in_sr >= 
> SMU7_MINIMUM_ENGINE_CLOCK ||
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.h 
> b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.h
> index 3784ce6e50ab..69d361f8dfca 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.h
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.h
> @@ -156,6 +156,7 @@ struct smu7_vbios_boot_state {
>  struct smu7_display_timing {
>   uint32_t  min_clock_in_sr;
>   uint32_t  num_existing_displays;
> + uint32_t  vrefresh;
>  };
>  
>  struct smu7_dpmlevel_enable_mask {
> diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c 
> b/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c
> index fbe3ef4ee45c..18643e06bc6f 100644
> --- a/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c
> @@ -1231,6 +1231,7 @@ static int ci_populate_single_memory_level(
>   memory_level->DisplayWatermark = PPSMC_DISPLAY_WATERMARK_LOW;
>  
>   data->display_timing.num_existing_displays = 
> hwmgr->display_config->num_display;
> + data->display_timing.vrefresh = hwmgr->display_config->vrefresh;
>  
>   /* stutter mode not support on ci */
>  
> diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/fiji_smumgr.c 
> b/drivers/gpu/drm/amd/powerplay/smumgr/fiji_smumgr.c
> index 18048f8e2f13..ec14798e87b6 100644
> --- a/drivers/gpu/drm/amd/powerplay/smumgr/fiji_smumgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/smumgr/fiji_smumgr.c
> @@ -1210,7 +1210,8 @@ static int fiji_populate_single_memory_level(struct 
> pp_hwmgr *hwmgr,
>* PECI_GetNumberOfActiveDisplays(hwmgr->pPECI,
>* &(data->DisplayTiming.numExistingDisplays));
>*/
> - data->display_timing.num_existing_displays = 1;
> + data->display_timing.num_existing_displays = 
> hwmgr->display_config->num_display;
> + data->display_timing.vrefresh = hwmgr->display_config->vrefresh;
>  
>   if (mclk_stutter_mode_threshold &&
>   (clock <= mclk_stutter_mode_threshold) &&
> diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c 
> b/drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c
> index 9299b93aa09a..73aa368a454e 100644
> --- a/drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c
> @@ -1280,6 +1280,7 @@ static int iceland_populate_single_memory_level(
>   memory_level->DisplayWatermark = PPSMC_DISPLAY_WATERMARK_LOW;
>  
>   data->display_timing.num_existing_displays = 
> hwmgr->display_config->num_display;
> + data->display_timing.vrefresh = hwmgr->display_config->vrefresh;
>  
>   /* stutter mode not support on iceland */
>  
> diff --git a/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c 
> b/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
> index 1276f168ff68..872d3824337b 100644
> --- a/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/smumgr/polaris10_smumgr.c
> @@ -1103,6 +1103,7 @@ static int 
> 

Re: [PATCH xf86-video-amdgpu] Store FB for each CRTC in drmmode_flipdata_rec

2018-08-10 Thread Johannes Hirte
On 2018 Jul 27, Michel Dänzer wrote:
> From: Michel Dänzer 
> 
> We were only storing the FB provided by the client, but on CRTCs with
> TearFree enabled, we use a separate FB. This could cause
> drmmode_flip_handler to fail to clear drmmode_crtc->flip_pending, which
> could result in a hang when waiting for the pending flip to complete. We
> were trying to avoid that by always clearing drmmode_crtc->flip_pending
> when TearFree is enabled, but that wasn't reliable, because
> drmmode_crtc->tear_free can already be FALSE at this point when
> disabling TearFree.
> 
> Now that we're keeping track of each CRTC's flip FB separately,
> drmmode_flip_handler can reliably clear flip_pending, and we no longer
> need the TearFree hack.
> 
> Signed-off-by: Michel Dänzer 

Since this change I get a black screen when logging into KDE Plasma. I
have to switch to the Linux console and back to get the X11 screen.
Additionally, the Xorg.log is spammed with:

[   189.744] (WW) AMDGPU(0): get vblank counter failed: Invalid argument
[   189.828] (WW) AMDGPU(0): flip queue failed in amdgpu_scanout_flip: Device 
or resource busy, TearFree inactive until next modeset
[   189.828] (WW) AMDGPU(0): drmmode_wait_vblank failed for scanout update: 
Invalid argument
[   189.828] (WW) AMDGPU(0): drmmode_wait_vblank failed for scanout update: 
Invalid argument

The "flip queue failed" message appears only once, the other two are
much more often.

System is a Carrizo A10-8700B, kernel 4.17.13 + this patch:
https://bugzilla.kernel.org/attachment.cgi?id=276173


-- 
Regards,
  Johannes

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: move amdgpu ttm structures to amdgpu_ttm.h

2018-08-10 Thread Huang Rui
On Fri, Aug 10, 2018 at 01:44:28PM +0800, Junwei Zhang wrote:
> code cleanup for amdgpu ttm structures
> 
> Signed-off-by: Junwei Zhang 

Acked-by: Huang Rui 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 17 +
>  2 files changed, 17 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index c6611cf..87f4e8d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -776,26 +776,6 @@ static unsigned long amdgpu_ttm_io_mem_pfn(struct 
> ttm_buffer_object *bo,
>   (offset >> PAGE_SHIFT);
>  }
>  
> -/*
> - * TTM backend functions.
> - */
> -struct amdgpu_ttm_gup_task_list {
> - struct list_headlist;
> - struct task_struct  *task;
> -};
> -
> -struct amdgpu_ttm_tt {
> - struct ttm_dma_tt   ttm;
> - u64 offset;
> - uint64_tuserptr;
> - struct task_struct  *usertask;
> - uint32_tuserflags;
> - spinlock_t  guptasklock;
> - struct list_headguptasks;
> - atomic_tmmu_invalidations;
> - uint32_tlast_set_pages;
> -};
> -
>  /**
>   * amdgpu_ttm_tt_get_user_pages - Pin pages of memory pointed to by a USERPTR
>   * pointer to memory
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index 8b3cc66..b8c391a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -66,6 +66,23 @@ struct amdgpu_copy_mem {
>   unsigned long   offset;
>  };
>  
> +struct amdgpu_ttm_gup_task_list {
> + struct list_headlist;
> + struct task_struct  *task;
> +};
> +
> +struct amdgpu_ttm_tt {
> + struct ttm_dma_tt   ttm;
> + u64 offset;
> + uint64_tuserptr;
> + struct task_struct  *usertask;
> + uint32_tuserflags;
> + spinlock_t  guptasklock;
> + struct list_headguptasks;
> + atomic_tmmu_invalidations;
> + uint32_tlast_set_pages;
> +};
> +
>  extern const struct ttm_mem_type_manager_func amdgpu_gtt_mgr_func;
>  extern const struct ttm_mem_type_manager_func amdgpu_vram_mgr_func;
>  
> -- 
> 1.9.1
> 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu/vce_v3: skip suspend and resume if powergated

2018-08-10 Thread S, Shirish



On 8/10/2018 12:02 PM, Zhu, Rex wrote:


I am OK with the check when calling vce_v3_0_hw_fini.


But we may still need to call amdgpu_vce_suspend/resume.


Done in V2. Have moved the check such that both are executed.
Regards,
Shirish S



and I am not sure whether we need to do a ring test when resuming.


Best Regards

Rex


*From:* S, Shirish
*Sent:* Friday, August 10, 2018 2:15 PM
*To:* Deucher, Alexander; Zhu, Rex; Liu, Leo
*Cc:* amd-gfx@lists.freedesktop.org; S, Shirish
*Subject:* [PATCH] drm/amdgpu/vce_v3: skip suspend and resume if 
powergated

This patch adds a mechanism by which the VCE 3.0 block
checks whether it was enabled or in use before suspending;
if it was powergated while entering suspend, there
is no need to repeat that in vce_v3_0_suspend().
Similarly, if the block was already powergated when
entering suspend, there is no need to resume it.

By this we not only make the suspend and resume sequence
more efficient, but also optimize the overall amdgpu suspend
and resume time by reducing the ring initialization and tests
for unused IP blocks.

Signed-off-by: Shirish S 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 ++
 drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 21 +
 2 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

index 07924d4..aa85063 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1035,6 +1035,8 @@ struct amdgpu_device {

 /* vce */
 struct amdgpu_vce   vce;
+   bool is_vce_pg;
+   bool is_vce_disabled;

 /* vcn */
 struct amdgpu_vcn   vcn;
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c

index cc6ce6c..822cfd6 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
@@ -326,6 +326,7 @@ static int vce_v3_0_start(struct amdgpu_device *adev)
 WREG32(mmGRBM_GFX_INDEX, mmGRBM_GFX_INDEX_DEFAULT);
 mutex_unlock(&adev->grbm_idx_mutex);

+   adev->is_vce_pg = false;
 return 0;
 }

@@ -355,6 +356,7 @@ static int vce_v3_0_stop(struct amdgpu_device *adev)
 WREG32(mmGRBM_GFX_INDEX, mmGRBM_GFX_INDEX_DEFAULT);
 mutex_unlock(&adev->grbm_idx_mutex);

+   adev->is_vce_pg = true;
 return 0;
 }

@@ -506,6 +508,17 @@ static int vce_v3_0_suspend(void *handle)
 int r;
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;

+   /* Proceed with suspend sequence only if VCE is started
+    * Mark the block as being disabled if it's stopped.
+    */
+   if (adev->is_vce_pg) {
+   DRM_DEBUG("VCE is already powergated, not suspending\n");
+   adev->is_vce_disabled = true;
+   return 0;
+   }
+
+   adev->is_vce_disabled = false;
+
 r = vce_v3_0_hw_fini(adev);
 if (r)
 return r;
@@ -518,6 +531,14 @@ static int vce_v3_0_resume(void *handle)
 int r;
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;

+   /* Proceed with resume sequence if VCE was enabled
+    * while suspending.
+    */
+   if (adev->is_vce_disabled) {
+   DRM_DEBUG("VCE is powergated, not resuming the block\n");
+   return 0;
+   }
+
 r = amdgpu_vce_resume(adev);
 if (r)
 return r;
--
2.7.4



--
Regards,
Shirish S

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH v2 1/5] drm/amdgpu: add tmr mc address into amdgpu_firmware_info

2018-08-10 Thread Gao, Likun
For the series:
Reviewed-by: Likun Gao 

Thanks for all your efforts; we will apply those patches to the pco topic branch.

Regards,
Likun

-Original Message-
From: James Zhu  
Sent: Friday, August 10, 2018 12:32 AM
To: amd-gfx@lists.freedesktop.org
Cc: Zhu, James ; Deucher, Alexander 
; Gao, Likun ; Huang, Ray 

Subject: [PATCH v2 1/5] drm/amdgpu: add tmr mc address into amdgpu_firmware_info

amdgpu IP block booting needs the Trusted Memory Region (TMR) mc address of its
firmware, which is loaded by the PSP

Signed-off-by: James Zhu 
Reviewed-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h
index 11e81a3..e9d64e7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h
@@ -232,6 +232,9 @@ struct amdgpu_firmware_info {
void *kaddr;
/* ucode_size_bytes */
uint32_t ucode_size;
+   /* starting tmr mc address */
+   uint32_t tmr_mc_addr_lo;
+   uint32_t tmr_mc_addr_hi;
 };
 
 struct amdgpu_firmware {
--
2.7.4
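
Not shown here is the producer side; presumably the PSP code fills the two
halves from a 64-bit MC address with the usual kernel helpers, roughly like
this (tmr_mc_addr and the surrounding context are assumptions for
illustration, not part of this patch):

	uint64_t tmr_mc_addr = psp->tmr_mc_addr;	/* assumed source */

	ucode->tmr_mc_addr_lo = lower_32_bits(tmr_mc_addr);
	ucode->tmr_mc_addr_hi = upper_32_bits(tmr_mc_addr);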

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu/vce_v3: skip suspend and resume if powergated

2018-08-10 Thread Qu, Jim

Fix typo throw() to thaw().

Thanks
JimQu


From: Qu, Jim
Sent: August 10, 2018 14:55:09
To: Zhu, Rex; S, Shirish; Deucher, Alexander; Liu, Leo
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu/vce_v3: skip suspend and resume if powergated

There is an ASIC reset/resume during the S4 freeze()/thaw() process; I am
afraid there may be some problems with the ring/IB tests.

More testing is needed to confirm it.

Thanks
JimQu


From: amd-gfx on behalf of Zhu, Rex 

Sent: August 10, 2018 14:32:36
To: S, Shirish; Deucher, Alexander; Liu, Leo
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu/vce_v3: skip suspend and resume if powergated

I am OK with the check when calling vce_v3_0_hw_fini.


But we may still need to call amdgpu_vce_suspend/resume.

and I am not sure whether we need to do a ring test when resuming.


Best Regards

Rex


From: S, Shirish
Sent: Friday, August 10, 2018 2:15 PM
To: Deucher, Alexander; Zhu, Rex; Liu, Leo
Cc: amd-gfx@lists.freedesktop.org; S, Shirish
Subject: [PATCH] drm/amdgpu/vce_v3: skip suspend and resume if powergated

This patch adds a mechanism by which the VCE 3.0 block
checks whether it was enabled or in use before suspending;
if it was powergated while entering suspend, there
is no need to repeat that in vce_v3_0_suspend().
Similarly, if the block was already powergated when
entering suspend, there is no need to resume it.

By this we not only make the suspend and resume sequence
more efficient, but also optimize the overall amdgpu suspend
and resume time by reducing the ring initialization and tests
for unused IP blocks.

Signed-off-by: Shirish S 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 ++
 drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 21 +
 2 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 07924d4..aa85063 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1035,6 +1035,8 @@ struct amdgpu_device {

 /* vce */
 struct amdgpu_vce   vce;
+   bool is_vce_pg;
+   bool is_vce_disabled;

 /* vcn */
 struct amdgpu_vcn   vcn;
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
index cc6ce6c..822cfd6 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
@@ -326,6 +326,7 @@ static int vce_v3_0_start(struct amdgpu_device *adev)
 WREG32(mmGRBM_GFX_INDEX, mmGRBM_GFX_INDEX_DEFAULT);
 mutex_unlock(&adev->grbm_idx_mutex);

+   adev->is_vce_pg = false;
 return 0;
 }

@@ -355,6 +356,7 @@ static int vce_v3_0_stop(struct amdgpu_device *adev)
 WREG32(mmGRBM_GFX_INDEX, mmGRBM_GFX_INDEX_DEFAULT);
 mutex_unlock(&adev->grbm_idx_mutex);

+   adev->is_vce_pg = true;
 return 0;
 }

@@ -506,6 +508,17 @@ static int vce_v3_0_suspend(void *handle)
 int r;
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;

+   /* Proceed with suspend sequence only if VCE is started
+* Mark the block as being disabled if it's stopped.
+*/
+   if (adev->is_vce_pg) {
+   DRM_DEBUG("VCE is already powergated, not suspending\n");
+   adev->is_vce_disabled = true;
+   return 0;
+   }
+
+   adev->is_vce_disabled = false;
+
 r = vce_v3_0_hw_fini(adev);
 if (r)
 return r;
@@ -518,6 +531,14 @@ static int vce_v3_0_resume(void *handle)
 int r;
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;

+   /* Proceed with resume sequence if VCE was enabled
+* while suspending.
+*/
+   if (adev->is_vce_disabled) {
+   DRM_DEBUG("VCE is powergated, not resuming the block\n");
+   return 0;
+   }
+
 r = amdgpu_vce_resume(adev);
 if (r)
 return r;
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu/vce_v3: skip suspend and resume if powergated

2018-08-10 Thread Qu, Jim
There is an ASIC reset/resume during the S4 freeze()/throw() process; I am
afraid there may be some problems with the ring/IB tests.

More testing is needed to confirm it.

Thanks
JimQu


From: amd-gfx on behalf of Zhu, Rex 

Sent: August 10, 2018 14:32:36
To: S, Shirish; Deucher, Alexander; Liu, Leo
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu/vce_v3: skip suspend and resume if powergated

I am OK with the check when calling vce_v3_0_hw_fini.


But we may still need to call amdgpu_vce_suspend/resume.

and I am not sure whether we need to do a ring test when resuming.


Best Regards

Rex


From: S, Shirish
Sent: Friday, August 10, 2018 2:15 PM
To: Deucher, Alexander; Zhu, Rex; Liu, Leo
Cc: amd-gfx@lists.freedesktop.org; S, Shirish
Subject: [PATCH] drm/amdgpu/vce_v3: skip suspend and resume if powergated

This patch adds a mechanism by which the VCE 3.0 block
checks whether it was enabled or in use before suspending;
if it was powergated while entering suspend, there
is no need to repeat that in vce_v3_0_suspend().
Similarly, if the block was already powergated when
entering suspend, there is no need to resume it.

By this we not only make the suspend and resume sequence
more efficient, but also optimize the overall amdgpu suspend
and resume time by reducing the ring initialization and tests
for unused IP blocks.

Signed-off-by: Shirish S 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 ++
 drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 21 +
 2 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 07924d4..aa85063 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1035,6 +1035,8 @@ struct amdgpu_device {

 /* vce */
 struct amdgpu_vce   vce;
+   bool is_vce_pg;
+   bool is_vce_disabled;

 /* vcn */
 struct amdgpu_vcn   vcn;
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
index cc6ce6c..822cfd6 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
@@ -326,6 +326,7 @@ static int vce_v3_0_start(struct amdgpu_device *adev)
 WREG32(mmGRBM_GFX_INDEX, mmGRBM_GFX_INDEX_DEFAULT);
 mutex_unlock(&adev->grbm_idx_mutex);

+   adev->is_vce_pg = false;
 return 0;
 }

@@ -355,6 +356,7 @@ static int vce_v3_0_stop(struct amdgpu_device *adev)
 WREG32(mmGRBM_GFX_INDEX, mmGRBM_GFX_INDEX_DEFAULT);
 mutex_unlock(&adev->grbm_idx_mutex);

+   adev->is_vce_pg = true;
 return 0;
 }

@@ -506,6 +508,17 @@ static int vce_v3_0_suspend(void *handle)
 int r;
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;

+   /* Proceed with suspend sequence only if VCE is started
+* Mark the block as being disabled if it's stopped.
+*/
+   if (adev->is_vce_pg) {
+   DRM_DEBUG("VCE is already powergated, not suspending\n");
+   adev->is_vce_disabled = true;
+   return 0;
+   }
+
+   adev->is_vce_disabled = false;
+
 r = vce_v3_0_hw_fini(adev);
 if (r)
 return r;
@@ -518,6 +531,14 @@ static int vce_v3_0_resume(void *handle)
 int r;
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;

+   /* Proceed with resume sequence if VCE was enabled
+* while suspending.
+*/
+   if (adev->is_vce_disabled) {
+   DRM_DEBUG("VCE is powergated, not resuming the block\n");
+   return 0;
+   }
+
 r = amdgpu_vce_resume(adev);
 if (r)
 return r;
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu/vce_v3: skip suspend and resume if powergated

2018-08-10 Thread Zhu, Rex
I am OK with the check when calling vce_v3_0_hw_fini.


But we may still need to call amdgpu_vce_suspend/resume.

and I am not sure whether we need to do a ring test when resuming.


Best Regards

Rex


From: S, Shirish
Sent: Friday, August 10, 2018 2:15 PM
To: Deucher, Alexander; Zhu, Rex; Liu, Leo
Cc: amd-gfx@lists.freedesktop.org; S, Shirish
Subject: [PATCH] drm/amdgpu/vce_v3: skip suspend and resume if powergated

This patch adds a mechanism by which the VCE 3.0 block
checks whether it was enabled or in use before suspending;
if it was powergated while entering suspend, there
is no need to repeat that in vce_v3_0_suspend().
Similarly, if the block was already powergated when
entering suspend, there is no need to resume it.

By this we not only make the suspend and resume sequence
more efficient, but also optimize the overall amdgpu suspend
and resume time by reducing the ring initialization and tests
for unused IP blocks.

Signed-off-by: Shirish S 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 ++
 drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 21 +
 2 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 07924d4..aa85063 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1035,6 +1035,8 @@ struct amdgpu_device {

 /* vce */
 struct amdgpu_vce   vce;
+   bool is_vce_pg;
+   bool is_vce_disabled;

 /* vcn */
 struct amdgpu_vcn   vcn;
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
index cc6ce6c..822cfd6 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
@@ -326,6 +326,7 @@ static int vce_v3_0_start(struct amdgpu_device *adev)
 WREG32(mmGRBM_GFX_INDEX, mmGRBM_GFX_INDEX_DEFAULT);
 mutex_unlock(&adev->grbm_idx_mutex);

+   adev->is_vce_pg = false;
 return 0;
 }

@@ -355,6 +356,7 @@ static int vce_v3_0_stop(struct amdgpu_device *adev)
 WREG32(mmGRBM_GFX_INDEX, mmGRBM_GFX_INDEX_DEFAULT);
 mutex_unlock(&adev->grbm_idx_mutex);

+   adev->is_vce_pg = true;
 return 0;
 }

@@ -506,6 +508,17 @@ static int vce_v3_0_suspend(void *handle)
 int r;
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;

+   /* Proceed with suspend sequence only if VCE is started
+* Mark the block as being disabled if it's stopped.
+*/
+   if (adev->is_vce_pg) {
+   DRM_DEBUG("VCE is already powergated, not suspending\n");
+   adev->is_vce_disabled = true;
+   return 0;
+   }
+
+   adev->is_vce_disabled = false;
+
 r = vce_v3_0_hw_fini(adev);
 if (r)
 return r;
@@ -518,6 +531,14 @@ static int vce_v3_0_resume(void *handle)
 int r;
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;

+   /* Proceed with resume sequence if VCE was enabled
+* while suspending.
+*/
+   if (adev->is_vce_disabled) {
+   DRM_DEBUG("VCE is powergated, not resuming the block\n");
+   return 0;
+   }
+
 r = amdgpu_vce_resume(adev);
 if (r)
 return r;
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu/vce_v3: skip suspend and resume if powergated

2018-08-10 Thread Shirish S
This patch adds a mechanism by which the VCE 3.0 block
checks whether it was enabled or in use before suspending;
if it was powergated while entering suspend, there
is no need to repeat that in vce_v3_0_suspend().
Similarly, if the block was already powergated when
entering suspend, there is no need to resume it.

By this we not only make the suspend and resume sequence
more efficient, but also optimize the overall amdgpu suspend
and resume time by reducing the ring initialization and tests
for unused IP blocks.

Signed-off-by: Shirish S 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 ++
 drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 21 +
 2 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 07924d4..aa85063 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1035,6 +1035,8 @@ struct amdgpu_device {
 
/* vce */
struct amdgpu_vce   vce;
+   bool is_vce_pg;
+   bool is_vce_disabled;
 
/* vcn */
struct amdgpu_vcn   vcn;
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
index cc6ce6c..822cfd6 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
@@ -326,6 +326,7 @@ static int vce_v3_0_start(struct amdgpu_device *adev)
WREG32(mmGRBM_GFX_INDEX, mmGRBM_GFX_INDEX_DEFAULT);
	mutex_unlock(&adev->grbm_idx_mutex);
 
+   adev->is_vce_pg = false;
return 0;
 }
 
@@ -355,6 +356,7 @@ static int vce_v3_0_stop(struct amdgpu_device *adev)
WREG32(mmGRBM_GFX_INDEX, mmGRBM_GFX_INDEX_DEFAULT);
	mutex_unlock(&adev->grbm_idx_mutex);
 
+   adev->is_vce_pg = true;
return 0;
 }
 
@@ -506,6 +508,17 @@ static int vce_v3_0_suspend(void *handle)
int r;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
+   /* Proceed with suspend sequence only if VCE is started
+* Mark the block as being disabled if it's stopped.
+*/
+   if (adev->is_vce_pg) {
+   DRM_DEBUG("VCE is already powergated, not suspending\n");
+   adev->is_vce_disabled = true;
+   return 0;
+   }
+
+   adev->is_vce_disabled = false;
+
r = vce_v3_0_hw_fini(adev);
if (r)
return r;
@@ -518,6 +531,14 @@ static int vce_v3_0_resume(void *handle)
int r;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
+   /* Proceed with resume sequence if VCE was enabled
+* while suspending.
+*/
+   if (adev->is_vce_disabled) {
+   DRM_DEBUG("VCE is powergated, not resuming the block\n");
+   return 0;
+   }
+
r = amdgpu_vce_resume(adev);
if (r)
return r;
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx