RE: [PATCH 1/6] drm/amdgpu: Remove FW_LOAD_DIRECT type support on VI

2018-10-02 Thread Zhu, Rex
I will add some warnings in dmesg.
Thanks


Best Regards
Rex


> -Original Message-
> From: Quan, Evan
> Sent: Sunday, September 30, 2018 10:30 AM
> To: Zhu, Rex ; amd-gfx@lists.freedesktop.org
> Cc: Zhu, Rex 
> Subject: RE: [PATCH 1/6] drm/amdgpu: Remove FW_LOAD_DIRECT type
> support on VI
> 
> Instead of change the fw load type to AMDGPU_FW_LOAD_SMU silently, it's
> better to warn/error out if user specify FW_LOAD_DIRECT load type.
> 
> Regards,
> Evan
> > -Original Message-
> > From: amd-gfx  On Behalf Of Rex
> > Zhu
> > Sent: 2018年9月30日 0:15
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Zhu, Rex 
> > Subject: [PATCH 1/6] drm/amdgpu: Remove FW_LOAD_DIRECT type
> support on
> > VI
> >
> > AMDGPU_FW_LOAD_DIRECT is used for bring up.
> > Now it don't work any more. so remove the support.
> >
> > Signed-off-by: Rex Zhu 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c |   3 -
> >  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 249 ++-
> > ---
> >  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c|  57 +--
> >  3 files changed, 56 insertions(+), 253 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> > index 1fa8bc3..fb2bdf3 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> > @@ -297,9 +297,6 @@ enum amdgpu_firmware_load_type
> > case CHIP_POLARIS11:
> > case CHIP_POLARIS12:
> > case CHIP_VEGAM:
> > -   if (!load_type)
> > -   return AMDGPU_FW_LOAD_DIRECT;
> > -   else
> > return AMDGPU_FW_LOAD_SMU;
> > case CHIP_VEGA10:
> > case CHIP_RAVEN:
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > index 2aeef2b..c63ede1 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > @@ -1173,64 +1173,61 @@ static int gfx_v8_0_init_microcode(struct
> > amdgpu_device *adev)
> > }
> > }
> >
> > -   if (adev->firmware.load_type == AMDGPU_FW_LOAD_SMU) {
> > -   info = 
> > >firmware.ucode[AMDGPU_UCODE_ID_CP_PFP];
> > -   info->ucode_id = AMDGPU_UCODE_ID_CP_PFP;
> > -   info->fw = adev->gfx.pfp_fw;
> > -   header = (const struct common_firmware_header *)info-
> > >fw->data;
> > -   adev->firmware.fw_size +=
> > -   ALIGN(le32_to_cpu(header->ucode_size_bytes),
> > PAGE_SIZE);
> > -
> > -   info = 
> > >firmware.ucode[AMDGPU_UCODE_ID_CP_ME];
> > -   info->ucode_id = AMDGPU_UCODE_ID_CP_ME;
> > -   info->fw = adev->gfx.me_fw;
> > -   header = (const struct common_firmware_header *)info-
> > >fw->data;
> > -   adev->firmware.fw_size +=
> > -   ALIGN(le32_to_cpu(header->ucode_size_bytes),
> > PAGE_SIZE);
> > -
> > -   info = 
> > >firmware.ucode[AMDGPU_UCODE_ID_CP_CE];
> > -   info->ucode_id = AMDGPU_UCODE_ID_CP_CE;
> > -   info->fw = adev->gfx.ce_fw;
> > -   header = (const struct common_firmware_header *)info-
> > >fw->data;
> > -   adev->firmware.fw_size +=
> > -   ALIGN(le32_to_cpu(header->ucode_size_bytes),
> > PAGE_SIZE);
> > +   info = >firmware.ucode[AMDGPU_UCODE_ID_CP_PFP];
> > +   info->ucode_id = AMDGPU_UCODE_ID_CP_PFP;
> > +   info->fw = adev->gfx.pfp_fw;
> > +   header = (const struct common_firmware_header *)info->fw->data;
> > +   adev->firmware.fw_size +=
> > +   ALIGN(le32_to_cpu(header->ucode_size_bytes),
> > PAGE_SIZE);
> > +
> > +   info = >firmware.ucode[AMDGPU_UCODE_ID_CP_ME];
> > +   info->ucode_id = AMDGPU_UCODE_ID_CP_ME;
> > +   info->fw = adev->gfx.me_fw;
> > +   header = (const struct common_firmware_header *)info->fw->data;
> > +   adev->firmware.fw_size +=
> > +   ALIGN(le32_to_cpu(header->ucode_size_bytes),
> > PAGE_SIZE);
> > +
> > +   info = >firmware.ucode[AMDGPU_UCODE_ID_CP_CE];
> > +   info->ucode_id = AMDGPU_UCODE_ID_CP_CE;
> > +   info->fw = adev->gfx.ce_fw;
> > +   header = (const struct common_firmware_header *)info->fw->data;
> > +   adev->firmware.fw_size +=
> > +   ALIGN(le32_to_cpu(header->ucode_size_bytes),
> > PAGE_SIZE);
> > +
> > +   info = >firmware.ucode[AMDGPU_UCODE_ID_RLC_G];
> > +   info->ucode_id = AMDGPU_UCODE_ID_RLC_G;
> > +   info->fw = adev->gfx.rlc_fw;
> > +   header = (const struct common_firmware_header *)info->fw->data;
> > +   adev->firmware.fw_size +=
> > +   ALIGN(le32_to_cpu(header->ucode_size_bytes),
> > PAGE_SIZE);
> > +
> > +   info = >firmware.ucode[AMDGPU_UCODE_ID_CP_MEC1];
> > +   info->ucode_id = AMDGPU_UCODE_ID_CP_MEC1;
> > +   info->fw = adev->gfx.mec_fw;
> > +   header = (const struct common_firmware_header *)info->fw->data;
> > +   adev->firmware.fw_size +=
> > +   ALIGN(le32_to_cpu(header->ucode_size_bytes),
> > PAGE_SIZE);
> > +
> > +   /* we need account JT in */
> > +   cp_hdr = (const 

RE: [PATCH 3/4] drm/amdgpu: Add fw load in gfx_v8 and sdma_v3.

2018-10-02 Thread Zhu, Rex
Yes. So the hw ip smu/gfx/sdma have no dependence. We can initialize 
gfx/sdma/smu or sdma/gfx/smu.

In powerplay, if the fw has been loaded successfully, we will skip the fw 
loading.

Rex 

> -Original Message-
> From: Quan, Evan
> Sent: Sunday, September 30, 2018 12:19 PM
> To: Zhu, Rex ; amd-gfx@lists.freedesktop.org
> Cc: Zhu, Rex 
> Subject: RE: [PATCH 3/4] drm/amdgpu: Add fw load in gfx_v8 and sdma_v3.
> 
> Will the pp_funcs->load_firmware be called twice?
> 
> Regards,
> Evan
> > -Original Message-
> > From: amd-gfx  On Behalf Of Rex
> > Zhu
> > Sent: 2018年9月30日 0:19
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Zhu, Rex 
> > Subject: [PATCH 3/4] drm/amdgpu: Add fw load in gfx_v8 and sdma_v3.
> >
> > gfx and sdma can be initialized before smu.
> >
> > Signed-off-by: Rex Zhu 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  | 11 +++
> > drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c |  8 
> >  2 files changed, 19 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > index 6b1954e..77e05c1 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> > @@ -4180,9 +4180,20 @@ static void gfx_v8_0_rlc_start(struct
> > amdgpu_device *adev)
> >
> >  static int gfx_v8_0_rlc_resume(struct amdgpu_device *adev)  {
> > +   int r;
> > +
> > gfx_v8_0_rlc_stop(adev);
> > gfx_v8_0_rlc_reset(adev);
> > gfx_v8_0_init_pg(adev);
> > +
> > +   if (adev->powerplay.pp_funcs->load_firmware) {
> > +   r = adev->powerplay.pp_funcs->load_firmware(adev-
> > >powerplay.pp_handle);
> > +   if (r) {
> > +   pr_err("firmware loading failed\n");
> > +   return r;
> > +   }
> > +   }
> > +
> > gfx_v8_0_rlc_start(adev);
> >
> > return 0;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> > b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> > index 6fb3eda..0bdde7f 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
> > @@ -788,6 +788,14 @@ static int sdma_v3_0_start(struct amdgpu_device
> > *adev)  {
> > int r;
> >
> > +   if (adev->powerplay.pp_funcs->load_firmware) {
> > +   r = adev->powerplay.pp_funcs->load_firmware(adev-
> > >powerplay.pp_handle);
> > +   if (r) {
> > +   pr_err("firmware loading failed\n");
> > +   return r;
> > +   }
> > +   }
> > +
> > /* disable sdma engine before programing it */
> > sdma_v3_0_ctx_switch_enable(adev, false);
> > sdma_v3_0_enable(adev, false);
> > --
> > 1.9.1
> >
> > ___
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 3/6] drm/amdgpu: Fix cg/pg unexpected disabled when hw init failed

2018-10-02 Thread Zhu, Rex


> -Original Message-
> From: Quan, Evan
> Sent: Sunday, September 30, 2018 11:03 AM
> To: Zhu, Rex ; amd-gfx@lists.freedesktop.org
> Cc: Zhu, Rex 
> Subject: RE: [PATCH 3/6] drm/amdgpu: Fix cg/pg unexpected disabled when
> hw init failed
> 
> Comment inline
> 
> > -Original Message-
> > From: amd-gfx  On Behalf Of Rex
> > Zhu
> > Sent: 2018年9月30日 0:15
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Zhu, Rex 
> > Subject: [PATCH 3/6] drm/amdgpu: Fix cg/pg unexpected disabled when hw
> > init failed
> >
> > Check the ip blocks late_initialized state before enable/disable
> > cg/pg, so if hw init failed, cg/pg function will not be executed.
> >
> > Signed-off-by: Rex Zhu 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 95095a8..94c92f5 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -1656,7 +1656,7 @@ static int amdgpu_device_set_cg_state(struct
> > amdgpu_device *adev,
> >
> > for (j = 0; j < adev->num_ip_blocks; j++) {
> > i = state == AMD_CG_STATE_GATE ? j : adev-
> > >num_ip_blocks - j - 1;
> > -   if (!adev->ip_blocks[i].status.valid)
> > +   if (!adev->ip_blocks[i].status.late_initialized)
> > continue;
> > /* skip CG for VCE/UVD, it's handled specially */
> > if (adev->ip_blocks[i].version->type !=
> AMD_IP_BLOCK_TYPE_UVD && @@
> > -1686,7 +1686,7 @@ static int amdgpu_device_set_pg_state(struct
> > amdgpu_device *adev, enum amd_power
> >
> > for (j = 0; j < adev->num_ip_blocks; j++) {
> > i = state == AMD_PG_STATE_GATE ? j : adev-
> > >num_ip_blocks - j - 1;
> > -   if (!adev->ip_blocks[i].status.valid)
> > +   if (!adev->ip_blocks[i].status.late_initialized)
> > continue;
> > /* skip CG for VCE/UVD, it's handled specially */
> > if (adev->ip_blocks[i].version->type !=
> AMD_IP_BLOCK_TYPE_UVD && @@
> > -1723,7 +1723,7 @@ static int amdgpu_device_ip_late_init(struct
> > amdgpu_device *adev)
> > int i = 0, r;
> >
> > for (i = 0; i < adev->num_ip_blocks; i++) {
> > -   if (!adev->ip_blocks[i].status.valid)
> > +   if (!adev->ip_blocks[i].status.hw)
> [Quan, Evan] typo?
No, In late_init, we check the hw_init status.

Rex
> > continue;
> > if (adev->ip_blocks[i].version->funcs->late_init) {
> > r = adev->ip_blocks[i].version->funcs->late_init((void
> > *)adev); @@ -1732,8 +1732,8 @@ static int
> > amdgpu_device_ip_late_init(struct amdgpu_device *adev)
> >   adev->ip_blocks[i].version->funcs-
> > >name, r);
> > return r;
> > }
> > -   adev->ip_blocks[i].status.late_initialized = true;
> > }
> > +   adev->ip_blocks[i].status.late_initialized = true;
> > }
> >
> > amdgpu_device_set_cg_state(adev, AMD_CG_STATE_GATE);
> > --
> > 1.9.1
> >
> > ___
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdkfd: Fix incorrect use of process->mm

2018-10-02 Thread Felix Kuehling
This mm_struct pointer should never be dereferenced. If running in
a user thread, just use current->mm. If running in a kernel worker
use get_task_mm to get a safe reference to the mm_struct.

Signed-off-by: Felix Kuehling 
---
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 37 +-
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index d6af31c..3bc0651d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -358,8 +358,8 @@ static int create_compute_queue_nocpsch(struct 
device_queue_manager *dqm,
struct queue *q,
struct qcm_process_device *qpd)
 {
-   int retval;
struct mqd_manager *mqd_mgr;
+   int retval;
 
mqd_mgr = dqm->ops.get_mqd_manager(dqm, KFD_MQD_TYPE_COMPUTE);
if (!mqd_mgr)
@@ -387,8 +387,12 @@ static int create_compute_queue_nocpsch(struct 
device_queue_manager *dqm,
if (!q->properties.is_active)
return 0;
 
-   retval = mqd_mgr->load_mqd(mqd_mgr, q->mqd, q->pipe, q->queue,
-   >properties, q->process->mm);
+   if (WARN(q->process->mm != current->mm,
+"should only run in user thread"))
+   retval = -EFAULT;
+   else
+   retval = mqd_mgr->load_mqd(mqd_mgr, q->mqd, q->pipe, q->queue,
+  >properties, current->mm);
if (retval)
goto out_uninit_mqd;
 
@@ -545,9 +549,15 @@ static int update_queue(struct device_queue_manager *dqm, 
struct queue *q)
retval = map_queues_cpsch(dqm);
else if (q->properties.is_active &&
 (q->properties.type == KFD_QUEUE_TYPE_COMPUTE ||
- q->properties.type == KFD_QUEUE_TYPE_SDMA))
-   retval = mqd_mgr->load_mqd(mqd_mgr, q->mqd, q->pipe, q->queue,
-  >properties, q->process->mm);
+ q->properties.type == KFD_QUEUE_TYPE_SDMA)) {
+   if (WARN(q->process->mm != current->mm,
+"should only run in user thread"))
+   retval = -EFAULT;
+   else
+   retval = mqd_mgr->load_mqd(mqd_mgr, q->mqd,
+  q->pipe, q->queue,
+  >properties, current->mm);
+   }
 
 out_unlock:
dqm_unlock(dqm);
@@ -653,6 +663,7 @@ static int evict_process_queues_cpsch(struct 
device_queue_manager *dqm,
 static int restore_process_queues_nocpsch(struct device_queue_manager *dqm,
  struct qcm_process_device *qpd)
 {
+   struct mm_struct *mm = NULL;
struct queue *q;
struct mqd_manager *mqd_mgr;
struct kfd_process_device *pdd;
@@ -686,6 +697,15 @@ static int restore_process_queues_nocpsch(struct 
device_queue_manager *dqm,
kfd_flush_tlb(pdd);
}
 
+   /* Take a safe reference to the mm_struct, which may otherwise
+* disappear even while the kfd_process is still referenced.
+*/
+   mm = get_task_mm(pdd->process->lead_thread);
+   if (!mm) {
+   retval = -EFAULT;
+   goto out;
+   }
+
/* activate all active queues on the qpd */
list_for_each_entry(q, >queues_list, list) {
if (!q->properties.is_evicted)
@@ -700,14 +720,15 @@ static int restore_process_queues_nocpsch(struct 
device_queue_manager *dqm,
q->properties.is_evicted = false;
q->properties.is_active = true;
retval = mqd_mgr->load_mqd(mqd_mgr, q->mqd, q->pipe,
-  q->queue, >properties,
-  q->process->mm);
+  q->queue, >properties, mm);
if (retval)
goto out;
dqm->queue_count++;
}
qpd->evicted = 0;
 out:
+   if (mm)
+   mmput(mm);
dqm_unlock(dqm);
return retval;
 }
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm v2 2/2] amdgpu/test: Fix deadlock tests for AI and RV v2

2018-10-02 Thread Marek Olšák
For the series:

Reviewed-by: Marek Olšák 

Marek
On Fri, Sep 28, 2018 at 10:46 AM Andrey Grodzovsky
 wrote:
>
> Seems like AI and RV requires uncashed memory mapping to be able
> to pickup value written to memory by CPU after the WAIT_REG_MEM
> command was already launched.
> .
> Enable the test for AI and RV.
>
> v2:
> Update commit description.
>
> Signed-off-by: Andrey Grodzovsky 
> ---
>  tests/amdgpu/deadlock_tests.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/tests/amdgpu/deadlock_tests.c b/tests/amdgpu/deadlock_tests.c
> index 304482d..292ec4e 100644
> --- a/tests/amdgpu/deadlock_tests.c
> +++ b/tests/amdgpu/deadlock_tests.c
> @@ -80,6 +80,8 @@ static  uint32_t  minor_version;
>  static pthread_t stress_thread;
>  static uint32_t *ptr;
>
> +int use_uc_mtype = 0;
> +
>  static void amdgpu_deadlock_helper(unsigned ip_type);
>  static void amdgpu_deadlock_gfx(void);
>  static void amdgpu_deadlock_compute(void);
> @@ -92,13 +94,14 @@ CU_BOOL suite_deadlock_tests_enable(void)
>  _version, _handle))
> return CU_FALSE;
>
> -   if (device_handle->info.family_id == AMDGPU_FAMILY_AI ||
> -   device_handle->info.family_id == AMDGPU_FAMILY_SI ||
> -   device_handle->info.family_id == AMDGPU_FAMILY_RV) {
> +   if (device_handle->info.family_id == AMDGPU_FAMILY_SI) {
> printf("\n\nCurrently hangs the CP on this ASIC, deadlock 
> suite disabled\n");
> enable = CU_FALSE;
> }
>
> +   if (device_handle->info.family_id >= AMDGPU_FAMILY_AI)
> +   use_uc_mtype = 1;
> +
> if (amdgpu_device_deinitialize(device_handle))
> return CU_FALSE;
>
> @@ -183,8 +186,8 @@ static void amdgpu_deadlock_helper(unsigned ip_type)
> r = amdgpu_cs_ctx_create(device_handle, _handle);
> CU_ASSERT_EQUAL(r, 0);
>
> -   r = amdgpu_bo_alloc_and_map(device_handle, 4096, 4096,
> -   AMDGPU_GEM_DOMAIN_GTT, 0,
> +   r = amdgpu_bo_alloc_and_map_raw(device_handle, 4096, 4096,
> +   AMDGPU_GEM_DOMAIN_GTT, 0, use_uc_mtype ? 
> AMDGPU_VM_MTYPE_UC : 0,
> _result_handle, 
> _result_cpu,
> _result_mc_address, 
> _handle);
> CU_ASSERT_EQUAL(r, 0);
> --
> 2.7.4
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm] libdrm: Allow dynamic drm majors on linux

2018-10-02 Thread Thomas Hellstrom
Ping?

On 09/30/2018 07:31 PM, Thomas Hellstrom wrote:
> Hi, Emil,
>
> On 09/05/2018 03:53 PM, Emil Velikov wrote:
>> On 5 September 2018 at 14:20, Thomas Hellstrom 
>>  wrote:
>>
 In that case, please give me 24h to do a libdrm release before 
 pushing.
 I had to push some workarounds for the sandboxing mentioned earlier 
 :-\

 Thanks
 Emil
>>>
>>> Ouch, I just pushed the patch, but feel free to cut the release just 
>>> before
>>> that commit.
>>>
>> That doesn't quite work. Barring any objections I'll: revert, 
>> release, reapply.
>>
>> -Emil
>> ___
>> dri-devel mailing list
>> dri-de...@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
> What happened here? I can't really see my commit nor a revert nor a 
> release in libdrm.
>
> /Thomas
>
>

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4

2018-10-02 Thread Christian König
Checking more code and documentation and thinking about it over my 
vacation I think I have some new conclusions here.


Currently we are using get_user_pages() together with an MMU notifier to 
guarantee coherent address space view, because get_user_pages() works by 
grabbing a reference to the pages and ignoring concurrent page table 
updates.


But HMM uses a different approach by checking the address space for 
modifications using hmm_vma_range_done() and re-trying when the address 
space has changed.


Now what you are trying to do is to change that into get_user_pages() 
and HMM callback and this is what won't work. We can either use 
get_user_pages() with MMU notifier or we can use HMM for the work, but 
we can't mix and match.


So my initial guess was correct that we just need to change both sides 
of the implementation at the same time.


Regards,
Christian.

Am 28.09.2018 um 17:13 schrieb Koenig, Christian:

No it definitely isn't.

We have literally worked month on this with the core MM developers.

Making sure that we have a consistent page array is absolutely vital 
for correct operation.


Please also check Jerome's presentation from XDC it also perfectly 
explains why this approach won't work correctly.


Christian.

Am 28.09.2018 17:07 schrieb "Yang, Philip" :
For B path, we take mm->mmap_sem, then call hmm_vma_get_pfns() or 
get_user_pages(). This is obvious.


For A path, mmu notifier 
mmu_notifier_invalidate_range_start()/mmu_notifier_invalidate_range_end() 
is called in many places, and the calling path is quit complicated 
inside mm, it's not obvious. I checked many of the them, for example:


do_munmap()
  down_write(>mmap_sem)
  arch_unmap()
    mpx_notify_unmap()...
   zap_bt_entries_mapping()
 zap_page_range()
 up_write(>mmap_sem)

void zap_page_range(struct vm_area_struct *vma, unsigned long start,
        unsigned long size)
{
    struct mm_struct *mm = vma->vm_mm;
    struct mmu_gather tlb;
    unsigned long end = start + size;

    lru_add_drain();
    tlb_gather_mmu(, mm, start, end);
    update_hiwater_rss(mm);
    mmu_notifier_invalidate_range_start(mm, start, end);
    for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
        unmap_single_vma(, vma, start, end, NULL);
    mmu_notifier_invalidate_range_end(mm, start, end);
    tlb_finish_mmu(, start, end);
}

So AFAIK it's okay without invalidate_range_end() callback.

Regards,
Philip

On 2018-09-28 01:25 AM, Koenig, Christian wrote:

No, that is incorrect as well :)

The mmap_sem isn't necessary taken during page table updates.

What you could do is replace get_user_pages() directly with HMM. If 
I'm not completely mistaken that should work as expected.


Christian.

Am 27.09.2018 22:18 schrieb "Yang, Philip" :
I was trying to understand the way how HMM handle this concurrent 
issue and how we handle it in amdgpu_ttm_tt_userptr_needs_pages() and 
amdgpu_ttm_tt_affect_userptr(). HMM uses range->valid flag, we use 
gtt->mmu_invalidations and gtt->last_set_pages. Both use the same 
lock plus flag idea actually.


Thanks for the information, now I understand fence 
ttm_eu_fence_buffer_objects() put to BOs will block CPU page table 
update. This is another side of this concurrent issue I didn't know.


I had same worry that it has issue without invalidate_range_end() 
callback as the calling sequence Felix lists. Now I think it's fine 
after taking a look again today because of mm->mmap_sem usage, this 
is my understanding:


A path:

down_write(>mmap_sem);
mmu_notifier_invalidate_range_start()
    take_lock()
    release_lock()
CPU page table update
mmu_notifier_invalidate_range_end()
up_write(>mmap_sem);

B path:

again:
down_read(>mmap_sem);
hmm_vma_get_pfns()
up_read(>mmap_sem);


take_lock()
if (!hmm_vma_range_done()) {
   release_lock()
   goto again
}
submit command job...
release_lock()

If you agree, I will submit patch v5 with some minor changes, and 
submit another patch to replace get_user_page() with HMM.


Regards,
Philip

On 2018-09-27 11:36 AM, Christian König wrote:

Yeah, I've read that as well.

My best guess is that we just need to add a call to 
hmm_vma_range_done() after taking the lock and also replace 
get_user_pages() with hmm_vma_get_pfns().


But I'm still not 100% sure how all of that is supposed to work 
together.


Regards,
Christian.

Am 27.09.2018 um 16:50 schrieb Kuehling, Felix:


I think the answer is here: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/hmm.rst#n216


Regards,

Felix

*From:*Koenig, Christian
*Sent:* Thursday, September 27, 2018 10:30 AM
*To:* Kuehling, Felix 
*Cc:* j.gli...@gmail.com; Yang, Philip ; 
amd-gfx@lists.freedesktop.org
*Subject:* RE: [PATCH] drm/amdgpu: use HMM mirror callback to 
replace mmu notifier v4


At least with get_user_pages() that is perfectly possible.

For HMM it could be that this is prevented somehow.

Christian.

Am 27.09.2018 16:27 schrieb "Kuehling, Felix" 

Re: [PATCH v2] drm/amd/display: Use proper enums in process_channel_reply

2018-10-02 Thread Harry Wentland
On 2018-09-27 02:11 PM, Nick Desaulniers wrote:
> On Thu, Sep 27, 2018 at 11:08 AM Nathan Chancellor
>  wrote:
>>
>> On Thu, Sep 27, 2018 at 11:06:33AM -0700, Nathan Chancellor wrote:
>>> Clang warns when one enumerated type is implicitly converted to another.
>>>
>>> drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c:315:19: warning:
>>> implicit conversion from enumeration type 'enum
>>> aux_channel_operation_result' to different enumeration type 'enum
>>> aux_transaction_reply' [-Wenum-conversion]
>>> reply->status = AUX_CHANNEL_OPERATION_FAILED_HPD_DISCON;
>>>   ~ ^~~
>>> drivers/gpu/drm/amd/amdgpu/../display/dc/i2caux/dce110/aux_engine_dce110.c:349:19:
>>> warning: implicit conversion from enumeration type 'enum
>>> aux_channel_operation_result' to different enumeration type 'enum
>>> aux_transaction_reply' [-Wenum-conversion]
>>> reply->status = AUX_CHANNEL_OPERATION_FAILED_HPD_DISCON;
>>>   ~ ^~~
>>>
>>> The current enum is incorrect, it should be from aux_transaction_reply,
>>> so use AUX_TRANSACTION_REPLY_HPD_DISCON.
>>>
>>> Reported-by: Nick Desaulniers 
>>> Suggested-by: Nick Desaulniers 
>>> Signed-off-by: Nathan Chancellor 
> 
> Reviewed-by: Nick Desaulniers 

Reviewed-by: Harry Wentland 

Pulling it in through amd-staging-drm-next.

> 
>>> ---
>>>
>>> v1 -> v2:
>>>
>>> * Rather than change status to an integer, use the proper enumerated
>>>   type from aux_transaction reply as suggested by Nick and confirmed
>>>   by Henry.
>>
>> Sigh Harry, sorry...
> 

No worries. You're not the first to get that wrong. :)

Harry

> Thanks for the patch, Nathan.  Don't worry about sending a v3 over
> this; its below the commit line so this detail wont get committed.
> 
>>
>>>
>>>  drivers/gpu/drm/amd/display/dc/dce/dce_aux.c| 2 +-
>>>  .../gpu/drm/amd/display/dc/i2caux/dce110/aux_engine_dce110.c| 2 +-
>>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c 
>>> b/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c
>>> index 3f5b2e6f7553..aaeb7faac0c4 100644
>>> --- a/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c
>>> +++ b/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c
>>> @@ -312,7 +312,7 @@ static void process_channel_reply(
>>>
>>>   /* in case HPD is LOW, exit AUX transaction */
>>>   if ((sw_status & AUX_SW_STATUS__AUX_SW_HPD_DISCON_MASK)) {
>>> - reply->status = AUX_CHANNEL_OPERATION_FAILED_HPD_DISCON;
>>> + reply->status = AUX_TRANSACTION_REPLY_HPD_DISCON;
>>>   return;
>>>   }
>>>
>>> diff --git 
>>> a/drivers/gpu/drm/amd/display/dc/i2caux/dce110/aux_engine_dce110.c 
>>> b/drivers/gpu/drm/amd/display/dc/i2caux/dce110/aux_engine_dce110.c
>>> index 8eee8ace1259..59c3ed43d609 100644
>>> --- a/drivers/gpu/drm/amd/display/dc/i2caux/dce110/aux_engine_dce110.c
>>> +++ b/drivers/gpu/drm/amd/display/dc/i2caux/dce110/aux_engine_dce110.c
>>> @@ -346,7 +346,7 @@ static void process_channel_reply(
>>>
>>>   /* in case HPD is LOW, exit AUX transaction */
>>>   if ((sw_status & AUX_SW_STATUS__AUX_SW_HPD_DISCON_MASK)) {
>>> - reply->status = AUX_CHANNEL_OPERATION_FAILED_HPD_DISCON;
>>> + reply->status = AUX_TRANSACTION_REPLY_HPD_DISCON;
>>>   return;
>>>   }
>>>
>>> --
>>> 2.19.0
>>>
> 
> 
> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 0/3] A DRM API for adaptive sync and variable refresh rate support

2018-10-02 Thread Harry Wentland


On 2018-10-01 03:15 AM, Daniel Vetter wrote:
> On Mon, Sep 24, 2018 at 02:15:34PM -0400, Nicholas Kazlauskas wrote:
>> These patches are part of a proposed new interface for supporting variable 
>> refresh rate via DRM properties.
>>
>> === Changes from v1 ===
>>
>> For drm:
>>
>> * The variable_refresh_capable property is now flagged as 
>> DRM_MODE_PROP_IMMUTABLE
>>
>> For drm/gpu/amd/display:
>>
>> * Patches no longer pull in IOCTL/FreeSync refactoring code
>> * FreeSync enable/disable behavior has been modified to reflect changes in 
>> userspace behavior from xf86-video-amdgpu and mesa
>>
>> === Adaptive sync and variable refresh rate ===
>>
>> Adaptive sync is part of the DisplayPort spec and allows for graphics 
>> adapters to drive displays with varying frame timings.
>>
>> Variable refresh rate (VRR) is essentially the same, but defined for HDMI.
>>
>> === Use cases for variable refresh rate ===
>>
>> Variable frame (flip) timings don't align well with fixed refresh rate 
>> displays. This results in stuttering, tearing and/or input lag. By adjusting 
>> the display refresh rate dynamically these issues can be reduced or 
>> eliminated.
>>
>> However, not all content is suitable for dynamic refresh adaptation. Content 
>> that is flipped infrequently or at random intervals tends to fair poorly. 
>> Multiple clients trying to flip under the same screen can similarly 
>> interfere with prediction.
>>
>> Userland needs a way to let the driver know when the content on the screen 
>> is suitable for variable refresh rate and if the user wishes to have the 
>> feature enabled.
>>
>> === DRM API to support variable refresh rates ===
>>
>> This patch introduces a new API via atomic properties on the DRM connector 
>> and CRTC.
>>
>> The connector has two new optional properties:
>>
>> * bool variable_refresh_capable - set by the driver if the hardware is 
>> capable of supporting variable refresh tech
>>
>> * bool variable_refresh_enabled - set by the user to enable variable refresh 
>> adjustment over the connector
>>
>> The CRTC has one additional default property:
>>
>> * bool variable_refresh - a content hint to the driver specifying that the 
>> CRTC contents are suitable for variable refresh adjustment
>>
>> == Overview for DRM driver developers ===
>>
>> Driver developers can attach the optional connector properties via 
>> drm_connector_attach_variable_refresh_properties on connectors that support 
>> variable refresh (typically DP or HDMI).
>>
>> The variable_refresh_capable property should be managed as the output on the 
>> connector changes. The property is read only from userspace.
>>
>> The variable_refresh_enabled property is intended to be a property 
>> controlled by userland as a global on/off switch for variable refresh 
>> technology. It should be checked before enabling variable refresh rate.
>>
>> === Overview for Userland developers ==
>>
>> The variable_refresh property on the CRTC should be set to true when the 
>> CRTCs are suitable for variable refresh rate. In practice this is probably 
>> an application like a game - a single window that covers the whole CRTC 
>> surface and is the only client issuing flips.
>>
>> To demonstrate the suitability of the API for variable refresh and dynamic 
>> adaptation there are additional patches using this API that implement 
>> adaptive variable refresh across kernel and userland projects:
>>
>> * DRM (dri-devel)
>> * amdgpu DRM kernel driver (amd-gfx)
>> * xf86-video-amdgpu (amd-gfx)
>> * mesa (mesa-dev)
>>
>> These patches enable adaptive variable refresh on X for AMD hardware 
>> provided that the user sets the variable_refresh_enabled property to true on 
>> supported connectors (ie. using xrandr --set-prop).
>>
>> The patches have been tested as working on upstream userland with the GNOME 
>> desktop environment under a single monitor setup. They also work on KDE in 
>> single monitor setup if the compositor is disabled.
>>
>> The patches require that the application window can issue screen flips via 
>> the Present extension to xf86-video-amdgpu. Due to Present extension 
>> limitations some desktop environments and multi-monitor setups are currently 
>> not compatible.
>>
>> Full implementation details for these changes can be reviewed in their 
>> respective mailing lists.
>>
>> === Previous discussions ===
>>
>> These patches are based upon feedback from patches and feedback from two 
>> previous threads on the subject which are linked below for reference:
>>
>> https://lists.freedesktop.org/archives/amd-gfx/2018-April/021047.html
>> https://lists.freedesktop.org/archives/dri-devel/2017-October/155207.html
>> https://lists.freedesktop.org/archives/dri-devel/2018-September/189404.html
>>
>> Nicholas Kazlauskas
>>
>> Nicholas Kazlauskas (3):
>>   drm: Add variable refresh rate properties to connector
>>   drm: Add variable refresh property to DRM CRTC
>>   drm/amd/display: Set FreeSync state using DRM VRR properties
> 
> Please 

Re: [PATCH v2] drm/amd/display: Signal hw_done() after waiting for flip_done()

2018-10-02 Thread Harry Wentland


On 2018-10-01 07:26 PM, sunpeng...@amd.com wrote:
> From: Shirish S 
> 
> In amdgpu_dm_commit_tail(), wait until flip_done() is signaled before
> we signal hw_done().
> 
> [Why]
> 
> This is to temporary address a paging error that occurs when a

s/temporary/temporarily/g

> nonblocking commit contends with another commit, particularly in a
> mirrored multi-display configuration. The error occurs in
> drm_atomic_helper_wait_for_flip_done(), when we attempt to access the
> contents of new_crtc_state->commit.
> 
> Here's the sequence for a mirrored 2 display setup (irrelevant steps
> left out for clarity):
> 
> **THREAD 1**| **THREAD 2**
> |
> Initialize atomic state for flip|
> |
> Queue worker|
>...
> 
> | Do work for flip
> |
> | Signal hw_done() on CRTC 1
> | Signal hw_done() on CRTC 2
> |
> | Wait for flip_done() on CRTC 1
> 
> < **PREEMPTED BY THREAD 1**
> 
> Initialize atomic state for cursor
> update (1)  |
> |
> Do cursor update worker |
> |
> Clear atomic state (2)  |
> **DONE**|
>...
> |
> | Wait for flip_done() on CRTC 2
> | *ERROR*
> |
> 
> The issue starts with (1). When the atomic state is initialized, the
> current CRTC states are duplicated to be the new_crtc_states, and
> referenced to be the old_crtc_states. (The new_crtc_states are to be
> filled with update data.)
> 
> Some things to note:
> 
> * Due to the mirrored configuration, the cursor updates on both CRTCs.
> 
> * At this point, the pflip IRQ has already been handled, and flip_done
>   signaled on all CRTCs. The cursor commit can therefore continue.
> 
> * The old_crtc_states used by the cursor update are the **same states**
>   as the new_crtc_states used by the flip worker.
> 
> At (2), the old_crtc_state is freed (*), and the cursor commit
> completes. We then context switch back to the flip worker, where we
> attempt to access the new_crtc_state->commit object. This is
> problematic, as this state has already been freed.
> 
> (*) Technically, 'state->crtcs[i].state' is freed, which was made to
> reference old_crtc_state in drm_atomic_helper_swap_state()
> 
> [How]
> 
> By moving hw_done() after wait_for_flip_done(), we're guaranteed that
> the new_crtc_state (from the flip worker's perspective) still exists.
> This is because any other commit will be blocked, waiting for the
> hw_done() signal.
> 
> Note that both the i915 and imx drivers have this sequence flipped
> already, masking this problem.
> 
> Signed-off-by: Shirish S 
> Signed-off-by: Leo Li 

Thanks for the additional explanation. Took me a while to understand what was 
happening here. Let's keep pursuing a proper solution that allows us to flip 
hw_done and wait_for_flip_done again but since that might be non-trivial this 
change is

Reviewed-by: Harry Wentland 

Harry

> ---
> 
> Hi Shirish,
> 
> Sorry for the late reply. I took some time to dig at this issue, and found
> that a proper fix will need more thought. I'm OK with this fix for now, until
> we can address it with a proper fix.
> 
> I've updated the commit message and comments below to reflect the underlying
> issue.
> 
> Thanks,
> Leo
> 
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 774db34..bbfffaf 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -4755,12 +4755,18 @@ static void amdgpu_dm_atomic_commit_tail(struct 
> drm_atomic_state *state)
>   }
>   spin_unlock_irqrestore(>ddev->event_lock, flags);
>  
> - /* Signal HW programming completion */
> - drm_atomic_helper_commit_hw_done(state);
>  
>   if (wait_for_vblank)
>   drm_atomic_helper_wait_for_flip_done(dev, state);
>  
> + /*
> +  * FIXME:
> +  * Delay hw_done() until flip_done() is signaled. This is to block
> +  * another commit from freeing the CRTC state while we're still
> +  * waiting on flip_done.
> +  */
> + drm_atomic_helper_commit_hw_done(state);
> +
>   drm_atomic_helper_cleanup_planes(dev, state);
>  
>   /*
> 

RE: [PATCH] drm/amdgpu/vega20: make power profile output more consistent

2018-10-02 Thread Russell, Kent
I don't do any parsing of the names of the Power Profiles in the SMI, this was 
just for the smi_lib which expects specific string formats. I had no issues 
with it, since it wouldn't muck with the SMI as it currently exists. But it 
will help if I decide to try to parse these in the SMI tool in the future.

 Kent

-Original Message-
From: Kuehling, Felix 
Sent: Monday, October 01, 2018 4:34 PM
To: Deucher, Alexander ; Quan, Evan 
; Alex Deucher ; 
amd-gfx@lists.freedesktop.org; Russell, Kent 
Subject: Re: [PATCH] drm/amdgpu/vega20: make power profile output more 
consistent

On 2018-10-01 04:04 PM, Deucher, Alexander wrote:
>> -Original Message-
>> From: amd-gfx  On Behalf Of 
>> Felix Kuehling
>> Sent: Monday, October 1, 2018 3:49 PM
>> To: Quan, Evan ; Alex Deucher 
>> ; amd-gfx@lists.freedesktop.org; Russell, Kent 
>> 
>> Cc: Deucher, Alexander 
>> Subject: Re: [PATCH] drm/amdgpu/vega20: make power profile output 
>> more consistent
>>
>> [+Kent]
>>
>> This may break the rocm-smi tool, which parses the power profile output.
>>
> This was at the request of rocm-smi. Since the current rocm-smi couldn't 
> parse the new format.  This makes it more similar to previous asics.

Thanks! I missed that this was for Vega20.

Regards,
  Felix

>
> Alex
>
>> Regards,
>>   Felix
>>
>>
>> On 2018-09-27 09:15 PM, Quan, Evan wrote:
>>> Reviewed-by: Evan Quan 
>>>
 -Original Message-
 From: amd-gfx  On Behalf Of 
 Alex Deucher
 Sent: 2018年9月27日 21:46
 To: amd-gfx@lists.freedesktop.org
 Cc: Deucher, Alexander 
 Subject: [PATCH] drm/amdgpu/vega20: make power profile output more 
 consistent

 Make the profile name line match previous generations more closely.

 E.g.,
 0 3D_FULL_SCREEN :
 vs:
 0(3D_FULL_SCREEN )

 Signed-off-by: Alex Deucher 
 ---
  drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
 b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
 index 2a554f9edcda..75945fc50b3a 100644
 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
 +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
 @@ -3165,7 +3165,7 @@ static int
>> vega20_get_power_profile_mode(struct
 pp_hwmgr *hwmgr, char *buf)
"[GetPowerProfile] Failed to get activity
>> monitor!",
return result);

 -  size += sprintf(buf + size, "%2d(%14s%s)\n",
 +  size += sprintf(buf + size, "%2d %14s%s:\n",
i, profile_name[i], (i == hwmgr-
> power_profile_mode) ? "*" : " ");
size += sprintf(buf + size,
 "%19s %d(%13s) %7d %7d %7d %7d %7d %7d %7d %7d %7d\n",
 --
 2.13.6

 ___
 amd-gfx mailing list
 amd-gfx@lists.freedesktop.org
 https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>> ___
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx