Re: [PATCH 24/26] drm/amdgpu: add RAS callback for gfx

2019-07-31 Thread Kevin Wang

On 8/1/19 1:58 AM, Alex Deucher wrote:
> From: Dennis Li 
>
> Add functions for RAS error inject and query error counter
>
> Signed-off-by: Dennis Li 
> Reviewed-by: Tao Zhou 
> Reviewed-by: Hawking Zhang 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |   2 +
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 530 +++-
>   2 files changed, 531 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index 1199b5828b90..554a59b3c4a6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -196,6 +196,8 @@ struct amdgpu_gfx_funcs {
>   uint32_t *dst);
>   void (*select_me_pipe_q)(struct amdgpu_device *adev, u32 me, u32 pipe,
>u32 queue, u32 vmid);
> + int (*ras_error_inject)(struct amdgpu_device *adev, void *inject_if);
> + int (*query_ras_error_count) (struct amdgpu_device *adev, void *ras_error_status);
>   };
>   
>   struct amdgpu_ngg_buf {
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index fba552b93cc8..d7902e782be4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -318,6 +318,135 @@ enum ta_ras_gfx_subblock {
>   TA_RAS_BLOCK__UTC_ATCL2_CACHE_4K_BANK,
>   TA_RAS_BLOCK__GFX_MAX
>   };
> +
> +struct ras_gfx_subblock {
> + unsigned char *name;
> + int ta_subblock;
> + int supported_error_type;
> +};
> +
> +#define AMDGPU_RAS_SUB_BLOCK(subblock, a, b, c, d)   \
> + [AMDGPU_RAS_BLOCK__##subblock] = { \
> + #subblock, \
> + TA_RAS_BLOCK__##subblock,  \
> + ((a) | ((b) << 1) | ((c) << 2) | ((d) << 3)),  \
> + }
> +
> +static const struct ras_gfx_subblock ras_gfx_subblocks[] = {
> + AMDGPU_RAS_SUB_BLOCK(GFX_CPC_SCRATCH, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_CPC_UCODE, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_DC_STATE_ME1, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_DC_CSINVOC_ME1, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_DC_RESTORE_ME1, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_DC_STATE_ME2, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_DC_CSINVOC_ME2, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_DC_RESTORE_ME2, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_CPF_ROQ_ME2, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_CPF_ROQ_ME1, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_CPF_TAG, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_CPG_DMA_ROQ, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_CPG_DMA_TAG, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_CPG_TAG, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_GDS_MEM, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_GDS_INPUT_QUEUE, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_GDS_OA_PHY_CMD_RAM_MEM, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_GDS_OA_PHY_DATA_RAM_MEM, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_GDS_OA_PIPE_MEM, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SPI_SR_MEM, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQ_SGPR, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQ_LDS_D, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQ_LDS_I, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQ_VGPR, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_UTCL1_LFIFO, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_CU0_WRITE_DATA_BUF, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_CU0_UTCL1_LFIFO, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_CU1_WRITE_DATA_BUF, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_CU1_UTCL1_LFIFO, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_CU2_WRITE_DATA_BUF, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_CU2_UTCL1_LFIFO, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKA_TAG_RAM, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKA_UTCL1_MISS_FIFO, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKA_MISS_FIFO, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKA_BANK_RAM, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKA_TAG_RAM, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKA_HIT_FIFO, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKA_MISS_FIFO, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKA_DIRTY_BIT_RAM, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKA_BANK_RAM, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKB_TAG_RAM, 0, 1, 1, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKB_UTCL1_MISS_FIFO, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKB_MISS_FIFO, 1, 0, 0, 1),
> + AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKB_BANK_RAM, 0, 1, 1, 1),
> + 

RE: [PATCH] drm/amdgpu: fix double ucode load by PSP(v3)

2019-07-31 Thread Zhang, Hawking
No objection from me for this patch. But I was really shocked at first glance
at the subject, and wondered how the amdgpu driver had survived with this bug
in the bare-metal case... It then turned out to be an SRIOV-specific bug,
because psp is initialized ahead of ih in the SRIOV use case. The status.hw
fix in the suspend/resume call stack looks reasonable to me. Patch is

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: Liu, Monk  
Sent: Thursday, August 1, 2019 11:43 AM
To: Deucher, Alexander ; Koenig, Christian 
; Zhang, Hawking 
Cc: Deng, Emily ; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH] drm/amdgpu: fix double ucode load by PSP(v3)

If there is no objection, I will submit those three patches, thanks

_
Monk Liu|GPU Virtualization Team |AMD


-Original Message-
From: Deng, Emily 
Sent: Wednesday, July 31, 2019 5:04 PM
To: Liu, Monk ; amd-gfx@lists.freedesktop.org
Cc: Liu, Monk 
Subject: RE: [PATCH] drm/amdgpu: fix double ucode load by PSP(v3)

All looks good to me. Reviewed-by: Emily Deng .

>-Original Message-
>From: amd-gfx  On Behalf Of Monk 
>Liu
>Sent: Wednesday, July 31, 2019 4:54 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Liu, Monk 
>Subject: [PATCH] drm/amdgpu: fix double ucode load by PSP(v3)
>
>previously the ucode loading of PSP was repeated: one copy executed in
>phase_1 init/re-init/resume and the other in the fw_loading routine
>
>Avoid this double loading by clearing ip_blocks.status.hw in suspend or 
>reset prior to the FW loading and any block's hw_init/resume
>
>v2:
>still do the smu fw loading since it is needed by bare-metal
>
>v3:
>drop the change in reinit_early_sriov; just clearing every block's status.hw
>up front and setting status.hw after hw_init is done is enough
>
>Signed-off-by: Monk Liu 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 59
>+++---
> 1 file changed, 38 insertions(+), 21 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 6cb358c..30436ba 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -1673,28 +1673,34 @@ static int amdgpu_device_fw_loading(struct amdgpu_device *adev)
>
>   if (adev->asic_type >= CHIP_VEGA10) {
>   for (i = 0; i < adev->num_ip_blocks; i++) {
>-  if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_PSP) {
>-  if (adev->in_gpu_reset || adev->in_suspend) {
>-  if (amdgpu_sriov_vf(adev) && adev->in_gpu_reset)
>-  break; /* sriov gpu reset, psp need to do hw_init before IH because of hw limit */
>-  r = adev->ip_blocks[i].version->funcs->resume(adev);
>-  if (r) {
>-  DRM_ERROR("resume of IP block <%s> failed %d\n",
>+  if (adev->ip_blocks[i].version->type != AMD_IP_BLOCK_TYPE_PSP)
>+  continue;
>+
>+  /* no need to do the fw loading again if already done */
>+  if (adev->ip_blocks[i].status.hw == true)
>+  break;
>+
>+  if (adev->in_gpu_reset || adev->in_suspend) {
>+  r = adev->ip_blocks[i].version->funcs->resume(adev);
>+  if (r) {
>+  DRM_ERROR("resume of IP block <%s> failed %d\n",
> adev->ip_blocks[i].version->funcs->name, r);
>-  return r;
>-  }
>-  } else {
>-  r = adev->ip_blocks[i].version->funcs->hw_init(adev);
>-  if (r) {
>-  DRM_ERROR("hw_init of IP block <%s> failed %d\n",
>-adev->ip_blocks[i].version->funcs->name, r);
>-  return r;
>-  }
>+  return r;
>+  }
>+  } else {
>+  r = adev->ip_blocks[i].version->funcs->hw_init(adev);
>+  if (r) {
>+  DRM_ERROR("hw_init of IP block <%s> failed %d\n",
>+adev->ip_blocks[i].version->funcs->name, r);
>+  return r;
>   }
>-  adev->ip_blocks[i].status.hw = true;
>   }
>+
>+  adev->ip_blocks[i].status.hw = true;
>+

RE: [PATCH] drm/amd/powerplay: sort feature status index by asic feature id for smu

2019-07-31 Thread Feng, Kenneth
Reviewed-by: Kenneth Feng 


-Original Message-
From: Wang, Kevin(Yang) 
Sent: Thursday, August 01, 2019 10:44 AM
To: Wang, Kevin(Yang) ; amd-gfx@lists.freedesktop.org
Cc: Feng, Kenneth ; Huang, Ray ; 
Deucher, Alexander 
Subject: Re: [PATCH] drm/amd/powerplay: sort feature status index by asic 
feature id for smu

ping...

please help me review it, thanks.

BR
Kevin

On 7/31/19 3:51 PM, Wang, Kevin(Yang) wrote:
> before this change, the pp_feature sysfs file showed feature enable state by
> logical feature id, which is not easy to read.
> this change sorts the pp_features output by asic feature id.
>
> before:
> features high: 0x0623 low: 0xb3cdaffb
> 00. DPM_PREFETCHER   ( 0) : enabeld
> 01. DPM_GFXCLK   ( 1) : enabeld
> 02. DPM_UCLK ( 3) : enabeld
> 03. DPM_SOCCLK   ( 4) : enabeld
> 04. DPM_MP0CLK   ( 5) : enabeld
> 05. DPM_LINK ( 6) : enabeld
> 06. DPM_DCEFCLK  ( 7) : enabeld
> 07. DS_GFXCLK(10) : enabeld
> 08. DS_SOCCLK(11) : enabeld
> 09. DS_LCLK  (12) : disabled
> 10. PPT  (23) : enabeld
> 11. TDC  (24) : enabeld
> 12. THERMAL  (33) : enabeld
> 13. RM   (35) : disabled
> ..
>
> after:
> features high: 0x0623 low: 0xb3cdaffb
> 00. DPM_PREFETCHER   ( 0) : enabeld
> 01. DPM_GFXCLK   ( 1) : enabeld
> 02. DPM_GFX_PACE ( 2) : disabled
> 03. DPM_UCLK ( 3) : enabeld
> 04. DPM_SOCCLK   ( 4) : enabeld
> 05. DPM_MP0CLK   ( 5) : enabeld
> 06. DPM_LINK ( 6) : enabeld
> 07. DPM_DCEFCLK  ( 7) : enabeld
> 08. MEM_VDDCI_SCALING( 8) : enabeld
> 09. MEM_MVDD_SCALING ( 9) : enabeld
> 10. DS_GFXCLK(10) : enabeld
> 11. DS_SOCCLK(11) : enabeld
> 12. DS_LCLK  (12) : disabled
> 13. DS_DCEFCLK   (13) : enabeld
> ..
>
> Signed-off-by: Kevin Wang 
> ---
>   drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 14 +++---
>   1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> index eabe8a6d0eb7..9e256aa3b357 100644
> --- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> @@ -62,6 +62,8 @@ size_t smu_sys_get_pp_feature_mask(struct smu_context *smu, char *buf)
>   uint32_t feature_mask[2] = { 0 };
>   int32_t feature_index = 0;
>   uint32_t count = 0;
> + uint32_t sort_feature[SMU_FEATURE_COUNT];
> + uint64_t hw_feature_count = 0;
>   
>   ret = smu_feature_get_enabled_mask(smu, feature_mask, 2);
>   if (ret)
> @@ -74,11 +76,17 @@ size_t smu_sys_get_pp_feature_mask(struct smu_context *smu, char *buf)
>   feature_index = smu_feature_get_index(smu, i);
>   if (feature_index < 0)
>   continue;
> + sort_feature[feature_index] = i;
> + hw_feature_count++;
> + }
> +
> + for (i = 0; i < hw_feature_count; i++) {
>   size += sprintf(buf + size, "%02d. %-20s (%2d) : %s\n",
>  count++,
> -smu_get_feature_name(smu, i),
> -feature_index,
> -!!smu_feature_is_enabled(smu, i) ? "enabeld" : "disabled");
> +smu_get_feature_name(smu, sort_feature[i]),
> +i,
> +!!smu_feature_is_enabled(smu, sort_feature[i]) ?
> +"enabeld" : "disabled");
>   }
>   
>   failed:
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
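
The core of the patch above is an index inversion: smu_feature_get_index()
maps a logical feature id to its ASIC (hardware) feature id, and the patch
builds the reverse map so the list can be printed in ascending hardware-id
order. A minimal sketch of the idea, using only names that appear in the
patch and assuming the supported hardware ids are dense starting at 0 (the
patch's second loop makes the same assumption):

    uint32_t sort_feature[SMU_FEATURE_COUNT];
    uint64_t hw_feature_count = 0;
    int feature_index;
    unsigned int i;

    for (i = 0; i < SMU_FEATURE_COUNT; i++) {
        feature_index = smu_feature_get_index(smu, i); /* logical -> hw id */
        if (feature_index < 0)
            continue;                  /* feature absent on this ASIC */
        sort_feature[feature_index] = i;   /* invert: hw id -> logical id */
        hw_feature_count++;
    }
    /* iterating sort_feature[0..hw_feature_count-1] now walks hardware
     * ids in ascending order */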

RE: [PATCH] drm/amdgpu: fix double ucode load by PSP(v3)

2019-07-31 Thread Liu, Monk
If there is no objection, I will submit those three patches, thanks

_
Monk Liu|GPU Virtualization Team |AMD


-Original Message-
From: Deng, Emily  
Sent: Wednesday, July 31, 2019 5:04 PM
To: Liu, Monk ; amd-gfx@lists.freedesktop.org
Cc: Liu, Monk 
Subject: RE: [PATCH] drm/amdgpu: fix double ucode load by PSP(v3)

All looks good to me. Reviewed-by: Emily Deng .

>-Original Message-
>From: amd-gfx  On Behalf Of Monk 
>Liu
>Sent: Wednesday, July 31, 2019 4:54 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Liu, Monk 
>Subject: [PATCH] drm/amdgpu: fix double ucode load by PSP(v3)
>
>previously the ucode loading of PSP was repeated: one copy executed in
>phase_1 init/re-init/resume and the other in the fw_loading routine
>
>Avoid this double loading by clearing ip_blocks.status.hw in suspend or 
>reset prior to the FW loading and any block's hw_init/resume
>
>v2:
>still do the smu fw loading since it is needed by bare-metal
>
>v3:
>drop the change in reinit_early_sriov; just clearing every block's status.hw
>up front and setting status.hw after hw_init is done is enough
>
>Signed-off-by: Monk Liu 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 59
>+++---
> 1 file changed, 38 insertions(+), 21 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 6cb358c..30436ba 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -1673,28 +1673,34 @@ static int amdgpu_device_fw_loading(struct amdgpu_device *adev)
>
>   if (adev->asic_type >= CHIP_VEGA10) {
>   for (i = 0; i < adev->num_ip_blocks; i++) {
>-  if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_PSP) {
>-  if (adev->in_gpu_reset || adev->in_suspend) {
>-  if (amdgpu_sriov_vf(adev) && adev->in_gpu_reset)
>-  break; /* sriov gpu reset, psp need to do hw_init before IH because of hw limit */
>-  r = adev->ip_blocks[i].version->funcs->resume(adev);
>-  if (r) {
>-  DRM_ERROR("resume of IP block <%s> failed %d\n",
>+  if (adev->ip_blocks[i].version->type != AMD_IP_BLOCK_TYPE_PSP)
>+  continue;
>+
>+  /* no need to do the fw loading again if already done */
>+  if (adev->ip_blocks[i].status.hw == true)
>+  break;
>+
>+  if (adev->in_gpu_reset || adev->in_suspend) {
>+  r = adev->ip_blocks[i].version->funcs->resume(adev);
>+  if (r) {
>+  DRM_ERROR("resume of IP block <%s> failed %d\n",
> adev->ip_blocks[i].version->funcs->name, r);
>-  return r;
>-  }
>-  } else {
>-  r = adev->ip_blocks[i].version->funcs->hw_init(adev);
>-  if (r) {
>-  DRM_ERROR("hw_init of IP block <%s> failed %d\n",
>-adev->ip_blocks[i].version->funcs->name, r);
>-  return r;
>-  }
>+  return r;
>+  }
>+  } else {
>+  r = adev->ip_blocks[i].version->funcs->hw_init(adev);
>+  if (r) {
>+  DRM_ERROR("hw_init of IP block <%s> failed %d\n",
>+adev->ip_blocks[i].version->funcs->name, r);
>+  return r;
>   }
>-  adev->ip_blocks[i].status.hw = true;
>   }
>+
>+  adev->ip_blocks[i].status.hw = true;
>+  break;
>   }
>   }
>+
>   r = amdgpu_pm_load_smu_firmware(adev, &smu_version);
>
>   return r;
>@@ -2136,7 +2142,9 @@ static int amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)
>   if (r) {
>   DRM_ERROR("suspend of IP block <%s> failed %d\n",
> adev->ip_blocks[i].version->funcs->name, r);
>+  return r;
>   }
>+  adev->ip_blocks[i].status.hw = false;
>   }
>   }
>
>@@ -2176,14 +2184,16 @@ static int
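
Reconstructed for readability, the post-patch control flow of
amdgpu_device_fw_loading() looks like this (a sketch assembled from the hunks
above, with the two DRM_ERROR paths condensed into one; not the committed
code):

    if (adev->asic_type >= CHIP_VEGA10) {
        for (i = 0; i < adev->num_ip_blocks; i++) {
            if (adev->ip_blocks[i].version->type != AMD_IP_BLOCK_TYPE_PSP)
                continue;

            /* no need to do the fw loading again if already done */
            if (adev->ip_blocks[i].status.hw == true)
                break;

            if (adev->in_gpu_reset || adev->in_suspend)
                r = adev->ip_blocks[i].version->funcs->resume(adev);
            else
                r = adev->ip_blocks[i].version->funcs->hw_init(adev);
            if (r)
                return r;       /* DRM_ERROR reporting omitted here */

            adev->ip_blocks[i].status.hw = true;
            break;              /* only one PSP block to handle */
        }
    }

    r = amdgpu_pm_load_smu_firmware(adev, &smu_version);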

RE: [PATCH] drm/amd/powerplay: sort feature status index by asic feature id for smu

2019-07-31 Thread Quan, Evan
Reviewed-by: Evan Quan 

> -Original Message-
> From: amd-gfx  On Behalf Of
> Kevin Wang
> Sent: Thursday, August 01, 2019 10:44 AM
> To: Wang, Kevin(Yang) ; amd-
> g...@lists.freedesktop.org
> Cc: Deucher, Alexander ; Huang, Ray
> ; Feng, Kenneth 
> Subject: Re: [PATCH] drm/amd/powerplay: sort feature status index by asic
> feature id for smu
> 
> ping...
> 
> please help me review it, thanks.
> 
> BR
> Kevin
> 
> On 7/31/19 3:51 PM, Wang, Kevin(Yang) wrote:
> > before this change, the pp_feature sysfs file showed feature enable state by
> > logical feature id, which is not easy to read.
> > this change sorts the pp_features output by asic feature id.
> >
> > before:
> > features high: 0x0623 low: 0xb3cdaffb
> > 00. DPM_PREFETCHER   ( 0) : enabeld
> > 01. DPM_GFXCLK   ( 1) : enabeld
> > 02. DPM_UCLK ( 3) : enabeld
> > 03. DPM_SOCCLK   ( 4) : enabeld
> > 04. DPM_MP0CLK   ( 5) : enabeld
> > 05. DPM_LINK ( 6) : enabeld
> > 06. DPM_DCEFCLK  ( 7) : enabeld
> > 07. DS_GFXCLK(10) : enabeld
> > 08. DS_SOCCLK(11) : enabeld
> > 09. DS_LCLK  (12) : disabled
> > 10. PPT  (23) : enabeld
> > 11. TDC  (24) : enabeld
> > 12. THERMAL  (33) : enabeld
> > 13. RM   (35) : disabled
> > ..
> >
> > after:
> > features high: 0x0623 low: 0xb3cdaffb
> > 00. DPM_PREFETCHER   ( 0) : enabeld
> > 01. DPM_GFXCLK   ( 1) : enabeld
> > 02. DPM_GFX_PACE ( 2) : disabled
> > 03. DPM_UCLK ( 3) : enabeld
> > 04. DPM_SOCCLK   ( 4) : enabeld
> > 05. DPM_MP0CLK   ( 5) : enabeld
> > 06. DPM_LINK ( 6) : enabeld
> > 07. DPM_DCEFCLK  ( 7) : enabeld
> > 08. MEM_VDDCI_SCALING( 8) : enabeld
> > 09. MEM_MVDD_SCALING ( 9) : enabeld
> > 10. DS_GFXCLK(10) : enabeld
> > 11. DS_SOCCLK(11) : enabeld
> > 12. DS_LCLK  (12) : disabled
> > 13. DS_DCEFCLK   (13) : enabeld
> > ..
> >
> > Signed-off-by: Kevin Wang 
> > ---
> >   drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 14 +++---
> >   1 file changed, 11 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> > index eabe8a6d0eb7..9e256aa3b357 100644
> > --- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> > +++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> > @@ -62,6 +62,8 @@ size_t smu_sys_get_pp_feature_mask(struct smu_context *smu, char *buf)
> > uint32_t feature_mask[2] = { 0 };
> > int32_t feature_index = 0;
> > uint32_t count = 0;
> > +   uint32_t sort_feature[SMU_FEATURE_COUNT];
> > +   uint64_t hw_feature_count = 0;
> >
> > ret = smu_feature_get_enabled_mask(smu, feature_mask, 2);
> > if (ret)
> > @@ -74,11 +76,17 @@ size_t smu_sys_get_pp_feature_mask(struct smu_context *smu, char *buf)
> > feature_index = smu_feature_get_index(smu, i);
> > if (feature_index < 0)
> > continue;
> > +   sort_feature[feature_index] = i;
> > +   hw_feature_count++;
> > +   }
> > +
> > +   for (i = 0; i < hw_feature_count; i++) {
> > size += sprintf(buf + size, "%02d. %-20s (%2d) : %s\n",
> >count++,
> > -  smu_get_feature_name(smu, i),
> > -  feature_index,
> > -  !!smu_feature_is_enabled(smu, i) ? "enabeld" : "disabled");
> > +  smu_get_feature_name(smu, sort_feature[i]),
> > +  i,
> > +  !!smu_feature_is_enabled(smu, sort_feature[i]) ?
> > +  "enabeld" : "disabled");
> > }
> >
> >   failed:
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amd/powerplay: Allow changing of fan_control in smu_v11_0

2019-07-31 Thread Quan, Evan
Thanks Matt. The patch is reviewed-by: Evan Quan 

Regards,
Evan
> -Original Message-
> From: amd-gfx  On Behalf Of
> Matt Coffin
> Sent: Thursday, August 01, 2019 4:15 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Matt Coffin 
> Subject: [PATCH] drm/amd/powerplay: Allow changing of fan_control in
> smu_v11_0
> 
> [Why]
> Before this change, the fan control state on smu_v11 was not able to be
> changed because the capability check for checking if the fan control 
> capability
> existed was inverted.
> 
> [How]
> The capability check for fan control in smu_v11_0_auto_fan_control was
> inverted, to correctly check for the absence, instead of presence of fan
> control capabilities.
> 
> Signed-off-by: Matt Coffin 
> ---
>  drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
> index 0588dd8cd1ba..43fcbdbba630 100644
> --- a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
> +++ b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
> @@ -1361,7 +1361,7 @@ smu_v11_0_auto_fan_control(struct smu_context *smu, bool auto_fan_control)
>  {
>   int ret = 0;
> 
> - if (smu_feature_is_supported(smu, SMU_FEATURE_FAN_CONTROL_BIT))
> + if (!smu_feature_is_supported(smu, SMU_FEATURE_FAN_CONTROL_BIT))
>   return 0;
> 
>   ret = smu_feature_set_enabled(smu, SMU_FEATURE_FAN_CONTROL_BIT, auto_fan_control);
> --
> 2.22.0
> 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
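
Stripped of diff context, the corrected guard in smu_v11_0_auto_fan_control()
reads as follows (taken directly from the patch above):

    /* bail out when fan control is NOT supported; before the fix the
     * early return fired when it WAS supported, making the function a
     * no-op exactly on the ASICs that do have fan control */
    if (!smu_feature_is_supported(smu, SMU_FEATURE_FAN_CONTROL_BIT))
        return 0;

    ret = smu_feature_set_enabled(smu, SMU_FEATURE_FAN_CONTROL_BIT,
                                  auto_fan_control);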

Re: [PATCH] drm/amd/powerplay: sort feature status index by asic feature id for smu

2019-07-31 Thread Kevin Wang
ping...

please help me review it, thanks.

BR
Kevin

On 7/31/19 3:51 PM, Wang, Kevin(Yang) wrote:
> before this change, the pp_feature sysfs file showed feature enable state by
> logical feature id, which is not easy to read.
> this change sorts the pp_features output by asic feature id.
>
> before:
> features high: 0x0623 low: 0xb3cdaffb
> 00. DPM_PREFETCHER   ( 0) : enabeld
> 01. DPM_GFXCLK   ( 1) : enabeld
> 02. DPM_UCLK ( 3) : enabeld
> 03. DPM_SOCCLK   ( 4) : enabeld
> 04. DPM_MP0CLK   ( 5) : enabeld
> 05. DPM_LINK ( 6) : enabeld
> 06. DPM_DCEFCLK  ( 7) : enabeld
> 07. DS_GFXCLK(10) : enabeld
> 08. DS_SOCCLK(11) : enabeld
> 09. DS_LCLK  (12) : disabled
> 10. PPT  (23) : enabeld
> 11. TDC  (24) : enabeld
> 12. THERMAL  (33) : enabeld
> 13. RM   (35) : disabled
> ..
>
> after:
> features high: 0x0623 low: 0xb3cdaffb
> 00. DPM_PREFETCHER   ( 0) : enabeld
> 01. DPM_GFXCLK   ( 1) : enabeld
> 02. DPM_GFX_PACE ( 2) : disabled
> 03. DPM_UCLK ( 3) : enabeld
> 04. DPM_SOCCLK   ( 4) : enabeld
> 05. DPM_MP0CLK   ( 5) : enabeld
> 06. DPM_LINK ( 6) : enabeld
> 07. DPM_DCEFCLK  ( 7) : enabeld
> 08. MEM_VDDCI_SCALING( 8) : enabeld
> 09. MEM_MVDD_SCALING ( 9) : enabeld
> 10. DS_GFXCLK(10) : enabeld
> 11. DS_SOCCLK(11) : enabeld
> 12. DS_LCLK  (12) : disabled
> 13. DS_DCEFCLK   (13) : enabeld
> ..
>
> Signed-off-by: Kevin Wang 
> ---
>   drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 14 +++---
>   1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> index eabe8a6d0eb7..9e256aa3b357 100644
> --- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> @@ -62,6 +62,8 @@ size_t smu_sys_get_pp_feature_mask(struct smu_context *smu, char *buf)
>   uint32_t feature_mask[2] = { 0 };
>   int32_t feature_index = 0;
>   uint32_t count = 0;
> + uint32_t sort_feature[SMU_FEATURE_COUNT];
> + uint64_t hw_feature_count = 0;
>   
>   ret = smu_feature_get_enabled_mask(smu, feature_mask, 2);
>   if (ret)
> @@ -74,11 +76,17 @@ size_t smu_sys_get_pp_feature_mask(struct smu_context *smu, char *buf)
>   feature_index = smu_feature_get_index(smu, i);
>   if (feature_index < 0)
>   continue;
> + sort_feature[feature_index] = i;
> + hw_feature_count++;
> + }
> +
> + for (i = 0; i < hw_feature_count; i++) {
>   size += sprintf(buf + size, "%02d. %-20s (%2d) : %s\n",
>  count++,
> -smu_get_feature_name(smu, i),
> -feature_index,
> -!!smu_feature_is_enabled(smu, i) ? "enabeld" : "disabled");
> +smu_get_feature_name(smu, sort_feature[i]),
> +i,
> +!!smu_feature_is_enabled(smu, sort_feature[i]) ?
> +"enabeld" : "disabled");
>   }
>   
>   failed:
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH v2] drm/radeon: Fix EEH during kexec

2019-07-31 Thread KyleMahlkuch
During kexec some adapters hit an EEH since they are not properly
shut down in the radeon_pci_shutdown() function. Adding
radeon_suspend_kms() fixes this issue.

Signed-off-by: Kyle Mahlkuch 
---
 drivers/gpu/drm/radeon/radeon_drv.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index a6cbe11..15d7beb 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -349,11 +349,19 @@ static int radeon_pci_probe(struct pci_dev *pdev,
 static void
 radeon_pci_shutdown(struct pci_dev *pdev)
 {
+   struct drm_device *ddev = pci_get_drvdata(pdev);
+
/* if we are running in a VM, make sure the device
 * torn down properly on reboot/shutdown
 */
if (radeon_device_is_virtual())
radeon_pci_remove(pdev);
+
+   /* Some adapters need to be suspended before a
+   * shutdown occurs in order to prevent an error
+   * during kexec.
+   */
+   radeon_suspend_kms(ddev, true, true, false);
 }
 
 static int radeon_pmops_suspend(struct device *dev)
-- 
1.8.3.1



Re: [PATCH] drm/amdkfd: Remove GPU ID in GWS queue creation

2019-07-31 Thread Kuehling, Felix
On 2019-07-26 9:27 p.m., Greathouse, Joseph wrote:
> The gpu_id argument is not needed when enabling GWS on a queue.
> The queue can only be associated with one device, so the only
> possible situations for the call as previously defined were:
> 1) the gpu_id was for the device associated with the target queue
> and things worked as expected, or 2) the gpu_id was for a device
> not associated with the target queue and the request was undefined.
>
> In particular, the previous result of the undefined operation is
> that you would allocate the number of GWS entries available on the
> gpu_id device, even if the queue was on a device with a different
> number available. For example: a queue on a device without GWS
> capability, but the user passes in a gpu_id for a device with GWS.
> We would end up trying to allocate GWS on the device that does not
> support it.
>
> Rather than leaving this footgun around and making life more
> difficult for user space, we can instead grab the gpu_id from the
> target queue. The gpu_id argument being passed in is thus not
> needed. We thus change the field in the ioctl struct to be reserved
> so that nobody expects it to do anything. However, we do not remove
> because that would break user-land API compatibility.
>
> Change-Id: I861cebc8a0a7eab5360da10971a73d5a4700c6d8
> Signed-off-by: Joseph Greathouse 

Cosmetic comments inline. Otherwise this looks good to me.


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 19 +--
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  2 ++
>   .../amd/amdkfd/kfd_process_queue_manager.c| 12 
>   include/uapi/linux/kfd_ioctl.h|  4 ++--
>   4 files changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index f91126f5f1be..46005b1dcf79 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -1572,20 +1572,27 @@ static int kfd_ioctl_alloc_queue_gws(struct file *filep,
>   {
>   int retval;
>   struct kfd_ioctl_alloc_queue_gws_args *args = data;
> + struct queue *q;
>   struct kfd_dev *dev;
>   
>   if (!hws_gws_support)
>   return -ENODEV;
>   
> - dev = kfd_device_by_id(args->gpu_id);
> - if (!dev) {
> - pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
> - return -ENODEV;
> + mutex_lock(&p->mutex);
> + q = pqm_get_user_queue(&p->pqm, args->queue_id);
> +
> + if (q)
> + dev = q->device;

Please use {} around the if-block for consistency with the else-block.


> + else {
> + mutex_unlock(&p->mutex);

I'd prefer the error handling with a goto out_unlock. That's a common 
convention in kernel code to minimize error prone duplication of 
unwinding code for error handling.


> + return -EINVAL;
>   }
> - if (dev->dqm->sched_policy == KFD_SCHED_POLICY_NO_HWS)
> +
> + if (dev->dqm->sched_policy == KFD_SCHED_POLICY_NO_HWS) {
> + mutex_unlock(&p->mutex);
>   return -ENODEV;
> + }
>   
> - mutex_lock(&p->mutex);
>   retval = pqm_set_gws(&p->pqm, args->queue_id, args->num_gws ? dev->gws : NULL);
>   mutex_unlock(&p->mutex);
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index aa7bf20d20f8..9b9a8da187c8 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -915,6 +915,8 @@ int pqm_set_gws(struct process_queue_manager *pqm, unsigned int qid,
>   void *gws);
>   struct kernel_queue *pqm_get_kernel_queue(struct process_queue_manager *pqm,
>   unsigned int qid);
> +struct queue *pqm_get_user_queue(struct process_queue_manager *pqm,
> + unsigned int qid);
>   int pqm_get_wave_state(struct process_queue_manager *pqm,
>  unsigned int qid,
>  void __user *ctl_stack,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index 7e6c3ee82f5b..20dae1fdb16a 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -473,6 +473,18 @@ struct kernel_queue *pqm_get_kernel_queue(
>   return NULL;
>   }
>   
> +struct queue *pqm_get_user_queue(struct process_queue_manager *pqm,
> + unsigned int qid)
> +{
> + struct process_queue_node *pqn;
> +
> + pqn = get_queue_by_qid(pqm, qid);
> + if (pqn && pqn->q)

It would be sufficient to check if (pqn) here. If pqn->q is NULL, you'll 
return NULL either way. You could even condense it into a single return 
statement:

     return pqn ? pqn->q : NULL;

Regards,
   Felix


> + return pqn->q;
> +
> + 
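
Folding Felix's two suggestions above into the new helper gives the following
shape (a sketch of the suggested result, not necessarily the committed
version):

    struct queue *pqm_get_user_queue(struct process_queue_manager *pqm,
                                     unsigned int qid)
    {
        struct process_queue_node *pqn = get_queue_by_qid(pqm, qid);

        /* pqn->q may itself be NULL, so the ternary covers both cases */
        return pqn ? pqn->q : NULL;
    }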

Re: [PATCH 2/2] drm/amdkfd: enable KFD support for navi14

2019-07-31 Thread Kuehling, Felix

On 2019-07-31 12:33 a.m., Yuan, Xiaojie wrote:
> Reviewed-by: Xiaojie Yuan 
>
> BR,
> Xiaojie
>
> 
> From: amd-gfx  on behalf of Alex 
> Deucher 
> Sent: Saturday, July 27, 2019 3:16 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander
> Subject: [PATCH 2/2] drm/amdkfd: enable KFD support for navi14
>
> Same as navi10.
>
> Signed-off-by: Alex Deucher 
Sorry, I just got back from vacation. This patch is Reviewed-by: Felix 
Kuehling 

But for this to do anything useful, we'll also need to add the Navi14 
device IDs and some ASIC info to KFD. Otherwise you'll probably hit this 
dev_warn in kfd_device.c: lookup_device_info:

     dev_warn(kfd_device, "DID %04x is missing in supported_devices\n",
  did);

Regards,
   Felix


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> index f052c70e4659..97f7c5235cc9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> @@ -91,6 +91,7 @@ void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev)
>  kfd2kgd = amdgpu_amdkfd_arcturus_get_functions();
>  break;
>  case CHIP_NAVI10:
> +   case CHIP_NAVI14:
>  kfd2kgd = amdgpu_amdkfd_gfx_10_0_get_functions();
>  break;
>  default:
> --
> 2.20.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 1/8] drm/amdgpu: drop drmP.h in amdgpu_amdkfd_arcturus.c

2019-07-31 Thread Kuehling, Felix
On 2019-07-31 11:52 a.m., Alex Deucher wrote:
> Unused.
>
> Signed-off-by: Alex Deucher 
The series is

Reviewed-by: Felix Kuehling 


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 1 -
>   1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> index 4d9101834ba7..c79aaebeeaf0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> @@ -28,7 +28,6 @@
>   #include 
>   #include 
>   #include 
> -#include 
>   #include "amdgpu.h"
>   #include "amdgpu_amdkfd.h"
>   #include "sdma0/sdma0_4_2_2_offset.h"
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amd/powerplay: Allow changing of fan_control in smu_v11_0

2019-07-31 Thread Matt Coffin
[Why]
Before this change, the fan control state on smu_v11 was not able to be
changed because the capability check for checking if the fan control
capability existed was inverted.

[How]
The capability check for fan control in smu_v11_0_auto_fan_control was
inverted, to correctly check for the absence, instead of presence of fan
control capabilities.

Signed-off-by: Matt Coffin 
---
 drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
index 0588dd8cd1ba..43fcbdbba630 100644
--- a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
+++ b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
@@ -1361,7 +1361,7 @@ smu_v11_0_auto_fan_control(struct smu_context *smu, bool auto_fan_control)
 {
int ret = 0;
 
-   if (smu_feature_is_supported(smu, SMU_FEATURE_FAN_CONTROL_BIT))
+   if (!smu_feature_is_supported(smu, SMU_FEATURE_FAN_CONTROL_BIT))
return 0;
 
ret = smu_feature_set_enabled(smu, SMU_FEATURE_FAN_CONTROL_BIT, 
auto_fan_control);
-- 
2.22.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH v4 14/23] drm/tilcdc: Provide ddc symlink in connector sysfs directory

2019-07-31 Thread Ezequiel Garcia
Hi,

I'm glad to see this work moving forward!

On Wed, 2019-07-24 at 10:01 +0200, Thomas Zimmermann wrote:
> Hi
> 
> Am 23.07.19 um 14:44 schrieb Andrzej Pietrasiewicz:
> > Hi Sam,
> > 
> > W dniu 23.07.2019 o 11:05, Sam Ravnborg pisze:
> > > Hi Andrzej
> > > 
> > > On Thu, Jul 11, 2019 at 01:26:41PM +0200, Andrzej Pietrasiewicz wrote:
> > > > Use the ddc pointer provided by the generic connector.
> > > > 
> > > > Signed-off-by: Andrzej Pietrasiewicz 
> > > > ---
> > > >   drivers/gpu/drm/tilcdc/tilcdc_tfp410.c | 1 +
> > > >   1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/tilcdc/tilcdc_tfp410.c b/drivers/gpu/drm/tilcdc/tilcdc_tfp410.c
> > > > index 62d014c20988..c373edb95666 100644
> > > > --- a/drivers/gpu/drm/tilcdc/tilcdc_tfp410.c
> > > > +++ b/drivers/gpu/drm/tilcdc/tilcdc_tfp410.c
> > > > @@ -219,6 +219,7 @@ static struct drm_connector *tfp410_connector_create(struct drm_device *dev,
> > > >   tfp410_connector->mod = mod;
> > > > connector = &tfp410_connector->base;
> > > > +connector->ddc = mod->i2c;
> > > > drm_connector_init(dev, connector, &tfp410_connector_funcs,
> > > >   DRM_MODE_CONNECTOR_DVID);
> > > 
> > > When reading this code, it looks strange that we set connector->ddc
> > > *before* the call to init the connector.
> > > One could risk that drm_connector_init() used memset(..) to clear all
> > > fields or so, and it would break this order.
> > 
> > I verified the code of drm_connector_init() and cannot find any memset()
> > invocations there. What is your actual concern?
> 
> I think this echoes my concern about the implicit order of operation. It
> seems too easy to get this wrong. If you don't want to add an additional
> interface for setting the ddc field, why not add a dedicated initializer
> function that sets the ddc field? Something like this.
> 
> int drm_connector_init_with_ddc(connector, funcs, ..., ddc)
> {
>   ret = drm_connector_init(connector, funcs, ...);
>   if (ret)
>   return ret;
> 
>   if (!ddc)
>   return 0;
> 
>   connector->ddc = ddc;
>   /* set up sysfs */
> 

I know this comment comes late to the party, but I'm slightly
surprised to see the above instead of implementing drm_connector_init
in terms of drm_connector_init_with_ddc, as we typically do.

Namely, something along these lines (code might not even build!):

--8<-
diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c
index d49e19f3de3a..dbd095933175 100644
--- a/drivers/gpu/drm/drm_connector.c
+++ b/drivers/gpu/drm/drm_connector.c
@@ -179,11 +179,12 @@ void drm_connector_free_work_fn(struct work_struct *work)
 }
 
 /**
- * drm_connector_init - Init a preallocated connector
+ * drm_connector_init_with_ddc - Init a preallocated connector
  * @dev: DRM device
  * @connector: the connector to init
  * @funcs: callbacks for this connector
  * @connector_type: user visible type of the connector
+ * @ddc: pointer to the associated ddc adapter (optional)
  *
  * Initialises a preallocated connector. Connectors should be
  * subclassed as part of driver connector objects.
@@ -191,10 +192,11 @@ void drm_connector_free_work_fn(struct work_struct *work)
  * Returns:
  * Zero on success, error code on failure.
  */
-int drm_connector_init(struct drm_device *dev,
-  struct drm_connector *connector,
-  const struct drm_connector_funcs *funcs,
-  int connector_type)
+int drm_connector_init_with_ddc(struct drm_device *dev,
+   struct drm_connector *connector,
+   const struct drm_connector_funcs *funcs,
+   int connector_type,
+   struct i2c_adapter *ddc)
 {
	struct drm_mode_config *config = &dev->mode_config;
int ret;
@@ -215,6 +217,9 @@ int drm_connector_init(struct drm_device *dev,
connector->dev = dev;
connector->funcs = funcs;
 
+   /* provide ddc symlink in sysfs */
+   connector->ddc = ddc;
+
/* connector index is used with 32bit bitmasks */
	ret = ida_simple_get(&config->connector_ida, 0, 32, GFP_KERNEL);
if (ret < 0) {
@@ -295,41 +300,6 @@ int drm_connector_init(struct drm_device *dev,
 
return ret;
 }
-EXPORT_SYMBOL(drm_connector_init);
-
-/**
- * drm_connector_init_with_ddc - Init a preallocated connector
- * @dev: DRM device
- * @connector: the connector to init
- * @funcs: callbacks for this connector
- * @connector_type: user visible type of the connector
- * @ddc: pointer to the associated ddc adapter
- *
- * Initialises a preallocated connector. Connectors should be
- * subclassed as part of driver connector objects.
- *
- * Ensures that the ddc field of the connector is correctly set.
- *
- * Returns:
- * Zero on success, error code on failure.
- */
-int 
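
Concretely, under this proposal drm_connector_init() becomes a thin wrapper
around the new function (a sketch following the diff above, not the committed
code):

    int drm_connector_init(struct drm_device *dev,
                           struct drm_connector *connector,
                           const struct drm_connector_funcs *funcs,
                           int connector_type)
    {
        /* no ddc adapter known at this call site */
        return drm_connector_init_with_ddc(dev, connector, funcs,
                                           connector_type, NULL);
    }
    EXPORT_SYMBOL(drm_connector_init);

A driver such as tilcdc would then pass the adapter directly, e.g.
drm_connector_init_with_ddc(dev, connector, &tfp410_connector_funcs,
DRM_MODE_CONNECTOR_DVID, mod->i2c), instead of assigning connector->ddc by
hand before the init call.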

[pull] amdgpu, amdkfd drm-fixes-5.3

2019-07-31 Thread Alex Deucher
Hi Dave, Daniel,

Fixes for 5.3.  Nothing too major.  A few fixes for navi and some general
fixes.

The following changes since commit 4d5308e7852741318e4d40fb8d43d9311b3984ae:

  Merge tag 'drm-fixes-5.3-2019-07-24' of 
git://people.freedesktop.org/~agd5f/linux into drm-fixes (2019-07-26 14:10:26 
+1000)

are available in the Git repository at:

  git://people.freedesktop.org/~agd5f/linux tags/drm-fixes-5.3-2019-07-31

for you to fetch changes up to 6dee4829cfde106a8af7d0d3ba23022f8f054761:

  drm/amd/powerplay: correct UVD/VCE/VCN power status retrieval (2019-07-31 
02:02:22 -0500)


drm-fixes-5.3-2019-07-31:

amdgpu:
- Fix temperature granularity for navi
- Fix stable pstate setting for navi
- Fix VCN DPM enablement on navi
- Fix error handling on CS ioctl when processing dependencies
- Fix possible information leak in debugfs

amdkfd:
- fix memory alignment for VegaM


Alex Deucher (1):
  drm/amdgpu/powerplay: use proper revision id for navi

Christian König (1):
  drm/amdgpu: fix error handling in amdgpu_cs_process_fence_dep

Evan Quan (7):
  drm/amd/powerplay: fix null pointer dereference around dpm state relates
  drm/amd/powerplay: enable SW SMU reset functionality
  drm/amd/powerplay: add new sensor type for VCN powergate status
  drm/amd/powerplay: support VCN powergate status retrieval on Raven
  drm/amd/powerplay: support VCN powergate status retrieval for SW SMU
  drm/amd/powerplay: correct Navi10 VCN powergate control (v2)
  drm/amd/powerplay: correct UVD/VCE/VCN power status retrieval

Kent Russell (1):
  drm/amdkfd: Fix byte align on VegaM

Kevin Wang (2):
  drm/amd/powerplay: add callback function of get_thermal_temperature_range
  drm/amd/powerplay: fix temperature granularity error in smu11

Wang Xiayang (1):
  drm/amdgpu: fix a potential information leaking bug

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 26 
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c| 74 +++
 drivers/gpu/drm/amd/include/kgd_pp_interface.h|  1 +
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c| 23 ---
 drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c |  9 +++
 drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h|  1 -
 drivers/gpu/drm/amd/powerplay/navi10_ppt.c| 48 +--
 drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 36 ++-
 drivers/gpu/drm/amd/powerplay/vega20_ppt.c| 34 ---
 11 files changed, 150 insertions(+), 107 deletions(-)
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 25/26] drm/amdgpu: support gfx ras error injection and err_cnt query

2019-07-31 Thread Alex Deucher
From: Dennis Li 

check gfx error count in both the ras query function and the
ras interrupt handler.

gfx ras is still disabled by default due to known stability
issue found in gpu reset.

Signed-off-by: Dennis Li 
Reviewed-by: Tao Zhou 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 19 ---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   |  2 ++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index ccd5863bca88..a96b0f17c619 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -600,6 +600,10 @@ int amdgpu_ras_error_query(struct amdgpu_device *adev,
if (adev->umc.funcs->query_ras_error_count)
adev->umc.funcs->query_ras_error_count(adev, &err_data);
break;
+   case AMDGPU_RAS_BLOCK__GFX:
+   if (adev->gfx.funcs->query_ras_error_count)
+   adev->gfx.funcs->query_ras_error_count(adev, &err_data);
+   break;
default:
break;
}
@@ -637,13 +641,22 @@ int amdgpu_ras_error_inject(struct amdgpu_device *adev,
if (!obj)
return -EINVAL;
 
-   if (block_info.block_id != TA_RAS_BLOCK__UMC) {
+   switch (info->head.block) {
+   case AMDGPU_RAS_BLOCK__GFX:
+   if (adev->gfx.funcs->ras_error_inject)
+   ret = adev->gfx.funcs->ras_error_inject(adev, info);
+   else
+   ret = -EINVAL;
+   break;
+   case AMDGPU_RAS_BLOCK__UMC:
+   ret = psp_ras_trigger_error(&adev->psp, &block_info);
+   break;
+   default:
DRM_INFO("%s error injection is not supported yet\n",
 ras_block_str(info->head.block));
-   return -EINVAL;
+   ret = -EINVAL;
}
 
-   ret = psp_ras_trigger_error(&adev->psp, &block_info);
if (ret)
DRM_ERROR("RAS ERROR: inject %s error failed ret %d\n",
ras_block_str(info->head.block),
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index d7902e782be4..206176710f79 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -5607,6 +5607,8 @@ static int gfx_v9_0_process_ras_data_cb(struct amdgpu_device *adev,
 {
/* TODO ue will trigger an interrupt. */
kgd2kfd_set_sram_ecc_flag(adev->kfd.dev);
+   if (adev->gfx.funcs->query_ras_error_count)
+   adev->gfx.funcs->query_ras_error_count(adev, err_data);
amdgpu_ras_reset_gpu(adev, 0);
return AMDGPU_RAS_UE;
 }
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 24/26] drm/amdgpu: add RAS callback for gfx

2019-07-31 Thread Alex Deucher
From: Dennis Li 

Add functions for RAS error inject and query error counter

Signed-off-by: Dennis Li 
Reviewed-by: Tao Zhou 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |   2 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 530 +++-
 2 files changed, 531 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 1199b5828b90..554a59b3c4a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -196,6 +196,8 @@ struct amdgpu_gfx_funcs {
uint32_t *dst);
void (*select_me_pipe_q)(struct amdgpu_device *adev, u32 me, u32 pipe,
 u32 queue, u32 vmid);
+   int (*ras_error_inject)(struct amdgpu_device *adev, void *inject_if);
+   int (*query_ras_error_count) (struct amdgpu_device *adev, void *ras_error_status);
 };
 
 struct amdgpu_ngg_buf {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index fba552b93cc8..d7902e782be4 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -318,6 +318,135 @@ enum ta_ras_gfx_subblock {
TA_RAS_BLOCK__UTC_ATCL2_CACHE_4K_BANK,
TA_RAS_BLOCK__GFX_MAX
 };
+
+struct ras_gfx_subblock {
+   unsigned char *name;
+   int ta_subblock;
+   int supported_error_type;
+};
+
+#define AMDGPU_RAS_SUB_BLOCK(subblock, a, b, c, d)   \
+   [AMDGPU_RAS_BLOCK__##subblock] = { \
+   #subblock, \
+   TA_RAS_BLOCK__##subblock,  \
+   ((a) | ((b) << 1) | ((c) << 2) | ((d) << 3)),  \
+   }
+
+static const struct ras_gfx_subblock ras_gfx_subblocks[] = {
+   AMDGPU_RAS_SUB_BLOCK(GFX_CPC_SCRATCH, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_CPC_UCODE, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_DC_STATE_ME1, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_DC_CSINVOC_ME1, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_DC_RESTORE_ME1, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_DC_STATE_ME2, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_DC_CSINVOC_ME2, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_DC_RESTORE_ME2, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_CPF_ROQ_ME2, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_CPF_ROQ_ME1, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_CPF_TAG, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_CPG_DMA_ROQ, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_CPG_DMA_TAG, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_CPG_TAG, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_GDS_MEM, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_GDS_INPUT_QUEUE, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_GDS_OA_PHY_CMD_RAM_MEM, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_GDS_OA_PHY_DATA_RAM_MEM, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_GDS_OA_PIPE_MEM, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SPI_SR_MEM, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQ_SGPR, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQ_LDS_D, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQ_LDS_I, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQ_VGPR, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_UTCL1_LFIFO, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_CU0_WRITE_DATA_BUF, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_CU0_UTCL1_LFIFO, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_CU1_WRITE_DATA_BUF, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_CU1_UTCL1_LFIFO, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_CU2_WRITE_DATA_BUF, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_CU2_UTCL1_LFIFO, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKA_TAG_RAM, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKA_UTCL1_MISS_FIFO, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKA_MISS_FIFO, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKA_BANK_RAM, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKA_TAG_RAM, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKA_HIT_FIFO, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKA_MISS_FIFO, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKA_DIRTY_BIT_RAM, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKA_BANK_RAM, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKB_TAG_RAM, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKB_UTCL1_MISS_FIFO, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKB_MISS_FIFO, 1, 0, 0, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKB_BANK_RAM, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKB_TAG_RAM, 0, 1, 1, 1),
+   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKB_HIT_FIFO, 1, 0, 0, 1),
+   
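
For reference, the AMDGPU_RAS_SUB_BLOCK macro above packs its four flag
arguments into supported_error_type as bits 0..3. A small helper to test them
could look like this (an illustrative sketch only; the patch itself does not
name the four parameters, so the meaning of each bit is left to the series):

    /* bit layout from the macro: (a) | (b) << 1 | (c) << 2 | (d) << 3 */
    static inline bool ras_subblock_supports(const struct ras_gfx_subblock *sb,
                                             int bit)
    {
        return (sb->supported_error_type >> bit) & 1;
    }

    /* e.g. ras_subblock_supports(&ras_gfx_subblocks[i], 0) tests flag 'a' */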

[PATCH 22/26] drm/amd/include: add define of TCP_EDC_CNT_NEW

2019-07-31 Thread Alex Deucher
From: Dennis Li 

Change-Id: Iedd4bac2187e3b800662485d4623ace246af3f36
Signed-off-by: Dennis Li 
Reviewed-by: Tao Zhou 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
index f1d048e0ed2c..ca16d9125fbc 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
@@ -1700,6 +1700,8 @@
 #define mmTCP_BUFFER_ADDR_HASH_CNTL_BASE_IDX       0
 #define mmTCP_EDC_CNT                              0x0b17
 #define mmTCP_EDC_CNT_BASE_IDX                     0
+#define mmTCP_EDC_CNT_NEW                          0x0b18
+#define mmTCP_EDC_CNT_NEW_BASE_IDX                 0
 #define mmTC_CFG_L1_LOAD_POLICY0                   0x0b1a
 #define mmTC_CFG_L1_LOAD_POLICY0_BASE_IDX          0
 #define mmTC_CFG_L1_LOAD_POLICY1                   0x0b1b
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 26/26] drm/amdgpu: disable inject for failed subblocks of gfx

2019-07-31 Thread Alex Deucher
From: Dennis Li 

some subblocks of gfx fail in inject test, disable them

Change-Id: I54176e291cf5d58a94838ec688a96289c6cebb46
Signed-off-by: Dennis Li 
Reviewed-by: Tao Zhou 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 281 +++---
 1 file changed, 165 insertions(+), 116 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 206176710f79..bcd0301eee1e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -322,129 +322,166 @@ enum ta_ras_gfx_subblock {
 struct ras_gfx_subblock {
unsigned char *name;
int ta_subblock;
-   int supported_error_type;
+   int hw_supported_error_type;
+   int sw_supported_error_type;
 };
 
-#define AMDGPU_RAS_SUB_BLOCK(subblock, a, b, c, d)   \
+#define AMDGPU_RAS_SUB_BLOCK(subblock, a, b, c, d, e, f, g, h)   \
[AMDGPU_RAS_BLOCK__##subblock] = { \
#subblock, \
TA_RAS_BLOCK__##subblock,  \
((a) | ((b) << 1) | ((c) << 2) | ((d) << 3)),  \
+   (((e) << 1) | ((f) << 3) | (g) | ((h) << 2)),  \
}
 
 static const struct ras_gfx_subblock ras_gfx_subblocks[] = {
-   AMDGPU_RAS_SUB_BLOCK(GFX_CPC_SCRATCH, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_CPC_UCODE, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_DC_STATE_ME1, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_DC_CSINVOC_ME1, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_DC_RESTORE_ME1, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_DC_STATE_ME2, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_DC_CSINVOC_ME2, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_DC_RESTORE_ME2, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_CPF_ROQ_ME2, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_CPF_ROQ_ME1, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_CPF_TAG, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_CPG_DMA_ROQ, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_CPG_DMA_TAG, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_CPG_TAG, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_GDS_MEM, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_GDS_INPUT_QUEUE, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_GDS_OA_PHY_CMD_RAM_MEM, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_GDS_OA_PHY_DATA_RAM_MEM, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_GDS_OA_PIPE_MEM, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SPI_SR_MEM, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQ_SGPR, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQ_LDS_D, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQ_LDS_I, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQ_VGPR, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_UTCL1_LFIFO, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_CU0_WRITE_DATA_BUF, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_CU0_UTCL1_LFIFO, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_CU1_WRITE_DATA_BUF, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_CU1_UTCL1_LFIFO, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_CU2_WRITE_DATA_BUF, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_CU2_UTCL1_LFIFO, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKA_TAG_RAM, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKA_UTCL1_MISS_FIFO, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKA_MISS_FIFO, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKA_BANK_RAM, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKA_TAG_RAM, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKA_HIT_FIFO, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKA_MISS_FIFO, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKA_DIRTY_BIT_RAM, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKA_BANK_RAM, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKB_TAG_RAM, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKB_UTCL1_MISS_FIFO, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKB_MISS_FIFO, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_INST_BANKB_BANK_RAM, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKB_TAG_RAM, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKB_HIT_FIFO, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKB_MISS_FIFO, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKB_DIRTY_BIT_RAM, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_SQC_DATA_BANKB_BANK_RAM, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_TA_FS_DFIFO, 0, 1, 1, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_TA_FS_AFIFO, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_TA_FL_LFIFO, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_TA_FX_LFIFO, 1, 0, 0, 1),
-   AMDGPU_RAS_SUB_BLOCK(GFX_TA_FS_CFIFO, 1, 0, 0, 1),
-  

[PATCH 21/26] drm/amd/include: add bitfield define for EDC registers

2019-07-31 Thread Alex Deucher
From: Dennis Li 

Add EDC registers to support VEGA20 RAS

Change-Id: Iafa8029135aa407edc0c77f1779a1cb9982c1492
Signed-off-by: Dennis Li 
Reviewed-by: Tao Zhou 
Reviewed-by: Hawking Zhang 
---
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  | 157 ++
 1 file changed, 157 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
index 2e1214be67a2..064c4bb1dc62 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
@@ -21,6 +21,105 @@
 #ifndef _gc_9_0_SH_MASK_HEADER
 #define _gc_9_0_SH_MASK_HEADER
 
+//GCEA_EDC_CNT
+#define GCEA_EDC_CNT__DRAMRD_CMDMEM_SEC_COUNT__SHIFT       0x0
+#define GCEA_EDC_CNT__DRAMRD_CMDMEM_DED_COUNT__SHIFT       0x2
+#define GCEA_EDC_CNT__DRAMWR_CMDMEM_SEC_COUNT__SHIFT       0x4
+#define GCEA_EDC_CNT__DRAMWR_CMDMEM_DED_COUNT__SHIFT       0x6
+#define GCEA_EDC_CNT__DRAMWR_DATAMEM_SEC_COUNT__SHIFT      0x8
+#define GCEA_EDC_CNT__DRAMWR_DATAMEM_DED_COUNT__SHIFT      0xa
+#define GCEA_EDC_CNT__RRET_TAGMEM_SEC_COUNT__SHIFT         0xc
+#define GCEA_EDC_CNT__RRET_TAGMEM_DED_COUNT__SHIFT         0xe
+#define GCEA_EDC_CNT__WRET_TAGMEM_SEC_COUNT__SHIFT         0x10
+#define GCEA_EDC_CNT__WRET_TAGMEM_DED_COUNT__SHIFT         0x12
+#define GCEA_EDC_CNT__DRAMRD_PAGEMEM_SED_COUNT__SHIFT      0x14
+#define GCEA_EDC_CNT__DRAMWR_PAGEMEM_SED_COUNT__SHIFT      0x16
+#define GCEA_EDC_CNT__IORD_CMDMEM_SED_COUNT__SHIFT         0x18
+#define GCEA_EDC_CNT__IOWR_CMDMEM_SED_COUNT__SHIFT         0x1a
+#define GCEA_EDC_CNT__IOWR_DATAMEM_SED_COUNT__SHIFT        0x1c
+#define GCEA_EDC_CNT__DRAMRD_CMDMEM_SEC_COUNT_MASK         0x00000003L
+#define GCEA_EDC_CNT__DRAMRD_CMDMEM_DED_COUNT_MASK         0x0000000CL
+#define GCEA_EDC_CNT__DRAMWR_CMDMEM_SEC_COUNT_MASK         0x00000030L
+#define GCEA_EDC_CNT__DRAMWR_CMDMEM_DED_COUNT_MASK         0x000000C0L
+#define GCEA_EDC_CNT__DRAMWR_DATAMEM_SEC_COUNT_MASK        0x00000300L
+#define GCEA_EDC_CNT__DRAMWR_DATAMEM_DED_COUNT_MASK        0x00000C00L
+#define GCEA_EDC_CNT__RRET_TAGMEM_SEC_COUNT_MASK           0x00003000L
+#define GCEA_EDC_CNT__RRET_TAGMEM_DED_COUNT_MASK           0x0000C000L
+#define GCEA_EDC_CNT__WRET_TAGMEM_SEC_COUNT_MASK           0x00030000L
+#define GCEA_EDC_CNT__WRET_TAGMEM_DED_COUNT_MASK           0x000C0000L
+#define GCEA_EDC_CNT__DRAMRD_PAGEMEM_SED_COUNT_MASK        0x00300000L
+#define GCEA_EDC_CNT__DRAMWR_PAGEMEM_SED_COUNT_MASK        0x00C00000L
+#define GCEA_EDC_CNT__IORD_CMDMEM_SED_COUNT_MASK           0x03000000L
+#define GCEA_EDC_CNT__IOWR_CMDMEM_SED_COUNT_MASK           0x0C000000L
+#define GCEA_EDC_CNT__IOWR_DATAMEM_SED_COUNT_MASK          0x30000000L
+
+#define GCEA_EDC_CNT2__GMIRD_CMDMEM_SEC_COUNT__SHIFT       0x0
+#define GCEA_EDC_CNT2__GMIRD_CMDMEM_DED_COUNT__SHIFT       0x2
+#define GCEA_EDC_CNT2__GMIWR_CMDMEM_SEC_COUNT__SHIFT       0x4
+#define GCEA_EDC_CNT2__GMIWR_CMDMEM_DED_COUNT__SHIFT       0x6
+#define GCEA_EDC_CNT2__GMIWR_DATAMEM_SEC_COUNT__SHIFT      0x8
+#define GCEA_EDC_CNT2__GMIWR_DATAMEM_DED_COUNT__SHIFT      0xa

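These SHIFT/MASK pairs are consumed through amdgpu's standard REG_GET_FIELD()
helper. A typical error-count read would look like the following (a sketch
assuming the matching mmGCEA_EDC_CNT offset define exists for the GC block;
this is not code from the series itself):

    uint32_t edc_cnt = RREG32_SOC15(GC, 0, mmGCEA_EDC_CNT);
    uint32_t sec = REG_GET_FIELD(edc_cnt, GCEA_EDC_CNT,
                                 DRAMRD_CMDMEM_SEC_COUNT);
    uint32_t ded = REG_GET_FIELD(edc_cnt, GCEA_EDC_CNT,
                                 DRAMRD_CMDMEM_DED_COUNT);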
[PATCH 15/26] drm/amdgpu: add structures for umc error address translation

2019-07-31 Thread Alex Deucher
From: Tao Zhou 

add related registers, callback function and channel index table

Signed-off-by: Tao Zhou 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h |  2 ++
 drivers/gpu/drm/amd/amdgpu/umc_v6_1.c   | 10 ++
 2 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
index f5d6def96414..dfa1a39e57af 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
@@ -24,6 +24,8 @@
 struct amdgpu_umc_funcs {
void (*query_ras_error_count)(struct amdgpu_device *adev,
void *ras_error_status);
+   void (*query_ras_error_address)(struct amdgpu_device *adev,
+   void *ras_error_status);
 };
 
 struct amdgpu_umc {
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c 
b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
index 5b1ccb81b3a2..e05f3e68edb0 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
@@ -29,6 +29,16 @@
 #include "umc/umc_6_1_1_offset.h"
 #include "umc/umc_6_1_1_sh_mask.h"
 
+#define smnMCA_UMC0_MCUMC_ADDRT0   0x50f10
+
+static uint32_t
+umc_v6_1_channel_idx_tbl[UMC_V6_1_UMC_INSTANCE_NUM][UMC_V6_1_CHANNEL_INSTANCE_NUM] = {
+   {2, 18, 11, 27},{4, 20, 13, 29},
+   {1, 17, 8, 24}, {7, 23, 14, 30},
+   {10, 26, 3, 19},{12, 28, 5, 21},
+   {9, 25, 0, 16}, {15, 31, 6, 22}
+};
+
 static void umc_v6_1_enable_umc_index_mode(struct amdgpu_device *adev,
   uint32_t umc_instance)
 {
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
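
A note on the table above: it maps a (UMC instance, channel instance) pair to
the physical channel index of the interleaved memory, which the error address
translation in a later patch relies on. A minimal sketch of the lookup (loop
variables as used by the query functions; not a new API):

	uint32_t channel_index;

	/* umc_inst in [0, UMC_V6_1_UMC_INSTANCE_NUM), channel_inst in
	 * [0, UMC_V6_1_CHANNEL_INSTANCE_NUM) */
	channel_index = umc_v6_1_channel_idx_tbl[umc_inst][channel_inst];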

[PATCH 23/26] drm/amdgpu: add define for gfx ras subblock

2019-07-31 Thread Alex Deucher
From: Dennis Li 

Change-Id: Ib4b019b2bcbe6ef0b85ef170e7cf032bfa400553
Signed-off-by: Dennis Li 
Reviewed-by: Tao Zhou 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 230 
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 201 +
 2 files changed, 431 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
index 2c86a5135ec9..2765f2dbb1e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
@@ -52,6 +52,236 @@ enum amdgpu_ras_block {
 #define AMDGPU_RAS_BLOCK_COUNT AMDGPU_RAS_BLOCK__LAST
 #define AMDGPU_RAS_BLOCK_MASK  ((1ULL << AMDGPU_RAS_BLOCK_COUNT) - 1)
 
+enum amdgpu_ras_gfx_subblock {
+   /* CPC */
+   AMDGPU_RAS_BLOCK__GFX_CPC_INDEX_START = 0,
+   AMDGPU_RAS_BLOCK__GFX_CPC_SCRATCH =
+   AMDGPU_RAS_BLOCK__GFX_CPC_INDEX_START,
+   AMDGPU_RAS_BLOCK__GFX_CPC_UCODE,
+   AMDGPU_RAS_BLOCK__GFX_DC_STATE_ME1,
+   AMDGPU_RAS_BLOCK__GFX_DC_CSINVOC_ME1,
+   AMDGPU_RAS_BLOCK__GFX_DC_RESTORE_ME1,
+   AMDGPU_RAS_BLOCK__GFX_DC_STATE_ME2,
+   AMDGPU_RAS_BLOCK__GFX_DC_CSINVOC_ME2,
+   AMDGPU_RAS_BLOCK__GFX_DC_RESTORE_ME2,
+   AMDGPU_RAS_BLOCK__GFX_CPC_INDEX_END =
+   AMDGPU_RAS_BLOCK__GFX_DC_RESTORE_ME2,
+   /* CPF */
+   AMDGPU_RAS_BLOCK__GFX_CPF_INDEX_START,
+   AMDGPU_RAS_BLOCK__GFX_CPF_ROQ_ME2 =
+   AMDGPU_RAS_BLOCK__GFX_CPF_INDEX_START,
+   AMDGPU_RAS_BLOCK__GFX_CPF_ROQ_ME1,
+   AMDGPU_RAS_BLOCK__GFX_CPF_TAG,
+   AMDGPU_RAS_BLOCK__GFX_CPF_INDEX_END = AMDGPU_RAS_BLOCK__GFX_CPF_TAG,
+   /* CPG */
+   AMDGPU_RAS_BLOCK__GFX_CPG_INDEX_START,
+   AMDGPU_RAS_BLOCK__GFX_CPG_DMA_ROQ =
+   AMDGPU_RAS_BLOCK__GFX_CPG_INDEX_START,
+   AMDGPU_RAS_BLOCK__GFX_CPG_DMA_TAG,
+   AMDGPU_RAS_BLOCK__GFX_CPG_TAG,
+   AMDGPU_RAS_BLOCK__GFX_CPG_INDEX_END = AMDGPU_RAS_BLOCK__GFX_CPG_TAG,
+   /* GDS */
+   AMDGPU_RAS_BLOCK__GFX_GDS_INDEX_START,
+   AMDGPU_RAS_BLOCK__GFX_GDS_MEM = AMDGPU_RAS_BLOCK__GFX_GDS_INDEX_START,
+   AMDGPU_RAS_BLOCK__GFX_GDS_INPUT_QUEUE,
+   AMDGPU_RAS_BLOCK__GFX_GDS_OA_PHY_CMD_RAM_MEM,
+   AMDGPU_RAS_BLOCK__GFX_GDS_OA_PHY_DATA_RAM_MEM,
+   AMDGPU_RAS_BLOCK__GFX_GDS_OA_PIPE_MEM,
+   AMDGPU_RAS_BLOCK__GFX_GDS_INDEX_END =
+   AMDGPU_RAS_BLOCK__GFX_GDS_OA_PIPE_MEM,
+   /* SPI */
+   AMDGPU_RAS_BLOCK__GFX_SPI_SR_MEM,
+   /* SQ */
+   AMDGPU_RAS_BLOCK__GFX_SQ_INDEX_START,
+   AMDGPU_RAS_BLOCK__GFX_SQ_SGPR = AMDGPU_RAS_BLOCK__GFX_SQ_INDEX_START,
+   AMDGPU_RAS_BLOCK__GFX_SQ_LDS_D,
+   AMDGPU_RAS_BLOCK__GFX_SQ_LDS_I,
+   AMDGPU_RAS_BLOCK__GFX_SQ_VGPR,
+   AMDGPU_RAS_BLOCK__GFX_SQ_INDEX_END = AMDGPU_RAS_BLOCK__GFX_SQ_VGPR,
+   /* SQC (3 ranges) */
+   AMDGPU_RAS_BLOCK__GFX_SQC_INDEX_START,
+   /* SQC range 0 */
+   AMDGPU_RAS_BLOCK__GFX_SQC_INDEX0_START =
+   AMDGPU_RAS_BLOCK__GFX_SQC_INDEX_START,
+   AMDGPU_RAS_BLOCK__GFX_SQC_INST_UTCL1_LFIFO =
+   AMDGPU_RAS_BLOCK__GFX_SQC_INDEX0_START,
+   AMDGPU_RAS_BLOCK__GFX_SQC_DATA_CU0_WRITE_DATA_BUF,
+   AMDGPU_RAS_BLOCK__GFX_SQC_DATA_CU0_UTCL1_LFIFO,
+   AMDGPU_RAS_BLOCK__GFX_SQC_DATA_CU1_WRITE_DATA_BUF,
+   AMDGPU_RAS_BLOCK__GFX_SQC_DATA_CU1_UTCL1_LFIFO,
+   AMDGPU_RAS_BLOCK__GFX_SQC_DATA_CU2_WRITE_DATA_BUF,
+   AMDGPU_RAS_BLOCK__GFX_SQC_DATA_CU2_UTCL1_LFIFO,
+   AMDGPU_RAS_BLOCK__GFX_SQC_INDEX0_END =
+   AMDGPU_RAS_BLOCK__GFX_SQC_DATA_CU2_UTCL1_LFIFO,
+   /* SQC range 1 */
+   AMDGPU_RAS_BLOCK__GFX_SQC_INDEX1_START,
+   AMDGPU_RAS_BLOCK__GFX_SQC_INST_BANKA_TAG_RAM =
+   AMDGPU_RAS_BLOCK__GFX_SQC_INDEX1_START,
+   AMDGPU_RAS_BLOCK__GFX_SQC_INST_BANKA_UTCL1_MISS_FIFO,
+   AMDGPU_RAS_BLOCK__GFX_SQC_INST_BANKA_MISS_FIFO,
+   AMDGPU_RAS_BLOCK__GFX_SQC_INST_BANKA_BANK_RAM,
+   AMDGPU_RAS_BLOCK__GFX_SQC_DATA_BANKA_TAG_RAM,
+   AMDGPU_RAS_BLOCK__GFX_SQC_DATA_BANKA_HIT_FIFO,
+   AMDGPU_RAS_BLOCK__GFX_SQC_DATA_BANKA_MISS_FIFO,
+   AMDGPU_RAS_BLOCK__GFX_SQC_DATA_BANKA_DIRTY_BIT_RAM,
+   AMDGPU_RAS_BLOCK__GFX_SQC_DATA_BANKA_BANK_RAM,
+   AMDGPU_RAS_BLOCK__GFX_SQC_INDEX1_END =
+   AMDGPU_RAS_BLOCK__GFX_SQC_DATA_BANKA_BANK_RAM,
+   /* SQC range 2 */
+   AMDGPU_RAS_BLOCK__GFX_SQC_INDEX2_START,
+   AMDGPU_RAS_BLOCK__GFX_SQC_INST_BANKB_TAG_RAM =
+   AMDGPU_RAS_BLOCK__GFX_SQC_INDEX2_START,
+   AMDGPU_RAS_BLOCK__GFX_SQC_INST_BANKB_UTCL1_MISS_FIFO,
+   AMDGPU_RAS_BLOCK__GFX_SQC_INST_BANKB_MISS_FIFO,
+   AMDGPU_RAS_BLOCK__GFX_SQC_INST_BANKB_BANK_RAM,
+   AMDGPU_RAS_BLOCK__GFX_SQC_DATA_BANKB_TAG_RAM,
+   AMDGPU_RAS_BLOCK__GFX_SQC_DATA_BANKB_HIT_FIFO,
+   AMDGPU_RAS_BLOCK__GFX_SQC_DATA_BANKB_MISS_FIFO,
+   AMDGPU_RAS_BLOCK__GFX_SQC_DATA_BANKB_DIRTY_BIT_RAM,
+   AMDGPU_RAS_BLOCK__GFX_SQC_DATA_BANKB_BANK_RAM,
+   

[PATCH 18/26] drm/amdgpu: update interrupt callback for all ras clients

2019-07-31 Thread Alex Deucher
From: Tao Zhou 

add an err_data parameter to the interrupt cb for ras clients

Signed-off-by: Tao Zhou 
Reviewed-by: Dennis Li 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 4 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 ++
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index bd7f431a24d9..048803c75048 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3951,6 +3951,7 @@ static int gfx_v9_0_early_init(void *handle)
 }
 
 static int gfx_v9_0_process_ras_data_cb(struct amdgpu_device *adev,
+   struct ras_err_data *err_data,
struct amdgpu_iv_entry *entry);
 
 static int gfx_v9_0_ecc_late_init(void *handle)
@@ -5265,6 +5266,7 @@ static int gfx_v9_0_priv_inst_irq(struct amdgpu_device 
*adev,
 }
 
 static int gfx_v9_0_process_ras_data_cb(struct amdgpu_device *adev,
+   struct ras_err_data *err_data,
struct amdgpu_iv_entry *entry)
 {
/* TODO ue will trigger an interrupt. */
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index a02bc633a89a..ee06cbe2a7e7 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -239,12 +239,12 @@ static int gmc_v9_0_ecc_interrupt_state(struct 
amdgpu_device *adev,
 }
 
 static int gmc_v9_0_process_ras_data_cb(struct amdgpu_device *adev,
+   struct ras_err_data *err_data,
struct amdgpu_iv_entry *entry)
 {
-   struct ras_err_data err_data = {0, 0};
kgd2kfd_set_sram_ecc_flag(adev->kfd.dev);
if (adev->umc.funcs->query_ras_error_count)
-		adev->umc.funcs->query_ras_error_count(adev, &err_data);
+   adev->umc.funcs->query_ras_error_count(adev, err_data);
amdgpu_ras_reset_gpu(adev, 0);
return AMDGPU_RAS_UE;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index 36dc5025c461..2ffc9a41d8b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -1624,6 +1624,7 @@ static int sdma_v4_0_early_init(void *handle)
 }
 
 static int sdma_v4_0_process_ras_data_cb(struct amdgpu_device *adev,
+   struct ras_err_data *err_data,
struct amdgpu_iv_entry *entry);
 
 static int sdma_v4_0_late_init(void *handle)
@@ -1957,6 +1958,7 @@ static int sdma_v4_0_process_trap_irq(struct 
amdgpu_device *adev,
 }
 
 static int sdma_v4_0_process_ras_data_cb(struct amdgpu_device *adev,
+   struct ras_err_data *err_data,
struct amdgpu_iv_entry *entry)
 {
uint32_t instance, err_source;
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 13/26] drm/amdgpu: update algorithm of umc uncorrectable error counting

2019-07-31 Thread Alex Deucher
From: Tao Zhou 

remove the check of ErrorCodeExt

v2: refine the if condition for ue counting

Signed-off-by: Tao Zhou 
Reviewed-by: Dennis Li 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/umc_v6_1.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c 
b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
index 8fbd81d3ce70..5b1ccb81b3a2 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
@@ -113,12 +113,12 @@ static void 
umc_v6_1_querry_uncorrectable_error_count(struct amdgpu_device *adev
 
/* check the MCUMC_STATUS */
mc_umc_status = RREG64(mc_umc_status_addr + umc_reg_offset);
-	if (REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, Val) == 1 &&
-	    REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, ErrorCodeExt) == 6 &&
-	    (REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, UECC) == 1 ||
-	    REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, PCC) == 1 ||
-	    REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, UC) == 1 ||
-	    REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, TCC) == 1))
+	if ((REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, Val) == 1) &&
+	    (REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, Deferred) == 1 ||
+	    REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, UECC) == 1 ||
+	    REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, PCC) == 1 ||
+	    REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, UC) == 1 ||
+	    REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, TCC) == 1))
*error_count += 1;
 }
 
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
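
Restated as a standalone predicate (a sketch for readability; the driver
keeps the check inline), the v2 condition counts an uncorrectable error when
the status is valid and any fatal or deferred bit is set, without requiring
ErrorCodeExt == 6 any more:

	static bool mc_umc_status_is_ue(uint64_t s)
	{
		return REG_GET_FIELD(s, MCA_UMC_UMC0_MCUMC_STATUST0, Val) == 1 &&
		       (REG_GET_FIELD(s, MCA_UMC_UMC0_MCUMC_STATUST0, Deferred) == 1 ||
			REG_GET_FIELD(s, MCA_UMC_UMC0_MCUMC_STATUST0, UECC) == 1 ||
			REG_GET_FIELD(s, MCA_UMC_UMC0_MCUMC_STATUST0, PCC) == 1 ||
			REG_GET_FIELD(s, MCA_UMC_UMC0_MCUMC_STATUST0, UC) == 1 ||
			REG_GET_FIELD(s, MCA_UMC_UMC0_MCUMC_STATUST0, TCC) == 1);
	}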

[PATCH 11/26] drm/amdgpu: use 64bit operation macros for umc

2019-07-31 Thread Alex Deucher
From: Tao Zhou 

replace some 32-bit macros with 64-bit operations to simplify the code

Signed-off-by: Tao Zhou 
Reviewed-by: Dennis Li 
---
 drivers/gpu/drm/amd/amdgpu/umc_v6_1.c | 25 -
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c 
b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
index 1ca5ae642946..8fbd81d3ce70 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
@@ -94,18 +94,11 @@ static void umc_v6_1_query_correctable_error_count(struct 
amdgpu_device *adev,
 
/* check for SRAM correctable error
  MCUMC_STATUS is a 64 bit register */
-   mc_umc_status =
-   RREG32(mc_umc_status_addr + umc_reg_offset);
-   mc_umc_status |=
-   (uint64_t)RREG32(mc_umc_status_addr + umc_reg_offset + 1) << 32;
+   mc_umc_status = RREG64(mc_umc_status_addr + umc_reg_offset);
	if (REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, ErrorCodeExt) == 6 &&
	    REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, Val) == 1 &&
	    REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, CECC) == 1)
*error_count += 1;
-
-   /* clear the MCUMC_STATUS */
-   WREG32(mc_umc_status_addr + umc_reg_offset, 0);
-   WREG32(mc_umc_status_addr + umc_reg_offset + 1, 0);
 }
 
 static void umc_v6_1_querry_uncorrectable_error_count(struct amdgpu_device 
*adev,
@@ -119,10 +112,7 @@ static void 
umc_v6_1_querry_uncorrectable_error_count(struct amdgpu_device *adev
 SOC15_REG_OFFSET(UMC, 0, mmMCA_UMC_UMC0_MCUMC_STATUST0);
 
/* check the MCUMC_STATUS */
-   mc_umc_status = RREG32(mc_umc_status_addr + umc_reg_offset);
-   mc_umc_status |=
-   (uint64_t)RREG32(mc_umc_status_addr + umc_reg_offset + 1) << 32;
-
+   mc_umc_status = RREG64(mc_umc_status_addr + umc_reg_offset);
	if (REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, Val) == 1 &&
	    REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, ErrorCodeExt) == 6 &&
	    (REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, UECC) == 1 ||
@@ -130,17 +120,16 @@ static void umc_v6_1_querry_uncorrectable_error_count(struct amdgpu_device *adev
	    REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, UC) == 1 ||
	    REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, TCC) == 1))
*error_count += 1;
-
-   /* clear the MCUMC_STATUS */
-   WREG32(mc_umc_status_addr + umc_reg_offset, 0);
-   WREG32(mc_umc_status_addr + umc_reg_offset + 1, 0);
 }
 
 static void umc_v6_1_query_ras_error_count(struct amdgpu_device *adev,
   void *ras_error_status)
 {
struct ras_err_data *err_data = (struct ras_err_data *)ras_error_status;
-   uint32_t umc_inst, channel_inst, umc_reg_offset;
+   uint32_t umc_inst, channel_inst, umc_reg_offset, mc_umc_status_addr;
+
+   mc_umc_status_addr =
+   SOC15_REG_OFFSET(UMC, 0, mmMCA_UMC_UMC0_MCUMC_STATUST0);
 
	for (umc_inst = 0; umc_inst < UMC_V6_1_UMC_INSTANCE_NUM; umc_inst++) {
		/* enable the index mode to query error count per channel */
@@ -152,6 +141,8 @@ static void umc_v6_1_query_ras_error_count(struct amdgpu_device *adev,
 					   &(err_data->ce_count));
 		umc_v6_1_querry_uncorrectable_error_count(adev, umc_reg_offset,
 							  &(err_data->ue_count));
+		/* clear umc status */
+		WREG64(mc_umc_status_addr + umc_reg_offset, 0x0ULL);
}
}
umc_v6_1_disable_umc_index_mode(adev);
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
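
For comparison, each RREG64() call above stands in for the two-read sequence
the patch deletes (taken directly from the removed lines):

	/* before: two 32-bit reads stitched together */
	mc_umc_status = RREG32(mc_umc_status_addr + umc_reg_offset);
	mc_umc_status |= (uint64_t)RREG32(mc_umc_status_addr + umc_reg_offset + 1) << 32;

	/* after: a single 64-bit access */
	mc_umc_status = RREG64(mc_umc_status_addr + umc_reg_offset);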

[PATCH 20/26] drm/amdgpu: remove ras_reserve_vram in ras injection

2019-07-31 Thread Alex Deucher
From: Tao Zhou 

the error injection address is a vram offset, not an address in the gpu address space

Signed-off-by: Tao Zhou 
Reviewed-by: Dennis Li 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index a87deb7be414..ccd5863bca88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -60,6 +60,9 @@ const char *ras_block_string[] = {
 #define AMDGPU_RAS_FLAG_INIT_NEED_RESET2
 #define RAS_DEFAULT_FLAGS (AMDGPU_RAS_FLAG_INIT_BY_VBIOS)
 
+/* inject address is 52 bits */
+#define RAS_UMC_INJECT_ADDR_LIMIT	(0x1ULL << 52)
+
 static int amdgpu_ras_reserve_vram(struct amdgpu_device *adev,
uint64_t offset, uint64_t size,
struct amdgpu_bo **bo_ptr);
@@ -245,7 +248,6 @@ static ssize_t amdgpu_ras_debugfs_ctrl_write(struct file 
*f, const char __user *
 {
struct amdgpu_device *adev = (struct amdgpu_device 
*)file_inode(f)->i_private;
struct ras_debug_if data;
-   struct amdgpu_bo *bo;
int ret = 0;
 
	ret = amdgpu_ras_debugfs_ctrl_parse_data(f, buf, size, pos, &data);
@@ -263,17 +265,14 @@ static ssize_t amdgpu_ras_debugfs_ctrl_write(struct file 
*f, const char __user *
		ret = amdgpu_ras_feature_enable(adev, &data.head, 1);
break;
case 2:
-   ret = amdgpu_ras_reserve_vram(adev,
-				data.inject.address, PAGE_SIZE, &bo);
-   if (ret) {
-   /* address was offset, now it is absolute.*/
-   data.inject.address += adev->gmc.vram_start;
-   if (data.inject.address > adev->gmc.vram_end)
-   break;
-   } else
-   data.inject.address = amdgpu_bo_gpu_offset(bo);
+   if ((data.inject.address >= adev->gmc.mc_vram_size) ||
+   (data.inject.address >= RAS_UMC_INJECT_ADDR_LIMIT)) {
+   ret = -EINVAL;
+   break;
+   }
+
+		/* data.inject.address is offset instead of absolute gpu address */
 		ret = amdgpu_ras_error_inject(adev, &data.inject);
-		amdgpu_ras_release_vram(adev, &bo);
break;
default:
ret = -EINVAL;
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
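
The net effect is a pure range check on the user-supplied offset; as a
standalone helper it would look like this (a sketch, the driver keeps the
check inline in the debugfs write handler):

	static bool ras_inject_addr_is_valid(struct amdgpu_device *adev,
					     uint64_t addr)
	{
		/* addr is a vram offset, so it must fit both the board's
		 * vram size and the 52-bit inject address limit */
		return addr < adev->gmc.mc_vram_size &&
		       addr < RAS_UMC_INJECT_ADDR_LIMIT;
	}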

[PATCH 17/26] drm/amdgpu: allow ras interrupt callback to return error data

2019-07-31 Thread Alex Deucher
From: Tao Zhou 

add error data as a parameter to the ras interrupt cb and process it

Signed-off-by: Tao Zhou 
Reviewed-by: Dennis Li 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c |  6 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 37 +
 2 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 1914f37bee59..0eeb85d8399d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1003,7 +1003,7 @@ static void amdgpu_ras_interrupt_handler(struct 
ras_manager *obj)
	struct ras_ih_data *data = &obj->ih_data;
struct amdgpu_iv_entry entry;
int ret;
-   struct ras_err_data err_data = {0, 0};
+   struct ras_err_data err_data = {0, 0, 0, NULL};
 
while (data->rptr != data->wptr) {
rmb();
@@ -1018,14 +1018,14 @@ static void amdgpu_ras_interrupt_handler(struct 
ras_manager *obj)
 * from the callback to update the error type/count, etc
 */
if (data->cb) {
-			ret = data->cb(obj->adev, &entry);
+			ret = data->cb(obj->adev, &err_data, &entry);
/* ue will trigger an interrupt, and in that case
 * we need do a reset to recovery the whole system.
 * But leave IP do that recovery, here we just dispatch
 * the error.
 */
if (ret == AMDGPU_RAS_UE) {
-   obj->err_data.ue_count++;
+   obj->err_data.ue_count += err_data.ue_count;
}
/* Might need get ce count by register, but not all IP
 * saves ce count, some IP just use one bit or two bits
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
index 0920db7aff34..2c86a5135ec9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
@@ -76,9 +76,6 @@ struct ras_common_if {
char name[32];
 };
 
-typedef int (*ras_ih_cb)(struct amdgpu_device *adev,
-   struct amdgpu_iv_entry *entry);
-
 struct amdgpu_ras {
/* ras infrastructure */
/* for ras itself. */
@@ -108,21 +105,6 @@ struct amdgpu_ras {
uint32_t flags;
 };
 
-struct ras_ih_data {
-   /* interrupt bottom half */
-   struct work_struct ih_work;
-   int inuse;
-   /* IP callback */
-   ras_ih_cb cb;
-   /* full of entries */
-   unsigned char *ring;
-   unsigned int ring_size;
-   unsigned int element_size;
-   unsigned int aligned_element_size;
-   unsigned int rptr;
-   unsigned int wptr;
-};
-
 struct ras_fs_data {
char sysfs_name[32];
char debugfs_name[32];
@@ -149,6 +131,25 @@ struct ras_err_handler_data {
int last_reserved;
 };
 
+typedef int (*ras_ih_cb)(struct amdgpu_device *adev,
+   struct ras_err_data *err_data,
+   struct amdgpu_iv_entry *entry);
+
+struct ras_ih_data {
+   /* interrupt bottom half */
+   struct work_struct ih_work;
+   int inuse;
+   /* IP callback */
+   ras_ih_cb cb;
+   /* full of entries */
+   unsigned char *ring;
+   unsigned int ring_size;
+   unsigned int element_size;
+   unsigned int aligned_element_size;
+   unsigned int rptr;
+   unsigned int wptr;
+};
+
 struct ras_manager {
struct ras_common_if head;
/* reference count */
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
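
Condensed, the handler-side flow after this change is (a sketch of the code
above, not new logic):

	struct ras_err_data err_data = {0, 0, 0, NULL};

	ret = data->cb(obj->adev, &err_data, &entry);
	if (ret == AMDGPU_RAS_UE)
		/* trust the count the IP reported instead of a fixed +1 */
		obj->err_data.ue_count += err_data.ue_count;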

[PATCH 08/26] drm/amdgpu: query umc error count

2019-07-31 Thread Alex Deucher
From: Hawking Zhang 

check umc error count in both the ras query function and
the ras interrupt handler

Signed-off-by: Hawking Zhang 
Reviewed-by: Dennis Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 11 ++-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   |  3 +++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 3be306bf1603..845e75f35b19 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -586,11 +586,19 @@ int amdgpu_ras_error_query(struct amdgpu_device *adev,
struct ras_query_if *info)
 {
	struct ras_manager *obj = amdgpu_ras_find_obj(adev, &info->head);
+   struct ras_err_data err_data = {0, 0};
 
if (!obj)
return -EINVAL;
-   /* TODO might read the register to read the count */
 
+   switch (info->head.block) {
+   case AMDGPU_RAS_BLOCK__UMC:
+   if (adev->umc_funcs->query_ras_error_count)
+			adev->umc_funcs->query_ras_error_count(adev, &err_data);
+   break;
+   default:
+   break;
+   }
info->ue_count = obj->err_data.ue_count;
info->ce_count = obj->err_data.ce_count;
 
@@ -984,6 +992,7 @@ static void amdgpu_ras_interrupt_handler(struct ras_manager 
*obj)
	struct ras_ih_data *data = &obj->ih_data;
struct amdgpu_iv_entry entry;
int ret;
+   struct ras_err_data err_data = {0, 0};
 
while (data->rptr != data->wptr) {
rmb();
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 4116841eb0e3..2748bd110fab 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -241,7 +241,10 @@ static int gmc_v9_0_ecc_interrupt_state(struct 
amdgpu_device *adev,
 static int gmc_v9_0_process_ras_data_cb(struct amdgpu_device *adev,
struct amdgpu_iv_entry *entry)
 {
+   struct ras_err_data err_data = {0, 0};
kgd2kfd_set_sram_ecc_flag(adev->kfd.dev);
+   if (adev->umc_funcs->query_ras_error_count)
+		adev->umc_funcs->query_ras_error_count(adev, &err_data);
amdgpu_ras_reset_gpu(adev, 0);
return AMDGPU_RAS_UE;
 }
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
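
A hypothetical caller of the query path, to show where the new UMC hook
lands (the surrounding code is illustrative; only amdgpu_ras_error_query()
and struct ras_query_if come from the driver):

	struct ras_query_if info = { .head = head /* block = UMC */ };

	if (!amdgpu_ras_error_query(adev, &info))
		dev_info(adev->dev, "umc: ue %lu ce %lu\n",
			 info.ue_count, info.ce_count);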

[PATCH 07/26] drm/amdgpu: init umc v6_1 functions for vega20

2019-07-31 Thread Alex Deucher
From: Hawking Zhang 

init the umc callback functions for vega20 in the sw early init phase

Signed-off-by: Hawking Zhang 
Reviewed-by: Dennis Li 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 5bd937ffc3ad..4116841eb0e3 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -45,6 +45,7 @@
 #include "mmhub_v1_0.h"
 #include "gfxhub_v1_1.h"
 #include "mmhub_v9_4.h"
+#include "umc_v6_1.h"
 
 #include "ivsrcid/vmc/irqsrcs_vmc_1_0.h"
 
@@ -623,12 +624,24 @@ static void gmc_v9_0_set_gmc_funcs(struct amdgpu_device 
*adev)
	adev->gmc.gmc_funcs = &gmc_v9_0_gmc_funcs;
 }
 
+static void gmc_v9_0_set_umc_funcs(struct amdgpu_device *adev)
+{
+   switch (adev->asic_type) {
+   case CHIP_VEGA20:
+		adev->umc_funcs = &umc_v6_1_funcs;
+   break;
+   default:
+   break;
+   }
+}
+
 static int gmc_v9_0_early_init(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
gmc_v9_0_set_gmc_funcs(adev);
gmc_v9_0_set_irq_funcs(adev);
+   gmc_v9_0_set_umc_funcs(adev);
 
adev->gmc.shared_aperture_start = 0x2000ULL;
adev->gmc.shared_aperture_end =
@@ -717,6 +730,7 @@ static int gmc_v9_0_ecc_late_init(void *handle)
	amdgpu_ras_feature_enable_on_boot(adev, &ras_block, 0);
return 0;
}
+
/* handle resume path. */
if (*ras_if) {
/* resend ras TA enable cmd during resume.
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 14/26] drm/amdgpu: add support for recording ras error address

2019-07-31 Thread Alex Deucher
From: Tao Zhou 

more than one error address may be recorded in one query

Signed-off-by: Tao Zhou 
Reviewed-by: Dennis Li 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index e087da46fc24..1914f37bee59 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -586,7 +586,7 @@ int amdgpu_ras_error_query(struct amdgpu_device *adev,
struct ras_query_if *info)
 {
	struct ras_manager *obj = amdgpu_ras_find_obj(adev, &info->head);
-   struct ras_err_data err_data = {0, 0};
+   struct ras_err_data err_data = {0, 0, 0, NULL};
 
if (!obj)
return -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
index 80e94d604a2e..0920db7aff34 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
@@ -131,6 +131,8 @@ struct ras_fs_data {
 struct ras_err_data {
unsigned long ue_count;
unsigned long ce_count;
+   unsigned long err_addr_cnt;
+   uint64_t *err_addr;
 };
 
 struct ras_err_handler_data {
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
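
Call sites zero-initialize the enlarged structure positionally as
{0, 0, 0, NULL}; a designated-initializer form (sketch) would survive future
field additions more gracefully:

	struct ras_err_data err_data = {
		.ue_count = 0,
		.ce_count = 0,
		.err_addr_cnt = 0,
		.err_addr = NULL,	/* set only when addresses are wanted */
	};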

[PATCH 05/26] drm/amdgpu: add umc v6_1_1 IP headers

2019-07-31 Thread Alex Deucher
From: Hawking Zhang 

the change introduces IP headers for unified memory controller (umc)

Signed-off-by: Hawking Zhang 
Reviewed-by: Dennis Li 
---
 .../include/asic_reg/umc/umc_6_1_1_offset.h   | 31 +++
 .../include/asic_reg/umc/umc_6_1_1_sh_mask.h  | 91 +++
 2 files changed, 122 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/include/asic_reg/umc/umc_6_1_1_offset.h
 create mode 100644 drivers/gpu/drm/amd/include/asic_reg/umc/umc_6_1_1_sh_mask.h

diff --git a/drivers/gpu/drm/amd/include/asic_reg/umc/umc_6_1_1_offset.h 
b/drivers/gpu/drm/amd/include/asic_reg/umc/umc_6_1_1_offset.h
new file mode 100644
index ..043aa695d63f
--- /dev/null
+++ b/drivers/gpu/drm/amd/include/asic_reg/umc/umc_6_1_1_offset.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2019  Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 
LIABILITY, WHETHER IN
+ * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+#ifndef _umc_6_1_1_OFFSET_HEADER
+#define _umc_6_1_1_OFFSET_HEADER
+
+#define mmUMCCH0_0_EccErrCntSel                               0x0360
+#define mmUMCCH0_0_EccErrCntSel_BASE_IDX                      0
+#define mmUMCCH0_0_EccErrCnt                                  0x0361
+#define mmUMCCH0_0_EccErrCnt_BASE_IDX                         0
+#define mmMCA_UMC_UMC0_MCUMC_STATUST0                         0x03c2
+#define mmMCA_UMC_UMC0_MCUMC_STATUST0_BASE_IDX                0
+
+#endif
diff --git a/drivers/gpu/drm/amd/include/asic_reg/umc/umc_6_1_1_sh_mask.h 
b/drivers/gpu/drm/amd/include/asic_reg/umc/umc_6_1_1_sh_mask.h
new file mode 100644
index ..45c888280af9
--- /dev/null
+++ b/drivers/gpu/drm/amd/include/asic_reg/umc/umc_6_1_1_sh_mask.h
@@ -0,0 +1,91 @@
+/*
+ * Copyright (C) 2019  Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 
LIABILITY, WHETHER IN
+ * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+#ifndef _umc_6_1_1_SH_MASK_HEADER
+#define _umc_6_1_1_SH_MASK_HEADER
+
+//UMCCH0_0_EccErrCntSel
+#define UMCCH0_0_EccErrCntSel__EccErrCntCsSel__SHIFT          0x0
+#define UMCCH0_0_EccErrCntSel__EccErrInt__SHIFT               0xc
+#define UMCCH0_0_EccErrCntSel__EccErrCntEn__SHIFT             0xf
+#define UMCCH0_0_EccErrCntSel__EccErrCntCsSel_MASK            0x0000000FL
+#define UMCCH0_0_EccErrCntSel__EccErrInt_MASK                 0x00003000L
+#define UMCCH0_0_EccErrCntSel__EccErrCntEn_MASK               0x00008000L
+//UMCCH0_0_EccErrCnt
+#define UMCCH0_0_EccErrCnt__EccErrCnt__SHIFT                  0x0

[PATCH 06/26] drm/amdgpu: add umc v6_1 query error count support

2019-07-31 Thread Alex Deucher
From: Hawking Zhang 

Implement the umc query_ras_error_count function to support querying
both correctable and uncorrectable errors

Signed-off-by: Hawking Zhang 
Signed-off-by: Tao Zhou 
Reviewed-by: Dennis Li 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   4 +
 drivers/gpu/drm/amd/amdgpu/umc_v6_1.c | 162 ++
 drivers/gpu/drm/amd/amdgpu/umc_v6_1.h |  39 +++
 3 files changed, 205 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/umc_v6_1.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 7a1a78c7b329..cc38a6836825 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -81,6 +81,10 @@ amdgpu-y += \
gfxhub_v1_0.o mmhub_v1_0.o gmc_v9_0.o gfxhub_v1_1.o mmhub_v9_4.o \
gfxhub_v2_0.o mmhub_v2_0.o gmc_v10_0.o
 
+# add UMC block
+amdgpu-y += \
+   umc_v6_1.o
+
 # add IH block
 amdgpu-y += \
amdgpu_irq.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c 
b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
new file mode 100644
index ..1ca5ae642946
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
@@ -0,0 +1,162 @@
+/*
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include "umc_v6_1.h"
+#include "amdgpu_ras.h"
+#include "amdgpu.h"
+
+#include "rsmu/rsmu_0_0_2_offset.h"
+#include "rsmu/rsmu_0_0_2_sh_mask.h"
+#include "umc/umc_6_1_1_offset.h"
+#include "umc/umc_6_1_1_sh_mask.h"
+
+static void umc_v6_1_enable_umc_index_mode(struct amdgpu_device *adev,
+  uint32_t umc_instance)
+{
+   uint32_t rsmu_umc_index;
+
+   rsmu_umc_index = RREG32_SOC15(RSMU, 0,
+   mmRSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU);
+   rsmu_umc_index = REG_SET_FIELD(rsmu_umc_index,
+   RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU,
+   RSMU_UMC_INDEX_MODE_EN, 1);
+   rsmu_umc_index = REG_SET_FIELD(rsmu_umc_index,
+   RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU,
+   RSMU_UMC_INDEX_INSTANCE, umc_instance);
+   rsmu_umc_index = REG_SET_FIELD(rsmu_umc_index,
+   RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU,
+   RSMU_UMC_INDEX_WREN, 1 << umc_instance);
+   WREG32_SOC15(RSMU, 0, mmRSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU,
+   rsmu_umc_index);
+}
+
+static void umc_v6_1_disable_umc_index_mode(struct amdgpu_device *adev)
+{
+   WREG32_FIELD15(RSMU, 0, RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU,
+   RSMU_UMC_INDEX_MODE_EN, 0);
+}
+
+static void umc_v6_1_query_correctable_error_count(struct amdgpu_device *adev,
+  uint32_t umc_reg_offset,
+  unsigned long *error_count)
+{
+   uint32_t ecc_err_cnt_sel, ecc_err_cnt_sel_addr;
+   uint32_t ecc_err_cnt, ecc_err_cnt_addr;
+   uint64_t mc_umc_status;
+   uint32_t mc_umc_status_addr;
+
+   ecc_err_cnt_sel_addr =
+   SOC15_REG_OFFSET(UMC, 0, mmUMCCH0_0_EccErrCntSel);
+   ecc_err_cnt_addr =
+   SOC15_REG_OFFSET(UMC, 0, mmUMCCH0_0_EccErrCnt);
+   mc_umc_status_addr =
+   SOC15_REG_OFFSET(UMC, 0, mmMCA_UMC_UMC0_MCUMC_STATUST0);
+
+   /* select the lower chip and check the error count */
+   ecc_err_cnt_sel = RREG32(ecc_err_cnt_sel_addr + umc_reg_offset);
+   ecc_err_cnt_sel = REG_SET_FIELD(ecc_err_cnt_sel, UMCCH0_0_EccErrCntSel,
+   EccErrCntCsSel, 0);
+   WREG32(ecc_err_cnt_sel_addr + umc_reg_offset, ecc_err_cnt_sel);
+   ecc_err_cnt = RREG32(ecc_err_cnt_addr + umc_reg_offset);
+   *error_count +=
+   REG_GET_FIELD(ecc_err_cnt, UMCCH0_0_EccErrCnt, EccErrCnt);
+   /* clear 

[PATCH 09/26] drm/amdgpu: add ras error count after each query (v2)

2019-07-31 Thread Alex Deucher
From: Tao Zhou 

v1: increase ras ce/ue error count
v2: log the number of correctable and uncorrectable errors

Signed-off-by: Tao Zhou 
Signed-off-by: Hawking Zhang 
Reviewed-by: Dennis Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 845e75f35b19..4f81b1f6d09f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -599,9 +599,20 @@ int amdgpu_ras_error_query(struct amdgpu_device *adev,
default:
break;
}
+
+   obj->err_data.ue_count += err_data.ue_count;
+   obj->err_data.ce_count += err_data.ce_count;
+
info->ue_count = obj->err_data.ue_count;
info->ce_count = obj->err_data.ce_count;
 
+	if (err_data.ce_count)
+		dev_info(adev->dev, "%ld correctable errors detected in %s block\n",
+			 obj->err_data.ce_count, ras_block_str(info->head.block));
+	if (err_data.ue_count)
+		dev_info(adev->dev, "%ld uncorrectable errors detected in %s block\n",
+			 obj->err_data.ue_count, ras_block_str(info->head.block));
+
return 0;
 }
 
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 10/26] drm/amdgpu: add RREG64/WREG64(_PCIE) operations

2019-07-31 Thread Alex Deucher
From: Tao Zhou 

add 64-bit register access functions

v2: implement the 64-bit functions at the low level

Signed-off-by: Tao Zhou 
Reviewed-by: Dennis Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h| 11 
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 73 ++
 drivers/gpu/drm/amd/amdgpu/soc15.c | 45 +
 3 files changed, 129 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 72d0331f4ca1..3e2b623d86c7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -640,6 +640,9 @@ void amdgpu_cgs_destroy_device(struct cgs_device 
*cgs_device);
 typedef uint32_t (*amdgpu_rreg_t)(struct amdgpu_device*, uint32_t);
 typedef void (*amdgpu_wreg_t)(struct amdgpu_device*, uint32_t, uint32_t);
 
+typedef uint64_t (*amdgpu_rreg64_t)(struct amdgpu_device*, uint32_t);
+typedef void (*amdgpu_wreg64_t)(struct amdgpu_device*, uint32_t, uint64_t);
+
 typedef uint32_t (*amdgpu_block_rreg_t)(struct amdgpu_device*, uint32_t, 
uint32_t);
 typedef void (*amdgpu_block_wreg_t)(struct amdgpu_device*, uint32_t, uint32_t, 
uint32_t);
 
@@ -833,6 +836,8 @@ struct amdgpu_device {
amdgpu_wreg_t   pcie_wreg;
amdgpu_rreg_t   pciep_rreg;
amdgpu_wreg_t   pciep_wreg;
+   amdgpu_rreg64_t pcie_rreg64;
+   amdgpu_wreg64_t pcie_wreg64;
/* protects concurrent UVD register access */
spinlock_t uvd_ctx_idx_lock;
amdgpu_rreg_t   uvd_ctx_rreg;
@@ -1033,6 +1038,8 @@ void amdgpu_mm_wreg(struct amdgpu_device *adev, uint32_t 
reg, uint32_t v,
uint32_t acc_flags);
 void amdgpu_mm_wreg8(struct amdgpu_device *adev, uint32_t offset, uint8_t 
value);
 uint8_t amdgpu_mm_rreg8(struct amdgpu_device *adev, uint32_t offset);
+uint64_t amdgpu_mm_rreg64(struct amdgpu_device *adev, uint32_t reg);
+void amdgpu_mm_wreg64(struct amdgpu_device *adev, uint32_t reg, uint64_t v);
 
 u32 amdgpu_io_rreg(struct amdgpu_device *adev, u32 reg);
 void amdgpu_io_wreg(struct amdgpu_device *adev, u32 reg, u32 v);
@@ -1060,12 +1067,16 @@ int emu_soc_asic_init(struct amdgpu_device *adev);
 #define DREG32(reg) printk(KERN_INFO "REGISTER: " #reg " : 0x%08X\n", 
amdgpu_mm_rreg(adev, (reg), 0))
 #define WREG32(reg, v) amdgpu_mm_wreg(adev, (reg), (v), 0)
 #define WREG32_IDX(reg, v) amdgpu_mm_wreg(adev, (reg), (v), AMDGPU_REGS_IDX)
+#define RREG64(reg) amdgpu_mm_rreg64(adev, (reg))
+#define WREG64(reg, v) amdgpu_mm_wreg64(adev, (reg), (v))
 #define REG_SET(FIELD, v) (((v) << FIELD##_SHIFT) & FIELD##_MASK)
 #define REG_GET(FIELD, v) (((v) << FIELD##_SHIFT) & FIELD##_MASK)
 #define RREG32_PCIE(reg) adev->pcie_rreg(adev, (reg))
 #define WREG32_PCIE(reg, v) adev->pcie_wreg(adev, (reg), (v))
 #define RREG32_PCIE_PORT(reg) adev->pciep_rreg(adev, (reg))
 #define WREG32_PCIE_PORT(reg, v) adev->pciep_wreg(adev, (reg), (v))
+#define RREG64_PCIE(reg) adev->pcie_rreg64(adev, (reg))
+#define WREG64_PCIE(reg, v) adev->pcie_wreg64(adev, (reg), (v))
 #define RREG32_SMC(reg) adev->smc_rreg(adev, (reg))
 #define WREG32_SMC(reg, v) adev->smc_wreg(adev, (reg), (v))
 #define RREG32_UVD_CTX(reg) adev->uvd_ctx_rreg(adev, (reg))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 127ed01ed8fd..08ba05c34782 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -260,6 +260,43 @@ void amdgpu_mm_wreg(struct amdgpu_device *adev, uint32_t 
reg, uint32_t v,
}
 }
 
+/**
+ * amdgpu_mm_rreg64 - read a 64 bit memory mapped IO register
+ *
+ * @adev: amdgpu_device pointer
+ * @reg: dword aligned register offset
+ *
+ * Returns the 64 bit value from the offset specified.
+ */
+uint64_t amdgpu_mm_rreg64(struct amdgpu_device *adev, uint32_t reg)
+{
+   uint64_t ret;
+
+   if ((reg * 4) < adev->rmmio_size)
+   ret = readq(((void __iomem *)adev->rmmio) + (reg * 4));
+   else
+   BUG();
+
+   return ret;
+}
+
+/**
+ * amdgpu_mm_wreg64 - write to a 64 bit memory mapped IO register
+ *
+ * @adev: amdgpu_device pointer
+ * @reg: dword aligned register offset
+ * @v: 64 bit value to write to the register
+ *
+ * Writes the value specified to the offset specified.
+ */
+void amdgpu_mm_wreg64(struct amdgpu_device *adev, uint32_t reg, uint64_t v)
+{
+   if ((reg * 4) < adev->rmmio_size)
+   writeq(v, ((void __iomem *)adev->rmmio) + (reg * 4));
+   else
+   BUG();
+}
+
 /**
  * amdgpu_io_rreg - read an IO register
  *
@@ -415,6 +452,40 @@ static void amdgpu_invalid_wreg(struct amdgpu_device 
*adev, uint32_t reg, uint32
BUG();
 }
 
+/**
+ * amdgpu_invalid_rreg64 - dummy 64 bit reg read function
+ *
+ * @adev: amdgpu device pointer
+ * @reg: offset of register
+ *
+ * Dummy register read function.  Used for register blocks
+ * that certain asics don't 

[PATCH 12/26] drm/amdgpu: switch to amdgpu_umc structure

2019-07-31 Thread Alex Deucher
From: Tao Zhou 

create a new amdgpu_umc structure to hold more umc
settings in the future and switch to the new structure

Signed-off-by: Tao Zhou 
Signed-off-by: Hawking Zhang 
Reviewed-by: Dennis Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 6 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   | 8 +---
 4 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 3e2b623d86c7..41f677613ffa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -951,6 +951,9 @@ struct amdgpu_device {
/* KFD */
struct amdgpu_kfd_dev   kfd;
 
+   /* UMC */
+   struct amdgpu_umc   umc;
+
/* display related functionality */
struct amdgpu_display_manager dm;
 
@@ -976,7 +979,6 @@ struct amdgpu_device {
 
const struct amdgpu_nbio_funcs  *nbio_funcs;
const struct amdgpu_df_funcs*df_funcs;
-   const struct amdgpu_umc_funcs   *umc_funcs;
 
/* delayed work_func for deferring clockgating during resume */
struct delayed_work delayed_init_work;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 4f81b1f6d09f..e087da46fc24 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -593,8 +593,8 @@ int amdgpu_ras_error_query(struct amdgpu_device *adev,
 
switch (info->head.block) {
case AMDGPU_RAS_BLOCK__UMC:
-	if (adev->umc_funcs->query_ras_error_count)
-		adev->umc_funcs->query_ras_error_count(adev, &err_data);
+	if (adev->umc.funcs->query_ras_error_count)
+		adev->umc.funcs->query_ras_error_count(adev, &err_data);
break;
default:
break;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
index 1ee1a00e5ac8..f5d6def96414 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
@@ -26,4 +26,10 @@ struct amdgpu_umc_funcs {
void *ras_error_status);
 };
 
+struct amdgpu_umc {
+   /* max error count in one ras query call */
+   uint32_t max_ras_err_cnt_per_query;
+   const struct amdgpu_umc_funcs *funcs;
+};
+
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 2748bd110fab..a02bc633a89a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -243,8 +243,8 @@ static int gmc_v9_0_process_ras_data_cb(struct 
amdgpu_device *adev,
 {
struct ras_err_data err_data = {0, 0};
kgd2kfd_set_sram_ecc_flag(adev->kfd.dev);
-	if (adev->umc_funcs->query_ras_error_count)
-		adev->umc_funcs->query_ras_error_count(adev, &err_data);
+	if (adev->umc.funcs->query_ras_error_count)
+		adev->umc.funcs->query_ras_error_count(adev, &err_data);
amdgpu_ras_reset_gpu(adev, 0);
return AMDGPU_RAS_UE;
 }
@@ -631,7 +631,9 @@ static void gmc_v9_0_set_umc_funcs(struct amdgpu_device 
*adev)
 {
switch (adev->asic_type) {
case CHIP_VEGA20:
-   adev->umc_funcs = _v6_1_funcs;
+   adev->umc.max_ras_err_cnt_per_query =
+   UMC_V6_1_UMC_INSTANCE_NUM * 
UMC_V6_1_CHANNEL_INSTANCE_NUM;
+   adev->umc.funcs = _v6_1_funcs;
break;
default:
break;
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
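
For vega20 the cap follows directly from the umc_v6_1 channel index table:
8 UMC instances x 4 channel instances = 32 possible error addresses per
query. A sketch of how a caller could size the err_addr buffer from it
(the kcalloc usage is illustrative, not part of this patch):

	err_data.err_addr = kcalloc(adev->umc.max_ras_err_cnt_per_query,
				    sizeof(*err_data.err_addr), GFP_KERNEL);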

[PATCH 19/26] drm/amdgpu: add check for ras error type

2019-07-31 Thread Alex Deucher
From: Tao Zhou 

only ue and ce errors are supported

Signed-off-by: Tao Zhou 
Reviewed-by: Dennis Li 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 0eeb85d8399d..a87deb7be414 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -153,9 +153,14 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file 
*f,
return -EINVAL;
 
data->head.block = block_id;
-   data->head.type = memcmp("ue", err, 2) == 0 ?
-   AMDGPU_RAS_ERROR__MULTI_UNCORRECTABLE :
-   AMDGPU_RAS_ERROR__SINGLE_CORRECTABLE;
+   /* only ue and ce errors are supported */
+   if (!memcmp("ue", err, 2))
+   data->head.type = AMDGPU_RAS_ERROR__MULTI_UNCORRECTABLE;
+   else if (!memcmp("ce", err, 2))
+   data->head.type = AMDGPU_RAS_ERROR__SINGLE_CORRECTABLE;
+   else
+   return -EINVAL;
+
data->op = op;
 
if (op == 2) {
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 16/26] drm/amdgpu: query umc ras error address

2019-07-31 Thread Alex Deucher
From: Tao Zhou 

query the umc ras error address, translate it to the gpu 4k page view
and save it.

Signed-off-by: Tao Zhou 
Reviewed-by: Hawking Zhang 
Reviewed-by: Dennis Li 
---
 drivers/gpu/drm/amd/amdgpu/umc_v6_1.c | 80 +++
 1 file changed, 80 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c 
b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
index e05f3e68edb0..bff1a12f2cc9 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
@@ -31,6 +31,16 @@
 
 #define smnMCA_UMC0_MCUMC_ADDRT0   0x50f10
 
+/*
+ * (addr / 256) * 8192, the higher 26 bits in ErrorAddr
+ * is the index of 8KB block
+ */
+#define ADDR_OF_8KB_BLOCK(addr)(((addr) & ~0xffULL) << 5)
+/* channel index is the index of 256B block */
+#define ADDR_OF_256B_BLOCK(channel_index)  ((channel_index) << 8)
+/* offset in 256B block */
+#define OFFSET_IN_256B_BLOCK(addr) ((addr) & 0xffULL)
+
 static uint32_t
 umc_v6_1_channel_idx_tbl[UMC_V6_1_UMC_INSTANCE_NUM][UMC_V6_1_CHANNEL_INSTANCE_NUM] = {
{2, 18, 11, 27},{4, 20, 13, 29},
@@ -158,6 +168,76 @@ static void umc_v6_1_query_ras_error_count(struct 
amdgpu_device *adev,
umc_v6_1_disable_umc_index_mode(adev);
 }
 
+static void umc_v6_1_query_error_address(struct amdgpu_device *adev,
+					 uint32_t umc_reg_offset, uint32_t channel_index,
+					 struct ras_err_data *err_data)
+{
+   uint32_t lsb;
+   uint64_t mc_umc_status, err_addr;
+   uint32_t mc_umc_status_addr;
+
+   /* skip error address process if -ENOMEM */
+   if (!err_data->err_addr)
+   return;
+
+   mc_umc_status_addr =
+   SOC15_REG_OFFSET(UMC, 0, mmMCA_UMC_UMC0_MCUMC_STATUST0);
+   mc_umc_status = RREG64(mc_umc_status_addr + umc_reg_offset);
+
+	/* calculate error address if ue/ce error is detected */
+	if (REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, Val) == 1 &&
+	    (REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, UECC) == 1 ||
+	    REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, CECC) == 1)) {
+		err_addr = RREG64_PCIE(smnMCA_UMC0_MCUMC_ADDRT0 + umc_reg_offset * 4);
+
+		/* the lowest lsb bits should be ignored */
+		lsb = REG_GET_FIELD(err_addr, MCA_UMC_UMC0_MCUMC_ADDRT0, LSB);
+		err_addr = REG_GET_FIELD(err_addr, MCA_UMC_UMC0_MCUMC_ADDRT0, ErrorAddr);
+		err_addr &= ~((0x1ULL << lsb) - 1);
+
+		/* translate umc channel address to soc pa, 3 parts are included */
+		err_data->err_addr[err_data->err_addr_cnt] =
+			ADDR_OF_8KB_BLOCK(err_addr) |
+			ADDR_OF_256B_BLOCK(channel_index) |
+			OFFSET_IN_256B_BLOCK(err_addr);
+
+		err_data->err_addr_cnt++;
+	}
+}
+
+static void umc_v6_1_query_ras_error_address(struct amdgpu_device *adev,
+void *ras_error_status)
+{
+   struct ras_err_data *err_data = (struct ras_err_data *)ras_error_status;
+   uint32_t umc_inst, channel_inst, umc_reg_offset;
+   uint32_t channel_index, mc_umc_status_addr;
+
+   mc_umc_status_addr =
+   SOC15_REG_OFFSET(UMC, 0, mmMCA_UMC_UMC0_MCUMC_STATUST0);
+
+   for (umc_inst = 0; umc_inst < UMC_V6_1_UMC_INSTANCE_NUM; umc_inst++) {
+		/* enable the index mode to query error count per channel */
+		umc_v6_1_enable_umc_index_mode(adev, umc_inst);
+		for (channel_inst = 0; channel_inst < UMC_V6_1_CHANNEL_INSTANCE_NUM; channel_inst++) {
+			/* calc the register offset according to channel instance */
+			umc_reg_offset = UMC_V6_1_PER_CHANNEL_OFFSET * channel_inst;
+			/* get channel index of interleaved memory */
+			channel_index = umc_v6_1_channel_idx_tbl[umc_inst][channel_inst];
+
+			umc_v6_1_query_error_address(adev, umc_reg_offset,
+						     channel_index, err_data);
+
+			/* clear umc status */
+			WREG64(mc_umc_status_addr + umc_reg_offset, 0x0ULL);
+			/* clear error address register */
+			WREG64_PCIE(smnMCA_UMC0_MCUMC_ADDRT0 + umc_reg_offset * 4, 0x0ULL);
+		}
+	}
+
+   umc_v6_1_disable_umc_index_mode(adev);
+}
+
 const struct amdgpu_umc_funcs umc_v6_1_funcs = {
.query_ras_error_count = umc_v6_1_query_ras_error_count,
+   .query_ras_error_address = umc_v6_1_query_ras_error_address,
 };
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
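
A worked example of the three-part translation, using the macros above
(values chosen purely for illustration):

	/* assume err_addr = 0x1234 after the LSB masking, and
	 * channel_index = 27 for this channel:
	 *
	 *   ADDR_OF_8KB_BLOCK(0x1234)    = (0x1234 & ~0xff) << 5 = 0x24000
	 *   ADDR_OF_256B_BLOCK(27)       = 27 << 8              = 0x1b00
	 *   OFFSET_IN_256B_BLOCK(0x1234) = 0x1234 & 0xff        = 0x34
	 *
	 * soc physical address = 0x24000 | 0x1b00 | 0x34 = 0x25b34
	 */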

[PATCH 02/26] drm/amdgpu: init RSMU and UMC ip base address for vega20

2019-07-31 Thread Alex Deucher
From: Hawking Zhang 

the driver needs to program RSMU and UMC registers to
support the vega20 RAS feature

Signed-off-by: Hawking Zhang 
Reviewed-by: Dennis Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h  | 2 ++
 drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 121cc5544b2b..a197f4b33eda 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -755,6 +755,8 @@ enum amd_hw_ip_block_type {
NBIF_HWIP,
THM_HWIP,
CLK_HWIP,
+   UMC_HWIP,
+   RSMU_HWIP,
MAX_HWIP
 };
 
diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c 
b/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c
index 79223188bd47..587e33f5dcce 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c
@@ -50,6 +50,8 @@ int vega20_reg_base_init(struct amdgpu_device *adev)
adev->reg_offset[NBIF_HWIP][i] = (uint32_t 
*)(&(NBIO_BASE.instance[i]));
adev->reg_offset[THM_HWIP][i] = (uint32_t 
*)(&(THM_BASE.instance[i]));
adev->reg_offset[CLK_HWIP][i] = (uint32_t 
*)(&(CLK_BASE.instance[i]));
+   adev->reg_offset[UMC_HWIP][i] = (uint32_t 
*)(&(UMC_BASE.instance[i]));
+   adev->reg_offset[RSMU_HWIP][i] = (uint32_t 
*)(&(RSMU_BASE.instance[i]));
}
return 0;
 }
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 03/26] drm/amdgpu: add amdgpu_umc_functions structure

2019-07-31 Thread Alex Deucher
From: Hawking Zhang 

This is a common structure used for the UMC callback functions

Signed-off-by: Hawking Zhang 
Reviewed-by: Dennis Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 29 +
 2 files changed, 31 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index a197f4b33eda..72d0331f4ca1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -86,6 +86,7 @@
 #include "amdgpu_smu.h"
 #include "amdgpu_discovery.h"
 #include "amdgpu_mes.h"
+#include "amdgpu_umc.h"
 
 #define MAX_GPU_INSTANCE   16
 
@@ -970,6 +971,7 @@ struct amdgpu_device {
 
const struct amdgpu_nbio_funcs  *nbio_funcs;
const struct amdgpu_df_funcs*df_funcs;
+   const struct amdgpu_umc_funcs   *umc_funcs;
 
/* delayed work_func for deferring clockgating during resume */
struct delayed_work delayed_init_work;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
new file mode 100644
index ..1ee1a00e5ac8
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (C) 2019  Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 
LIABILITY, WHETHER IN
+ * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+#ifndef __AMDGPU_UMC_H__
+#define __AMDGPU_UMC_H__
+
+struct amdgpu_umc_funcs {
+   void (*query_ras_error_count)(struct amdgpu_device *adev,
+   void *ras_error_status);
+};
+
+#endif
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
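
A hypothetical IP implementation wired through the common structure (all
names below are placeholders, not part of the patch):

	static void foo_umc_query_ras_error_count(struct amdgpu_device *adev,
						  void *ras_error_status)
	{
		/* IP-specific error counter reads go here */
	}

	const struct amdgpu_umc_funcs foo_umc_funcs = {
		.query_ras_error_count = foo_umc_query_ras_error_count,
	};

	/* an ASIC then publishes it with: adev->umc_funcs = &foo_umc_funcs; */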

[PATCH 01/26] drm/amdgpu: move some ras data structure to amdgpu_ras.h

2019-07-31 Thread Alex Deucher
From: Hawking Zhang 

These are common structures that can be included by IP specific
source files

Signed-off-by: Hawking Zhang 
Reviewed-by: Dennis Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 68 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 69 -
 2 files changed, 68 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index b45aaf04a574..3be306bf1603 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -28,74 +28,6 @@
 #include "amdgpu_ras.h"
 #include "amdgpu_atomfirmware.h"
 
-struct ras_ih_data {
-   /* interrupt bottom half */
-   struct work_struct ih_work;
-   int inuse;
-   /* IP callback */
-   ras_ih_cb cb;
-   /* full of entries */
-   unsigned char *ring;
-   unsigned int ring_size;
-   unsigned int element_size;
-   unsigned int aligned_element_size;
-   unsigned int rptr;
-   unsigned int wptr;
-};
-
-struct ras_fs_data {
-   char sysfs_name[32];
-   char debugfs_name[32];
-};
-
-struct ras_err_data {
-   unsigned long ue_count;
-   unsigned long ce_count;
-};
-
-struct ras_err_handler_data {
-   /* point to bad pages array */
-   struct {
-   unsigned long bp;
-   struct amdgpu_bo *bo;
-   } *bps;
-   /* the count of entries */
-   int count;
-   /* the space can place new entries */
-   int space_left;
-   /* last reserved entry's index + 1 */
-   int last_reserved;
-};
-
-struct ras_manager {
-   struct ras_common_if head;
-   /* reference count */
-   int use;
-   /* ras block link */
-   struct list_head node;
-   /* the device */
-   struct amdgpu_device *adev;
-   /* debugfs */
-   struct dentry *ent;
-   /* sysfs */
-   struct device_attribute sysfs_attr;
-   int attr_inuse;
-
-   /* fs node name */
-   struct ras_fs_data fs_data;
-
-   /* IH data */
-   struct ras_ih_data ih_data;
-
-   struct ras_err_data err_data;
-};
-
-struct ras_badpage {
-   unsigned int bp;
-   unsigned int size;
-   unsigned int flags;
-};
-
 const char *ras_error_string[] = {
"none",
"parity",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
index b2841195bd3b..80e94d604a2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
@@ -108,8 +108,75 @@ struct amdgpu_ras {
uint32_t flags;
 };
 
-/* interfaces for IP */
+struct ras_ih_data {
+   /* interrupt bottom half */
+   struct work_struct ih_work;
+   int inuse;
+   /* IP callback */
+   ras_ih_cb cb;
+   /* full of entries */
+   unsigned char *ring;
+   unsigned int ring_size;
+   unsigned int element_size;
+   unsigned int aligned_element_size;
+   unsigned int rptr;
+   unsigned int wptr;
+};
+
+struct ras_fs_data {
+   char sysfs_name[32];
+   char debugfs_name[32];
+};
+
+struct ras_err_data {
+   unsigned long ue_count;
+   unsigned long ce_count;
+};
+
+struct ras_err_handler_data {
+   /* point to bad pages array */
+   struct {
+   unsigned long bp;
+   struct amdgpu_bo *bo;
+   } *bps;
+   /* the count of entries */
+   int count;
+   /* the space can place new entries */
+   int space_left;
+   /* last reserved entry's index + 1 */
+   int last_reserved;
+};
 
+struct ras_manager {
+   struct ras_common_if head;
+   /* reference count */
+   int use;
+   /* ras block link */
+   struct list_head node;
+   /* the device */
+   struct amdgpu_device *adev;
+   /* debugfs */
+   struct dentry *ent;
+   /* sysfs */
+   struct device_attribute sysfs_attr;
+   int attr_inuse;
+
+   /* fs node name */
+   struct ras_fs_data fs_data;
+
+   /* IH data */
+   struct ras_ih_data ih_data;
+
+   struct ras_err_data err_data;
+};
+
+struct ras_badpage {
+   unsigned int bp;
+   unsigned int size;
+   unsigned int flags;
+};
+
+/* interfaces for IP */
 struct ras_fs_if {
struct ras_common_if head;
char sysfs_name[32];
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 04/26] drm/amdgpu: add rsmu v_0_0_2 ip headers

2019-07-31 Thread Alex Deucher
From: Hawking Zhang 

remote smu (rsmu) is a sub-block used for the ip register interface,
error handling, reset generation, etc.

Signed-off-by: Hawking Zhang 
Reviewed-by: Dennis Li 
---
 .../include/asic_reg/rsmu/rsmu_0_0_2_offset.h | 27 
 .../asic_reg/rsmu/rsmu_0_0_2_sh_mask.h| 32 +++
 2 files changed, 59 insertions(+)
 create mode 100644 
drivers/gpu/drm/amd/include/asic_reg/rsmu/rsmu_0_0_2_offset.h
 create mode 100644 
drivers/gpu/drm/amd/include/asic_reg/rsmu/rsmu_0_0_2_sh_mask.h

diff --git a/drivers/gpu/drm/amd/include/asic_reg/rsmu/rsmu_0_0_2_offset.h 
b/drivers/gpu/drm/amd/include/asic_reg/rsmu/rsmu_0_0_2_offset.h
new file mode 100644
index ..46466ae77f19
--- /dev/null
+++ b/drivers/gpu/drm/amd/include/asic_reg/rsmu/rsmu_0_0_2_offset.h
@@ -0,0 +1,27 @@
+/*
+ * Copyright (C) 2019  Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
+ * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+#ifndef _rsmu_0_0_2_OFFSET_HEADER
+#define _rsmu_0_0_2_OFFSET_HEADER
+
+#define mmRSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU                                 0x0d91
+#define mmRSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU_BASE_IDX                        0
+
+#endif
diff --git a/drivers/gpu/drm/amd/include/asic_reg/rsmu/rsmu_0_0_2_sh_mask.h 
b/drivers/gpu/drm/amd/include/asic_reg/rsmu/rsmu_0_0_2_sh_mask.h
new file mode 100644
index ..ea0acb598254
--- /dev/null
+++ b/drivers/gpu/drm/amd/include/asic_reg/rsmu/rsmu_0_0_2_sh_mask.h
@@ -0,0 +1,32 @@
+/*
+ * Copyright (C) 2019  Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
+ * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+#ifndef _rsmu_0_0_2_SH_MASK_HEADER
+#define _rsmu_0_0_2_SH_MASK_HEADER
+
+//RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU
+#define RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU__RSMU_UMC_INDEX_WREN__SHIFT       0x0
+#define RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU__RSMU_UMC_INDEX_INSTANCE__SHIFT   0x10
+#define RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU__RSMU_UMC_INDEX_MODE_EN__SHIFT    0x1f
+#define RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU__RSMU_UMC_INDEX_WREN_MASK         0x00000001L
+#define RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU__RSMU_UMC_INDEX_INSTANCE_MASK     0x000F0000L
+#define RSMU_UMC_INDEX_REGISTER_NBIF_VG20_GPU__RSMU_UMC_INDEX_MODE_EN_MASK      0x80000000L
+
+#endif
-- 
2.20.1


[PATCH 00/26] Further RAS enablement for vega20

2019-07-31 Thread Alex Deucher
This series enables additional RAS features for vega20.

Dennis Li (6):
  drm/amd/include: add bitfield define for EDC registers
  drm/amd/include: add define of TCP_EDC_CNT_NEW
  drm/amdgpu: add define for gfx ras subblock
  drm/amdgpu: add RAS callback for gfx
  drm/amdgpu: support gfx ras error injection and err_cnt query
  drm/amdgpu: disable inject for failed subblocks of gfx

Hawking Zhang (8):
  drm/amdgpu: move some ras data structure to amdgpu_ras.h
  drm/amdgpu: init RSMU and UMC ip base address for vega20
  drm/amdgpu: add amdgpu_umc_functions structure
  drm/amdgpu: add rsmu v_0_0_2 ip headers
  drm/amdgpu: add umc v6_1_1 IP headers
  drm/amdgpu: add umc v6_1 query error count support
  drm/amdgpu: init umc v6_1 functions for vega20
  drm/amdgpu: query umc error count

Tao Zhou (12):
  drm/amdgpu: add ras error count after each query (v2)
  drm/amdgpu: add RREG64/WREG64(_PCIE) operations
  drm/amdgpu: use 64bit operation macros for umc
  drm/amdgpu: switch to amdgpu_umc structure
  drm/amdgpu: update algorithm of umc uncorrectable error counting
  drm/amdgpu: add support for recording ras error address
  drm/amdgpu: add structures for umc error address translation
  drm/amdgpu: query umc ras error address
  drm/amdgpu: allow ras interrupt callback to return error data
  drm/amdgpu: update interrupt callback for all ras clients
  drm/amdgpu: add check for ras error type
  drm/amdgpu: remove ras_reserve_vram in ras injection

 drivers/gpu/drm/amd/amdgpu/Makefile   |   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  17 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  73 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h   |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   | 145 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h   | 308 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h   |  37 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 784 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c |  19 +
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c|   2 +
 drivers/gpu/drm/amd/amdgpu/soc15.c|  45 +
 drivers/gpu/drm/amd/amdgpu/umc_v6_1.c | 243 ++
 drivers/gpu/drm/amd/amdgpu/umc_v6_1.h |  39 +
 drivers/gpu/drm/amd/amdgpu/vega20_reg_init.c  |   2 +
 .../amd/include/asic_reg/gc/gc_9_0_offset.h   |   2 +
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  | 157 
 .../include/asic_reg/rsmu/rsmu_0_0_2_offset.h |  27 +
 .../asic_reg/rsmu/rsmu_0_0_2_sh_mask.h|  32 +
 .../include/asic_reg/umc/umc_6_1_1_offset.h   |  31 +
 .../include/asic_reg/umc/umc_6_1_1_sh_mask.h  |  91 ++
 20 files changed, 1967 insertions(+), 93 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
 create mode 100644 drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/umc_v6_1.h
 create mode 100644 
drivers/gpu/drm/amd/include/asic_reg/rsmu/rsmu_0_0_2_offset.h
 create mode 100644 
drivers/gpu/drm/amd/include/asic_reg/rsmu/rsmu_0_0_2_sh_mask.h
 create mode 100644 drivers/gpu/drm/amd/include/asic_reg/umc/umc_6_1_1_offset.h
 create mode 100644 drivers/gpu/drm/amd/include/asic_reg/umc/umc_6_1_1_sh_mask.h

-- 
2.20.1


Re: [PATCH 1/8] drm/amdgpu: drop drmP.h in amdgpu_amdkfd_arcturus.c

2019-07-31 Thread Sam Ravnborg
Hi Alex.

On Wed, Jul 31, 2019 at 10:52:39AM -0500, Alex Deucher wrote:
> Unused.
> 
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> index 4d9101834ba7..c79aaebeeaf0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> @@ -28,7 +28,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include "amdgpu.h"
>  #include "amdgpu_amdkfd.h"
>  #include "sdma0/sdma0_4_2_2_offset.h"

Thanks!

All patches are:
Acked-by: Sam Ravnborg 


Actual status in drm-misc:

$ git grep drmP | cut -d '/' -f 1 | sort | uniq -c
  6 amd <= fixed by this patchset
  8 arm <= patch sent. Needs to rebase and resend
  6 armada  <= patch sent. Needs to rebase and resend
  1 etnaviv <= already fixed in etnaviv repo
  2 exynos  <= Somehow missed these. Patch ready, needs to send it out
  1 i2c <= patch sent. Needs to rebase and resend
  2 msm <= patch sent. Needs to rebase and resend
 27 nouveau <= already fixed in nouveau repo
  4 tegra   <= patch sent. Needs to rebase and resend
 13 vmwgfx  <= already fixed in vmwgfx repo

So things look doable. I just need to find a few hours..

Sam


Re: [PATCH v19 02/15] arm64: Introduce prctl() options to control the tagged user addresses ABI

2019-07-31 Thread Dave Hansen
On 7/23/19 10:58 AM, Andrey Konovalov wrote:
> +long set_tagged_addr_ctrl(unsigned long arg)
> +{
> + if (!tagged_addr_prctl_allowed)
> + return -EINVAL;
> + if (is_compat_task())
> + return -EINVAL;
> + if (arg & ~PR_TAGGED_ADDR_ENABLE)
> + return -EINVAL;
> +
> + update_thread_flag(TIF_TAGGED_ADDR, arg & PR_TAGGED_ADDR_ENABLE);
> +
> + return 0;
> +}

Instead of a plain enable/disable, a more flexible ABI would be to have
the tag mask be passed in.  That way, an implementation that has a
flexible tag size can select it.  It also ensures that userspace
actually knows what the tag size is and isn't surprised if a hardware
implementation changes the tag size or position.
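
Something along these lines, purely as a sketch (PR_TAGGED_ADDR_MASK,
arch_supported_tag_mask() and the tagged_addr_mask thread field are
invented names for illustration, not part of the posted series):

long set_tagged_addr_ctrl(unsigned long arg)
{
	/* Hypothetical mask-based variant, for discussion only. */
	unsigned long mask = arg & PR_TAGGED_ADDR_MASK;

	if (!tagged_addr_prctl_allowed)
		return -EINVAL;
	if (is_compat_task())
		return -EINVAL;
	/* Reject tag bits the hardware does not actually implement. */
	if (mask & ~arch_supported_tag_mask())
		return -EINVAL;

	current->thread.tagged_addr_mask = mask;
	update_thread_flag(TIF_TAGGED_ADDR, mask != 0);

	return 0;
}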

Also, this whole set deals with tagging/untagging, but there's an
effective loss of address space when you do this.  Is that dealt with
anywhere?  How do we ensure that allocations don't get placed at a
tagged address before this gets turned on?  Where's that checking?


Re: [PATCH 02/13] amdgpu: don't initialize range->list in amdgpu_hmm_init_range

2019-07-31 Thread Jason Gunthorpe
On Wed, Jul 31, 2019 at 01:25:06PM +, Kuehling, Felix wrote:
> On 2019-07-30 1:51 a.m., Christoph Hellwig wrote:
> > The list is used to add the range to another list as an entry in the
> > core hmm code, so there is no need to initialize it in a driver.
> 
> I've seen code that uses list_empty to check whether a list head has 
> been added to a list or not. For that to work, the list head needs to be 
> initialized, and it has to be removed with list_del_init. 

I think the idea is that 'list' is a private member of range and
drivers shouldn't touch it at all.

> ever do that with range->list, then this patch is Reviewed-by: Felix 
> Kuehling 

Please put tags on their own line so that patchwork will
collect them automatically.

Jason

[PATCH 10/13] drm: zte: Provide ddc symlink in hdmi connector sysfs directory

2019-07-31 Thread Andrzej Pietrasiewicz
Use the ddc pointer provided by the generic connector.

Signed-off-by: Andrzej Pietrasiewicz 
Acked-by: Sam Ravnborg 
Reviewed-by: Emil Velikov 
---
 drivers/gpu/drm/zte/zx_hdmi.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/zte/zx_hdmi.c b/drivers/gpu/drm/zte/zx_hdmi.c
index a50f5a1f09b8..b98a1420dcd3 100644
--- a/drivers/gpu/drm/zte/zx_hdmi.c
+++ b/drivers/gpu/drm/zte/zx_hdmi.c
@@ -319,8 +319,10 @@ static int zx_hdmi_register(struct drm_device *drm, struct 
zx_hdmi *hdmi)
 
hdmi->connector.polled = DRM_CONNECTOR_POLL_HPD;
 
-   drm_connector_init(drm, >connector, _hdmi_connector_funcs,
-  DRM_MODE_CONNECTOR_HDMIA);
+   drm_connector_init_with_ddc(drm, >connector,
+   _hdmi_connector_funcs,
+   DRM_MODE_CONNECTOR_HDMIA,
+   >ddc->adap);
drm_connector_helper_add(>connector,
 _hdmi_connector_helper_funcs);
 
-- 
2.17.1


[PATCH 13/13] drm/i915: Provide ddc symlink in hdmi connector sysfs directory

2019-07-31 Thread Andrzej Pietrasiewicz
Use the ddc pointer provided by the generic connector.

Signed-off-by: Andrzej Pietrasiewicz 
Acked-by: Sam Ravnborg 
Reviewed-by: Emil Velikov 
---
 drivers/gpu/drm/i915/display/intel_hdmi.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_hdmi.c 
b/drivers/gpu/drm/i915/display/intel_hdmi.c
index 0ebec69bbbfc..7e69e5782f6e 100644
--- a/drivers/gpu/drm/i915/display/intel_hdmi.c
+++ b/drivers/gpu/drm/i915/display/intel_hdmi.c
@@ -3084,6 +3084,7 @@ void intel_hdmi_init_connector(struct intel_digital_port 
*intel_dig_port,
struct intel_encoder *intel_encoder = _dig_port->base;
struct drm_device *dev = intel_encoder->base.dev;
struct drm_i915_private *dev_priv = to_i915(dev);
+   struct i2c_adapter *ddc;
enum port port = intel_encoder->port;
 
DRM_DEBUG_KMS("Adding HDMI connector on port %c\n",
@@ -3094,8 +3095,13 @@ void intel_hdmi_init_connector(struct intel_digital_port 
*intel_dig_port,
 intel_dig_port->max_lanes, port_name(port)))
return;
 
-   drm_connector_init(dev, connector, _hdmi_connector_funcs,
-  DRM_MODE_CONNECTOR_HDMIA);
+   intel_hdmi->ddc_bus = intel_hdmi_ddc_pin(dev_priv, port);
+   ddc = intel_gmbus_get_adapter(dev_priv, intel_hdmi->ddc_bus);
+
+   drm_connector_init_with_ddc(dev, connector,
+   _hdmi_connector_funcs,
+   DRM_MODE_CONNECTOR_HDMIA,
+   ddc);
drm_connector_helper_add(connector, _hdmi_connector_helper_funcs);
 
connector->interlace_allowed = 1;
@@ -3105,8 +3111,6 @@ void intel_hdmi_init_connector(struct intel_digital_port 
*intel_dig_port,
if (INTEL_GEN(dev_priv) >= 10 || IS_GEMINILAKE(dev_priv))
connector->ycbcr_420_allowed = true;
 
-   intel_hdmi->ddc_bus = intel_hdmi_ddc_pin(dev_priv, port);
-
if (WARN_ON(port == PORT_A))
return;
intel_encoder->hpd_pin = intel_hpd_pin_default(dev_priv, port);
-- 
2.17.1


[PATCH 11/13] drm: zte: Provide ddc symlink in vga connector sysfs directory

2019-07-31 Thread Andrzej Pietrasiewicz
Use the ddc pointer provided by the generic connector.

Signed-off-by: Andrzej Pietrasiewicz 
Acked-by: Sam Ravnborg 
Reviewed-by: Emil Velikov 
---
 drivers/gpu/drm/zte/zx_vga.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/zte/zx_vga.c b/drivers/gpu/drm/zte/zx_vga.c
index 9b67e419280c..c4fa3bbaba78 100644
--- a/drivers/gpu/drm/zte/zx_vga.c
+++ b/drivers/gpu/drm/zte/zx_vga.c
@@ -165,8 +165,10 @@ static int zx_vga_register(struct drm_device *drm, struct 
zx_vga *vga)
 
vga->connector.polled = DRM_CONNECTOR_POLL_HPD;
 
-   ret = drm_connector_init(drm, connector, _vga_connector_funcs,
-DRM_MODE_CONNECTOR_VGA);
+   ret = drm_connector_init_with_ddc(drm, connector,
+ _vga_connector_funcs,
+ DRM_MODE_CONNECTOR_VGA,
+ >ddc->adap);
if (ret) {
DRM_DEV_ERROR(dev, "failed to init connector: %d\n", ret);
goto clean_encoder;
-- 
2.17.1


[PATCH 12/13] drm/tilcdc: Provide ddc symlink in connector sysfs directory

2019-07-31 Thread Andrzej Pietrasiewicz
Use the ddc pointer provided by the generic connector.

Signed-off-by: Andrzej Pietrasiewicz 
Acked-by: Sam Ravnborg 
Reviewed-by: Emil Velikov 
---
 drivers/gpu/drm/tilcdc/tilcdc_tfp410.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/tilcdc/tilcdc_tfp410.c 
b/drivers/gpu/drm/tilcdc/tilcdc_tfp410.c
index c6e4e52f32bc..d51776dd7a03 100644
--- a/drivers/gpu/drm/tilcdc/tilcdc_tfp410.c
+++ b/drivers/gpu/drm/tilcdc/tilcdc_tfp410.c
@@ -222,8 +222,10 @@ static struct drm_connector 
*tfp410_connector_create(struct drm_device *dev,
 
connector = _connector->base;
 
-   drm_connector_init(dev, connector, _connector_funcs,
-   DRM_MODE_CONNECTOR_DVID);
+   drm_connector_init_with_ddc(dev, connector,
+   _connector_funcs,
+   DRM_MODE_CONNECTOR_DVID,
+   mod->i2c);
drm_connector_helper_add(connector, _connector_helper_funcs);
 
connector->polled = DRM_CONNECTOR_POLL_CONNECT |
-- 
2.17.1


[PATCH 09/13] drm/vc4: Provide ddc symlink in connector sysfs directory

2019-07-31 Thread Andrzej Pietrasiewicz
Use the ddc pointer provided by the generic connector.

Signed-off-by: Andrzej Pietrasiewicz 
Acked-by: Sam Ravnborg 
Reviewed-by: Emil Velikov 
---
 drivers/gpu/drm/vc4/vc4_hdmi.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c
index ee7d4e7b0ee3..eb57c907a256 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi.c
+++ b/drivers/gpu/drm/vc4/vc4_hdmi.c
@@ -267,7 +267,8 @@ static const struct drm_connector_helper_funcs 
vc4_hdmi_connector_helper_funcs =
 };
 
 static struct drm_connector *vc4_hdmi_connector_init(struct drm_device *dev,
-struct drm_encoder 
*encoder)
+struct drm_encoder 
*encoder,
+struct i2c_adapter *ddc)
 {
struct drm_connector *connector;
struct vc4_hdmi_connector *hdmi_connector;
@@ -281,8 +282,10 @@ static struct drm_connector 
*vc4_hdmi_connector_init(struct drm_device *dev,
 
hdmi_connector->encoder = encoder;
 
-   drm_connector_init(dev, connector, _hdmi_connector_funcs,
-  DRM_MODE_CONNECTOR_HDMIA);
+   drm_connector_init_with_ddc(dev, connector,
+   _hdmi_connector_funcs,
+   DRM_MODE_CONNECTOR_HDMIA,
+   ddc);
drm_connector_helper_add(connector, _hdmi_connector_helper_funcs);
 
/* Create and attach TV margin props to this connector. */
@@ -1395,7 +1398,8 @@ static int vc4_hdmi_bind(struct device *dev, struct 
device *master, void *data)
 DRM_MODE_ENCODER_TMDS, NULL);
drm_encoder_helper_add(hdmi->encoder, _hdmi_encoder_helper_funcs);
 
-   hdmi->connector = vc4_hdmi_connector_init(drm, hdmi->encoder);
+   hdmi->connector =
+   vc4_hdmi_connector_init(drm, hdmi->encoder, hdmi->ddc);
if (IS_ERR(hdmi->connector)) {
ret = PTR_ERR(hdmi->connector);
goto err_destroy_encoder;
-- 
2.17.1


[PATCH 08/13] drm/tegra: Provide ddc symlink in output connector sysfs directory

2019-07-31 Thread Andrzej Pietrasiewicz
Use the ddc pointer provided by the generic connector.

Signed-off-by: Andrzej Pietrasiewicz 
Acked-by: Sam Ravnborg 
Reviewed-by: Emil Velikov 
---
 drivers/gpu/drm/tegra/hdmi.c | 7 ---
 drivers/gpu/drm/tegra/sor.c  | 7 ---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/tegra/hdmi.c b/drivers/gpu/drm/tegra/hdmi.c
index 334c4d7d238b..416a2862a84b 100644
--- a/drivers/gpu/drm/tegra/hdmi.c
+++ b/drivers/gpu/drm/tegra/hdmi.c
@@ -1425,9 +1425,10 @@ static int tegra_hdmi_init(struct host1x_client *client)
 
hdmi->output.dev = client->dev;
 
-   drm_connector_init(drm, >output.connector,
-  _hdmi_connector_funcs,
-  DRM_MODE_CONNECTOR_HDMIA);
+   drm_connector_init_with_ddc(drm, >output.connector,
+   _hdmi_connector_funcs,
+   DRM_MODE_CONNECTOR_HDMIA,
+   hdmi->output.ddc);
drm_connector_helper_add(>output.connector,
 _hdmi_connector_helper_funcs);
hdmi->output.connector.dpms = DRM_MODE_DPMS_OFF;
diff --git a/drivers/gpu/drm/tegra/sor.c b/drivers/gpu/drm/tegra/sor.c
index 4ffe3794e6d3..3a69e387c62d 100644
--- a/drivers/gpu/drm/tegra/sor.c
+++ b/drivers/gpu/drm/tegra/sor.c
@@ -2832,9 +2832,10 @@ static int tegra_sor_init(struct host1x_client *client)
 
sor->output.dev = sor->dev;
 
-   drm_connector_init(drm, >output.connector,
-  _sor_connector_funcs,
-  connector);
+   drm_connector_init_with_ddc(drm, >output.connector,
+   _sor_connector_funcs,
+   connector,
+   sor->output.ddc);
drm_connector_helper_add(>output.connector,
 _sor_connector_helper_funcs);
sor->output.connector.dpms = DRM_MODE_DPMS_OFF;
-- 
2.17.1


[PATCH 06/13] drm/msm/hdmi: Provide ddc symlink in hdmi connector sysfs directory

2019-07-31 Thread Andrzej Pietrasiewicz
Use the ddc pointer provided by the generic connector.

Signed-off-by: Andrzej Pietrasiewicz 
Acked-by: Sam Ravnborg 
Reviewed-by: Emil Velikov 
---
 drivers/gpu/drm/msm/hdmi/hdmi_connector.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/hdmi/hdmi_connector.c 
b/drivers/gpu/drm/msm/hdmi/hdmi_connector.c
index 07b4cb877d82..1f03262b8a52 100644
--- a/drivers/gpu/drm/msm/hdmi/hdmi_connector.c
+++ b/drivers/gpu/drm/msm/hdmi/hdmi_connector.c
@@ -450,8 +450,10 @@ struct drm_connector *msm_hdmi_connector_init(struct hdmi 
*hdmi)
 
connector = _connector->base;
 
-   drm_connector_init(hdmi->dev, connector, _connector_funcs,
-   DRM_MODE_CONNECTOR_HDMIA);
+   drm_connector_init_with_ddc(hdmi->dev, connector,
+   _connector_funcs,
+   DRM_MODE_CONNECTOR_HDMIA,
+   hdmi->i2c);
drm_connector_helper_add(connector, _hdmi_connector_helper_funcs);
 
connector->polled = DRM_CONNECTOR_POLL_CONNECT |
-- 
2.17.1


[PATCH 05/13] drm: rockchip: Provide ddc symlink in inno_hdmi sysfs directory

2019-07-31 Thread Andrzej Pietrasiewicz
Use the ddc pointer provided by the generic connector.

Signed-off-by: Andrzej Pietrasiewicz 
Acked-by: Sam Ravnborg 
Reviewed-by: Emil Velikov 
---
 drivers/gpu/drm/rockchip/inno_hdmi.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/inno_hdmi.c 
b/drivers/gpu/drm/rockchip/inno_hdmi.c
index ed344a795b4d..e5864e823020 100644
--- a/drivers/gpu/drm/rockchip/inno_hdmi.c
+++ b/drivers/gpu/drm/rockchip/inno_hdmi.c
@@ -624,8 +624,10 @@ static int inno_hdmi_register(struct drm_device *drm, 
struct inno_hdmi *hdmi)
 
drm_connector_helper_add(>connector,
 _hdmi_connector_helper_funcs);
-   drm_connector_init(drm, >connector, _hdmi_connector_funcs,
-  DRM_MODE_CONNECTOR_HDMIA);
+   drm_connector_init_with_ddc(drm, >connector,
+   _hdmi_connector_funcs,
+   DRM_MODE_CONNECTOR_HDMIA,
+   hdmi->ddc);
 
drm_connector_attach_encoder(>connector, encoder);
 
-- 
2.17.1


[PATCH 07/13] drm/mediatek: Provide ddc symlink in hdmi connector sysfs directory

2019-07-31 Thread Andrzej Pietrasiewicz
Use the ddc pointer provided by the generic connector.

Signed-off-by: Andrzej Pietrasiewicz 
Acked-by: Sam Ravnborg 
Reviewed-by: Emil Velikov 
---
 drivers/gpu/drm/mediatek/mtk_hdmi.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/mediatek/mtk_hdmi.c 
b/drivers/gpu/drm/mediatek/mtk_hdmi.c
index ce91b61364eb..f419765b7cc0 100644
--- a/drivers/gpu/drm/mediatek/mtk_hdmi.c
+++ b/drivers/gpu/drm/mediatek/mtk_hdmi.c
@@ -1299,9 +1299,10 @@ static int mtk_hdmi_bridge_attach(struct drm_bridge 
*bridge)
struct mtk_hdmi *hdmi = hdmi_ctx_from_bridge(bridge);
int ret;
 
-   ret = drm_connector_init(bridge->encoder->dev, >conn,
-_hdmi_connector_funcs,
-DRM_MODE_CONNECTOR_HDMIA);
+   ret = drm_connector_init_with_ddc(bridge->encoder->dev, >conn,
+ _hdmi_connector_funcs,
+ DRM_MODE_CONNECTOR_HDMIA,
+ hdmi->ddc_adpt);
if (ret) {
dev_err(hdmi->dev, "Failed to initialize connector: %d\n", ret);
return ret;
-- 
2.17.1


[PATCH 04/13] drm: rockchip: Provide ddc symlink in rk3066_hdmi sysfs directory

2019-07-31 Thread Andrzej Pietrasiewicz
Use the ddc pointer provided by the generic connector.

Signed-off-by: Andrzej Pietrasiewicz 
Acked-by: Sam Ravnborg 
Reviewed-by: Emil Velikov 
---
 drivers/gpu/drm/rockchip/rk3066_hdmi.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rk3066_hdmi.c 
b/drivers/gpu/drm/rockchip/rk3066_hdmi.c
index 85fc5f01f761..e874f5fdeec4 100644
--- a/drivers/gpu/drm/rockchip/rk3066_hdmi.c
+++ b/drivers/gpu/drm/rockchip/rk3066_hdmi.c
@@ -564,9 +564,10 @@ rk3066_hdmi_register(struct drm_device *drm, struct 
rk3066_hdmi *hdmi)
 
drm_connector_helper_add(>connector,
 _hdmi_connector_helper_funcs);
-   drm_connector_init(drm, >connector,
-  _hdmi_connector_funcs,
-  DRM_MODE_CONNECTOR_HDMIA);
+   drm_connector_init_with_ddc(drm, >connector,
+   _hdmi_connector_funcs,
+   DRM_MODE_CONNECTOR_HDMIA,
+   hdmi->ddc);
 
drm_connector_attach_encoder(>connector, encoder);
 
-- 
2.17.1


[PATCH 03/13] drm/exynos: Provide ddc symlink in connector's sysfs

2019-07-31 Thread Andrzej Pietrasiewicz
Switch to using the ddc provided by the generic connector.

Signed-off-by: Andrzej Pietrasiewicz 
Acked-by: Sam Ravnborg 
Reviewed-by: Emil Velikov 
---
 drivers/gpu/drm/exynos/exynos_hdmi.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_hdmi.c 
b/drivers/gpu/drm/exynos/exynos_hdmi.c
index bc1565f1822a..d4a9c9e17436 100644
--- a/drivers/gpu/drm/exynos/exynos_hdmi.c
+++ b/drivers/gpu/drm/exynos/exynos_hdmi.c
@@ -940,8 +940,10 @@ static int hdmi_create_connector(struct drm_encoder 
*encoder)
connector->interlace_allowed = true;
connector->polled = DRM_CONNECTOR_POLL_HPD;
 
-   ret = drm_connector_init(hdata->drm_dev, connector,
-   _connector_funcs, DRM_MODE_CONNECTOR_HDMIA);
+   ret = drm_connector_init_with_ddc(hdata->drm_dev, connector,
+ _connector_funcs,
+ DRM_MODE_CONNECTOR_HDMIA,
+ hdata->ddc_adpt);
if (ret) {
DRM_DEV_ERROR(hdata->dev,
  "Failed to initialize connector with drm\n");
-- 
2.17.1


[PATCH 01/13] drm/amdgpu: Provide ddc symlink in dm connector's sysfs directory

2019-07-31 Thread Andrzej Pietrasiewicz
Use the ddc pointer provided by the generic connector.

Signed-off-by: Andrzej Pietrasiewicz 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 4a29f72334d0..f7d79b0032d2 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -5144,11 +5144,12 @@ static int amdgpu_dm_connector_init(struct 
amdgpu_display_manager *dm,
 
connector_type = to_drm_connector_type(link->connector_signal);
 
-   res = drm_connector_init(
+   res = drm_connector_init_with_ddc(
dm->ddev,
>base,
_dm_connector_funcs,
-   connector_type);
+   connector_type,
+   >base);
 
if (res) {
DRM_ERROR("connector_init failed\n");
-- 
2.17.1


[PATCH 02/13] drm/radeon: Eliminate possible use of an uninitialized variable

2019-07-31 Thread Andrzej Pietrasiewicz
ddc local variable is passed to drm_connector_init_with_ddc() and should
be NULL if no ddc is available.

Signed-off-by: Andrzej Pietrasiewicz 
---
 drivers/gpu/drm/radeon/radeon_connectors.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c 
b/drivers/gpu/drm/radeon/radeon_connectors.c
index b3ad8d890801..d11131d03ed6 100644
--- a/drivers/gpu/drm/radeon/radeon_connectors.c
+++ b/drivers/gpu/drm/radeon/radeon_connectors.c
@@ -1870,7 +1870,7 @@ radeon_add_atom_connector(struct drm_device *dev,
struct radeon_connector_atom_dig *radeon_dig_connector;
struct drm_encoder *encoder;
struct radeon_encoder *radeon_encoder;
-   struct i2c_adapter *ddc;
+   struct i2c_adapter *ddc = NULL;
uint32_t subpixel_order = SubPixelNone;
bool shared_ddc = false;
bool is_dp_bridge = false;
-- 
2.17.1


[PATCH 00/13] Next round of associating ddc adapters with connectors

2019-07-31 Thread Andrzej Pietrasiewicz
Now that some of the patches of the previous v6 series are applied,
I'm resending the remaining ones (patches 3-13) with Acked-by and
Reviewed-by added.

I'm also taking this opportunity to provide the symlink for another
connector in amdgpu (patch 1), and to fix a small but nasty bug
which can cause a use of an uninitialized variable (patch 2).

Andrzej Pietrasiewicz (13):
  drm/amdgpu: Provide ddc symlink in dm connector's sysfs directory
  drm/radeon: Eliminate possible use of an uninitialized variable
  drm/exynos: Provide ddc symlink in connector's sysfs
  drm: rockchip: Provide ddc symlink in rk3066_hdmi sysfs directory
  drm: rockchip: Provide ddc symlink in inno_hdmi sysfs directory
  drm/msm/hdmi: Provide ddc symlink in hdmi connector sysfs directory
  drm/mediatek: Provide ddc symlink in hdmi connector sysfs directory
  drm/tegra: Provide ddc symlink in output connector sysfs directory
  drm/vc4: Provide ddc symlink in connector sysfs directory
  drm: zte: Provide ddc symlink in hdmi connector sysfs directory
  drm: zte: Provide ddc symlink in vga connector sysfs directory
  drm/tilcdc: Provide ddc symlink in connector sysfs directory
  drm/i915: Provide ddc symlink in hdmi connector sysfs directory

 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  5 +++--
 drivers/gpu/drm/exynos/exynos_hdmi.c  |  6 --
 drivers/gpu/drm/i915/display/intel_hdmi.c | 12 
 drivers/gpu/drm/mediatek/mtk_hdmi.c   |  7 ---
 drivers/gpu/drm/msm/hdmi/hdmi_connector.c |  6 --
 drivers/gpu/drm/radeon/radeon_connectors.c|  2 +-
 drivers/gpu/drm/rockchip/inno_hdmi.c  |  6 --
 drivers/gpu/drm/rockchip/rk3066_hdmi.c|  7 ---
 drivers/gpu/drm/tegra/hdmi.c  |  7 ---
 drivers/gpu/drm/tegra/sor.c   |  7 ---
 drivers/gpu/drm/tilcdc/tilcdc_tfp410.c|  6 --
 drivers/gpu/drm/vc4/vc4_hdmi.c| 12 
 drivers/gpu/drm/zte/zx_hdmi.c |  6 --
 drivers/gpu/drm/zte/zx_vga.c  |  6 --
 14 files changed, 60 insertions(+), 35 deletions(-)

-- 
2.17.1


Re: [PATCH v19 00/15] arm64: untag user pointers passed to the kernel

2019-07-31 Thread Dave Hansen
On 7/23/19 10:58 AM, Andrey Konovalov wrote:
> The mmap and mremap (only new_addr) syscalls do not currently accept
> tagged addresses. Architectures may interpret the tag as a background
> colour for the corresponding vma.

What the heck is a "background colour"? :)


[PATCH 2/3] drm/amd/display: Skip determining update type for async updates

2019-07-31 Thread Nicholas Kazlauskas
[Why]
By passing through the dm_determine_update_type_for_commit for atomic
commits that can be done asynchronously we are incurring a
performance penalty by locking access to the global private object
and holding that access until the end of the programming sequence.

This is also allocating a new large dc_state on every access in addition
to retaining all the references on each stream and plane until the end
of the programming sequence.

[How]
Shift the determination for async update before validation. Return early
if it's going to be an async update.

Cc: Leo Li 
Cc: David Francis 
Signed-off-by: Nicholas Kazlauskas 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 27 ++-
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 2efb0eadf602..4c90662e9fa2 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -7216,6 +7216,26 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
if (ret)
goto fail;
 
+   if (state->legacy_cursor_update) {
+   /*
+* This is a fast cursor update coming from the plane update
+* helper, check if it can be done asynchronously for better
+* performance.
+*/
+   state->async_update =
+   !drm_atomic_helper_async_check(dev, state);
+
+   /*
+* Skip the remaining global validation if this is an async
+* update. Cursor updates can be done without affecting
+* state or bandwidth calcs and this avoids the performance
+* penalty of locking the private state object and
+* allocating a new dc_state.
+*/
+   if (state->async_update)
+   return 0;
+   }
+
/* Check scaling and underscan changes*/
/* TODO Removed scaling changes validation due to inability to commit
 * new stream into context w\o causing full reset. Need to
@@ -7268,13 +7288,6 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
ret = -EINVAL;
goto fail;
}
-   } else if (state->legacy_cursor_update) {
-   /*
-* This is a fast cursor update coming from the plane update
-* helper, check if it can be done asynchronously for better
-* performance.
-*/
-   state->async_update = !drm_atomic_helper_async_check(dev, 
state);
}
 
/* Must be success */
-- 
2.17.1


[PATCH 3/3] drm/amd/display: Don't replace the dc_state for fast updates

2019-07-31 Thread Nicholas Kazlauskas
[Why]
DRM private objects have no hw_done/flip_done fencing mechanism on their
own and cannot be used to sequence commits accordingly.

When issuing commits that don't touch the same set of hardware resources
like page-flips on different CRTCs we can run into the issue below
because of this:

1. Client requests non-blocking Commit #1, has a new dc_state #1,
state is swapped, commit tail is deferred to work queue

2. Client requests non-blocking Commit #2, has a new dc_state #2,
state is swapped, commit tail is deferred to work queue

3. Commit #2 work starts, commit tail finishes,
atomic state is cleared, dc_state #1 is freed

4. Commit #1 work starts,
commit tail encounters null pointer deref on dc_state #1

In order to change the DC state held in the private object we need to
ensure that we wait for all outstanding commits to finish and that
any other pending commits must wait for the current one to finish as
well.

We do this for MEDIUM and FULL updates. But not for FAST updates, nor
would we want to since it would cause stuttering from the delays.

FAST updates that go through dm_determine_update_type_for_commit always
create a new dc_state and lock the DRM private object if there are
any changed planes.

We need the old state to validate, but we don't actually need the new
state here.

[How]
If the commit isn't a full update then the use after free can be
resolved by simply discarding the new state entirely and retaining
the existing one instead.

With this change the sequence above can be reexamined. Commit #2 will
still free Commit #1's reference, but before this happens we actually
added an additional reference as part of Commit #2.

If an update comes in during this that needs to change the dc_state
it will need to wait on Commit #1 and Commit #2 to finish. Then it'll
swap the state, finish the work in commit tail and drop the last
reference on Commit #2's dc_state.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204181
Cc: Leo Li 
Cc: David Francis 
Signed-off-by: Nicholas Kazlauskas 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 4c90662e9fa2..fe5291b16193 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -7288,6 +7288,29 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
ret = -EINVAL;
goto fail;
}
+   } else {
+   /*
+* The commit is a fast update. Fast updates shouldn't change
+* the DC context, affect global validation, and can have their
+* commit work done in parallel with other commits not touching
+* the same resource. If we have a new DC context as part of
+* the DM atomic state from validation we need to free it and
+* retain the existing one instead.
+*/
+   struct dm_atomic_state *new_dm_state, *old_dm_state;
+
+   new_dm_state = dm_atomic_get_new_state(state);
+   old_dm_state = dm_atomic_get_old_state(state);
+
+   if (new_dm_state && old_dm_state) {
+   if (new_dm_state->context)
+   dc_release_state(new_dm_state->context);
+
+   new_dm_state->context = old_dm_state->context;
+
+   if (old_dm_state->context)
+   dc_retain_state(old_dm_state->context);
+   }
}
 
/* Must be success */
-- 
2.17.1


[PATCH 1/3] drm/amd/display: Allow cursor async updates for framebuffer swaps

2019-07-31 Thread Nicholas Kazlauskas
[Why]
We previously allowed framebuffer swaps as async updates for cursor
planes but had to disable them due to a bug in DRM with async update
handling and incorrect ref counting. The check to block framebuffer
swaps has been added to DRM for a while now, so this check is redundant.

The real fix that allows this to work properly in DRM has also finally
been merged and is getting backported into stable branches, so now seems
to be the right time to drop this.

[How]
Drop the redundant check for old_fb != new_fb.

With the proper fix in DRM, this should also fix some cursor stuttering
issues with xf86-video-amdgpu since it double buffers the cursor.

IGT tests that swap framebuffers (-varying-size for example) should
also pass again.

Cc: Leo Li 
Cc: David Francis 
Signed-off-by: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 040169180a63..2efb0eadf602 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4504,20 +4504,10 @@ static int dm_plane_atomic_check(struct drm_plane 
*plane,
 static int dm_plane_atomic_async_check(struct drm_plane *plane,
   struct drm_plane_state *new_plane_state)
 {
-   struct drm_plane_state *old_plane_state =
-   drm_atomic_get_old_plane_state(new_plane_state->state, plane);
-
/* Only support async updates on cursor planes. */
if (plane->type != DRM_PLANE_TYPE_CURSOR)
return -EINVAL;
 
-   /*
-* DRM calls prepare_fb and cleanup_fb on new_plane_state for
-* async commits so don't allow fb changes.
-*/
-   if (old_plane_state->fb != new_plane_state->fb)
-   return -EINVAL;
-
return 0;
 }
 
-- 
2.17.1


[PATCH 7/8] drm/amdgpu: drop drmP.h from vcn_v2_0.c

2019-07-31 Thread Alex Deucher
And fix the fallout.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
index eef3ec5449af..36ad0c0e8efb 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
@@ -22,7 +22,7 @@
  */
 
 #include 
-#include 
+
 #include "amdgpu.h"
 #include "amdgpu_vcn.h"
 #include "soc15.h"
@@ -2112,7 +2112,7 @@ static int vcn_v2_0_dec_ring_test_ring(struct amdgpu_ring 
*ring)
tmp = RREG32(adev->vcn.inst[ring->me].external.scratch9);
if (tmp == 0xDEADBEEF)
break;
-   DRM_UDELAY(1);
+   udelay(1);
}
 
if (i >= adev->usec_timeout)
-- 
2.20.1


[PATCH 4/8] drm/amdgpu: drop drmP.h from navi10_ih.c

2019-07-31 Thread Alex Deucher
And fix the fallout.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
index e963746be11c..9fe08408db58 100644
--- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
@@ -21,7 +21,8 @@
  *
  */
 
-#include 
+#include 
+
 #include "amdgpu.h"
 #include "amdgpu_ih.h"
 
-- 
2.20.1


[PATCH 1/8] drm/amdgpu: drop drmP.h in amdgpu_amdkfd_arcturus.c

2019-07-31 Thread Alex Deucher
Unused.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index 4d9101834ba7..c79aaebeeaf0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -28,7 +28,6 @@
 #include 
 #include 
 #include 
-#include 
 #include "amdgpu.h"
 #include "amdgpu_amdkfd.h"
 #include "sdma0/sdma0_4_2_2_offset.h"
-- 
2.20.1


[PATCH 5/8] drm/amdgpu: drop drmP.h from nv.c

2019-07-31 Thread Alex Deucher
And fix up the fallout.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index e4885e2d281a..595a907f4ea7 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -23,7 +23,8 @@
 #include 
 #include 
 #include 
-#include 
+#include 
+
 #include "amdgpu.h"
 #include "amdgpu_atombios.h"
 #include "amdgpu_ih.h"
-- 
2.20.1


[PATCH 3/8] drm/amdgpu: drop drmP.h in gfx_v10_0.c

2019-07-31 Thread Alex Deucher
And fix the fallout.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index e8731df40340..82732178d365 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -20,8 +20,12 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  *
  */
+
+#include 
+#include 
 #include 
-#include 
+#include 
+#include 
 #include "amdgpu.h"
 #include "amdgpu_gfx.h"
 #include "amdgpu_psp.h"
@@ -393,7 +397,7 @@ static int gfx_v10_0_ring_test_ring(struct amdgpu_ring 
*ring)
if (amdgpu_emu_mode == 1)
msleep(1);
else
-   DRM_UDELAY(1);
+   udelay(1);
}
if (i < adev->usec_timeout) {
if (amdgpu_emu_mode == 1)
@@ -4551,7 +4555,7 @@ static int gfx_v10_0_ring_preempt_ib(struct amdgpu_ring 
*ring)
if (ring->trail_seq ==
le32_to_cpu(*(ring->trail_fence_cpu_addr)))
break;
-   DRM_UDELAY(1);
+   udelay(1);
}
 
if (i >= adev->usec_timeout) {
-- 
2.20.1


[PATCH 6/8] drm/amdgpu: drop drmP.h from sdma_v5_0.c

2019-07-31 Thread Alex Deucher
And fix the fallout.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
index 3e536140bfd6..aa43dc6c599a 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
@@ -21,8 +21,11 @@
  *
  */
 
+#include 
 #include 
-#include 
+#include 
+#include 
+
 #include "amdgpu.h"
 #include "amdgpu_ucode.h"
 #include "amdgpu_trace.h"
@@ -882,7 +885,7 @@ static int sdma_v5_0_ring_test_ring(struct amdgpu_ring 
*ring)
if (amdgpu_emu_mode == 1)
msleep(1);
else
-   DRM_UDELAY(1);
+   udelay(1);
}
 
if (i < adev->usec_timeout) {
@@ -1337,7 +1340,7 @@ static int sdma_v5_0_ring_preempt_ib(struct amdgpu_ring 
*ring)
if (ring->trail_seq ==
le32_to_cpu(*(ring->trail_fence_cpu_addr)))
break;
-   DRM_UDELAY(1);
+   udelay(1);
}
 
if (i >= adev->usec_timeout) {
-- 
2.20.1


[PATCH 2/8] drm/amdgpu: drop drmP.h from amdgpu_amdkfd_gfx_v10.c

2019-07-31 Thread Alex Deucher
Unused.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
index 0723f800e815..7c03a7fcd011 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
@@ -27,7 +27,6 @@
 #include 
 #include 
 #include 
-#include 
 #include "amdgpu.h"
 #include "amdgpu_amdkfd.h"
 #include "amdgpu_ucode.h"
-- 
2.20.1


Re: Review required [Was: Associate ddc adapters with connectors]

2019-07-31 Thread Neil Armstrong
Hi Andrzej,

On 31/07/2019 16:22, Neil Armstrong wrote:
> On 31/07/2019 15:10, Andrzej Pietrasiewicz wrote:
>> On 31.07.2019 at 12:40, Sam Ravnborg wrote:
>>> Hi Neil.
>>>
>>> On Wed, Jul 31, 2019 at 10:00:14AM +0200, Neil Armstrong wrote:
>>>> Hi Sam,
>>>>
>>>> On 26/07/2019 20:55, Sam Ravnborg wrote:
>>>>> Hi all.
>>>>>
>>>>> Andrzej has done a good job following up on feedback and this series is
>>>>> now ready.
>>>>>
>>>>> We need ack on the patches touching the individual drivers before we can
>>>>> proceed.
>>>>> Please check your drivers and get back.
>>>>
>>>> I can apply all core and maintainer-acked patches for now:
>>>> 1, 2, 7, 10, 11, 16, 17, 18, 19, 20, 21, 22, 23
>>>>
>>>> and Andrzej can resend not applied patches with yours and Emil's Reviewed-by,
>>>> so we can wait a few more days to apply them.
>>>
>>> Sounds like a good plan.
>>> Thanks for taking care of this.
>>
>> When is a good time to resend patches 3, 4, 5, 6, 8, 9, 12, 13, 14, 15, 24
>> as a new series?
> 
> I'll ping you when everything is applied, build-tested and pushed on 
> drm-misc-next

I pushed 1, 2, 7, 10, 11, 16, 17, 18, 19, 20, 21, 22, 23:
bed7a2182de6 drm/radeon: Provide ddc symlink in connector sysfs directory
5b50fa2b35a4 drm/amdgpu: Provide ddc symlink in connector sysfs directory
cfb444552926 drm/bridge: ti-tfp410: Provide ddc symlink in connector sysfs 
directory
9ebc4d2140ad drm/bridge: dw-hdmi: Provide ddc symlink in connector sysfs 
directory
a4f9087e85de drm/bridge: dumb-vga-dac: Provide ddc symlink in connector sysfs 
directory
350fd554ee44 drm/ast: Provide ddc symlink in connector sysfs directory
9572ae176a10 drm/mgag200: Provide ddc symlink in connector sysfs directory
7058e76682d7 drm: sti: Provide ddc symlink in hdmi connector sysfs directory
2ae7eb372ed4 drm/imx: imx-tve: Provide ddc symlink in connector's sysfs
be0ec35940bc drm/imx: imx-ldb: Provide ddc symlink in connector's sysfs
1e8f17855ff8 drm/sun4i: hdmi: Provide ddc symlink in sun4i hdmi connector sysfs 
directory
100163df4203 drm: Add drm_connector_init() variant with ddc
e1a29c6c5955 drm: Add ddc link in sysfs created by drm_connector

Neil

> 
> Neil
> 
>>
>> Andrzej
> 


Re: Review required [Was: Associate ddc adapters with connectors]

2019-07-31 Thread Neil Armstrong
On 31/07/2019 15:10, Andrzej Pietrasiewicz wrote:
> On 31.07.2019 at 12:40, Sam Ravnborg wrote:
>> Hi Neil.
>>
>> On Wed, Jul 31, 2019 at 10:00:14AM +0200, Neil Armstrong wrote:
>>> Hi Sam,
>>>
>>> On 26/07/2019 20:55, Sam Ravnborg wrote:
>>>> Hi all.
>>>>
>>>> Andrzej has done a good job following up on feedback and this series is
>>>> now ready.
>>>>
>>>> We need ack on the patches touching the individual drivers before we can
>>>> proceed.
>>>> Please check your drivers and get back.
>>>
>>> I can apply all core and maintainer-acked patches for now:
>>> 1, 2, 7, 10, 11, 16, 17, 18, 19, 20, 21, 22, 23
>>>
>>> and Andrzej can resend not applied patches with yours and Emil's Reviewed-by,
>>> so we can wait a few more days to apply them.
>>
>> Sounds like a good plan.
>> Thanks for taking care of this.
> 
> When is a good time to resend patches 3, 4, 5, 6, 8, 9, 12, 13, 14, 15, 24
> as a new series?

I'll ping you when everything is applied, build-tested and pushed on 
drm-misc-next

Neil

> 
> Andrzej


Re: [PATCH 07/13] mm: remove the page_shift member from struct hmm_range

2019-07-31 Thread Kuehling, Felix
On 2019-07-30 1:51 a.m., Christoph Hellwig wrote:
> All users pass PAGE_SIZE here, and if we wanted to support single
> entries for huge pages we should really just add a HMM_FAULT_HUGEPAGE
> flag instead that uses the huge page size instead of having the
> caller calculate that size once, just for the hmm code to verify it.

Maybe this was meant to support device page size != native page size? 
Anyway, looks like we didn't use it that way.

Acked-by: Felix Kuehling 


>
> Signed-off-by: Christoph Hellwig 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  1 -
>   drivers/gpu/drm/nouveau/nouveau_svm.c   |  1 -
>   include/linux/hmm.h | 22 -
>   mm/hmm.c| 42 ++---
>   4 files changed, 9 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 71d6e7087b0b..8bf79288c4e2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -818,7 +818,6 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
> struct page **pages)
>   0 : range->flags[HMM_PFN_WRITE];
>   range->pfn_flags_mask = 0;
>   range->pfns = pfns;
> - range->page_shift = PAGE_SHIFT;
>   range->start = start;
>   range->end = start + ttm->num_pages * PAGE_SIZE;
>   
> diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c 
> b/drivers/gpu/drm/nouveau/nouveau_svm.c
> index 40e706234554..e7068ce46949 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_svm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
> @@ -680,7 +680,6 @@ nouveau_svm_fault(struct nvif_notify *notify)
>args.i.p.addr + args.i.p.size, fn - fi);
>   
>   /* Have HMM fault pages within the fault window to the GPU. */
> - range.page_shift = PAGE_SHIFT;
>   range.start = args.i.p.addr;
>   range.end = args.i.p.addr + args.i.p.size;
>   range.pfns = args.phys;
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index c5b51376b453..51e18fbb8953 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -158,7 +158,6 @@ enum hmm_pfn_value_e {
>* @values: pfn value for some special case (none, special, error, ...)
>* @default_flags: default flags for the range (write, read, ... see hmm 
> doc)
>* @pfn_flags_mask: allows to mask pfn flags so that only default_flags 
> matter
> - * @page_shift: device virtual address shift value (should be >= PAGE_SHIFT)
>* @pfn_shifts: pfn shift value (should be <= PAGE_SHIFT)
>* @valid: pfns array did not change since it has been fill by an HMM 
> function
>*/
> @@ -172,31 +171,10 @@ struct hmm_range {
>   const uint64_t  *values;
>   uint64_tdefault_flags;
>   uint64_tpfn_flags_mask;
> - uint8_t page_shift;
>   uint8_t pfn_shift;
>   boolvalid;
>   };
>   
> -/*
> - * hmm_range_page_shift() - return the page shift for the range
> - * @range: range being queried
> - * Return: page shift (page size = 1 << page shift) for the range
> - */
> -static inline unsigned hmm_range_page_shift(const struct hmm_range *range)
> -{
> - return range->page_shift;
> -}
> -
> -/*
> - * hmm_range_page_size() - return the page size for the range
> - * @range: range being queried
> - * Return: page size for the range in bytes
> - */
> -static inline unsigned long hmm_range_page_size(const struct hmm_range 
> *range)
> -{
> - return 1UL << hmm_range_page_shift(range);
> -}
> -
>   /*
>* hmm_range_wait_until_valid() - wait for range to be valid
>* @range: range affected by invalidation to wait on
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 926735a3aef9..f26d6abc4ed2 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -344,13 +344,12 @@ static int hmm_vma_walk_hole_(unsigned long addr, 
> unsigned long end,
>   struct hmm_vma_walk *hmm_vma_walk = walk->private;
>   struct hmm_range *range = hmm_vma_walk->range;
>   uint64_t *pfns = range->pfns;
> - unsigned long i, page_size;
> + unsigned long i;
>   
>   hmm_vma_walk->last = addr;
> - page_size = hmm_range_page_size(range);
> - i = (addr - range->start) >> range->page_shift;
> + i = (addr - range->start) >> PAGE_SHIFT;
>   
> - for (; addr < end; addr += page_size, i++) {
> + for (; addr < end; addr += PAGE_SIZE, i++) {
>   pfns[i] = range->values[HMM_PFN_NONE];
>   if (fault || write_fault) {
>   int ret;
> @@ -772,7 +771,7 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, 
> unsigned long hmask,
> struct mm_walk *walk)
>   {
>   #ifdef CONFIG_HUGETLB_PAGE
> - unsigned long addr = start, i, pfn, mask, size, pfn_inc;
> + unsigned long addr = start, i, pfn, mask;
>   struct hmm_vma_walk 

Re: [PATCH 06/13] mm: remove superflous arguments from hmm_range_register

2019-07-31 Thread Kuehling, Felix
On 2019-07-30 1:51 a.m., Christoph Hellwig wrote:
> The start, end and page_shift values are all saved in the range
> structure, so we might as well use that for argument passing.
>
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Felix Kuehling 


> ---
>   Documentation/vm/hmm.rst|  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  7 +--
>   drivers/gpu/drm/nouveau/nouveau_svm.c   |  5 ++---
>   include/linux/hmm.h |  6 +-
>   mm/hmm.c| 20 +---
>   5 files changed, 14 insertions(+), 26 deletions(-)
>
> diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
> index ddcb5ca8b296..e63c11f7e0e0 100644
> --- a/Documentation/vm/hmm.rst
> +++ b/Documentation/vm/hmm.rst
> @@ -222,7 +222,7 @@ The usage pattern is::
> range.flags = ...;
> range.values = ...;
> range.pfn_shift = ...;
> -  hmm_range_register();
> +  hmm_range_register(, mirror);
>   
> /*
>  * Just wait for range to be valid, safe to ignore return value as we
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index f0821638bbc6..71d6e7087b0b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -818,8 +818,11 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
> struct page **pages)
>   0 : range->flags[HMM_PFN_WRITE];
>   range->pfn_flags_mask = 0;
>   range->pfns = pfns;
> - hmm_range_register(range, mirror, start,
> -start + ttm->num_pages * PAGE_SIZE, PAGE_SHIFT);
> + range->page_shift = PAGE_SHIFT;
> + range->start = start;
> + range->end = start + ttm->num_pages * PAGE_SIZE;
> +
> + hmm_range_register(range, mirror);
>   
>   /*
>* Just wait for range to be valid, safe to ignore return value as we
> diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c 
> b/drivers/gpu/drm/nouveau/nouveau_svm.c
> index b889d5ec4c7e..40e706234554 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_svm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
> @@ -492,9 +492,7 @@ nouveau_range_fault(struct nouveau_svmm *svmm, struct 
> hmm_range *range)
>   range->default_flags = 0;
>   range->pfn_flags_mask = -1UL;
>   
> - ret = hmm_range_register(range, >mirror,
> -  range->start, range->end,
> -  PAGE_SHIFT);
> + ret = hmm_range_register(range, >mirror);
>   if (ret) {
>   up_read(>hmm->mm->mmap_sem);
>   return (int)ret;
> @@ -682,6 +680,7 @@ nouveau_svm_fault(struct nvif_notify *notify)
>args.i.p.addr + args.i.p.size, fn - fi);
>   
>   /* Have HMM fault pages within the fault window to the GPU. */
> + range.page_shift = PAGE_SHIFT;
>   range.start = args.i.p.addr;
>   range.end = args.i.p.addr + args.i.p.size;
>   range.pfns = args.phys;
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 59be0aa2476d..c5b51376b453 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -400,11 +400,7 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror);
>   /*
>* Please see Documentation/vm/hmm.rst for how to use the range API.
>*/
> -int hmm_range_register(struct hmm_range *range,
> -struct hmm_mirror *mirror,
> -unsigned long start,
> -unsigned long end,
> -unsigned page_shift);
> +int hmm_range_register(struct hmm_range *range, struct hmm_mirror *mirror);
>   void hmm_range_unregister(struct hmm_range *range);
>   
>   /*
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 3a3852660757..926735a3aef9 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -843,35 +843,25 @@ static void hmm_pfns_clear(struct hmm_range *range,
>* hmm_range_register() - start tracking change to CPU page table over a 
> range
>* @range: range
>* @mm: the mm struct for the range of virtual address
> - * @start: start virtual address (inclusive)
> - * @end: end virtual address (exclusive)
> - * @page_shift: expect page shift for the range
> + *
>* Return: 0 on success, -EFAULT if the address space is no longer valid
>*
>* Track updates to the CPU page table see include/linux/hmm.h
>*/
> -int hmm_range_register(struct hmm_range *range,
> -struct hmm_mirror *mirror,
> -unsigned long start,
> -unsigned long end,
> -unsigned page_shift)
> +int hmm_range_register(struct hmm_range *range, struct hmm_mirror *mirror)
>   {
> - unsigned long mask = ((1UL << page_shift) - 1UL);
> + unsigned long mask = ((1UL << range->page_shift) - 1UL);
>   struct hmm *hmm = mirror->hmm;
>   unsigned long flags;
>   
>   range->valid = false;
>   range->hmm = NULL;
>   
> -

Re: [PATCH 02/13] amdgpu: don't initialize range->list in amdgpu_hmm_init_range

2019-07-31 Thread Kuehling, Felix
On 2019-07-30 1:51 a.m., Christoph Hellwig wrote:
> The list is used to add the range to another list as an entry in the
> core hmm code, so there is no need to initialize it in a driver.

I've seen code that uses list_empty to check whether a list head has 
been added to a list or not. For that to work, the list head needs to be 
initialized, and it has to be removed with list_del_init. If HMM doesn't 
ever do that with range->list, then this patch is Reviewed-by: Felix 
Kuehling 
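
For context, the idiom Felix describes looks roughly like this; list_empty()
is only meaningful on an initialized head, and list_del_init() (rather than
plain list_del()) keeps the entry re-testable after removal. A generic
sketch, not code from HMM or amdgpu:

#include <linux/list.h>

struct tracked_item {
	struct list_head node;
};

static void tracked_item_setup(struct tracked_item *item)
{
	/* Without this, list_empty(&item->node) reads uninitialized data. */
	INIT_LIST_HEAD(&item->node);
}

static void tracked_item_untrack(struct tracked_item *item)
{
	if (!list_empty(&item->node))
		list_del_init(&item->node);	/* node stays safely "empty" */
}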


>
> Signed-off-by: Christoph Hellwig 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 1 -
>   1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> index b698b423b25d..60b9fc9561d7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> @@ -484,6 +484,5 @@ void amdgpu_hmm_init_range(struct hmm_range *range)
>   range->flags = hmm_range_flags;
>   range->values = hmm_range_values;
>   range->pfn_shift = PAGE_SHIFT;
> - INIT_LIST_HEAD(>list);
>   }
>   }

Re: [PATCH 01/13] amdgpu: remove -EAGAIN handling for hmm_range_fault

2019-07-31 Thread Kuehling, Felix
On 2019-07-30 1:51 a.m., Christoph Hellwig wrote:
> hmm_range_fault can only return -EAGAIN if called with the block
> argument set to false, so remove the special handling for it.

The block argument no longer exists. You replaced it with the
HMM_FAULT_ALLOW_RETRY flag, which has the opposite logic. So this should read:
"hmm_range_fault can only return -EAGAIN if called with the
HMM_FAULT_ALLOW_RETRY flag set, so remove the special handling for it."

With that fixed, this commit is Reviewed-by: Felix Kuehling 
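
To spell out the semantics in question: with flags == 0 hmm_range_fault()
keeps mmap_sem held and can never return -EAGAIN, which is why the retry
loop below can go away. A hedged sketch of the two call styles
(illustrative helpers, not from the patch):

#include <linux/hmm.h>

/* Caller holds mmap_sem; it stays held, and -EAGAIN is impossible. */
static long example_fault_no_retry(struct hmm_range *range)
{
	return hmm_range_fault(range, 0);
}

/*
 * With HMM_FAULT_ALLOW_RETRY the call may drop mmap_sem and return
 * -EAGAIN, so the caller must retake the lock and retry.
 */
static long example_fault_allow_retry(struct hmm_range *range)
{
	return hmm_range_fault(range, HMM_FAULT_ALLOW_RETRY);
}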


>
> Signed-off-by: Christoph Hellwig 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 23 +++
>   1 file changed, 3 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 12a59ac83f72..f0821638bbc6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -778,7 +778,6 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
> struct page **pages)
>   struct hmm_range *range;
>   unsigned long i;
>   uint64_t *pfns;
> - int retry = 0;
>   int r = 0;
>   
>   if (!mm) /* Happens during process shutdown */
> @@ -822,7 +821,6 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
> struct page **pages)
>   hmm_range_register(range, mirror, start,
>  start + ttm->num_pages * PAGE_SIZE, PAGE_SHIFT);
>   
> -retry:
>   /*
>* Just wait for range to be valid, safe to ignore return value as we
>* will use the return value of hmm_range_fault() below under the
> @@ -831,24 +829,12 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
> struct page **pages)
>   hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT);
>   
>   down_read(&mm->mmap_sem);
> -
>   r = hmm_range_fault(range, 0);
> - if (unlikely(r < 0)) {
> - if (likely(r == -EAGAIN)) {
> - /*
> -  * return -EAGAIN, mmap_sem is dropped
> -  */
> - if (retry++ < MAX_RETRY_HMM_RANGE_FAULT)
> - goto retry;
> - else
> - pr_err("Retry hmm fault too many times\n");
> - }
> -
> - goto out_up_read;
> - }
> -
>   up_read(&mm->mmap_sem);
>   
> + if (unlikely(r < 0))
> + goto out_free_pfns;
> +
>   for (i = 0; i < ttm->num_pages; i++) {
>   pages[i] = hmm_device_entry_to_page(range, pfns[i]);
>   if (unlikely(!pages[i])) {
> @@ -864,9 +850,6 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, 
> struct page **pages)
>   
>   return 0;
>   
> -out_up_read:
> - if (likely(r != -EAGAIN))
> - up_read(&mm->mmap_sem);
>   out_free_pfns:
>   hmm_range_unregister(range);
>   kvfree(pfns);

Re: Review required [Was: Associate ddc adapters with connectors]

2019-07-31 Thread Andrzej Pietrasiewicz

On 31.07.2019 12:40, Sam Ravnborg wrote:

Hi Neil.

On Wed, Jul 31, 2019 at 10:00:14AM +0200, Neil Armstrong wrote:

Hi Sam,

On 26/07/2019 20:55, Sam Ravnborg wrote:

Hi all.

Andrzej has done a good job following up on feedback and this series is
now ready.

We need ack on the patches touching the individual drivers before we can
proceed.
Please check your drivers and get back.


I can apply all core and maintainer-acked patches for now:
1, 2, 7, 10, 11, 16, 17, 18, 19, 20, 21, 22, 23

and Andrzej can resend the not-yet-applied patches with your and Emil's Reviewed-by,
so we can wait a few more days to apply them.


Sounds like a good plan.
Thanks for taking care of this.


When is a good time to resend patches 3, 4, 5, 6, 8, 9, 12, 13, 14, 15, 24 as a
new series?

Andrzej


Re: [PATCH 1/6] dma-buf: add dynamic DMA-buf handling v13

2019-07-31 Thread Daniel Vetter
On Wed, Jul 31, 2019 at 1:19 PM Christian König
 wrote:
>
> > On 31.07.19 12:39, Daniel Vetter wrote:
> > On Wed, Jul 31, 2019 at 11:44 AM Christian König
> >  wrote:
> >> On 31.07.19 11:12, Daniel Vetter wrote:
> >>> [SNIP]
> >>> I think I brought this up before, but new top-post for a clean start.
> >>>
> >>> Use-case I have in mind is something like amdkfd's model, where you have a
> >>> list of buffers (per context or whatever) that you always need to have
> >>> present. Idea is to also use this for traditional CS for vk/gl, to cut
> >>> down on the buffer management overhead, but we'd still allow additional
> >>> buffers to be listed per-CS on top of that default working set.
> >>>
> >>> This of course means no implicit sync anymore on these default buffers
> >>> (the point is to avoid touching every buffer on every CS, updating fences
> >>> would defeat that). That's why the CS can still list additional buffers,
> >>> the only reason for that is to add implicit sync fences. Those buffers
> >>> would be most likely in the default working set already.
> >>>
> >>> Consequence is that I want the amdkfd model of "evict when needed, but
> >>> keep resident by default", but also working implicit fences. And it must
> >>> be doable without touching every bo on every CS. Listing possible
> >>> implementation options:
> >>>
> >>> - the amdkfd trick doesn't work because it would break implicit fencing -
> >>> any implicit sync would always result in the context getting
> >>> preempted/evicted, which isn't great.
> >> I'm actually working on re-working implicit fencing towards better
> >> supporting this.
> >>
> >> Basic idea is that you split up the fences in a reservation object into
> >> three categories:
> >> 1. Implicit sync on write.
> >> 2. Implicit sync on read.
> >> 3. No implicit sync at all.
> > Not really sure what you want to do here ... implicit sync is opt-in
> > (or opt-out flag if you need to keep CS backwards compat) per BO/CS.
> > At least when we discussed this forever at some XDCs consensus was
> > that storing the implicit sync mode on the BO is not going to work.
>
> Well that's exactly what we are doing for amdgpu and this is working
> perfectly. See flag AMDGPU_GEM_CREATE_EXPLICIT_SYNC.

Iirc the example where you can't decide at creation time is
EGL-on-gbm. If you run on something like X, you need implicit sync. If
you run on Android, you need explicit sync. And the only way to figure
that out is watch whether your compositor uses the explicit sync
android extension And from a very quick look in mesa this seems only
wired up for radv and not classic GL, so I guess you didn't take that
case into account?

For vk (which this is for) the create-time flag obviously works,
because vk is generally not totally crazy.
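
For concreteness, the create-time opt-out Christian mentions
(AMDGPU_GEM_CREATE_EXPLICIT_SYNC) looks roughly like this from userspace
via libdrm. A sketch only, assuming an already-initialized
amdgpu_device_handle; alloc_explicit_sync_bo is an illustrative name:

#include <amdgpu.h>
#include <amdgpu_drm.h>

/* Allocate a BO that opts out of implicit sync at creation time. */
static int alloc_explicit_sync_bo(amdgpu_device_handle dev,
				  uint64_t size, amdgpu_bo_handle *bo)
{
	struct amdgpu_bo_alloc_request req = {
		.alloc_size = size,
		.phys_alignment = 4096,
		.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM,
		.flags = AMDGPU_GEM_CREATE_EXPLICIT_SYNC,
	};

	return amdgpu_bo_alloc(dev, &req, bo);
}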

> >> This should not only help the KFD, but also with amdgpu command
> >> submission and things like page tables updates.
> >>
> >> E.g. we need to keep fences for page table updates around in reservation
> >> objects as well, but you really really really don't want any implicit
> >> synchronization with them :)
> > Why do you even try to do implicit sync with your pagetables? How can
> > your pagetables even get anywhere near where implicit sync matters?
>
> When you unmap a BO from a page table the BO needs to stay in the same
> place until the unmap operation is completed.
>
> Otherwise you open up a short window where a process could access memory
> which doesn't belong to the process.
>
> This unmap operation is usually an SDMA operation and nobody except for
> the memory management needs to take this into account.
>
> >>> - we share the resv_obj between all the buffers in the default working set
> >>> of a context, with unsharing/resharing the resv_obj if they 
> >>> enter/leave
> >>> the default working set. That way there's only one resv_obj to update 
> >>> on
> >>> each CS, and we can attach a new shared fence for every CS. Trouble is
> >>> that this means a given buffer can only be part of one default working
> >>> set, so all shared buffers would need to be listed again separately. 
> >>> Not
> >>> so great if userspace has to deal with that fairly arbitrary 
> >>> limitation.
> >> Yeah, that is exactly what we do with the per VM BOs in amdgpu.
> >>
> >> The limitation that you have only one working set actually turned out to
> >> be not a limitation at all, but rather seen as something welcomed by our
> >> Vulkan guys.
> > We have per-ctx vms in i915, but I guess even for those sharing will be 
> > limited.
>
> In amdgpu we had this funky stuff with bo lists which should represent
> the resources used for a command submission.
>
> But after actually talking to the Vulkan and other userspace guys we
> completely deprecated that.
>
> We settled on having per process resources which are always valid and a
> dynamic list of resources you send to the kernel with each command
> submission.

The per-ctx vm is for arb_robustness, we had a 

Re: [PATCH 1/6] dma-buf: add dynamic DMA-buf handling v13

2019-07-31 Thread Christian König

On 31.07.19 12:39, Daniel Vetter wrote:

On Wed, Jul 31, 2019 at 11:44 AM Christian König
 wrote:

On 31.07.19 11:12, Daniel Vetter wrote:

[SNIP]
I think I brought this up before, but new top-post for a clean start.

Use-case I have in mind is something like amdkfd's model, where you have a
list of buffers (per context or whatever) that you always need to have
present. Idea is to also use this for traditional CS for vk/gl, to cut
down on the buffer management overhead, but we'd still allow additional
buffers to be listed per-CS on top of that default working set.

This of course means no implicit sync anymore on these default buffers
(the point is to avoid touching every buffer on every CS, updating fences
would defeat that). That's why the CS can still list additional buffers,
the only reason for that is to add implicit sync fences. Those buffers
would be most likely in the default working set already.

Consequence is that I want the amdkfd model of "evict when needed, but
keep resident by default", but also working implicit fences. And it must
be doable without touching every bo on every CS. Listing possible
implementation options:

- the amdkfd trick doesn't work because it would break implicit fencing -
any implicit sync would always result in the context getting
preempted/evicted, which isn't great.

I'm actually working on re-working implicit fencing towards better
supporting this.

Basic idea is that you split up the fences in a reservation object into
three categories:
1. Implicit sync on write.
2. Implicit sync on read.
3. No implicit sync at all.

Not really sure what you want to do here ... implicit sync is opt-in
(or opt-out flag if you need to keep CS backwards compat) per BO/CS.
At least when we discussed this forever at some XDCs consensus was
that storing the implicit sync mode on the BO is not going to work.


Well that's exactly what we are doing for amdgpu and this is working 
perfectly. See flag AMDGPU_GEM_CREATE_EXPLICIT_SYNC.



This should not only help the KFD, but also with amdgpu command
submission and things like page tables updates.

E.g. we need to keep fences for page table updates around in reservation 
objects as well, but you really really really don't want any implicit
synchronization with them :)

Why do you even try to do implicit sync with your pagetables? How can
your pagetables even get anywhere near where implicit sync matters?


When you unmap a BO from a page table the BO needs to stay in the same 
place until the unmap operation is completed.


Otherwise you open up a short window where a process could access memory 
which doesn't belong to the process.


This unmap operation is usually an SDMA operation and nobody except for 
the memory management needs to take this into account.
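
One way to picture this, as a conceptual sketch only (not the actual amdgpu
code; it uses the 2019 reservation_object API, since renamed dma_resv): the
SDMA unmap fence is published as the exclusive fence on the BO's
reservation object, so memory management waits for it before moving or
reusing the backing store, while command submission never syncs on it
implicitly.

#include <linux/dma-fence.h>
#include <linux/reservation.h>

/* "unmap_fence" is assumed to come from the driver's SDMA scheduler. */
static void publish_unmap_fence(struct reservation_object *resv,
				struct dma_fence *unmap_fence)
{
	/* With a NULL ww_acquire_ctx the lock cannot deadlock-abort. */
	reservation_object_lock(resv, NULL);
	reservation_object_add_excl_fence(resv, unmap_fence);
	reservation_object_unlock(resv);
}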



- we share the resv_obj between all the buffers in the default working set
of a context, with unsharing/resharing the resv_obj if they enter/leave
the default working set. That way there's only one resv_obj to update on
each CS, and we can attach a new shared fence for every CS. Trouble is
that this means a given buffer can only be part of one default working
set, so all shared buffers would need to be listed again separately. Not
so great if userspace has to deal with that fairly arbitrary limitation.

Yeah, that is exactly what we do with the per VM BOs in amdgpu.

The limitation that you have only one working set actually turned out to
be not a limitation at all, but rather seen as something welcomed by our
Vulkan guys.

We have per-ctx vms in i915, but I guess even for those sharing will be limited.


In amdgpu we had this funky stuff with bo lists which should represent 
the resources used for a command submission.


But after actually talking to the Vulkan and other userspace guys we 
completely deprecated that.


We settled on having per process resources which are always valid and a 
dynamic list of resources you send to the kernel with each command 
submission.



I also don't really see a way to have an implementation with good
performance where BOs can be in multiple working sets at the same time.


- we allow the ->move_notify callback to add new fences, which the
exporter needs to wait on before it schedules the pipelined move. This
also avoids the per-BO update on every CS, and it would allow buffers to
be shared and to be in multiple default working sets. The downside is
that ->move_notify needs to be able to cope with added fences, which
means we might need to grow the shared fences array, which might fail
with ENOMEM. Not great. We could fix this with some kind of permanent
shared fence slot reservations (i.e. a reserved slot which outlives
holding the reservation lock), but that might waste quite a bit of
memory. Probably not real problem in the grand scheme of things. I think
the fence itself can be preallocated per context, so that isn't the
problem.

Well the ENOMEM problem is the 

Re: [PATCH 1/6] dma-buf: add dynamic DMA-buf handling v13

2019-07-31 Thread Daniel Vetter
On Wed, Jul 31, 2019 at 11:44 AM Christian König
 wrote:
>
> > On 31.07.19 11:12, Daniel Vetter wrote:
> > [SNIP]
> > I think I brought this up before, but new top-post for a clean start.
> >
> > Use-case I have in mind is something like amdkfd's model, where you have a
> > list of buffers (per context or whatever) that you always need to have
> > present. Idea is to also use this for traditional CS for vk/gl, to cut
> > down on the buffer management overhead, but we'd still allow additional
> > buffers to be listed per-CS on top of that default working set.
> >
> > This of course means no implicit sync anymore on these default buffers
> > (the point is to avoid touching every buffer on every CS, updating fences
> > would defeat that). That's why the CS can still list additional buffers,
> > the only reason for that is to add implicit sync fences. Those buffers
> > would be most likely in the default working set already.
> >
> > Consequence is that I want the amdkfd model of "evict when needed, but
> > keep resident by default", but also working implicit fences. And it must
> > be doable without touching every bo on every CS. Listing possible
> > implementation options:
> >
> > - the amdkfd trick doesn't work because it would break implicit fencing -
> >any implicit sync would always result in the context getting
> >preempted/evicted, which isn't great.
>
> I'm actually working on re-working implicit fencing towards better
> supporting this.
>
> Basic idea is that you split up the fences in a reservation object into
> three categories:
> 1. Implicit sync on write.
> 2. Implicit sync on read.
> 3. No implicit sync at all.

Not really sure what you want to do here ... implicit sync is opt-in
(or opt-out flag if you need to keep CS backwards compat) per BO/CS.
At least when we discussed this forever at some XDCs consensus was
that storing the implicit sync mode on the BO is not going to work.

> This should not only help the KFD, but also with amdgpu command
> submission and things like page tables updates.
>
> E.g. we need to keep fences for page table updates around in reservation
> objects as well, but you really really really don't want any implicit
> synchronization with them :)

Why do you even try to do implicit sync with your pagetables? How can
your pagetables even get anywhere near where implicit sync matters?
I'm confused ... If it's because ttm doesn't allow you to override the
eviction order because it's a midlayer I think the correct fix is to
demidlayer.

> I think that having a consensus of the meaning of the fences in a
> reservation object will be rather fundamental for what we are planning
> to do here.

Yeah that I can agree on.

> > - we share the resv_obj between all the buffers in the default working set
> >of a context, with unsharing/resharing the resv_obj if they enter/leave
> >the default working set. That way there's only one resv_obj to update on
> >each CS, and we can attach a new shared fence for every CS. Trouble is
> >that this means a given buffer can only be part of one default working
> >set, so all shared buffers would need to be listed again separately. Not
> >so great if userspace has to deal with that fairly arbitrary limitation.
>
> Yeah, that is exactly what we do with the per VM BOs in amdgpu.
>
> The limitation that you have only one working set actually turned out to
> be not a limitation at all, but rather seen as something welcomed by our
> Vulkan guys.

We have per-ctx vms in i915, but I guess even for those sharing will be limited.

> I also don't really see a way to have an implementation with good
> performance where BOs can be in multiple working sets at the same time.
>
> > - we allow the ->move_notify callback to add new fences, which the
> >exporter needs to wait on before it schedules the pipelined move. This
> >also avoids the per-BO update on every CS, and it would allow buffers to
> >be shared and to be in multiple default working sets. The downside is
> >that ->move_notify needs to be able to cope with added fences, which
> >means we might need to grow the shared fences array, which might fail
> >with ENOMEM. Not great. We could fix this with some kind of permanent
> >shared fence slot reservations (i.e. a reserved slot which outlives
> >holding the reservation lock), but that might waste quite a bit of
> >memory. Probably not real problem in the grand scheme of things. I think
> >the fence itself can be preallocated per context, so that isn't the
> >problem.
>
> Well the ENOMEM problem is the least of the problems with this approach.
> You can still block for the fence which you wanted to add.
>
> The real problem is that you can't tell if a BO is busy or not by just
> looking at its current fences.
>
> In other words when you stop adding fences you also want to stop moving
> them individually on the LRU.

Well yeah, otherwise you're back to a per-BO overhead on CS. That's kinda

RE: AMDGPU breaks suspend after kernel 5.0

2019-07-31 Thread Gao, Likun
Hi Gover,

Sorry for responding late. Could you give the attached patch a try and
share the resulting logs?
Also, have you tried reverting this commit to see whether that helps?
Thanks.

Regards,
Likun

-Original Message-
From: Paul Gover  
Sent: Tuesday, July 30, 2019 9:34 PM
To: Gao, Likun 
Cc: amd-gfx@lists.freedesktop.org
Subject: AMDGPU breaks suspend after kernel 5.0

Hi Likun,

Sorry if you don't want emails like this.  I added info. to
https://bugs.freedesktop.org/show_bug.cgi?id=110258
but people on Gentoo forums said email would be better.

Git bisect led me to you:
---
106c7d6148e5aadd394e6701f7e498df49b869d1 is the first bad commit
commit 106c7d6148e5aadd394e6701f7e498df49b869d1
Author: Likun Gao 
Date:   Thu Nov 8 20:19:54 2018 +0800

drm/amdgpu: abstract the function of enter/exit safe mode for RLC

Abstract the function of amdgpu_gfx_rlc_enter/exit_safe_mode and some part 
of rlc_init to improve the reusability of RLC.

Signed-off-by: Likun Gao 
Acked-by: Christian König 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 

:040000 040000 8f3b365496f3bbd380a62032f20642ace51c8fef e14ec968011019e3f601df3f15682bb9ae0bafc6 M  drivers
-
Symptoms are that when resuming after pm-suspend, the screen is blank or
corrupt, the keyboard is dead, and syslog shows

kernel: [   81.09] [drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, 
signaled seq=51, emitted seq=52
kernel: [   81.096671] [drm] IP block:gfx_v8_0 is hung!
kernel: [   81.096734] [drm] GPU recovery disabled.
-
or similar.  The problem occurs with all kernels since 5.0 up to and including 
5.3-rc2.  My laptop is:

HP 15-bw0xx
cpu:AMD A9-9420 RADEON R5, 5 COMPUTE CORES 2C+3G with integrated graphics:
Stoney [Radeon R2/R3/R4/R5 Graphics] [1002:98E4]

There are several similar reports on the web, most or all for Stoney hardware, 
but that might be a coincidence as laptop users are more concerned about 
suspend, and there are a lot of laptops with similar integrated graphics 
motherboards.

I'm running Gentoo with a custom kernel; the most relevant bits of the config:

CONFIG_DRM_AMDGPU=y
# CONFIG_DRM_AMDGPU_SI is not set
# CONFIG_DRM_AMDGPU_CIK is not set
# CONFIG_DRM_AMDGPU_USERPTR is not set

If you tell me how, I'm willing to try to collect traces etc.

Paul Gover




0001-drm-amdgpu-debug-for-gfx-v8-Stoney-pm-suspend.patch
Description: 0001-drm-amdgpu-debug-for-gfx-v8-Stoney-pm-suspend.patch

Re: [PATCH 1/6] dma-buf: add dynamic DMA-buf handling v13

2019-07-31 Thread Christian König

On 31.07.19 11:12, Daniel Vetter wrote:

[SNIP]
I think I brought this up before, but new top-post for a clean start.

Use-case I have in mind is something like amdkfd's model, where you have a
list of buffers (per context or whatever) that you always need to have
present. Idea is to also use this for traditional CS for vk/gl, to cut
down on the buffer management overhead, but we'd still allow additional
buffers to be listed per-CS on top of that default working set.

This of course means no implicit sync anymore on these default buffers
(the point is to avoid touching every buffer on every CS, updating fences
would defeat that). That's why the CS can still list additional buffers,
the only reason for that is to add implicit sync fences. Those buffers
would be most likely in the default working set already.

Consequence is that I want the amdkfd model of "evict when needed, but
keep resident by default", but also working implicit fences. And it must
be doable without touching every bo on every CS. Listing possible
implementation options:

- the amdkfd trick doesn't work because it would break implicit fencing -
   any implicit sync would always result in the context getting
   preempted/evicted, which isn't great.


I'm actually working on re-working implicit fencing towards better 
supporting this.


Basic idea is that you split up the fences in a reservation object into 
three categories:

1. Implicit sync on write.
2. Implicit sync on read.
3. No implicit sync at all.
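
As a purely hypothetical sketch of that split (these names exist in no
kernel header; today's reservation object has only exclusive and shared
fence slots):

#include <linux/dma-fence.h>

enum resv_fence_usage {
	RESV_USAGE_IMPLICIT_WRITE,	/* readers and writers must wait */
	RESV_USAGE_IMPLICIT_READ,	/* only writers must wait */
	RESV_USAGE_NONE,		/* memory management only */
};

struct resv_fence {
	struct dma_fence *fence;
	enum resv_fence_usage usage;
};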

This should not only help the KFD, but also with amdgpu command 
submission and things like page tables updates.


E.g. we need to keep fences for page table updates around in reservation 
objects as well, but you really really really don't want any implicit 
synchronization with them :)


I think that having a consensus of the meaning of the fences in a 
reservation object will be rather fundamental for what we are planning 
to do here.



- we share the resv_obj between all the buffers in the default working set
   of a context, with unsharing/resharing the resv_obj if they enter/leave
   the default working set. That way there's only one resv_obj to update on
   each CS, and we can attach a new shared fence for every CS. Trouble is
   that this means a given buffer can only be part of one default working
   set, so all shared buffers would need to be listed again separately. Not
   so great if userspace has to deal with that fairly arbitrary limitation.


Yeah, that is exactly what we do with the per VM BOs in amdgpu.

The limitation that you have only one working set actually turned out to 
be not a limitation at all, but rather seen as something welcomed by our 
Vulkan guys.


I also don't really see a way to have an implementation with good 
performance where BOs can be in multiple working sets at the same time.



- we allow the ->move_notify callback to add new fences, which the
   exporter needs to wait on before it schedules the pipelined move. This
   also avoids the per-BO update on every CS, and it would allow buffers to
   be shared and to be in multiple default working sets. The downside is
   that ->move_notify needs to be able to cope with added fences, which
   means we might need to grow the shared fences array, which might fail
   with ENOMEM. Not great. We could fix this with some kind of permanent
   shared fence slot reservations (i.e. a reserved slot which outlives
   holding the reservation lock), but that might waste quite a bit of
   memory. Probably not real problem in the grand scheme of things. I think
   the fence itself can be preallocated per context, so that isn't the
   problem.


Well the ENOMEM problem is the least of the problems with this approach. 
You can still block for the fence which you wanted to add.


The real problem is that you can't tell if a BO is busy or not by just 
looking at its current fences.


In other words when you stop adding fences you also want to stop moving 
them individually on the LRU.


When the memory management evicts one BO you essentially kick out a 
whole process/working set.


So when you want to kick out the next BO you actually want to do this 
for BOs which now became available anyway.


That approach won't work with the move_notify callback.


- same as above, but the new fence doesn't get added, but returned to the
   caller, and the exporter deals with the ENOMEM mess. Might not work
   since an importer could have a lot of contexts using a given object, and
   so would have a lot of fences to add.


I don't think that this will work.

See you not only need to be able to add the fence to the BO currently 
evicted, but also to all other BO in your process/working set.


In addition to that, moving the ENOMEM handling from the importer to the 
exporter sounds as helpful as adding another layer of abstraction :)


Regards,
Christian.



- something entirely different?

Thoughts?

Cheers, Daniel


---
  drivers/dma-buf/dma-buf.c | 183 

Re: [PATCH 1/6] dma-buf: add dynamic DMA-buf handling v13

2019-07-31 Thread Daniel Vetter
On Wed, Jun 26, 2019 at 02:23:05PM +0200, Christian König wrote:
> On the exporter side we add optional explicit pinning callbacks. If those
> callbacks are implemented the framework no longer caches sg tables and the
> map/unmap callbacks are always called with the lock of the reservation object
> held.
> 
> On the importer side we add an optional invalidate callback. This callback is
> used by the exporter to inform the importers that their mappings should be
> destroyed as soon as possible.
> 
> This allows the exporter to provide the mappings without the need to pin
> the backing store.
> 
> v2: don't try to invalidate mappings when the callback is NULL,
> lock the reservation obj while using the attachments,
> add helper to set the callback
> v3: move flag for invalidation support into the DMA-buf,
> use new attach_info structure to set the callback
> v4: use importer_priv field instead of mangling exporter priv.
> v5: drop invalidation_supported flag
> v6: squash together with pin/unpin changes
> v7: pin/unpin takes an attachment now
> v8: nuke dma_buf_attachment_(map|unmap)_locked,
> everything is now handled backward compatible
> v9: always cache when export/importer don't agree on dynamic handling
> v10: minimal style cleanup
> v11: drop automatically re-entry avoidance
> v12: rename callback to move_notify
> v13: add might_lock in appropriate places
> 
> Signed-off-by: Christian König 

I think I brought this up before, but new top-post for a clean start.

Use-case I have in mind is something like amdkfd's model, where you have a
list of buffers (per context or whatever) that you always need to have
present. Idea is to also use this for traditional CS for vk/gl, to cut
down on the buffer management overhead, but we'd still allow additional
buffers to be listed per-CS on top of that default working set.

This of course means no implicit sync anymore on these default buffers
(the point is to avoid touching every buffer on every CS, updating fences
would defeat that). That's why the CS can still list additional buffers,
the only reason for that is to add implicit sync fences. Those buffers
would be most likely in the default working set already.

Consequence is that I want the amdkfd model of "evict when needed, but
keep resident by default", but also working implicit fences. And it must
be doable without touching every bo on every CS. Listing possible
implementation options:

- the amdkfd trick doesn't work because it would break implicit fencing -
  any implicit sync would always result in the context getting
  preempted/evicted, which isn't great.

- we share the resv_obj between all the buffers in the default working set
  of a context, with unsharing/resharing the resv_obj if they enter/leave
  the default working set. That way there's only one resv_obj to update on
  each CS, and we can attach a new shared fence for every CS. Trouble is
  that this means a given buffer can only be part of one default working
  set, so all shared buffers would need to be listed again separately. Not
  so great if userspace has to deal with that fairly arbitrary limitation.

- we allow the ->move_notify callback to add new fences, which the
  exporter needs to wait on before it schedules the pipelined move. This
  also avoids the per-BO update on every CS, and it would allow buffers to
  be shared and to be in multiple default working sets. The downside is
  that ->move_notify needs to be able to cope with added fences, which
  means we might need to grow the shared fences array, which might fail
  with ENOMEM. Not great. We could fix this with some kind of permanent
  shared fence slot reservations (i.e. a reserved slot which outlives
  holding the reservation lock), but that might waste quite a bit of
  memory. Probably not real problem in the grand scheme of things. I think
  the fence itself can be preallocated per context, so that isn't the
  problem.

- same as above, but the new fence doesn't get added, but returned to the
  caller, and the exporter deals with the ENOMEM mess. Might not work
  since an importer could have a lot of contexts using a given object, and
  so would have a lot of fences to add.

- something entirely different?

Thoughts?

Cheers, Daniel
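
Illustrating the slot-reservation idea from the third option above with the
2019-era API (a sketch; note reservation_object_reserve_shared() only
guarantees the slot while the lock is held, which is exactly the gap a
"permanent" reservation would have to close):

#include <linux/dma-fence.h>
#include <linux/reservation.h>

/*
 * Pre-reserve a shared-fence slot so the fence can be added without a
 * potentially failing allocation.  The guarantee ends when the lock is
 * dropped, so both steps must happen under one lock hold.
 */
static int example_add_fence_without_enomem(struct reservation_object *resv,
					    struct dma_fence *fence)
{
	int ret;

	reservation_object_lock(resv, NULL);
	ret = reservation_object_reserve_shared(resv, 1);
	if (!ret)
		reservation_object_add_shared_fence(resv, fence);
	reservation_object_unlock(resv);
	return ret;
}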

> ---
>  drivers/dma-buf/dma-buf.c | 183 --
>  include/linux/dma-buf.h   | 108 --
>  2 files changed, 277 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 6c15deb5d4ad..bd8611fa2cfa 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -531,6 +531,9 @@ struct dma_buf *dma_buf_export(const struct 
> dma_buf_export_info *exp_info)
>   return ERR_PTR(-EINVAL);
>   }
>  
> + if (WARN_ON(exp_info->ops->cache_sgt_mapping && exp_info->ops->pin))
> + return ERR_PTR(-EINVAL);
> +
>   if (!try_module_get(exp_info->owner))
>   return ERR_PTR(-ENOENT);

RE: [PATCH] drm/amdgpu: fix double ucode load by PSP(v3)

2019-07-31 Thread Deng, Emily
All looks good to me. Reviewed-by: Emily Deng.

>-Original Message-
>From: amd-gfx  On Behalf Of Monk
>Liu
>Sent: Wednesday, July 31, 2019 4:54 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Liu, Monk 
>Subject: [PATCH] drm/amdgpu: fix double ucode load by PSP(v3)
>
>previously the ucode loading of PSP was repeated, one executed in
>phase_1 init/re-init/resume and the other in the fw_loading routine
>
>Avoid this double loading by clearing ip_blocks.status.hw in suspend or reset
>prior to the FW loading and any block's hw_init/resume
>
>v2:
>still do the smu fw loading since it is needed by bare-metal
>
>v3:
>drop the change in reinit_early_sriov, just clear all block's status.hw in the
>head place and set the status.hw after hw_init done is enough
>
>Signed-off-by: Monk Liu 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 59
>+++---
> 1 file changed, 38 insertions(+), 21 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 6cb358c..30436ba 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -1673,28 +1673,34 @@ static int amdgpu_device_fw_loading(struct
>amdgpu_device *adev)
>
>   if (adev->asic_type >= CHIP_VEGA10) {
>   for (i = 0; i < adev->num_ip_blocks; i++) {
>-  if (adev->ip_blocks[i].version->type ==
>AMD_IP_BLOCK_TYPE_PSP) {
>-  if (adev->in_gpu_reset || adev->in_suspend) {
>-  if (amdgpu_sriov_vf(adev) && adev-
>>in_gpu_reset)
>-  break; /* sriov gpu reset, psp
>need to do hw_init before IH because of hw limit */
>-  r = adev->ip_blocks[i].version->funcs-
>>resume(adev);
>-  if (r) {
>-  DRM_ERROR("resume of IP
>block <%s> failed %d\n",
>+  if (adev->ip_blocks[i].version->type !=
>AMD_IP_BLOCK_TYPE_PSP)
>+  continue;
>+
>+  /* no need to do the fw loading again if already
>done*/
>+  if (adev->ip_blocks[i].status.hw == true)
>+  break;
>+
>+  if (adev->in_gpu_reset || adev->in_suspend) {
>+  r = adev->ip_blocks[i].version->funcs-
>>resume(adev);
>+  if (r) {
>+  DRM_ERROR("resume of IP block <%s>
>failed %d\n",
> adev-
>>ip_blocks[i].version->funcs->name, r);
>-  return r;
>-  }
>-  } else {
>-  r = adev->ip_blocks[i].version->funcs-
>>hw_init(adev);
>-  if (r) {
>-  DRM_ERROR("hw_init of IP
>block <%s> failed %d\n",
>-adev->ip_blocks[i].version-
>>funcs->name, r);
>-  return r;
>-  }
>+  return r;
>+  }
>+  } else {
>+  r = adev->ip_blocks[i].version->funcs-
>>hw_init(adev);
>+  if (r) {
>+  DRM_ERROR("hw_init of IP block <%s>
>failed %d\n",
>+adev-
>>ip_blocks[i].version->funcs->name, r);
>+  return r;
>   }
>-  adev->ip_blocks[i].status.hw = true;
>   }
>+
>+  adev->ip_blocks[i].status.hw = true;
>+  break;
>   }
>   }
>+
>   r = amdgpu_pm_load_smu_firmware(adev, &smu_version);
>
>   return r;
>@@ -2136,7 +2142,9 @@ static int
>amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)
>   if (r) {
>   DRM_ERROR("suspend of IP block <%s>
>failed %d\n",
> adev->ip_blocks[i].version->funcs-
>>name, r);
>+  return r;
>   }
>+  adev->ip_blocks[i].status.hw = false;
>   }
>   }
>
>@@ -2176,14 +2184,16 @@ static int
>amdgpu_device_ip_suspend_phase2(struct amdgpu_device *adev)
>   if (is_support_sw_smu(adev)) {
>   /* todo */
>   } else if (adev->powerplay.pp_funcs &&
>- adev->powerplay.pp_funcs->set_mp1_state)
>{
>+ adev->powerplay.pp_funcs-
>>set_mp1_state) 

[PATCH] drm/amdgpu: fix double ucode load by PSP(v3)

2019-07-31 Thread Monk Liu
previously the ucode loading of PSP was repeated, one executed in
phase_1 init/re-init/resume and the other in the fw_loading routine

Avoid this double loading by clearing ip_blocks.status.hw in suspend or reset
prior to the FW loading and any block's hw_init/resume

v2:
still do the smu fw loading since it is needed by bare-metal

v3:
drop the change in reinit_early_sriov, just clear all block's status.hw
in the head place and set the status.hw after hw_init done is enough

Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 59 +++---
 1 file changed, 38 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6cb358c..30436ba 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1673,28 +1673,34 @@ static int amdgpu_device_fw_loading(struct 
amdgpu_device *adev)
 
if (adev->asic_type >= CHIP_VEGA10) {
for (i = 0; i < adev->num_ip_blocks; i++) {
-   if (adev->ip_blocks[i].version->type == 
AMD_IP_BLOCK_TYPE_PSP) {
-   if (adev->in_gpu_reset || adev->in_suspend) {
-   if (amdgpu_sriov_vf(adev) && 
adev->in_gpu_reset)
-   break; /* sriov gpu reset, psp 
need to do hw_init before IH because of hw limit */
-   r = 
adev->ip_blocks[i].version->funcs->resume(adev);
-   if (r) {
-   DRM_ERROR("resume of IP block 
<%s> failed %d\n",
+   if (adev->ip_blocks[i].version->type != 
AMD_IP_BLOCK_TYPE_PSP)
+   continue;
+
+   /* no need to do the fw loading again if already done*/
+   if (adev->ip_blocks[i].status.hw == true)
+   break;
+
+   if (adev->in_gpu_reset || adev->in_suspend) {
+   r = 
adev->ip_blocks[i].version->funcs->resume(adev);
+   if (r) {
+   DRM_ERROR("resume of IP block <%s> 
failed %d\n",
  
adev->ip_blocks[i].version->funcs->name, r);
-   return r;
-   }
-   } else {
-   r = 
adev->ip_blocks[i].version->funcs->hw_init(adev);
-   if (r) {
-   DRM_ERROR("hw_init of IP block 
<%s> failed %d\n",
- 
adev->ip_blocks[i].version->funcs->name, r);
-   return r;
-   }
+   return r;
+   }
+   } else {
+   r = 
adev->ip_blocks[i].version->funcs->hw_init(adev);
+   if (r) {
+   DRM_ERROR("hw_init of IP block <%s> 
failed %d\n",
+ 
adev->ip_blocks[i].version->funcs->name, r);
+   return r;
}
-   adev->ip_blocks[i].status.hw = true;
}
+
+   adev->ip_blocks[i].status.hw = true;
+   break;
}
}
+
	r = amdgpu_pm_load_smu_firmware(adev, &smu_version);
 
return r;
@@ -2136,7 +2142,9 @@ static int amdgpu_device_ip_suspend_phase1(struct 
amdgpu_device *adev)
if (r) {
DRM_ERROR("suspend of IP block <%s> failed 
%d\n",
  
adev->ip_blocks[i].version->funcs->name, r);
+   return r;
}
+   adev->ip_blocks[i].status.hw = false;
}
}
 
@@ -2176,14 +2184,16 @@ static int amdgpu_device_ip_suspend_phase2(struct 
amdgpu_device *adev)
if (is_support_sw_smu(adev)) {
/* todo */
} else if (adev->powerplay.pp_funcs &&
-  adev->powerplay.pp_funcs->set_mp1_state) {
+  
adev->powerplay.pp_funcs->set_mp1_state) {
r = adev->powerplay.pp_funcs->set_mp1_state(
adev->powerplay.pp_handle,
adev->mp1_state);
if (r) {
DRM_ERROR("SMC failed to set 

[PATCH] drm/amdgpu: fix double ucode load by PSP(v2)

2019-07-31 Thread Monk Liu
previously the ucode loading of PSP was repeated, one executed in
phase_1 init/re-init/resume and the other in the fw_loading routine

Avoid this double loading by clearing ip_blocks.status.hw in suspend or reset
prior to the FW loading and any block's hw_init/resume

v2:
still do the smu fw loading since it is needed by bare-metal

Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 77 +++---
 1 file changed, 48 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6cb358c..38b14ba 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1673,28 +1673,34 @@ static int amdgpu_device_fw_loading(struct 
amdgpu_device *adev)
 
if (adev->asic_type >= CHIP_VEGA10) {
for (i = 0; i < adev->num_ip_blocks; i++) {
-   if (adev->ip_blocks[i].version->type == 
AMD_IP_BLOCK_TYPE_PSP) {
-   if (adev->in_gpu_reset || adev->in_suspend) {
-   if (amdgpu_sriov_vf(adev) && 
adev->in_gpu_reset)
-   break; /* sriov gpu reset, psp 
need to do hw_init before IH because of hw limit */
-   r = 
adev->ip_blocks[i].version->funcs->resume(adev);
-   if (r) {
-   DRM_ERROR("resume of IP block 
<%s> failed %d\n",
+   if (adev->ip_blocks[i].version->type != 
AMD_IP_BLOCK_TYPE_PSP)
+   continue;
+
+   /* no need to do the fw loading again if already done*/
+   if (adev->ip_blocks[i].status.hw == true)
+   break;
+
+   if (adev->in_gpu_reset || adev->in_suspend) {
+   r = 
adev->ip_blocks[i].version->funcs->resume(adev);
+   if (r) {
+   DRM_ERROR("resume of IP block <%s> 
failed %d\n",
  
adev->ip_blocks[i].version->funcs->name, r);
-   return r;
-   }
-   } else {
-   r = 
adev->ip_blocks[i].version->funcs->hw_init(adev);
-   if (r) {
-   DRM_ERROR("hw_init of IP block 
<%s> failed %d\n",
- 
adev->ip_blocks[i].version->funcs->name, r);
-   return r;
-   }
+   return r;
+   }
+   } else {
+   r = 
adev->ip_blocks[i].version->funcs->hw_init(adev);
+   if (r) {
+   DRM_ERROR("hw_init of IP block <%s> 
failed %d\n",
+ 
adev->ip_blocks[i].version->funcs->name, r);
+   return r;
}
-   adev->ip_blocks[i].status.hw = true;
}
+
+   adev->ip_blocks[i].status.hw = true;
+   break;
}
}
+
	r = amdgpu_pm_load_smu_firmware(adev, &smu_version);
 
return r;
@@ -2128,6 +2134,7 @@ static int amdgpu_device_ip_suspend_phase1(struct 
amdgpu_device *adev)
for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
if (!adev->ip_blocks[i].status.valid)
continue;
+
/* displays are handled separately */
if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_DCE) {
/* XXX handle errors */
@@ -2136,7 +2143,9 @@ static int amdgpu_device_ip_suspend_phase1(struct 
amdgpu_device *adev)
if (r) {
DRM_ERROR("suspend of IP block <%s> failed 
%d\n",
  
adev->ip_blocks[i].version->funcs->name, r);
+   return r;
}
+   adev->ip_blocks[i].status.hw = false;
}
}
 
@@ -2176,14 +2185,16 @@ static int amdgpu_device_ip_suspend_phase2(struct 
amdgpu_device *adev)
if (is_support_sw_smu(adev)) {
/* todo */
} else if (adev->powerplay.pp_funcs &&
-  adev->powerplay.pp_funcs->set_mp1_state) {
+  
adev->powerplay.pp_funcs->set_mp1_state) {
r 

Re: [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-31 Thread Daniel Vetter
On Wed, Jul 31, 2019 at 10:25:15AM +0200, Christian König wrote:
> On 31.07.19 10:05, Daniel Vetter wrote:
> > [SNIP]
> > > > Okay, I see now I was far off the mark with what I thought TTM_PL_SYSTEM
> > > > was.  The discussion helped clear up several bits of confusion on my 
> > > > part.
> > > >   From proposed names, I find MAPPED and PINNED slightly confusing.
> > > > In terms of backing store description, maybe these are a little better:
> > > > DRM_MEM_SYS_UNTRANSLATED  (TTM_PL_SYSTEM)
> > > > DRM_MEM_SYS_TRANSLATED(TTM_PL_TT or i915's SYSTEM)
> > > That's still not correct. Let me describe what each of the three stands 
> > > for:
> > > 
> > > 1. The backing store is a shmem file so the individual pages are
> > > swappable by the core OS.
> > > 2. The backing store is allocated GPU accessible but not currently in use
> > > by the GPU.
> > > 3. The backing store is currently in use by the GPU.
> > > 
> > > For i915 all three of those are basically the same and you don't need to
> > > worry about it much.
> > We do pretty much have these three states for i915 gem bo too. Of
> > course none have a reasonable upper limit since it's all shared
> > memory. The hard limit would be system memory + swap for 1, and only
> > system memory for 2 and 3.
> 
> Good to know.
> 
> > > But for other drivers that's certainly not true and we need this
> > > distinction of the backing store of an object.
> > > 
> > > I'm just not sure how we would handle that for cgroups. From experience
> > > we certainly want a limit over all 3, but you usually also want to limit
> > > 3 alone.
> > To avoid lolz against the shrinker I think you also want to limit 2+3.
> > Afaiui ttm does that with the global limit, to avoid driving the
> > system against the wall.
> 
> Yes, exactly. But I think you only need that when 2+3 are not backed by
> pinning shmem. E.g. for i915 I'm not sure you want this limitation.

Maybe I need to share how bad exactly the i915 driver is fighting its own
shrinker at the next conference, over some good drinks ... Just becaue we
use shmem directly doesn't make this easier really at all, we're still
pinning memory that the core mm can't evict anymore.

> > [SNIP]
> > > #1 and #2 in my example above should probably not be configured by the
> > > driver itself.
> > > 
> > > And yes seeing those as special for state handling sounds like the
> > > correct approach to me.
> > Do we have any hw that wants custom versions of 3?
> 
> I can't think of any. If a driver needs something special for 3 then that
> should be domain VRAM or domain PRIV.
> 
> As far as I can see with the proposed separation we can even handle AGP.
> 
> > The only hw designs
> > I know of either have one shared translation table (but only one per
> > device, so having just 1 domain is good enough). Or TT mappings are in
> > the per-process pagetables, and then you're defacto unlimited (and
> > again one domain is good enough). So roughly:
> > 
> > - 1&2 global accross all drivers. 1 and 2 are disjoint (i.e. a bo is
> > only account to one of them, never both).
> > - 3 would be a subgroup of 2, and per device. A bo in group 3 is also
> > always in group 2.
> 
> Yes, that sounds like a good description, certainly the right way to see
> it.
> 
> > For VRAM and VRAM-similar things (like stolen system memory, or if you
> > have VRAM that's somehow split up like with a dual gpu perhaps) I
> > agree the driver needs to register that. And we just have some
> > standard flags indicating that "this is kinda like VRAM".
> 
> Yeah, agree totally as well.

Cheers, Daniel

> 
> Christian.
> 
> > -Daniel
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > > > > > TTM was clearly missing that resulting in a whole bunch of extra
> > > > > > > > handling and rather complicated handling.
> > > > > > > > 
> > > > > > > > > +#define DRM_MEM_SYSTEM 0
> > > > > > > > > +#define DRM_MEM_STOLEN 1
> > > > > > > > I think we need a better naming for that.
> > > > > > > > 
> > > > > > > > STOLEN sounds way to much like stolen VRAM for integrated GPUs, 
> > > > > > > > but at
> > > > > > > > least for TTM this is the system memory currently GPU 
> > > > > > > > accessible.
> > > > > > > Yup this is wrong, for i915 we use this as stolen, for ttm it's 
> > > > > > > the gpu
> > > > > > > translation table window into system memory. Not the same thing 
> > > > > > > at all.
> > > > > > Thought so. The closest I have in mind is GTT, but everything else 
> > > > > > works
> > > > > > as well.
> > > > > Would your GPU_MAPPED above work for TT? I think we'll also need
> > > > > STOLEN, I'm even hearing noises that there's going to be stolen for
> > > > > discrete vram for us ... Also if we expand I guess we need to teach
> > > > > ttm to cope with more, or maybe treat the DRM one as some kind of
> > > > > sub-flavour.
> > > > Daniel, maybe what i915 calls stolen could just be DRM_MEM_RESERVED or
> > > > DRM_MEM_PRIV.  Or maybe can argue it falls into 

Re: [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-31 Thread Christian König

On 31.07.19 10:05, Daniel Vetter wrote:

[SNIP]

Okay, I see now I was far off the mark with what I thought TTM_PL_SYSTEM
was.  The discussion helped clear up several bits of confusion on my part.
  From proposed names, I find MAPPED and PINNED slightly confusing.
In terms of backing store description, maybe these are a little better:
DRM_MEM_SYS_UNTRANSLATED  (TTM_PL_SYSTEM)
DRM_MEM_SYS_TRANSLATED(TTM_PL_TT or i915's SYSTEM)

That's still not correct. Let me describe what each of the three stands for:

1. The backing store is a shmem file so the individual pages are
swappable by the core OS.
2. The backing store is allocated GPU accessible but not currently in use
by the GPU.
3. The backing store is currently in use by the GPU.

For i915 all three of those are basically the same and you don't need to
worry about it much.

We do pretty much have these three states for i915 gem bo too. Of
course none have a reasonable upper limit since it's all shared
memory. The hard limit would be system memory + swap for 1, and only
system memory for 2 and 3.


Good to know.


But for other drivers that's certainly not true and we need this
distinction of the backing store of an object.

I'm just not sure how we would handle that for cgroups. From experience
we certainly want a limit over all 3, but you usually also want to limit
3 alone.

To avoid lolz against the shrinker I think you also want to limit 2+3.
Afaiui ttm does that with the global limit, to avoid driving the
system against the wall.


Yes, exactly. But I think you only need that when 2+3 are not backed by 
pinning shmem. E.g. for i915 I'm not sure you want this limitation.



[SNIP]

#1 and #2 in my example above should probably not be configured by the
driver itself.

And yes seeing those as special for state handling sounds like the
correct approach to me.

Do we have any hw that wants custom versions of 3?


I can't think of any. If a driver needs something special for 3 then 
that should be domain VRAM or domain PRIV.


As far as I can see with the proposed separation we can even handle AGP.


The only hw designs
I know of either have one shared translation table (but only one per
device, so having just 1 domain is good enough). Or TT mappings are in
the per-process pagetables, and then you're defacto unlimited (and
again one domain is good enough). So roughly:

- 1&2 global accross all drivers. 1 and 2 are disjoint (i.e. a bo is
only account to one of them, never both).
- 3 would be a subgroup of 2, and per device. A bo in group 3 is also
always in group 2.


Yes, that sounds like a good description, certainly the right way to 
see it.



For VRAM and VRAM-similar things (like stolen system memory, or if you
have VRAM that's somehow split up like with a dual gpu perhaps) I
agree the driver needs to register that. And we just have some
standard flags indicating that "this is kinda like VRAM".


Yeah, agree totally as well.

Christian.


-Daniel


Regards,
Christian.


TTM was clearly missing that resulting in a whole bunch of extra
handling and rather complicated handling.


+#define DRM_MEM_SYSTEM 0
+#define DRM_MEM_STOLEN 1

I think we need a better naming for that.

STOLEN sounds way too much like stolen VRAM for integrated GPUs, but at
least for TTM this is the system memory currently GPU accessible.

Yup this is wrong, for i915 we use this as stolen, for ttm it's the gpu
translation table window into system memory. Not the same thing at all.

Thought so. The closest I have in mind is GTT, but everything else works
as well.

Would your GPU_MAPPED above work for TT? I think we'll also need
STOLEN, I'm even hearing noises that there's going to be stolen for
discrete vram for us ... Also if we expand I guess we need to teach
ttm to cope with more, or maybe treat the DRM one as some kind of
sub-flavour.

Daniel, maybe what i915 calls stolen could just be DRM_MEM_RESERVED or
DRM_MEM_PRIV.  Or maybe can argue it falls into UNTRANSLATED type that
I suggested above, I'm not sure.

-Brian



-Daniel


Christian.


-Daniel


Thanks for looking into that,
Christian.

Am 30.07.19 um 02:32 schrieb Brian Welty:

[ By request, resending to include amd-gfx + intel-gfx.  Since resending,
  I fixed the nit with ordering of header includes that Sam noted. ]

This RFC series is a first implementation of some ideas expressed
earlier on dri-devel [1].

Some of the goals (open for much debate) are:
  - Create common base structure (subclass) for memory regions (patch #1)
  - Create common memory region types (patch #2)
  - Create common set of memory_region function callbacks (based on
ttm_mem_type_manager_funcs and intel_memory_regions_ops)
  - Create common helpers that operate on drm_mem_region to be leveraged
by both TTM drivers and i915, reducing code duplication
  - Above might start with refactoring ttm_bo_manager.c as these are
helpers for using drm_mm's range allocator and could be made to
   

Re: [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-31 Thread Daniel Vetter
On Wed, Jul 31, 2019 at 8:54 AM Koenig, Christian
 wrote:
>
> On 31.07.19 02:51, Brian Welty wrote:
> [SNIP]
> >> +/*
> >> + * Memory types for drm_mem_region
> >> + */
> > #define DRM_MEM_SWAP?
>  btw what did you have in mind for this? Since we use shmem we kinda don't
>  know whether the BO is actually swapped out or not, at least on the i915
>  side. So this would be more 
>  NOT_CURRENTLY_PINNED_AND_POSSIBLY_SWAPPED_OUT.
> >>> Yeah, the problem is not everybody can use shmem. For some use cases you
> >>> have to use memory allocated through dma_alloc_coherent().
> >>>
> >>> So to be able to swap this out you need a separate domain to copy it
> >>> from whatever is backing it currently to shmem.
> >>>
> >>> So we essentially have:
> >>> DRM_MEM_SYS_SWAPABLE
> >>> DRM_MEM_SYS_NOT_GPU_MAPPED
> >>> DRM_MEM_SYS_GPU_MAPPED
> >>>
> >>> Or something like that.
> >> Yeah i915-gem is similar. We oportunistically keep the pages pinned
> >> sometimes even if not currently mapped into the (what ttm calls) TT.
> >> So I think these three for system memory make sense for us too. I
> >> think that's similar (at least in spirit) to the dma_alloc cache you
> >> have going on. Maybe instead of the somewhat cumbersome NOT_GPU_MAPPED
> >> we could have something like PINNED or so. Although it's not
> >> permanently pinned, so maybe that's confusing too.
> >>
> > Okay, I see now I was far off the mark with what I thought TTM_PL_SYSTEM
> > was.  The discussion helped clear up several bits of confusion on my part.
> >  From proposed names, I find MAPPED and PINNED slightly confusing.
> > In terms of backing store description, maybe these are a little better:
> >DRM_MEM_SYS_UNTRANSLATED  (TTM_PL_SYSTEM)
> >DRM_MEM_SYS_TRANSLATED(TTM_PL_TT or i915's SYSTEM)
>
> That's still not correct. Let me describe what each of the three stands for:
>
> 1. The backing store is a shmem file so the individual pages are
> swappable by the core OS.
> 2. The backing store is allocated GPU accessible but not currently in use
> by the GPU.
> 3. The backing store is currently in use by the GPU.
>
> For i915 all three of those are basically the same and you don't need to
> worry about it much.

We do pretty much have these three states for i915 gem bo too. Of
course none have a reasonable upper limit since it's all shared
memory. The hard limit would be system memory + swap for 1, and only
system memory for 2 and 3.

> But for other drivers that's certainly not true and we need this
> distinction of the backing store of an object.
>
> I'm just not sure how we would handle that for cgroups. From experience
> we certainly want a limit over all 3, but you usually also want to limit
> 3 alone.

To avoid lolz against the shrinker I think you also want to limit 2+3.
Afaiui ttm does that with the global limit, to avoid driving the
system against the wall.

> And you also want to limit the amount of bytes moved between those
> states because each state transition might have a bandwidth cost
> associated with it.
>
> > Are these allowed to be both overlapping? Or non-overlapping (partitioned)?
> > Per Christian's point about removing .start, seems it doesn't need to
> > matter.
>
> You should probably completely drop the idea of this being regions.
>
> And we should also rename them to something like drm_mem_domains to make
> that clear.

+1 on domains. Some of these domains might be physically contiguous
regions, but some clearly aren't.

> > Whatever we define for these sub-types, does it make sense for SYSTEM and
> > VRAM to each have them defined?
>
> No, absolutely not. VRAM as well as other private memory types are
> completely driver specific.
>
> > I'm unclear how DRM_MEM_SWAP (or DRM_MEM_SYS_SWAPABLE) would get
> > configured by driver...  this is a fixed size partition of host memory?
> > Or it is a kind of dummy memory region just for swap implementation?
>
> #1 and #2 in my example above should probably not be configured by the
> driver itself.
>
> And yes seeing those as special for state handling sounds like the
> correct approach to me.

Do we have any hw that wants custom versions of 3? The only hw designs
I know of either have one shared translation table (but only one per
device, so having just 1 domain is good enough), or TT mappings are in
the per-process pagetables, in which case you're de facto unlimited (and
again one domain is good enough). So roughly:

- 1&2 global across all drivers. 1 and 2 are disjoint (i.e. a bo is
only accounted to one of them, never both).
- 3 would be a subgroup of 2, and per device. A bo in group 3 is also
always in group 2.
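
(Sketch only, with invented names: the two rules above expressed as a
validity check over hypothetical per-bo membership flags.)

/* Sketch: a bo's domain membership is valid when 1 and 2 are disjoint
 * and membership in the per-device group 3 implies membership in 2. */
static bool bo_accounting_valid(bool in_swapable, bool in_gpu_capable,
                                bool in_gpu_mapped)
{
        /* 1 and 2 are disjoint: accounted to at most one of them */
        if (in_swapable && in_gpu_capable)
                return false;
        /* 3 is a subgroup of 2: mapped implies GPU-capable */
        if (in_gpu_mapped && !in_gpu_capable)
                return false;
        return true;
}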

For VRAM and VRAM-similar things (like stolen system memory, or if you
have VRAM that's somehow split up like with a dual gpu perhaps) I
agree the driver needs to register that. And we just have some
standard flags indicating that "this is kinda like VRAM".
-Daniel

>
> Regards,
> Christian.
>
> > TTM was clearly missing that resulting in a whole bunch of extra
> > handling and rather complicated handling.

Re: Review required [Was: Associate ddc adapters with connectors]

2019-07-31 Thread Neil Armstrong
Hi Sam,

On 26/07/2019 20:55, Sam Ravnborg wrote:
> Hi all.
> 
> Andrzej have done a good job following up on feedback and this series is
> now ready.
> 
> We need ack on the patches touching the individual drivers before we can
> proceed.
> Please check your drivers and get back.

I can apply all core and maintainer-acked patches for now:
1, 2, 7, 10, 11, 16, 17, 18, 19, 20, 21, 22, 23

and Andrzej can resend the not-yet-applied patches with your and Emil's Reviewed-by,
so we can wait a few more days to apply them.

Neil

> 
>   Sam
> 
>> Hi Andrzej.
>>
>> On Fri, Jul 26, 2019 at 07:22:54PM +0200, Andrzej Pietrasiewicz wrote:
>>> It is difficult for a user to know which of the i2c adapters is for which
>>> drm connector. This series addresses this problem.
>>>
>>> The idea is to have a symbolic link in connector's sysfs directory, e.g.:
>>>
>>> ls -l /sys/class/drm/card0-HDMI-A-1/ddc
>>> lrwxrwxrwx 1 root root 0 Jun 24 10:42 /sys/class/drm/card0-HDMI-A-1/ddc \
>>> -> ../../../../soc/1388.i2c/i2c-2
>>>
>>> The user then knows that their card0-HDMI-A-1 uses i2c-2 and can e.g. run
>>> ddcutil:
>>>
>>> ddcutil -b 2 getvcp 0x10
>>> VCP code 0x10 (Brightness): current value = 90, max value = 100
>>>
>>> The first patch in the series adds struct i2c_adapter pointer to struct
>>> drm_connector. If the field is used by a particular driver, then an
>>> appropriate symbolic link is created by the generic code, which is also
>>> added by this patch.
>>>
>>> Patch 2 adds a new variant of drm_connector_init(), see the changelog
>>> below.
>>>
>>> Patches 3..24 are examples of how to convert a driver to this new scheme.
>>>
>> ...
>>>
>>> v5..v6:
>>>
>>> - improved subject line of patch 1
>>> - added kernel-doc for drm_connector_init_with_ddc()
>>> - improved kernel-doc for the ddc field of struct drm_connector
>>> - added Reviewed-by in patches 17 and 18
>>> - added Acked-by in patch 2
>>> - made the ownership of ddc i2c_adapter explicit in all patches,
>>> this made the affected patches much simpler
>>
>> Looks good now.
>> Patch 1 and 2 are:
>> Reviewed-by: Sam Ravnborg 
>>
>> The remaining patches are:
>> Acked-by: Sam Ravnborg 
>>
>>  Sam
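
(For illustration, a minimal sketch of a driver conversion under this
scheme; the drm_connector_init_with_ddc() signature and the i2c lookup
below are assumptions based on the series description above, not quoted
from the patches.)

/* Sketch: hand the ddc i2c adapter to the connector at init time so the
 * generic code can create the "ddc" symlink in sysfs. */
static int example_connector_create(struct drm_device *drm,
                                    struct example_ctx *ctx) /* hypothetical driver context */
{
        struct i2c_adapter *ddc;

        ddc = of_get_i2c_adapter_by_node(ctx->ddc_node); /* assumed lookup */
        if (!ddc)
                return -EPROBE_DEFER;

        return drm_connector_init_with_ddc(drm, &ctx->connector,
                                           &example_connector_funcs, /* hypothetical */
                                           DRM_MODE_CONNECTOR_HDMIA,
                                           ddc);
}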


[PATCH] drm/amd/powerplay: sort feature status index by asic feature id for smu

2019-07-31 Thread Wang, Kevin(Yang)
Before this change, the pp_features sysfs node showed the feature enable
state ordered by logical feature id, which is not easy to read.
This change sorts the pp_features output by ASIC feature id.

before:
features high: 0x0623 low: 0xb3cdaffb
00. DPM_PREFETCHER   ( 0) : enabeld
01. DPM_GFXCLK   ( 1) : enabeld
02. DPM_UCLK ( 3) : enabeld
03. DPM_SOCCLK   ( 4) : enabeld
04. DPM_MP0CLK   ( 5) : enabeld
05. DPM_LINK ( 6) : enabeld
06. DPM_DCEFCLK  ( 7) : enabeld
07. DS_GFXCLK(10) : enabeld
08. DS_SOCCLK(11) : enabeld
09. DS_LCLK  (12) : disabled
10. PPT  (23) : enabeld
11. TDC  (24) : enabeld
12. THERMAL  (33) : enabeld
13. RM   (35) : disabled
..

after:
features high: 0x0623 low: 0xb3cdaffb
00. DPM_PREFETCHER   ( 0) : enabeld
01. DPM_GFXCLK   ( 1) : enabeld
02. DPM_GFX_PACE ( 2) : disabled
03. DPM_UCLK ( 3) : enabeld
04. DPM_SOCCLK   ( 4) : enabeld
05. DPM_MP0CLK   ( 5) : enabeld
06. DPM_LINK ( 6) : enabeld
07. DPM_DCEFCLK  ( 7) : enabeld
08. MEM_VDDCI_SCALING( 8) : enabeld
09. MEM_MVDD_SCALING ( 9) : enabeld
10. DS_GFXCLK(10) : enabeld
11. DS_SOCCLK(11) : enabeld
12. DS_LCLK  (12) : disabled
13. DS_DCEFCLK   (13) : enabeld
..

Signed-off-by: Kevin Wang 
---
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
index eabe8a6d0eb7..9e256aa3b357 100644
--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
@@ -62,6 +62,8 @@ size_t smu_sys_get_pp_feature_mask(struct smu_context *smu, char *buf)
uint32_t feature_mask[2] = { 0 };
int32_t feature_index = 0;
uint32_t count = 0;
+   uint32_t sort_feature[SMU_FEATURE_COUNT];
+   uint64_t hw_feature_count = 0;
 
ret = smu_feature_get_enabled_mask(smu, feature_mask, 2);
if (ret)
@@ -74,11 +76,17 @@ size_t smu_sys_get_pp_feature_mask(struct smu_context *smu, char *buf)
feature_index = smu_feature_get_index(smu, i);
if (feature_index < 0)
continue;
+   sort_feature[feature_index] = i;
+   hw_feature_count++;
+   }
+
+   for (i = 0; i < hw_feature_count; i++) {
size += sprintf(buf + size, "%02d. %-20s (%2d) : %s\n",
   count++,
-  smu_get_feature_name(smu, i),
-  feature_index,
-  !!smu_feature_is_enabled(smu, i) ? "enabeld" : "disabled");
+  smu_get_feature_name(smu, sort_feature[i]),
+  i,
+  !!smu_feature_is_enabled(smu, sort_feature[i]) ?
+  "enabeld" : "disabled");
}
 
 failed:
-- 
2.22.0
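
(Sketch of the two-pass mapping the patch builds, simplified from the diff
above; print_feature() is a hypothetical helper, and like the patch this
assumes the populated hardware indices are contiguous from zero.)

/* Sketch: first record, for each hardware (ASIC) feature index, the
 * logical feature id that maps to it; then walk the hardware indices in
 * order so the output is sorted by ASIC feature id. */
for (i = 0; i < SMU_FEATURE_COUNT; i++) {
        int hw_index = smu_feature_get_index(smu, i);

        if (hw_index < 0)
                continue;
        sort_feature[hw_index] = i;     /* logical id stored at its hw slot */
        hw_feature_count++;
}
for (i = 0; i < hw_feature_count; i++)
        print_feature(sort_feature[i], i);      /* hypothetical print helper */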


Re: [PATCH v4 19/23] drm/bridge: dw-hdmi: Provide ddc symlink in connector sysfs directory

2019-07-31 Thread Neil Armstrong
Hi,

On 30/07/2019 19:30, Sam Ravnborg wrote:
> Hi Neil.
> 
>>> Signed-off-by: Andrzej Pietrasiewicz 
>>> ---
>>>  drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 40 +++
>>>  1 file changed, 20 insertions(+), 20 deletions(-)
>>>
> ...
>>
>> Reviewed-by: Neil Armstrong 
> 
> There is now a much simpler v6 of this patch.
> Care to take a look and ack/r-b?

I saw it too late; I reviewed the bridge patches, and now
I'll have a look at the whole patchset.

Neil

> 
>   Sam
> 


Re: [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-31 Thread Koenig, Christian
Am 31.07.19 um 02:51 schrieb Brian Welty:
[SNIP]
>> +/*
>> + * Memory types for drm_mem_region
>> + */
> #define DRM_MEM_SWAP?
 btw what did you have in mind for this? Since we use shmem we kinda don't
 know whether the BO is actually swapped out or not, at least on the i915
 side. So this would be more NOT_CURRENTLY_PINNED_AND_POSSIBLY_SWAPPED_OUT.
>>> Yeah, the problem is not everybody can use shmem. For some use cases you
>>> have to use memory allocated through dma_alloc_coherent().
>>>
>>> So to be able to swap this out you need a separate domain to copy it
>>> from whatever is backing it currently to shmem.
>>>
>>> So we essentially have:
>>> DRM_MEM_SYS_SWAPABLE
>>> DRM_MEM_SYS_NOT_GPU_MAPPED
>>> DRM_MEM_SYS_GPU_MAPPED
>>>
>>> Or something like that.
>> Yeah i915-gem is similar. We opportunistically keep the pages pinned
>> sometimes even if not currently mapped into the (what ttm calls) TT.
>> So I think these three for system memory make sense for us too. I
>> think that's similar (at least in spirit) to the dma_alloc cache you
>> have going on. Maybe instead of the somewhat cumbersome NOT_GPU_MAPPED
>> we could have something like PINNED or so. Although it's not
>> permanently pinned, so maybe that's confusing too.
>>
> Okay, I see now I was far off the mark with what I thought TTM_PL_SYSTEM
> was.  The discussion helped clear up several bits of confusion on my part.
> From proposed names, I find MAPPED and PINNED slightly confusing.
> In terms of backing store description, maybe these are a little better:
>DRM_MEM_SYS_UNTRANSLATED  (TTM_PL_SYSTEM)
>DRM_MEM_SYS_TRANSLATED(TTM_PL_TT or i915's SYSTEM)

That's still not correct. Let me describe what each of the three stands for:

1. The backing store is a shmem file so the individual pages are
swappable by the core OS.
2. The backing store is allocated GPU-accessible but not currently in use
by the GPU.
3. The backing store is currently in use by the GPU.
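
(Illustrative sketch only: mapping these three states onto the
DRM_MEM_SYS_* names proposed earlier in the thread; the values are
arbitrary placeholders, not part of any patch.)

#define DRM_MEM_SYS_SWAPABLE            0  /* 1: shmem-backed, swappable by the core OS */
#define DRM_MEM_SYS_NOT_GPU_MAPPED      1  /* 2: GPU-accessible backing store, not in use */
#define DRM_MEM_SYS_GPU_MAPPED          2  /* 3: backing store currently in use by the GPU */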

For i915 all three of those are basically the same and you don't need to
worry about it much.

But for other drivers that's certainly not true and we need this 
distinction of the backing store of an object.

I'm just not sure how we would handle that for cgroups. From experience 
we certainly want a limit over all 3, but you usually also want to limit 
3 alone.

And you also want to limit the amount of bytes moved between those 
states because each state transition might have a bandwidth cost 
associated with it.
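
(Sketch with invented names: the per-transition byte counters that a
limit on state-change bandwidth would need.)

/* Sketch: track bytes moved between the three states so a controller
 * could limit migration bandwidth; indices are 0-based versions of
 * states 1..3 above. */
struct migration_acct {
        u64 moved[3][3];        /* [from][to] byte counters */
};

static void account_move(struct migration_acct *acct, int from, int to,
                         u64 bytes)
{
        acct->moved[from][to] += bytes; /* each transition has a bandwidth cost */
}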

> Are these allowed to be both overlapping? Or non-overlapping (partitioned)?
> Per Christian's point about removing .start, seems it doesn't need to
> matter.

You should probably completely drop the idea of this being regions.

And we should also rename them to something like drm_mem_domains to make 
that clear.

> Whatever we define for these sub-types, does it make sense for SYSTEM and
> VRAM to each have them defined?

No, absolutely not. VRAM as well as other private memory types are 
completely driver specific.

> I'm unclear how DRM_MEM_SWAP (or DRM_MEM_SYS_SWAPABLE) would get
> configured by driver...  this is a fixed size partition of host memory?
> Or it is a kind of dummy memory region just for swap implementation?

#1 and #2 in my example above should probably not be configured by the 
driver itself.

And yes seeing those as special for state handling sounds like the 
correct approach to me.

Regards,
Christian.

> TTM was clearly missing that resulting in a whole bunch of extra
> handling and rather complicated handling.
>
>> +#define DRM_MEM_SYSTEM 0
>> +#define DRM_MEM_STOLEN 1
> I think we need a better naming for that.
>
> STOLEN sounds way too much like stolen VRAM for integrated GPUs, but at
> least for TTM this is the system memory currently GPU accessible.
 Yup this is wrong, for i915 we use this as stolen, for ttm it's the gpu
 translation table window into system memory. Not the same thing at all.
>>> Thought so. The closest I have in mind is GTT, but everything else works
>>> as well.
>> Would your GPU_MAPPED above work for TT? I think we'll also need
>> STOLEN, I'm even hearing noises that there's going to be stolen for
>> discrete vram for us ... Also if we expand I guess we need to teach
>> ttm to cope with more, or maybe treat the DRM one as some kind of
>> sub-flavour.
> Daniel, maybe what i915 calls stolen could just be DRM_MEM_RESERVED or
> DRM_MEM_PRIV.  Or maybe can argue it falls into UNTRANSLATED type that
> I suggested above, I'm not sure.
>
> -Brian
>
>
>> -Daniel
>>
>>> Christian.
>>>
 -Daniel

> Thanks for looking into that,
> Christian.
>
> Am 30.07.19 um 02:32 schrieb Brian Welty:
>> [ By request, resending to include amd-gfx + intel-gfx.  Since resending,
>>  I fixed the nit with ordering of header includes that Sam noted. ]
>>
>> This RFC series is first implementation of some ideas expressed
>> earlier