Re: [PATCH] drm/amdgpu/pm: fix powerplay OD interface

2021-11-23 Thread Christian König




Am 23.11.21 um 21:01 schrieb Alex Deucher:

The overclocking interface currently appends data to a
string.  Revert back to using sprintf().

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1774
Fixes: 6db0c87a0a8ee1 ("amdgpu/pm: Replace hwmgr smu usage of sprintf with 
sysfs_emit")
Signed-off-by: Alex Deucher 


Acked-by: Christian König 


---
  .../drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c  | 20 +++
  .../drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c   | 24 
  .../drm/amd/pm/powerplay/hwmgr/smu8_hwmgr.c   |  6 +-
  .../drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c | 28 +
  .../drm/amd/pm/powerplay/hwmgr/vega12_hwmgr.c | 10 ++--
  .../drm/amd/pm/powerplay/hwmgr/vega20_hwmgr.c | 58 +--
  6 files changed, 67 insertions(+), 79 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
index 258c573acc97..1f406f21b452 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
@@ -1024,8 +1024,6 @@ static int smu10_print_clock_levels(struct pp_hwmgr 
*hwmgr,
uint32_t min_freq, max_freq = 0;
uint32_t ret = 0;
  
-	phm_get_sysfs_buf(&buf, &size);

-
switch (type) {
case PP_SCLK:
smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetGfxclkFrequency, &now);
@@ -1038,13 +1036,13 @@ static int smu10_print_clock_levels(struct pp_hwmgr 
*hwmgr,
else
i = 1;
  
-		size += sysfs_emit_at(buf, size, "0: %uMhz %s\n",

+   size += sprintf(buf + size, "0: %uMhz %s\n",
data->gfx_min_freq_limit/100,
i == 0 ? "*" : "");
-   size += sysfs_emit_at(buf, size, "1: %uMhz %s\n",
+   size += sprintf(buf + size, "1: %uMhz %s\n",
i == 1 ? now : SMU10_UMD_PSTATE_GFXCLK,
i == 1 ? "*" : "");
-   size += sysfs_emit_at(buf, size, "2: %uMhz %s\n",
+   size += sprintf(buf + size, "2: %uMhz %s\n",
data->gfx_max_freq_limit/100,
i == 2 ? "*" : "");
break;
@@ -1052,7 +1050,7 @@ static int smu10_print_clock_levels(struct pp_hwmgr 
*hwmgr,
smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetFclkFrequency, &now);
  
  		for (i = 0; i < mclk_table->count; i++)

-   size += sysfs_emit_at(buf, size, "%d: %uMhz %s\n",
+   size += sprintf(buf + size, "%d: %uMhz %s\n",
i,
mclk_table->entries[i].clk / 100,
((mclk_table->entries[i].clk / 100)
@@ -1067,10 +1065,10 @@ static int smu10_print_clock_levels(struct pp_hwmgr 
*hwmgr,
if (ret)
return ret;
  
-			size += sysfs_emit_at(buf, size, "%s:\n", "OD_SCLK");

-   size += sysfs_emit_at(buf, size, "0: %10uMhz\n",
+   size += sprintf(buf + size, "%s:\n", "OD_SCLK");
+   size += sprintf(buf + size, "0: %10uMhz\n",
(data->gfx_actual_soft_min_freq > 0) ? 
data->gfx_actual_soft_min_freq : min_freq);
-   size += sysfs_emit_at(buf, size, "1: %10uMhz\n",
+   size += sprintf(buf + size, "1: %10uMhz\n",
(data->gfx_actual_soft_max_freq > 0) ? 
data->gfx_actual_soft_max_freq : max_freq);
}
break;
@@ -1083,8 +1081,8 @@ static int smu10_print_clock_levels(struct pp_hwmgr 
*hwmgr,
if (ret)
return ret;
  
-			size += sysfs_emit_at(buf, size, "%s:\n", "OD_RANGE");

-   size += sysfs_emit_at(buf, size, "SCLK: %7uMHz 
%10uMHz\n",
+   size += sprintf(buf + size, "%s:\n", "OD_RANGE");
+   size += sprintf(buf + size, "SCLK: %7uMHz %10uMHz\n",
min_freq, max_freq);
}
break;
diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
index aceebf584225..611969bf4520 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
@@ -4914,8 +4914,6 @@ static int smu7_print_clock_levels(struct pp_hwmgr *hwmgr,
int size = 0;
uint32_t i, now, clock, pcie_speed;
  
-	phm_get_sysfs_buf(&buf, &size);

-
switch (type) {
case PP_SCLK:
smum_send_msg_to_smc(hwmgr, PPSMC_MSG_API_GetSclkFrequency, 
&clock);
@@ -4928,7 +4926,7 @@ static int smu7_print_clock_levels(struct pp_hwmgr *hwmgr,
now = i;
  
  		for (i = 0; i < sclk_table->count; i++)

-

Re: [PATCH 1/2] drm/amdgpu: enable Navi 48-bit IH timestamp counter

2021-11-23 Thread Christian König

Am 23.11.21 um 17:03 schrieb Philip Yang:

By default this timestamp is 32 bit counter. It gets overflowed in
around 10 minutes.

Signed-off-by: Philip Yang 


Reviewed-by: Christian König  for the series.


---
  drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
index 1d8414c3fadb..dafad6030947 100644
--- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
@@ -160,6 +160,7 @@ static int navi10_ih_toggle_ring_interrupts(struct 
amdgpu_device *adev,
  
  	tmp = RREG32(ih_regs->ih_rb_cntl);

tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, RB_ENABLE, (enable ? 1 : 0));
+   tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, RB_GPU_TS_ENABLE, 1);
/* enable_intr field is only valid in ring0 */
if (ih == &adev->irq.ih)
tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, ENABLE_INTR, (enable ? 1 : 
0));




RE: [PATCH] drm/amdgpu: fix byteorder error in amdgpu discovery

2021-11-23 Thread Chen, Guchun
[Public]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Yang Wang
Sent: Wednesday, November 24, 2021 12:37 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Lazar, Lijo 
; Wang, Yang(Kevin) ; Zhang, 
Hawking 
Subject: [PATCH] drm/amdgpu: fix byteorder error in amdgpu discovery

fix some byteorder issues about amdgpu discovery.
This will result in running errors on the big end system. (e.g:MIPS)

Signed-off-by: Yang Wang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 4e3669407518..503995c7ff6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -248,8 +248,8 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
 
offset = offsetof(struct binary_header, binary_checksum) +
sizeof(bhdr->binary_checksum);
-   size = bhdr->binary_size - offset;
-   checksum = bhdr->binary_checksum;
+   size = le16_to_cpu(bhdr->binary_size) - offset;
+   checksum = le16_to_cpu(bhdr->binary_checksum);
 
if (!amdgpu_discovery_verify_checksum(adev->mman.discovery_bin + offset,
  size, checksum)) {
@@ -270,7 +270,7 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
}
 
if (!amdgpu_discovery_verify_checksum(adev->mman.discovery_bin + offset,
- ihdr->size, checksum)) {
+ le16_to_cpu(ihdr->size), 
checksum)) {
DRM_ERROR("invalid ip discovery data table checksum\n");
r = -EINVAL;
goto out;
@@ -282,7 +282,7 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
ghdr = (struct gpu_info_header *)(adev->mman.discovery_bin + offset);
 
if (!amdgpu_discovery_verify_checksum(adev->mman.discovery_bin + offset,
- ghdr->size, checksum)) {
+ le32_to_cpu(ghdr->size), 
checksum)) {
DRM_ERROR("invalid gc data table checksum\n");
r = -EINVAL;
goto out;
@@ -489,10 +489,10 @@ void amdgpu_discovery_harvest_ip(struct amdgpu_device 
*adev)
le16_to_cpu(bhdr->table_list[HARVEST_INFO].offset));
 
for (i = 0; i < 32; i++) {
-   if (le32_to_cpu(harvest_info->list[i].hw_id) == 0)
+   if (le16_to_cpu(harvest_info->list[i].hw_id) == 0)
break;
 
-   switch (le32_to_cpu(harvest_info->list[i].hw_id)) {
+   switch (le16_to_cpu(harvest_info->list[i].hw_id)) {
case VCN_HWID:
vcn_harvest_count++;
if (harvest_info->list[i].number_instance == 0)
-- 
2.25.1


Re: [PATCH] drm/amdgpu: fix byteorder error in amdgpu discovery

2021-11-23 Thread Lazar, Lijo




On 11/24/2021 10:07 AM, Yang Wang wrote:

fix some byteorder issues about amdgpu discovery.
This will result in running errors on the big end system. (e.g:MIPS)

Signed-off-by: Yang Wang 


Reviewed-by: Lijo Lazar 

Thanks,
Lijo


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 4e3669407518..503995c7ff6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -248,8 +248,8 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
  
  	offset = offsetof(struct binary_header, binary_checksum) +

sizeof(bhdr->binary_checksum);
-   size = bhdr->binary_size - offset;
-   checksum = bhdr->binary_checksum;
+   size = le16_to_cpu(bhdr->binary_size) - offset;
+   checksum = le16_to_cpu(bhdr->binary_checksum);
  
  	if (!amdgpu_discovery_verify_checksum(adev->mman.discovery_bin + offset,

  size, checksum)) {
@@ -270,7 +270,7 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
}
  
  	if (!amdgpu_discovery_verify_checksum(adev->mman.discovery_bin + offset,

- ihdr->size, checksum)) {
+ le16_to_cpu(ihdr->size), 
checksum)) {
DRM_ERROR("invalid ip discovery data table checksum\n");
r = -EINVAL;
goto out;
@@ -282,7 +282,7 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
ghdr = (struct gpu_info_header *)(adev->mman.discovery_bin + offset);
  
  	if (!amdgpu_discovery_verify_checksum(adev->mman.discovery_bin + offset,

- ghdr->size, checksum)) {
+ le32_to_cpu(ghdr->size), 
checksum)) {
DRM_ERROR("invalid gc data table checksum\n");
r = -EINVAL;
goto out;
@@ -489,10 +489,10 @@ void amdgpu_discovery_harvest_ip(struct amdgpu_device 
*adev)
le16_to_cpu(bhdr->table_list[HARVEST_INFO].offset));
  
  	for (i = 0; i < 32; i++) {

-   if (le32_to_cpu(harvest_info->list[i].hw_id) == 0)
+   if (le16_to_cpu(harvest_info->list[i].hw_id) == 0)
break;
  
-		switch (le32_to_cpu(harvest_info->list[i].hw_id)) {

+   switch (le16_to_cpu(harvest_info->list[i].hw_id)) {
case VCN_HWID:
vcn_harvest_count++;
if (harvest_info->list[i].number_instance == 0)



[PATCH] drm/amdgpu: fix byteorder error in amdgpu discovery

2021-11-23 Thread Yang Wang
fix some byteorder issues about amdgpu discovery.
This will result in running errors on the big end system. (e.g:MIPS)

Signed-off-by: Yang Wang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 4e3669407518..503995c7ff6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -248,8 +248,8 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
 
offset = offsetof(struct binary_header, binary_checksum) +
sizeof(bhdr->binary_checksum);
-   size = bhdr->binary_size - offset;
-   checksum = bhdr->binary_checksum;
+   size = le16_to_cpu(bhdr->binary_size) - offset;
+   checksum = le16_to_cpu(bhdr->binary_checksum);
 
if (!amdgpu_discovery_verify_checksum(adev->mman.discovery_bin + offset,
  size, checksum)) {
@@ -270,7 +270,7 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
}
 
if (!amdgpu_discovery_verify_checksum(adev->mman.discovery_bin + offset,
- ihdr->size, checksum)) {
+ le16_to_cpu(ihdr->size), 
checksum)) {
DRM_ERROR("invalid ip discovery data table checksum\n");
r = -EINVAL;
goto out;
@@ -282,7 +282,7 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
ghdr = (struct gpu_info_header *)(adev->mman.discovery_bin + offset);
 
if (!amdgpu_discovery_verify_checksum(adev->mman.discovery_bin + offset,
- ghdr->size, checksum)) {
+ le32_to_cpu(ghdr->size), 
checksum)) {
DRM_ERROR("invalid gc data table checksum\n");
r = -EINVAL;
goto out;
@@ -489,10 +489,10 @@ void amdgpu_discovery_harvest_ip(struct amdgpu_device 
*adev)
le16_to_cpu(bhdr->table_list[HARVEST_INFO].offset));
 
for (i = 0; i < 32; i++) {
-   if (le32_to_cpu(harvest_info->list[i].hw_id) == 0)
+   if (le16_to_cpu(harvest_info->list[i].hw_id) == 0)
break;
 
-   switch (le32_to_cpu(harvest_info->list[i].hw_id)) {
+   switch (le16_to_cpu(harvest_info->list[i].hw_id)) {
case VCN_HWID:
vcn_harvest_count++;
if (harvest_info->list[i].number_instance == 0)
-- 
2.25.1



RE: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip init for SIENNA_CICHLID

2021-11-23 Thread Chen, Guchun
[Public]

Hi Jane/Alex,

Adding a check of new IP in this case looks good to me.

Regards,
Guchun

From: Jian, Jane 
Sent: Wednesday, November 24, 2021 10:54 AM
To: Deucher, Alexander ; Chen, Guchun 
; Chen, JingWen 
Cc: amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip 
init for SIENNA_CICHLID


[Public]

Hi Guchun,

Per Alex's suggestion, we would better add a check for new vcn0 IP version, 
which is a version only owned by sriov and a way that I originally did, how do 
you think?

Thanks,
Jane

From: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Sent: Tuesday, November 23, 2021 11:03 PM
To: Jian, Jane mailto:jane.j...@amd.com>>; 
amd-gfx@lists.freedesktop.org; Chen, 
Guchun mailto:guchun.c...@amd.com>>; Chen, JingWen 
mailto:jingwen.ch...@amd.com>>
Subject: Re: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip 
init for SIENNA_CICHLID


[Public]

Can we just add a check for the new IP version in that case?  This looks really 
hacky.

Alex


From: Jane Jian mailto:jane.j...@amd.com>>
Sent: Tuesday, November 23, 2021 6:34 AM
To: amd-gfx@lists.freedesktop.org 
mailto:amd-gfx@lists.freedesktop.org>>; Deucher, 
Alexander mailto:alexander.deuc...@amd.com>>; Chen, 
Guchun mailto:guchun.c...@amd.com>>; Chen, JingWen 
mailto:jingwen.ch...@amd.com>>
Cc: Jian, Jane mailto:jane.j...@amd.com>>
Subject: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip init 
for SIENNA_CICHLID

[WHY]
for sriov odd# vf will modify vcn0 engine ip revision(due to multimedia 
bandwidth feature),
which will be mismatched with original vcn0 revision

[HOW]
skip ip revision match case and continue use asic type to check

Signed-off-by: Jane Jian mailto:jane.j...@amd.com>>
Change-Id: I1ace32acbf3a13c0baac958508da1324ec387a58
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 5 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   | 6 ++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 4e3669407518..0a91e53f520c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -1334,7 +1334,10 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device 
*adev)
 return r;
 }

-   r = amdgpu_discovery_set_mm_ip_blocks(adev);
+   if (adev->asic_type == CHIP_SIENNA_CICHLID && amdgpu_sriov_vf(adev))
+   r = amdgpu_device_ip_block_add(adev, &vcn_v3_0_ip_block);
+   else
+   r = amdgpu_discovery_set_mm_ip_blocks(adev);
 if (r)
 return r;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 4f7c70845785..87f56b61be53 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -86,6 +86,10 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
 for (i = 0; i < adev->vcn.num_vcn_inst; i++)
 atomic_set(&adev->vcn.inst[i].dpg_enc_submission_cnt, 0);

+   if (adev->asic_type == CHIP_SIENNA_CICHLID && amdgpu_sriov_vf(adev)) {
+   fw_name = FIRMWARE_SIENNA_CICHLID;
+   goto next;
+   }
 switch (adev->ip_versions[UVD_HWIP][0]) {
 case IP_VERSION(1, 0, 0):
 case IP_VERSION(1, 0, 1):
@@ -168,6 +172,8 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
 return -EINVAL;
 }

+next:
+
 r = request_firmware(&adev->vcn.fw, fw_name, adev->dev);
 if (r) {
 dev_err(adev->dev, "amdgpu_vcn: Can't load firmware \"%s\"\n",
--
2.17.1


RE: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip init for SIENNA_CICHLID

2021-11-23 Thread Jian, Jane
[Public]

Hi Guchun,

Per Alex's suggestion, we would better add a check for new vcn0 IP version, 
which is a version only owned by sriov and a way that I originally did, how do 
you think?

Thanks,
Jane

From: Deucher, Alexander 
Sent: Tuesday, November 23, 2021 11:03 PM
To: Jian, Jane ; amd-gfx@lists.freedesktop.org; Chen, Guchun 
; Chen, JingWen 
Subject: Re: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip 
init for SIENNA_CICHLID


[Public]

Can we just add a check for the new IP version in that case?  This looks really 
hacky.

Alex


From: Jane Jian mailto:jane.j...@amd.com>>
Sent: Tuesday, November 23, 2021 6:34 AM
To: amd-gfx@lists.freedesktop.org 
mailto:amd-gfx@lists.freedesktop.org>>; Deucher, 
Alexander mailto:alexander.deuc...@amd.com>>; Chen, 
Guchun mailto:guchun.c...@amd.com>>; Chen, JingWen 
mailto:jingwen.ch...@amd.com>>
Cc: Jian, Jane mailto:jane.j...@amd.com>>
Subject: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip init 
for SIENNA_CICHLID

[WHY]
for sriov odd# vf will modify vcn0 engine ip revision(due to multimedia 
bandwidth feature),
which will be mismatched with original vcn0 revision

[HOW]
skip ip revision match case and continue use asic type to check

Signed-off-by: Jane Jian mailto:jane.j...@amd.com>>
Change-Id: I1ace32acbf3a13c0baac958508da1324ec387a58
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 5 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   | 6 ++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 4e3669407518..0a91e53f520c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -1334,7 +1334,10 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device 
*adev)
 return r;
 }

-   r = amdgpu_discovery_set_mm_ip_blocks(adev);
+   if (adev->asic_type == CHIP_SIENNA_CICHLID && amdgpu_sriov_vf(adev))
+   r = amdgpu_device_ip_block_add(adev, &vcn_v3_0_ip_block);
+   else
+   r = amdgpu_discovery_set_mm_ip_blocks(adev);
 if (r)
 return r;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 4f7c70845785..87f56b61be53 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -86,6 +86,10 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
 for (i = 0; i < adev->vcn.num_vcn_inst; i++)
 atomic_set(&adev->vcn.inst[i].dpg_enc_submission_cnt, 0);

+   if (adev->asic_type == CHIP_SIENNA_CICHLID && amdgpu_sriov_vf(adev)) {
+   fw_name = FIRMWARE_SIENNA_CICHLID;
+   goto next;
+   }
 switch (adev->ip_versions[UVD_HWIP][0]) {
 case IP_VERSION(1, 0, 0):
 case IP_VERSION(1, 0, 1):
@@ -168,6 +172,8 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
 return -EINVAL;
 }

+next:
+
 r = request_firmware(&adev->vcn.fw, fw_name, adev->dev);
 if (r) {
 dev_err(adev->dev, "amdgpu_vcn: Can't load firmware \"%s\"\n",
--
2.17.1


[PATCH 2/3] drm/amdgpu: fix vkms crtc settings

2021-11-23 Thread Flora Cui
otherwise adev->mode_info.crtcs[] is NULL

Signed-off-by: Flora Cui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 42 
 drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.h |  5 ++-
 2 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
index ac9a8cd21c4b..6c62c45e3e3e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
@@ -16,6 +16,8 @@
 #include "ivsrcid/ivsrcid_vislands30.h"
 #include "amdgpu_vkms.h"
 #include "amdgpu_display.h"
+#include "atom.h"
+#include "amdgpu_irq.h"
 
 /**
  * DOC: amdgpu_vkms
@@ -41,14 +43,13 @@ static const u32 amdgpu_vkms_formats[] = {
 
 static enum hrtimer_restart amdgpu_vkms_vblank_simulate(struct hrtimer *timer)
 {
-   struct amdgpu_vkms_output *output = container_of(timer,
-struct 
amdgpu_vkms_output,
-vblank_hrtimer);
-   struct drm_crtc *crtc = &output->crtc;
+   struct amdgpu_crtc *amdgpu_crtc = container_of(timer, struct 
amdgpu_crtc, vblank_timer);
+   struct drm_crtc *crtc = &amdgpu_crtc->base;
+   struct amdgpu_vkms_output *output = 
drm_crtc_to_amdgpu_vkms_output(crtc);
u64 ret_overrun;
bool ret;
 
-   ret_overrun = hrtimer_forward_now(&output->vblank_hrtimer,
+   ret_overrun = hrtimer_forward_now(&amdgpu_crtc->vblank_timer,
  output->period_ns);
WARN_ON(ret_overrun != 1);
 
@@ -65,22 +66,21 @@ static int amdgpu_vkms_enable_vblank(struct drm_crtc *crtc)
unsigned int pipe = drm_crtc_index(crtc);
struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
struct amdgpu_vkms_output *out = drm_crtc_to_amdgpu_vkms_output(crtc);
+   struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
 
drm_calc_timestamping_constants(crtc, &crtc->mode);
 
-   hrtimer_init(&out->vblank_hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-   out->vblank_hrtimer.function = &amdgpu_vkms_vblank_simulate;
out->period_ns = ktime_set(0, vblank->framedur_ns);
-   hrtimer_start(&out->vblank_hrtimer, out->period_ns, HRTIMER_MODE_REL);
+   hrtimer_start(&amdgpu_crtc->vblank_timer, out->period_ns, 
HRTIMER_MODE_REL);
 
return 0;
 }
 
 static void amdgpu_vkms_disable_vblank(struct drm_crtc *crtc)
 {
-   struct amdgpu_vkms_output *out = drm_crtc_to_amdgpu_vkms_output(crtc);
+   struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
 
-   hrtimer_cancel(&out->vblank_hrtimer);
+   hrtimer_cancel(&amdgpu_crtc->vblank_timer);
 }
 
 static bool amdgpu_vkms_get_vblank_timestamp(struct drm_crtc *crtc,
@@ -92,13 +92,14 @@ static bool amdgpu_vkms_get_vblank_timestamp(struct 
drm_crtc *crtc,
unsigned int pipe = crtc->index;
struct amdgpu_vkms_output *output = 
drm_crtc_to_amdgpu_vkms_output(crtc);
struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
+   struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
 
if (!READ_ONCE(vblank->enabled)) {
*vblank_time = ktime_get();
return true;
}
 
-   *vblank_time = READ_ONCE(output->vblank_hrtimer.node.expires);
+   *vblank_time = READ_ONCE(amdgpu_crtc->vblank_timer.node.expires);
 
if (WARN_ON(*vblank_time == vblank->time))
return true;
@@ -165,6 +166,8 @@ static const struct drm_crtc_helper_funcs 
amdgpu_vkms_crtc_helper_funcs = {
 static int amdgpu_vkms_crtc_init(struct drm_device *dev, struct drm_crtc *crtc,
  struct drm_plane *primary, struct drm_plane *cursor)
 {
+   struct amdgpu_device *adev = drm_to_adev(dev);
+   struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
int ret;
 
ret = drm_crtc_init_with_planes(dev, crtc, primary, cursor,
@@ -176,6 +179,17 @@ static int amdgpu_vkms_crtc_init(struct drm_device *dev, 
struct drm_crtc *crtc,
 
drm_crtc_helper_add(crtc, &amdgpu_vkms_crtc_helper_funcs);
 
+   amdgpu_crtc->crtc_id = drm_crtc_index(crtc);
+   adev->mode_info.crtcs[drm_crtc_index(crtc)] = amdgpu_crtc;
+
+   amdgpu_crtc->pll_id = ATOM_PPLL_INVALID;
+   amdgpu_crtc->encoder = NULL;
+   amdgpu_crtc->connector = NULL;
+   amdgpu_crtc->vsync_timer_enabled = AMDGPU_IRQ_STATE_DISABLE;
+
+   hrtimer_init(&amdgpu_crtc->vblank_timer, CLOCK_MONOTONIC, 
HRTIMER_MODE_REL);
+   amdgpu_crtc->vblank_timer.function = &amdgpu_vkms_vblank_simulate;
+
return ret;
 }
 
@@ -401,7 +415,7 @@ int amdgpu_vkms_output_init(struct drm_device *dev,
 {
struct drm_connector *connector = &output->connector;
struct drm_encoder *encoder = &output->encoder;
-   struct drm_crtc *crtc = &output->crtc;
+   struct drm_crtc *crtc = &output->crtc.base;
struct drm_plane *primary, *cursor = NULL;
int ret;
 
@@ -504,8 +518,8 @@

[PATCH 3/3] drm/amdgpu: check atomic flag to differeniate with legacy path

2021-11-23 Thread Flora Cui
since vkms support atomic KMS interface

Signed-off-by: Flora Cui 
Reviewed-by: Guchun Chen 
Acked-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7d4115d52523..8e9e50aa4a95 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3830,7 +3830,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
/* disable all interrupts */
amdgpu_irq_disable_all(adev);
if (adev->mode_info.mode_config_initialized){
-   if (!amdgpu_device_has_dc_support(adev))
+   if (!drm_drv_uses_atomic_modeset(adev_to_drm(adev)))
drm_helper_force_disable_all(adev_to_drm(adev));
else
drm_atomic_helper_shutdown(adev_to_drm(adev));
@@ -5124,7 +5124,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
drm_sched_start(&ring->sched, 
!tmp_adev->asic_reset_res);
}
 
-   if (!amdgpu_device_has_dc_support(tmp_adev) && !job_signaled) {
+   if (!drm_drv_uses_atomic_modeset(adev_to_drm(tmp_adev)) && 
!job_signaled) {
drm_helper_resume_force_mode(adev_to_drm(tmp_adev));
}
 
-- 
2.25.1



[PATCH 1/3] drm/amdgpu: cancel the correct hrtimer on exit

2021-11-23 Thread Flora Cui
Signed-off-by: Flora Cui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
index ce982afeff91..ac9a8cd21c4b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
@@ -504,8 +504,8 @@ static int amdgpu_vkms_sw_fini(void *handle)
int i = 0;
 
for (i = 0; i < adev->mode_info.num_crtc; i++)
-   if (adev->mode_info.crtcs[i])
-   hrtimer_cancel(&adev->mode_info.crtcs[i]->vblank_timer);
+   if (adev->amdgpu_vkms_output[i].vblank_hrtimer.function)
+   
hrtimer_cancel(&adev->amdgpu_vkms_output[i].vblank_hrtimer);
 
kfree(adev->mode_info.bios_hardcoded_edid);
kfree(adev->amdgpu_vkms_output);
-- 
2.25.1



RE: [PATCH] drm/amdgpu/pm: fix powerplay OD interface

2021-11-23 Thread Quan, Evan
[AMD Official Use Only]

Acked-by: Evan Quan 

> -Original Message-
> From: amd-gfx  On Behalf Of Alex
> Deucher
> Sent: Wednesday, November 24, 2021 4:01 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH] drm/amdgpu/pm: fix powerplay OD interface
> 
> The overclocking interface currently appends data to a string.  Revert back to
> using sprintf().
> 
> Bug:
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitla
> b.freedesktop.org%2Fdrm%2Famd%2F-
> %2Fissues%2F1774&data=04%7C01%7Cevan.quan%40amd.com%7Cfc28
> 95d7646643ab7b8d08d9aebc0fb6%7C3dd8961fe4884e608e11a82d994e183d%
> 7C0%7C0%7C637732945016986765%7CUnknown%7CTWFpbGZsb3d8eyJWIjoi
> MC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C300
> 0&sdata=UgMIgPfWnSKlGFszZHUBlYQcgzHwZ4zAlCmLMhhA%2Ftw%3D
> &reserved=0
> Fixes: 6db0c87a0a8ee1 ("amdgpu/pm: Replace hwmgr smu usage of sprintf
> with sysfs_emit")
> Signed-off-by: Alex Deucher 
> ---
>  .../drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c  | 20 +++
>  .../drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c   | 24 
>  .../drm/amd/pm/powerplay/hwmgr/smu8_hwmgr.c   |  6 +-
>  .../drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c | 28 +
>   .../drm/amd/pm/powerplay/hwmgr/vega12_hwmgr.c | 10 ++--
>   .../drm/amd/pm/powerplay/hwmgr/vega20_hwmgr.c | 58 +--
> 
>  6 files changed, 67 insertions(+), 79 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
> b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
> index 258c573acc97..1f406f21b452 100644
> --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
> +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
> @@ -1024,8 +1024,6 @@ static int smu10_print_clock_levels(struct
> pp_hwmgr *hwmgr,
>   uint32_t min_freq, max_freq = 0;
>   uint32_t ret = 0;
> 
> - phm_get_sysfs_buf(&buf, &size);
> -
>   switch (type) {
>   case PP_SCLK:
>   smum_send_msg_to_smc(hwmgr,
> PPSMC_MSG_GetGfxclkFrequency, &now); @@ -1038,13 +1036,13 @@
> static int smu10_print_clock_levels(struct pp_hwmgr *hwmgr,
>   else
>   i = 1;
> 
> - size += sysfs_emit_at(buf, size, "0: %uMhz %s\n",
> + size += sprintf(buf + size, "0: %uMhz %s\n",
>   data->gfx_min_freq_limit/100,
>   i == 0 ? "*" : "");
> - size += sysfs_emit_at(buf, size, "1: %uMhz %s\n",
> + size += sprintf(buf + size, "1: %uMhz %s\n",
>   i == 1 ? now :
> SMU10_UMD_PSTATE_GFXCLK,
>   i == 1 ? "*" : "");
> - size += sysfs_emit_at(buf, size, "2: %uMhz %s\n",
> + size += sprintf(buf + size, "2: %uMhz %s\n",
>   data->gfx_max_freq_limit/100,
>   i == 2 ? "*" : "");
>   break;
> @@ -1052,7 +1050,7 @@ static int smu10_print_clock_levels(struct
> pp_hwmgr *hwmgr,
>   smum_send_msg_to_smc(hwmgr,
> PPSMC_MSG_GetFclkFrequency, &now);
> 
>   for (i = 0; i < mclk_table->count; i++)
> - size += sysfs_emit_at(buf, size, "%d: %uMhz %s\n",
> + size += sprintf(buf + size, "%d: %uMhz %s\n",
>   i,
>   mclk_table->entries[i].clk / 100,
>   ((mclk_table->entries[i].clk / 100)
> @@ -1067,10 +1065,10 @@ static int smu10_print_clock_levels(struct
> pp_hwmgr *hwmgr,
>   if (ret)
>   return ret;
> 
> - size += sysfs_emit_at(buf, size, "%s:\n", "OD_SCLK");
> - size += sysfs_emit_at(buf, size, "0: %10uMhz\n",
> + size += sprintf(buf + size, "%s:\n", "OD_SCLK");
> + size += sprintf(buf + size, "0: %10uMhz\n",
>   (data->gfx_actual_soft_min_freq > 0) ? data-
> >gfx_actual_soft_min_freq : min_freq);
> - size += sysfs_emit_at(buf, size, "1: %10uMhz\n",
> + size += sprintf(buf + size, "1: %10uMhz\n",
>   (data->gfx_actual_soft_max_freq > 0) ? data-
> >gfx_actual_soft_max_freq : max_freq);
>   }
>   break;
> @@ -1083,8 +1081,8 @@ static int smu10_print_clock_levels(struct
> pp_hwmgr *hwmgr,
>   if (ret)
>   return ret;
> 
> - size += sysfs_emit_at(buf, size, "%s:\n",
> "OD_RANGE");
> - size += sysfs_emit_at(buf, size,
> "SCLK: %7uMHz %10uMHz\n",
> + size += sprintf(buf + size, "%s:\n", "OD_RANGE");
> + size += sprintf(buf + size,
> "SCLK: %7uMHz %10uMHz\n",
>   min_freq, max_freq);
>   }
>   break;
> diff --git a

Re: [PATCH v3 4/6] drm: implement a method to free unused pages

2021-11-23 Thread Arunpravin



On 18/11/21 12:32 am, Matthew Auld wrote:
> On 16/11/2021 20:18, Arunpravin wrote:
>> On contiguous allocation, we round up the size
>> to the *next* power of 2, implement a function
>> to free the unused pages after the newly allocate block.
>>
>> v2(Matthew Auld):
>>- replace function name 'drm_buddy_free_unused_pages' with
>>  drm_buddy_block_trim
>>- replace input argument name 'actual_size' with 'new_size'
>>- add more validation checks for input arguments
>>- add overlaps check to avoid needless searching and splitting
>>- merged the below patch to see the feature in action
>>  - add free unused pages support to i915 driver
>>- lock drm_buddy_block_trim() function as it calls mark_free/mark_split
>>  are all globally visible
>>
>> Signed-off-by: Arunpravin 
>> ---
>>   drivers/gpu/drm/drm_buddy.c   | 108 ++
>>   drivers/gpu/drm/i915/i915_ttm_buddy_manager.c |  10 ++
>>   include/drm/drm_buddy.h   |   4 +
>>   3 files changed, 122 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
>> index 0a9db2981188..943fe2ad27bf 100644
>> --- a/drivers/gpu/drm/drm_buddy.c
>> +++ b/drivers/gpu/drm/drm_buddy.c
>> @@ -284,6 +284,114 @@ static inline bool contains(u64 s1, u64 e1, u64 s2, 
>> u64 e2)
>>  return s1 <= s2 && e1 >= e2;
>>   }
>>   
>> +/**
>> + * drm_buddy_block_trim - free unused pages
>> + *
>> + * @mm: DRM buddy manager
>> + * @new_size: original size requested
>> + * @blocks: output list head to add allocated blocks
>> + *
>> + * For contiguous allocation, we round up the size to the nearest
>> + * power of two value, drivers consume *actual* size, so remaining
>> + * portions are unused and it can be freed.
>> + *
>> + * Returns:
>> + * 0 on success, error code on failure.
>> + */
>> +int drm_buddy_block_trim(struct drm_buddy_mm *mm,
>> + u64 new_size,
>> + struct list_head *blocks)
>> +{
>> +struct drm_buddy_block *block;
>> +struct drm_buddy_block *buddy;
>> +u64 new_start;
>> +u64 new_end;
>> +LIST_HEAD(dfs);
>> +u64 count = 0;
>> +int err;
>> +
>> +if (!list_is_singular(blocks))
>> +return -EINVAL;
>> +
>> +block = list_first_entry(blocks,
>> + struct drm_buddy_block,
>> + link);
>> +
>> +if (!drm_buddy_block_is_allocated(block))
>> +return -EINVAL;
>> +
>> +if (new_size > drm_buddy_block_size(mm, block))
>> +return -EINVAL;
>> +
>> +if (!new_size && !IS_ALIGNED(new_size, mm->chunk_size))
>> +return -EINVAL;
>> +
>> +if (new_size == drm_buddy_block_size(mm, block))
>> +return 0;
>> +
>> +list_del(&block->link);
>> +
>> +new_start = drm_buddy_block_offset(block);
>> +new_end = new_start + new_size - 1;
>> +
>> +mark_free(mm, block);
>> +
>> +list_add(&block->tmp_link, &dfs);
>> +
>> +do {
>> +u64 block_start;
>> +u64 block_end;
>> +
>> +block = list_first_entry_or_null(&dfs,
>> + struct drm_buddy_block,
>> + tmp_link);
>> +if (!block)
>> +break;
>> +
>> +list_del(&block->tmp_link);
>> +
>> +if (count == new_size)
>> +return 0;
>> +
>> +block_start = drm_buddy_block_offset(block);
>> +block_end = block_start + drm_buddy_block_size(mm, block) - 1;
>> +
>> +if (!overlaps(new_start, new_end, block_start, block_end))
>> +continue;
>> +
>> +if (contains(new_start, new_end, block_start, block_end)) {
>> +BUG_ON(!drm_buddy_block_is_free(block));
>> +
>> +/* Allocate only required blocks */
>> +mark_allocated(block);
>> +mm->avail -= drm_buddy_block_size(mm, block);
>> +list_add_tail(&block->link, blocks);
>> +count += drm_buddy_block_size(mm, block);
>> +continue;
>> +}
>> +
>> +if (!drm_buddy_block_is_split(block)) {
> 
> Should always be true, right? But I guess depends if we want to re-use 
> this for generic range allocation...
yes, since we re-use this for generic range allocation I think we can
keep this check
> 
>> +err = split_block(mm, block);
>> +if (unlikely(err))
>> +goto err_undo;
>> +}
>> +
>> +list_add(&block->right->tmp_link, &dfs);
>> +list_add(&block->left->tmp_link, &dfs);
>> +} while (1);
>> +
>> +return -ENOSPC;
>> +
>> +err_undo:
>> +buddy = get_buddy(block);
>> +if (buddy &&
>> +(drm_buddy_block_is_free(block) &&
>> + drm_buddy_block_is_free(buddy)))
>> +   

Re: [PATCH v3 2/6] drm: improve drm_buddy_alloc function

2021-11-23 Thread Arunpravin



On 18/11/21 12:09 am, Matthew Auld wrote:
> On 16/11/2021 20:18, Arunpravin wrote:
>> - Make drm_buddy_alloc a single function to handle
>>range allocation and non-range allocation demands
>>
>> - Implemented a new function alloc_range() which allocates
>>the requested power-of-two block comply with range limitations
>>
>> - Moved order computation and memory alignment logic from
>>i915 driver to drm buddy
>>
>> v2:
>>merged below changes to keep the build unbroken
>> - drm_buddy_alloc_range() becomes obsolete and may be removed
>> - enable ttm range allocation (fpfn / lpfn) support in i915 driver
>> - apply enhanced drm_buddy_alloc() function to i915 driver
>>
>> v3(Matthew Auld):
>>- Fix alignment issues and remove unnecessary list_empty check
>>- add more validation checks for input arguments
>>- make alloc_range() block allocations as bottom-up
>>- optimize order computation logic
>>- replace uint64_t with u64, which is preferred in the kernel
>>
>> Signed-off-by: Arunpravin 
>> ---
>>   drivers/gpu/drm/drm_buddy.c   | 259 ++
>>   drivers/gpu/drm/i915/i915_ttm_buddy_manager.c |  69 ++---
>>   drivers/gpu/drm/i915/i915_ttm_buddy_manager.h |   2 +
>>   include/drm/drm_buddy.h   |  22 +-
>>   4 files changed, 203 insertions(+), 149 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
>> index 39eb1d224bec..c9b18a29f8d1 100644
>> --- a/drivers/gpu/drm/drm_buddy.c
>> +++ b/drivers/gpu/drm/drm_buddy.c
>> @@ -274,63 +274,6 @@ void drm_buddy_free_list(struct drm_buddy_mm *mm, 
>> struct list_head *objects)
>>   }
>>   EXPORT_SYMBOL(drm_buddy_free_list);
>>   
>> -/**
>> - * drm_buddy_alloc - allocate power-of-two blocks
>> - *
>> - * @mm: DRM buddy manager to allocate from
>> - * @order: size of the allocation
>> - *
>> - * The order value here translates to:
>> - *
>> - * 0 = 2^0 * mm->chunk_size
>> - * 1 = 2^1 * mm->chunk_size
>> - * 2 = 2^2 * mm->chunk_size
>> - *
>> - * Returns:
>> - * allocated ptr to the &drm_buddy_block on success
>> - */
>> -struct drm_buddy_block *
>> -drm_buddy_alloc(struct drm_buddy_mm *mm, unsigned int order)
>> -{
>> -struct drm_buddy_block *block = NULL;
>> -unsigned int i;
>> -int err;
>> -
>> -for (i = order; i <= mm->max_order; ++i) {
>> -block = list_first_entry_or_null(&mm->free_list[i],
>> - struct drm_buddy_block,
>> - link);
>> -if (block)
>> -break;
>> -}
>> -
>> -if (!block)
>> -return ERR_PTR(-ENOSPC);
>> -
>> -BUG_ON(!drm_buddy_block_is_free(block));
>> -
>> -while (i != order) {
>> -err = split_block(mm, block);
>> -if (unlikely(err))
>> -goto out_free;
>> -
>> -/* Go low */
>> -block = block->left;
>> -i--;
>> -}
>> -
>> -mark_allocated(block);
>> -mm->avail -= drm_buddy_block_size(mm, block);
>> -kmemleak_update_trace(block);
>> -return block;
>> -
>> -out_free:
>> -if (i != order)
>> -__drm_buddy_free(mm, block);
>> -return ERR_PTR(err);
>> -}
>> -EXPORT_SYMBOL(drm_buddy_alloc);
>> -
>>   static inline bool overlaps(u64 s1, u64 e1, u64 s2, u64 e2)
>>   {
>>  return s1 <= e2 && e1 >= s2;
>> @@ -341,52 +284,22 @@ static inline bool contains(u64 s1, u64 e1, u64 s2, 
>> u64 e2)
>>  return s1 <= s2 && e1 >= e2;
>>   }
>>   
>> -/**
>> - * drm_buddy_alloc_range - allocate range
>> - *
>> - * @mm: DRM buddy manager to allocate from
>> - * @blocks: output list head to add allocated blocks
>> - * @start: start of the allowed range for this block
>> - * @size: size of the allocation
>> - *
>> - * Intended for pre-allocating portions of the address space, for example to
>> - * reserve a block for the initial framebuffer or similar, hence the 
>> expectation
>> - * here is that drm_buddy_alloc() is still the main vehicle for
>> - * allocations, so if that's not the case then the drm_mm range allocator is
>> - * probably a much better fit, and so you should probably go use that 
>> instead.
>> - *
>> - * Note that it's safe to chain together multiple alloc_ranges
>> - * with the same blocks list
>> - *
>> - * Returns:
>> - * 0 on success, error code on failure.
>> - */
>> -int drm_buddy_alloc_range(struct drm_buddy_mm *mm,
>> -  struct list_head *blocks,
>> -  u64 start, u64 size)
>> +static struct drm_buddy_block *
>> +alloc_range(struct drm_buddy_mm *mm,
>> +u64 start, u64 end,
>> +unsigned int order)
>>   {
>>  struct drm_buddy_block *block;
>>  struct drm_buddy_block *buddy;
>> -LIST_HEAD(allocated);
>>  LIST_HEAD(dfs);
>> -u64 end;
>>  int err;
>>  int i;
>>   
>> -if (size < mm->chunk_size)
>> -return -EINVAL;
>> -
>> -if (!IS_ALIGNE

[PATCH v2 1/2] drm/amdkfd: Use bitmap_zalloc() when applicable

2021-11-23 Thread Christophe JAILLET
'doorbell_bitmap' and 'queue_slot_bitmap' are bitmaps. So use
'bitmap_zalloc()' to simplify code, improve the semantic and avoid some
open-coded arithmetic in allocator arguments.

Also change the corresponding 'kfree()' into 'bitmap_free()' to keep
consistency.

Reviewed-by: Felix Kuehling 
Signed-off-by: Christophe JAILLET 
---
v1 --> v2: Compile tested :)
   Add a missing ',' (kernel test robot)
   Add kfd_process_queue_manager.c (Felix Kuehling)
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c   | 7 +++
 drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 7 +++
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index d4c8a6948a9f..67bb1654becc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1011,7 +1011,7 @@ static void kfd_process_destroy_pdds(struct kfd_process 
*p)
free_pages((unsigned long)pdd->qpd.cwsr_kaddr,
get_order(KFD_CWSR_TBA_TMA_SIZE));
 
-   kfree(pdd->qpd.doorbell_bitmap);
+   bitmap_free(pdd->qpd.doorbell_bitmap);
idr_destroy(&pdd->alloc_idr);
 
kfd_free_process_doorbells(pdd->dev, pdd->doorbell_index);
@@ -1433,9 +1433,8 @@ static int init_doorbell_bitmap(struct qcm_process_device 
*qpd,
if (!KFD_IS_SOC15(dev))
return 0;
 
-   qpd->doorbell_bitmap =
-   kzalloc(DIV_ROUND_UP(KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
-BITS_PER_BYTE), GFP_KERNEL);
+   qpd->doorbell_bitmap = bitmap_zalloc(KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
+GFP_KERNEL);
if (!qpd->doorbell_bitmap)
return -ENOMEM;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 4f8464658daf..c5f5a25c6dcc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -135,9 +135,8 @@ void kfd_process_dequeue_from_all_devices(struct 
kfd_process *p)
 int pqm_init(struct process_queue_manager *pqm, struct kfd_process *p)
 {
INIT_LIST_HEAD(&pqm->queues);
-   pqm->queue_slot_bitmap =
-   kzalloc(DIV_ROUND_UP(KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
-   BITS_PER_BYTE), GFP_KERNEL);
+   pqm->queue_slot_bitmap = 
bitmap_zalloc(KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
+  GFP_KERNEL);
if (!pqm->queue_slot_bitmap)
return -ENOMEM;
pqm->process = p;
@@ -159,7 +158,7 @@ void pqm_uninit(struct process_queue_manager *pqm)
kfree(pqn);
}
 
-   kfree(pqm->queue_slot_bitmap);
+   bitmap_free(pqm->queue_slot_bitmap);
pqm->queue_slot_bitmap = NULL;
 }
 
-- 
2.30.2



[PATCH v2 2/2] drm/amdkfd: Slighly optimize 'init_doorbell_bitmap()'

2021-11-23 Thread Christophe JAILLET
The 'doorbell_bitmap' bitmap has just been allocated. So we can use the
non-atomic '__set_bit()' function to save a few cycles as no concurrent
access can happen.

Reviewed-by: Felix Kuehling 
Signed-off-by: Christophe JAILLET 
---
bitmap_set() could certainly also be use, but range checking would be
tricky.

v1 --> v2: No change
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 67bb1654becc..9158f9754a24 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1446,9 +1446,9 @@ static int init_doorbell_bitmap(struct qcm_process_device 
*qpd,
 
for (i = 0; i < KFD_MAX_NUM_OF_QUEUES_PER_PROCESS / 2; i++) {
if (i >= range_start && i <= range_end) {
-   set_bit(i, qpd->doorbell_bitmap);
-   set_bit(i + KFD_QUEUE_DOORBELL_MIRROR_OFFSET,
-   qpd->doorbell_bitmap);
+   __set_bit(i, qpd->doorbell_bitmap);
+   __set_bit(i + KFD_QUEUE_DOORBELL_MIRROR_OFFSET,
+ qpd->doorbell_bitmap);
}
}
 
-- 
2.30.2



[PATCH] drm/amdgpu/pm: fix powerplay OD interface

2021-11-23 Thread Alex Deucher
The overclocking interface currently appends data to a
string.  Revert back to using sprintf().

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1774
Fixes: 6db0c87a0a8ee1 ("amdgpu/pm: Replace hwmgr smu usage of sprintf with 
sysfs_emit")
Signed-off-by: Alex Deucher 
---
 .../drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c  | 20 +++
 .../drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c   | 24 
 .../drm/amd/pm/powerplay/hwmgr/smu8_hwmgr.c   |  6 +-
 .../drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c | 28 +
 .../drm/amd/pm/powerplay/hwmgr/vega12_hwmgr.c | 10 ++--
 .../drm/amd/pm/powerplay/hwmgr/vega20_hwmgr.c | 58 +--
 6 files changed, 67 insertions(+), 79 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
index 258c573acc97..1f406f21b452 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
@@ -1024,8 +1024,6 @@ static int smu10_print_clock_levels(struct pp_hwmgr 
*hwmgr,
uint32_t min_freq, max_freq = 0;
uint32_t ret = 0;
 
-   phm_get_sysfs_buf(&buf, &size);
-
switch (type) {
case PP_SCLK:
smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetGfxclkFrequency, &now);
@@ -1038,13 +1036,13 @@ static int smu10_print_clock_levels(struct pp_hwmgr 
*hwmgr,
else
i = 1;
 
-   size += sysfs_emit_at(buf, size, "0: %uMhz %s\n",
+   size += sprintf(buf + size, "0: %uMhz %s\n",
data->gfx_min_freq_limit/100,
i == 0 ? "*" : "");
-   size += sysfs_emit_at(buf, size, "1: %uMhz %s\n",
+   size += sprintf(buf + size, "1: %uMhz %s\n",
i == 1 ? now : SMU10_UMD_PSTATE_GFXCLK,
i == 1 ? "*" : "");
-   size += sysfs_emit_at(buf, size, "2: %uMhz %s\n",
+   size += sprintf(buf + size, "2: %uMhz %s\n",
data->gfx_max_freq_limit/100,
i == 2 ? "*" : "");
break;
@@ -1052,7 +1050,7 @@ static int smu10_print_clock_levels(struct pp_hwmgr 
*hwmgr,
smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetFclkFrequency, &now);
 
for (i = 0; i < mclk_table->count; i++)
-   size += sysfs_emit_at(buf, size, "%d: %uMhz %s\n",
+   size += sprintf(buf + size, "%d: %uMhz %s\n",
i,
mclk_table->entries[i].clk / 100,
((mclk_table->entries[i].clk / 100)
@@ -1067,10 +1065,10 @@ static int smu10_print_clock_levels(struct pp_hwmgr 
*hwmgr,
if (ret)
return ret;
 
-   size += sysfs_emit_at(buf, size, "%s:\n", "OD_SCLK");
-   size += sysfs_emit_at(buf, size, "0: %10uMhz\n",
+   size += sprintf(buf + size, "%s:\n", "OD_SCLK");
+   size += sprintf(buf + size, "0: %10uMhz\n",
(data->gfx_actual_soft_min_freq > 0) ? 
data->gfx_actual_soft_min_freq : min_freq);
-   size += sysfs_emit_at(buf, size, "1: %10uMhz\n",
+   size += sprintf(buf + size, "1: %10uMhz\n",
(data->gfx_actual_soft_max_freq > 0) ? 
data->gfx_actual_soft_max_freq : max_freq);
}
break;
@@ -1083,8 +1081,8 @@ static int smu10_print_clock_levels(struct pp_hwmgr 
*hwmgr,
if (ret)
return ret;
 
-   size += sysfs_emit_at(buf, size, "%s:\n", "OD_RANGE");
-   size += sysfs_emit_at(buf, size, "SCLK: %7uMHz 
%10uMHz\n",
+   size += sprintf(buf + size, "%s:\n", "OD_RANGE");
+   size += sprintf(buf + size, "SCLK: %7uMHz %10uMHz\n",
min_freq, max_freq);
}
break;
diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
index aceebf584225..611969bf4520 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
@@ -4914,8 +4914,6 @@ static int smu7_print_clock_levels(struct pp_hwmgr *hwmgr,
int size = 0;
uint32_t i, now, clock, pcie_speed;
 
-   phm_get_sysfs_buf(&buf, &size);
-
switch (type) {
case PP_SCLK:
smum_send_msg_to_smc(hwmgr, PPSMC_MSG_API_GetSclkFrequency, 
&clock);
@@ -4928,7 +4926,7 @@ static int smu7_print_clock_levels(struct pp_hwmgr *hwmgr,
now = i;
 
for (i = 0; i < sclk_table->count; i++)
-   

[PATCH v5] drm/amdgpu: handle IH ring1 overflow

2021-11-23 Thread Philip Yang
IH ring1 is used to process GPU retry fault, overflow is enabled to
drain retry fault because we want receive other interrupts while
handling retry fault to recover range. There is no overflow flag set
when wptr pass rptr. Use timestamp of rptr and wptr to handle overflow
and drain retry fault.

Add helper function amdgpu_ih_decode_iv_ts to get 48bit timestamp from
IV entry. drain retry fault is done if processed_timestamp is
equal to or larger than checkpoint timestamp.

Add function amdgpu_ih_process1 to process IH ring1 until timestamp of
rptr is larger then timestamp of next entry.

Helper amdgpu_ih_ts_after to compare time stamps with 48bit wrap around.

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c  | 107 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h  |  13 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c |   2 +-
 drivers/gpu/drm/amd/amdgpu/navi10_ih.c  |   1 +
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c  |   1 +
 drivers/gpu/drm/amd/amdgpu/vega20_ih.c  |   1 +
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c|   2 +-
 7 files changed, 99 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
index 0c7963dfacad..30b4e0e01444 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
@@ -164,52 +164,45 @@ void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, 
const uint32_t *iv,
}
 }
 
+/* return true if time stamp t2 is after t1 with 48bit wrap around */
+static inline bool amdgpu_ih_ts_after(uint64_t t1, uint64_t t2)
+{
+   return ((int64_t)(t2 << 16) - (int64_t)(t1 << 16)) > 0LL;
+}
+
 /* Waiter helper that checks current rptr matches or passes checkpoint wptr */
-static bool amdgpu_ih_has_checkpoint_processed(struct amdgpu_device *adev,
+static bool amdgpu_ih_has_checkpoint_processed_ts(struct amdgpu_device *adev,
struct amdgpu_ih_ring *ih,
-   uint32_t checkpoint_wptr,
-   uint32_t *prev_rptr)
+   uint64_t checkpoint_ts)
 {
-   uint32_t cur_rptr = ih->rptr | (*prev_rptr & ~ih->ptr_mask);
-
-   /* rptr has wrapped. */
-   if (cur_rptr < *prev_rptr)
-   cur_rptr += ih->ptr_mask + 1;
-   *prev_rptr = cur_rptr;
-
-   /* check ring is empty to workaround missing wptr overflow flag */
-   return cur_rptr >= checkpoint_wptr ||
-  (cur_rptr & ih->ptr_mask) == amdgpu_ih_get_wptr(adev, ih);
+   return !amdgpu_ih_ts_after(ih->processed_timestamp, checkpoint_ts);
 }
 
 /**
- * amdgpu_ih_wait_on_checkpoint_process - wait to process IVs up to checkpoint
+ * amdgpu_ih_wait_on_checkpoint_process_ts - wait to process IVs up to 
checkpoint
  *
  * @adev: amdgpu_device pointer
  * @ih: ih ring to process
  *
  * Used to ensure ring has processed IVs up to the checkpoint write pointer.
  */
-int amdgpu_ih_wait_on_checkpoint_process(struct amdgpu_device *adev,
+int amdgpu_ih_wait_on_checkpoint_process_ts(struct amdgpu_device *adev,
struct amdgpu_ih_ring *ih)
 {
-   uint32_t checkpoint_wptr, rptr;
+   uint32_t checkpoint_wptr;
+   uint64_t checkpoint_ts;
 
if (!ih->enabled || adev->shutdown)
return -ENODEV;
 
checkpoint_wptr = amdgpu_ih_get_wptr(adev, ih);
-   /* Order wptr with rptr. */
+   /* Order wptr with ring data. */
rmb();
-   rptr = READ_ONCE(ih->rptr);
-
-   /* wptr has wrapped. */
-   if (rptr > checkpoint_wptr)
-   checkpoint_wptr += ih->ptr_mask + 1;
+   checkpoint_ts = amdgpu_ih_decode_iv_ts(adev, ih, checkpoint_wptr, -1);
 
return wait_event_interruptible(ih->wait_process,
-   amdgpu_ih_has_checkpoint_processed(adev, ih,
-   checkpoint_wptr, &rptr));
+   amdgpu_ih_has_checkpoint_processed_ts(adev, ih,
+   checkpoint_ts));
 }
 
 /**
@@ -254,6 +247,56 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct 
amdgpu_ih_ring *ih)
return IRQ_HANDLED;
 }
 
+/**
+ * amdgpu_ih_process1 - interrupt handler work for IH ring1
+ *
+ * @adev: amdgpu_device pointer
+ * @ih: ih ring to process
+ *
+ * Interrupt handler of IH ring1, walk the IH ring1.
+ * Returns irq process return code.
+ */
+int amdgpu_ih_process1(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih)
+{
+   uint64_t ts, ts_next;
+   unsigned int count;
+   u32 wptr;
+
+   if (!ih->enabled || adev->shutdown)
+   return IRQ_NONE;
+
+   wptr = amdgpu_ih_get_wptr(adev, ih);
+   if (ih->rptr == wptr)
+   return 0;
+
+restart_ih:
+   count = AMDGPU_IH_MAX_NUM_IVS;
+
+   ts = amdgpu_ih_decode_iv_ts(adev, ih, ih->rptr, 0);
+   ts_next = amdgpu_ih_decode_iv_ts(adev, ih, 

Re: [PATCH 2/2] drm/amdgpu: enable Navi retry fault wptr overflow

2021-11-23 Thread Felix Kuehling
Am 2021-11-23 um 11:03 a.m. schrieb Philip Yang:
> If xnack is on, VM retry fault interrupt send to IH ring1, and ring1
> will be full quickly. IH cannot receive other interrupts, this causes
> deadlock if migrating buffer using sdma and waiting for sdma done
> while handling retry fault.
>
> Remove VMC from IH storm client, enable ring1 write pointer
> overflow, then IH will drop retry fault interrupts and be able to receive
> other interrupts while driver is handling retry fault.
>
> IH ring1 write pointer doesn't writeback to memory by IH, and ring1
> write pointer recorded by self-irq is not updated, so always read
> the latest ring1 write pointer from register.
>
> Signed-off-by: Philip Yang 

Looks like the same thing you did in this commit for Vega:

b672cb1eee59 drm/amdgpu: enable retry fault wptr overflow

This series is

Reviewed-by: Felix Kuehling 


> ---
>  drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 33 ++
>  1 file changed, 12 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c 
> b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
> index dafad6030947..38241cf0e1f1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
> @@ -276,10 +276,8 @@ static int navi10_ih_enable_ring(struct amdgpu_device 
> *adev,
>   tmp = navi10_ih_rb_cntl(ih, tmp);
>   if (ih == &adev->irq.ih)
>   tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, RPTR_REARM, 
> !!adev->irq.msi_enabled);
> - if (ih == &adev->irq.ih1) {
> - tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_ENABLE, 0);
> + if (ih == &adev->irq.ih1)
>   tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, RB_FULL_DRAIN_ENABLE, 1);
> - }
>  
>   if (amdgpu_sriov_vf(adev) && amdgpu_sriov_reg_indirect_ih(adev)) {
>   if (psp_reg_program(&adev->psp, ih_regs->psp_reg_id, tmp)) {
> @@ -320,7 +318,6 @@ static int navi10_ih_irq_init(struct amdgpu_device *adev)
>  {
>   struct amdgpu_ih_ring *ih[] = {&adev->irq.ih, &adev->irq.ih1, 
> &adev->irq.ih2};
>   u32 ih_chicken;
> - u32 tmp;
>   int ret;
>   int i;
>  
> @@ -364,15 +361,6 @@ static int navi10_ih_irq_init(struct amdgpu_device *adev)
>   adev->nbio.funcs->ih_doorbell_range(adev, ih[0]->use_doorbell,
>   ih[0]->doorbell_index);
>  
> - tmp = RREG32_SOC15(OSSSYS, 0, mmIH_STORM_CLIENT_LIST_CNTL);
> - tmp = REG_SET_FIELD(tmp, IH_STORM_CLIENT_LIST_CNTL,
> - CLIENT18_IS_STORM_CLIENT, 1);
> - WREG32_SOC15(OSSSYS, 0, mmIH_STORM_CLIENT_LIST_CNTL, tmp);
> -
> - tmp = RREG32_SOC15(OSSSYS, 0, mmIH_INT_FLOOD_CNTL);
> - tmp = REG_SET_FIELD(tmp, IH_INT_FLOOD_CNTL, FLOOD_CNTL_ENABLE, 1);
> - WREG32_SOC15(OSSSYS, 0, mmIH_INT_FLOOD_CNTL, tmp);
> -
>   pci_set_master(adev->pdev);
>  
>   /* enable interrupts */
> @@ -421,12 +409,19 @@ static u32 navi10_ih_get_wptr(struct amdgpu_device 
> *adev,
>   u32 wptr, tmp;
>   struct amdgpu_ih_regs *ih_regs;
>  
> - wptr = le32_to_cpu(*ih->wptr_cpu);
> - ih_regs = &ih->ih_regs;
> + if (ih == &adev->irq.ih) {
> + /* Only ring0 supports writeback. On other rings fall back
> +  * to register-based code with overflow checking below.
> +  */
> + wptr = le32_to_cpu(*ih->wptr_cpu);
>  
> - if (!REG_GET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW))
> - goto out;
> + if (!REG_GET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW))
> + goto out;
> + }
>  
> + ih_regs = &ih->ih_regs;
> +
> + /* Double check that the overflow wasn't already cleared. */
>   wptr = RREG32_NO_KIQ(ih_regs->ih_rb_wptr);
>   if (!REG_GET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW))
>   goto out;
> @@ -514,15 +509,11 @@ static int navi10_ih_self_irq(struct amdgpu_device 
> *adev,
> struct amdgpu_irq_src *source,
> struct amdgpu_iv_entry *entry)
>  {
> - uint32_t wptr = cpu_to_le32(entry->src_data[0]);
> -
>   switch (entry->ring_id) {
>   case 1:
> - *adev->irq.ih1.wptr_cpu = wptr;
>   schedule_work(&adev->irq.ih1_work);
>   break;
>   case 2:
> - *adev->irq.ih2.wptr_cpu = wptr;
>   schedule_work(&adev->irq.ih2_work);
>   break;
>   default: break;


[PATCH 1/2] drm/amdgpu: enable Navi 48-bit IH timestamp counter

2021-11-23 Thread Philip Yang
By default this timestamp is 32 bit counter. It gets overflowed in
around 10 minutes.

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
index 1d8414c3fadb..dafad6030947 100644
--- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
@@ -160,6 +160,7 @@ static int navi10_ih_toggle_ring_interrupts(struct 
amdgpu_device *adev,
 
tmp = RREG32(ih_regs->ih_rb_cntl);
tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, RB_ENABLE, (enable ? 1 : 0));
+   tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, RB_GPU_TS_ENABLE, 1);
/* enable_intr field is only valid in ring0 */
if (ih == &adev->irq.ih)
tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, ENABLE_INTR, (enable ? 1 : 
0));
-- 
2.17.1



[PATCH 2/2] drm/amdgpu: enable Navi retry fault wptr overflow

2021-11-23 Thread Philip Yang
If xnack is on, VM retry fault interrupt send to IH ring1, and ring1
will be full quickly. IH cannot receive other interrupts, this causes
deadlock if migrating buffer using sdma and waiting for sdma done
while handling retry fault.

Remove VMC from IH storm client, enable ring1 write pointer
overflow, then IH will drop retry fault interrupts and be able to receive
other interrupts while driver is handling retry fault.

IH ring1 write pointer doesn't writeback to memory by IH, and ring1
write pointer recorded by self-irq is not updated, so always read
the latest ring1 write pointer from register.

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 33 ++
 1 file changed, 12 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
index dafad6030947..38241cf0e1f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
@@ -276,10 +276,8 @@ static int navi10_ih_enable_ring(struct amdgpu_device 
*adev,
tmp = navi10_ih_rb_cntl(ih, tmp);
if (ih == &adev->irq.ih)
tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, RPTR_REARM, 
!!adev->irq.msi_enabled);
-   if (ih == &adev->irq.ih1) {
-   tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_ENABLE, 0);
+   if (ih == &adev->irq.ih1)
tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, RB_FULL_DRAIN_ENABLE, 1);
-   }
 
if (amdgpu_sriov_vf(adev) && amdgpu_sriov_reg_indirect_ih(adev)) {
if (psp_reg_program(&adev->psp, ih_regs->psp_reg_id, tmp)) {
@@ -320,7 +318,6 @@ static int navi10_ih_irq_init(struct amdgpu_device *adev)
 {
struct amdgpu_ih_ring *ih[] = {&adev->irq.ih, &adev->irq.ih1, 
&adev->irq.ih2};
u32 ih_chicken;
-   u32 tmp;
int ret;
int i;
 
@@ -364,15 +361,6 @@ static int navi10_ih_irq_init(struct amdgpu_device *adev)
adev->nbio.funcs->ih_doorbell_range(adev, ih[0]->use_doorbell,
ih[0]->doorbell_index);
 
-   tmp = RREG32_SOC15(OSSSYS, 0, mmIH_STORM_CLIENT_LIST_CNTL);
-   tmp = REG_SET_FIELD(tmp, IH_STORM_CLIENT_LIST_CNTL,
-   CLIENT18_IS_STORM_CLIENT, 1);
-   WREG32_SOC15(OSSSYS, 0, mmIH_STORM_CLIENT_LIST_CNTL, tmp);
-
-   tmp = RREG32_SOC15(OSSSYS, 0, mmIH_INT_FLOOD_CNTL);
-   tmp = REG_SET_FIELD(tmp, IH_INT_FLOOD_CNTL, FLOOD_CNTL_ENABLE, 1);
-   WREG32_SOC15(OSSSYS, 0, mmIH_INT_FLOOD_CNTL, tmp);
-
pci_set_master(adev->pdev);
 
/* enable interrupts */
@@ -421,12 +409,19 @@ static u32 navi10_ih_get_wptr(struct amdgpu_device *adev,
u32 wptr, tmp;
struct amdgpu_ih_regs *ih_regs;
 
-   wptr = le32_to_cpu(*ih->wptr_cpu);
-   ih_regs = &ih->ih_regs;
+   if (ih == &adev->irq.ih) {
+   /* Only ring0 supports writeback. On other rings fall back
+* to register-based code with overflow checking below.
+*/
+   wptr = le32_to_cpu(*ih->wptr_cpu);
 
-   if (!REG_GET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW))
-   goto out;
+   if (!REG_GET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW))
+   goto out;
+   }
 
+   ih_regs = &ih->ih_regs;
+
+   /* Double check that the overflow wasn't already cleared. */
wptr = RREG32_NO_KIQ(ih_regs->ih_rb_wptr);
if (!REG_GET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW))
goto out;
@@ -514,15 +509,11 @@ static int navi10_ih_self_irq(struct amdgpu_device *adev,
  struct amdgpu_irq_src *source,
  struct amdgpu_iv_entry *entry)
 {
-   uint32_t wptr = cpu_to_le32(entry->src_data[0]);
-
switch (entry->ring_id) {
case 1:
-   *adev->irq.ih1.wptr_cpu = wptr;
schedule_work(&adev->irq.ih1_work);
break;
case 2:
-   *adev->irq.ih2.wptr_cpu = wptr;
schedule_work(&adev->irq.ih2_work);
break;
default: break;
-- 
2.17.1



Re: [PATCH v1 1/9] mm: add zone device coherent type memory support

2021-11-23 Thread Alistair Popple
On Tuesday, 23 November 2021 4:16:55 AM AEDT Felix Kuehling wrote:

[...]

> > Right, so long as my fix goes in I don't think there is anything wrong with
> > pinning device public pages. Agree that we should avoid FOLL_LONGTERM pins 
> > for
> > device memory though. I think the way to do that is update 
> > is_pinnable_page()
> > so we treat device pages the same as other unpinnable pages ie. long-term 
> > pins
> > will migrate the page.
> 
> I'm trying to understand check_and_migrate_movable_pages in gup.c. It
> doesn't look like the right way to migrate device pages. We may have to
> do something different there as well. So instead of changing
> is_pinnable_page, it maybe better to explicitly check for is_device_page
> or is_device_coherent_page in check_and_migrate_movable_pages to migrate
> it correctly, or just fail outright.

Yes, I think you're right. I was thinking check_and_migrate_movable_pages()
would work for coherent device pages. Now I see it won't because it assumes
they are lru pages and it tries to isolate them which will never succeed
because device pages aren't on a lru.

I think migrating them is the right thing to do for FOLL_LONGTERM though.

 - Alistair

> Thanks,
>   Felix
> 
> >
> >>> In the case of device-private pages this is enforced by the fact they 
> >>> never
> >>> have present pte's, so any attempt to GUP them results in a fault. But if 
> >>> I'm
> >>> understanding this series correctly that won't be the case for coherent 
> >>> device
> >>> pages right?
> >> Right.
> >>
> >> Regards,
> >>   Felix
> >>
> >>
>  -return is_device_private_page(page);
>  +return is_device_page(page);
>   }
>   
>   /* For file back page */
>  @@ -2791,7 +2791,7 @@ EXPORT_SYMBOL(migrate_vma_setup);
>    * handle_pte_fault()
>    *   do_anonymous_page()
>    * to map in an anonymous zero page but the struct page will be a 
>  ZONE_DEVICE
>  - * private page.
>  + * private or coherent page.
>    */
>   static void migrate_vma_insert_page(struct migrate_vma *migrate,
>   unsigned long addr,
>  @@ -2867,10 +2867,15 @@ static void migrate_vma_insert_page(struct 
>  migrate_vma *migrate,
>   swp_entry = 
>  make_readable_device_private_entry(
>   
>  page_to_pfn(page));
>   entry = swp_entry_to_pte(swp_entry);
>  +} else if (is_device_page(page)) {
> >>> How about adding an explicit `is_device_coherent_page()` helper? It would 
> >>> make
> >>> the test more explicit that this is expected to handle just coherent 
> >>> pages and
> >>> I bet there will be future changes that need to differentiate between 
> >>> private
> >>> and coherent pages anyway.
> >>>
>  +entry = pte_mkold(mk_pte(page,
>  + 
>  READ_ONCE(vma->vm_page_prot)));
>  +if (vma->vm_flags & VM_WRITE)
>  +entry = pte_mkwrite(pte_mkdirty(entry));
>   } else {
>   /*
>  - * For now we only support migrating to 
>  un-addressable
>  - * device memory.
>  + * We support migrating to private and coherent 
>  types
>  + * for device zone memory.
>    */
>   pr_warn_once("Unsupported ZONE_DEVICE page 
>  type.\n");
>   goto abort;
>  @@ -2976,10 +2981,10 @@ void migrate_vma_pages(struct migrate_vma 
>  *migrate)
>   mapping = page_mapping(page);
>   
>   if (is_zone_device_page(newpage)) {
>  -if (is_device_private_page(newpage)) {
>  +if (is_device_page(newpage)) {
>   /*
>  - * For now only support private 
>  anonymous when
>  - * migrating to un-addressable device 
>  memory.
>  + * For now only support private and 
>  coherent
>  + * anonymous when migrating to device 
>  memory.
>    */
>   if (mapping) {
>   migrate->src[i] &= 
>  ~MIGRATE_PFN_MIGRATE;
> 
> >
> >
> 






Re: [PATCH 2/2] drm/amdkfd: Slighly optimize 'init_doorbell_bitmap()'

2021-11-23 Thread Christophe JAILLET

Le 22/11/2021 à 22:44, Felix Kuehling a écrit :

Am 2021-11-21 um 12:41 p.m. schrieb Christophe JAILLET:

The 'doorbell_bitmap' bitmap has just been allocated. So we can use the
non-atomic '__set_bit()' function to save a few cycles as no concurrent
access can happen.

Signed-off-by: Christophe JAILLET 


Thank you for the patches. I think the same sort of change (at least the
allocation/freeing part) could be applied to the queue_slot_bitmap in
kfd_process_queue_manager.c. Would you like to submit another revision
of this patch series that handles that as well?


I'll send a v2 which will fix the missing ',' spotted by the kernel test 
robot and include kfd_process_queue_manager.c.


All my patches are compile tested (otherwise it is said bellow the ---). 
Looks like I missed this one :(.


CJ



Either way, this series is

Reviewed-by: Felix Kuehling 



---
bitmap_set() could certainly also be use, but range checking would be
tricky.
---
  drivers/gpu/drm/amd/amdkfd/kfd_process.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 172ee8763523..2e9d341062c4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1447,9 +1447,9 @@ static int init_doorbell_bitmap(struct qcm_process_device 
*qpd,
  
  	for (i = 0; i < KFD_MAX_NUM_OF_QUEUES_PER_PROCESS / 2; i++) {

if (i >= range_start && i <= range_end) {
-   set_bit(i, qpd->doorbell_bitmap);
-   set_bit(i + KFD_QUEUE_DOORBELL_MIRROR_OFFSET,
-   qpd->doorbell_bitmap);
+   __set_bit(i, qpd->doorbell_bitmap);
+   __set_bit(i + KFD_QUEUE_DOORBELL_MIRROR_OFFSET,
+ qpd->doorbell_bitmap);
}
}
  






Re: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip init for SIENNA_CICHLID

2021-11-23 Thread Deucher, Alexander
[Public]

Can we just add a check for the new IP version in that case?  This looks really 
hacky.

Alex


From: Jane Jian 
Sent: Tuesday, November 23, 2021 6:34 AM
To: amd-gfx@lists.freedesktop.org ; Deucher, 
Alexander ; Chen, Guchun ; 
Chen, JingWen 
Cc: Jian, Jane 
Subject: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip init 
for SIENNA_CICHLID

[WHY]
for sriov odd# vf will modify vcn0 engine ip revision(due to multimedia 
bandwidth feature),
which will be mismatched with original vcn0 revision

[HOW]
skip ip revision match case and continue use asic type to check

Signed-off-by: Jane Jian 
Change-Id: I1ace32acbf3a13c0baac958508da1324ec387a58
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 5 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   | 6 ++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 4e3669407518..0a91e53f520c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -1334,7 +1334,10 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device 
*adev)
 return r;
 }

-   r = amdgpu_discovery_set_mm_ip_blocks(adev);
+   if (adev->asic_type == CHIP_SIENNA_CICHLID && amdgpu_sriov_vf(adev))
+   r = amdgpu_device_ip_block_add(adev, &vcn_v3_0_ip_block);
+   else
+   r = amdgpu_discovery_set_mm_ip_blocks(adev);
 if (r)
 return r;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 4f7c70845785..87f56b61be53 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -86,6 +86,10 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
 for (i = 0; i < adev->vcn.num_vcn_inst; i++)
 atomic_set(&adev->vcn.inst[i].dpg_enc_submission_cnt, 0);

+   if (adev->asic_type == CHIP_SIENNA_CICHLID && amdgpu_sriov_vf(adev)) {
+   fw_name = FIRMWARE_SIENNA_CICHLID;
+   goto next;
+   }
 switch (adev->ip_versions[UVD_HWIP][0]) {
 case IP_VERSION(1, 0, 0):
 case IP_VERSION(1, 0, 1):
@@ -168,6 +172,8 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
 return -EINVAL;
 }

+next:
+
 r = request_firmware(&adev->vcn.fw, fw_name, adev->dev);
 if (r) {
 dev_err(adev->dev, "amdgpu_vcn: Can't load firmware \"%s\"\n",
--
2.17.1



Re: [PATCH 1/2] drm/amdgpu: fix vkms hrtimer settings

2021-11-23 Thread Alex Deucher
On Mon, Nov 22, 2021 at 8:59 PM Cui, Flora  wrote:
>
> [Public]
>
>
> Modprobe -r amdgpu get oops in amdgpu_vkms_sw_fini()
>
>   for (i = 0; i < adev->mode_info.num_crtc; i++)
>
>  if (adev->mode_info.crtcs[i])
>
>
> hrtimer_cancel(&adev->mode_info.crtcs[i]->vblank_timer);
>
> adev->mode_info.crtcs[i]->vblank_timer is not initiated as vkms init its own 
> amdgpu_vkms_output-> vblank_hrtimer. This patch drop amdgpu_vkms_output-> 
> vblank_hrtimer and try with adev->mode_info.crtcs[i]->vblank_timer to keep 
> align with amdgpu_dm & dce_vx_0.c
>
>

I think this might be better as two patches.  The simple fix for this
problem would be:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
index ce982afeff91..b620a6a3cb9f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
@@ -504,8 +504,7 @@ static int amdgpu_vkms_sw_fini(void *handle)
int i = 0;

for (i = 0; i < adev->mode_info.num_crtc; i++)
-   if (adev->mode_info.crtcs[i])
-   hrtimer_cancel(&adev->mode_info.crtcs[i]->vblank_timer);
+   hrtimer_cancel(&adev->amdgpu_vkms_output[i].vblank_hrtimer);

kfree(adev->mode_info.bios_hardcoded_edid);
kfree(adev->amdgpu_vkms_output);

Then apply your patch on top to share more code with the existing dce files.

Alex

>
>
>
> From: Deucher, Alexander 
> Sent: 2021年11月23日 0:43
> To: Chen, Guchun ; Cui, Flora ; 
> amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 1/2] drm/amdgpu: fix vkms hrtimer settings
>
>
>
> [Public]
>
>
>
> Can you explain how the current code is failing?  It's not immediately 
> obvious to me.  I'm not opposed to this change, it's just not clear to me 
> where the current code fails.
>
>
>
> Alex
>
>
>
> 
>
> From: Chen, Guchun 
> Sent: Monday, November 22, 2021 8:49 AM
> To: Cui, Flora ; amd-gfx@lists.freedesktop.org 
> ; Deucher, Alexander 
> 
> Subject: RE: [PATCH 1/2] drm/amdgpu: fix vkms hrtimer settings
>
>
>
> [Public]
>
> Series is:
> Reviewed-by: Guchun Chen 
>
> +Alex to comment this series as well.
>
> Regards,
> Guchun
>
> -Original Message-
> From: Cui, Flora 
> Sent: Monday, November 22, 2021 5:04 PM
> To: amd-gfx@lists.freedesktop.org; Chen, Guchun 
> Cc: Cui, Flora 
> Subject: [PATCH 1/2] drm/amdgpu: fix vkms hrtimer settings
>
> otherwise adev->mode_info.crtcs[] is NULL
>
> Signed-off-by: Flora Cui 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 38   
> drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.h |  5 ++--
>  2 files changed, 28 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
> index ce982afeff91..6c62c45e3e3e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
> @@ -16,6 +16,8 @@
>  #include "ivsrcid/ivsrcid_vislands30.h"
>  #include "amdgpu_vkms.h"
>  #include "amdgpu_display.h"
> +#include "atom.h"
> +#include "amdgpu_irq.h"
>
>  /**
>   * DOC: amdgpu_vkms
> @@ -41,14 +43,13 @@ static const u32 amdgpu_vkms_formats[] = {
>
>  static enum hrtimer_restart amdgpu_vkms_vblank_simulate(struct hrtimer 
> *timer)  {
> -   struct amdgpu_vkms_output *output = container_of(timer,
> -struct 
> amdgpu_vkms_output,
> -vblank_hrtimer);
> -   struct drm_crtc *crtc = &output->crtc;
> +   struct amdgpu_crtc *amdgpu_crtc = container_of(timer, struct 
> amdgpu_crtc, vblank_timer);
> +   struct drm_crtc *crtc = &amdgpu_crtc->base;
> +   struct amdgpu_vkms_output *output =
> +drm_crtc_to_amdgpu_vkms_output(crtc);
>  u64 ret_overrun;
>  bool ret;
>
> -   ret_overrun = hrtimer_forward_now(&output->vblank_hrtimer,
> +   ret_overrun = hrtimer_forward_now(&amdgpu_crtc->vblank_timer,
>output->period_ns);
>  WARN_ON(ret_overrun != 1);
>
> @@ -65,22 +66,21 @@ static int amdgpu_vkms_enable_vblank(struct drm_crtc 
> *crtc)
>  unsigned int pipe = drm_crtc_index(crtc);
>  struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
>  struct amdgpu_vkms_output *out = 
> drm_crtc_to_amdgpu_vkms_output(crtc);
> +   struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
>
>  drm_calc_timestamping_constants(crtc, &crtc->mode);
>
> -   hrtimer_init(&out->vblank_hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> -   out->vblank_hrtimer.function = &amdgpu_vkms_vblank_simulate;
>  out->period_ns = ktime_set(0, vblank->framedur_ns);
> -   hrtimer_start(&out->vblank_hrtimer, out->period_ns, HRTIMER_MODE_REL);
> +   hrtimer_start(&amdgpu_crtc->vblank_timer, out->period_ns,
> +HRTIMER_MODE_REL);
>
>  return 0;

Re: [PATCH 3/3] drm/amd/pm: Print the error on command submission

2021-11-23 Thread Deucher, Alexander
[AMD Official Use Only]

Series is:
Reviewed-by: Alex Deucher 

From: Tuikov, Luben 
Sent: Monday, November 22, 2021 5:25 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Tuikov, Luben ; Deucher, Alexander 

Subject: [PATCH 3/3] drm/amd/pm: Print the error on command submission

Print the error on command submission immediately after submitting to
the SMU. This is rate-limited. It helps to immediately know there was an
error on command submission, rather than leave it up to clients to report
the error, as sometimes they do not.

Cc: Alex Deucher 
Signed-off-by: Luben Tuikov 
Acked-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index f9a42a07eeaebf..048ca16738638f 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -352,7 +352,7 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
 __smu_cmn_send_msg(smu, (uint16_t) index, param);
 reg = __smu_cmn_poll_stat(smu);
 res = __smu_cmn_reg2errno(smu, reg);
-   if (res == -EREMOTEIO)
+   if (res != 0)
 __smu_cmn_reg_print_error(smu, reg, index, param, msg);
 if (read_arg)
 smu_cmn_read_arg(smu, read_arg);
--
2.34.0



[PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip init for SIENNA_CICHLID

2021-11-23 Thread Jane Jian
[WHY]
for sriov odd# vf will modify vcn0 engine ip revision(due to multimedia 
bandwidth feature),
which will be mismatched with original vcn0 revision

[HOW]
skip ip revision match case and continue use asic type to check

Signed-off-by: Jane Jian 
Change-Id: I1ace32acbf3a13c0baac958508da1324ec387a58
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 5 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   | 6 ++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 4e3669407518..0a91e53f520c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -1334,7 +1334,10 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device 
*adev)
return r;
}
 
-   r = amdgpu_discovery_set_mm_ip_blocks(adev);
+   if (adev->asic_type == CHIP_SIENNA_CICHLID && amdgpu_sriov_vf(adev))
+   r = amdgpu_device_ip_block_add(adev, &vcn_v3_0_ip_block);
+   else
+   r = amdgpu_discovery_set_mm_ip_blocks(adev);
if (r)
return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 4f7c70845785..87f56b61be53 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -86,6 +86,10 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
for (i = 0; i < adev->vcn.num_vcn_inst; i++)
atomic_set(&adev->vcn.inst[i].dpg_enc_submission_cnt, 0);
 
+   if (adev->asic_type == CHIP_SIENNA_CICHLID && amdgpu_sriov_vf(adev)) {
+   fw_name = FIRMWARE_SIENNA_CICHLID;
+   goto next;
+   }
switch (adev->ip_versions[UVD_HWIP][0]) {
case IP_VERSION(1, 0, 0):
case IP_VERSION(1, 0, 1):
@@ -168,6 +172,8 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
return -EINVAL;
}
 
+next:
+
r = request_firmware(&adev->vcn.fw, fw_name, adev->dev);
if (r) {
dev_err(adev->dev, "amdgpu_vcn: Can't load firmware \"%s\"\n",
-- 
2.17.1



[PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip init for SIENNA_CICHLID

2021-11-23 Thread Jane Jian
[WHY]
for sriov odd# vf will modify vcn0 engine ip revision(due to multimedia 
bandwidth feature),
which will be mismatched with original vcn0 revision

[HOW]
skip ip revision match case and continue use asic type to check

Signed-off-by: Jane Jian 
Change-Id: I1ace32acbf3a13c0baac958508da1324ec387a58
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 5 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   | 6 ++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 4e3669407518..0a91e53f520c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -1334,7 +1334,10 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device 
*adev)
return r;
}
 
-   r = amdgpu_discovery_set_mm_ip_blocks(adev);
+   if (adev->asic_type == CHIP_SIENNA_CICHLID && amdgpu_sriov_vf(adev))
+   r = amdgpu_device_ip_block_add(adev, &vcn_v3_0_ip_block);
+   else
+   r = amdgpu_discovery_set_mm_ip_blocks(adev);
if (r)
return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 4f7c70845785..87f56b61be53 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -86,6 +86,10 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
for (i = 0; i < adev->vcn.num_vcn_inst; i++)
atomic_set(&adev->vcn.inst[i].dpg_enc_submission_cnt, 0);
 
+   if (adev->asic_type == CHIP_SIENNA_CICHLID && amdgpu_sriov_vf(adev)) {
+   fw_name = FIRMWARE_SIENNA_CICHLID;
+   goto next;
+   }
switch (adev->ip_versions[UVD_HWIP][0]) {
case IP_VERSION(1, 0, 0):
case IP_VERSION(1, 0, 1):
@@ -168,6 +172,8 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
return -EINVAL;
}
 
+next:
+
r = request_firmware(&adev->vcn.fw, fw_name, adev->dev);
if (r) {
dev_err(adev->dev, "amdgpu_vcn: Can't load firmware \"%s\"\n",
-- 
2.17.1



Re: [PATCH v10 08/10] dyndbg: add print-to-tracefs, selftest with it - RFC

2021-11-23 Thread Simon Ser
First off, let me reiterate that this feature would be invaluable as user-space
developers. It's often pretty difficult to figure out the cause of an EINVAL,
we have to ask users to follow complicated instructions [1] to grab DRM logs.
Then have to skim through several megabytes of logs to find the error.

I have a hack [2] which just calls system("sudo dmesg") after a failed atomic
commit, it's been pretty handy. But it's really just a hack, a proper solution
would be awesome.

[1]: https://gitlab.freedesktop.org/wlroots/wlroots/-/wikis/DRM-Debugging
[2]: https://gitlab.freedesktop.org/emersion/libliftoff/-/merge_requests/61

> > > > Having a subsystem specific trace buffer would allow subsystem specific
> > > > trace log permissions depending on the sensitivity of the data. But
> > > > doesn't drm output today go to the system log which is typically world
> > > > readable today?

dmesg isn't world-readable these days, it's been changed recently-ish (last
year?) at least on my distribution (Arch). I need root to grab dmesg.

(Maybe we can we just let the DRM master process grab the logs?)

> > > Yes, and that is exactly the problem. The DRM debug output is so high
> > > traffic it would make the system log both unusable due to cruft and
> > > slow down the whole machine. The debug output is only useful when
> > > something went wrong, and at that point it is too late to enable
> > > debugging. That's why a flight recorder with an over-written circular
> > > in-memory buffer is needed.
> >
> > Seans patch reuses enum drm_debug_category to split the tracing
> > stream into 10 sub-streams
> > - how much traffic from each ?
> > - are some sub-streams more valuable for post-mortem ?
> > - any value from further refinement of categories ?
> > - drop irrelevant callsites individually to reduce clutter, extend
> > buffer time/space ?
>
> I think it's hard to predict which sub-streams you are going to need
> before you have a bug to debug. Hence I would err on the side of
> enabling too much. This also means that better or more refined
> categorisation might not be that much of help - or if it is, then are
> the excluded debug messages worth having in the kernel to begin with.
> Well, we're probably not that interested in GPU debugs but just
> everything related to the KMS side, which on the existing categories
> is... everything except half of CORE and DRIVER, maybe? Not sure.

We've been recommending drm.debug=0x19F so far (see wiki linked above).
KMS + PRIME + ATOMIC + LEASE is definitely something we want in, and
CORE + DRIVER contains other useful info. We definitely don't want VBL.

> My feeling is that that could mean in the order of hundreds of log
> events at framerate (e.g. 60 times per second) per each enabled output
> individually. And per DRM device, of course. This is with the
> uninteresting GPU debugs already excluded.

Indeed, successful KMS atomic commits already generate a lot of noise. On my
machine, setting drm.debug=0x19F and running the following command:

sudo dmesg -w | pv >/dev/null

I get 400KiB/s when idling, and 850KiB/s when wiggling the cursor.

> Still, I don't think the flight recorder buffer would need to be
> massive. I suspect it would be enough to hold a few frames' worth which
> is just a split second under active operation. When something happens,
> the userspace stack is likely going to stop on its tracks immediately
> to collect the debug information, which means the flooding should pause
> and the relevant messages don't get overwritten before we get them. In
> a multi-seat system where each device is controlled by a separate
> display server instance, per-device logs would help with this. OTOH,
> multi-seat is not a very common use case I suppose.

There's also the case of multi-GPU where GPU B's logs could clutter GPU A's,
making it harder to understand the cause of an atomic commit failure on GPU A.
So per-device logs would be useful, but not a hard requirement for me, having
*anything* at all would already be a big win.

In my experiments linked above [2], system("sudo dmesg") after atomic commit
failure worked pretty well, and the bottom of the log contained the cause of
the failure. It was pretty useful to system("sudo dmesg -C") before performing
an atomic commit, to be able to only collect the extract of the log relevant to
the atomic commit.

Having some kind of "marker" mechanism could be pretty cool. "Mark" the log
stream before performing an atomic commit (ideally that'd just return e.g. an
uint64 offset), then on failure request the logs collected after that mark.


Re: [PATCH v10 08/10] dyndbg: add print-to-tracefs, selftest with it - RFC

2021-11-23 Thread Pekka Paalanen
On Mon, 22 Nov 2021 15:42:38 -0700
jim.cro...@gmail.com wrote:

> On Mon, Nov 22, 2021 at 2:02 AM Pekka Paalanen  wrote:
> >
> > On Fri, 19 Nov 2021 11:21:36 -0500
> > Jason Baron  wrote:
> >  
> > > On 11/18/21 10:24 AM, Pekka Paalanen wrote:  
> > > > On Thu, 18 Nov 2021 09:29:27 -0500
> > > > Jason Baron  wrote:
> > > >  
> > > >> On 11/16/21 3:46 AM, Pekka Paalanen wrote:  
> > > >>> On Fri, 12 Nov 2021 10:08:41 -0500
> > > >>> Jason Baron  wrote:
> > > >>>  
> > >  On 11/12/21 6:49 AM, Vincent Whitchurch wrote:  
> > > > On Thu, Nov 11, 2021 at 03:02:04PM -0700, Jim Cromie wrote:  
> > > >> Sean Paul proposed, in:
> > > >> https://urldefense.com/v3/__https://patchwork.freedesktop.org/series/78133/__;!!GjvTz_vk!HcKnMRByYkIdyF1apqQjlN5aBIomzJR1an3YWXM6KXs0EftVMQdrewRA8Dki4A$
> > > >> drm/trace: Mirror DRM debug logs to tracefs
> > > >>
> > > >> His patchset's objective is to be able to independently steer some 
> > > >> of
> > > >> the drm.debug stream to an alternate tracing destination, by 
> > > >> splitting
> > > >> drm_debug_enabled() into syslog & trace flavors, and enabling them
> > > >> separately.  2 advantages were identified:
> > > >>
> > > >> 1- syslog is heavyweight, tracefs is much lighter
> > > >> 2- separate selection of enabled categories means less traffic
> > > >>
> > > >> Dynamic-Debug can do 2nd exceedingly well:
> > > >>
> > > >> A- all work is behind jump-label's NOOP, zero off cost.
> > > >> B- exact site selectivity, precisely the useful traffic.
> > > >>can tailor enabled set interactively, at shell.
> > > >>
> > > >> Since the tracefs interface is effective for drm (the threads 
> > > >> suggest
> > > >> so), adding that interface to dynamic-debug has real potential for
> > > >> everyone including drm.
> > > >>
> > > >> if CONFIG_TRACING:
> > > >>
> > > >> Grab Sean's trace_init/cleanup code, use it to provide tracefs
> > > >> available by default to all pr_debugs.  This will likely need some
> > > >> further per-module treatment; perhaps something reflecting 
> > > >> hierarchy
> > > >> of module,file,function,line, maybe with a tuned flattening.
> > > >>
> > > >> endif CONFIG_TRACING
> > > >>
> > > >> Add a new +T flag to enable tracing, independent of +p, and add and
> > > >> use 3 macros: dyndbg_site_is_enabled/logging/tracing(), to 
> > > >> encapsulate
> > > >> the flag checks.  Existing code treats T like other flags.  
> > > >
> > > > I posted a patchset a while ago to do something very similar, but 
> > > > that
> > > > got stalled for some reason and I unfortunately didn't follow it up:
> > > >
> > > >  
> > > > https://urldefense.com/v3/__https://lore.kernel.org/lkml/20200825153338.17061-1-vincent.whitchu...@axis.com/__;!!GjvTz_vk!HcKnMRByYkIdyF1apqQjlN5aBIomzJR1an3YWXM6KXs0EftVMQdrewRGytKHPg$
> > > >
> > > > A key difference between that patchset and this patch (besides that
> > > > small fact that I used +x instead of +T) was that my patchset 
> > > > allowed
> > > > the dyndbg trace to be emitted to the main buffer and did not force 
> > > > them
> > > > to be in an instance-specific buffer.  
> > > 
> > >  Yes, I agree I'd prefer that we print here to the 'main' buffer - it
> > >  seems to keep things simpler and easier to combine the output from
> > >  different sources as you mentioned.  
> > > >>>
> > > >>> Hi,
> > > >>>
> > > >>> I'm not quite sure I understand this discussion, but I would like to
> > > >>> remind you all of what Sean's original work is about:
> > > >>>
> > > >>> Userspace configures DRM tracing into a flight recorder buffer (I 
> > > >>> guess
> > > >>> this is what you refer to "instance-specific buffer").
> > > >>>
> > > >>> Userspace runs happily for months, and then hits a problem: a failure
> > > >>> in the DRM sub-system most likely, e.g. an ioctl that should never
> > > >>> fail, failed. Userspace handles that failure by dumping the flight
> > > >>> recorder buffer into a file and saving or sending a bug report. The
> > > >>> flight recorder contents give a log of all relevant DRM in-kernel
> > > >>> actions leading to the unexpected failure to help developers debug it.
> > > >>>
> > > >>> I don't mind if one can additionally send the flight recorder stream 
> > > >>> to
> > > >>> the main buffer, but I do want the separate flight recorder buffer to
> > > >>> be an option so that a) unrelated things cannot flood the interesting
> > > >>> bits out of it, and b) the scope of collected information is relevant.
> > > >>>
> > > >>> The very reason for this work is problems that are very difficult to
> > > >>> reproduce in practice, either because the problem itself is triggered
> > > >>> very rarely and randomly, or because the end users of the system have
> > > >>> either no knowledge or no access to r