RE: [PATCH 4/5] drm/amdgpu: Query boot status if discovery failed

2024-01-02 Thread Zhang, Hawking
[AMD Official Use Only - General]

Yes, it is.

Regards,
Hawking

From: Deucher, Alexander 
Sent: Wednesday, January 3, 2024 03:45
To: Zhang, Hawking; amd-gfx@lists.freedesktop.org; Zhou1, Tao;
Yang, Stanley; Wang, Yang(Kevin); Chai, Thomas; Li, Candice
Cc: Lazar, Lijo; Ma, Le
Subject: Re: [PATCH 4/5] drm/amdgpu: Query boot status if discovery failed


Is mmIP_DISCOVERY_VERSION at the same offset across ASIC families?

Alex


From: Hawking Zhang <hawking.zh...@amd.com>
Sent: Monday, January 1, 2024 10:43 PM
To: amd-gfx@lists.freedesktop.org; Zhou1, Tao <tao.zh...@amd.com>;
Yang, Stanley <stanley.y...@amd.com>;
Wang, Yang(Kevin) <kevinyang.w...@amd.com>;
Chai, Thomas <yipeng.c...@amd.com>; Li, Candice <candice...@amd.com>
Cc: Zhang, Hawking <hawking.zh...@amd.com>;
Deucher, Alexander <alexander.deuc...@amd.com>;
Lazar, Lijo <lijo.la...@amd.com>; Ma, Le <le...@amd.com>
Subject: [PATCH 4/5] drm/amdgpu: Query boot status if discovery failed

Check and report boot status if discovery failed.

Signed-off-by: Hawking Zhang <hawking.zh...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index b8fde08aec8e..302b71e9f1e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -27,6 +27,7 @@
 #include "amdgpu_discovery.h"
 #include "soc15_hw_ip.h"
 #include "discovery.h"
+#include "amdgpu_ras.h"

 #include "soc15.h"
 #include "gfx_v9_0.h"
@@ -98,6 +99,7 @@
 #define FIRMWARE_IP_DISCOVERY "amdgpu/ip_discovery.bin"
 MODULE_FIRMWARE(FIRMWARE_IP_DISCOVERY);

+#define mmIP_DISCOVERY_VERSION  0x16A00
 #define mmRCC_CONFIG_MEMSIZE  0xde3
 #define mmMP0_SMN_C2PMSG_33 0x16061
 #define mmMM_INDEX  0x0
@@ -518,7 +520,9 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
 out:
 	kfree(adev->mman.discovery_bin);
 	adev->mman.discovery_bin = NULL;
-
+	if ((amdgpu_discovery != 2) &&
+	    (RREG32(mmIP_DISCOVERY_VERSION) == 4))
+		amdgpu_ras_query_boot_status(adev, 4);
 	return r;
 }

--
2.17.1


RE: [PATCH 4/5] drm/amdgpu: Query boot status if discovery failed

2024-01-02 Thread Zhang, Hawking

RE - I'm not sure about hard-coding 4 instances here. The code you dropped in 
patch 1 was using adev->aid_mask. But I guess that's not even initialized 
correctly if IP discovery failed. Will this work correctly on the APU version?

Yes, aid_mask is not initialized in that case. IP_DISCOVERY_VERSION is the
only available fuse setting that can be used to identify, or be treated as
equivalent to, 4 AID instances when discovery fails. We switched to a common
mailbox register that works for both APU and dGPU. The expectation is that on
an APU the driver still reports the fw boot status, while on a dGPU it gives
next-level information on the failure if boot fails.

Regards,
Hawking
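
For readers following the thread, the check under discussion can be modeled
outside the kernel. The sketch below is not part of the posted patch:
mock_dev, mock_rreg32(), mock_query_boot_status() and
report_boot_status_on_failure() are hypothetical stand-ins for adev, RREG32(),
amdgpu_ras_query_boot_status() and the error path of amdgpu_discovery_init().
Only the register offset and the two conditions are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register offset as defined in the patch. */
#define mmIP_DISCOVERY_VERSION 0x16A00

/* Hypothetical mock device: stores the value the fuse register would
 * return and records how many instances were queried. */
struct mock_dev {
	uint32_t ip_discovery_version; /* value "read" from the register */
	uint32_t queried_instances;    /* 0 means no query was issued */
};

/* Hypothetical stand-in for RREG32(); it only knows one register. */
static uint32_t mock_rreg32(const struct mock_dev *dev, uint32_t reg)
{
	return reg == mmIP_DISCOVERY_VERSION ? dev->ip_discovery_version : 0;
}

/* Hypothetical stand-in for amdgpu_ras_query_boot_status(). */
static void mock_query_boot_status(struct mock_dev *dev, uint32_t num_instances)
{
	assert(num_instances > 0);
	dev->queried_instances = num_instances;
}

/* Mirrors the condition the patch adds to the failure path.
 * discovery_mode stands in for the amdgpu_discovery module parameter.
 * Returns true if a boot status query was issued. */
static bool report_boot_status_on_failure(struct mock_dev *dev, int discovery_mode)
{
	if (discovery_mode != 2 &&
	    mock_rreg32(dev, mmIP_DISCOVERY_VERSION) == 4) {
		mock_query_boot_status(dev, 4); /* instance count is hard-coded */
		return true;
	}
	return false;
}
```

In this model a device whose register reads 4 and whose discovery mode is not
2 gets a 4-instance query, and every other combination skips it. It also makes
the review concern concrete: the instance count is tied to a single fuse value
rather than derived from aid_mask, which is unavailable when discovery fails.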

-----Original Message-----
From: Kuehling, Felix 
Sent: Wednesday, January 3, 2024 01:49
To: Zhang, Hawking; amd-gfx@lists.freedesktop.org; Zhou1, Tao;
Yang, Stanley; Wang, Yang(Kevin); Chai, Thomas; Li, Candice
Cc: Deucher, Alexander; Ma, Le; Lazar, Lijo
Subject: Re: [PATCH 4/5] drm/amdgpu: Query boot status if discovery failed


On 2024-01-02 09:07, Hawking Zhang wrote:
> Check and report boot status if discovery failed.
>
> Signed-off-by: Hawking Zhang 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index b8fde08aec8e..302b71e9f1e2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -27,6 +27,7 @@
>   #include "amdgpu_discovery.h"
>   #include "soc15_hw_ip.h"
>   #include "discovery.h"
> +#include "amdgpu_ras.h"
>
>   #include "soc15.h"
>   #include "gfx_v9_0.h"
> @@ -98,6 +99,7 @@
>   #define FIRMWARE_IP_DISCOVERY "amdgpu/ip_discovery.bin"
>   MODULE_FIRMWARE(FIRMWARE_IP_DISCOVERY);
>
> +#define mmIP_DISCOVERY_VERSION  0x16A00
>   #define mmRCC_CONFIG_MEMSIZE  0xde3
>   #define mmMP0_SMN_C2PMSG_33 0x16061
>   #define mmMM_INDEX  0x0
> @@ -518,7 +520,9 @@ static int amdgpu_discovery_init(struct amdgpu_device 
> *adev)
>   out:
>   	kfree(adev->mman.discovery_bin);
>   	adev->mman.discovery_bin = NULL;
> -
> +	if ((amdgpu_discovery != 2) &&
> +	    (RREG32(mmIP_DISCOVERY_VERSION) == 4))
> +		amdgpu_ras_query_boot_status(adev, 4);

I'm not sure about hard-coding 4 instances here. The code you dropped in patch
1 was using adev->aid_mask. But I guess that's not even initialized correctly
if IP discovery failed. Will this work correctly on the APU version?

Regards,
   Felix


>   return r;
>   }
>


Re: [PATCH 4/5] drm/amdgpu: Query boot status if discovery failed

2024-01-02 Thread Deucher, Alexander

Is mmIP_DISCOVERY_VERSION at the same offset across ASIC families?

Alex


From: Hawking Zhang 
Sent: Monday, January 1, 2024 10:43 PM
To: amd-gfx@lists.freedesktop.org; Zhou1, Tao; Yang, Stanley;
Wang, Yang(Kevin); Chai, Thomas; Li, Candice
Cc: Zhang, Hawking; Deucher, Alexander; Lazar, Lijo; Ma, Le
Subject: [PATCH 4/5] drm/amdgpu: Query boot status if discovery failed

Check and report boot status if discovery failed.

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index b8fde08aec8e..302b71e9f1e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -27,6 +27,7 @@
 #include "amdgpu_discovery.h"
 #include "soc15_hw_ip.h"
 #include "discovery.h"
+#include "amdgpu_ras.h"

 #include "soc15.h"
 #include "gfx_v9_0.h"
@@ -98,6 +99,7 @@
 #define FIRMWARE_IP_DISCOVERY "amdgpu/ip_discovery.bin"
 MODULE_FIRMWARE(FIRMWARE_IP_DISCOVERY);

+#define mmIP_DISCOVERY_VERSION  0x16A00
 #define mmRCC_CONFIG_MEMSIZE  0xde3
 #define mmMP0_SMN_C2PMSG_33 0x16061
 #define mmMM_INDEX  0x0
@@ -518,7 +520,9 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
 out:
 	kfree(adev->mman.discovery_bin);
 	adev->mman.discovery_bin = NULL;
-
+	if ((amdgpu_discovery != 2) &&
+	    (RREG32(mmIP_DISCOVERY_VERSION) == 4))
+		amdgpu_ras_query_boot_status(adev, 4);
 	return r;
 }

--
2.17.1



Re: [PATCH 4/5] drm/amdgpu: Query boot status if discovery failed

2024-01-02 Thread Felix Kuehling



On 2024-01-02 09:07, Hawking Zhang wrote:

> Check and report boot status if discovery failed.
>
> Signed-off-by: Hawking Zhang
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index b8fde08aec8e..302b71e9f1e2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -27,6 +27,7 @@
>   #include "amdgpu_discovery.h"
>   #include "soc15_hw_ip.h"
>   #include "discovery.h"
> +#include "amdgpu_ras.h"
>
>   #include "soc15.h"
>   #include "gfx_v9_0.h"
> @@ -98,6 +99,7 @@
>   #define FIRMWARE_IP_DISCOVERY "amdgpu/ip_discovery.bin"
>   MODULE_FIRMWARE(FIRMWARE_IP_DISCOVERY);
>
> +#define mmIP_DISCOVERY_VERSION  0x16A00
>   #define mmRCC_CONFIG_MEMSIZE  0xde3
>   #define mmMP0_SMN_C2PMSG_33   0x16061
>   #define mmMM_INDEX            0x0
> @@ -518,7 +520,9 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
>   out:
>   	kfree(adev->mman.discovery_bin);
>   	adev->mman.discovery_bin = NULL;
> -
> +	if ((amdgpu_discovery != 2) &&
> +	    (RREG32(mmIP_DISCOVERY_VERSION) == 4))
> +		amdgpu_ras_query_boot_status(adev, 4);

I'm not sure about hard-coding 4 instances here. The code you dropped in
patch 1 was using adev->aid_mask. But I guess that's not even
initialized correctly if IP discovery failed. Will this work correctly
on the APU version?

Regards,
  Felix

>   	return r;
>   }