RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query

2020-01-02 Thread Clements, John
[AMD Public Use]

Hello GuChun,

Good point, it makes sense to make function static inline here, I think I shall 
also rename the function from  get_umc_reg_offset  to  get_umc_6_reg_offset.

Thank you,
John Clements

From: Chen, Guchun 
Sent: Friday, January 3, 2020 11:09 AM
To: Clements, John ; Zhang, Hawking 
; amd-gfx@lists.freedesktop.org; Zhou1, Tao 

Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Public Use]

Yes, John, that concern is cleared after I look into the code.

One more issue is, it's better that function get_umc_reg_offset is one static 
inline function? With this problem fixed, the patch is: Reviewed-by: Guchun 
Chen mailto:guchun.c...@amd.com>>

uint32_t get_umc_reg_offset(struct amdgpu_device *adev,
+ uint32_t umc_inst,
+ uint32_t ch_inst)

Regards,
Guchun

From: Clements, John mailto:john.cleme...@amd.com>>
Sent: Friday, January 3, 2020 10:58 AM
To: Chen, Guchun mailto:guchun.c...@amd.com>>; Zhang, 
Hawking mailto:hawking.zh...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Public Use]

Hello GuChun/Hawking,

Thank you for your feedback, I have updated the patch with the following 
amendments:

  *   Remove +#define UMC_REG_OFFSET (I forgot to remove this in original 
patch, I prefer the function over the macro)
  *   Updated the coding style of the braces in the for loops to have the 
starting brace on the same line as the for loop declaration

GuChun,
For your concern about the umc_v6_1_query_ras_error_count, in the UE/CE error 
counter register reading, the local SW error counters can only be incremented 
and not cleared throughout the iteration over the UMC error counter registers.

Thank you,
John Clements

From: Chen, Guchun mailto:guchun.c...@amd.com>>
Sent: Friday, January 3, 2020 9:07 AM
To: Zhang, Hawking mailto:hawking.zh...@amd.com>>; 
Clements, John mailto:john.cleme...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Public Use]

+#define UMC_REG_OFFSET(adev, ch_inst, umc_inst) ((adev)->umc.channel_offs * 
(ch_inst) + UMC_6_INST_DIST*(umc_inst))
Coding style problem, miss blank space around last "*".

+for (umc_inst = 0; umc_inst < adev->umc.umc_inst_num; umc_inst++)
+{
Another coding style problem. "{" should follow closely at the same line, not 
starting at one new line.

Thirdly, in umc_v6_1_query_ras_error_count, we use dual loops for query error 
counter for all UMC channels. But we always use the same variable to do the 
query. So the value will be overwritten by new one? Then we will miss former 
error counters if there are. Correct?

Regards,
Guchun

From: amd-gfx 
mailto:amd-gfx-boun...@lists.freedesktop.org>>
 On Behalf Of Zhang, Hawking
Sent: Thursday, January 2, 2020 8:38 PM
To: Clements, John mailto:john.cleme...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

UMC_REG_OFFSET(adev, ch_inst, umc_inst) and the function get_umc_reg_offset 
actually do the same thing? I guess you just want to keep either of them, right?

Regards,
Hawking

From: Clements, John mailto:john.cleme...@amd.com>>
Sent: Thursday, January 2, 2020 18:31
To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Zhang, 
Hawking mailto:hawking.zh...@amd.com>>; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

Added patch to resolve following issue where error counter detection was not 
iterating over all UMC instances/channels.
Removed support for accessing UMC error counters via MMIO.

Thank you,
John Clements
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query

2020-01-02 Thread Chen, Guchun
[AMD Public Use]

Yes, John, that concern is cleared after I look into the code.

One more issue is, it's better that function get_umc_reg_offset is one static 
inline function? With this problem fixed, the patch is: Reviewed-by: Guchun 
Chen 

uint32_t get_umc_reg_offset(struct amdgpu_device *adev,
+ uint32_t umc_inst,
+ uint32_t ch_inst)

Regards,
Guchun

From: Clements, John 
Sent: Friday, January 3, 2020 10:58 AM
To: Chen, Guchun ; Zhang, Hawking ; 
amd-gfx@lists.freedesktop.org; Zhou1, Tao 
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Public Use]

Hello GuChun/Hawking,

Thank you for your feedback, I have updated the patch with the following 
amendments:

  *   Remove +#define UMC_REG_OFFSET (I forgot to remove this in original 
patch, I prefer the function over the macro)
  *   Updated the coding style of the braces in the for loops to have the 
starting brace on the same line as the for loop declaration

GuChun,
For your concern about the umc_v6_1_query_ras_error_count, in the UE/CE error 
counter register reading, the local SW error counters can only be incremented 
and not cleared throughout the iteration over the UMC error counter registers.

Thank you,
John Clements

From: Chen, Guchun mailto:guchun.c...@amd.com>>
Sent: Friday, January 3, 2020 9:07 AM
To: Zhang, Hawking mailto:hawking.zh...@amd.com>>; 
Clements, John mailto:john.cleme...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Public Use]

+#define UMC_REG_OFFSET(adev, ch_inst, umc_inst) ((adev)->umc.channel_offs * 
(ch_inst) + UMC_6_INST_DIST*(umc_inst))
Coding style problem, miss blank space around last "*".

+for (umc_inst = 0; umc_inst < adev->umc.umc_inst_num; umc_inst++)
+{
Another coding style problem. "{" should follow closely at the same line, not 
starting at one new line.

Thirdly, in umc_v6_1_query_ras_error_count, we use dual loops for query error 
counter for all UMC channels. But we always use the same variable to do the 
query. So the value will be overwritten by new one? Then we will miss former 
error counters if there are. Correct?

Regards,
Guchun

From: amd-gfx 
mailto:amd-gfx-boun...@lists.freedesktop.org>>
 On Behalf Of Zhang, Hawking
Sent: Thursday, January 2, 2020 8:38 PM
To: Clements, John mailto:john.cleme...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

UMC_REG_OFFSET(adev, ch_inst, umc_inst) and the function get_umc_reg_offset 
actually do the same thing? I guess you just want to keep either of them, right?

Regards,
Hawking

From: Clements, John mailto:john.cleme...@amd.com>>
Sent: Thursday, January 2, 2020 18:31
To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Zhang, 
Hawking mailto:hawking.zh...@amd.com>>; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

Added patch to resolve following issue where error counter detection was not 
iterating over all UMC instances/channels.
Removed support for accessing UMC error counters via MMIO.

Thank you,
John Clements
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query

2020-01-02 Thread Clements, John
[AMD Public Use]

Hello Tao,

That is an interesting suggestion, I agree that there is a little bit of 
duplicate code with  the same for loops being used in multiple functions.

My only concern with implementing the loops in a macro is code readability.

I’ll have to think about the trade off between the duplicate code and code 
readability more.

Thank you,
John Clements

From: Zhou1, Tao 
Sent: Friday, January 3, 2020 10:53 AM
To: Clements, John ; amd-gfx@lists.freedesktop.org; 
Zhang, Hawking 
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Public Use]

I think we can implement it by only updating amdgpu_umc_for_each_channel macro, 
here is an example:

#define amdgpu_umc_for_each_channel(func)\
struct ras_err_data *err_data = \
(struct ras_err_data *)ras_error_status;\
uint32_t umc_inst, channel_inst, umc_reg_offset, channel_index; \
for (umc_inst = 0; umc_inst < adev->umc.umc_inst_num;  \
umc_inst++) {   \
umc_reg_offset = adev->umc.inst_offs * umc_inst;   \
for (channel_inst = 0;  \
channel_inst < adev->umc.channel_inst_num; \
channel_inst++) {   \
/* get channel index of interleaved memory */   \
channel_index = adev->umc.channel_idx_tbl[\
umc_inst * adev->umc.channel_inst_num + 
channel_inst]; \
(func)(adev, err_data, umc_reg_offset, channel_index); \
/* increase register offset for next channel */ \
umc_reg_offset += adev->umc.channel_offs;  \
}   \
}

Regards,
Tao
From: Clements, John mailto:john.cleme...@amd.com>>
Sent: 2020年1月2日 18:31
To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Zhang, 
Hawking mailto:hawking.zh...@amd.com>>; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

Added patch to resolve following issue where error counter detection was not 
iterating over all UMC instances/channels.
Removed support for accessing UMC error counters via MMIO.

Thank you,
John Clements
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query

2020-01-02 Thread Clements, John
[AMD Public Use]

Hello GuChun/Hawking,

Thank you for your feedback, I have updated the patch with the following 
amendments:

  *   Remove +#define UMC_REG_OFFSET (I forgot to remove this in original 
patch, I prefer the function over the macro)
  *   Updated the coding style of the braces in the for loops to have the 
starting brace on the same line as the for loop declaration

GuChun,
For your concern about the umc_v6_1_query_ras_error_count, in the UE/CE error 
counter register reading, the local SW error counters can only be incremented 
and not cleared throughout the iteration over the UMC error counter registers.

Thank you,
John Clements

From: Chen, Guchun 
Sent: Friday, January 3, 2020 9:07 AM
To: Zhang, Hawking ; Clements, John 
; amd-gfx@lists.freedesktop.org; Zhou1, Tao 

Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Public Use]

+#define UMC_REG_OFFSET(adev, ch_inst, umc_inst) ((adev)->umc.channel_offs * 
(ch_inst) + UMC_6_INST_DIST*(umc_inst))
Coding style problem, miss blank space around last "*".

+for (umc_inst = 0; umc_inst < adev->umc.umc_inst_num; umc_inst++)
+{
Another coding style problem. "{" should follow closely at the same line, not 
starting at one new line.

Thirdly, in umc_v6_1_query_ras_error_count, we use dual loops for query error 
counter for all UMC channels. But we always use the same variable to do the 
query. So the value will be overwritten by new one? Then we will miss former 
error counters if there are. Correct?

Regards,
Guchun

From: amd-gfx 
mailto:amd-gfx-boun...@lists.freedesktop.org>>
 On Behalf Of Zhang, Hawking
Sent: Thursday, January 2, 2020 8:38 PM
To: Clements, John mailto:john.cleme...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

UMC_REG_OFFSET(adev, ch_inst, umc_inst) and the function get_umc_reg_offset 
actually do the same thing? I guess you just want to keep either of them, right?

Regards,
Hawking

From: Clements, John mailto:john.cleme...@amd.com>>
Sent: Thursday, January 2, 2020 18:31
To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Zhang, 
Hawking mailto:hawking.zh...@amd.com>>; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

Added patch to resolve following issue where error counter detection was not 
iterating over all UMC instances/channels.
Removed support for accessing UMC error counters via MMIO.

Thank you,
John Clements


0001-drm-amdgpu-resolve-bug-in-UMC-6-error-counter-query.patch
Description: 0001-drm-amdgpu-resolve-bug-in-UMC-6-error-counter-query.patch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query

2020-01-02 Thread Zhou1, Tao
[AMD Public Use]

I think we can implement it by only updating amdgpu_umc_for_each_channel macro, 
here is an example:

#define amdgpu_umc_for_each_channel(func)\
struct ras_err_data *err_data = \
(struct ras_err_data *)ras_error_status;\
uint32_t umc_inst, channel_inst, umc_reg_offset, channel_index; \
for (umc_inst = 0; umc_inst < adev->umc.umc_inst_num;  \
umc_inst++) {   \
umc_reg_offset = adev->umc.inst_offs * umc_inst;   \
for (channel_inst = 0;  \
channel_inst < adev->umc.channel_inst_num; \
channel_inst++) {   \
/* get channel index of interleaved memory */   \
channel_index = adev->umc.channel_idx_tbl[\
umc_inst * adev->umc.channel_inst_num + 
channel_inst]; \
(func)(adev, err_data, umc_reg_offset, channel_index); \
/* increase register offset for next channel */ \
umc_reg_offset += adev->umc.channel_offs;  \
}   \
}

Regards,
Tao
From: Clements, John 
Sent: 2020年1月2日 18:31
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; 
Zhou1, Tao 
Subject: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

Added patch to resolve following issue where error counter detection was not 
iterating over all UMC instances/channels.
Removed support for accessing UMC error counters via MMIO.

Thank you,
John Clements
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query

2020-01-02 Thread Chen, Guchun
[AMD Public Use]

+#define UMC_REG_OFFSET(adev, ch_inst, umc_inst) ((adev)->umc.channel_offs * 
(ch_inst) + UMC_6_INST_DIST*(umc_inst))
Coding style problem, miss blank space around last "*".

+for (umc_inst = 0; umc_inst < adev->umc.umc_inst_num; umc_inst++)
+{
Another coding style problem. "{" should follow closely at the same line, not 
starting at one new line.

Thirdly, in umc_v6_1_query_ras_error_count, we use dual loops for query error 
counter for all UMC channels. But we always use the same variable to do the 
query. So the value will be overwritten by new one? Then we will miss former 
error counters if there are. Correct?

Regards,
Guchun

From: amd-gfx  On Behalf Of Zhang, 
Hawking
Sent: Thursday, January 2, 2020 8:38 PM
To: Clements, John ; amd-gfx@lists.freedesktop.org; 
Zhou1, Tao 
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

UMC_REG_OFFSET(adev, ch_inst, umc_inst) and the function get_umc_reg_offset 
actually do the same thing? I guess you just want to keep either of them, right?

Regards,
Hawking

From: Clements, John mailto:john.cleme...@amd.com>>
Sent: Thursday, January 2, 2020 18:31
To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Zhang, 
Hawking mailto:hawking.zh...@amd.com>>; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

Added patch to resolve following issue where error counter detection was not 
iterating over all UMC instances/channels.
Removed support for accessing UMC error counters via MMIO.

Thank you,
John Clements
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query

2020-01-02 Thread Zhang, Hawking
[AMD Official Use Only - Internal Distribution Only]

UMC_REG_OFFSET(adev, ch_inst, umc_inst) and the function get_umc_reg_offset 
actually do the same thing? I guess you just want to keep either of them, right?

Regards,
Hawking

From: Clements, John 
Sent: Thursday, January 2, 2020 18:31
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; 
Zhou1, Tao 
Subject: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

Added patch to resolve following issue where error counter detection was not 
iterating over all UMC instances/channels.
Removed support for accessing UMC error counters via MMIO.

Thank you,
John Clements
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query

2020-01-02 Thread Clements, John
[AMD Official Use Only - Internal Distribution Only]

Added patch to resolve following issue where error counter detection was not 
iterating over all UMC instances/channels.
Removed support for accessing UMC error counters via MMIO.

Thank you,
John Clements


0001-drm-amdgpu-resolve-bug-in-UMC-6-error-counter-query.patch
Description: 0001-drm-amdgpu-resolve-bug-in-UMC-6-error-counter-query.patch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx