RE: [PATCH] drm/amdgpu: added function to wait for PSP BL availability

2020-01-02 Thread Chen, Guchun
[AMD Official Use Only - Internal Distribution Only]

- /* there might be handshake issue with hardware which needs delay */
- mdelay(20);

To be safe, I don't think we should remove this delay. Besides, it has nothing
to do with the code refinement in this patch.

Regards,
Guchun

From: amd-gfx  On Behalf Of Clements, 
John
Sent: Friday, January 3, 2020 3:29 PM
To: Zhang, Hawking ; amd-gfx@lists.freedesktop.org
Subject: [PATCH] drm/amdgpu: added function to wait for PSP BL availability


[AMD Official Use Only - Internal Distribution Only]

Added dedicated function to wait for PSP BL availability.

Increased driver wait time for PSP BL availability.

Thank you,
John Clements
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: removed GFX RAS support check in UMC ECC callback

2020-01-02 Thread Clements, John
[AMD Official Use Only - Internal Distribution Only]

Enable path for GPU recovery in event of UMC uncorrectable error.

Thank you,
John Clements


0001-drm-amdgpu-removed-GFX-RAS-support-check-in-UMC-ECC-.patch
Description: 0001-drm-amdgpu-removed-GFX-RAS-support-check-in-UMC-ECC-.patch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: added function to wait for PSP BL availability

2020-01-02 Thread Clements, John
[AMD Official Use Only - Internal Distribution Only]

Added dedicated function to wait for PSP BL availability.

Increased driver wait time for PSP BL availability.

Thank you,
John Clements


0001-drm-amdgpu-added-function-to-wait-for-PSP-BL-availab.patch
Description: 0001-drm-amdgpu-added-function-to-wait-for-PSP-BL-availab.patch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: disable PSP XGMI TA unload sequence

2020-01-02 Thread Clements, John
[AMD Official Use Only - Internal Distribution Only]

Disabled PSP XGMI TA unload sequence as it currently causes GPU recovery to 
fail.

Thank you,
John Clements


0001-drm-amdgpu-disable-PSP-XGMI-TA-unload-sequence.patch
Description: 0001-drm-amdgpu-disable-PSP-XGMI-TA-unload-sequence.patch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Error] amdgpu powerplay ip block error with -22.

2020-01-02 Thread Yusuf Altıparmak
Hello,

Still can't find a solution to this. Can anyone help me with it?

Regards.


Yusuf Altıparmak  wrote the following on Thu, 2 Jan 2020 at 13:29:

> Hello,
>
> First you could check if the binary ‘polaris12_smc.bin’ is in your system:
>> /lib/firmware/../amdgpu/
>>
> 'polaris12_smc.bin' exists in my /lib/firmware/amdgpu folder. There are
> also 18 other binaries that start with 'polaris12_'.
>
> If it’s there, then does this happen after a warm reset?
>>
>
> This does happen when booting up the board with a ramdisk image (initramfs
> stage).
>
> Regards.
>
>>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2] drm/dp_mst: correct the shifting in DP_REMOTE_I2C_READ

2020-01-02 Thread Wayne Lin
[Why]
According to the DP spec, NO_STOP_BIT should be shifted left by 4 bits in
the REMOTE_I2C_READ message, not 5 bits.

In the current code NO_STOP_BIT is always set to zero, which means the I2C
master always generates an I2C STOP at the end of each I2C write
transaction while handling the REMOTE_I2C_READ sideband message. The
generated I2C signal may therefore not meet the requirement. Take an I2C
random read for instance: the I2C master should generate a repeated START
to begin reading data after writing the read address. This bug makes the
I2C master generate a STOP followed by a START rather than a repeated
START, which is not what an I2C random read expects.
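
Purely for illustration (not part of the change itself), the last byte of each
encoded write transaction ends up laid out as below, which is what the one-line
fix targets; the field names are abbreviated here:

	/* bit 4     : NO_STOP_BIT - when set, no STOP is generated and a
	 *             repeated START follows
	 * bits 3..0 : I2C_TRANSACTION_DELAY
	 */
	buf[idx]  = (no_stop_bit & 0x1) << 4;
	buf[idx] |= (i2c_transaction_delay & 0xf);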

[How]
Correct the shifting value of NO_STOP_BIT for DP_REMOTE_I2C_READ case in
drm_dp_encode_sideband_req().

Changes since v1 (https://patchwork.kernel.org/patch/11312667/):
* Add more description to the commit message and Cc stable

Fixes: ad7f8a1f9ce ("drm/helper: add Displayport multi-stream helper (v0.6)")
Reviewed-by: Harry Wentland 
Signed-off-by: Wayne Lin 
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 1cf5f8b8bbb8..9d24c98bece1 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -393,7 +393,7 @@ drm_dp_encode_sideband_req(const struct drm_dp_sideband_msg_req_body *req,
 			memcpy(&buf[idx],
 			       req->u.i2c_read.transactions[i].bytes,
 			       req->u.i2c_read.transactions[i].num_bytes);
 			idx += req->u.i2c_read.transactions[i].num_bytes;
 
-			buf[idx] = (req->u.i2c_read.transactions[i].no_stop_bit & 0x1) << 5;
+			buf[idx] = (req->u.i2c_read.transactions[i].no_stop_bit & 0x1) << 4;
 			buf[idx] |= (req->u.i2c_read.transactions[i].i2c_transaction_delay & 0xf);
 			idx++;
 		}
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 5/5] drm/amdgpu: immediately invalidate PTEs

2020-01-02 Thread Sierra Guiza, Alejandro (Alex)
[AMD Official Use Only - Internal Distribution Only]

Hi Christian, 
I wonder if you had a chance to look into this warning. 
Please let me know if there's something we could help with.

Regards,
Alejandro

-Original Message-
From: Christian König  
Sent: Thursday, December 12, 2019 2:52 AM
To: Kuehling, Felix ; amd-gfx@lists.freedesktop.org; 
Yang, Philip ; Sierra Guiza, Alejandro (Alex) 

Subject: Re: [PATCH 5/5] drm/amdgpu: immediately invalidate PTEs

[CAUTION: External Email]

Hi Felix,

yeah, I've also found a corner case which would raise a warning now.

Need to rework how dependencies for the PTE update are generated.

Going to take care of this in the next few days, Christian.

Am 12.12.19 um 01:20 schrieb Felix Kuehling:
> Hi Christian,
>
> Alex started trying to invalidate PTEs in the MMU notifiers and we're 
> finding that we still need to reserve the VM reservation for 
> amdgpu_sync_resv in amdgpu_vm_sdma_prepare. Is that sync_resv still 
> needed now, given that VM fences aren't in that reservation object any 
> more?
>
> Regards,
>   Felix
>
> On 2019-12-05 5:39, Christian König wrote:
>> When a BO is evicted, immediately invalidate the mapped PTEs.
>>
>> Signed-off-by: Christian König 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 17 -
>>   1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 839d6df394fc..e578113bfd55 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -2565,6 +2565,7 @@ void amdgpu_vm_bo_invalidate(struct 
>> amdgpu_device *adev,
>>struct amdgpu_bo *bo, bool evicted)
>>   {
>>   struct amdgpu_vm_bo_base *bo_base;
>> +int r;
>> /* shadow bo doesn't have bo base, its validation needs its parent */
>>   if (bo->parent && bo->parent->shadow == bo)
>> @@ -2572,8 +2573,22 @@ void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
>> for (bo_base = bo->vm_bo; bo_base; bo_base = bo_base->next) {
>>   struct amdgpu_vm *vm = bo_base->vm;
>> +struct dma_resv *resv = vm->root.base.bo->tbo.base.resv;
>> +
>> +if (bo->tbo.type != ttm_bo_type_kernel) {
>> +struct amdgpu_bo_va *bo_va;
>> +
>> +bo_va = container_of(bo_base, struct amdgpu_bo_va,
>> + base);
>> +r = amdgpu_vm_bo_update(adev, bo_va,
>> +bo->tbo.base.resv != resv);
>> +if (!r) {
>> +amdgpu_vm_bo_idle(bo_base);
>> +continue;
>> +}
>> +}
>>   -if (evicted && bo->tbo.base.resv == vm->root.base.bo->tbo.base.resv) {
>> +if (evicted && bo->tbo.base.resv == resv) {
>>   amdgpu_vm_bo_evicted(bo_base);
>>   continue;
>>   }
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query

2020-01-02 Thread Clements, John
[AMD Public Use]

Hello GuChun,

Good point, it makes sense to make the function static inline here. I think I
shall also rename the function from get_umc_reg_offset to get_umc_6_reg_offset.

Thank you,
John Clements

From: Chen, Guchun 
Sent: Friday, January 3, 2020 11:09 AM
To: Clements, John ; Zhang, Hawking 
; amd-gfx@lists.freedesktop.org; Zhou1, Tao 

Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Public Use]

Yes, John, that concern is cleared after looking into the code.

One more issue: wouldn't it be better to make get_umc_reg_offset a static
inline function? With that fixed, the patch is: Reviewed-by: Guchun
Chen <guchun.c...@amd.com>

uint32_t get_umc_reg_offset(struct amdgpu_device *adev,
+ uint32_t umc_inst,
+ uint32_t ch_inst)

Regards,
Guchun

From: Clements, John mailto:john.cleme...@amd.com>>
Sent: Friday, January 3, 2020 10:58 AM
To: Chen, Guchun mailto:guchun.c...@amd.com>>; Zhang, 
Hawking mailto:hawking.zh...@amd.com>>; 
amd-gfx@lists.freedesktop.org; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Public Use]

Hello GuChun/Hawking,

Thank you for your feedback, I have updated the patch with the following 
amendments:

  *   Remove +#define UMC_REG_OFFSET (I forgot to remove this in original 
patch, I prefer the function over the macro)
  *   Updated the coding style of the braces in the for loops to have the 
starting brace on the same line as the for loop declaration

GuChun,
Regarding your concern about umc_v6_1_query_ras_error_count: when reading the
UE/CE error counter registers, the local SW error counters are only ever
incremented, never cleared, throughout the iteration over the UMC error
counter registers.

Thank you,
John Clements

From: Chen, Guchun mailto:guchun.c...@amd.com>>
Sent: Friday, January 3, 2020 9:07 AM
To: Zhang, Hawking mailto:hawking.zh...@amd.com>>; 
Clements, John mailto:john.cleme...@amd.com>>; 
amd-gfx@lists.freedesktop.org; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Public Use]

+#define UMC_REG_OFFSET(adev, ch_inst, umc_inst) ((adev)->umc.channel_offs * (ch_inst) + UMC_6_INST_DIST*(umc_inst))
Coding style problem: missing blank spaces around the last "*".

+for (umc_inst = 0; umc_inst < adev->umc.umc_inst_num; umc_inst++)
+{
Another coding style problem: the "{" should be on the same line as the for loop declaration, not on a new line of its own.

Thirdly, in umc_v6_1_query_ras_error_count we use nested loops to query the
error counters for all UMC channels, but we always use the same variable for
the query. So will the value be overwritten by the new one? Then we would miss
the earlier error counters, if there are any. Correct?

Regards,
Guchun

From: amd-gfx 
mailto:amd-gfx-boun...@lists.freedesktop.org>>
 On Behalf Of Zhang, Hawking
Sent: Thursday, January 2, 2020 8:38 PM
To: Clements, John mailto:john.cleme...@amd.com>>; 
amd-gfx@lists.freedesktop.org; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

UMC_REG_OFFSET(adev, ch_inst, umc_inst) and the function get_umc_reg_offset 
actually do the same thing? I guess you just want to keep either of them, right?

Regards,
Hawking

From: Clements, John mailto:john.cleme...@amd.com>>
Sent: Thursday, January 2, 2020 18:31
To: amd-gfx@lists.freedesktop.org; Zhang, 
Hawking mailto:hawking.zh...@amd.com>>; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

Added patch to resolve following issue where error counter detection was not 
iterating over all UMC instances/channels.
Removed support for accessing UMC error counters via MMIO.

Thank you,
John Clements
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query

2020-01-02 Thread Chen, Guchun
[AMD Public Use]

Yes, John, that concern is cleared after looking into the code.

One more issue: wouldn't it be better to make get_umc_reg_offset a static
inline function? With that fixed, the patch is: Reviewed-by: Guchun
Chen 

uint32_t get_umc_reg_offset(struct amdgpu_device *adev,
+ uint32_t umc_inst,
+ uint32_t ch_inst)

Regards,
Guchun

From: Clements, John 
Sent: Friday, January 3, 2020 10:58 AM
To: Chen, Guchun ; Zhang, Hawking ; 
amd-gfx@lists.freedesktop.org; Zhou1, Tao 
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Public Use]

Hello GuChun/Hawking,

Thank you for your feedback, I have updated the patch with the following 
amendments:

  *   Remove +#define UMC_REG_OFFSET (I forgot to remove this in original 
patch, I prefer the function over the macro)
  *   Updated the coding style of the braces in the for loops to have the 
starting brace on the same line as the for loop declaration

GuChun,
Regarding your concern about umc_v6_1_query_ras_error_count: when reading the
UE/CE error counter registers, the local SW error counters are only ever
incremented, never cleared, throughout the iteration over the UMC error
counter registers.

Thank you,
John Clements

From: Chen, Guchun mailto:guchun.c...@amd.com>>
Sent: Friday, January 3, 2020 9:07 AM
To: Zhang, Hawking mailto:hawking.zh...@amd.com>>; 
Clements, John mailto:john.cleme...@amd.com>>; 
amd-gfx@lists.freedesktop.org; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Public Use]

+#define UMC_REG_OFFSET(adev, ch_inst, umc_inst) ((adev)->umc.channel_offs * (ch_inst) + UMC_6_INST_DIST*(umc_inst))
Coding style problem: missing blank spaces around the last "*".

+for (umc_inst = 0; umc_inst < adev->umc.umc_inst_num; umc_inst++)
+{
Another coding style problem: the "{" should be on the same line as the for loop declaration, not on a new line of its own.

Thirdly, in umc_v6_1_query_ras_error_count we use nested loops to query the
error counters for all UMC channels, but we always use the same variable for
the query. So will the value be overwritten by the new one? Then we would miss
the earlier error counters, if there are any. Correct?

Regards,
Guchun

From: amd-gfx 
mailto:amd-gfx-boun...@lists.freedesktop.org>>
 On Behalf Of Zhang, Hawking
Sent: Thursday, January 2, 2020 8:38 PM
To: Clements, John mailto:john.cleme...@amd.com>>; 
amd-gfx@lists.freedesktop.org; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

UMC_REG_OFFSET(adev, ch_inst, umc_inst) and the function get_umc_reg_offset 
actually do the same thing? I guess you just want to keep either of them, right?

Regards,
Hawking

From: Clements, John mailto:john.cleme...@amd.com>>
Sent: Thursday, January 2, 2020 18:31
To: amd-gfx@lists.freedesktop.org; Zhang, 
Hawking mailto:hawking.zh...@amd.com>>; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

Added patch to resolve following issue where error counter detection was not 
iterating over all UMC instances/channels.
Removed support for accessing UMC error counters via MMIO.

Thank you,
John Clements
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query

2020-01-02 Thread Clements, John
[AMD Public Use]

Hello Tao,

That is an interesting suggestion. I agree that there is a little bit of
duplicate code, with the same for loops being used in multiple functions.

My only concern with implementing the loops in a macro is code readability.

I'll have to think some more about the trade-off between the duplicated code
and readability.

Thank you,
John Clements

From: Zhou1, Tao 
Sent: Friday, January 3, 2020 10:53 AM
To: Clements, John ; amd-gfx@lists.freedesktop.org; 
Zhang, Hawking 
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Public Use]

I think we can implement it by only updating amdgpu_umc_for_each_channel macro, 
here is an example:

#define amdgpu_umc_for_each_channel(func)\
struct ras_err_data *err_data = \
(struct ras_err_data *)ras_error_status;\
uint32_t umc_inst, channel_inst, umc_reg_offset, channel_index; \
for (umc_inst = 0; umc_inst < adev->umc.umc_inst_num;  \
umc_inst++) {   \
umc_reg_offset = adev->umc.inst_offs * umc_inst;   \
for (channel_inst = 0;  \
channel_inst < adev->umc.channel_inst_num; \
channel_inst++) {   \
/* get channel index of interleaved memory */   \
channel_index = adev->umc.channel_idx_tbl[\
umc_inst * adev->umc.channel_inst_num + 
channel_inst]; \
(func)(adev, err_data, umc_reg_offset, channel_index); \
/* increase register offset for next channel */ \
umc_reg_offset += adev->umc.channel_offs;  \
}   \
}

Regards,
Tao
From: Clements, John mailto:john.cleme...@amd.com>>
Sent: 2020年1月2日 18:31
To: amd-gfx@lists.freedesktop.org; Zhang, 
Hawking mailto:hawking.zh...@amd.com>>; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

Added patch to resolve following issue where error counter detection was not 
iterating over all UMC instances/channels.
Removed support for accessing UMC error counters via MMIO.

Thank you,
John Clements
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query

2020-01-02 Thread Clements, John
[AMD Public Use]

Hello GuChun/Hawking,

Thank you for your feedback, I have updated the patch with the following 
amendments:

  *   Remove +#define UMC_REG_OFFSET (I forgot to remove this in original 
patch, I prefer the function over the macro)
  *   Updated the coding style of the braces in the for loops to have the 
starting brace on the same line as the for loop declaration

GuChun,
Regarding your concern about umc_v6_1_query_ras_error_count: when reading the
UE/CE error counter registers, the local SW error counters are only ever
incremented, never cleared, throughout the iteration over the UMC error
counter registers.
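
To make that concrete, a rough sketch of the accumulation pattern (the variable
names on the right are illustrative, not the exact ones in the patch):

	/* inside the per-instance/per-channel loops the SW counters only grow */
	err_data->ce_count += ce_count_for_this_channel;
	err_data->ue_count += ue_count_for_this_channel;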

Thank you,
John Clements

From: Chen, Guchun 
Sent: Friday, January 3, 2020 9:07 AM
To: Zhang, Hawking ; Clements, John 
; amd-gfx@lists.freedesktop.org; Zhou1, Tao 

Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Public Use]

+#define UMC_REG_OFFSET(adev, ch_inst, umc_inst) ((adev)->umc.channel_offs * (ch_inst) + UMC_6_INST_DIST*(umc_inst))
Coding style problem: missing blank spaces around the last "*".

+for (umc_inst = 0; umc_inst < adev->umc.umc_inst_num; umc_inst++)
+{
Another coding style problem: the "{" should be on the same line as the for loop declaration, not on a new line of its own.

Thirdly, in umc_v6_1_query_ras_error_count we use nested loops to query the
error counters for all UMC channels, but we always use the same variable for
the query. So will the value be overwritten by the new one? Then we would miss
the earlier error counters, if there are any. Correct?

Regards,
Guchun

From: amd-gfx 
mailto:amd-gfx-boun...@lists.freedesktop.org>>
 On Behalf Of Zhang, Hawking
Sent: Thursday, January 2, 2020 8:38 PM
To: Clements, John mailto:john.cleme...@amd.com>>; 
amd-gfx@lists.freedesktop.org; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

UMC_REG_OFFSET(adev, ch_inst, umc_inst) and the function get_umc_reg_offset 
actually do the same thing? I guess you just want to keep either of them, right?

Regards,
Hawking

From: Clements, John mailto:john.cleme...@amd.com>>
Sent: Thursday, January 2, 2020 18:31
To: amd-gfx@lists.freedesktop.org; Zhang, 
Hawking mailto:hawking.zh...@amd.com>>; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

Added patch to resolve following issue where error counter detection was not 
iterating over all UMC instances/channels.
Removed support for accessing UMC error counters via MMIO.

Thank you,
John Clements


0001-drm-amdgpu-resolve-bug-in-UMC-6-error-counter-query.patch
Description: 0001-drm-amdgpu-resolve-bug-in-UMC-6-error-counter-query.patch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query

2020-01-02 Thread Zhou1, Tao
[AMD Public Use]

I think we can implement it by only updating the amdgpu_umc_for_each_channel
macro; here is an example:

#define amdgpu_umc_for_each_channel(func)	\
	struct ras_err_data *err_data =	\
		(struct ras_err_data *)ras_error_status;	\
	uint32_t umc_inst, channel_inst, umc_reg_offset, channel_index;	\
	for (umc_inst = 0; umc_inst < adev->umc.umc_inst_num;	\
	     umc_inst++) {	\
		umc_reg_offset = adev->umc.inst_offs * umc_inst;	\
		for (channel_inst = 0;	\
		     channel_inst < adev->umc.channel_inst_num;	\
		     channel_inst++) {	\
			/* get channel index of interleaved memory */	\
			channel_index = adev->umc.channel_idx_tbl[	\
				umc_inst * adev->umc.channel_inst_num +	\
				channel_inst];	\
			(func)(adev, err_data, umc_reg_offset, channel_index);	\
			/* increase register offset for next channel */	\
			umc_reg_offset += adev->umc.channel_offs;	\
		}	\
	}
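
For reference, a caller would then collapse to roughly the following (sketch
only; the per-channel helper name is illustrative, not an existing function):

static void umc_v6_1_query_ras_error_count(struct amdgpu_device *adev,
					   void *ras_error_status)
{
	/* the helper reads the CE/UE counters of one channel and
	 * accumulates them into err_data */
	amdgpu_umc_for_each_channel(query_error_count_per_channel);
}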

Regards,
Tao
From: Clements, John 
Sent: 2020年1月2日 18:31
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; 
Zhou1, Tao 
Subject: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

Added patch to resolve following issue where error counter detection was not 
iterating over all UMC instances/channels.
Removed support for accessing UMC error counters via MMIO.

Thank you,
John Clements
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query

2020-01-02 Thread Chen, Guchun
[AMD Public Use]

+#define UMC_REG_OFFSET(adev, ch_inst, umc_inst) ((adev)->umc.channel_offs * (ch_inst) + UMC_6_INST_DIST*(umc_inst))
Coding style problem: missing blank spaces around the last "*".

+for (umc_inst = 0; umc_inst < adev->umc.umc_inst_num; umc_inst++)
+{
Another coding style problem: the "{" should be on the same line as the for loop declaration, not on a new line of its own.

Thirdly, in umc_v6_1_query_ras_error_count we use nested loops to query the
error counters for all UMC channels, but we always use the same variable for
the query. So will the value be overwritten by the new one? Then we would miss
the earlier error counters, if there are any. Correct?

Regards,
Guchun

From: amd-gfx  On Behalf Of Zhang, 
Hawking
Sent: Thursday, January 2, 2020 8:38 PM
To: Clements, John ; amd-gfx@lists.freedesktop.org; 
Zhou1, Tao 
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

UMC_REG_OFFSET(adev, ch_inst, umc_inst) and the function get_umc_reg_offset 
actually do the same thing? I guess you just want to keep either of them, right?

Regards,
Hawking

From: Clements, John mailto:john.cleme...@amd.com>>
Sent: Thursday, January 2, 2020 18:31
To: amd-gfx@lists.freedesktop.org; Zhang, 
Hawking mailto:hawking.zh...@amd.com>>; Zhou1, Tao 
mailto:tao.zh...@amd.com>>
Subject: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

Added patch to resolve following issue where error counter detection was not 
iterating over all UMC instances/channels.
Removed support for accessing UMC error counters via MMIO.

Thank you,
John Clements
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/5] drm/amdgpu: Avoid reclaim fs while eviction lock

2020-01-02 Thread Yong Zhao

One comment inline.

On 2020-01-02 4:11 p.m., Alex Sierra wrote:

[Why]
Avoid reclaim filesystem while eviction lock is held called from
MMU notifier.

[How]
Setting PF_MEMALLOC_NOFS flags while eviction mutex is locked.
Using memalloc_nofs_save / memalloc_nofs_restore API.

Change-Id: I5531c9337836e7d4a430df3f16dcc82888e8018c
Signed-off-by: Alex Sierra 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 40 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  6 +++-
  2 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index b999b67ff57a..d6aba4f9df74 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -82,6 +82,32 @@ struct amdgpu_prt_cb {
struct dma_fence_cb cb;
  };
  
+/**

+ * vm eviction_lock can be taken in MMU notifiers. Make sure no reclaim-FS
+ * happens while holding this lock anywhere to prevent deadlocks when
+ * an MMU notifier runs in reclaim-FS context.
+ */
+static inline void amdgpu_vm_eviction_lock(struct amdgpu_vm *vm)
+{
+   mutex_lock(>eviction_lock);
+   vm->saved_flags = memalloc_nofs_save();
[yz] I feel memalloc_nofs_save() should be called before mutex_lock(). 
Not too sure though.

+}
+
+static inline int amdgpu_vm_eviction_trylock(struct amdgpu_vm *vm)
+{
+   if (mutex_trylock(>eviction_lock)) {
+   vm->saved_flags = memalloc_nofs_save();
+   return 1;
+   }
+   return 0;
+}
+
+static inline void amdgpu_vm_eviction_unlock(struct amdgpu_vm *vm)
+{
+   memalloc_nofs_restore(vm->saved_flags);
+   mutex_unlock(>eviction_lock);
+}
+
  /**
   * amdgpu_vm_level_shift - return the addr shift for each level
   *
@@ -678,9 +704,9 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
}
}
  
-	mutex_lock(>eviction_lock);

+   amdgpu_vm_eviction_lock(vm);
vm->evicting = false;
-   mutex_unlock(>eviction_lock);
+   amdgpu_vm_eviction_unlock(vm);
  
  	return 0;

  }
@@ -1559,7 +1585,7 @@ static int amdgpu_vm_bo_update_mapping(struct 
amdgpu_device *adev,
if (!(flags & AMDGPU_PTE_VALID))
owner = AMDGPU_FENCE_OWNER_KFD;
  
-	mutex_lock(>eviction_lock);

+   amdgpu_vm_eviction_lock(vm);
if (vm->evicting) {
r = -EBUSY;
goto error_unlock;
@@ -1576,7 +1602,7 @@ static int amdgpu_vm_bo_update_mapping(struct 
amdgpu_device *adev,
r = vm->update_funcs->commit(, fence);
  
  error_unlock:

-   mutex_unlock(>eviction_lock);
+   amdgpu_vm_eviction_unlock(vm);
return r;
  }
  
@@ -2537,18 +2563,18 @@ bool amdgpu_vm_evictable(struct amdgpu_bo *bo)

return false;
  
  	/* Try to block ongoing updates */

-   if (!mutex_trylock(_base->vm->eviction_lock))
+   if (!amdgpu_vm_eviction_trylock(bo_base->vm))
return false;
  
  	/* Don't evict VM page tables while they are updated */

if (!dma_fence_is_signaled(bo_base->vm->last_direct) ||
!dma_fence_is_signaled(bo_base->vm->last_delayed)) {
-   mutex_unlock(_base->vm->eviction_lock);
+   amdgpu_vm_eviction_unlock(bo_base->vm);
return false;
}
  
  	bo_base->vm->evicting = true;

-   mutex_unlock(_base->vm->eviction_lock);
+   amdgpu_vm_eviction_unlock(bo_base->vm);
return true;
  }
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h

index 100547f094ff..c21a36bebc0c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -30,6 +30,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "amdgpu_sync.h"

  #include "amdgpu_ring.h"
@@ -242,9 +243,12 @@ struct amdgpu_vm {
/* tree of virtual addresses mapped */
struct rb_root_cached   va;
  
-	/* Lock to prevent eviction while we are updating page tables */

+   /* Lock to prevent eviction while we are updating page tables
+* use vm_eviction_lock/unlock(vm)
+*/
struct mutexeviction_lock;
boolevicting;
+   unsigned intsaved_flags;
  
  	/* BOs who needs a validation */

struct list_headevicted;

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/5] drm/amdgpu: export function to flush TLB via pasid

2020-01-02 Thread Yong Zhao

See one inline comment. Other than that:

Acked-by: Yong Zhao 

On 2020-01-02 4:11 p.m., Alex Sierra wrote:

This can be used directly from amdgpu and amdkfd to invalidate
TLB through pasid.
It supports gmc v7, v8, v9 and v10.

Change-Id: I6563a8eba2e42d1a67fa2547156c20da41d1e490
Signed-off-by: Alex Sierra 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h |  6 ++
  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 87 
  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c   | 33 +
  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c   | 34 ++
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   | 89 +
  5 files changed, 249 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index b499a3de8bb6..b6413a56f546 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -92,6 +92,9 @@ struct amdgpu_gmc_funcs {
/* flush the vm tlb via mmio */
void (*flush_gpu_tlb)(struct amdgpu_device *adev, uint32_t vmid,
uint32_t vmhub, uint32_t flush_type);
+   /* flush the vm tlb via pasid */
+   int (*flush_gpu_tlb_pasid)(struct amdgpu_device *adev, uint16_t pasid,
+   uint32_t flush_type, bool all_hub);
/* flush the vm tlb via ring */
uint64_t (*emit_flush_gpu_tlb)(struct amdgpu_ring *ring, unsigned vmid,
   uint64_t pd_addr);
@@ -216,6 +219,9 @@ struct amdgpu_gmc {
  };
  
  #define amdgpu_gmc_flush_gpu_tlb(adev, vmid, vmhub, type) ((adev)->gmc.gmc_funcs->flush_gpu_tlb((adev), (vmid), (vmhub), (type)))

+#define amdgpu_gmc_flush_gpu_tlb_pasid(adev, pasid, type, allhub) \
+   ((adev)->gmc.gmc_funcs->flush_gpu_tlb_pasid \
+   ((adev), (pasid), (type), (allhub)))
  #define amdgpu_gmc_emit_flush_gpu_tlb(r, vmid, addr) 
(r)->adev->gmc.gmc_funcs->emit_flush_gpu_tlb((r), (vmid), (addr))
  #define amdgpu_gmc_emit_pasid_mapping(r, vmid, pasid) 
(r)->adev->gmc.gmc_funcs->emit_pasid_mapping((r), (vmid), (pasid))
  #define amdgpu_gmc_map_mtype(adev, flags) 
(adev)->gmc.gmc_funcs->map_mtype((adev),(flags))
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index f5725336a5f2..11a2252e60f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -30,6 +30,8 @@
  #include "hdp/hdp_5_0_0_sh_mask.h"
  #include "gc/gc_10_1_0_sh_mask.h"
  #include "mmhub/mmhub_2_0_0_sh_mask.h"
+#include "athub/athub_2_0_0_sh_mask.h"
+#include "athub/athub_2_0_0_offset.h"
  #include "dcn/dcn_2_0_0_offset.h"
  #include "dcn/dcn_2_0_0_sh_mask.h"
  #include "oss/osssys_5_0_0_offset.h"
@@ -37,6 +39,7 @@
  #include "navi10_enum.h"
  
  #include "soc15.h"

+#include "soc15d.h"
  #include "soc15_common.h"
  
  #include "nbio_v2_3.h"

@@ -234,6 +237,48 @@ static bool gmc_v10_0_use_invalidate_semaphore(struct 
amdgpu_device *adev,
(!amdgpu_sriov_vf(adev)));
  }
  
+static bool gmc_v10_0_get_atc_vmid_pasid_mapping_info(

+   struct amdgpu_device *adev,
+   uint8_t vmid, uint16_t *p_pasid)
+{
+   uint32_t value;
+
+   value = RREG32(SOC15_REG_OFFSET(ATHUB, 0, mmATC_VMID0_PASID_MAPPING)
++ vmid);
+   *p_pasid = value & ATC_VMID0_PASID_MAPPING__PASID_MASK;
+
+   return !!(value & ATC_VMID0_PASID_MAPPING__VALID_MASK);
+}
+
+static int gmc_v10_0_invalidate_tlbs_with_kiq(struct amdgpu_device *adev,
+   uint16_t pasid, uint32_t flush_type,
+   bool all_hub)
+{
+   signed long r;
+   uint32_t seq;
+   struct amdgpu_ring *ring = >gfx.kiq.ring;
+
+   spin_lock(>gfx.kiq.ring_lock);
+   amdgpu_ring_alloc(ring, 12); /* fence + invalidate_tlbs package*/
+   amdgpu_ring_write(ring, PACKET3(PACKET3_INVALIDATE_TLBS, 0));
+   amdgpu_ring_write(ring,
+   PACKET3_INVALIDATE_TLBS_DST_SEL(1) |
+   PACKET3_INVALIDATE_TLBS_ALL_HUB(all_hub) |
+   PACKET3_INVALIDATE_TLBS_PASID(pasid) |
+   PACKET3_INVALIDATE_TLBS_FLUSH_TYPE(flush_type));
+   amdgpu_fence_emit_polling(ring, );
+   amdgpu_ring_commit(ring);
+   spin_unlock(>gfx.kiq.ring_lock);
+
+   r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
+   if (r < 1) {
+   DRM_ERROR("wait for kiq fence error: %ld.\n", r);
+   return -ETIME;
+   }
+
+   return 0;
+}
+
  /*
   * GART
   * VMID 0 is the physical GPU addresses as used by the kernel.
@@ -380,6 +425,47 @@ static void gmc_v10_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
DRM_ERROR("Error flushing GPU TLB using the SDMA (%d)!\n", r);
  }
  
+/**

+ * gmc_v10_0_flush_gpu_tlb_pasid - tlb flush via pasid
+ *
+ * @adev: amdgpu_device pointer
+ * @pasid: pasid to 

[PATCH 2/5] drm/amdgpu: export function to flush TLB via pasid

2020-01-02 Thread Alex Sierra
This can be used directly from amdgpu and amdkfd to invalidate the
TLB through pasid.
It supports gmc v7, v8, v9 and v10.
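
As a usage note, a call site would look roughly like the sketch below; the
surrounding check and arguments are illustrative, not part of this patch:

	/* flush all hubs for the given pasid, flush_type 0 */
	if (adev->gmc.gmc_funcs->flush_gpu_tlb_pasid)
		amdgpu_gmc_flush_gpu_tlb_pasid(adev, pasid, 0, true);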

Change-Id: I6563a8eba2e42d1a67fa2547156c20da41d1e490
Signed-off-by: Alex Sierra 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h |  6 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 87 
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c   | 33 +
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c   | 34 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   | 89 +
 5 files changed, 249 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index b499a3de8bb6..b6413a56f546 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -92,6 +92,9 @@ struct amdgpu_gmc_funcs {
/* flush the vm tlb via mmio */
void (*flush_gpu_tlb)(struct amdgpu_device *adev, uint32_t vmid,
uint32_t vmhub, uint32_t flush_type);
+   /* flush the vm tlb via pasid */
+   int (*flush_gpu_tlb_pasid)(struct amdgpu_device *adev, uint16_t pasid,
+   uint32_t flush_type, bool all_hub);
/* flush the vm tlb via ring */
uint64_t (*emit_flush_gpu_tlb)(struct amdgpu_ring *ring, unsigned vmid,
   uint64_t pd_addr);
@@ -216,6 +219,9 @@ struct amdgpu_gmc {
 };
 
 #define amdgpu_gmc_flush_gpu_tlb(adev, vmid, vmhub, type) 
((adev)->gmc.gmc_funcs->flush_gpu_tlb((adev), (vmid), (vmhub), (type)))
+#define amdgpu_gmc_flush_gpu_tlb_pasid(adev, pasid, type, allhub) \
+   ((adev)->gmc.gmc_funcs->flush_gpu_tlb_pasid \
+   ((adev), (pasid), (type), (allhub)))
 #define amdgpu_gmc_emit_flush_gpu_tlb(r, vmid, addr) 
(r)->adev->gmc.gmc_funcs->emit_flush_gpu_tlb((r), (vmid), (addr))
 #define amdgpu_gmc_emit_pasid_mapping(r, vmid, pasid) 
(r)->adev->gmc.gmc_funcs->emit_pasid_mapping((r), (vmid), (pasid))
 #define amdgpu_gmc_map_mtype(adev, flags) 
(adev)->gmc.gmc_funcs->map_mtype((adev),(flags))
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index f5725336a5f2..11a2252e60f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -30,6 +30,8 @@
 #include "hdp/hdp_5_0_0_sh_mask.h"
 #include "gc/gc_10_1_0_sh_mask.h"
 #include "mmhub/mmhub_2_0_0_sh_mask.h"
+#include "athub/athub_2_0_0_sh_mask.h"
+#include "athub/athub_2_0_0_offset.h"
 #include "dcn/dcn_2_0_0_offset.h"
 #include "dcn/dcn_2_0_0_sh_mask.h"
 #include "oss/osssys_5_0_0_offset.h"
@@ -37,6 +39,7 @@
 #include "navi10_enum.h"
 
 #include "soc15.h"
+#include "soc15d.h"
 #include "soc15_common.h"
 
 #include "nbio_v2_3.h"
@@ -234,6 +237,48 @@ static bool gmc_v10_0_use_invalidate_semaphore(struct 
amdgpu_device *adev,
(!amdgpu_sriov_vf(adev)));
 }
 
+static bool gmc_v10_0_get_atc_vmid_pasid_mapping_info(
+   struct amdgpu_device *adev,
+   uint8_t vmid, uint16_t *p_pasid)
+{
+   uint32_t value;
+
+   value = RREG32(SOC15_REG_OFFSET(ATHUB, 0, mmATC_VMID0_PASID_MAPPING)
++ vmid);
+   *p_pasid = value & ATC_VMID0_PASID_MAPPING__PASID_MASK;
+
+   return !!(value & ATC_VMID0_PASID_MAPPING__VALID_MASK);
+}
+
+static int gmc_v10_0_invalidate_tlbs_with_kiq(struct amdgpu_device *adev,
+   uint16_t pasid, uint32_t flush_type,
+   bool all_hub)
+{
+   signed long r;
+   uint32_t seq;
+   struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
+
+   spin_lock(&adev->gfx.kiq.ring_lock);
+   amdgpu_ring_alloc(ring, 12); /* fence + invalidate_tlbs package*/
+   amdgpu_ring_write(ring, PACKET3(PACKET3_INVALIDATE_TLBS, 0));
+   amdgpu_ring_write(ring,
+   PACKET3_INVALIDATE_TLBS_DST_SEL(1) |
+   PACKET3_INVALIDATE_TLBS_ALL_HUB(all_hub) |
+   PACKET3_INVALIDATE_TLBS_PASID(pasid) |
+   PACKET3_INVALIDATE_TLBS_FLUSH_TYPE(flush_type));
+   amdgpu_fence_emit_polling(ring, &seq);
+   amdgpu_ring_commit(ring);
+   spin_unlock(&adev->gfx.kiq.ring_lock);
+
+   r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
+   if (r < 1) {
+   DRM_ERROR("wait for kiq fence error: %ld.\n", r);
+   return -ETIME;
+   }
+
+   return 0;
+}
+
 /*
  * GART
  * VMID 0 is the physical GPU addresses as used by the kernel.
@@ -380,6 +425,47 @@ static void gmc_v10_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
DRM_ERROR("Error flushing GPU TLB using the SDMA (%d)!\n", r);
 }
 
+/**
+ * gmc_v10_0_flush_gpu_tlb_pasid - tlb flush via pasid
+ *
+ * @adev: amdgpu_device pointer
+ * @pasid: pasid to be flush
+ *
+ * Flush the TLB for the requested pasid.
+ */
+static int gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
+  

[PATCH 1/5] drm/amdgpu: Avoid reclaim fs while eviction lock

2020-01-02 Thread Alex Sierra
[Why]
Avoid filesystem reclaim while the eviction lock is held, because the lock
can also be taken from an MMU notifier.

[How]
Set the PF_MEMALLOC_NOFS flag while the eviction mutex is locked, using the
memalloc_nofs_save / memalloc_nofs_restore API.

Change-Id: I5531c9337836e7d4a430df3f16dcc82888e8018c
Signed-off-by: Alex Sierra 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 40 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  6 +++-
 2 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index b999b67ff57a..d6aba4f9df74 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -82,6 +82,32 @@ struct amdgpu_prt_cb {
struct dma_fence_cb cb;
 };
 
+/**
+ * vm eviction_lock can be taken in MMU notifiers. Make sure no reclaim-FS
+ * happens while holding this lock anywhere to prevent deadlocks when
+ * an MMU notifier runs in reclaim-FS context.
+ */
+static inline void amdgpu_vm_eviction_lock(struct amdgpu_vm *vm)
+{
+   mutex_lock(&vm->eviction_lock);
+   vm->saved_flags = memalloc_nofs_save();
+}
+
+static inline int amdgpu_vm_eviction_trylock(struct amdgpu_vm *vm)
+{
+   if (mutex_trylock(&vm->eviction_lock)) {
+   vm->saved_flags = memalloc_nofs_save();
+   return 1;
+   }
+   return 0;
+}
+
+static inline void amdgpu_vm_eviction_unlock(struct amdgpu_vm *vm)
+{
+   memalloc_nofs_restore(vm->saved_flags);
+   mutex_unlock(&vm->eviction_lock);
+}
+
 /**
  * amdgpu_vm_level_shift - return the addr shift for each level
  *
@@ -678,9 +704,9 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
}
}
 
-   mutex_lock(&vm->eviction_lock);
+   amdgpu_vm_eviction_lock(vm);
vm->evicting = false;
-   mutex_unlock(&vm->eviction_lock);
+   amdgpu_vm_eviction_unlock(vm);
 
return 0;
 }
@@ -1559,7 +1585,7 @@ static int amdgpu_vm_bo_update_mapping(struct 
amdgpu_device *adev,
if (!(flags & AMDGPU_PTE_VALID))
owner = AMDGPU_FENCE_OWNER_KFD;
 
-   mutex_lock(&vm->eviction_lock);
+   amdgpu_vm_eviction_lock(vm);
if (vm->evicting) {
r = -EBUSY;
goto error_unlock;
@@ -1576,7 +1602,7 @@ static int amdgpu_vm_bo_update_mapping(struct 
amdgpu_device *adev,
r = vm->update_funcs->commit(, fence);
 
 error_unlock:
-   mutex_unlock(&vm->eviction_lock);
+   amdgpu_vm_eviction_unlock(vm);
return r;
 }
 
@@ -2537,18 +2563,18 @@ bool amdgpu_vm_evictable(struct amdgpu_bo *bo)
return false;
 
/* Try to block ongoing updates */
-   if (!mutex_trylock(&bo_base->vm->eviction_lock))
+   if (!amdgpu_vm_eviction_trylock(bo_base->vm))
return false;
 
/* Don't evict VM page tables while they are updated */
if (!dma_fence_is_signaled(bo_base->vm->last_direct) ||
!dma_fence_is_signaled(bo_base->vm->last_delayed)) {
-   mutex_unlock(&bo_base->vm->eviction_lock);
+   amdgpu_vm_eviction_unlock(bo_base->vm);
return false;
}
 
bo_base->vm->evicting = true;
-   mutex_unlock(&bo_base->vm->eviction_lock);
+   amdgpu_vm_eviction_unlock(bo_base->vm);
return true;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 100547f094ff..c21a36bebc0c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "amdgpu_sync.h"
 #include "amdgpu_ring.h"
@@ -242,9 +243,12 @@ struct amdgpu_vm {
/* tree of virtual addresses mapped */
struct rb_root_cached   va;
 
-   /* Lock to prevent eviction while we are updating page tables */
+   /* Lock to prevent eviction while we are updating page tables
+* use vm_eviction_lock/unlock(vm)
+*/
struct mutexeviction_lock;
boolevicting;
+   unsigned intsaved_flags;
 
/* BOs who needs a validation */
struct list_headevicted;
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/1] drm/amdgpu: fix ctx init failure for asics without gfx ring

2020-01-02 Thread Christian König

Am 02.01.20 um 10:47 schrieb Nirmoy:


On 1/1/20 1:52 PM, Christian König wrote:

Am 19.12.19 um 13:01 schrieb Nirmoy:

Reviewed-by: Nirmoy Das 

On 12/19/19 12:42 PM, Le Ma wrote:
This workaround does not affect other asics because amdgpu only needs to
expose one gfx sched to the user for now.

Change-Id: Ica92b8565a89899aebe0eba7b2b5a25159b411d3
Signed-off-by: Le Ma 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c

index 63f6365..64e2bab 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -127,7 +127,8 @@ static int amdgpu_ctx_init(struct amdgpu_device 
*adev,

    switch (i) {
  case AMDGPU_HW_IP_GFX:
-    scheds = adev->gfx.gfx_sched;
+    sched = &adev->gfx.gfx_ring[0].sched;
+    scheds = &sched;
  num_scheds = 1;


Mhm, we should probably rather fix this here and not expose a GFX ring
when the hardware doesn't have one.

Hi Christian,

Do you mean by not initializing the gfx entity when it is not available?


Well we still initialize it, but with num_scheds=0.
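
Roughly something like this (just a sketch of the idea, not the final patch):

	case AMDGPU_HW_IP_GFX:
		scheds = adev->gfx.gfx_sched;
		/* an asic without a gfx ring then simply exposes no scheduler */
		num_scheds = adev->gfx.num_gfx_rings ? 1 : 0;
		break;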

Christian.






Christian.


  break;
  case AMDGPU_HW_IP_COMPUTE:

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2] drm/amd/display: Reduce HDMI pixel encoding if max clock is exceeded

2020-01-02 Thread Harry Wentland
On 2019-12-02 4:47 p.m., Thomas Anderson wrote:
> For high-res (8K) or HFR (4K120) displays, using uncompressed pixel
> formats like YCbCr444 would exceed the bandwidth of HDMI 2.0, so the
> "interesting" modes would be disabled, leaving only low-res or low
> framerate modes.
> 
> This change lowers the pixel encoding to 4:2:2 or 4:2:0 if the max TMDS
> clock is exceeded. Verified that 8K30 and 4K120 are now available and
> working with a Samsung Q900R over an HDMI 2.0b link from a Radeon 5700.
> 
> Signed-off-by: Thomas Anderson 

Apologies for the late response.

Thanks for getting high-res modes working on HDMI.

This change is
Reviewed-by: Harry Wentland 

Harry
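
For reference, a rough worked example of the numbers involved (assuming
standard reduced-blanking timings, so the figures are approximate):

	4K120, YCbCr 4:4:4, 8 bpc: TMDS clock ~1188 MHz  -> exceeds the 600 MHz HDMI 2.0 limit
	4K120, YCbCr 4:2:0, 8 bpc: ~1188 MHz / 2 = ~594 MHz -> fits within the limit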

> ---
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 45 ++-
>  1 file changed, 23 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 7aac9568d3be..803e59d97411 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -3356,27 +3356,21 @@ get_output_color_space(const struct dc_crtc_timing 
> *dc_crtc_timing)
>   return color_space;
>  }
>  
> -static void reduce_mode_colour_depth(struct dc_crtc_timing *timing_out)
> -{
> - if (timing_out->display_color_depth <= COLOR_DEPTH_888)
> - return;
> -
> - timing_out->display_color_depth--;
> -}
> -
> -static void adjust_colour_depth_from_display_info(struct dc_crtc_timing 
> *timing_out,
> - const struct drm_display_info 
> *info)
> +static bool adjust_colour_depth_from_display_info(
> + struct dc_crtc_timing *timing_out,
> + const struct drm_display_info *info)
>  {
> + enum dc_color_depth depth = timing_out->display_color_depth;
>   int normalized_clk;
> - if (timing_out->display_color_depth <= COLOR_DEPTH_888)
> - return;
>   do {
>   normalized_clk = timing_out->pix_clk_100hz / 10;
>   /* YCbCr 4:2:0 requires additional adjustment of 1/2 */
>   if (timing_out->pixel_encoding == PIXEL_ENCODING_YCBCR420)
>   normalized_clk /= 2;
>   /* Adjusting pix clock following on HDMI spec based on colour 
> depth */
> - switch (timing_out->display_color_depth) {
> + switch (depth) {
> + case COLOR_DEPTH_888:
> + break;
>   case COLOR_DEPTH_101010:
>   normalized_clk = (normalized_clk * 30) / 24;
>   break;
> @@ -3387,14 +3381,15 @@ static void 
> adjust_colour_depth_from_display_info(struct dc_crtc_timing *timing_
>   normalized_clk = (normalized_clk * 48) / 24;
>   break;
>   default:
> - return;
> + /* The above depths are the only ones valid for HDMI. */
> + return false;
>   }
> - if (normalized_clk <= info->max_tmds_clock)
> - return;
> - reduce_mode_colour_depth(timing_out);
> -
> - } while (timing_out->display_color_depth > COLOR_DEPTH_888);
> -
> + if (normalized_clk <= info->max_tmds_clock) {
> + timing_out->display_color_depth = depth;
> + return true;
> + }
> + } while (--depth > COLOR_DEPTH_666);
> + return false;
>  }
>  
>  static void fill_stream_properties_from_drm_display_mode(
> @@ -3474,8 +3469,14 @@ static void 
> fill_stream_properties_from_drm_display_mode(
>  
>   stream->out_transfer_func->type = TF_TYPE_PREDEFINED;
>   stream->out_transfer_func->tf = TRANSFER_FUNCTION_SRGB;
> - if (stream->signal == SIGNAL_TYPE_HDMI_TYPE_A)
> - adjust_colour_depth_from_display_info(timing_out, info);
> + if (stream->signal == SIGNAL_TYPE_HDMI_TYPE_A) {
> + if (!adjust_colour_depth_from_display_info(timing_out, info) &&
> + drm_mode_is_420_also(info, mode_in) &&
> + timing_out->pixel_encoding != PIXEL_ENCODING_YCBCR420) {
> + timing_out->pixel_encoding = PIXEL_ENCODING_YCBCR420;
> + adjust_colour_depth_from_display_info(timing_out, info);
> + }
> + }
>  }
>  
>  static void fill_audio_info(struct audio_info *audio_info,
> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 1/4] drm/amdkfd: Fix permissions of hang_hws

2020-01-02 Thread Russell, Kent
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Kent Russell 

-Original Message-
From: amd-gfx  On Behalf Of Felix 
Kuehling
Sent: Friday, December 20, 2019 3:30 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH 1/4] drm/amdkfd: Fix permissions of hang_hws

Reading from /sys/kernel/debug/kfd/hang_hws would cause a kernel oops because 
we didn't implement a read callback. Set the permission to write-only to 
prevent that.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
index 15c523027285..511712c2e382 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c
@@ -93,7 +93,7 @@ void kfd_debugfs_init(void)
kfd_debugfs_hqds_by_device, &kfd_debugfs_fops);
debugfs_create_file("rls", S_IFREG | 0444, debugfs_root,
kfd_debugfs_rls_by_device, &kfd_debugfs_fops);
-   debugfs_create_file("hang_hws", S_IFREG | 0644, debugfs_root,
+   debugfs_create_file("hang_hws", S_IFREG | 0200, debugfs_root,
NULL, &kfd_debugfs_hang_hws_fops);
}
 
--
2.24.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 2/4] drm/amdkfd: Remove unused variable

2020-01-02 Thread Russell, Kent
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Kent Russell 

-Original Message-
From: amd-gfx  On Behalf Of Felix 
Kuehling
Sent: Friday, December 20, 2019 3:30 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH 2/4] drm/amdkfd: Remove unused variable

dqm->pipeline_mem wasn't used anywhere.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 -  
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index f7f6df40875e..558c0ad81848 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -930,7 +930,6 @@ static void uninitialize(struct device_queue_manager *dqm)
for (i = 0 ; i < KFD_MQD_TYPE_MAX ; i++)
kfree(dqm->mqd_mgrs[i]);
mutex_destroy(>lock_hidden);
-   kfd_gtt_sa_free(dqm->dev, dqm->pipeline_mem);
 }
 
 static int start_nocpsch(struct device_queue_manager *dqm)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index a8c37e6da027..8991120c4fa2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -190,7 +190,6 @@ struct device_queue_manager {
/* the pasid mapping for each kfd vmid */
uint16_tvmid_pasid[VMID_NUM];
uint64_tpipelines_addr;
-   struct kfd_mem_obj  *pipeline_mem;
uint64_tfence_gpu_addr;
unsigned int*fence_addr;
struct kfd_mem_obj  *fence_mem;
--
2.24.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query

2020-01-02 Thread Zhang, Hawking
[AMD Official Use Only - Internal Distribution Only]

UMC_REG_OFFSET(adev, ch_inst, umc_inst) and the function get_umc_reg_offset 
actually do the same thing? I guess you just want to keep either of them, right?

Regards,
Hawking

From: Clements, John 
Sent: Thursday, January 2, 2020 18:31
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; 
Zhou1, Tao 
Subject: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

Added patch to resolve following issue where error counter detection was not 
iterating over all UMC instances/channels.
Removed support for accessing UMC error counters via MMIO.

Thank you,
John Clements
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query

2020-01-02 Thread Clements, John
[AMD Official Use Only - Internal Distribution Only]

Added patch to resolve following issue where error counter detection was not 
iterating over all UMC instances/channels.
Removed support for accessing UMC error counters via MMIO.

Thank you,
John Clements


0001-drm-amdgpu-resolve-bug-in-UMC-6-error-counter-query.patch
Description: 0001-drm-amdgpu-resolve-bug-in-UMC-6-error-counter-query.patch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Error] amdgpu powerplay ip block error with -22.

2020-01-02 Thread Yusuf Altıparmak
Hello,

First you could check if the binary ‘polaris12_smc.bin’ is in your system:
> /lib/firmware/../amdgpu/
>
'polaris12_smc.bin' exists in my /lib/firmware/amdgpu folder. There are
also 18 other binaries that start with 'polaris12_'.

If it’s there, then does this happen after a warm reset?
>

This does happen when booting up the board with a ramdisk image (initramfs
stage).

Regards.

>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/1] drm/amdgpu: fix ctx init failure for asics without gfx ring

2020-01-02 Thread Nirmoy


On 1/1/20 1:52 PM, Christian König wrote:

Am 19.12.19 um 13:01 schrieb Nirmoy:

Reviewed-by: Nirmoy Das 

On 12/19/19 12:42 PM, Le Ma wrote:
This workaround does not affect other asics because amdgpu only needs to
expose one gfx sched to the user for now.

Change-Id: Ica92b8565a89899aebe0eba7b2b5a25159b411d3
Signed-off-by: Le Ma 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c

index 63f6365..64e2bab 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -127,7 +127,8 @@ static int amdgpu_ctx_init(struct amdgpu_device 
*adev,

    switch (i) {
  case AMDGPU_HW_IP_GFX:
-    scheds = adev->gfx.gfx_sched;
+    sched = &adev->gfx.gfx_ring[0].sched;
+    scheds = &sched;
  num_scheds = 1;


Mhm, we should probably rather fix this here and not expose a GFX ring
when the hardware doesn't have one.

Hi Christian,

Do you mean by not initializing the gfx entity when it is not available?




Christian.


  break;
  case AMDGPU_HW_IP_COMPUTE:

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/5] drm/amdgpu: Avoid reclaim fs while eviction lock

2020-01-02 Thread Christian König

Am 20.12.19 um 07:24 schrieb Alex Sierra:

[Why]
Avoid reclaim filesystem while eviction lock is held called from
MMU notifier.

[How]
Setting PF_MEMALLOC_NOFS flags while eviction mutex is locked.
Using memalloc_nofs_save / memalloc_nofs_restore API.

Change-Id: I5531c9337836e7d4a430df3f16dcc82888e8018c
Signed-off-by: Alex Sierra 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 14 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 28 +-
  2 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index b999b67ff57a..b36daa6230fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -678,9 +678,9 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
}
}
  
-	mutex_lock(>eviction_lock);

+   vm_eviction_lock(vm);
vm->evicting = false;
-   mutex_unlock(>eviction_lock);
+   vm_eviction_unlock(vm);
  
  	return 0;

  }
@@ -1559,7 +1559,7 @@ static int amdgpu_vm_bo_update_mapping(struct 
amdgpu_device *adev,
if (!(flags & AMDGPU_PTE_VALID))
owner = AMDGPU_FENCE_OWNER_KFD;
  
-	mutex_lock(>eviction_lock);

+   vm_eviction_lock(vm);
if (vm->evicting) {
r = -EBUSY;
goto error_unlock;
@@ -1576,7 +1576,7 @@ static int amdgpu_vm_bo_update_mapping(struct 
amdgpu_device *adev,
r = vm->update_funcs->commit(, fence);
  
  error_unlock:

-   mutex_unlock(>eviction_lock);
+   vm_eviction_unlock(vm);
return r;
  }
  
@@ -2537,18 +2537,18 @@ bool amdgpu_vm_evictable(struct amdgpu_bo *bo)

return false;
  
  	/* Try to block ongoing updates */

-   if (!mutex_trylock(_base->vm->eviction_lock))
+   if (!vm_eviction_trylock(bo_base->vm))
return false;
  
  	/* Don't evict VM page tables while they are updated */

if (!dma_fence_is_signaled(bo_base->vm->last_direct) ||
!dma_fence_is_signaled(bo_base->vm->last_delayed)) {
-	mutex_unlock(&bo_base->vm->eviction_lock);
+   vm_eviction_unlock(bo_base->vm);
return false;
}
  
  	bo_base->vm->evicting = true;

-	mutex_unlock(&bo_base->vm->eviction_lock);
+   vm_eviction_unlock(bo_base->vm);
return true;
  }
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h

index 100547f094ff..d35aa76469ec 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -30,6 +30,7 @@
  #include <drm/gpu_scheduler.h>
  #include <drm/drm_file.h>
  #include <drm/ttm/ttm_bo_driver.h>
+#include <linux/sched/mm.h>
  
  #include "amdgpu_sync.h"

  #include "amdgpu_ring.h"
@@ -242,9 +243,12 @@ struct amdgpu_vm {
/* tree of virtual addresses mapped */
struct rb_root_cached   va;
  
-	/* Lock to prevent eviction while we are updating page tables */

+	/* Lock to prevent eviction while we are updating page tables
+	 * use vm_eviction_lock/unlock(vm)
+	 */
	struct mutex		eviction_lock;
	bool			evicting;
+	unsigned int		saved_flags;
  
  	/* BOs who needs a validation */

	struct list_head	evicted;
@@ -436,4 +440,26 @@ void amdgpu_vm_move_to_lru_tail(struct amdgpu_device *adev,
struct amdgpu_vm *vm);
  void amdgpu_vm_del_from_lru_notify(struct ttm_buffer_object *bo);
  
+/* vm eviction_lock can be taken in MMU notifiers. Make sure no reclaim-FS

+ * happens while holding this lock anywhere to prevent deadlocks when
+ * an MMU notifier runs in reclaim-FS context.
+ */
+static inline void vm_eviction_lock(struct amdgpu_vm *vm)


Please add a proper amdgpu_ prefix to the function names.

In addition to that, please don't put local static functions into the
header; those shouldn't be used outside of the VM code.


Christian.


+{
+	mutex_lock(&vm->eviction_lock);
+   vm->saved_flags = memalloc_nofs_save();
+}
+static inline int vm_eviction_trylock(struct amdgpu_vm *vm)
+{
+	if (mutex_trylock(&vm->eviction_lock)) {
+   vm->saved_flags = memalloc_nofs_save();
+   return 1;
+   }
+   return 0;
+}
+static inline void vm_eviction_unlock(struct amdgpu_vm *vm)
+{
+   memalloc_nofs_restore(vm->saved_flags);
+	mutex_unlock(&vm->eviction_lock);
+}
  #endif
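
For illustration, a minimal sketch of what the review above asks for (the same
wrappers, renamed with an amdgpu_vm_ prefix and kept as local static helpers in
amdgpu_vm.c instead of the header) might look like this; naming and placement
here are assumptions based on the comment, not necessarily the code that
finally landed:

/* Local to amdgpu_vm.c, not exported through amdgpu_vm.h. The eviction lock
 * can be taken in MMU notifiers, so reclaim-FS must be suppressed while it is
 * held to avoid deadlocks.
 */
static void amdgpu_vm_eviction_lock(struct amdgpu_vm *vm)
{
	mutex_lock(&vm->eviction_lock);
	vm->saved_flags = memalloc_nofs_save();
}

static bool amdgpu_vm_eviction_trylock(struct amdgpu_vm *vm)
{
	if (mutex_trylock(&vm->eviction_lock)) {
		vm->saved_flags = memalloc_nofs_save();
		return true;
	}
	return false;
}

static void amdgpu_vm_eviction_unlock(struct amdgpu_vm *vm)
{
	memalloc_nofs_restore(vm->saved_flags);
	mutex_unlock(&vm->eviction_lock);
}

Keeping the helpers next to their only users also keeps the
memalloc_nofs_save()/memalloc_nofs_restore() pairing in one translation unit.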


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: eGPU failed to initialize

2020-01-02 Thread Christian König

Hi Qu,

that problem is completely unrelated to amdgpu. See, your Thunderbolt
bridge fails to assign the necessary I/O resources to the PCI device
long before amdgpu even loads:


From your dmesg:


Jan 01 07:22:22 thinkpad kernel: pci_bus :06: Allocating resources
Jan 01 07:22:22 thinkpad kernel: pci :09:04.0: bridge window [io  
0x1000-0x0fff] to [bus 0b-3a] add_size 1000
Jan 01 07:22:22 thinkpad kernel: pci :08:00.0: bridge window [io  
0x1000-0x1fff] to [bus 09-3a] add_size 1000
Jan 01 07:22:22 thinkpad kernel: pci :08:00.0: BAR 13: no space 
for [io  size 0x2000]
Jan 01 07:22:22 thinkpad kernel: pci :08:00.0: BAR 13: failed to 
assign [io  size 0x2000]
Jan 01 07:22:22 thinkpad kernel: pci :08:00.0: BAR 13: assigned 
[io  0x2000-0x2fff]
Jan 01 07:22:22 thinkpad kernel: pci :08:00.0: BAR 13: [io 
0x2000-0x2fff] (failed to expand by 0x1000)
Jan 01 07:22:22 thinkpad kernel: pci :08:00.0: failed to add 1000 
res[13]=[io  0x2000-0x2fff]
Jan 01 07:22:22 thinkpad kernel: pci :09:01.0: BAR 13: assigned 
[io  0x2000-0x2fff]
Jan 01 07:22:22 thinkpad kernel: pci :09:04.0: BAR 13: no space 
for [io  size 0x1000]
Jan 01 07:22:22 thinkpad kernel: pci :09:04.0: BAR 13: failed to 
assign [io  size 0x1000]
Jan 01 07:22:22 thinkpad kernel: pci :09:01.0: BAR 13: assigned 
[io  0x2000-0x2fff]
Jan 01 07:22:22 thinkpad kernel: pci :09:04.0: BAR 13: no space 
for [io  size 0x1000]
Jan 01 07:22:22 thinkpad kernel: pci :09:04.0: BAR 13: failed to 
assign [io  size 0x1000]


This is a rather unusual problem and I have no idea how you ended up 
with that. But with this setup it is impossible for the driver to access 
the device.


Regards,
Christian.

On 01.01.20 at 10:31, Qu Wenruo wrote:

Hi,

Not sure if this has been reported before, but amdgpu is initialized for an
external GPU (Thunderbolt 3) which is not accessible at boot, only after
boltctl has initialized the tb3 subsystem.

amdgpu then reports a timeout and fails to really initialize the GPU.
At this stage, one of my monitors (U2414H, DP) reports an unsupported
framerate, while the other monitor (HP 24mh, HDMI) just reports no signal.

The involved GPU is RX580. The tb3 enclosure is AORUS GAMING BOX.

And obviously, this eGPU works pretty fine under Windows.
So my normal boot routine is to boot into Windows, then reboot into
Linux without unplugging the tb3 connector, to make the eGPU work under Linux.

The kernel warning is:
Jan 01 07:22:25 thinkpad kernel: [drm] REG_WAIT timeout 10us * 3500
tries - dce_mi_free_dmif line:634
Jan 01 07:22:25 thinkpad kernel: [ cut here ]
Jan 01 07:22:25 thinkpad kernel: WARNING: CPU: 6 PID: 804 at
drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:332
generic_reg_wait.cold+0x25/0x2c [amdgpu]
Jan 01 07:22:25 thinkpad kernel: Modules linked in: xt_CHECKSUM
xt_MASQUERADE xt_conntrack ipt_REJECT tun bridge stp llc nf_tables_set
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct amdgpu nft_chain_nat msr
nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle
ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle
gpu_sched iptable_raw ttm iptable_security nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 xt_tcpudp ip_set nfnetlink ebtable_filter ebtables
ip6table_filter ip6_tables iptable_filter cmac algif_hash algif_skcipher
af_alg bnep joydev mousedev btrfs xor rmi_smbus rmi_core
snd_hda_codec_hdmi iTCO_wdt mei_wdt mei_hdcp iTCO_vendor_support
snd_hda_codec_realtek intel_rapl_msr wmi_bmof raid6_pq
intel_wmi_thunderbolt iwlmvm snd_hda_codec_generic x86_pkg_temp_thermal
intel_powerclamp snd_hda_intel coretemp snd_intel_nhlt mac80211
kvm_intel snd_hda_codec nls_iso8859_1 libarc4 uvcvideo nls_cp437
intel_cstate btusb snd_hda_core
Jan 01 07:22:25 thinkpad kernel:  vfat videobuf2_vmalloc intel_uncore
btrtl snd_hwdep btbcm iwlwifi videobuf2_memops intel_rapl_perf fat
videobuf2_v4l2 btintel snd_pcm pcspkr psmouse input_leds
videobuf2_common mei_me e1000e i2c_i801 snd_timer thunderbolt bluetooth
cfg80211 videodev mei thinkpad_acpi intel_xhci_usb_role_switch
processor_thermal_device ucsi_acpi ecdh_generic mc nvram ecc
intel_rapl_common intel_soc_dts_iosf crc16 intel_pch_thermal roles
typec_ucsi ledtrig_audio rfkill typec snd int3403_thermal wmi soundcore
battery ac int340x_thermal_zone i2c_hid hid evdev int3400_thermal
mac_hid acpi_thermal_rel crypto_user acpi_call(OE) kvmgt i915 vfio_mdev
mdev vfio_iommu_type1 vfio i2c_algo_bit drm_kms_helper drm intel_gtt
agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops kvm irqbypass
ip_tables x_tables xfs libcrc32c crc32c_generic sd_mod uas usb_storage
scsi_mod dm_crypt crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel dm_mod serio_raw atkbd libps2 aesni_intel
crypto_simd xhci_pci cryptd
Jan 01 07:22:25 thinkpad kernel:  glue_helper xhci_hcd i8042 serio
Jan 01 07:22:25 thinkpad kernel: CPU: 6 PID: 804 Comm: Xorg Tainted: G
   U OE 

RE: [Error] amdgpu powerplay ip block error with -22.

2020-01-02 Thread Feng, Kenneth
[AMD Official Use Only - Internal Distribution Only]

First you could check if the binary 'polaris12_smc.bin' is in your system: 
/lib/firmware/../amdgpu/
If it's there, then does this happen after a warm reset?
Thanks.


From: amd-gfx  On Behalf Of Yusuf 
Altiparmak
Sent: Thursday, January 2, 2020 4:22 PM
To: amd-gfx@lists.freedesktop.org
Subject: [Error] amdgpu powerplay ip block error with -22.

[CAUTION: External Email]
I am having this error with kernel version 4.19 amdgpu driver for a polaris12 
based GPU. What could be the problem? Any suggestions? Thanks.

Full dmesg:
[5.426009] [drm] amdgpu kernel modesetting enabled.
[5.430109] [drm] initializing kernel modesetting (POLARIS12 0x1002:0x6987 
0x1787:0x2389 0x80).
[5.437591] [drm] register mmio base: 0x2020
[5.440899] [drm] register mmio size: 262144
[5.443888] [drm] add ip block number 0 
[5.447465] [drm] add ip block number 1 
[5.450953] [drm] add ip block number 2 
[5.454442] [drm] add ip block number 3 
[5.458018] [drm] add ip block number 4 
[5.460979] [drm] add ip block number 5 
[5.464466] [drm] add ip block number 6 
[5.468042] [drm] add ip block number 7 
[5.471531] [drm] add ip block number 8 
[5.475047] [drm] UVD is enabled in VM mode
[5.477928] [drm] UVD ENC is enabled in VM mode
[5.481154] [drm] VCE enabled in VM mode
[5.712355] ATOM BIOS: 113-ER16BFC-001
[5.714830] [drm] GPU posting now...
[5.833704] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment 
size is 9-bit
[5.840950] amdgpu 0001:01:00.0: BAR 2: releasing [mem 
0x22000-0x2201f 64bit pref]
[5.847930] amdgpu 0001:01:00.0: BAR 0: releasing [mem 
0x21000-0x21fff 64bit pref]
[5.855688] [drm:.amdgpu_device_resize_fb_bar [amdgpu]] *ERROR* Problem 
resizing BAR0 (-2).
[5.855706] amdgpu 0001:01:00.0: BAR 0: assigned [mem 
0x23000-0x23fff 64bit pref]
[5.869663] amdgpu 0001:01:00.0: BAR 2: assigned [mem 
0x24000-0x2401f 64bit pref]
[5.876582] amdgpu 0001:01:00.0: VRAM: 4096M 0x00F4 - 
0x00F4 (4096M used)
[5.884160] amdgpu 0001:01:00.0: GART: 256M 0x - 
0x0FFF
[5.890519] [drm] Detected VRAM RAM=4096M, BAR=256M
[5.894093] [drm] RAM width 128bits GDDR5
[5.896941] [TTM] Zone  kernel: Available graphics memory: 4062380 kiB
[5.902177] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[5.907402] [TTM] Initializing pool allocator
[5.910464] [TTM] Initializing DMA pool allocator
[5.919973] [drm] amdgpu: 4096M of VRAM memory ready
[5.923659] [drm] amdgpu: 4096M of GTT memory ready.
[5.927358] [drm] GART: num cpu pages 65536, num gpu pages 65536
[5.932957] [drm] PCIE GART of 256M enabled (table at 0x00F4).
[5.939122] [drm] Chained IB support enabled!
[5.948873] [drm] Found UVD firmware Version: 1.79 Family ID: 16
[5.953647] [drm] UVD ENC is disabled
[5.975818] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
[6.404774] amdgpu: [powerplay] Failed to send Message.
[6.835902] amdgpu: [powerplay] SMU Firmware start failed!
[6.840086] amdgpu: [powerplay] Failed to load SMU ucode.
[6.844184] amdgpu: [powerplay] smc start failed
[6.847498] amdgpu: [powerplay] powerplay hw init failed
[6.852281] [drm:.amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block 
 failed -22
[6.859883] amdgpu 0001:01:00.0: amdgpu_device_ip_init failed
[6.864330] amdgpu 0001:01:00.0: Fatal error during GPU init
[6.868689] [drm] amdgpu: finishing device.
[7.339427] pcieport 0001:00:00.0: AER: Corrected error received: 
0001:00:00.0
[7.345374] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, 
type=Data Link Layer, (Transmitter ID)
[7.353993] pcieport 0001:00:00.0:   device [1957:0824] error 
status/mask=1000/2000
[7.361047] pcieport 0001:00:00.0:[12] Timeout
[7.706137] amdgpu: [powerplay]
last message was failed ret is 0
[8.127667] amdgpu: [powerplay]
failed to send message 261 ret is 0
[8.966331] amdgpu: [powerplay]
last message was failed ret is 0
[9.320290] pcieport 0001:00:00.0: AER: Corrected error received: 
0001:00:00.0
[9.326226] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, 
type=Data Link Layer, (Transmitter ID)
[9.334845] pcieport 0001:00:00.0:   device [1957:0824] error 
status/mask=1000/2000
[9.341899] pcieport 0001:00:00.0:[12] Timeout
[9.387975] amdgpu: [powerplay]
failed to send message 261 ret is 0
[   10.226636] amdgpu: [powerplay]
last message was failed ret is 0
[   10.648275] amdgpu: [powerplay]
failed to send message 261 ret is 0
[   11.486932] amdgpu: [powerplay]
last message was failed ret is 0
[   11.908570] amdgpu: [powerplay]
failed to send message 261 ret is 0
[   

Re: [error] Drm -> amdgpu Unrecoverable Machine Check

2020-01-02 Thread Yusuf Altıparmak
Hello Christian,

I solved this problem weeks ago. The problem was that the system I use could
only provide a 256 MB address range while the GPU was demanding more; even
when I assigned 4 GB, the PCIe slot still only got 256 MB. I put an empty
area between PCIe2 (the GPU was connected to this one) and PCIe3, i.e. I
moved the start address of the next PCIe device forward, and everything
worked fine. If anyone encounters the same problem, here is a sample Device
Tree Source for Linux:

pci0: pcie@ffe24 {
reg = <0xf 0xfe24 0 0x1>;
ranges = <0x0200 0 0xe000 0x2 0x0 0x0 0x1000
 0x0100 0 0x0 0xf 0xf800 0x0 0x0001>;
pcie@0 {
ranges = <0x0200 0 0xe000
 0x0200 0 0xe000
 0 0x1000

 0x0100 0 0x
 0x0100 0 0x
 0 0x0001>;
};
};

pci1: pcie@ffe25 { // GPU CONNECTED TO THIS ONE
reg = <0xf 0xfe25 0 0x1>;
ranges = <0x0200 0 0xe000 0x2 0x1000 0x1 0x
 0x0100 0 0 0xf 0xf801 0 0x0001>;
pcie@0 {
ranges = <0x0200 0 0xe000
 0x0200 0 0xe000
 0x1 0x

 0x0100 0 0x
 0x0100 0 0x
 0 0x0001>;
};
};

pci2: pcie@ffe26 {
reg = <0xf 0xfe26 0 0x1>;
ranges = <0x0200 0 0xe000 0x3 0x2000 0 0x1000
          /* 0x3 0x2000: this would normally be 0x3 0x1000 since
           * pci1 got 4 GB, but I also added a 256 MB empty area between
           * them, so this window starts at 0x3 0x2000 instead. */
 0x0100 0 0x 0xf 0xf802 0 0x0001>;
pcie@0 {
ranges = <0x0200 0 0xe000
 0x0200 0 0xe000
 0 0x1000

 0x0100 0 0x
 0x0100 0 0x
 0 0x0001>;
};
};

pci3: pcie@ffe27 {
reg = <0xf 0xfe27 0 0x1>;
ranges = <0x0200 0 0xe000 0x3 0x3000 0 0x1000
 0x0100 0 0x 0xf 0xf803 0 0x0001>;
pcie@0 {
ranges = <0x0200 0 0xe000
 0x0200 0 0xe000
 0 0x1000

 0x0100 0 0x
 0x0100 0 0x
 0 0x0001>;
};
};



Yusuf Altıparmak wrote the following on Tue, 3 Dec 2019 at 22:20:

>
> What you could try as well is to use the size 320MB for the MMIO. Those
>> ranges usually don't need to be a power of two (only the BARs itself are a
>> power of two) and this way it might even be easier to fit everything
>> together.
>>
>
> Hmm this makes my job easier it seems.
>
>
>> By the way I wonder how can I get at least VGA output from GPU. Maybe I
>> can get a text console on screen or something like X server? Do you have
>> any recommendations?
>>
>> What could maybe work is VGA emulation, which essentially means text
>> only. But no guarantee for that this really works as expected.
>>
>> It's a well known board and U-boot is the most popular bootloader in
> embedded world it seems. I think I am not the only one who tries to connect
> a GPU from PCIe so I think there must be some config variables that enables
> VGA emulation, or some kind of packages.
>
>
>
>> I am just wondering, does modern gaming motherboards have more than 4GB
>> PCIe buffer for this job ?
>>
>> They don't, resources are dynamically assigned instead.
>>
>> See on x86 you usually have 1GB 32-bit address space where the BIOS
>> shuffles all the mandatory devices it sees at boot time into.
>>
>> Then when the motherboard has multiple PEG slots the BIOS also configures
>> a 64-bit address space which is usually rather huge (256GB-1TB). Since the
>> the VRAM and the doorbell BAR are 64bit BARs on the GPU they can be mapped
>> into that as well.
>>
>> This way you can easily have 10 GPUs connected to your CPU.
>>
> Ah, that was a clear answer. So the address the CPU uses after mapping is
> actually an imaginary/virtual address; it depends on the operating system's
> bit configuration. If I am not wrong, those addresses are added on top of
> the previous one while PCIe maps the endpoint device.
>
>
>
>> The problem you have here is that U-config doesn't do this resource
>> assignment automatically and you need to configure it manually.
>>
>
> Yes. By the way, thanks for your answers, Christian. I am a newbie to the
> embedded world and have been dealing with this stuff for 3 months. I
> couldn't get the answers I was looking for from Google; your answers were
> much clearer and easier to understand.
>
> Best Regards.
>
>
>
>> On 03.12.19 at 13:50, Yusuf Altıparmak wrote:
>>
>>
>> Hi Christian,
>>
>>> 0001f000
>>>
>>> Exactly as I thought. The hardware does support BAR resize, but
>>> unfortunately 256MB is already the minimum.
>>>
>>> Sorry, but there isn't anything I could do from the GPU drivers point of
>>> view.
>>>
>>
>> Yes unfortunately there is nothing remained to about GPU side.
>>
>> The only good news I have is that 256M+2M+512K+128K=260M address space
>>> should be enough for the GPU to work, maybe that makes things a bit simpler.
>>>
>>>
>> Right now I am trying to increase MMIO size config to 512 MB, I hope that
>> should help me. By the way I wonder how can I get at least VGA output from
>> GPU. Maybe I can get a text console on screen or 

[Error] amdgpu powerplay ip block error with -22.

2020-01-02 Thread Yusuf Altıparmak
I am having this error with kernel version 4.19 amdgpu driver for a
polaris12 based GPU. What could be the problem? Any suggestions? Thanks.

*Full dmesg:*
[5.426009] [drm] amdgpu kernel modesetting enabled.

[5.430109] [drm] initializing kernel modesetting (POLARIS12
0x1002:0x6987 0x1787:0x2389 0x80).
[5.437591] [drm] register mmio base: 0x2020

[5.440899] [drm] register mmio size: 262144

[5.443888] [drm] add ip block number 0 

[5.447465] [drm] add ip block number 1 

[5.450953] [drm] add ip block number 2 

[5.454442] [drm] add ip block number 3 

[5.458018] [drm] add ip block number 4 

[5.460979] [drm] add ip block number 5 

[5.464466] [drm] add ip block number 6 

[5.468042] [drm] add ip block number 7 

[5.471531] [drm] add ip block number 8 

[5.475047] [drm] UVD is enabled in VM mode

[5.477928] [drm] UVD ENC is enabled in VM mode

[5.481154] [drm] VCE enabled in VM mode

[5.712355] ATOM BIOS: 113-ER16BFC-001

[5.714830] [drm] GPU posting now...

[5.833704] [drm] vm size is 64 GB, 2 levels, block size is 10-bit,
fragment size is 9-bit
[5.840950] amdgpu 0001:01:00.0: BAR 2: releasing [mem
0x22000-0x2201f 64bit pref]
[5.847930] amdgpu 0001:01:00.0: BAR 0: releasing [mem
0x21000-0x21fff 64bit pref]
[5.855688] [drm:.amdgpu_device_resize_fb_bar [amdgpu]] *ERROR* Problem
resizing BAR0 (-2).
[5.855706] amdgpu 0001:01:00.0: BAR 0: assigned [mem
0x23000-0x23fff 64bit pref]
[5.869663] amdgpu 0001:01:00.0: BAR 2: assigned [mem
0x24000-0x2401f 64bit pref]
[5.876582] amdgpu 0001:01:00.0: VRAM: 4096M 0x00F4 -
0x00F4 (4096M used)
[5.884160] amdgpu 0001:01:00.0: GART: 256M 0x -
0x0FFF
[5.890519] [drm] Detected VRAM RAM=4096M, BAR=256M

[5.894093] [drm] RAM width 128bits GDDR5

[5.896941] [TTM] Zone  kernel: Available graphics memory: 4062380 kiB

[5.902177] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB

[5.907402] [TTM] Initializing pool allocator

[5.910464] [TTM] Initializing DMA pool allocator

[5.919973] [drm] amdgpu: 4096M of VRAM memory ready

[5.923659] [drm] amdgpu: 4096M of GTT memory ready.

[5.927358] [drm] GART: num cpu pages 65536, num gpu pages 65536

[5.932957] [drm] PCIE GART of 256M enabled (table at
0x00F4).
[5.939122] [drm] Chained IB support enabled!

[5.948873] [drm] Found UVD firmware Version: 1.79 Family ID: 16

[5.953647] [drm] UVD ENC is disabled

[5.975818] [drm] Found VCE firmware Version: 52.4 Binary ID: 3

[6.404774] amdgpu: [powerplay] Failed to send Message.

[6.835902] amdgpu: [powerplay] SMU Firmware start failed!

[6.840086] amdgpu: [powerplay] Failed to load SMU ucode.

[6.844184] amdgpu: [powerplay] smc start failed

[6.847498] amdgpu: [powerplay] powerplay hw init failed
[6.852281] [drm:.amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block  failed -22
[6.859883] amdgpu 0001:01:00.0: amdgpu_device_ip_init failed
[6.864330] amdgpu 0001:01:00.0: Fatal error during GPU init
[6.868689] [drm] amdgpu: finishing device.
[7.339427] pcieport 0001:00:00.0: AER: Corrected error received: 0001:00:00.0
[7.345374] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[7.353993] pcieport 0001:00:00.0:   device [1957:0824] error status/mask=1000/2000
[7.361047] pcieport 0001:00:00.0:[12] Timeout

[7.706137] amdgpu: [powerplay]

last message was failed ret is 0

[8.127667] amdgpu: [powerplay]

failed to send message 261 ret is 0

[8.966331] amdgpu: [powerplay]

last message was failed ret is 0

[9.320290] pcieport 0001:00:00.0: AER: Corrected error received:
0001:00:00.0
[9.326226] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, (Transmitter ID)
[9.334845] pcieport 0001:00:00.0:   device [1957:0824] error
status/mask=1000/2000
[9.341899] pcieport 0001:00:00.0:[12] Timeout

[9.387975] amdgpu: [powerplay]

failed to send message 261 ret is 0

[   10.226636] amdgpu: [powerplay]

last message was failed ret is 0

[   10.648275] amdgpu: [powerplay]

failed to send message 261 ret is 0

[   11.486932] amdgpu: [powerplay]

last message was failed ret is 0

[   11.908570] amdgpu: [powerplay]

failed to send message 261 ret is 0

[   12.747228] amdgpu: [powerplay]

last message was failed ret is 0

[   13.168866] amdgpu: [powerplay]

failed to send message 261 ret is 0

[   14.007523] amdgpu: [powerplay]


RE: [PATCH 1/2] amd/amdgpu/sriov enable onevf mode for ARCTURUS VF

2020-01-02 Thread Quan, Evan
Acked-by: Evan Quan 

> -Original Message-
> From: amd-gfx  On Behalf Of Jack
> Zhang
> Sent: Thursday, January 2, 2020 3:44 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Jack (Jian) 
> Subject: [PATCH 1/2] amd/amdgpu/sriov enable onevf mode for ARCTURUS VF
> 
> Previously, initialization of the SMU IP block would be skipped for SR-IOV
> ASICs. But if only one VF is in use, the guest driver should be able to dump
> some HW info such as clocks, temperature, etc.
> 
> To solve this, the host driver now notifies the guest once one-VF mode is
> enabled. In one-VF mode the guest does the SMU hw_init but skips some steps
> of the normal SMU hw_init flow, because the host driver has already done
> them for the SMU.
> 
> With this fix, a guest app can talk with the SMU and dump HW information
> from it.
> 
> v2: refine the logic for pm_enabled. Skip hw_init by not changing pm_enabled.
> v3: refine is_support_sw_smu and fix some indentation issues.
> 
> Signed-off-by: Jack Zhang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c|  3 +-
>  drivers/gpu/drm/amd/amdgpu/soc15.c |  3 +-
>  drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 98 --
> 
>  3 files changed, 56 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index 0d842a1..5341905 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -1454,7 +1454,8 @@ static int psp_np_fw_load(struct psp_context *psp)
>  || ucode->ucode_id == AMDGPU_UCODE_ID_RLC_G
> 		 || ucode->ucode_id == AMDGPU_UCODE_ID_RLC_RESTORE_LIST_CNTL
> 		 || ucode->ucode_id == AMDGPU_UCODE_ID_RLC_RESTORE_LIST_GPM_MEM
> -		 || ucode->ucode_id == AMDGPU_UCODE_ID_RLC_RESTORE_LIST_SRM_MEM
> +		 || ucode->ucode_id == AMDGPU_UCODE_ID_RLC_RESTORE_LIST_SRM_MEM
> + || ucode->ucode_id == AMDGPU_UCODE_ID_SMC))
>   /*skip ucode loading in SRIOV VF */
>   continue;
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
> b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 6129fab..26e1c8c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -826,8 +826,7 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
> 			amdgpu_device_ip_block_add(adev, &dce_virtual_ip_block);
> 	amdgpu_device_ip_block_add(adev, &gfx_v9_0_ip_block);
> 	amdgpu_device_ip_block_add(adev, &sdma_v4_0_ip_block);
> - if (!amdgpu_sriov_vf(adev))
> -		amdgpu_device_ip_block_add(adev, &smu_v11_0_ip_block);
> +	amdgpu_device_ip_block_add(adev, &smu_v11_0_ip_block);
> 
>   if (unlikely(adev->firmware.load_type ==
> AMDGPU_FW_LOAD_DIRECT))
> 		amdgpu_device_ip_block_add(adev, &vcn_v2_5_ip_block);
> 
> diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> index 4ed8bdc..4b96937 100644
> --- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> @@ -531,7 +531,7 @@ bool is_support_sw_smu(struct amdgpu_device *adev)
>   if (adev->asic_type == CHIP_VEGA20)
>   return (amdgpu_dpm == 2) ? true : false;
>   else if (adev->asic_type >= CHIP_ARCTURUS) {
> - if (amdgpu_sriov_vf(adev))
> +		if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
>   return false;
>   else
>   return true;
> @@ -1061,28 +1061,27 @@ static int smu_smc_table_hw_init(struct smu_context *smu,
>   }
> 
>   /* smu_dump_pptable(smu); */
> + if (!amdgpu_sriov_vf(adev)) {
> + /*
> +  * Copy pptable bo in the vram to smc with SMU MSGs such as
> +  * SetDriverDramAddr and TransferTableDram2Smu.
> +  */
> + ret = smu_write_pptable(smu);
> + if (ret)
> + return ret;
> 
> - /*
> -  * Copy pptable bo in the vram to smc with SMU MSGs such as
> -  * SetDriverDramAddr and TransferTableDram2Smu.
> -  */
> - ret = smu_write_pptable(smu);
> - if (ret)
> - return ret;
> -
> - /* issue Run*Btc msg */
> - ret = smu_run_btc(smu);
> - if (ret)
> - return ret;
> -
> - ret = smu_feature_set_allowed_mask(smu);
> - if (ret)
> - return ret;
> -
> - ret = smu_system_features_control(smu, true);
> - if (ret)
> - return ret;
> + /* issue Run*Btc msg */
> + ret = smu_run_btc(smu);
> + if (ret)
> + return ret;
> + ret = smu_feature_set_allowed_mask(smu);
> + if (ret)
> + return ret;
> 
> + ret = smu_system_features_control(smu, true);
> + if (ret)
> +