Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

2018-11-07 Thread Yang, Philip
On 2018-11-07 12:53 p.m., Kuehling, Felix wrote:
> [+Philip]
>
> On 2018-11-07 12:25 a.m., Zhang, Jerry(Junwei) wrote:
>> On 11/7/18 1:15 PM, Trigger Huang wrote:
>>> Currently, SDMA page queue is not used under SR-IOV VF, and this
>>> queue will
>>> cause ring test failure in amdgpu module reload case. So just disable
>>> it.
>>>
>>> Signed-off-by: Trigger Huang 
>> It looks like we ran into several issues with it on Vega.
>> KFD also disabled it on Vega10 for development (but I am not sure of the
>> details of their issue).
>>
>> Thus, maybe we should disable it for Vega10 as well?
>> Any comments? Alex, Christian, Felix.
> We ran into a regression with the page queue in a specific KFDTest that
> runs user mode SDMA in two processes. The SDMA engine would stall for
> about 6 seconds after one of the processes terminates (and destroys its
> queues). We don't have a root cause. Suspect an SDMA firmware issue.
>
> Regards,
>    Felix
The SDMA firmware team has root-caused the bug. I have tested one SDMA
firmware build that fixes the KFDIPCTest.BasicTest issue. I am waiting for
the SDMA firmware check-in before enabling the paging queue for Vega 10.
I have asked the SDMA firmware team whether the Vega12/Vega20 paging queue
issue has the same root cause.

Philip
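
A minimal sketch of how the paging queue could be gated on a firmware version once the fix is checked in. The version threshold and the helper name are hypothetical; the thread does not state which firmware version carries the fix, and this is not the driver's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL threshold: the thread does not give the fixed firmware
 * version, so this number is a placeholder for illustration only. */
#define SDMA_FW_PAGING_FIX_VERSION 1000u

/* Hypothetical helper: report whether the loaded SDMA firmware is new
 * enough to carry the paging queue fix described above. */
static bool paging_queue_fw_ok(uint32_t fw_version)
{
	return fw_version >= SDMA_FW_PAGING_FIX_VERSION;
}
```

A caller would enable the paging queue only when this returns true, and otherwise leave it off as it is today.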
>
>> Regards,
>> Jerry
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> index e39a09eb0f..4edc848 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> @@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
>>>    adev->sdma.has_page_queue = false;
>>>    } else {
>>>    adev->sdma.num_instances = 2;
>>> -    if (adev->asic_type != CHIP_VEGA20 &&
>>> +    if ((adev->asic_type == CHIP_VEGA10) &&
>>> amdgpu_sriov_vf((adev)))
>>> +    adev->sdma.has_page_queue = false;
>>> +    else if (adev->asic_type != CHIP_VEGA20 &&
>>>    adev->asic_type != CHIP_VEGA12)
>>>    adev->sdma.has_page_queue = true;
>>>    }
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

2018-11-07 Thread Kuehling, Felix
[+Philip]

On 2018-11-07 12:25 a.m., Zhang, Jerry(Junwei) wrote:
> On 11/7/18 1:15 PM, Trigger Huang wrote:
>> Currently, SDMA page queue is not used under SR-IOV VF, and this
>> queue will
>> cause ring test failure in amdgpu module reload case. So just disable
>> it.
>>
>> Signed-off-by: Trigger Huang 
>
> Looks we ran into several issues about it on vega.
> kfd also disabled vega10 for development.(but not sure the detail
> issue for them)
>
> Thus, we may disable it for vega10 as well?
> any comment? Alex, Christian, Flex.

We ran into a regression with the page queue in a specific KFDTest that
runs user mode SDMA in two processes. The SDMA engine would stall for
about 6 seconds after one of the processes terminates (and destroys its
queues). We don't have a root cause. Suspect an SDMA firmware issue.

Regards,
  Felix


>
> Regards,
> Jerry
>> ---
>>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> index e39a09eb0f..4edc848 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> @@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
>>   adev->sdma.has_page_queue = false;
>>   } else {
>>   adev->sdma.num_instances = 2;
>> -    if (adev->asic_type != CHIP_VEGA20 &&
>> +    if ((adev->asic_type == CHIP_VEGA10) &&
>> amdgpu_sriov_vf((adev)))
>> +    adev->sdma.has_page_queue = false;
>> +    else if (adev->asic_type != CHIP_VEGA20 &&
>>   adev->asic_type != CHIP_VEGA12)
>>   adev->sdma.has_page_queue = true;
>>   }
>


RE: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

2018-11-07 Thread Liu, Monk
Yeah, we allow at most 500 ms for RLCV to finish the IDLE command for CP/GFX
and SDMA together, and even that already gives a very poor user experience ...

It looks like this feature is not applicable to the world switch case.

/Monk
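
To make the 500 ms limit concrete, here is a toy check, not driver code. The constant comes from the figure quoted in this message; the function name and the idea of passing a stall duration in milliseconds are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper bound mentioned in this thread for finishing the IDLE command
 * for CP/GFX and SDMA together during a world switch. */
#define WORLD_SWITCH_IDLE_BUDGET_MS 500u

/* Hypothetical helper: true if an engine that is stalled for stall_ms
 * can still go idle within the world switch budget. */
static bool vf_idles_in_time(uint32_t stall_ms)
{
	return stall_ms <= WORLD_SWITCH_IDLE_BUDGET_MS;
}
```

Under this model, a stall of about 6 seconds, like the one reported for the KFD test elsewhere in this thread, would exceed the budget many times over.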
-----Original Message-----
From: Koenig, Christian 
Sent: Wednesday, November 7, 2018 4:48 PM
To: Liu, Monk ; Zhang, Jerry ; Huang, 
Trigger ; amd-gfx@lists.freedesktop.org; Deucher, 
Alexander ; Kuehling, Felix 
Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

> is it prepared for PRT (or something like kernel page fault handling 
> on CPU/MMU side)?
That is for providing shared virtual address space (e.g. when the CPU and GPU 
have the same VA view) as well as changing our memory management in general.

> For SRIOV, in theoretically any feature*not* related with hardware 
> scheduling (MES) or OS preemption (buggy with world switch preemption) 
> is welcome to SR-IOV, no reason Not to support it as far as I know, 
> unless not mature enough to enable it
The problem is that recoverable page faults in Vega10 are incompatible with 
SRIOV because a page fault can block the GPU for an undefined amount of time 
and Vega10 can't schedule those away from the hardware.

So the shader thread is blocked and can't be switched away. Under SRIOV that 
would mean that we just get killed by the hypervisor rather soon.

Christian.

Am 07.11.18 um 09:40 schrieb Liu, Monk:
> Hi Christian
>
> Thanks for sharing,
> Do you further know why we need recoverable page faults ? is it prepared for 
> PRT (or something like kernel page fault handling on CPU/MMU side)?
>
> For SRIOV, in theoretically any feature*not* related with hardware 
> scheduling (MES) or OS preemption (buggy with world switch preemption) 
> is welcome to SR-IOV, no reason Not to support it as far as I know, 
> unless not mature enough to enable it
>
> /Monk
>
> -----Original Message-----
> From: Koenig, Christian
> Sent: Wednesday, November 7, 2018 3:30 PM
> To: Liu, Monk ; Zhang, Jerry ; 
> Huang, Trigger ; amd-gfx@lists.freedesktop.org; 
> Deucher, Alexander ; Kuehling, Felix 
> 
> Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV 
> VF
>
> Hi guys,
>
> this is necessary for recoverable page fault handling.
>
> When the normal SDMA queue is blocked because of a page fault the SDMA 
> firmware will switch to the paging queue so that we are able to handle the 
> fault.
>
> In general it should work on all Vega (but not Raven) components and we are 
> going to need it when we enable recoverable page faults.
>
> The only case I can see where we don't immediately need it is SRIOV, because 
> the current planning is to not support recoverable page faults there.
>
> Christian.
>
> Am 07.11.18 um 08:21 schrieb Liu, Monk:
>> Hi team
>>
>> Why we need this page_queue in amdgpu ?  can anyone share something of its 
>> introduction to the kmd ?
>> According to my understanding , gpu-scheduler already have couple levels of 
>> priority for contexts/entities , thus the job page_queue supposed to do 
>> (should be mapping/unmapping/moving) is already good took care of by 
>> "KERNEL" priority entities, and all other context/entity SDMA jobs will be 
>> handled after "KERNEL" jobs ...
>>
>> So there is no real benefit to introduce page_queue (also for rlc_queue) to 
>> amdgpu with the existence of priority aware gpu-scheduler ... unless we are 
>> going to remove the "KERNEL" priority and always do the mapping/unmapping in 
>> page_queue ...
>>
>> /Monk
>>
>> -----Original Message-----
>> From: amd-gfx  On Behalf Of 
>> Zhang, Jerry(Junwei)
>> Sent: Wednesday, November 7, 2018 1:26 PM
>> To: Huang, Trigger ; 
>> amd-gfx@lists.freedesktop.org; Deucher, Alexander 
>> ; Koenig, Christian 
>> ; Kuehling, Felix 
>> Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV 
>> VF
>>
>> On 11/7/18 1:15 PM, Trigger Huang wrote:
>>> Currently, SDMA page queue is not used under SR-IOV VF, and this 
>>> queue will cause ring test failure in amdgpu module reload case. So just 
>>> disable it.
>>>
>>> Signed-off-by: Trigger Huang 
>> Looks we ran into several issues about it on vega.
>> kfd also disabled vega10 for development.(but not sure the detail 
>> issue for them)
>>
>> Thus, we may disable it for vega10 as well?
>> any comment? Alex, Christian, Flex.
>>
>> Regards,
>> Jerry
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff -

Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

2018-11-07 Thread Koenig, Christian
> is it prepared for PRT (or something like kernel page fault handling 
> on CPU/MMU side)?
That is for providing shared virtual address space (e.g. when the CPU 
and GPU have the same VA view) as well as changing our memory management 
in general.

> For SRIOV, in theoretically any feature*not* related with hardware scheduling 
> (MES) or OS preemption (buggy with world switch preemption) is welcome to 
> SR-IOV, no reason
> Not to support it as far as I know, unless not mature enough to enable it
The problem is that recoverable page faults in Vega10 are incompatible 
with SRIOV because a page fault can block the GPU for an undefined 
amount of time and Vega10 can't schedule those away from the hardware.

So the shader thread is blocked and can't be switched away. Under SRIOV 
that would mean that we just get killed by the hypervisor rather soon.

Christian.

Am 07.11.18 um 09:40 schrieb Liu, Monk:
> Hi Christian
>
> Thanks for sharing,
> Do you further know why we need recoverable page faults ? is it prepared for 
> PRT (or something like kernel page fault handling on CPU/MMU side)?
>
> For SRIOV, in theoretically any feature*not* related with hardware scheduling 
> (MES) or OS preemption (buggy with world switch preemption) is welcome to 
> SR-IOV, no reason
> Not to support it as far as I know, unless not mature enough to enable it
>
> /Monk
>
> -----Original Message-----
> From: Koenig, Christian
> Sent: Wednesday, November 7, 2018 3:30 PM
> To: Liu, Monk ; Zhang, Jerry ; Huang, 
> Trigger ; amd-gfx@lists.freedesktop.org; Deucher, 
> Alexander ; Kuehling, Felix 
> 
> Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF
>
> Hi guys,
>
> this is necessary for recoverable page fault handling.
>
> When the normal SDMA queue is blocked because of a page fault the SDMA 
> firmware will switch to the paging queue so that we are able to handle the 
> fault.
>
> In general it should work on all Vega (but not Raven) components and we are 
> going to need it when we enable recoverable page faults.
>
> The only case I can see where we don't immediately need it is SRIOV, because 
> the current planning is to not support recoverable page faults there.
>
> Christian.
>
> Am 07.11.18 um 08:21 schrieb Liu, Monk:
>> Hi team
>>
>> Why we need this page_queue in amdgpu ?  can anyone share something of its 
>> introduction to the kmd ?
>> According to my understanding , gpu-scheduler already have couple levels of 
>> priority for contexts/entities , thus the job page_queue supposed to do 
>> (should be mapping/unmapping/moving) is already good took care of by 
>> "KERNEL" priority entities, and all other context/entity SDMA jobs will be 
>> handled after "KERNEL" jobs ...
>>
>> So there is no real benefit to introduce page_queue (also for rlc_queue) to 
>> amdgpu with the existence of priority aware gpu-scheduler ... unless we are 
>> going to remove the "KERNEL" priority and always do the mapping/unmapping in 
>> page_queue ...
>>
>> /Monk
>>
>> -----Original Message-----
>> From: amd-gfx  On Behalf Of
>> Zhang, Jerry(Junwei)
>> Sent: Wednesday, November 7, 2018 1:26 PM
>> To: Huang, Trigger ;
>> amd-gfx@lists.freedesktop.org; Deucher, Alexander
>> ; Koenig, Christian
>> ; Kuehling, Felix 
>> Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV
>> VF
>>
>> On 11/7/18 1:15 PM, Trigger Huang wrote:
>>> Currently, SDMA page queue is not used under SR-IOV VF, and this
>>> queue will cause ring test failure in amdgpu module reload case. So just 
>>> disable it.
>>>
>>> Signed-off-by: Trigger Huang 
>> Looks we ran into several issues about it on vega.
>> kfd also disabled vega10 for development.(but not sure the detail
>> issue for them)
>>
>> Thus, we may disable it for vega10 as well?
>> any comment? Alex, Christian, Flex.
>>
>> Regards,
>> Jerry
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> index e39a09eb0f..4edc848 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> @@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
>>> adev->sdma.has_page_queue = false;
>>> } else {
>>> adev->sdma.num_instances = 2;
>>> 

RE: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

2018-11-07 Thread Liu, Monk
Hi Christian,

Thanks for sharing.
Do you know why we need recoverable page faults? Are they in preparation for
PRT (or something like kernel page fault handling on the CPU/MMU side)?

For SR-IOV, in theory any feature *not* related to hardware scheduling (MES)
or OS preemption (which is buggy with world switch preemption) is welcome.
As far as I know there is no reason not to support it, unless it is not
mature enough to enable.

/Monk 

-----Original Message-----
From: Koenig, Christian 
Sent: Wednesday, November 7, 2018 3:30 PM
To: Liu, Monk ; Zhang, Jerry ; Huang, 
Trigger ; amd-gfx@lists.freedesktop.org; Deucher, 
Alexander ; Kuehling, Felix 
Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

Hi guys,

this is necessary for recoverable page fault handling.

When the normal SDMA queue is blocked because of a page fault the SDMA firmware 
will switch to the paging queue so that we are able to handle the fault.

In general it should work on all Vega (but not Raven) components and we are 
going to need it when we enable recoverable page faults.

The only case I can see where we don't immediately need it is SRIOV, because 
the current planning is to not support recoverable page faults there.

Christian.

Am 07.11.18 um 08:21 schrieb Liu, Monk:
> Hi team
>
> Why we need this page_queue in amdgpu ?  can anyone share something of its 
> introduction to the kmd ?
> According to my understanding , gpu-scheduler already have couple levels of 
> priority for contexts/entities , thus the job page_queue supposed to do 
> (should be mapping/unmapping/moving) is already good took care of by "KERNEL" 
> priority entities, and all other context/entity SDMA jobs will be handled 
> after "KERNEL" jobs ...
>
> So there is no real benefit to introduce page_queue (also for rlc_queue) to 
> amdgpu with the existence of priority aware gpu-scheduler ... unless we are 
> going to remove the "KERNEL" priority and always do the mapping/unmapping in 
> page_queue ...
>
> /Monk
>
> -----Original Message-----
> From: amd-gfx  On Behalf Of 
> Zhang, Jerry(Junwei)
> Sent: Wednesday, November 7, 2018 1:26 PM
> To: Huang, Trigger ; 
> amd-gfx@lists.freedesktop.org; Deucher, Alexander 
> ; Koenig, Christian 
> ; Kuehling, Felix 
> Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV 
> VF
>
> On 11/7/18 1:15 PM, Trigger Huang wrote:
>> Currently, SDMA page queue is not used under SR-IOV VF, and this 
>> queue will cause ring test failure in amdgpu module reload case. So just 
>> disable it.
>>
>> Signed-off-by: Trigger Huang 
> Looks we ran into several issues about it on vega.
> kfd also disabled vega10 for development.(but not sure the detail 
> issue for them)
>
> Thus, we may disable it for vega10 as well?
> any comment? Alex, Christian, Flex.
>
> Regards,
> Jerry
>> ---
>>drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
>>1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> index e39a09eb0f..4edc848 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> @@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
>>  adev->sdma.has_page_queue = false;
>>  } else {
>>  adev->sdma.num_instances = 2;
>> -if (adev->asic_type != CHIP_VEGA20 &&
>> +if ((adev->asic_type == CHIP_VEGA10) && amdgpu_sriov_vf((adev)))
>> +adev->sdma.has_page_queue = false;
>> +else if (adev->asic_type != CHIP_VEGA20 &&
>>  adev->asic_type != CHIP_VEGA12)
>>  adev->sdma.has_page_queue = true;
>>  }


Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

2018-11-07 Thread Zhang, Jerry(Junwei)

On 11/7/18 3:55 PM, Koenig, Christian wrote:
> Am 07.11.18 um 08:41 schrieb Zhang, Jerry(Junwei):
>> On 11/7/18 3:29 PM, Koenig, Christian wrote:
>>> Hi guys,
>>>
>>> this is necessary for recoverable page fault handling.
>>>
>>> When the normal SDMA queue is blocked because of a page fault the SDMA
>>> firmware will switch to the paging queue so that we are able to handle
>>> the fault.
>> Thanks for your info.
>>
>> IIRC, page queue has higher priority than gfx queue(previously we were
>> using),
>> so the PT update job on page queue will always be scheduled first in HW.
> I think so, but that is not its primary purpose. The key feature is
> that it still works even when the GFX or RLC queues are blocked because
> of fault handling.

That sounds like good functionality.

>> And (not 100% sure) page queue is designed for page migration?
> Yes, well it is designed for page table updates. Either while doing
> migration, fault handling or whatever reason you got.
>
>> Anyway, we can disable it for SRIOV for their existing issues.
> It would be nice to have for normal PD/PT updates under SRIOV as well,
> but as a short term workaround we can probably disable it.

Agree.

Regards,
Jerry

> Regards,
> Christian.
>
>> Regards,
>> Jerry
>>
>>> In general it should work on all Vega (but not Raven) components and we
>>> are going to need it when we enable recoverable page faults.
>>>
>>> The only case I can see where we don't immediately need it is SRIOV,
>>> because the current planning is to not support recoverable page faults
>>> there.
>>>
>>> Christian.

Am 07.11.18 um 08:21 schrieb Liu, Monk:

Hi team

Why we need this page_queue in amdgpu ?  can anyone share something
of its introduction to the kmd ?
According to my understanding , gpu-scheduler already have couple
levels of priority for contexts/entities , thus the job page_queue
supposed to do (should be mapping/unmapping/moving) is already good
took care of by "KERNEL" priority entities, and all other
context/entity SDMA jobs will be handled after "KERNEL" jobs ...

So there is no real benefit to introduce page_queue (also for
rlc_queue) to amdgpu with the existence of priority aware
gpu-scheduler ... unless we are going to remove the "KERNEL"
priority and always do the mapping/unmapping in page_queue ...

/Monk

-----Original Message-----
From: amd-gfx  On Behalf Of
Zhang, Jerry(Junwei)
Sent: Wednesday, November 7, 2018 1:26 PM
To: Huang, Trigger ;
amd-gfx@lists.freedesktop.org; Deucher, Alexander
; Koenig, Christian
; Kuehling, Felix 
Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

On 11/7/18 1:15 PM, Trigger Huang wrote:

Currently, SDMA page queue is not used under SR-IOV VF, and this queue
will cause ring test failure in amdgpu module reload case. So just
disable it.

Signed-off-by: Trigger Huang 

Looks we ran into several issues about it on vega.
kfd also disabled vega10 for development.(but not sure the detail
issue for them)

Thus, we may disable it for vega10 as well?
any comment? Alex, Christian, Flex.

Regards,
Jerry

---
     drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
     1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index e39a09eb0f..4edc848 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
     adev->sdma.has_page_queue = false;
     } else {
     adev->sdma.num_instances = 2;
-    if (adev->asic_type != CHIP_VEGA20 &&
+    if ((adev->asic_type == CHIP_VEGA10) &&
amdgpu_sriov_vf((adev)))
+    adev->sdma.has_page_queue = false;
+    else if (adev->asic_type != CHIP_VEGA20 &&
     adev->asic_type != CHIP_VEGA12)
     adev->sdma.has_page_queue = true;
     }



Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

2018-11-06 Thread Christian König

Am 07.11.18 um 06:15 schrieb Trigger Huang:
> Currently, SDMA page queue is not used under SR-IOV VF, and this queue will
> cause ring test failure in amdgpu module reload case. So just disable it.
>
> Signed-off-by: Trigger Huang
> ---
>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index e39a09eb0f..4edc848 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
>  		adev->sdma.has_page_queue = false;
>  	} else {
>  		adev->sdma.num_instances = 2;
> -		if (adev->asic_type != CHIP_VEGA20 &&
> +		if ((adev->asic_type == CHIP_VEGA10) && amdgpu_sriov_vf((adev)))
> +			adev->sdma.has_page_queue = false;
> +		else if (adev->asic_type != CHIP_VEGA20 &&

Please add a /* TODO: Page queue breaks driver reload under SRIOV */
comment.

With that done the patch is Reviewed-by: Christian König.

Regards,
Christian.

>  		    adev->asic_type != CHIP_VEGA12)
>  			adev->sdma.has_page_queue = true;
>  	}
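
For reference, the decision made by the patched branch, with the requested TODO comment in place, can be sketched as a standalone function. This is an illustration and not the driver source: the enum, the function name and the boolean parameter are stand-ins, and treating the first branch of the hunk as the Raven case is an assumption based on the "all Vega (but not Raven)" remark elsewhere in this thread.

```c
#include <stdbool.h>

/* Stand-ins for the driver's ASIC type values; only the chips named in
 * this thread are modelled. */
enum chip { CHIP_VEGA10, CHIP_VEGA12, CHIP_VEGA20, CHIP_RAVEN };

/* Hypothetical helper mirroring the has_page_queue decision after the
 * patch. is_sriov_vf stands in for the result of amdgpu_sriov_vf(adev). */
static bool sdma_has_page_queue(enum chip asic, bool is_sriov_vf)
{
	/* Assumed to be the first branch of the hunk: no page queue. */
	if (asic == CHIP_RAVEN)
		return false;

	/* TODO: Page queue breaks driver reload under SRIOV */
	if (asic == CHIP_VEGA10 && is_sriov_vf)
		return false;

	/* Unchanged condition: Vega20 and Vega12 stay disabled. */
	return asic != CHIP_VEGA20 && asic != CHIP_VEGA12;
}
```

With these rules, Vega10 on bare metal keeps the page queue, while a Vega10 SR-IOV VF does not.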




Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

2018-11-06 Thread Koenig, Christian
Am 07.11.18 um 08:41 schrieb Zhang, Jerry(Junwei):
> On 11/7/18 3:29 PM, Koenig, Christian wrote:
>> Hi guys,
>>
>> this is necessary for recoverable page fault handling.
>>
>> When the normal SDMA queue is blocked because of a page fault the SDMA
>> firmware will switch to the paging queue so that we are able to handle
>> the fault.
> Thanks for your info.
>
> IIRC, page queue has higher priority than gfx queue(previously we were 
> using),
> so the PT update job on page queue will always be scheduled first in HW.

I think so, but that is not its primary purpose. The key feature is 
that it still works even when the GFX or RLC queues are blocked because 
of fault handling.

> And (not 100% sure) page queue is designed for page migration?

Yes, well it is designed for page table updates. Either while doing 
migration, fault handling or whatever reason you got.

> Anyway, we can disable it for SRIOV for their existing issues.

It would be nice to have for normal PD/PT updates under SRIOV as well, 
but as a short term workaround we can probably disable it.

Regards,
Christian.

>
> Regards,
> Jerry
>
>>
>> In general it should work on all Vega (but not Raven) components and we
>> are going to need it when we enable recoverable page faults.
>>
>> The only case I can see where we don't immediately need it is SRIOV,
>> because the current planning is to not support recoverable page faults
>> there.
>>
>> Christian.
>>
>> Am 07.11.18 um 08:21 schrieb Liu, Monk:
>>> Hi team
>>>
>>> Why we need this page_queue in amdgpu ?  can anyone share something 
>>> of its introduction to the kmd ?
>>> According to my understanding , gpu-scheduler already have couple 
>>> levels of priority for contexts/entities , thus the job page_queue 
>>> supposed to do (should be mapping/unmapping/moving) is already good 
>>> took care of by "KERNEL" priority entities, and all other 
>>> context/entity SDMA jobs will be handled after "KERNEL" jobs ...
>>>
>>> So there is no real benefit to introduce page_queue (also for 
>>> rlc_queue) to amdgpu with the existence of priority aware 
>>> gpu-scheduler ... unless we are going to remove the "KERNEL" 
>>> priority and always do the mapping/unmapping in page_queue ...
>>>
>>> /Monk
>>>
>>> -----Original Message-----
>>> From: amd-gfx  On Behalf Of 
>>> Zhang, Jerry(Junwei)
>>> Sent: Wednesday, November 7, 2018 1:26 PM
>>> To: Huang, Trigger ; 
>>> amd-gfx@lists.freedesktop.org; Deucher, Alexander 
>>> ; Koenig, Christian 
>>> ; Kuehling, Felix 
>>> Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF
>>>
>>> On 11/7/18 1:15 PM, Trigger Huang wrote:
>>>> Currently, SDMA page queue is not used under SR-IOV VF, and this queue
>>>> will cause ring test failure in amdgpu module reload case. So just 
>>>> disable it.
>>>>
>>>> Signed-off-by: Trigger Huang 
>>> Looks we ran into several issues about it on vega.
>>> kfd also disabled vega10 for development.(but not sure the detail 
>>> issue for them)
>>>
>>> Thus, we may disable it for vega10 as well?
>>> any comment? Alex, Christian, Flex.
>>>
>>> Regards,
>>> Jerry
>>>> ---
>>>>     drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
>>>>     1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>> index e39a09eb0f..4edc848 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>> @@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
>>>>     adev->sdma.has_page_queue = false;
>>>>     } else {
>>>>     adev->sdma.num_instances = 2;
>>>> -    if (adev->asic_type != CHIP_VEGA20 &&
>>>> +    if ((adev->asic_type == CHIP_VEGA10) && 
>>>> amdgpu_sriov_vf((adev)))
>>>> +    adev->sdma.has_page_queue = false;
>>>> +    else if (adev->asic_type != CHIP_VEGA20 &&
>>>>     adev->asic_type != CHIP_VEGA12)
>>>>     adev->sdma.has_page_queue = true;
>>>>     }


Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

2018-11-06 Thread Zhang, Jerry(Junwei)

On 11/7/18 3:29 PM, Koenig, Christian wrote:
> Hi guys,
>
> this is necessary for recoverable page fault handling.
>
> When the normal SDMA queue is blocked because of a page fault the SDMA
> firmware will switch to the paging queue so that we are able to handle
> the fault.

Thanks for your info.

IIRC, the page queue has higher priority than the gfx queue (which we were
using previously), so the PT update job on the page queue will always be
scheduled first in HW.

And (not 100% sure) the page queue is designed for page migration?

Anyway, we can disable it for SRIOV because of the existing issues.

Regards,
Jerry

> In general it should work on all Vega (but not Raven) components and we
> are going to need it when we enable recoverable page faults.
>
> The only case I can see where we don't immediately need it is SRIOV,
> because the current planning is to not support recoverable page faults
> there.
>
> Christian.

Am 07.11.18 um 08:21 schrieb Liu, Monk:

Hi team

Why we need this page_queue in amdgpu ?  can anyone share something of its 
introduction to the kmd ?
According to my understanding , gpu-scheduler already have couple levels of priority for 
contexts/entities , thus the job page_queue supposed to do (should be mapping/unmapping/moving) is 
already good took care of by "KERNEL" priority entities, and all other context/entity 
SDMA jobs will be handled after "KERNEL" jobs ...

So there is no real benefit to introduce page_queue (also for rlc_queue) to amdgpu with 
the existence of priority aware gpu-scheduler ... unless we are going to remove the 
"KERNEL" priority and always do the mapping/unmapping in page_queue ...

/Monk

-----Original Message-----
From: amd-gfx  On Behalf Of Zhang, 
Jerry(Junwei)
Sent: Wednesday, November 7, 2018 1:26 PM
To: Huang, Trigger ; amd-gfx@lists.freedesktop.org; Deucher, Alexander 
; Koenig, Christian ; Kuehling, Felix 

Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

On 11/7/18 1:15 PM, Trigger Huang wrote:

Currently, SDMA page queue is not used under SR-IOV VF, and this queue
will cause ring test failure in amdgpu module reload case. So just disable it.

Signed-off-by: Trigger Huang 

Looks we ran into several issues about it on vega.
kfd also disabled vega10 for development.(but not sure the detail issue for 
them)

Thus, we may disable it for vega10 as well?
any comment? Alex, Christian, Flex.

Regards,
Jerry

---
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index e39a09eb0f..4edc848 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
adev->sdma.has_page_queue = false;
} else {
adev->sdma.num_instances = 2;
-   if (adev->asic_type != CHIP_VEGA20 &&
+   if ((adev->asic_type == CHIP_VEGA10) && amdgpu_sriov_vf((adev)))
+   adev->sdma.has_page_queue = false;
+   else if (adev->asic_type != CHIP_VEGA20 &&
adev->asic_type != CHIP_VEGA12)
adev->sdma.has_page_queue = true;
}



Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

2018-11-06 Thread Koenig, Christian
Hi guys,

this is necessary for recoverable page fault handling.

When the normal SDMA queue is blocked because of a page fault the SDMA 
firmware will switch to the paging queue so that we are able to handle 
the fault.

In general it should work on all Vega (but not Raven) components and we 
are going to need it when we enable recoverable page faults.

The only case I can see where we don't immediately need it is SRIOV, 
because the current planning is to not support recoverable page faults 
there.

Christian.
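
The fallback described above can be pictured with a toy model. This is only a sketch of the idea and not how the SDMA firmware is implemented: the enum, the struct and the function name are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical queue identifiers for this model only. */
enum sdma_queue { SDMA_QUEUE_GFX, SDMA_QUEUE_PAGE, SDMA_QUEUE_NONE };

/* Hypothetical engine state for this model only. */
struct sdma_state {
	bool has_page_queue; /* paging queue enabled for this device */
	bool gfx_blocked;    /* normal queue is stalled on a page fault */
};

/* Return the queue that can still make progress. When the normal queue
 * is blocked on a fault, only the paging queue (if present) can run the
 * work needed to resolve that fault. */
static enum sdma_queue sdma_runnable_queue(const struct sdma_state *s)
{
	if (!s->gfx_blocked)
		return SDMA_QUEUE_GFX;
	if (s->has_page_queue)
		return SDMA_QUEUE_PAGE;
	return SDMA_QUEUE_NONE;
}
```

In this model, a device without a paging queue has no queue left to run on once the normal queue is blocked, which is why the paging queue matters for recoverable page faults.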

Am 07.11.18 um 08:21 schrieb Liu, Monk:
> Hi team
>
> Why we need this page_queue in amdgpu ?  can anyone share something of its 
> introduction to the kmd ?
> According to my understanding , gpu-scheduler already have couple levels of 
> priority for contexts/entities , thus the job page_queue supposed to do 
> (should be mapping/unmapping/moving) is already good took care of by "KERNEL" 
> priority entities, and all other context/entity SDMA jobs will be handled 
> after "KERNEL" jobs ...
>
> So there is no real benefit to introduce page_queue (also for rlc_queue) to 
> amdgpu with the existence of priority aware gpu-scheduler ... unless we are 
> going to remove the "KERNEL" priority and always do the mapping/unmapping in 
> page_queue ...
>
> /Monk
>
> -----Original Message-----
> From: amd-gfx  On Behalf Of Zhang, 
> Jerry(Junwei)
> Sent: Wednesday, November 7, 2018 1:26 PM
> To: Huang, Trigger ; amd-gfx@lists.freedesktop.org; 
> Deucher, Alexander ; Koenig, Christian 
> ; Kuehling, Felix 
> Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF
>
> On 11/7/18 1:15 PM, Trigger Huang wrote:
>> Currently, the SDMA page queue is not used under SR-IOV VF, and this queue
>> causes a ring test failure when the amdgpu module is reloaded. So just 
>> disable it.
>>
>> Signed-off-by: Trigger Huang 
> Looks like we have run into several issues with it on Vega.
> KFD also disabled it on Vega10 for development (though I'm not sure of the 
> details of their issue).
>
> So maybe we should disable it for Vega10 as well?
> Any comments? Alex, Christian, Felix.
>
> Regards,
> Jerry
>> ---
>>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> index e39a09eb0f..4edc848 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> @@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
>>  		adev->sdma.has_page_queue = false;
>>  	} else {
>>  		adev->sdma.num_instances = 2;
>> -		if (adev->asic_type != CHIP_VEGA20 &&
>> +		if ((adev->asic_type == CHIP_VEGA10) && amdgpu_sriov_vf((adev)))
>> +			adev->sdma.has_page_queue = false;
>> +		else if (adev->asic_type != CHIP_VEGA20 &&
>>  		    adev->asic_type != CHIP_VEGA12)
>>  			adev->sdma.has_page_queue = true;
>>  	}


RE: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

2018-11-06 Thread Liu, Monk
Hi team

Why do we need this page_queue in amdgpu? Can anyone share some background 
on its introduction to the KMD?
As I understand it, the gpu-scheduler already has a couple of priority levels 
for contexts/entities, so the jobs the page_queue is supposed to do (which 
should be mapping/unmapping/moving) are already well taken care of by "KERNEL" 
priority entities, and all other context/entity SDMA jobs will be handled 
after the "KERNEL" jobs ...

So there is no real benefit in introducing page_queue (or rlc_queue) to 
amdgpu given the priority-aware gpu-scheduler ... unless we are going to 
remove the "KERNEL" priority and always do the mapping/unmapping in 
page_queue ...

/Monk
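
The priority argument above can be sketched as a scheduler that always drains the highest non-empty priority level first. The names (`toy_prio`, `toy_sched`, `toy_sched_pick`) and the three levels are hypothetical and are not the gpu-scheduler's actual API; this is a minimal illustration of the ordering being described, nothing more.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical priority levels for illustration; not the gpu-scheduler's real enum. */
enum toy_prio { TOY_PRIO_NORMAL, TOY_PRIO_HIGH, TOY_PRIO_KERNEL, TOY_PRIO_COUNT };

struct toy_sched {
	int queued[TOY_PRIO_COUNT];	/* jobs waiting at each priority level */
};

/*
 * Take one job from the highest non-empty priority level.
 * Returns that level, or -1 if nothing is queued.
 */
static int toy_sched_pick(struct toy_sched *s)
{
	assert(s != NULL);

	for (int p = TOY_PRIO_COUNT - 1; p >= 0; p--) {
		if (s->queued[p] > 0) {
			s->queued[p]--;
			return p;
		}
	}
	return -1;
}
```

With one "KERNEL" job and two normal jobs queued, successive calls return `TOY_PRIO_KERNEL` first and `TOY_PRIO_NORMAL` afterwards. Note that this sketch only orders jobs before they are handed to a ring; it says nothing about what happens once a job is already running there.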

-----Original Message-----
From: amd-gfx  On Behalf Of Zhang, 
Jerry(Junwei)
Sent: Wednesday, November 7, 2018 1:26 PM
To: Huang, Trigger ; amd-gfx@lists.freedesktop.org; 
Deucher, Alexander ; Koenig, Christian 
; Kuehling, Felix 
Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

On 11/7/18 1:15 PM, Trigger Huang wrote:
> Currently, the SDMA page queue is not used under SR-IOV VF, and this queue 
> causes a ring test failure when the amdgpu module is reloaded. So just disable it.
>
> Signed-off-by: Trigger Huang 

Looks like we have run into several issues with it on Vega.
KFD also disabled it on Vega10 for development (though I'm not sure of the 
details of their issue).

So maybe we should disable it for Vega10 as well?
Any comments? Alex, Christian, Felix.

Regards,
Jerry
> ---
>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index e39a09eb0f..4edc848 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
>  		adev->sdma.has_page_queue = false;
>  	} else {
>  		adev->sdma.num_instances = 2;
> -		if (adev->asic_type != CHIP_VEGA20 &&
> +		if ((adev->asic_type == CHIP_VEGA10) && amdgpu_sriov_vf((adev)))
> +			adev->sdma.has_page_queue = false;
> +		else if (adev->asic_type != CHIP_VEGA20 &&
>  		    adev->asic_type != CHIP_VEGA12)
>  			adev->sdma.has_page_queue = true;
>  	}



Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

2018-11-06 Thread Zhang, Jerry(Junwei)

On 11/7/18 1:15 PM, Trigger Huang wrote:

> Currently, the SDMA page queue is not used under SR-IOV VF, and this queue
> causes a ring test failure when the amdgpu module is reloaded. So just disable it.
>
> Signed-off-by: Trigger Huang 


Looks like we have run into several issues with it on Vega.
KFD also disabled it on Vega10 for development (though I'm not sure of the 
details of their issue).

So maybe we should disable it for Vega10 as well?
Any comments? Alex, Christian, Felix.

Regards,
Jerry

> ---
>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index e39a09eb0f..4edc848 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
>  		adev->sdma.has_page_queue = false;
>  	} else {
>  		adev->sdma.num_instances = 2;
> -		if (adev->asic_type != CHIP_VEGA20 &&
> +		if ((adev->asic_type == CHIP_VEGA10) && amdgpu_sriov_vf((adev)))
> +			adev->sdma.has_page_queue = false;
> +		else if (adev->asic_type != CHIP_VEGA20 &&
>  		    adev->asic_type != CHIP_VEGA12)
>  			adev->sdma.has_page_queue = true;
>  	}



[PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

2018-11-06 Thread Trigger Huang
Currently, the SDMA page queue is not used under SR-IOV VF, and this queue
causes a ring test failure when the amdgpu module is reloaded. So just disable it.

Signed-off-by: Trigger Huang 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index e39a09eb0f..4edc848 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
 		adev->sdma.has_page_queue = false;
 	} else {
 		adev->sdma.num_instances = 2;
-		if (adev->asic_type != CHIP_VEGA20 &&
+		if ((adev->asic_type == CHIP_VEGA10) && amdgpu_sriov_vf((adev)))
+			adev->sdma.has_page_queue = false;
+		else if (adev->asic_type != CHIP_VEGA20 &&
 		    adev->asic_type != CHIP_VEGA12)
 			adev->sdma.has_page_queue = true;
 	}
-- 
2.7.4
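
For readers who want to check the resulting behaviour, the branch logic of the hunk can be restated as a standalone function. This is a sketch, not the driver code: the `TOY_` enum and `toy_has_page_queue` are hypothetical stand-ins, `is_sriov_vf` stands in for the result of `amdgpu_sriov_vf(adev)`, and it assumes `has_page_queue` stays false whenever the hunk does not set it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ASIC types that reach the else-branch of the hunk. */
enum toy_asic_type { TOY_CHIP_VEGA10, TOY_CHIP_VEGA12, TOY_CHIP_VEGA20 };

/* Mirrors the patched else-branch: which configurations end up with a page queue. */
static bool toy_has_page_queue(enum toy_asic_type asic_type, bool is_sriov_vf)
{
	/* New in this patch: a Vega10 SR-IOV VF never gets the page queue. */
	if (asic_type == TOY_CHIP_VEGA10 && is_sriov_vf)
		return false;

	/* Unchanged: Vega12 and Vega20 are excluded. */
	if (asic_type != TOY_CHIP_VEGA20 && asic_type != TOY_CHIP_VEGA12)
		return true;

	/* Assumption: the flag stays false when the driver never sets it. */
	return false;
}

/* Walks through the four cases the thread discusses. */
void toy_self_check(void)
{
	assert(!toy_has_page_queue(TOY_CHIP_VEGA10, true));	/* Vega10 VF */
	assert(toy_has_page_queue(TOY_CHIP_VEGA10, false));	/* Vega10 bare metal */
	assert(!toy_has_page_queue(TOY_CHIP_VEGA12, false));
	assert(!toy_has_page_queue(TOY_CHIP_VEGA20, true));
}
```

The only case whose outcome changes relative to the old condition is Vega10 running as an SR-IOV VF.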
