Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF
On 2018-11-07 12:53 p.m., Kuehling, Felix wrote:
> [+Philip]
>
> On 2018-11-07 12:25 a.m., Zhang, Jerry(Junwei) wrote:
>> On 11/7/18 1:15 PM, Trigger Huang wrote:
>>> Currently, SDMA page queue is not used under SR-IOV VF, and this
>>> queue will cause ring test failure in amdgpu module reload case.
>>> So just disable it.
>>>
>>> Signed-off-by: Trigger Huang
>> Looks we ran into several issues about it on vega.
>> kfd also disabled vega10 for development.(but not sure the detail
>> issue for them)
>>
>> Thus, we may disable it for vega10 as well?
>> any comment? Alex, Christian, Flex.
> We ran into a regression with the page queue in a specific KFDTest that
> runs user mode SDMA in two processes. The SDMA engine would stall for
> about 6 seconds after one of the processes terminates (and destroys its
> queues). We don't have a root cause. Suspect an SDMA firmware issue.
>
> Regards,
>   Felix

The SDMA firmware team has root-caused the bug. I have tested an SDMA
firmware build that fixes the KFDIPCTest.BasicTest issue. I am waiting
for that firmware to be checked in and will then enable the paging
queue for Vega 10. I have asked the SDMA firmware team whether the
Vega12/Vega20 paging queue issue has the same root cause.

Philip

>> Regards,
>> Jerry
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> index e39a09eb0f..4edc848 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> @@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
>>>  		adev->sdma.has_page_queue = false;
>>>  	} else {
>>>  		adev->sdma.num_instances = 2;
>>> -		if (adev->asic_type != CHIP_VEGA20 &&
>>> +		if ((adev->asic_type == CHIP_VEGA10) && amdgpu_sriov_vf((adev)))
>>> +			adev->sdma.has_page_queue = false;
>>> +		else if (adev->asic_type != CHIP_VEGA20 &&
>>>  		    adev->asic_type != CHIP_VEGA12)
>>>  			adev->sdma.has_page_queue = true;
>>>  	}

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF
[+Philip]

On 2018-11-07 12:25 a.m., Zhang, Jerry(Junwei) wrote:
> On 11/7/18 1:15 PM, Trigger Huang wrote:
>> Currently, SDMA page queue is not used under SR-IOV VF, and this
>> queue will cause ring test failure in amdgpu module reload case.
>> So just disable it.
>>
>> Signed-off-by: Trigger Huang
>
> Looks we ran into several issues about it on vega.
> kfd also disabled vega10 for development.(but not sure the detail
> issue for them)
>
> Thus, we may disable it for vega10 as well?
> any comment? Alex, Christian, Flex.

We ran into a regression with the page queue in a specific KFDTest that
runs user mode SDMA in two processes. The SDMA engine would stall for
about 6 seconds after one of the processes terminates (and destroys its
queues). We don't have a root cause. Suspect an SDMA firmware issue.

Regards,
  Felix
RE: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF
Yeah, we allow at most 500 ms for RLCV to finish the IDLE command for
CP/GFX and SDMA together, and even that already gives a very poor user
experience ...

It looks like this feature is not applicable to the world switch case.

/Monk

-----Original Message-----
From: Koenig, Christian
Sent: Wednesday, November 7, 2018 4:48 PM
To: Liu, Monk; Zhang, Jerry; Huang, Trigger; amd-gfx@lists.freedesktop.org; Deucher, Alexander; Kuehling, Felix
Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

> is it prepared for PRT (or something like kernel page fault handling
> on CPU/MMU side)?

That is for providing shared virtual address space (e.g. when the CPU
and GPU have the same VA view) as well as changing our memory
management in general.

> For SRIOV, in theoretically any feature*not* related with hardware
> scheduling (MES) or OS preemption (buggy with world switch preemption)
> is welcome to SR-IOV, no reason Not to support it as far as I know,
> unless not mature enough to enable it

The problem is that recoverable page faults in Vega10 are incompatible
with SRIOV because a page fault can block the GPU for an undefined
amount of time and Vega10 can't schedule those away from the hardware.

So the shader thread is blocked and can't be switched away. Under SRIOV
that would mean that we just get killed by the hypervisor rather soon.

Christian.
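The timing argument in this exchange can be put as a toy model. The sketch below is only an illustration: the 500 ms figure is the idle budget quoted in the message above, while the macro name `WORLD_SWITCH_IDLE_BUDGET_MS` and the helper `vf_idles_in_time` are hypothetical and do not exist in amdgpu or in any hypervisor code.

```c
#include <stdbool.h>

/* Idle budget quoted in the thread: at most 500 ms for CP/GFX and SDMA together. */
#define WORLD_SWITCH_IDLE_BUDGET_MS 500u

/*
 * Hypothetical toy model, not driver code: returns true when the work
 * still pending on the VF drains before the world-switch budget expires.
 */
bool vf_idles_in_time(unsigned int pending_stall_ms)
{
	return pending_stall_ms <= WORLD_SWITCH_IDLE_BUDGET_MS;
}
```

A recoverable page fault can stall a shader for an undefined amount of time, so no upper bound can be passed to such a check, which is the incompatibility described above.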
Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF
> is it prepared for PRT (or something like kernel page fault handling
> on CPU/MMU side)?

That is for providing shared virtual address space (e.g. when the CPU
and GPU have the same VA view) as well as changing our memory
management in general.

> For SRIOV, in theoretically any feature*not* related with hardware
> scheduling (MES) or OS preemption (buggy with world switch preemption)
> is welcome to SR-IOV, no reason Not to support it as far as I know,
> unless not mature enough to enable it

The problem is that recoverable page faults in Vega10 are incompatible
with SRIOV because a page fault can block the GPU for an undefined
amount of time and Vega10 can't schedule those away from the hardware.

So the shader thread is blocked and can't be switched away. Under SRIOV
that would mean that we just get killed by the hypervisor rather soon.

Christian.
RE: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF
Hi Christian,

Thanks for sharing.

Do you know more about why we need recoverable page faults? Is it in
preparation for PRT (or something like kernel page fault handling on
the CPU/MMU side)?

For SRIOV, in theory any feature *not* related to hardware scheduling
(MES) or OS preemption (buggy with world switch preemption) is welcome
on SR-IOV. There is no reason not to support it as far as I know,
unless it is not mature enough to enable.

/Monk

-----Original Message-----
From: Koenig, Christian
Sent: Wednesday, November 7, 2018 3:30 PM
To: Liu, Monk; Zhang, Jerry; Huang, Trigger; amd-gfx@lists.freedesktop.org; Deucher, Alexander; Kuehling, Felix
Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

Hi guys,

this is necessary for recoverable page fault handling.

When the normal SDMA queue is blocked because of a page fault the SDMA
firmware will switch to the paging queue so that we are able to handle
the fault.

In general it should work on all Vega (but not Raven) components and we
are going to need it when we enable recoverable page faults.

The only case I can see where we don't immediately need it is SRIOV,
because the current planning is to not support recoverable page faults
there.

Christian.
Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF
On 11/7/18 3:55 PM, Koenig, Christian wrote:
> On 07.11.18 at 08:41, Zhang, Jerry(Junwei) wrote:
>> On 11/7/18 3:29 PM, Koenig, Christian wrote:
>>> Hi guys,
>>>
>>> this is necessary for recoverable page fault handling.
>>>
>>> When the normal SDMA queue is blocked because of a page fault the SDMA
>>> firmware will switch to the paging queue so that we are able to handle
>>> the fault.
>> Thanks for your info.
>>
>> IIRC, page queue has higher priority than gfx queue(previously we were
>> using),
>> so the PT update job on page queue will always be scheduled first in HW.
> I think so, but that is not it's primary purpose. The key feature is that
> it still works even when the GFX or RLC queues are blocked because of
> fault handling.

That sounds like good functionality.

>> And (not 100% sure) page queue is designed for page migration?
> Yes, well it is designed for page tables updates. Either while doing
> migration, fault handling or whatever reason you got.
>
>> Anyway, we can disable it for SRIOV for their existing issues.
> It would be nice to have for normal PD/PT updates under SRIOV as well,
> but as a short term workaround we can probably disable it.

Agreed.

Regards,
Jerry
Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF
On 07.11.18 at 06:15, Trigger Huang wrote:
> Currently, SDMA page queue is not used under SR-IOV VF, and this
> queue will cause ring test failure in amdgpu module reload case.
> So just disable it.
>
> Signed-off-by: Trigger Huang
> ---
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index e39a09eb0f..4edc848 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
>  		adev->sdma.has_page_queue = false;
>  	} else {
>  		adev->sdma.num_instances = 2;
> -		if (adev->asic_type != CHIP_VEGA20 &&
> +		if ((adev->asic_type == CHIP_VEGA10) && amdgpu_sriov_vf((adev)))
> +			adev->sdma.has_page_queue = false;
> +		else if (adev->asic_type != CHIP_VEGA20 &&

Please add a /* TODO: Page queue breaks driver reload under SRIOV */
comment.

With that done the patch is Reviewed-by: Christian König.

Regards,
Christian.

>  		    adev->asic_type != CHIP_VEGA12)
>  			adev->sdma.has_page_queue = true;
>  	}
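For readers who want to see the resulting decision in one place, here is a minimal standalone sketch of the logic after the patch, with the requested TODO comment in place. It is not the driver code: `enum chip` is a stand-in for the kernel's ASIC enum, `sdma_wants_page_queue` is a hypothetical helper, the `is_sriov_vf` flag stands in for `amdgpu_sriov_vf(adev)`, and the Raven branch is an assumption based on the hunk's context lines plus the "all Vega (but not Raven)" remark elsewhere in the thread. The sketch also assumes `has_page_queue` stays false whenever no branch sets it to true.

```c
#include <stdbool.h>

/* Stand-in for the kernel's ASIC enum; only the chips named in the thread. */
enum chip { CHIP_VEGA10, CHIP_VEGA12, CHIP_VEGA20, CHIP_RAVEN };

/*
 * Hypothetical helper (not a kernel function): returns whether the SDMA
 * page queue would end up enabled once the patch is applied.
 */
bool sdma_wants_page_queue(enum chip asic, bool is_sriov_vf)
{
	/* Assumed first branch of the hunk: Raven never gets a page queue. */
	if (asic == CHIP_RAVEN)
		return false;

	/* TODO: Page queue breaks driver reload under SRIOV */
	if (asic == CHIP_VEGA10 && is_sriov_vf)
		return false;

	/* Unchanged rule: enabled unless the chip is Vega12 or Vega20. */
	return asic != CHIP_VEGA20 && asic != CHIP_VEGA12;
}
```

With these stand-ins, bare-metal Vega10 keeps the page queue and only the Vega10 SR-IOV VF case loses it, which matches the scope stated in the patch subject.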
Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF
On 07.11.18 at 08:41, Zhang, Jerry(Junwei) wrote:
> On 11/7/18 3:29 PM, Koenig, Christian wrote:
>> Hi guys,
>>
>> this is necessary for recoverable page fault handling.
>>
>> When the normal SDMA queue is blocked because of a page fault the SDMA
>> firmware will switch to the paging queue so that we are able to handle
>> the fault.
> Thanks for your info.
>
> IIRC, page queue has higher priority than gfx queue(previously we were
> using),
> so the PT update job on page queue will always be scheduled first in HW.

I think so, but that is not its primary purpose. The key feature is
that it still works even when the GFX or RLC queues are blocked because
of fault handling.

> And (not 100% sure) page queue is designed for page migration?

Yes, well, it is designed for page table updates, whether during
migration, fault handling, or for any other reason.

> Anyway, we can disable it for SRIOV for their existing issues.

It would be nice to have for normal PD/PT updates under SRIOV as well,
but as a short-term workaround we can probably disable it.

Regards,
Christian.
Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF
On 11/7/18 3:29 PM, Koenig, Christian wrote:
> Hi guys,
>
> this is necessary for recoverable page fault handling.
>
> When the normal SDMA queue is blocked because of a page fault the SDMA
> firmware will switch to the paging queue so that we are able to handle
> the fault.

Thanks for your info.

IIRC, the page queue has higher priority than the gfx queue (which we
were using previously), so the PT update job on the page queue will
always be scheduled first in HW.

And (not 100% sure) the page queue is designed for page migration?

Anyway, we can disable it for SRIOV because of the existing issues
there.

Regards,
Jerry
Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF
Hi guys,

this is necessary for recoverable page fault handling.

When the normal SDMA queue is blocked because of a page fault, the SDMA
firmware will switch to the paging queue so that we are able to handle
the fault.

In general it should work on all Vega (but not Raven) components, and
we are going to need it when we enable recoverable page faults.

The only case I can see where we don't immediately need it is SRIOV,
because the current plan is not to support recoverable page faults
there.

Christian.

On 07.11.18 at 08:21, Liu, Monk wrote:
> Hi team
>
> Why we need this page_queue in amdgpu ? can anyone share something of its
> introduction to the kmd ?
> According to my understanding , gpu-scheduler already have couple levels of
> priority for contexts/entities , thus the job page_queue supposed to do
> (should be mapping/unmapping/moving) is already good took care of by "KERNEL"
> priority entities, and all other context/entity SDMA jobs will be handled
> after "KERNEL" jobs ...
>
> So there is no real benefit to introduce page_queue (also for rlc_queue) to
> amdgpu with the existence of priority aware gpu-scheduler ... unless we are
> going to remove the "KERNEL" priority and always do the mapping/unmapping in
> page_queue ...
>
> /Monk
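The difference between software priority and a separate hardware queue, which is the point of this reply, can be sketched with a toy model. Everything below is hypothetical illustration code: `enum sdma_queue` and `pick_pt_update_queue` are invented names, and the real switch is performed by SDMA firmware rather than by a C function in the driver.

```c
#include <stdbool.h>

/* Hypothetical queue labels for the toy model. */
enum sdma_queue { QUEUE_NORMAL, QUEUE_PAGE, QUEUE_NONE };

/*
 * Toy model: which queue can execute a page-table update right now?
 * A scheduler priority only reorders jobs inside the normal queue, so
 * it cannot help once that queue itself is stalled on a page fault.
 */
enum sdma_queue pick_pt_update_queue(bool normal_blocked_on_fault,
				     bool has_page_queue)
{
	if (has_page_queue)
		return QUEUE_PAGE;	/* keeps working while the normal queue waits */
	if (!normal_blocked_on_fault)
		return QUEUE_NORMAL;	/* no fault pending: the normal queue can do it */
	return QUEUE_NONE;		/* fault can never be resolved */
}
```

In this model the `QUEUE_NONE` case is the situation recoverable page faults must avoid: the update that would resolve the fault is queued behind the very work that is waiting on it.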
RE: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF
Hi team

Why do we need this page_queue in amdgpu? Can anyone share some
background on its introduction to the KMD?
As I understand it, the GPU scheduler already has a couple of priority
levels for contexts/entities, so the work the page_queue is supposed to
do (mapping/unmapping/moving) is already taken care of by "KERNEL"
priority entities, and all other context/entity SDMA jobs are handled
after the "KERNEL" jobs ...

So there is no real benefit in introducing page_queue (or rlc_queue) to
amdgpu given the priority-aware GPU scheduler ... unless we are going
to remove the "KERNEL" priority and always do the mapping/unmapping in
page_queue ...

/Monk

-----Original Message-----
From: amd-gfx On Behalf Of Zhang, Jerry(Junwei)
Sent: Wednesday, November 7, 2018 1:26 PM
To: Huang, Trigger; amd-gfx@lists.freedesktop.org; Deucher, Alexander;
Koenig, Christian; Kuehling, Felix
Subject: Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF

On 11/7/18 1:15 PM, Trigger Huang wrote:
> Currently, the SDMA page queue is not used under SR-IOV VF, and this
> queue causes a ring test failure when the amdgpu module is reloaded.
> So just disable it.
>
> Signed-off-by: Trigger Huang
Looks like we ran into several issues with it on Vega.
KFD also disabled it on Vega10 for development (though I'm not sure of
the details of their issue).

So should we disable it for Vega10 as well?
Any comments? Alex, Christian, Felix.

Regards,
Jerry
> ---
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index e39a09eb0f..4edc848 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
>  		adev->sdma.has_page_queue = false;
>  	} else {
>  		adev->sdma.num_instances = 2;
> -		if (adev->asic_type != CHIP_VEGA20 &&
> +		if ((adev->asic_type == CHIP_VEGA10) && amdgpu_sriov_vf((adev)))
> +			adev->sdma.has_page_queue = false;
> +		else if (adev->asic_type != CHIP_VEGA20 &&
>  		    adev->asic_type != CHIP_VEGA12)
>  			adev->sdma.has_page_queue = true;
>  	}
Re: [PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF
On 11/7/18 1:15 PM, Trigger Huang wrote:
> Currently, the SDMA page queue is not used under SR-IOV VF, and this
> queue causes a ring test failure when the amdgpu module is reloaded.
> So just disable it.
>
> Signed-off-by: Trigger Huang
Looks like we ran into several issues with it on Vega.
KFD also disabled it on Vega10 for development (though I'm not sure of
the details of their issue).

So should we disable it for Vega10 as well?
Any comments? Alex, Christian, Felix.

Regards,
Jerry
> ---
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index e39a09eb0f..4edc848 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
>  		adev->sdma.has_page_queue = false;
>  	} else {
>  		adev->sdma.num_instances = 2;
> -		if (adev->asic_type != CHIP_VEGA20 &&
> +		if ((adev->asic_type == CHIP_VEGA10) && amdgpu_sriov_vf((adev)))
> +			adev->sdma.has_page_queue = false;
> +		else if (adev->asic_type != CHIP_VEGA20 &&
>  		    adev->asic_type != CHIP_VEGA12)
>  			adev->sdma.has_page_queue = true;
>  	}
[PATCH] drm/amdgpu: disable page queue on Vega10 SR-IOV VF
Currently, the SDMA page queue is not used under SR-IOV VF, and this
queue causes a ring test failure when the amdgpu module is reloaded.
So just disable it.

Signed-off-by: Trigger Huang
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index e39a09eb0f..4edc848 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -1451,7 +1451,9 @@ static int sdma_v4_0_early_init(void *handle)
 		adev->sdma.has_page_queue = false;
 	} else {
 		adev->sdma.num_instances = 2;
-		if (adev->asic_type != CHIP_VEGA20 &&
+		if ((adev->asic_type == CHIP_VEGA10) && amdgpu_sriov_vf((adev)))
+			adev->sdma.has_page_queue = false;
+		else if (adev->asic_type != CHIP_VEGA20 &&
 		    adev->asic_type != CHIP_VEGA12)
 			adev->sdma.has_page_queue = true;
 	}
-- 
2.7.4
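For readers who want to check the resulting decision table without the
driver context, the patched branch can be restated as a small standalone
function. This is a sketch under assumptions, not the driver's code: the
enum below is a hypothetical stand-in for the real asic_type values, the
is_sriov_vf parameter stands in for amdgpu_sriov_vf(adev), the first branch
(whose condition the hunk does not show) is assumed to be the Raven case
mentioned elsewhere in the thread, and has_page_queue is assumed to start
out false so that Vega12 and Vega20 end up without a page queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's asic_type enum. */
enum chip { CHIP_VEGA10, CHIP_VEGA12, CHIP_VEGA20, CHIP_RAVEN };

/*
 * Returns whether the paging queue would be enabled after the patch.
 * is_sriov_vf stands in for amdgpu_sriov_vf(adev).
 */
bool page_queue_enabled(enum chip asic, bool is_sriov_vf)
{
	if (asic == CHIP_RAVEN)
		return false;	/* assumed first branch of early_init */
	if (asic == CHIP_VEGA10 && is_sriov_vf)
		return false;	/* the case this patch adds */
	/* Unchanged rule: every other chip except Vega12 and Vega20. */
	return asic != CHIP_VEGA20 && asic != CHIP_VEGA12;
}

/* Spot checks of the table above. */
void page_queue_selfcheck(void)
{
	assert(page_queue_enabled(CHIP_VEGA10, false));
	assert(!page_queue_enabled(CHIP_VEGA10, true));
	assert(!page_queue_enabled(CHIP_VEGA12, false));
	assert(!page_queue_enabled(CHIP_VEGA20, false));
	assert(!page_queue_enabled(CHIP_RAVEN, false));
}
```

With these assumptions, the only configuration that keeps the page queue is
bare-metal Vega10. Before the patch, a Vega10 SR-IOV VF would have kept it
as well.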