Re: [PATCH] drm/amdkfd: add ACPI SRAT parsing for topology

2021-05-03 Thread Zeng, Oak
Ok, that makes sense. Thanks for explaining. Regards, Oak On 2021-05-03, 3:13 PM, "Huang, JinHuiEric" wrote: In drivers/acpi/numa/srat.c, the generic CCD parsing is for the mapping of numa node and pxm domain that creates arrays of pxm_to_node_map and node_to_pxm_map. We are

Re: [PATCH] drm/amdkfd: fix no atomics settings in the kfd topology

2021-05-03 Thread Zeng, Oak
Reviewed-by: Oak Zeng Regards, Oak On 2021-05-03, 3:49 PM, "amd-gfx on behalf of Jonathan Kim" wrote: To account for various PCIe and xGMI setups, check the no atomics settings for a device in relation to every direct peer. v2: apply suggested clean ups in main loop.

Re: [PATCH v5 06/27] drm/amdgpu: Handle IOMMU enabled case.

2021-05-03 Thread Andrey Grodzovsky
On 2021-04-29 3:08 a.m., Christian König wrote: On 28.04.21 at 17:11, Andrey Grodzovsky wrote: Handle all DMA IOMMU group related dependencies before the group is removed. v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate Maybe split that up into more patches.

Re: [PATCH] drm/amdkfd: fix no atomics settings in the kfd topology

2021-05-03 Thread Felix Kuehling
On 2021-05-03 at 3:49 p.m., Jonathan Kim wrote: > To account for various PCIe and xGMI setups, check the no atomics settings > for a device in relation to every direct peer. > > v2: apply suggested clean ups in main loop. > > Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling > --- >

[PATCH] drm/amdkfd: fix no atomics settings in the kfd topology

2021-05-03 Thread Jonathan Kim
To account for various PCIe and xGMI setups, check the no atomics settings for a device in relation to every direct peer. v2: apply suggested clean ups in main loop. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 61 ++- 1 file changed, 37

Re: [PATCH] drm/amdkfd: add ACPI SRAT parsing for topology

2021-05-03 Thread Felix Kuehling
On 2021-05-03 at 3:27 p.m., Eric Huang wrote: > > > On 2021-05-03 3:13 p.m., Felix Kuehling wrote: >> On 2021-05-03 at 10:47 a.m., Eric Huang wrote: >>> In NPS4 BIOS we need to find the closest numa node when creating >>> topology io link between cpu and gpu, if PCI driver doesn't set >>> it.

Re: [PATCH] drm/amdkfd: add ACPI SRAT parsing for topology

2021-05-03 Thread Eric Huang
On 2021-05-03 3:13 p.m., Felix Kuehling wrote: On 2021-05-03 at 10:47 a.m., Eric Huang wrote: In NPS4 BIOS we need to find the closest numa node when creating topology io link between cpu and gpu, if PCI driver doesn't set it. Signed-off-by: Eric Huang ---

Re: [PATCH] drm/amdkfd: add ACPI SRAT parsing for topology

2021-05-03 Thread Eric Huang
In drivers/acpi/numa/srat.c, the generic CCD parsing is for the mapping of numa node and pxm domain that creates arrays of pxm_to_node_map and node_to_pxm_map. We are currently using API pxm_to_node() to get the corresponding information. For GCD parsing, the relation of GCD to CCD is defined

Re: [PATCH] drm/amdkfd: add ACPI SRAT parsing for topology

2021-05-03 Thread Felix Kuehling
On 2021-05-03 at 10:47 a.m., Eric Huang wrote: > In NPS4 BIOS we need to find the closest numa node when creating > topology io link between cpu and gpu, if PCI driver doesn't set > it. > > Signed-off-by: Eric Huang > --- > drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 95 ++-

Re: [PATCH] drm/amdkfd: add ACPI SRAT parsing for topology

2021-05-03 Thread Zeng, Oak
I feel such parsing work should be part of the ACPI generic work, so it should be done in drivers/acpi/numa/srat.c (see acpi_table_parse_srat), and the acpi subsystem should expose APIs for the rest of the drivers to query such numa information. Regards, Oak On 2021-04-28, 11:12 AM, "amd-gfx on behalf of

Re: [RFC] CRIU support for ROCm

2021-05-03 Thread Felix Kuehling
On 2021-05-01 at 1:03 p.m., Adrian Reber wrote: > On Fri, Apr 30, 2021 at 09:57:45PM -0400, Felix Kuehling wrote: >> We have been working on a prototype supporting CRIU (Checkpoint/Restore >> In Userspace) for accelerated compute applications running on AMD GPUs >> using ROCm (Radeon Open Compute

Re: [PATCH] drm/amdkfd: fix no atomics settings in the kfd topology

2021-05-03 Thread Felix Kuehling
On 2021-05-03 at 11:57 a.m., Jonathan Kim wrote: > To account for various PCIe and xGMI setups, check the no atomics settings > for a device in relation to every direct peer. Thanks, this looks reasonable. Some more nit-picks about naming and coding style inline. > > Signed-off-by: Jonathan

Re: [PATCH v5 06/27] drm/amdgpu: Handle IOMMU enabled case.

2021-05-03 Thread Andrey Grodzovsky
On 2021-04-29 11:13 p.m., Alex Deucher wrote: On Wed, Apr 28, 2021 at 11:13 AM Andrey Grodzovsky wrote: Handle all DMA IOMMU group related dependencies before the group is removed. v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate Signed-off-by: Andrey Grodzovsky

Re: [PATCH] Update NV SIMD-per-CU to 2

2021-05-03 Thread Felix Kuehling
On 2021-05-03 at 1:25 p.m., Joseph Greathouse wrote: > Navi series GPUs have 2 SIMDs per CU (and then 2 CUs per WGP). > The NV enum headers incorrectly listed this as 4, which later meant > we were incorrectly reporting the number of SIMDs in the HSA > topology. This could cause problems down the

Re: [PATCH] Update NV SIMD-per-CU to 2

2021-05-03 Thread Deucher, Alexander
[AMD Official Use Only - Internal Distribution Only] Please fix the subject: drm/amdgpu: Update NV SIMD-per-CU to 2 With that fixed, the patch is: Reviewed-by: Alex Deucher From: amd-gfx on behalf of Joseph Greathouse Sent: Monday, May 3, 2021 1:25 PM To:

Re: [PATCH v1 1/1] drm/dp_mst: Use kHz as link rate units when settig source max link caps at init

2021-05-03 Thread Jani Nikula
On Mon, 03 May 2021, Nikola Cornij wrote: > [why] > Link rate in kHz is what is eventually required to calculate the link > bandwidth, which makes kHz a more generic unit. This should also make > forward-compatibility with new DP standards easier. > > [how] > - Replace 'link rate DPCD code' with

[PATCH] Update NV SIMD-per-CU to 2

2021-05-03 Thread Joseph Greathouse
Navi series GPUs have 2 SIMDs per CU (and then 2 CUs per WGP). The NV enum headers incorrectly listed this as 4, which later meant we were incorrectly reporting the number of SIMDs in the HSA topology. This could cause problems down the line for user-space applications that want to launch a fixed

[PATCH v1 1/1] drm/dp_mst: Use kHz as link rate units when settig source max link caps at init

2021-05-03 Thread Nikola Cornij
[why] Link rate in kHz is what is eventually required to calculate the link bandwidth, which makes kHz a more generic unit. This should also make forward-compatibility with new DP standards easier. [how] - Replace 'link rate DPCD code' with 'link rate in kHz' when used with

[PATCH v1 0/1] drm/dp_mst: Use kHz as link rate units when settig source max link caps at init

2021-05-03 Thread Nikola Cornij
A patch that uses kHz as the link rate units when passing max link rate to drm_dp_mst_topology_mgr_init() at initialization time. It should be a 2nd and final follow-up patch to '98025a62cb00 ("drm/dp_mst: Use Extended Base Receiver Capability DPCD space")'. Change history: v1: - Initial

[PATCH AUTOSEL 4.4 14/16] drm/amdgpu: fix NULL pointer dereference

2021-05-03 Thread Sasha Levin
From: Guchun Chen [ Upstream commit 3c3dc654333f6389803cdcaf03912e94173ae510 ] ttm->sg needs to be checked before accessing its child member. Call Trace: amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu] ttm_bo_cleanup_memtype_use+0x3a/0x60 [ttm] ttm_bo_release+0x17d/0x300 [ttm]

[PATCH AUTOSEL 4.9 22/24] drm/amdgpu: fix NULL pointer dereference

2021-05-03 Thread Sasha Levin
From: Guchun Chen [ Upstream commit 3c3dc654333f6389803cdcaf03912e94173ae510 ] ttm->sg needs to be checked before accessing its child member. Call Trace: amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu] ttm_bo_cleanup_memtype_use+0x3a/0x60 [ttm] ttm_bo_release+0x17d/0x300 [ttm]

[PATCH AUTOSEL 4.14 28/31] drm/amdgpu: fix NULL pointer dereference

2021-05-03 Thread Sasha Levin
From: Guchun Chen [ Upstream commit 3c3dc654333f6389803cdcaf03912e94173ae510 ] ttm->sg needs to be checked before accessing its child member. Call Trace: amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu] ttm_bo_cleanup_memtype_use+0x3a/0x60 [ttm] ttm_bo_release+0x17d/0x300 [ttm]

[PATCH AUTOSEL 4.19 31/35] drm/amdgpu: fix NULL pointer dereference

2021-05-03 Thread Sasha Levin
From: Guchun Chen [ Upstream commit 3c3dc654333f6389803cdcaf03912e94173ae510 ] ttm->sg needs to be checked before accessing its child member. Call Trace: amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu] ttm_bo_cleanup_memtype_use+0x3a/0x60 [ttm] ttm_bo_release+0x17d/0x300 [ttm]

[PATCH AUTOSEL 4.19 30/35] amdgpu: avoid incorrect %hu format string

2021-05-03 Thread Sasha Levin
From: Arnd Bergmann [ Upstream commit 7d98d416c2cc1c1f7d9508e887de4630e521d797 ] clang points out that the %hu format string does not match the type of the variables here: drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:263:7: warning: format specifies type 'unsigned short' but the argument has type

[PATCH AUTOSEL 4.19 14/35] drm/amdgpu : Fix asic reset regression issue introduce by 8f211fe8ac7c4f

2021-05-03 Thread Sasha Levin
From: shaoyunl [ Upstream commit c8941550aa66b2a90f4b32c45d59e8571e6e ] This recent change introduces SDMA interrupt info printing with the irq->process function. These functions do not require a set function to enable/disable the irq. Signed-off-by: shaoyunl Reviewed-by: Hawking Zhang

[PATCH AUTOSEL 5.4 51/57] drm/amdgpu: fix NULL pointer dereference

2021-05-03 Thread Sasha Levin
From: Guchun Chen [ Upstream commit 3c3dc654333f6389803cdcaf03912e94173ae510 ] ttm->sg needs to be checked before accessing its child member. Call Trace: amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu] ttm_bo_cleanup_memtype_use+0x3a/0x60 [ttm] ttm_bo_release+0x17d/0x300 [ttm]

[PATCH AUTOSEL 5.4 50/57] amdgpu: avoid incorrect %hu format string

2021-05-03 Thread Sasha Levin
From: Arnd Bergmann [ Upstream commit 7d98d416c2cc1c1f7d9508e887de4630e521d797 ] clang points out that the %hu format string does not match the type of the variables here: drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:263:7: warning: format specifies type 'unsigned short' but the argument has type

[PATCH AUTOSEL 5.4 49/57] drm/amdkfd: Fix cat debugfs hang_hws file causes system crash bug

2021-05-03 Thread Sasha Levin
From: Qu Huang [ Upstream commit d73610211eec8aa027850982b1a48980aa1bc96e ] Here is the system crash log: [ 1272.884438] BUG: unable to handle kernel NULL pointer dereference at (null) [ 1272.88] IP: [< (null)>] (null) [ 1272.884447] PGD 825b09067 PUD 8267c8067 PMD 0 [

[PATCH AUTOSEL 5.4 24/57] drm/amd/display: fix dml prefetch validation

2021-05-03 Thread Sasha Levin
From: Dmytro Laktyushkin [ Upstream commit 8ee0fea4baf90e43efe2275de208a7809f9985bc ] Incorrect variable used, missing initialization during validation. Tested-by: Daniel Wheeler Signed-off-by: Dmytro Laktyushkin Reviewed-by: Eric Bernstein Acked-by: Solomon Chiu Signed-off-by: Alex

[PATCH AUTOSEL 5.4 23/57] drm/amd/display: Fix UBSAN warning for not a valid value for type '_Bool'

2021-05-03 Thread Sasha Levin
From: Anson Jacob [ Upstream commit 6a30a92997eee49554f72b462dce90abe54a496f ] [Why] dc_cursor_position does not initialise position.translate_by_source when crtc or plane->state->fb is NULL. UBSAN caught this error in dce110_set_cursor_position, as the value was garbage. [How] Initialise

[PATCH AUTOSEL 5.4 22/57] drm/amdgpu : Fix asic reset regression issue introduce by 8f211fe8ac7c4f

2021-05-03 Thread Sasha Levin
From: shaoyunl [ Upstream commit c8941550aa66b2a90f4b32c45d59e8571e6e ] This recent change introduces SDMA interrupt info printing with the irq->process function. These functions do not require a set function to enable/disable the irq. Signed-off-by: shaoyunl Reviewed-by: Hawking Zhang

[PATCH AUTOSEL 5.4 21/57] drm/amdkfd: Fix UBSAN shift-out-of-bounds warning

2021-05-03 Thread Sasha Levin
From: Anson Jacob [ Upstream commit 50e2fc36e72d4ad672032ebf646cecb48656efe0 ] If get_num_sdma_queues or get_num_xgmi_sdma_queues is 0, we end up doing a shift operation where the number of bits shifted equals number of bits in the operand. This behaviour is undefined. Set num_sdma_queues or
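The undefined shift described in this snippet can be illustrated with a minimal C sketch. The helper name `queue_bitmap` and the 64-bit bitmap width are assumptions made for illustration and are not taken from the patch; the sketch only shows one way to special-case a zero queue count so that the shift amount never reaches the width of the operand.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): build a bitmap with the low
 * `num_queues` bits set. Evaluating UINT64_MAX >> (64 - num_queues) with
 * num_queues == 0 would shift a 64-bit operand by 64 bits, which is
 * undefined behaviour in C, so that case is handled separately.
 */
static uint64_t queue_bitmap(unsigned int num_queues)
{
    assert(num_queues <= 64);   /* assumed upper bound for a 64-bit bitmap */

    if (num_queues == 0)
        return 0;               /* no queues: skip the out-of-range shift */

    return UINT64_MAX >> (64 - num_queues);
}
```

With this guard, `queue_bitmap(0)` yields an empty bitmap instead of relying on whatever the compiler and hardware happen to do with a full-width shift.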

[PATCH AUTOSEL 5.4 20/57] drm/amdgpu: mask the xgmi number of hops reported from psp to kfd

2021-05-03 Thread Sasha Levin
From: Jonathan Kim [ Upstream commit 4ac5617c4b7d0f0a8f879997f8ceaa14636d7554 ] The psp supplies the link type in the upper 2 bits of the psp xgmi node information num_hops field. With a new link type, Aldebaran has these bits set to a non-zero value (1 = xGMI3) so the KFD topology will report
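The bit layout mentioned in this snippet can be sketched in C. The snippet only says that the link type sits in the upper 2 bits of `num_hops`; the 8-bit field width, the macro names and the helper names below are assumptions for illustration, and the mask used by the actual patch may be narrower.

```c
#include <stdint.h>

/* Assumed layout of an 8-bit num_hops field (hypothetical names):
 *   bits 7:6  link type (the snippet gives 1 = xGMI3)
 *   bits 5:0  hop count and any remaining bits
 */
#define XGMI_LINK_TYPE_SHIFT 6u
#define XGMI_HOPS_MASK       0x3Fu

/* Extract the link type from the upper 2 bits. */
static unsigned int xgmi_link_type(uint8_t raw_num_hops)
{
    return raw_num_hops >> XGMI_LINK_TYPE_SHIFT;
}

/* Mask off the link-type bits so only the hop count is reported. */
static unsigned int xgmi_hop_count(uint8_t raw_num_hops)
{
    return raw_num_hops & XGMI_HOPS_MASK;
}
```

Under these assumptions a raw value of `0x41` decodes to link type 1 and a hop count of 1, whereas reporting the raw value unmasked would give 65 hops.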

[PATCH AUTOSEL 5.10 038/100] drm/amd/display: fix dml prefetch validation

2021-05-03 Thread Sasha Levin
From: Dmytro Laktyushkin [ Upstream commit 8ee0fea4baf90e43efe2275de208a7809f9985bc ] Incorrect variable used, missing initialization during validation. Tested-by: Daniel Wheeler Signed-off-by: Dmytro Laktyushkin Reviewed-by: Eric Bernstein Acked-by: Solomon Chiu Signed-off-by: Alex

[PATCH AUTOSEL 5.10 037/100] drm/amd/display: DCHUB underflow counter increasing in some scenarios

2021-05-03 Thread Sasha Levin
From: Aric Cyr [ Upstream commit 4710430a779e6077d81218ac768787545bff8c49 ] [Why] When unplugging a display, the underflow counter can be seen to increase because PSTATE switch is allowed even when some planes are not blanked. [How] Check that all planes are not active instead of all streams

[PATCH AUTOSEL 5.10 034/100] drm/amdgpu : Fix asic reset regression issue introduce by 8f211fe8ac7c4f

2021-05-03 Thread Sasha Levin
From: shaoyunl [ Upstream commit c8941550aa66b2a90f4b32c45d59e8571e6e ] This recent change introduces SDMA interrupt info printing with the irq->process function. These functions do not require a set function to enable/disable the irq. Signed-off-by: shaoyunl Reviewed-by: Hawking Zhang

[PATCH AUTOSEL 5.10 033/100] drm/amdkfd: Fix UBSAN shift-out-of-bounds warning

2021-05-03 Thread Sasha Levin
From: Anson Jacob [ Upstream commit 50e2fc36e72d4ad672032ebf646cecb48656efe0 ] If get_num_sdma_queues or get_num_xgmi_sdma_queues is 0, we end up doing a shift operation where the number of bits shifted equals number of bits in the operand. This behaviour is undefined. Set num_sdma_queues or

[PATCH AUTOSEL 5.4 04/57] drm/amd/display: Don't optimize bandwidth before disabling planes

2021-05-03 Thread Sasha Levin
From: Aric Cyr [ Upstream commit 6ad98e8aeb0106f453bb154933e8355849244990 ] [Why] There is a window of time where optimizing bandwidth because no streams are enabled allows PSTATE changes while HUBPs are not yet disabled. This results in the underflow counter increasing in some hotplug scenarios.

[PATCH AUTOSEL 5.4 03/57] drm/amd/display: Check for DSC support instead of ASIC revision

2021-05-03 Thread Sasha Levin
From: Eryk Brol [ Upstream commit 349a19b2f1b01e713268c7de9944ad669ccdf369 ] [why] This check for ASIC revision is no longer useful and causes lightup issues after a topology change in MST DSC scenario. In this case, DSC configs should be recalculated for the new topology. This check prevented

[PATCH AUTOSEL 5.10 032/100] drm/amdgpu: mask the xgmi number of hops reported from psp to kfd

2021-05-03 Thread Sasha Levin
From: Jonathan Kim [ Upstream commit 4ac5617c4b7d0f0a8f879997f8ceaa14636d7554 ] The psp supplies the link type in the upper 2 bits of the psp xgmi node information num_hops field. With a new link type, Aldebaran has these bits set to a non-zero value (1 = xGMI3) so the KFD topology will report

[PATCH AUTOSEL 5.10 036/100] drm/amd/display: Fix UBSAN warning for not a valid value for type '_Bool'

2021-05-03 Thread Sasha Levin
From: Anson Jacob [ Upstream commit 6a30a92997eee49554f72b462dce90abe54a496f ] [Why] dc_cursor_position does not initialise position.translate_by_source when crtc or plane->state->fb is NULL. UBSAN caught this error in dce110_set_cursor_position, as the value was garbage. [How] Initialise

[PATCH AUTOSEL 5.10 035/100] drm/amd/pm: fix workload mismatch on vega10

2021-05-03 Thread Sasha Levin
From: Kenneth Feng [ Upstream commit 0979d43259e13846d86ba17e451e17fec185d240 ] Workload number mapped to the correct one. This issue is only on vega10. Signed-off-by: Kenneth Feng Reviewed-by: Kevin Wang Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin ---

[PATCH AUTOSEL 5.10 013/100] drm/amdgpu: Fix some unload driver issues

2021-05-03 Thread Sasha Levin
From: Emily Deng [ Upstream commit bb0cd09be45ea457f25fdcbcb3d6cf2230f26c46 ] When unloading driver after killing some applications, it will hit sdma flush tlb job timeout which is called by ttm_bo_delay_delete. So to avoid the job submit after fence driver fini, call

[PATCH AUTOSEL 5.10 010/100] drm/amd/display/dc/dce/dce_aux: Remove duplicate line causing 'field overwritten' issue

2021-05-03 Thread Sasha Levin
From: Lee Jones [ Upstream commit 89adc10178fd6cb68c8ef1905d269070a4d3bd64 ] Fixes the following W=1 kernel build warning(s): In file included from drivers/gpu/drm/amd/amdgpu/../display/dc/dce112/dce112_resource.c:59:

[PATCH AUTOSEL 5.10 009/100] drm/amdgpu/display: buffer INTERRUPT_LOW_IRQ_CONTEXT interrupt work

2021-05-03 Thread Sasha Levin
From: Xiaogang Chen [ Upstream commit b6f91fc183f758461b9462cc93e673adbbf95c2d ] amdgpu DM handles INTERRUPT_LOW_IRQ_CONTEXT interrupts (hpd, hpd_rx) by using a work queue and uses a single work_struct. If a new interrupt is received before the previous handler has finished, new interrupts (same type) will

[PATCH AUTOSEL 5.10 008/100] drm/amd/display: Don't optimize bandwidth before disabling planes

2021-05-03 Thread Sasha Levin
From: Aric Cyr [ Upstream commit 6ad98e8aeb0106f453bb154933e8355849244990 ] [Why] There is a window of time where optimizing bandwidth because no streams are enabled allows PSTATE changes while HUBPs are not yet disabled. This results in the underflow counter increasing in some hotplug scenarios.

[PATCH AUTOSEL 5.10 007/100] drm/amd/display: Check for DSC support instead of ASIC revision

2021-05-03 Thread Sasha Levin
From: Eryk Brol [ Upstream commit 349a19b2f1b01e713268c7de9944ad669ccdf369 ] [why] This check for ASIC revision is no longer useful and causes lightup issues after a topology change in MST DSC scenario. In this case, DSC configs should be recalculated for the new topology. This check prevented

[PATCH AUTOSEL 5.10 005/100] drm/amd/display: changing sr exit latency

2021-05-03 Thread Sasha Levin
From: Martin Leung [ Upstream commit efe213e5a57e0cd92fa4f328dc1963d330549982 ] [Why] Hardware team remeasured, need to update timings to increase latency slightly and avoid intermittent underflows. [How] sr exit latency update. Signed-off-by: Martin Leung Reviewed-by: Alvin Lee Acked-by:

[PATCH AUTOSEL 5.11 048/115] drm/amd/display: fix dml prefetch validation

2021-05-03 Thread Sasha Levin
From: Dmytro Laktyushkin [ Upstream commit 8ee0fea4baf90e43efe2275de208a7809f9985bc ] Incorrect variable used, missing initialization during validation. Tested-by: Daniel Wheeler Signed-off-by: Dmytro Laktyushkin Reviewed-by: Eric Bernstein Acked-by: Solomon Chiu Signed-off-by: Alex

[PATCH AUTOSEL 5.11 049/115] drm/amdgpu: Fix memory leak

2021-05-03 Thread Sasha Levin
From: xinhui pan [ Upstream commit 79fcd446e7e182c52c2c808c76f8de3eb6714349 ] drm_gem_object_put() should be paired with drm_gem_object_lookup(). All gem objs are saved in fb->base.obj[]. We need to put the old obj first before assigning a new obj. Trigger VRAM leak by running command below $ service gdm

[PATCH AUTOSEL 5.11 047/115] drm/amd/display: DCHUB underflow counter increasing in some scenarios

2021-05-03 Thread Sasha Levin
From: Aric Cyr [ Upstream commit 4710430a779e6077d81218ac768787545bff8c49 ] [Why] When unplugging a display, the underflow counter can be seen to increase because PSTATE switch is allowed even when some planes are not blanked. [How] Check that all planes are not active instead of all streams

[PATCH AUTOSEL 5.11 046/115] drm/amd/display: Fix UBSAN warning for not a valid value for type '_Bool'

2021-05-03 Thread Sasha Levin
From: Anson Jacob [ Upstream commit 6a30a92997eee49554f72b462dce90abe54a496f ] [Why] dc_cursor_position does not initialise position.translate_by_source when crtc or plane->state->fb is NULL. UBSAN caught this error in dce110_set_cursor_position, as the value was garbage. [How] Initialise

[PATCH AUTOSEL 5.11 045/115] drm/amd/pm: fix workload mismatch on vega10

2021-05-03 Thread Sasha Levin
From: Kenneth Feng [ Upstream commit 0979d43259e13846d86ba17e451e17fec185d240 ] Workload number mapped to the correct one. This issue is only on vega10. Signed-off-by: Kenneth Feng Reviewed-by: Kevin Wang Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin ---

[PATCH AUTOSEL 5.11 044/115] drm/amdgpu : Fix asic reset regression issue introduce by 8f211fe8ac7c4f

2021-05-03 Thread Sasha Levin
From: shaoyunl [ Upstream commit c8941550aa66b2a90f4b32c45d59e8571e6e ] This recent change introduces SDMA interrupt info printing with the irq->process function. These functions do not require a set function to enable/disable the irq. Signed-off-by: shaoyunl Reviewed-by: Hawking Zhang

[PATCH AUTOSEL 5.11 018/115] drm/amdgpu: Fix some unload driver issues

2021-05-03 Thread Sasha Levin
From: Emily Deng [ Upstream commit bb0cd09be45ea457f25fdcbcb3d6cf2230f26c46 ] When unloading driver after killing some applications, it will hit sdma flush tlb job timeout which is called by ttm_bo_delay_delete. So to avoid the job submit after fence driver fini, call

[PATCH AUTOSEL 5.11 012/115] drm/amd/display/dc/dce/dce_aux: Remove duplicate line causing 'field overwritten' issue

2021-05-03 Thread Sasha Levin
From: Lee Jones [ Upstream commit 89adc10178fd6cb68c8ef1905d269070a4d3bd64 ] Fixes the following W=1 kernel build warning(s): In file included from drivers/gpu/drm/amd/amdgpu/../display/dc/dce112/dce112_resource.c:59:

[PATCH AUTOSEL 5.11 011/115] drm/amdgpu/display: buffer INTERRUPT_LOW_IRQ_CONTEXT interrupt work

2021-05-03 Thread Sasha Levin
From: Xiaogang Chen [ Upstream commit b6f91fc183f758461b9462cc93e673adbbf95c2d ] amdgpu DM handles INTERRUPT_LOW_IRQ_CONTEXT interrupts (hpd, hpd_rx) by using a work queue and uses a single work_struct. If a new interrupt is received before the previous handler has finished, new interrupts (same type) will

[PATCH AUTOSEL 5.11 010/115] drm/amd/display: Return invalid state if GPINT times out

2021-05-03 Thread Sasha Levin
From: Wyatt Wood [ Upstream commit 8039bc7130ef4206a58e4dc288621bc97eba08eb ] [Why] GPINT timeout is causing PSR_STATE_0 to be returned when it shouldn't. We must guarantee that PSR is fully disabled before doing hw programming on driver-side. [How] Return invalid state if GPINT command times

[PATCH AUTOSEL 5.11 009/115] drm/amd/display: Don't optimize bandwidth before disabling planes

2021-05-03 Thread Sasha Levin
From: Aric Cyr [ Upstream commit 6ad98e8aeb0106f453bb154933e8355849244990 ] [Why] There is a window of time where optimizing bandwidth because no streams are enabled allows PSTATE changes while HUBPs are not yet disabled. This results in the underflow counter increasing in some hotplug scenarios.

[PATCH AUTOSEL 5.11 008/115] drm/amd/display: Check for DSC support instead of ASIC revision

2021-05-03 Thread Sasha Levin
From: Eryk Brol [ Upstream commit 349a19b2f1b01e713268c7de9944ad669ccdf369 ] [why] This check for ASIC revision is no longer useful and causes lightup issues after a topology change in MST DSC scenario. In this case, DSC configs should be recalculated for the new topology. This check prevented

[PATCH AUTOSEL 5.11 006/115] drm/amd/display: Fix MPC OGAM power on/off sequence

2021-05-03 Thread Sasha Levin
From: Nicholas Kazlauskas [ Upstream commit 737b2b536a30a467c405d75f2287e17828838a13 ] [Why] Color corruption can occur on bootup into a login manager that applies a non-linear gamma LUT because the LUT may not actually be powered on before writing. It's cleared on the next full pipe

[PATCH AUTOSEL 5.11 005/115] drm/amd/display: changing sr exit latency

2021-05-03 Thread Sasha Levin
From: Martin Leung [ Upstream commit efe213e5a57e0cd92fa4f328dc1963d330549982 ] [Why] Hardware team remeasured, need to update timings to increase latency slightly and avoid intermittent underflows. [How] sr exit latency update. Signed-off-by: Martin Leung Reviewed-by: Alvin Lee Acked-by:

[PATCH AUTOSEL 5.12 054/134] drm/amd/display: Fix UBSAN warning for not a valid value for type '_Bool'

2021-05-03 Thread Sasha Levin
From: Anson Jacob [ Upstream commit 6a30a92997eee49554f72b462dce90abe54a496f ] [Why] dc_cursor_position does not initialise position.translate_by_source when crtc or plane->state->fb is NULL. UBSAN caught this error in dce110_set_cursor_position, as the value was garbage. [How] Initialise

[PATCH AUTOSEL 5.12 052/134] drm/amdgpu : Fix asic reset regression issue introduce by 8f211fe8ac7c4f

2021-05-03 Thread Sasha Levin
From: shaoyunl [ Upstream commit c8941550aa66b2a90f4b32c45d59e8571e6e ] This recent change introduces SDMA interrupt info printing with the irq->process function. These functions do not require a set function to enable/disable the irq. Signed-off-by: shaoyunl Reviewed-by: Hawking Zhang

[PATCH AUTOSEL 5.12 051/134] drm/amd/display: Align cursor cache address to 2KB

2021-05-03 Thread Sasha Levin
From: Joshua Aberback [ Upstream commit 554ba183b135ef09250b61a202d88512b5bbd03a ] [Why] The registers for the address of the cursor are aligned to 2KB, so all cursor surfaces also need to be aligned to 2KB. Currently, the provided cursor cache surface is not aligned, so we need a workaround

[PATCH AUTOSEL 5.12 047/134] drm/amdgpu: enable retry fault wptr overflow

2021-05-03 Thread Sasha Levin
From: Philip Yang [ Upstream commit b672cb1eee59efe6ca5bb2a2ce90060a22860558 ] If xnack is on, VM retry fault interrupts are sent to IH ring1, and ring1 will fill up quickly. IH cannot receive other interrupts; this causes a deadlock if migrating a buffer using sdma and waiting for sdma done while

[PATCH AUTOSEL 5.12 058/134] drm/amdgpu: Fix memory leak

2021-05-03 Thread Sasha Levin
From: xinhui pan [ Upstream commit 79fcd446e7e182c52c2c808c76f8de3eb6714349 ] drm_gem_object_put() should be paired with drm_gem_object_lookup(). All gem objs are saved in fb->base.obj[]. We need to put the old obj first before assigning a new obj. Trigger VRAM leak by running command below $ service gdm

[PATCH AUTOSEL 5.12 057/134] drm/amd/display: Fix potential memory leak

2021-05-03 Thread Sasha Levin
From: Qingqing Zhuo [ Upstream commit 51ba691206e35464fd7ec33dd519d141c80b5dff ] [Why] vblank_workqueue is never released. [How] Free it upon dm finish. Tested-by: Daniel Wheeler Signed-off-by: Qingqing Zhuo Reviewed-by: Nicholas Kazlauskas Acked-by: Solomon Chiu Signed-off-by: Alex

[PATCH AUTOSEL 5.12 056/134] drm/amd/display: fix dml prefetch validation

2021-05-03 Thread Sasha Levin
From: Dmytro Laktyushkin [ Upstream commit 8ee0fea4baf90e43efe2275de208a7809f9985bc ] Incorrect variable used, missing initialization during validation. Tested-by: Daniel Wheeler Signed-off-by: Dmytro Laktyushkin Reviewed-by: Eric Bernstein Acked-by: Solomon Chiu Signed-off-by: Alex

[PATCH AUTOSEL 5.12 055/134] drm/amd/display: DCHUB underflow counter increasing in some scenarios

2021-05-03 Thread Sasha Levin
From: Aric Cyr [ Upstream commit 4710430a779e6077d81218ac768787545bff8c49 ] [Why] When unplugging a display, the underflow counter can be seen to increase because PSTATE switch is allowed even when some planes are not blanked. [How] Check that all planes are not active instead of all streams

[PATCH AUTOSEL 5.12 050/134] drm/amdkfd: Fix UBSAN shift-out-of-bounds warning

2021-05-03 Thread Sasha Levin
From: Anson Jacob [ Upstream commit 50e2fc36e72d4ad672032ebf646cecb48656efe0 ] If get_num_sdma_queues or get_num_xgmi_sdma_queues is 0, we end up doing a shift operation where the number of bits shifted equals number of bits in the operand. This behaviour is undefined. Set num_sdma_queues or

[PATCH AUTOSEL 5.12 049/134] drm/amdgpu: mask the xgmi number of hops reported from psp to kfd

2021-05-03 Thread Sasha Levin
From: Jonathan Kim [ Upstream commit 4ac5617c4b7d0f0a8f879997f8ceaa14636d7554 ] The psp supplies the link type in the upper 2 bits of the psp xgmi node information num_hops field. With a new link type, Aldebaran has these bits set to a non-zero value (1 = xGMI3) so the KFD topology will report

[PATCH AUTOSEL 5.12 053/134] drm/amd/pm: fix workload mismatch on vega10

2021-05-03 Thread Sasha Levin
From: Kenneth Feng [ Upstream commit 0979d43259e13846d86ba17e451e17fec185d240 ] Workload number mapped to the correct one. This issue is only on vega10. Signed-off-by: Kenneth Feng Reviewed-by: Kevin Wang Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin ---

[PATCH AUTOSEL 5.12 048/134] drm/amdgpu: enable 48-bit IH timestamp counter

2021-05-03 Thread Sasha Levin
From: Alex Sierra [ Upstream commit 9a9c59a8f4f4478d5951eb0bded1d17b936aad6e ] By default this timestamp is a 32-bit counter. It overflows in around 10 minutes. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin ---
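The overflow figure in this snippet can be turned into a short worked calculation. The sketch below takes the "around 10 minutes" figure at face value and derives the tick rate it implies; the function names are hypothetical and the real counter clock is not stated in the snippet, so the numbers are only as accurate as that estimate.

```c
#include <stdint.h>

/* Seconds until a counter `bits` wide wraps at `tick_hz` ticks per second.
 * `bits` must be below 64 so the shift stays in range. */
static double wrap_seconds(unsigned int bits, double tick_hz)
{
    return (double)(UINT64_C(1) << bits) / tick_hz;
}

/* Tick rate implied by a 32-bit counter wrapping after `seconds`. */
static double implied_tick_hz(double seconds)
{
    return (double)(UINT64_C(1) << 32) / seconds;
}
```

Assuming a wrap after 600 seconds, `implied_tick_hz(600.0)` gives roughly 7.16 MHz, and `wrap_seconds(48, ...)` at that rate is 65536 times longer: about 39.3 million seconds, or roughly 455 days.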

[PATCH AUTOSEL 5.12 020/134] drm/amdgpu: Fix some unload driver issues

2021-05-03 Thread Sasha Levin
From: Emily Deng [ Upstream commit bb0cd09be45ea457f25fdcbcb3d6cf2230f26c46 ] When unloading driver after killing some applications, it will hit sdma flush tlb job timeout which is called by ttm_bo_delay_delete. So to avoid the job submit after fence driver fini, call

[PATCH AUTOSEL 5.12 019/134] drm/amd/pm/swsmu: clean up user profile function

2021-05-03 Thread Sasha Levin
From: Arunpravin [ Upstream commit d8cce9306801cfbf709055677f7896905094ff95 ] Remove unnecessary comments, enable restore mode using '|=' operator, fixes the alignment to improve the code readability. v2: Move all restoration flag check to bitwise '&' operator Signed-off-by: Arunpravin

[PATCH AUTOSEL 5.12 013/134] drm/amd/display/dc/dce/dce_aux: Remove duplicate line causing 'field overwritten' issue

2021-05-03 Thread Sasha Levin
From: Lee Jones [ Upstream commit 3e3527f5b765c6f479ba55e5a570ee9538589a74 ] Fixes the following W=1 kernel build warning(s): In file included from drivers/gpu/drm/amd/amdgpu/../display/dc/dce112/dce112_resource.c:59:

[PATCH AUTOSEL 5.12 012/134] drm/amdgpu/display: buffer INTERRUPT_LOW_IRQ_CONTEXT interrupt work

2021-05-03 Thread Sasha Levin
From: Xiaogang Chen [ Upstream commit b6f91fc183f758461b9462cc93e673adbbf95c2d ] amdgpu DM handles INTERRUPT_LOW_IRQ_CONTEXT interrupts (hpd, hpd_rx) by using a work queue and uses a single work_struct. If a new interrupt is received before the previous handler has finished, new interrupts (same type) will

[PATCH AUTOSEL 5.12 011/134] drm/amd/display: Return invalid state if GPINT times out

2021-05-03 Thread Sasha Levin
From: Wyatt Wood [ Upstream commit 8039bc7130ef4206a58e4dc288621bc97eba08eb ] [Why] GPINT timeout is causing PSR_STATE_0 to be returned when it shouldn't. We must guarantee that PSR is fully disabled before doing hw programming on driver-side. [How] Return invalid state if GPINT command times

[PATCH AUTOSEL 5.12 010/134] drm/amd/display: Don't optimize bandwidth before disabling planes

2021-05-03 Thread Sasha Levin
From: Aric Cyr [ Upstream commit 6ad98e8aeb0106f453bb154933e8355849244990 ] [Why] There is a window of time where optimizing bandwidth because no streams are enabled allows PSTATE changes while HUBPs are not yet disabled. This results in the underflow counter increasing in some hotplug scenarios.

[PATCH AUTOSEL 5.12 009/134] drm/amd/display: Check for DSC support instead of ASIC revision

2021-05-03 Thread Sasha Levin
From: Eryk Brol [ Upstream commit 349a19b2f1b01e713268c7de9944ad669ccdf369 ] [why] This check for ASIC revision is no longer useful and causes lightup issues after a topology change in MST DSC scenario. In this case, DSC configs should be recalculated for the new topology. This check prevented

[PATCH AUTOSEL 5.12 007/134] drm/amd/pm: do not issue message while write "r" into pp_od_clk_voltage

2021-05-03 Thread Sasha Levin
From: Huang Rui [ Upstream commit ca1203d7d7295c49e5707d7def457bdc524a8edb ] We should commit the value after restore them back to default as well. $ echo "r" > pp_od_clk_voltage $ echo "c" > pp_od_clk_voltage Signed-off-by: Huang Rui Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

[PATCH AUTOSEL 5.12 005/134] drm/amd/display: changing sr exit latency

2021-05-03 Thread Sasha Levin
From: Martin Leung [ Upstream commit efe213e5a57e0cd92fa4f328dc1963d330549982 ] [Why] Hardware team remeasured, need to update timings to increase latency slightly and avoid intermittent underflows. [How] sr exit latency update. Signed-off-by: Martin Leung Reviewed-by: Alvin Lee Acked-by:

[PATCH AUTOSEL 5.12 006/134] drm/amd/display: Fix MPC OGAM power on/off sequence

2021-05-03 Thread Sasha Levin
From: Nicholas Kazlauskas [ Upstream commit 737b2b536a30a467c405d75f2287e17828838a13 ] [Why] Color corruption can occur on bootup into a login manager that applies a non-linear gamma LUT because the LUT may not actually be powered on before writing. It's cleared on the next full pipe

Re: A hotplug bug in AMDGPU

2021-05-03 Thread Alex Deucher
On Mon, May 3, 2021 at 11:40 AM Mikulas Patocka wrote: > > Hi > > There's a bug with monitor hotplug starting with the kernel 5.7. > > I have Radeon RX 570. If I boot the system with the monitor unplugged and > then plug the monitor via DVI, the kernel 5.6 and below will properly > initialize

[PATCH] drm/amdkfd: fix no atomics settings in the kfd topology

2021-05-03 Thread Jonathan Kim
To account for various PCIe and xGMI setups, check the no atomics settings for a device in relation to every direct peer. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 55 ++- 1 file changed, 34 insertions(+), 21 deletions(-) diff --git

A hotplug bug in AMDGPU

2021-05-03 Thread Mikulas Patocka
Hi There's a bug with monitor hotplug starting with the kernel 5.7. I have Radeon RX 570. If I boot the system with the monitor unplugged and then plug the monitor via DVI, the kernel 5.6 and below will properly initialize graphics; the kernels 5.7+ will not initialize it - and the monitor

[PATCH] drm/amdkfd: add ACPI SRAT parsing for topology

2021-05-03 Thread Eric Huang
In NPS4 BIOS we need to find the closest numa node when creating topology io link between cpu and gpu, if PCI driver doesn't set it. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 95 ++- 1 file changed, 93 insertions(+), 2 deletions(-) diff --git

RE: [PATCH v3 2/2] drm/amd/pm: Add debugfs node to read private buffer

2021-05-03 Thread Zhang, Hawking
[AMD Public Use] Series is Reviewed-by: Hawking Zhang Regards, Hawking -Original Message- From: Lazar, Lijo Sent: Monday, May 3, 2021 14:12 To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Wang, Kevin(Yang) ; Koenig, Christian ; Zhang, Hawking Subject: RE:

Re: [PATCH 1/2] MAINTAINERS: Fix TTM tree

2021-05-03 Thread Christian König
On 03.05.21 at 15:47, Alex Deucher wrote: TTM uses drm-misc now. Update the tree. Cc: David Ward Signed-off-by: Alex Deucher Reviewed-by: Christian König for the series. --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS

Re: [PATCH] drm/amdkfd: add ACPI SRAT parsing for topology

2021-05-03 Thread Eric Huang
Thanks Felix for your review. I will send another patch. Eric On 2021-04-30 7:42 p.m., Felix Kuehling wrote: On 2021-04-28 at 11:11 a.m., Eric Huang wrote: In NPS4 BIOS we need to find the closest numa node when creating topology io link between cpu and gpu, if PCI driver doesn't set it.

[PATCH 2/2] MAINTAINERS: fix a few more amdgpu tree links

2021-05-03 Thread Alex Deucher
Switch to gitlab. Fixes: 101c2fae5108d7 ("MAINTAINERS: update radeon/amdgpu/amdkfd git trees") Cc: David Ward Signed-off-by: Alex Deucher --- MAINTAINERS | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 27ee2a659867..3ea29032e5dc 100644

[PATCH 1/2] MAINTAINERS: Fix TTM tree

2021-05-03 Thread Alex Deucher
TTM uses drm-misc now. Update the tree. Cc: David Ward Signed-off-by: Alex Deucher --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index f03a198cbc52..27ee2a659867 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6134,7 +6134,7 @@ M:

Re: [Heads up to maintainers] Re: [PATCH v8 1/1] drm/drm_mst: Use Extended Base Receiver Capability DPCD space

2021-05-03 Thread Jani Nikula
On Mon, 03 May 2021, Jani Nikula wrote: > On Fri, 30 Apr 2021, Jani Nikula wrote: >> On Thu, 29 Apr 2021, Lyude Paul wrote: >>> JFYI Jani and Ben: I will be pushing this patch to drm-misc-next sometime >>> today if there's no objections >> >> Thanks for the heads-up, I think this breaks i915.

Re: [Heads up to maintainers] Re: [PATCH v8 1/1] drm/drm_mst: Use Extended Base Receiver Capability DPCD space

2021-05-03 Thread Jani Nikula
On Fri, 30 Apr 2021, Jani Nikula wrote: > On Thu, 29 Apr 2021, Lyude Paul wrote: >> JFYI Jani and Ben: I will be pushing this patch to drm-misc-next sometime >> today if there's no objections > > Thanks for the heads-up, I think this breaks i915. See my review > comments elsewhere in the thread.

RE: [PATCH v2 1/1] drm/i915: Use the correct max source link rate for MST

2021-05-03 Thread Jani Nikula
On Fri, 30 Apr 2021, "Cornij, Nikola" wrote: > I'll fix the dpcd part to use kHz on Monday I'd appreciate that, thanks. I think it is the better interface. > My apologies as well, not only for coming up with the wrong patch in > first place, but also for missing to CC all the maintainers. The

[PATCH 0/2] drm/radeon: Fix off-by-one power_state index heap overwrite

2021-05-03 Thread Kees Cook
Hi, This is an attempt at fixing a bug[1] uncovered by the relocation of the slab freelist pointer offset, as well as some related clean-ups. I don't have hardware to do runtime testing, but it builds. ;) -Kees [1] https://bugzilla.kernel.org/show_bug.cgi?id=211537 Kees Cook (2):

[PATCH 1/2] drm/radeon: Fix off-by-one power_state index heap overwrite

2021-05-03 Thread Kees Cook
An out of bounds write happens when setting the default power state. KASAN sees this as: [drm] radeon: 512M of GTT memory ready. [drm] GART: num cpu pages 131072, num gpu pages 131072 == BUG: KASAN: slab-out-of-bounds in

Re: [RFC] CRIU support for ROCm

2021-05-03 Thread Adrian Reber
On Fri, Apr 30, 2021 at 09:57:45PM -0400, Felix Kuehling wrote: > We have been working on a prototype supporting CRIU (Checkpoint/Restore > In Userspace) for accelerated compute applications running on AMD GPUs > using ROCm (Radeon Open Compute Platform). We're happy to finally share > this work

[PATCH 2/2] drm/radeon: Avoid power table parsing memory leaks

2021-05-03 Thread Kees Cook
Avoid leaving a hanging pre-allocated clock_info if last mode is invalid, and avoid heap corruption if no valid modes are found. Fixes: 6991b8f2a319 ("drm/radeon/kms: fix segfault in pm rework") Signed-off-by: Kees Cook --- drivers/gpu/drm/radeon/radeon_atombios.c | 20 +++- 1
