Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread Christian König
On 05.01.22 at 08:34, JingWen Chen wrote: On 2022/1/5 12:56 AM, Andrey Grodzovsky wrote: On 2022-01-04 6:36 a.m., Christian König wrote: On 04.01.22 at 11:49, Liu, Monk wrote: [AMD Official Use Only] See the FLR request from the hypervisor is just another source of signaling the need for

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread JingWen Chen
On 2022/1/5 12:56 AM, Andrey Grodzovsky wrote: > > On 2022-01-04 6:36 a.m., Christian König wrote: >> On 04.01.22 at 11:49, Liu, Monk wrote: >>> [AMD Official Use Only] >>> > See the FLR request from the hypervisor is just another source of > signaling the need for a reset, similar to

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread JingWen Chen
On 2022/1/4 7:36 PM, Christian König wrote: > On 04.01.22 at 11:49, Liu, Monk wrote: >> [AMD Official Use Only] >> See the FLR request from the hypervisor is just another source of signaling the need for a reset, similar to each job timeout on each queue. Otherwise you have a

[PATCH v2] drm/amdgpu: Unmap MMIO mappings when device is not unplugged

2022-01-04 Thread Leslie Shi
Patch: 3efb17ae7e92 ("drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure") makes the call to amdgpu_device_unmap_mmio() conditional on the device being unplugged. This patch unmaps MMIO mappings even when the device is not unplugged. Signed-off-by:

RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread Liu, Shaoyun
[AMD Official Use Only] I see, I didn't notice you already have this implemented, so the flr_work routine itself is synced now. In this case, I agree it should be safe to remove the in_gpu_reset and reset_sem in the flr_work. Regards Shaoyun.liu -Original Message- From:

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread Andrey Grodzovsky
On 2022-01-04 12:13 p.m., Liu, Shaoyun wrote: [AMD Official Use Only] I mostly agree with the sequences Christian described. Just one thing might need discussion here. For an FLR notified from the host, in the new sequence as described, the driver needs to reply with READY_TO_RESET in the workitem

Re: [PATCH 2/2] drm/amdgpu: don't set s3 and s0ix at the same time

2022-01-04 Thread Alex Deucher
On Tue, Jan 4, 2022 at 12:26 PM Limonciello, Mario wrote: > > [AMD Official Use Only] > > > > Maybe it was used more widely previously? > > The only place that I found it in use was amdgpu_device_evict_resources. > > > > From: Deucher, Alexander > Sent: Tuesday, January 4, 2022 11:24 > To:

Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

2022-01-04 Thread Felix Kuehling
[+Adrian] On 2021-12-23 at 2:05 a.m., Christian König wrote: > On 22.12.21 at 21:53, Daniel Vetter wrote: >> On Mon, Dec 20, 2021 at 01:12:51PM -0500, Bhardwaj, Rajneesh wrote: >> >> [SNIP] >> Still sounds funky. I think minimally we should have an ack from CRIU >> developers that this is

RE: [PATCH 2/2] drm/amdgpu: don't set s3 and s0ix at the same time

2022-01-04 Thread Limonciello, Mario
[AMD Official Use Only] Maybe it was used more widely previously? The only place that I found it in use was amdgpu_device_evict_resources. From: Deucher, Alexander Sent: Tuesday, January 4, 2022 11:24 To: Limonciello, Mario ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 2/2] drm/amdgpu:

Re: [PATCH 2/2] drm/amdgpu: don't set s3 and s0ix at the same time

2022-01-04 Thread Deucher, Alexander
[AMD Official Use Only] I don't think this will work properly. The in_s3 flag was mainly for runtime pm vs system suspend. I'm not sure if in_s0ix is properly handled everywhere we check in_s3. Alex From: amd-gfx on behalf of Mario Limonciello Sent:

Re: [PATCH 1/2] drm/amdgpu: explicitly check for s0ix when evicting resources

2022-01-04 Thread Deucher, Alexander
[Public] Reviewed-by: Alex Deucher From: amd-gfx on behalf of Mario Limonciello Sent: Monday, January 3, 2022 10:23 AM To: amd-gfx@lists.freedesktop.org Cc: Limonciello, Mario Subject: [PATCH 1/2] drm/amdgpu: explicitly check for s0ix when evicting

RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread Liu, Shaoyun
[AMD Official Use Only] I mostly agree with the sequences Christian described. Just one thing might need discussion here. For an FLR notified from the host, in the new sequence as described, the driver needs to reply with READY_TO_RESET in the workitem from a reset work queue which means inside

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread Andrey Grodzovsky
On 2022-01-04 6:36 a.m., Christian König wrote: On 04.01.22 at 11:49, Liu, Monk wrote: [AMD Official Use Only] See the FLR request from the hypervisor is just another source of signaling the need for a reset, similar to each job timeout on each queue. Otherwise you have a race condition

Re: [PATCH v2 08/11] lib: add support for device coherent type in test_hmm

2022-01-04 Thread Liam Howlett
* Alex Sierra [211206 14:00]: > Device Coherent type uses device memory that is coherently accessible by > the CPU. This could be shown as SP (special purpose) memory range > at the BIOS-e820 memory enumeration. If no SP memory is supported in > system, this could be faked by setting

Re: [PATCH] drm/amd/display: explicitly update clocks when DC is set to D3

2022-01-04 Thread Harry Wentland
On 2022-01-04 10:33, Mario Limonciello wrote: > The WA from commit 5965280abd30 ("drm/amd/display: Apply w/a for > hard hang on HPD") causes a regression in s0ix where the system will > fail to resume properly. This may be because an HPD was active the last > time clocks were updated but

Re: [PATCH] drm/amdgpu: Delay unmapping MMIO VRAM to amdgpu_ttm_fini() in GPU initialization failure

2022-01-04 Thread Andrey Grodzovsky
On 2022-01-03 9:30 p.m., Leslie Shi wrote: If the driver load fails during hw_init(), delay unmapping MMIO VRAM until amdgpu_ttm_fini(). This prevents accessing an invalid memory address in vcn_v3_0_sw_fini(). Signed-off-by: Leslie Shi --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16

[PATCH] drm/amd/display: explicitly update clocks when DC is set to D3

2022-01-04 Thread Mario Limonciello
The WA from commit 5965280abd30 ("drm/amd/display: Apply w/a for hard hang on HPD") causes a regression in s0ix where the system will fail to resume properly. This may be because an HPD was active the last time clocks were updated but clocks didn't get updated again during s0ix. So add an extra

Re: [PATCH v6 4/6] drm: implement a method to free unused pages

2022-01-04 Thread Matthew Auld
On 26/12/2021 22:24, Arunpravin wrote: On contiguous allocation, we round up the size to the *next* power of 2, implement a function to free the unused pages after the newly allocated block. v2(Matthew Auld): - replace function name 'drm_buddy_free_unused_pages' with drm_buddy_block_trim

Re: [PATCH v6 2/6] drm: improve drm_buddy_alloc function

2022-01-04 Thread Matthew Auld
On 26/12/2021 22:24, Arunpravin wrote: - Make drm_buddy_alloc a single function to handle range allocation and non-range allocation demands - Implemented a new function alloc_range() which allocates the requested power-of-two block in compliance with range limitations - Moved order computation

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread Christian König
On 04.01.22 at 11:49, Liu, Monk wrote: [AMD Official Use Only] See the FLR request from the hypervisor is just another source of signaling the need for a reset, similar to each job timeout on each queue. Otherwise you have a race condition between the hypervisor and the scheduler. No it's

RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread Liu, Monk
[AMD Official Use Only] >> See the FLR request from the hypervisor is just another source of signaling >> the need for a reset, similar to each job timeout on each queue. Otherwise >> you have a race condition between the hypervisor and the scheduler. No it's not, FLR from hypervisor is just to

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread Christian König
Hi Jingwen, well what I mean is that we need to adjust the implementation in amdgpu to actually match the requirements. Could be that the reset sequence is questionable in general, but I doubt it, at least for now. See the FLR request from the hypervisor is just another source of signaling

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread JingWen Chen
Hi Christian, I'm not sure what you mean by "we need to change SRIOV not the driver". Do you mean we should change the reset sequence in SRIOV? This will be a huge change for our SRIOV solution. From my point of view, we can directly use amdgpu_device_lock_adev and amdgpu_device_unlock_adev

RE: [PATCH V3 01/12] drm/amdgpu: Unify ras block interface for each ras block

2022-01-04 Thread Zhou1, Tao
[AMD Official Use Only] The series is: Reviewed-by: Tao Zhou Please make sure basic RAS tests are successful before submitting the series. > -Original Message- > From: Chai, Thomas > Sent: Wednesday, December 29, 2021 2:32 PM > To: amd-gfx@lists.freedesktop.org > Cc: Chai, Thomas ;