date:20210819

[PATCH V2] drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend

2021-08-19 Thread Evan Quan

Perform proper cleanups on UVD/VCE suspend: powergate enablement, clockgating enablement and dpm disablement. This can fix some hangs observed on suspend while UVD/VCE is still in use (e.g. issuing "pm-suspend" while video is still playing). Change-Id: I36f39d9731e0a9638b52d5d92558b0ee9c23a9ed

RE: [PATCH] drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend

2021-08-19 Thread Quan, Evan

[AMD Official Use Only] From: Lazar, Lijo Sent: Thursday, August 19, 2021 10:36 PM To: Zhu, James ; Quan, Evan ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Chen, Guchun ; Pan, Xinhui Subject: RE: [PATCH] drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend [AMD

[PATCH 1/3] drm/amdkfd: Allocate SDMA engines more fairly

2021-08-19 Thread Joseph Greathouse

Give every process at most one queue from each SDMA engine. Previously, we allocated all SDMA engines and queues on a first-come-first-serve basis. This meant that it was possible for two processes racing on their allocation requests to each end up with two queues on the same SDMA engine. That
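
Editor's sketch: the snippet above is cut off, so the following is only a minimal illustration of the stated policy (a process gets at most one queue per SDMA engine), not the patch's code. The names `sdma_pool`, `proc_sdma_state`, `sdma_alloc_queue` and the engine/queue counts are hypothetical.

```c
#include <stdint.h>

#define NUM_SDMA_ENGINES  4   /* hypothetical engine count */
#define QUEUES_PER_ENGINE 2   /* hypothetical queues per engine */

/* Global pool: bit q of free_queues[e] is set while queue q on engine e is free. */
struct sdma_pool {
    uint8_t free_queues[NUM_SDMA_ENGINES];
};

/* Per-process state: bit e is set once the process holds a queue on engine e. */
struct proc_sdma_state {
    uint32_t engines_used;
};

/* Mark every queue on every engine as free. */
void sdma_pool_init(struct sdma_pool *pool)
{
    for (int e = 0; e < NUM_SDMA_ENGINES; e++)
        pool->free_queues[e] = (uint8_t)((1u << QUEUES_PER_ENGINE) - 1);
}

/*
 * Hand out a queue on the first engine the process does not use yet.
 * Returns engine * QUEUES_PER_ENGINE + queue, or -1 if no such queue is free.
 */
int sdma_alloc_queue(struct sdma_pool *pool, struct proc_sdma_state *proc)
{
    for (int e = 0; e < NUM_SDMA_ENGINES; e++) {
        if (proc->engines_used & (1u << e))
            continue; /* process already owns a queue on this engine */

        for (int q = 0; q < QUEUES_PER_ENGINE; q++) {
            if (pool->free_queues[e] & (1u << q)) {
                pool->free_queues[e] &= (uint8_t)~(1u << q);
                proc->engines_used |= 1u << e;
                return e * QUEUES_PER_ENGINE + q;
            }
        }
    }
    return -1;
}
```

With this policy, two processes that interleave their requests each end up with one queue on engine 0 and one on engine 1, rather than one process holding both queues of the same engine.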

[PATCH 2/3] drm/amdgpu: Use SDMA1 for buffer movement on Aldebaran

2021-08-19 Thread Joseph Greathouse

Aldebaran should not use SDMA0 for buffer funcs such as page migration. Instead, we move over to SDMA1 for these features. Leave SDMA0 in charge for all other existing chips to avoid any possibility of regressions. Signed-off-by: Joseph Greathouse --- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8

[PATCH 3/3] drm/amdkfd: Spread out XGMI SDMA allocations

2021-08-19 Thread Joseph Greathouse

Avoid hotspotting of allocations of SDMA engines from the XGMI pool by making each process attempt to allocate engines starting from the engine after the last one that was allocated. Signed-off-by: Joseph Greathouse --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 8 +++-
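
Editor's sketch: a minimal illustration of the idea described above, where each search starts at the engine after the last one allocated. It is an assumption-laden model, not the code in kfd_device_queue_manager.c; `xgmi_sdma_pool`, `xgmi_sdma_alloc`, `xgmi_sdma_free` and the pool size are hypothetical.

```c
#include <stdint.h>

#define NUM_XGMI_SDMA_ENGINES 6   /* hypothetical pool size */

struct xgmi_sdma_pool {
    uint32_t free_mask;   /* bit e is set while engine e is free */
    unsigned int next;    /* engine index where the next search starts */
};

/*
 * Scan all engines once, starting at pool->next and wrapping around.
 * Returns the allocated engine index, or -1 if the pool is exhausted.
 */
int xgmi_sdma_alloc(struct xgmi_sdma_pool *pool)
{
    for (unsigned int i = 0; i < NUM_XGMI_SDMA_ENGINES; i++) {
        unsigned int e = (pool->next + i) % NUM_XGMI_SDMA_ENGINES;

        if (pool->free_mask & (1u << e)) {
            pool->free_mask &= ~(1u << e);
            /* The next request starts after the engine just handed out. */
            pool->next = (e + 1) % NUM_XGMI_SDMA_ENGINES;
            return (int)e;
        }
    }
    return -1;
}

/* Return engine e to the pool; the search start position is left unchanged. */
void xgmi_sdma_free(struct xgmi_sdma_pool *pool, unsigned int e)
{
    pool->free_mask |= 1u << e;
}
```

Because the start position only moves forward, an engine that was just freed is not reused immediately while other engines are still idle, which is what spreads the load.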

RE: [PATCH] drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend

2021-08-19 Thread Quan, Evan

[AMD Official Use Only] From: Zhu, James Sent: Thursday, August 19, 2021 10:19 PM To: Quan, Evan ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Chen, Guchun ; Lazar, Lijo ; Pan, Xinhui Subject: Re: [PATCH] drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend [AMD

Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-19 Thread Daniel Vetter

On Wed, Aug 18, 2021 at 10:51:00AM -0400, Andrey Grodzovsky wrote: > > On 2021-08-18 10:42 a.m., Daniel Vetter wrote: > > On Wed, Aug 18, 2021 at 10:36:32AM -0400, Andrey Grodzovsky wrote: > > > On 2021-08-18 10:32 a.m., Daniel Vetter wrote: > > > > On Wed, Aug 18, 2021 at 10:26:25AM -0400,

Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-19 Thread Daniel Vetter

On Thu, Aug 19, 2021 at 03:01:26AM +, Liu, Monk wrote: > [AMD Official Use Only] > > Hi Andrey and Daniel > > We worked for a really long time on this new feature to AMD that finally > can pick up the bad job from all timedout ones, and the change in > scheduler (get/put fence in

Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-19 Thread Mike Lothian

Hi Do I need to open a new bug report for this? Cheers Mike On Wed, 18 Aug 2021 at 06:26, Andrey Grodzovsky wrote: > > On 2021-08-02 1:16 a.m., Guchun Chen wrote: > > In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to stop > > scheduler in s3 test, otherwise, fence related

RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-19 Thread Liu, Monk

[AMD Official Use Only] Hi Daniel >> Why can't we stop the scheduler thread first, so that there's guaranteed no >> race? I've recently had a lot of discussions with panfrost folks about their >> reset that spawns across engines, and without stopping the scheduler thread >> first before you

[PATCH 1/2] drm/amdkfd: check access permisson to restore retry fault

2021-08-19 Thread Philip Yang

Check range access permission to restore GPU retry fault. If the GPU retry fault is on an address that belongs to a VMA, and the VMA lacks the read or write permission requested by the GPU, fail to restore the address. The vm fault event will pass back to user space. Signed-off-by: Philip Yang ---

[PATCH 2/2] drm/amdkfd: map SVM range with correct access permission

2021-08-19 Thread Philip Yang

Restore retry fault or prefetch range, or restore svm range after eviction to map range to GPU with correct read or write access permission. Range may include multiple VMAs, update GPU page table with offset of prange, number of pages for each VMA according to VMA access permission. Signed-off-by:
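
Editor's sketch: one way to picture the per-VMA split described above, as a plain user-space model. The types `vma_desc` and `map_chunk` and the function `split_range_by_vma` are hypothetical and do not correspond to kernel structures. It assumes the VMAs are sorted and non-overlapping, and it assumes that a VMA without read permission is simply left unmapped, which the truncated message does not confirm.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified VMA: page interval [start, end) plus access bits. */
struct vma_desc {
    uint64_t start;
    uint64_t end;
    bool readable;
    bool writable;
};

/* One mapping request: page offset into the range, length, and access mode. */
struct map_chunk {
    uint64_t offset;
    uint64_t npages;
    bool readonly;
};

/*
 * Split the range [start, start + npages) along VMA boundaries and emit one
 * chunk per overlapping readable VMA. Returns the number of chunks written.
 */
size_t split_range_by_vma(uint64_t start, uint64_t npages,
                          const struct vma_desc *vmas, size_t nvmas,
                          struct map_chunk *out, size_t max_out)
{
    uint64_t end = start + npages;
    size_t n = 0;

    for (size_t i = 0; i < nvmas && n < max_out; i++) {
        uint64_t lo = vmas[i].start > start ? vmas[i].start : start;
        uint64_t hi = vmas[i].end < end ? vmas[i].end : end;

        if (lo >= hi)
            continue; /* this VMA does not overlap the range */
        if (!vmas[i].readable)
            continue; /* assumption: no access means no GPU mapping */

        out[n].offset = lo - start;          /* offset within the range */
        out[n].npages = hi - lo;             /* pages covered by this VMA */
        out[n].readonly = !vmas[i].writable; /* map read-only if not writable */
        n++;
    }
    return n;
}
```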

Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-19 Thread Andrey Grodzovsky

On 2021-08-19 5:30 a.m., Daniel Vetter wrote: On Wed, Aug 18, 2021 at 10:51:00AM -0400, Andrey Grodzovsky wrote: On 2021-08-18 10:42 a.m., Daniel Vetter wrote: On Wed, Aug 18, 2021 at 10:36:32AM -0400, Andrey Grodzovsky wrote: On 2021-08-18 10:32 a.m., Daniel Vetter wrote: On Wed, Aug 18,

Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-19 Thread Christoph Hellwig

On Fri, Aug 13, 2021 at 11:59:22AM -0500, Tom Lendacky wrote: > While the name suggests this is intended mainly for guests, it will > also be used for host memory encryption checks in place of sme_active(). Which suggest that the name is not good to start with. Maybe protected hardware, system

Re: [PATCH v2 02/12] mm: Introduce a function to check for virtualization protection features

2021-08-19 Thread Christoph Hellwig

On Fri, Aug 13, 2021 at 11:59:21AM -0500, Tom Lendacky wrote: > +#define PATTR_MEM_ENCRYPT 0 /* Encrypted memory */ > +#define PATTR_HOST_MEM_ENCRYPT 1 /* Host encrypted > memory */ > +#define PATTR_GUEST_MEM_ENCRYPT 2 /* Guest encrypted >

Re: [PATCH v2 04/12] powerpc/pseries/svm: Add a powerpc version of prot_guest_has()

2021-08-19 Thread Christoph Hellwig

On Fri, Aug 13, 2021 at 11:59:23AM -0500, Tom Lendacky wrote: > +static inline bool prot_guest_has(unsigned int attr) No real need to have this inline. In fact I'd suggest we have the prototype in a common header so that everyone must implement it out of line.

[PATCH 09/18] drm/amdkfd: CRIU add queues support

2021-08-19 Thread David Yat Sin

Add support to existing CRIU ioctl's to save number of queues and queue properties for each queue during checkpoint and re-create queues on restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 380 ++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h

[PATCH 11/18] drm/amdkfd: CRIU restore sdma id for queues

2021-08-19 Thread David Yat Sin

When re-creating queues during CRIU restore, restore the queue with the same sdma id value used during CRIU dump. Signed-off-by: David Yat Sin --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 3 +-

[PATCH 10/18] drm/amdkfd: CRIU restore queue ids

2021-08-19 Thread David Yat Sin

When re-creating queues during CRIU restore, restore the queue with the same queue id value used during CRIU dump. Adding a new private structure queue_restore_data to store queue restore information. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin ---

[PATCH 15/18] drm/amdkfd: CRIU dump and restore events

2021-08-19 Thread David Yat Sin

Add support to existing CRIU ioctl's to save and restore events during criu checkpoint and restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 130 +++- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 253 ---

[PATCH 12/18] drm/amdkfd: CRIU restore queue doorbell id

2021-08-19 Thread David Yat Sin

When re-creating queues during CRIU restore, restore the queue with the same doorbell id value used during CRIU dump. Signed-off-by: David Yat Sin --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 61 +-- 1 file changed, 41 insertions(+), 20 deletions(-) diff --git

[PATCH 13/18] drm/amdkfd: CRIU dump and restore queue mqds

2021-08-19 Thread David Yat Sin

Dump contents of queue MQD's on CRIU dump and restore them during CRIU restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 53 ++ drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 70

[PATCH 03/18] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs

2021-08-19 Thread David Yat Sin

From: Rajneesh Bhardwaj Checkpoint-Restore in userspace (CRIU) is a powerful tool that can snapshot a running process and later restore it on same or a remote machine but expects the processes that have a device file (e.g. GPU) associated with them, provide necessary driver support to assist

[PATCH 05/18] drm/amdkfd: CRIU Implement KFD dumper ioctl

2021-08-19 Thread David Yat Sin

From: Rajneesh Bhardwaj This adds support to discover the buffer objects that belong to a process being checkpointed. The data corresponding to these buffer objects is returned to user space plugin running under criu master context which then stores this info to recreate these buffer objects

[PATCH 02/18] x86/configs: CRIU update debug rock defconfig

2021-08-19 Thread David Yat Sin

From: Rajneesh Bhardwaj - Update debug config for Checkpoint-Restore (CR) support - Also include necessary options for CR with docker containers. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- arch/x86/configs/rock-dbg_defconfig | 53 ++--- 1 file

[PATCH 07/18] drm/amdkfd: CRIU Implement KFD resume ioctl

2021-08-19 Thread David Yat Sin

From: Rajneesh Bhardwaj This adds support to create userptr BOs on restore and introduces a new ioctl to restart memory notifiers for the restored userptr BOs. When doing CRIU restore MMU notifications can happen anytime after we call amdgpu_mn_register. Prevent MMU notifications until we reach

[PATCH 06/18] drm/amdkfd: CRIU Implement KFD restore ioctl

2021-08-19 Thread David Yat Sin

From: Rajneesh Bhardwaj This implements the KFD CRIU Restore ioctl that lays the basic foundation for the CRIU restore operation. It provides support to create the buffer objects corresponding to Non-Paged system memory mapped for GPU and/or CPU access and lays basic foundation for the userptrs

[PATCH 08/18] drm/amdkfd: CRIU Implement KFD pause ioctl

2021-08-19 Thread David Yat Sin

Introducing pause IOCTL. The CRIU amdgpu plugin needs to call AMDKFD_IOC_CRIU_PAUSE(pause = 1) before starting dump and AMDKFD_IOC_CRIU_PAUSE(pause = 0) when dump is complete. This ensures that the queues are not modified between each CRIU dump ioctl. Signed-off-by: David Yat Sin ---

[PATCH 17/18] Revert "drm/amdgpu: Remove verify_access shortcut for KFD BOs"

2021-08-19 Thread David Yat Sin

From: Rajneesh Bhardwaj This reverts commit 12ebe2b9df192a2a8580cd9ee3e9940c116913c8. This is just a temporary work around and will be dropped later. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 +++ 1 file changed, 7

[PATCH 14/18] drm/amdkfd: CRIU dump/restore queue control stack

2021-08-19 Thread David Yat Sin

Dump contents of queue control stacks on CRIU dump and restore them during CRIU restore. (rajneesh: rebased to 5.11 and fixed merge conflict) Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 31 ---

[PATCH 16/18] drm/amdkfd: CRIU implement gpu_id remapping

2021-08-19 Thread David Yat Sin

When doing a restore on a different node, the gpu_id's on the restore node may be different. But the user space application will still use the original gpu_id's in the ioctl calls. Adding code to create a gpu id mapping so that kfd can determine actual gpu_id during the user ioctl's.
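
Editor's sketch: a minimal model of a gpu_id translation table of the kind described above, mapping the id the application still uses to the id valid on the restore node. The structure `gpu_id_map`, the helpers `gpu_id_map_add` and `gpu_id_map_lookup`, and the table size are hypothetical; the patch itself may organize this differently.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_GPUS 8   /* hypothetical upper bound on GPUs per process */

/* Pairs of (id seen by user space, id valid on the restore node). */
struct gpu_id_map {
    size_t count;
    uint32_t user_id[MAX_GPUS];
    uint32_t actual_id[MAX_GPUS];
};

/* Translate a user-visible gpu_id. Returns 0 on success, -1 if unknown. */
int gpu_id_map_lookup(const struct gpu_id_map *m, uint32_t user,
                      uint32_t *actual)
{
    for (size_t i = 0; i < m->count; i++) {
        if (m->user_id[i] == user) {
            *actual = m->actual_id[i];
            return 0;
        }
    }
    return -1;
}

/* Record a mapping. Returns 0 on success, -1 if full or already mapped. */
int gpu_id_map_add(struct gpu_id_map *m, uint32_t user, uint32_t actual)
{
    uint32_t existing;

    if (m->count >= MAX_GPUS || gpu_id_map_lookup(m, user, &existing) == 0)
        return -1;

    m->user_id[m->count] = user;
    m->actual_id[m->count] = actual;
    m->count++;
    return 0;
}
```

Each ioctl handler would translate the incoming id once at entry and fail the call if the lookup finds no mapping.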

[PATCH 18/18] drm/amdkfd: CRIU export kfd bos as prime dmabuf objects

2021-08-19 Thread David Yat Sin

From: Rajneesh Bhardwaj KFD buffer objects do not associate a GEM handle with them so cannot directly be used with libdrm to initiate a system dma (sDMA) operation to speedup the checkpoint and restore operation so export them as dmabuf objects and use with libdrm helper (amdgpu_bo_import) to

[PATCH 00/18] CHECKPOINT RESTORE WITH ROCm

2021-08-19 Thread David Yat Sin

CRIU is a user space tool which is very popular for container live migration in datacentres. It can checkpoint a running application, save its complete state, memory contents and all system resources to images on disk which can be migrated to another machine and restored later. More

[PATCH 01/18] x86/configs: CRIU update release defconfig

2021-08-19 Thread David Yat Sin

From: Rajneesh Bhardwaj Update rock-rel_defconfig for monolithic kernel release that enables CRIU support with kfd. Signed-off-by: Rajneesh Bhardwaj (cherry picked from commit 4a6d309a82648a23a4fc0add83013ac6db6187d5) Signed-off-by: David Yat Sin --- arch/x86/configs/rock-rel_defconfig | 13

[PATCH 04/18] drm/amdkfd: CRIU Implement KFD process_info ioctl

2021-08-19 Thread David Yat Sin

From: Rajneesh Bhardwaj This IOCTL is expected to be called as a precursor to the actual Checkpoint operation. This does the basic discovery into the target process seized by CRIU and relays the information to the userspace that utilizes it to start the Checkpoint operation via another dedicated

Re: [PATCH] drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend

2021-08-19 Thread Zhu, James

[AMD Official Use Only] Why not move changes into hw_fini? Best Regards! James Zhu From: amd-gfx on behalf of Evan Quan Sent: Wednesday, August 18, 2021 11:08 PM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Chen, Guchun ; Lazar, Lijo ;

RE: [PATCH] drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend

2021-08-19 Thread Lazar, Lijo

[AMD Official Use Only] If that is done - + amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_UVD, + AMD_PG_STATE_GATE); + amdgpu_device_ip_set_clockgating_state(adev, AMD_IP_BLOCK_TYPE_UVD, +

Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-19 Thread Alex Deucher

Please go ahead. Thanks! Alex On Thu, Aug 19, 2021 at 8:05 AM Mike Lothian wrote: > > Hi > > Do I need to open a new bug report for this? > > Cheers > > Mike > > On Wed, 18 Aug 2021 at 06:26, Andrey Grodzovsky > wrote: >> >> >> On 2021-08-02 1:16 a.m., Guchun Chen wrote: >> > In

Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-19 Thread Tom Lendacky

On 8/19/21 4:52 AM, Christoph Hellwig wrote: > On Fri, Aug 13, 2021 at 11:59:22AM -0500, Tom Lendacky wrote: >> While the name suggests this is intended mainly for guests, it will >> also be used for host memory encryption checks in place of sme_active(). > > Which suggest that the name is not

Re: [PATCH v2 04/12] powerpc/pseries/svm: Add a powerpc version of prot_guest_has()

2021-08-19 Thread Tom Lendacky

On 8/19/21 4:55 AM, Christoph Hellwig wrote: > On Fri, Aug 13, 2021 at 11:59:23AM -0500, Tom Lendacky wrote: >> +static inline bool prot_guest_has(unsigned int attr) > > No reall need to have this inline. In fact I'd suggest we havea the > prototype in a common header so that everyone must

[PATCH v3 5/6] drm/amd/display: Add DP 2.0 BIOS and DMUB Support

2021-08-19 Thread Fangzhi Zuo

Parse DP2 encoder caps and hpo instance from bios Signed-off-by: Fangzhi Zuo --- drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c | 10 ++ drivers/gpu/drm/amd/display/dc/bios/command_table2.c | 10 ++ .../drm/amd/display/dc/dcn30/dcn30_dio_link_encoder.c | 4

[PATCH v3 6/6] drm/amd/display: Add DP 2.0 SST DC Support

2021-08-19 Thread Fangzhi Zuo

1. Retrieve 128/132b link cap. 2. 128/132b link training and payload allocation. 3. UHBR10 link rate support. Signed-off-by: Fangzhi Zuo --- .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |8 + drivers/gpu/drm/amd/display/dc/core/dc.c | 21 +

[PATCH v3 2/6] drm/amd/display: Add DP 2.0 HPO Stream Encoder

2021-08-19 Thread Fangzhi Zuo

HW Blocks: [ASCII block diagram, flattened in this excerpt; blocks shown: OPTC, HDA, HUBP, HPO]

[PATCH v3 3/6] drm/amd/display: Add DP 2.0 HPO Link Encoder

2021-08-19 Thread Fangzhi Zuo

HW Blocks: [ASCII block diagram, flattened in this excerpt; blocks shown: OPTC, HDA, HUBP, HPO]

[PATCH v3 4/6] drm/amd/display: Add DP 2.0 DCCG

2021-08-19 Thread Fangzhi Zuo

HW Blocks: [ASCII block diagram, flattened in this excerpt; blocks shown: OPTC, HDA, HUBP, HPO]

[PATCH v3 0/6] Add DP 2.0 SST Support

2021-08-19 Thread Fangzhi Zuo

The patch series adds SST UHBR10 support Fangzhi Zuo (6): drm/amd/display: Add DP 2.0 Audio Package Generator drm/amd/display: Add DP 2.0 HPO Stream Encoder drm/amd/display: Add DP 2.0 HPO Link Encoder drm/amd/display: Add DP 2.0 DCCG drm/amd/display: Add DP 2.0 BIOS and DMUB Support

[PATCH v3 1/6] drm/amd/display: Add DP 2.0 Audio Package Generator

2021-08-19 Thread Fangzhi Zuo

HW Blocks: [ASCII block diagram, flattened in this excerpt; blocks shown: HDA, HPO, APG]

Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-19 Thread Felix Kuehling

Am 2021-08-19 um 2:00 p.m. schrieb Sierra Guiza, Alejandro (Alex): > > On 8/18/2021 2:28 PM, Ralph Campbell wrote: >> On 8/17/21 5:35 PM, Felix Kuehling wrote: >>> Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell: On 8/12/21 11:31 PM, Alex Sierra wrote: > From: Ralph Campbell >

Re: [PATCH v2 02/12] mm: Introduce a function to check for virtualization protection features

2021-08-19 Thread Tom Lendacky

On 8/19/21 4:46 AM, Christoph Hellwig wrote: > On Fri, Aug 13, 2021 at 11:59:21AM -0500, Tom Lendacky wrote: >> +#define PATTR_MEM_ENCRYPT 0 /* Encrypted memory */ >> +#define PATTR_HOST_MEM_ENCRYPT 1 /* Host encrypted >> memory */ >> +#define

Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-19 Thread Borislav Petkov

On Thu, Aug 19, 2021 at 10:52:53AM +0100, Christoph Hellwig wrote: > Which suggest that the name is not good to start with. Maybe protected > hardware, system or platform might be a better choice? Yah, coming up with a proper name here hasn't been easy. prot_guest_has() is not the first variant.

Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-19 Thread Sierra Guiza, Alejandro (Alex)

On 8/18/2021 2:28 PM, Ralph Campbell wrote: On 8/17/21 5:35 PM, Felix Kuehling wrote: Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell: On 8/12/21 11:31 PM, Alex Sierra wrote: From: Ralph Campbell ZONE_DEVICE struct pages have an extra reference count that complicates the code for

Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-19 Thread Kuppuswamy, Sathyanarayanan

On 8/19/21 11:33 AM, Tom Lendacky wrote: There was some talk about this on the mailing list where TDX and SEV may need to be differentiated, so we wanted to reserve a range of values per technology. I guess I can remove them until they are actually needed. In TDX also we have similar

[PATCH] drm/amd/pm: And destination bounds checking to struct copy

2021-08-19 Thread Kees Cook

In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memcpy(), memmove(), and memset(), avoid intentionally writing across neighboring fields. The "Board Parameters" members of the structs: struct atom_smc_dpm_info_v4_5 struct
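
Editor's sketch: the message above is truncated, so this is not the patch's code. It only illustrates, under stated assumptions, one way to copy a contiguous group of fields between two structs while checking at compile time that the source and destination spans have the same size. The structs `src_info` and `dst_table`, the macro `FIELD_SPAN` and the function `copy_board_params` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical source: the parameters sit between unrelated fields. */
struct src_info {
    uint32_t header;
    uint16_t param_a;
    uint16_t param_b;
    uint32_t param_c;
    uint32_t trailer;
};

/* Hypothetical destination with the same parameters at another offset. */
struct dst_table {
    uint64_t version;
    uint16_t param_a;
    uint16_t param_b;
    uint32_t param_c;
    uint8_t pad[4];
};

/* Bytes from the start of field `first` through the end of field `last`. */
#define FIELD_SPAN(type, first, last) \
    (offsetof(type, last) + sizeof(((type *)0)->last) - offsetof(type, first))

/* Refuse to build if the two field groups ever drift apart in size. */
_Static_assert(FIELD_SPAN(struct src_info, param_a, param_c) ==
               FIELD_SPAN(struct dst_table, param_a, param_c),
               "parameter groups must have identical size");

/* Copy param_a..param_c only; version, pad, header and trailer are untouched. */
void copy_board_params(struct dst_table *dst, const struct src_info *src)
{
    memcpy((unsigned char *)dst + offsetof(struct dst_table, param_a),
           (const unsigned char *)src + offsetof(struct src_info, param_a),
           FIELD_SPAN(struct dst_table, param_a, param_c));
}
```

The copy is addressed from the start of each struct rather than from the address of a single member, so the length never exceeds the object the pointer was derived from.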

Re: [PATCH v2 18/63] drm/amd/pm: Use struct_group() for memcpy() region

2021-08-19 Thread Kees Cook

On Thu, Aug 19, 2021 at 10:33:43AM +0530, Lazar, Lijo wrote: > On 8/19/2021 5:29 AM, Kees Cook wrote: > > On Wed, Aug 18, 2021 at 05:12:28PM +0530, Lazar, Lijo wrote: > > > > > > On 8/18/2021 11:34 AM, Kees Cook wrote: > > > > In preparation for FORTIFY_SOURCE performing compile-time and run-time

Re: [PATCH 1/2] drm/amdkfd: check access permisson to restore retry fault

2021-08-19 Thread Felix Kuehling

Am 2021-08-19 um 10:56 a.m. schrieb Philip Yang: > Check range access permission to restore GPU retry fault, if GPU retry > fault on address which belongs to VMA, and VMA has no read or write > permission requested by GPU, failed to restore the address. The vm fault > event will pass back to user

Re: [PATCH 2/2] drm/amdkfd: map SVM range with correct access permission

2021-08-19 Thread Felix Kuehling

Am 2021-08-19 um 10:56 a.m. schrieb Philip Yang: > Restore retry fault or prefetch range, or restore svm range after > eviction to map range to GPU with correct read or write access > permission. > > Range may includes multiple VMAs, update GPU page table with offset of > prange, number of pages

56 matches
