Perform proper cleanups on UVD/VCE suspend: powergate enablement,
clockgating enablement and dpm disablement. This can fix some hangs
observed on suspend while UVD/VCE is still in use (e.g. issuing
"pm-suspend" while a video is still playing).
Change-Id: I36f39d9731e0a9638b52d5d92558b0ee9c23a9ed
[AMD Official Use Only]
From: Lazar, Lijo
Sent: Thursday, August 19, 2021 10:36 PM
To: Zhu, James ; Quan, Evan ;
amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Chen, Guchun
; Pan, Xinhui
Subject: RE: [PATCH] drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on
suspend
Give every process at most one queue from each SDMA engine.
Previously, we allocated all SDMA engines and queues on a first-come,
first-served basis. This meant that it was possible for two
processes racing on their allocation requests to each end up with
two queues on the same SDMA engine. That
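
As a rough illustration of the "at most one queue per engine per process" rule described above, here is a minimal user-space sketch. It is not the driver code: the engine and queue counts, the struct names, and alloc_sdma_queue() are all hypothetical, and locking is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_ENGINES       4   /* hypothetical engine count */
#define QUEUES_PER_ENGINE 2   /* hypothetical queues per engine */

struct sdma_pool {
    uint8_t used[NUM_ENGINES];  /* queues already handed out per engine */
};

struct proc_state {
    uint32_t engine_mask;       /* bit i set: process owns a queue on engine i */
};

/* Returns the engine a queue was taken from, or -1 if none is available. */
static int alloc_sdma_queue(struct sdma_pool *pool, struct proc_state *proc)
{
    assert(pool && proc);
    for (int e = 0; e < NUM_ENGINES; e++) {
        if (proc->engine_mask & (1u << e))
            continue;           /* this process already has a queue here */
        if (pool->used[e] >= QUEUES_PER_ENGINE)
            continue;           /* engine has no free queue */
        pool->used[e]++;
        proc->engine_mask |= 1u << e;
        return e;
    }
    return -1;
}
```

With this shape, two processes that race on allocation each end up with one queue on engine 0 and one on engine 1, rather than one process holding both queues of a single engine.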
Aldebaran should not use SDMA0 for buffer funcs such as page migration.
Instead, we move over to SDMA1 for these features. Leave SDMA0 in
charge of all other existing chips to avoid any possibility of
regressions.
Signed-off-by: Joseph Greathouse
---
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8
Avoid hotspotting of allocations of SDMA engines from the
XGMI pool by making each process attempt to allocate engines
starting from the engine after the last one that was allocated.
Signed-off-by: Joseph Greathouse
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 8 +++-
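
The "start from the engine after the last one that was allocated" idea can be sketched as a wrapping scan over a free-engine bitmask. This is only an illustration under assumptions: the pool size, the function name pick_engine_after_last(), and the caller-held cursor are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_XGMI_ENGINES 6  /* hypothetical pool size */

/*
 * Pick a free engine, scanning from the slot after *last and wrapping.
 * free_mask: bit i set means engine i is free.
 * *last: index of the previously allocated engine, or -1 if none yet.
 * Returns the chosen engine and updates *last, or returns -1 if none is free.
 */
static int pick_engine_after_last(uint32_t free_mask, int *last)
{
    assert(last && *last >= -1 && *last < NUM_XGMI_ENGINES);
    for (int i = 1; i <= NUM_XGMI_ENGINES; i++) {
        int e = (*last + i) % NUM_XGMI_ENGINES;
        if (free_mask & (1u << e)) {
            *last = e;
            return e;
        }
    }
    return -1;
}
```

Because the scan begins after the previous pick instead of at engine 0, repeated allocations spread across the pool rather than piling onto the lowest-numbered engines.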
From: Zhu, James
Sent: Thursday, August 19, 2021 10:19 PM
To: Quan, Evan ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Chen, Guchun
; Lazar, Lijo ; Pan, Xinhui
Subject: Re: [PATCH] drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on
suspend
On Wed, Aug 18, 2021 at 10:51:00AM -0400, Andrey Grodzovsky wrote:
>
> On 2021-08-18 10:42 a.m., Daniel Vetter wrote:
> > On Wed, Aug 18, 2021 at 10:36:32AM -0400, Andrey Grodzovsky wrote:
> > > On 2021-08-18 10:32 a.m., Daniel Vetter wrote:
> > > > On Wed, Aug 18, 2021 at 10:26:25AM -0400,
On Thu, Aug 19, 2021 at 03:01:26AM +, Liu, Monk wrote:
> [AMD Official Use Only]
>
> Hi Andrey and Daniel
>
> We worked for a really long time on this new feature to AMD that finally
> can pick up the bad job from all timedout ones, and the change in
> scheduler (get/put fence in
Hi
Do I need to open a new bug report for this?
Cheers
Mike
On Wed, 18 Aug 2021 at 06:26, Andrey Grodzovsky
wrote:
>
> On 2021-08-02 1:16 a.m., Guchun Chen wrote:
> > In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to stop
> > scheduler in s3 test, otherwise, fence related
Hi Daniel
>> Why can't we stop the scheduler thread first, so that there's guaranteed no
>> race? I've recently had a lot of discussions with panfrost folks about their
>> reset that spawns across engines, and without stopping the scheduler thread
>> first before you
Check range access permission when restoring a GPU retry fault. If the
fault address belongs to a VMA that lacks the read or write permission
requested by the GPU, the address fails to restore and the VM fault
event is passed back to user space.
Signed-off-by: Philip Yang
---
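
The permission check described in this commit message boils down to "every access the GPU asked for must be allowed by the VMA". A minimal sketch of that test follows; the flag names and values and the helper fault_access_allowed() are hypothetical stand-ins, not the kernel's VM_READ/VM_WRITE handling.

```c
#include <stdbool.h>

#define ACC_READ  0x1u  /* hypothetical flag values */
#define ACC_WRITE 0x2u

/* True if every access bit the GPU requested is permitted by the mapping. */
static bool fault_access_allowed(unsigned int vma_perms, unsigned int requested)
{
    return (requested & ~vma_perms) == 0;
}
```

A write fault on a read-only mapping fails this check, which corresponds to the case where the fault is not restored and is reported to user space instead.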
When restoring a retry fault, prefetching a range, or restoring an SVM
range after eviction, map the range to the GPU with the correct read or
write access permission.
A range may include multiple VMAs; update the GPU page table with the
offset into prange and the number of pages for each VMA, according to
that VMA's access permission.
Signed-off-by:
On 2021-08-19 5:30 a.m., Daniel Vetter wrote:
On Wed, Aug 18, 2021 at 10:51:00AM -0400, Andrey Grodzovsky wrote:
On 2021-08-18 10:42 a.m., Daniel Vetter wrote:
On Wed, Aug 18, 2021 at 10:36:32AM -0400, Andrey Grodzovsky wrote:
On 2021-08-18 10:32 a.m., Daniel Vetter wrote:
On Wed, Aug 18,
On Fri, Aug 13, 2021 at 11:59:22AM -0500, Tom Lendacky wrote:
> While the name suggests this is intended mainly for guests, it will
> also be used for host memory encryption checks in place of sme_active().
Which suggests that the name is not good to start with. Maybe protected
hardware, system
On Fri, Aug 13, 2021 at 11:59:21AM -0500, Tom Lendacky wrote:
> +#define PATTR_MEM_ENCRYPT 0 /* Encrypted memory */
> +#define PATTR_HOST_MEM_ENCRYPT 1 /* Host encrypted memory */
> +#define PATTR_GUEST_MEM_ENCRYPT 2 /* Guest encrypted
>
On Fri, Aug 13, 2021 at 11:59:23AM -0500, Tom Lendacky wrote:
> +static inline bool prot_guest_has(unsigned int attr)
No real need to have this inline. In fact I'd suggest we have the
prototype in a common header so that everyone must implement it out
of line.
Add support to existing CRIU ioctls to save number of queues and queue
properties for each queue during checkpoint and re-create queues on
restore.
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 380 ++-
drivers/gpu/drm/amd/amdkfd/kfd_priv.h
When re-creating queues during CRIU restore, restore the queue with the
same sdma id value used during CRIU dump.
Signed-off-by: David Yat Sin
---
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++-
.../drm/amd/amdkfd/kfd_device_queue_manager.h | 3 +-
When re-creating queues during CRIU restore, restore the queue with the
same queue id value used during CRIU dump. Adding a new private
structure queue_restore_data to store queue restore information.
Signed-off-by: Rajneesh Bhardwaj
Signed-off-by: David Yat Sin
---
Add support to existing CRIU ioctls to save and restore events during
CRIU checkpoint and restore.
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 130 +++-
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 253 ---
When re-creating queues during CRIU restore, restore the queue with the
same doorbell id value used during CRIU dump.
Signed-off-by: David Yat Sin
---
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 61 +--
1 file changed, 41 insertions(+), 20 deletions(-)
diff --git
Dump contents of queue MQD's on CRIU dump and restore them during CRIU
restore.
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 53 ++
drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +-
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 70
From: Rajneesh Bhardwaj
Checkpoint-Restore in userspace (CRIU) is a powerful tool that can
snapshot a running process and later restore it on the same or a remote
machine, but it expects processes that have a device file (e.g. GPU)
associated with them to provide the necessary driver support to assist
From: Rajneesh Bhardwaj
This adds support to discover the buffer objects that belong to a
process being checkpointed. The data corresponding to these buffer
objects is returned to user space plugin running under criu master
context which then stores this info to recreate these buffer objects
From: Rajneesh Bhardwaj
- Update debug config for Checkpoint-Restore (CR) support
- Also include necessary options for CR with docker containers.
Signed-off-by: Rajneesh Bhardwaj
Signed-off-by: David Yat Sin
---
arch/x86/configs/rock-dbg_defconfig | 53 ++---
1 file
From: Rajneesh Bhardwaj
This adds support to create userptr BOs on restore and introduces a new
ioctl to restart memory notifiers for the restored userptr BOs.
During CRIU restore, MMU notifications can happen any time after we call
amdgpu_mn_register. Prevent MMU notifications until we reach
From: Rajneesh Bhardwaj
This implements the KFD CRIU Restore ioctl that lays the basic
foundation for the CRIU restore operation. It provides support to
create the buffer objects corresponding to Non-Paged system memory
mapped for GPU and/or CPU access and lays basic foundation for the
userptrs
Introduce a pause IOCTL. The CRIU amdgpu plugin needs
to call AMDKFD_IOC_CRIU_PAUSE(pause = 1) before starting dump and
AMDKFD_IOC_CRIU_PAUSE(pause = 0) when dump is complete. This ensures
that the queues are not modified between each CRIU dump ioctl.
Signed-off-by: David Yat Sin
---
From: Rajneesh Bhardwaj
This reverts commit 12ebe2b9df192a2a8580cd9ee3e9940c116913c8.
This is just a temporary workaround and will be dropped later.
Signed-off-by: Rajneesh Bhardwaj
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 +++
1 file changed, 7
Dump contents of queue control stacks on CRIU dump and restore them
during CRIU restore.
(rajneesh: rebased to 5.11 and fixed merge conflict)
Signed-off-by: Rajneesh Bhardwaj
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 31 ---
When restoring on a different node, the gpu_ids on the restore node may
be different, but the user space application will still use the
original gpu_ids in its ioctl calls. Add code to create a gpu_id
mapping so that kfd can determine the actual gpu_id during user ioctls.
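
To make the gpu_id mapping idea concrete, here is a small sketch of a translation table from checkpoint-time ids to restore-node ids. The struct gpu_id_pair, its field names, and lookup_actual_gpu_id() are hypothetical; the patch itself may store and search the mapping differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct gpu_id_pair {
    uint32_t user_gpu_id;    /* id the application saw at checkpoint time */
    uint32_t actual_gpu_id;  /* id of the device on the restore node */
};

/* Translate a checkpoint-time id. Returns 0 and fills *actual, or -1 if unknown. */
static int lookup_actual_gpu_id(const struct gpu_id_pair *map, size_t n,
                                uint32_t user_gpu_id, uint32_t *actual)
{
    assert(actual);
    for (size_t i = 0; i < n; i++) {
        if (map[i].user_gpu_id == user_gpu_id) {
            *actual = map[i].actual_gpu_id;
            return 0;
        }
    }
    return -1;
}
```

An ioctl handler would translate the id supplied by user space through such a table before looking up the device, so the application never needs to learn the new ids.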
From: Rajneesh Bhardwaj
KFD buffer objects do not have a GEM handle associated with them, so
they cannot be used directly with libdrm to initiate a system DMA (sDMA)
operation to speed up checkpoint and restore. Export them as dmabuf
objects and use them with the libdrm helper (amdgpu_bo_import) to
CRIU is a user space tool which is very popular for container live migration in
datacentres. It can checkpoint a running application, save its complete state,
memory contents and all system resources to images on disk which can be
migrated to another machine and restored later. More
From: Rajneesh Bhardwaj
Update rock-rel_defconfig for monolithic kernel release that enables
CRIU support with kfd.
Signed-off-by: Rajneesh Bhardwaj
(cherry picked from commit 4a6d309a82648a23a4fc0add83013ac6db6187d5)
Signed-off-by: David Yat Sin
---
arch/x86/configs/rock-rel_defconfig | 13
From: Rajneesh Bhardwaj
This IOCTL is expected to be called as a precursor to the actual
Checkpoint operation. This does the basic discovery into the target
process seized by CRIU and relays the information to the userspace that
utilizes it to start the Checkpoint operation via another dedicated
Why not move changes into hw_fini?
Best Regards!
James Zhu
From: amd-gfx on behalf of Evan Quan
Sent: Wednesday, August 18, 2021 11:08 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Chen, Guchun
; Lazar, Lijo ;
If that is done -
+ amdgpu_device_ip_set_powergating_state(adev,
AMD_IP_BLOCK_TYPE_UVD,
+ AMD_PG_STATE_GATE);
+ amdgpu_device_ip_set_clockgating_state(adev,
AMD_IP_BLOCK_TYPE_UVD,
+
Please go ahead. Thanks!
Alex
On Thu, Aug 19, 2021 at 8:05 AM Mike Lothian wrote:
>
> Hi
>
> Do I need to open a new bug report for this?
>
> Cheers
>
> Mike
>
> On Wed, 18 Aug 2021 at 06:26, Andrey Grodzovsky
> wrote:
>>
>>
>> On 2021-08-02 1:16 a.m., Guchun Chen wrote:
>> > In
On 8/19/21 4:52 AM, Christoph Hellwig wrote:
> On Fri, Aug 13, 2021 at 11:59:22AM -0500, Tom Lendacky wrote:
>> While the name suggests this is intended mainly for guests, it will
>> also be used for host memory encryption checks in place of sme_active().
>
> Which suggests that the name is not
On 8/19/21 4:55 AM, Christoph Hellwig wrote:
> On Fri, Aug 13, 2021 at 11:59:23AM -0500, Tom Lendacky wrote:
>> +static inline bool prot_guest_has(unsigned int attr)
>
> No real need to have this inline. In fact I'd suggest we have the
> prototype in a common header so that everyone must
Parse DP2 encoder caps and hpo instance from bios
Signed-off-by: Fangzhi Zuo
---
drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c | 10 ++
drivers/gpu/drm/amd/display/dc/bios/command_table2.c | 10 ++
.../drm/amd/display/dc/dcn30/dcn30_dio_link_encoder.c | 4
1. Retrieve 128/132b link cap.
2. 128/132b link training and payload allocation.
3. UHBR10 link rate support.
Signed-off-by: Fangzhi Zuo
---
.../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |8 +
drivers/gpu/drm/amd/display/dc/core/dc.c | 21 +
HW Blocks:
++ +-+ +--+
| OPTC | | HDA | | HUBP |
++ +-+ +--+
| ||
| ||
HPO |==||
| | v|
| | +-+
The patch series adds SST UHBR10 support
Fangzhi Zuo (6):
drm/amd/display: Add DP 2.0 Audio Package Generator
drm/amd/display: Add DP 2.0 HPO Stream Encoder
drm/amd/display: Add DP 2.0 HPO Link Encoder
drm/amd/display: Add DP 2.0 DCCG
drm/amd/display: Add DP 2.0 BIOS and DMUB Support
HW Blocks:
+-+
| HDA |
+-+
|
|
HPO ===|=
| v
| +-+
| | APG |
v +-+
Am 2021-08-19 um 2:00 p.m. schrieb Sierra Guiza, Alejandro (Alex):
>
> On 8/18/2021 2:28 PM, Ralph Campbell wrote:
>> On 8/17/21 5:35 PM, Felix Kuehling wrote:
>>> Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell:
On 8/12/21 11:31 PM, Alex Sierra wrote:
> From: Ralph Campbell
>
On 8/19/21 4:46 AM, Christoph Hellwig wrote:
> On Fri, Aug 13, 2021 at 11:59:21AM -0500, Tom Lendacky wrote:
>> +#define PATTR_MEM_ENCRYPT 0 /* Encrypted memory */
>> +#define PATTR_HOST_MEM_ENCRYPT 1 /* Host encrypted memory */
>> +#define
On Thu, Aug 19, 2021 at 10:52:53AM +0100, Christoph Hellwig wrote:
> Which suggests that the name is not good to start with. Maybe protected
> hardware, system or platform might be a better choice?
Yah, coming up with a proper name here hasn't been easy.
prot_guest_has() is not the first variant.
On 8/18/2021 2:28 PM, Ralph Campbell wrote:
On 8/17/21 5:35 PM, Felix Kuehling wrote:
Am 2021-08-17 um 8:01 p.m. schrieb Ralph Campbell:
On 8/12/21 11:31 PM, Alex Sierra wrote:
From: Ralph Campbell
ZONE_DEVICE struct pages have an extra reference count that
complicates the
code for
On 8/19/21 11:33 AM, Tom Lendacky wrote:
There was some talk about this on the mailing list where TDX and SEV may
need to be differentiated, so we wanted to reserve a range of values per
technology. I guess I can remove them until they are actually needed.
In TDX also we have similar
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally writing across neighboring fields.
The "Board Parameters" members of the structs:
struct atom_smc_dpm_info_v4_5
struct
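
One general way to avoid a memcpy() that writes across neighboring fields is to give the copied region its own named member, so the copy has a single destination whose size the compiler can check. The sketch below illustrates that idea only; the struct layouts, field names, and copy_board_params() are hypothetical and are not the actual atom_smc_dpm_info structures or the approach taken in this patch.

```c
#include <stdint.h>
#include <string.h>

struct board_params {        /* hypothetical grouped fields */
    uint16_t vr_a;
    uint16_t vr_b;
    uint32_t clock;
};

struct src_table {
    uint32_t header;
    struct board_params board;
};

struct dst_table {
    uint32_t version;
    struct board_params board;
    uint32_t trailer;
};

static void copy_board_params(struct dst_table *dst, const struct src_table *src)
{
    /* The copy targets exactly one named member, so its bounds are known. */
    memcpy(&dst->board, &src->board, sizeof(dst->board));
}
```

The alternative, taking the address of the first field and copying a length that spans several following fields, is the pattern that field-level bounds checking would flag.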
On Thu, Aug 19, 2021 at 10:33:43AM +0530, Lazar, Lijo wrote:
> On 8/19/2021 5:29 AM, Kees Cook wrote:
> > On Wed, Aug 18, 2021 at 05:12:28PM +0530, Lazar, Lijo wrote:
> > >
> > > On 8/18/2021 11:34 AM, Kees Cook wrote:
> > > > In preparation for FORTIFY_SOURCE performing compile-time and run-time
Am 2021-08-19 um 10:56 a.m. schrieb Philip Yang:
> Check range access permission when restoring a GPU retry fault. If the
> fault address belongs to a VMA that lacks the read or write permission
> requested by the GPU, the address fails to restore and the VM fault
> event is passed back to user
Am 2021-08-19 um 10:56 a.m. schrieb Philip Yang:
> When restoring a retry fault, prefetching a range, or restoring an SVM
> range after eviction, map the range to the GPU with the correct read or
> write access permission.
>
> A range may include multiple VMAs; update the GPU page table with the
> offset into prange and the number of pages