Re: [PATCH v6 02/14] mm: handling Non-LRU pages returned by vm_normal_pages

2022-06-28 Thread Sierra Guiza, Alejandro (Alex)
On 6/28/2022 5:42 AM, David Hildenbrand wrote: On 28.06.22 02:14, Alex Sierra wrote: With DEVICE_COHERENT, we'll soon have vm_normal_pages() return device-managed anonymous pages that are not LRU pages. Although they behave like normal pages for purposes of mapping in CPU page, and for COW.

[PATCH v7 14/14] tools: add selftests to hmm for COW in device memory

2022-06-28 Thread Alex Sierra
The objective is to test device migration mechanism in pages marked as COW, for private and coherent device type. In case of writing to COW private page(s), a page fault will migrate pages back to system memory first. Then, these pages will be duplicated. In case of COW device coherent type, pages

[PATCH v7 13/14] tools: add hmm gup tests for device coherent type

2022-06-28 Thread Alex Sierra
The intention is to test hmm device coherent type under different get user pages paths. Also, test gup with FOLL_LONGTERM flag set in device coherent pages. These pages should get migrated back to system memory. Signed-off-by: Alex Sierra Reviewed-by: Alistair Popple ---

[PATCH v7 11/14] tools: update hmm-test to support device coherent type

2022-06-28 Thread Alex Sierra
Test cases such as migrate_fault and migrate_multiple were modified to explicitly migrate from device to system memory without the need for page faults when using the device coherent type. The snapshot test case was updated to read the memory device type first and, based on that, get the proper returned results

[PATCH v7 05/14] mm: remove the vma check in migrate_vma_setup()

2022-06-28 Thread Alex Sierra
From: Alistair Popple migrate_vma_setup() checks that a valid vma is passed so that the page tables can be walked to find the pfns associated with a given address range. However in some cases the pfns are already known, such as when migrating device coherent pages during pin_user_pages() meaning

[PATCH v7 09/14] lib: test_hmm add module param for zone device type

2022-06-28 Thread Alex Sierra
In order to configure device coherent in test_hmm, two module parameters should be passed, spm_addr_dev0 & spm_addr_dev1, which correspond to the SP start address of each of the two devices. If no parameters are passed, private device type is configured. Signed-off-by: Alex Sierra Acked-by: Felix

[PATCH v7 12/14] tools: update test_hmm script to support SP config

2022-06-28 Thread Alex Sierra
Add two more parameters to set spm_addr_dev0 & spm_addr_dev1 addresses. These two parameters configure the start SP addresses for each device in test_hmm driver. Consequently, this configures zone device type as coherent. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair

[PATCH v7 08/14] lib: test_hmm add ioctl to get zone device type

2022-06-28 Thread Alex Sierra
New ioctl cmd added to query zone device type. This will be used once the test_hmm adds zone device coherent type. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair Popple Signed-off-by: Christoph Hellwig --- lib/test_hmm.c | 11 +-- lib/test_hmm_uapi.h

[PATCH v7 07/14] drm/amdkfd: add SPM support for SVM

2022-06-28 Thread Alex Sierra
When the CPU is connected through XGMI, it has coherent access to the VRAM resource. In this case that resource is taken from a table in the device gmc aperture base. This resource is used along with the device type, which could be DEVICE_PRIVATE or DEVICE_COHERENT, to create the device page map region.

[PATCH v7 06/14] mm/gup: migrate device coherent pages when pinning instead of failing

2022-06-28 Thread Alex Sierra
From: Alistair Popple Currently any attempts to pin a device coherent page will fail. This is because device coherent pages need to be managed by a device driver, and pinning them would prevent a driver from migrating them off the device. However this is no reason to fail pinning of these

[PATCH v7 10/14] lib: add support for device coherent type in test_hmm

2022-06-28 Thread Alex Sierra
Device Coherent type uses device memory that is coherently accessible by the CPU. This could be shown as SP (special purpose) memory range at the BIOS-e820 memory enumeration. If no SP memory is supported in system, this could be faked by setting CONFIG_EFI_FAKE_MEMMAP. Currently, test_hmm only

[PATCH v7 03/14] mm: handling Non-LRU pages returned by vm_normal_pages

2022-06-28 Thread Alex Sierra
With DEVICE_COHERENT, we'll soon have vm_normal_pages() return device-managed anonymous pages that are not LRU pages. Although they behave like normal pages for purposes of mapping in CPU page, and for COW. They do not support LRU lists, NUMA migration or THP. Callers to follow_page that expect

[PATCH v7 04/14] mm: add device coherent vma selection for memory migration

2022-06-28 Thread Alex Sierra
This case is used to migrate pages from device memory, back to system memory. Device coherent type memory is cache coherent from device and CPU point of view. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair Popple Signed-off-by: Christoph Hellwig ---

[PATCH v7 01/14] mm: rename is_pinnable_pages to is_pinnable_longterm_pages

2022-06-28 Thread Alex Sierra
is_pinnable_page() and folio_is_pinnable() were renamed to is_longterm_pinnable_page() and folio_is_longterm_pinnable() respectively. These functions are used in the FOLL_LONGTERM flag context. Signed-off-by: Alex Sierra --- include/linux/memremap.h | 24

[PATCH v7 02/14] mm: add zone device coherent type memory support

2022-06-28 Thread Alex Sierra
Device memory that is cache coherent from device and CPU point of view. This is used on platforms that have an advanced system bus (like CAPI or CXL). Any page of a process can be migrated to such memory. However, no one should be allowed to pin such memory so that it can always be evicted.

[PATCH v7 00/14] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping

2022-06-28 Thread Alex Sierra
This is our MEMORY_DEVICE_COHERENT patch series rebased and updated for current 5.19.0-rc4 Changes since the last version: - Fixed problems with migration during long-term pinning in get_user_pages - Open coded vm_normal_lru_pages as suggested in previous code review - Update hmm_gup_test with

[PATCH 7/7] Revert "drm/amdgpu/gmc11: avoid cpu accessing registers to flush VM"

2022-06-28 Thread Jack Xiao
This reverts commit 5af39cf2fbadbaac1a04c94a604b298a9a325670 since drv enabled mes to access registers. Signed-off-by: Jack Xiao --- drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 51 +- 1 file changed, 1 insertion(+), 50 deletions(-) diff --git

[PATCH 5/7] drm/amdgpu: enable mes to access registers v2

2022-06-28 Thread Jack Xiao
Enable mes to access registers. v2: squash mes sched ring enablement flag Signed-off-by: Jack Xiao --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 8 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 6 ++ drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 2 +-

[PATCH 6/7] drm/amdgpu/mes: add mes ring test

2022-06-28 Thread Jack Xiao
Use read/write register to test mes ring. Signed-off-by: Jack Xiao --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 36 + drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 1 + drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 6 + 3 files changed, 43 insertions(+) diff --git

[PATCH 4/7] drm/amdgpu/mes: add mes register access interface

2022-06-28 Thread Jack Xiao
Add mes register access routines: 1. read register 2. write register 3. wait register 4. write and wait register Signed-off-by: Jack Xiao --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 132 +++- 1 file changed, 131 insertions(+), 1 deletion(-) diff --git

[PATCH 3/7] drm/amdgpu/mes11: add mes11 misc op

2022-06-28 Thread Jack Xiao
Add misc op commands in mes11. Signed-off-by: Jack Xiao --- drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 53 ++ 1 file changed, 53 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c index d5200cbceb8a..e2aa1ebb3a00

[PATCH 1/7] drm/amdgpu/mes11: update mes interface for acessing registers

2022-06-28 Thread Jack Xiao
Update MES firmware api for accessing registers. Signed-off-by: Jack Xiao --- drivers/gpu/drm/amd/include/mes_v11_api_def.h | 37 +-- 1 file changed, 26 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/include/mes_v11_api_def.h

[PATCH 2/7] drm/amdgpu: add common interface for mes misc op

2022-06-28 Thread Jack Xiao
Add common interface for mes misc op, including accessing register interface. Signed-off-by: Jack Xiao --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 46 + 1 file changed, 46 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h

SYNCOBJ TIMELINE Test failed while running amdgpu_test

2022-06-28 Thread Zhang, Jesse(Jie)
[AMD Official Use Only - General] Hi Alex and Mario We find the “Syncobj timeline” test failed on ubunt22(kernel version >= 5.15.34). Failed log: Suite: SYNCOBJ TIMELINE Tests Test: syncobj timeline test ...FAILED 1. sources/drm/tests/amdgpu/syncobj_tests.c:299 -

RE: [PATCH] drm/amdgpu/display: reduce stack size in dml32_ModeSupportAndSystemConfigurationFull()

2022-06-28 Thread Chen, Guchun
Acked-by: Guchun Chen Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Alex Deucher Sent: Tuesday, June 28, 2022 10:33 PM To: Deucher, Alexander Cc: Stephen Rothwell ; Pillai, Aurabindo ; Siqueira, Rodrigo ; amd-gfx list Subject: Re: [PATCH] drm/amdgpu/display: reduce

RE: [PATCH] drm/amdkfd: Fix warnings from static analyzer Smatch

2022-06-28 Thread Errabolu, Ramesh
[AMD Official Use Only - General] My responses are inline -Original Message- From: Kuehling, Felix Sent: Tuesday, June 28, 2022 6:41 PM To: Errabolu, Ramesh ; amd-gfx@lists.freedesktop.org; dan.carpen...@oracle.com Subject: Re: [PATCH] drm/amdkfd: Fix warnings from static analyzer

Re: [PATCH] drm/amdkfd: Fix warnings from static analyzer Smatch

2022-06-28 Thread Felix Kuehling
Am 2022-06-28 um 19:25 schrieb Ramesh Errabolu: The patch fixes a couple of warnings reported by Smatch, a static analyzer Signed-off-by: Ramesh Errabolu Reported-by: Dan Carpenter --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 36 --- 1 file changed, 19

[PATCH] drm/amdkfd: Fix warnings from static analyzer Smatch

2022-06-28 Thread Ramesh Errabolu
The patch fixes a couple of warnings reported by Smatch, a static analyzer Signed-off-by: Ramesh Errabolu Reported-by: Dan Carpenter --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 36 --- 1 file changed, 19 insertions(+), 17 deletions(-) diff --git

Re: [PATCH v2] drm/amd/display: expose additional modifier for DCN32/321

2022-06-28 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen On Tue, Jun 28, 2022 at 10:25 PM Aurabindo Pillai wrote: > > [Why] > Some userspace expect a backwards compatible modifier on DCN32/321. For > hardware with num_pipes more than 16, we expose the most efficient > modifier first. As a fall back method, we need to

[PATCH 4/4] libhsakmt: allocate unified memory for ctx save restore area

2022-06-28 Thread Eric Huang
To improve performance on queue preemption, allocate ctx s/r area in VRAM instead of system memory, and migrate it back to system memory when VRAM is full. Signed-off-by: Eric Huang Change-Id: If775782027188dbe84b6868260e429373675434c --- include/hsakmttypes.h | 1 + src/queues.c |

[PATCH 3/4] libhsakmt: add new flags for svm

2022-06-28 Thread Eric Huang
Add a new option for always keeping the GPU mapping. Signed-off-by: Eric Huang Change-Id: Iebee35e6de4d52fa29f82dd19f6bbf5640249492 --- include/linux/kfd_ioctl.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/kfd_ioctl.h b/include/linux/kfd_ioctl.h index 8a0ed49..5c45f58

[PATCH 4/4] libhsakmt: allocate unified memory for ctx save restore area

2022-06-28 Thread Eric Huang
To improve performance on queue preemption, allocate ctx s/r area in VRAM instead of system memory, and migrate it back to system memory when VRAM is full. Signed-off-by: Eric Huang Change-Id: If775782027188dbe84b6868260e429373675434c --- include/hsakmttypes.h | 1 + src/queues.c |

[PATCH 0/4] Unified memory for CWSR save restore area

2022-06-28 Thread Eric Huang
amdkfd changes: Eric Huang (2): drm/amdkfd: add new flag for svm drm/amdkfd: change svm range evict drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 -- include/uapi/linux/kfd_ioctl.h | 2 ++ 2 files changed, 10 insertions(+), 2 deletions(-) libhsakmt(thunk) changes: which are

[PATCH 3/4] libhsakmt: add new flags for svm

2022-06-28 Thread Eric Huang
Add a new option for always keeping the GPU mapping. Signed-off-by: Eric Huang Change-Id: Iebee35e6de4d52fa29f82dd19f6bbf5640249492 --- include/linux/kfd_ioctl.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/kfd_ioctl.h b/include/linux/kfd_ioctl.h index 8a0ed49..5c45f58

[PATCH 2/2] drm/amdkfd: change svm range evict

2022-06-28 Thread Eric Huang
Two changes: 1. reducing unnecessary evict/unmap when range is not mapped to gpu. 2. adding always evict when flags is set to always_mapped. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git

[PATCH 1/2] drm/amdkfd: add new flag for svm

2022-06-28 Thread Eric Huang
Add a new option for always keeping the GPU mapping. Signed-off-by: Eric Huang --- include/uapi/linux/kfd_ioctl.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index fd49dde4d5f4..eba04ebfd9a8 100644 ---

Re: [RFC PATCH 4/5] drm/drm_color_mgmt: add 3D LUT to color mgmt properties

2022-06-28 Thread Harry Wentland
On 6/19/22 18:31, Melissa Wen wrote: > Add 3D LUT for gamma correction using a 3D lookup table. The position > in the color correction pipeline where 3D LUT is applied depends on hw > design, being after CTM or gamma. If just after CTM, a shaper lut must > be set to shape the content for a

Re: [PATCH v6 14/22] dma-buf: Introduce new locking convention

2022-06-28 Thread Intel
On 5/30/22 15:57, Dmitry Osipenko wrote: On 5/30/22 16:41, Christian König wrote: Hi Dmitry, Am 30.05.22 um 15:26 schrieb Dmitry Osipenko: Hello Christian, On 5/30/22 09:50, Christian König wrote: Hi Dmitry, First of all please separate out this patch from the rest of the series, since

Re: [RFC PATCH 2/5] Documentation/amdgpu/display: add DC color caps info

2022-06-28 Thread Harry Wentland
On 6/19/22 18:31, Melissa Wen wrote: > Add details about color correction capabilities and explain a bit about > differences between DC hw generations and also how they are mapped > between DRM and DC interface. Two schemas for DCN 2.0 and 3.0 > (rasterized from the original png) is included to

[PATCH] drm/amd: Add debug mask for subviewport mclk switch

2022-06-28 Thread Aurabindo Pillai
[Why] Expose a new debugfs enum to force a subviewport memory clock switch to facilitate easy testing. Signed-off-by: Aurabindo Pillai --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +++ drivers/gpu/drm/amd/include/amd_shared.h | 1 + 2 files changed, 4 insertions(+) diff

[linux-next:master] BUILD REGRESSION cb71b93c2dc36d18a8b05245973328d018272cdf

2022-06-28 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master branch HEAD: cb71b93c2dc36d18a8b05245973328d018272cdf Add linux-next specific files for 20220628 Error/Warning: (recently discovered and may have been fixed) arch/powerpc/kernel/interrupt.c:542:55: error

Re: [PATCH 1/3] drm/amdkfd: add new flags for svm

2022-06-28 Thread Eric Huang
Thank you, Felix. I will send all libhsakmt changes and amdkfd changes to amd-gfx. Regards, Eric On 2022-06-28 16:44, Felix Kuehling wrote: Am 2022-06-27 um 12:01 schrieb Eric Huang: No. There is only internal link for now, because it is under review. Once it is submitted, external link

Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)

2022-06-28 Thread Uladzislau Rezki
> Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm: > > All you need to do to get the previous behavior is to add something like > > this to your defconfig file: > > > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000 > > > > Any reason why this will not work for you? > > As far as I

Re: [PATCH 1/3] drm/amdkfd: add new flags for svm

2022-06-28 Thread Felix Kuehling
Am 2022-06-27 um 12:01 schrieb Eric Huang: No. There is only internal link for now, because it is under review. Once it is submitted, external link should be in gerritgit for libhsakmt. Hi Eric, For anything that requires ioctl API changes, the user mode and kernel mode changes need to be

Re: [RFC PATCH 1/5] Documentation/amdgpu_dm: Add DM color correction documentation

2022-06-28 Thread Harry Wentland
On 2022-06-19 18:31, Melissa Wen wrote: > AMDGPU DM maps DRM color management properties (degamma, ctm and gamma) > to DC color correction entities. Part of this mapping is already > documented as code comments and can be converted as kernel docs. > > Signed-off-by: Melissa Wen > --- >

[PATCH] drm/amd: Load TA firmware for DCN32/321

2022-06-28 Thread Aurabindo Pillai
[Why] TA firmware is needed to enable HDCP Signed-off-by: Aurabindo Pillai --- drivers/gpu/drm/amd/amdgpu/psp_v13_0.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c index 9e1ef81933ff..4df45c2a7d0a 100644

[PATCH v2] drm/amd/display: expose additional modifier for DCN32/321

2022-06-28 Thread Aurabindo Pillai
[Why] Some userspace expect a backwards compatible modifier on DCN32/321. For hardware with num_pipes more than 16, we expose the most efficient modifier first. As a fall back method, we need to expose slightly inefficient modifier AMD_FMT_MOD_TILE_GFX9_64K_R_X after the best option. Also set the

Re: [PATCH v6 01/22] drm/gem: Properly annotate WW context on drm_gem_lock_reservations() error

2022-06-28 Thread Intel
Hi, On 5/27/22 01:50, Dmitry Osipenko wrote: Use ww_acquire_fini() in the error code paths. Otherwise lockdep thinks that lock is held when lock's memory is freed after the drm_gem_lock_reservations() error. The WW needs to be annotated as "freed" s /WW/ww_acquire_context/ ? s

Re: [RFC PATCH 0/5] DRM CRTC 3D LUT interface for AMD DCN

2022-06-28 Thread Harry Wentland
On 2022-06-19 18:30, Melissa Wen wrote: > Hi, > > I've been working on a proposal to add 3D LUT interface to DRM CRTC > color mgmt, that means new **after-blending** properties for color > correction. As I'm targeting AMD display drivers, I need to map these > new properties to AMD DC

Re: [RFC PATCH 4/5] drm/drm_color_mgmt: add 3D LUT to color mgmt properties

2022-06-28 Thread Harry Wentland
On 2022-06-27 08:18, Ville Syrjälä wrote: > On Sun, Jun 19, 2022 at 09:31:03PM -0100, Melissa Wen wrote: >> Add 3D LUT for gamma correction using a 3D lookup table. The position >> in the color correction pipeline where 3D LUT is applied depends on hw >> design, being after CTM or gamma. If

Re: [PATCH 10/20] drm/amd/display: Insert pulling smu busy status before sending another request

2022-06-28 Thread Mike Lothian
Hi I'm seeing the following stack trace, I'm guessing due to the assert: [3.516409] [ cut here ] [3.516412] WARNING: CPU: 1 PID: 1 at drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn21/rn_clk_mgr_vbios_smu.c:98 rn_vbios_smu_send_msg_with_param+0x3e/0xe0 [

Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)

2022-06-28 Thread Paul E. McKenney
On Tue, Jun 28, 2022 at 11:02:40AM -0400, Alex Xu (Hello71) wrote: > Excerpts from Paul E. McKenney's message of June 28, 2022 12:12 am: > > On Mon, Jun 27, 2022 at 09:50:53PM -0400, Alex Xu (Hello71) wrote: > >> Ah, I see. I have selected the default value for > >>

Re: [PATCH v6 00/22] Add generic memory shrinker to VirtIO-GPU and Panfrost DRM drivers

2022-06-28 Thread Rob Clark
On Tue, Jun 28, 2022 at 5:51 AM Dmitry Osipenko wrote: > > On 6/28/22 15:31, Robin Murphy wrote: > > ->8- > > [ 68.295951] == > > [ 68.295956] WARNING: possible circular locking dependency detected > > [ 68.295963] 5.19.0-rc3+ #400

Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)

2022-06-28 Thread Jason A. Donenfeld
Hi Alex, On Tue, Jun 28, 2022 at 11:02:40AM -0400, Alex Xu (Hello71) wrote: > WireGuard and random also use CONFIG_ANDROID in a similar "proxy" way as > rcu, there to see if suspends are "frequent". This seems dubious for the > same reasons. I'd be happy to take a patch in WireGuard and

Re: [PATCH v6 00/22] Add generic memory shrinker to VirtIO-GPU and Panfrost DRM drivers

2022-06-28 Thread Dmitry Osipenko
Hello Robin, On 6/28/22 15:31, Robin Murphy wrote: >> Hello, >> >> This patchset introduces memory shrinker for the VirtIO-GPU DRM driver >> and adds memory purging and eviction support to VirtIO-GPU driver. >> >> The new dma-buf locking convention is introduced here as well. >> >> During OOM,

Re: [PATCH] drm/amdgpu/display: reduce stack size in dml32_ModeSupportAndSystemConfigurationFull()

2022-06-28 Thread Rodrigo Siqueira Jordao
On 2022-06-22 10:47, Alex Deucher wrote: Move more stack variables into the dummy vars structure on the heap. Fixes stack frame size errors: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_32.c: In function 'dml32_ModeSupportAndSystemConfigurationFull':

[PATCH 10/11] libhsakmt: add open SMI event handle

2022-06-28 Thread Philip Yang
System Management Interface events are read from an anonymous file handle; this helper wraps the ioctl interface to get the anonymous file handle for a GPU nodeid. Define SMI event IDs, event triggers, copy the same value from kfd_ioctl.h to avoid translation. Change-Id:

[PATCH v5 8/11] drm/amdkfd: Bump KFD API version for SMI profiling event

2022-06-28 Thread Philip Yang
Indicate SMI profiling events available. Signed-off-by: Philip Yang --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index f239e260796b..b024e8ba865d 100644 ---

[PATCH 11/11] ROCR-Runtime Basic SVM profiler

2022-06-28 Thread Philip Yang
From: Sean Keely Mostly a demo at this point. Logs SVM (aka HMM) info to HSA_SVM_PROFILE if set. Example: HSA_SVM_PROFILE=log.txt SomeApp Change-Id: Ib6fd688f661a21b2c695f586b833be93662a15f4 --- src/CMakeLists.txt| 1 + src/core/inc/amd_gpu_agent.h | 3 +

[PATCH v5 6/11] drm/amdkfd: Add unmap from GPU SMI event

2022-06-28 Thread Philip Yang
SVM range unmapped from GPUs when range is unmapped from CPU, or with xnack on from MMU notifier when range is evicted or migrated. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 9 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h | 3 +++

[PATCH 9/11] libhsakmt: hsaKmtGetNodeProperties add gpu_id

2022-06-28 Thread Philip Yang
Add KFDGpuID to HsaNodeProperties to return gpu_id to upper layer, gpu_id is hash ID generated by KFD to distinguish GPUs on the system. ROCr and ROCProfiler will use gpu_id to analyze SMI event message. Change-Id: I6eabe6849230e04120674f5bc55e6ea254a532d6 Signed-off-by: Philip Yang ---

[PATCH v5 4/11] drm/amdkfd: Add migration SMI event

2022-06-28 Thread Philip Yang
For migration start and end event, output timestamp when migration starts, ends, svm range address and size, GPU id of migration source and destination and svm range attributes, Migration trigger could be prefetch, CPU or GPU page fault and TTM eviction. Signed-off-by: Philip Yang ---

[PATCH v5 3/11] drm/amdkfd: Add GPU recoverable fault SMI event

2022-06-28 Thread Philip Yang
Use ktime_get_boottime_ns() as timestamp to correlate with other APIs. Output timestamp when GPU recoverable fault starts and ends to recover the fault, if migration happened or only GPU page table is updated to recover, fault address, if read or write fault. Signed-off-by: Philip Yang ---

[PATCH v5 5/11] drm/amdkfd: Add user queue eviction restore SMI event

2022-06-28 Thread Philip Yang
Output user queue eviction and restore event. User queue eviction may be triggered by svm or userptr MMU notifier, TTM eviction, device suspend and CRIU checkpoint and restore. User queue restore may be rescheduled if eviction happens again while restore. Signed-off-by: Philip Yang ---

[PATCH v5 7/11] drm/amdkfd: Asynchronously free smi_client

2022-06-28 Thread Philip Yang
synchronize_rcu() may take several ms, which noticeably slows down applications that close the SMI event handle. Use call_rcu to free client->fifo and client asynchronously and eliminate the synchronize_rcu call in the user thread. Signed-off-by: Philip Yang ---

[PATCH v5 1/11] drm/amdkfd: Add KFD SMI event IDs and triggers

2022-06-28 Thread Philip Yang
Define new system management interface event IDs for migration, GPU recoverable page fault, user queues eviction, restore and unmap from GPU events and corresponding event triggers, those will be implemented in the following patches. Signed-off-by: Philip Yang --- include/uapi/linux/kfd_ioctl.h

[PATCH v5 2/11] drm/amdkfd: Enable per process SMI event

2022-06-28 Thread Philip Yang
A process receives events from the same process by default. Add a flag to be able to receive events from all processes; this requires super user permission. Events use pid 0 to send the event to all processes, to keep the default behavior of existing SMI events. Signed-off-by: Philip Yang Reviewed-by:

[PATCH v5 0/11] HMM profiler interface

2022-06-28 Thread Philip Yang
This implements KFD profiling APIs to expose HMM migration and recoverable page fault profiling data. The ROCm profiler will shared link with application, to collect and expose the profiling data to application developers to tune the applications based on how the address range attributes

Re: [PATCH] drm/amdgpu: Fix typos in amdgpu_stop_pending_resets

2022-06-28 Thread Alex Deucher
On Tue, Jun 28, 2022 at 10:42 AM Kent Russell wrote: > > Change amdggpu to amdgpu and pedning to pending > > Signed-off-by: Kent Russell Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git

[PATCH] drm/amdgpu: Fix typos in amdgpu_stop_pending_resets

2022-06-28 Thread Kent Russell
Change amdggpu to amdgpu and pedning to pending Signed-off-by: Kent Russell --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index

Re: [PATCH 11/22] drm: amd: amd_shared.h: Add missing doc for PP_GFX_DCS_MASK

2022-06-28 Thread Alex Deucher
Applied. Thanks! On Tue, Jun 28, 2022 at 5:46 AM Mauro Carvalho Chehab wrote: > > This symbol is missing documentation: > > drivers/gpu/drm/amd/include/amd_shared.h:224: warning: Enum value > 'PP_GFX_DCS_MASK' not described in enum 'PP_FEATURE_MASK' > > Document it. > > Fixes:

Re: [PATCH 10/22] drm: amdgpu: amdgpu_device.c: fix a kernel-doc markup

2022-06-28 Thread Alex Deucher
On Tue, Jun 28, 2022 at 5:46 AM Mauro Carvalho Chehab wrote: > > The function was renamed without renaming also kernel-doc markup: > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:5095: warning: expecting > prototype for amdgpu_device_gpu_recover_imp(). Prototype was for >

Re: [PATCH 09/22] drm: amdgpu: amdgpu_dm: fix kernel-doc markups

2022-06-28 Thread Alex Deucher
Applied. Thanks! On Tue, Jun 28, 2022 at 5:46 AM Mauro Carvalho Chehab wrote: > > There are 4 undocumented fields at struct amdgpu_display_manager. > > Add documentation for them, fixing those warnings: > > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:544: warning: > Function

Re: [PATCH] drm/amdgpu/display: reduce stack size in dml32_ModeSupportAndSystemConfigurationFull()

2022-06-28 Thread Alex Deucher
Ping? Alex On Wed, Jun 22, 2022 at 10:48 AM Alex Deucher wrote: > > Move more stack variables into the dummy vars structure on the heap. > > Fixes stack frame size errors: > drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_32.c: In > function

RE: [PATCH 2/2] drm/amdgpu: fix documentation warning

2022-06-28 Thread Russell, Kent
[AMD Official Use Only - General] Not sure why no one responded, but this is something even I can RB. Reviewed-by: Kent Russell > -Original Message- > From: amd-gfx On Behalf Of Alex > Deucher > Sent: Monday, June 27, 2022 5:41 PM > To: Deucher, Alexander > Cc: Stephen Rothwell ;

Re: (subset) [PATCH 00/22] Fix kernel-doc warnings at linux-next

2022-06-28 Thread Mark Brown
On Tue, 28 Jun 2022 10:46:04 +0100, Mauro Carvalho Chehab wrote: > As we're currently discussing making kernel-doc issues fatal when > CONFIG_WERROR is enabled, let's fix all 60 kernel-doc warnings > inside linux-next: > > arch/x86/include/uapi/asm/sgx.h:19: warning: Enum value >

Re: [PATCH v6 00/22] Add generic memory shrinker to VirtIO-GPU and Panfrost DRM drivers

2022-06-28 Thread Robin Murphy
On 2022-05-27 00:50, Dmitry Osipenko wrote: Hello, This patchset introduces memory shrinker for the VirtIO-GPU DRM driver and adds memory purging and eviction support to VirtIO-GPU driver. The new dma-buf locking convention is introduced here as well. During OOM, the shrinker will release BOs

Re: [PATCH v6 02/14] mm: handling Non-LRU pages returned by vm_normal_pages

2022-06-28 Thread David Hildenbrand
On 28.06.22 02:14, Alex Sierra wrote: > With DEVICE_COHERENT, we'll soon have vm_normal_pages() return > device-managed anonymous pages that are not LRU pages. Although they > behave like normal pages for purposes of mapping in CPU page, and for > COW. They do not support LRU lists, NUMA migration

Re: [PATCH v6 00/22] Add generic memory shrinker to VirtIO-GPU and Panfrost DRM drivers

2022-06-28 Thread Dmitry Osipenko
dler+0xb8/0xc0 > [  100.512051]  el0t_64_sync+0x18c/0x190 > [  100.512064] This one shall be fixed by [1] that is not in the RC kernel yet, please use linux-next. [1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20220628&id=7d64c40a7d96190d9d06e240305389e025295916 -- Best regards, Dmitry

Re: [PATCH v6 06/14] mm: add device coherent checker to is_pinnable_page

2022-06-28 Thread David Hildenbrand
On 28.06.22 02:14, Alex Sierra wrote: > is_device_coherent checker was added to is_pinnable_page and renamed > to is_longterm_pinnable_page. The reason is that device coherent > pages are not supported for longterm pinning. > > Signed-off-by: Alex Sierra > --- > include/linux/memremap.h | 25

Re: [PATCH v6 00/22] Add generic memory shrinker to VirtIO-GPU and Panfrost DRM drivers

2022-06-28 Thread Dmitry Osipenko
On 6/28/22 15:31, Robin Murphy wrote: > ->8- > [   68.295951] == > [   68.295956] WARNING: possible circular locking dependency detected > [   68.295963] 5.19.0-rc3+ #400 Not tainted > [   68.295972]

Re: Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash

2022-06-28 Thread Jocelyn Falempe
On 28/06/2022 10:43, Thomas Zimmermann wrote: Hi Am 27.06.22 um 19:25 schrieb Linus Torvalds: On Mon, Jun 27, 2022 at 1:02 AM Javier Martinez Canillas wrote: The flag was dropped because it was causing drivers that requested their memory resource with pci_request_region() to fail with

Re: [PATCH] drm/amd/display: expose additional modifier for DCN32/321

2022-06-28 Thread Marek Olšák
This needs to be a loop inserting all 64K_R_X and all 256K_R_X modifiers. If num_pipes > 16, insert 256K_R_X first, else insert 64K_R_X first. Insert the other one after that. For example: for (unsigned i = 0; i < 2; i++) { unsigned swizzle_r_x; /* Insert the best one
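
A minimal sketch of the ordering rule described in this reply, assuming only what the text states: 256K_R_X goes first when num_pipes > 16, otherwise 64K_R_X goes first, and the other one follows. The enum and the pick_swizzle_order() helper are hypothetical names chosen for illustration; they are not the amdgpu definitions, and this is not the code from the truncated message above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two swizzle modes named in the reply;
 * illustration only, not the real amdgpu/DRM modifier definitions. */
enum swizzle_r_x { SWIZZLE_64K_R_X, SWIZZLE_256K_R_X };

/* Fill order[0] with the preferred swizzle and order[1] with the fallback.
 * Assumption taken from the reply: 256K_R_X is "best" when num_pipes > 16,
 * otherwise 64K_R_X is. */
void pick_swizzle_order(unsigned num_pipes, enum swizzle_r_x order[2])
{
	assert(order != NULL);

	for (unsigned i = 0; i < 2; i++) {
		/* i == 0 selects the best option, i == 1 the other one. */
		int want_256k = (num_pipes > 16) ? (i == 0) : (i == 1);

		order[i] = want_256k ? SWIZZLE_256K_R_X : SWIZZLE_64K_R_X;
	}
}
```

With num_pipes = 32 this yields 256K_R_X followed by 64K_R_X; with num_pipes = 8 the order is reversed. A real driver would insert every modifier variant for each swizzle inside the loop body rather than store a single enum value.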

Re: [PATCH 09/14] drm/radeon: use drm_oom_badness

2022-06-28 Thread Michel Dänzer
On 2022-06-24 10:04, Christian König wrote: > This allows the OOM killer to make a better decision which process to reap. > > Signed-off-by: Christian König > --- > drivers/gpu/drm/radeon/radeon_drv.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/gpu/drm/radeon/radeon_drv.c

[PATCH 09/22] drm: amdgpu: amdgpu_dm: fix kernel-doc markups

2022-06-28 Thread Mauro Carvalho Chehab
There are 4 undocumented fields at struct amdgpu_display_manager. Add documentation for them, fixing those warnings: drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:544: warning: Function parameter or member 'dmub_outbox_params' not described in 'amdgpu_display_manager'

[PATCH 11/22] drm: amd: amd_shared.h: Add missing doc for PP_GFX_DCS_MASK

2022-06-28 Thread Mauro Carvalho Chehab
This symbol is missing documentation: drivers/gpu/drm/amd/include/amd_shared.h:224: warning: Enum value 'PP_GFX_DCS_MASK' not described in enum 'PP_FEATURE_MASK' Document it. Fixes: 680602d6c2d6 ("drm/amd/pm: enable DCS") Signed-off-by: Mauro Carvalho Chehab --- To avoid mailbombing

[PATCH 10/22] drm: amdgpu: amdgpu_device.c: fix a kernel-doc markup

2022-06-28 Thread Mauro Carvalho Chehab
The function was renamed without renaming also kernel-doc markup: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:5095: warning: expecting prototype for amdgpu_device_gpu_recover_imp(). Prototype was for amdgpu_device_gpu_recover() instead Signed-off-by: Mauro Carvalho Chehab --- To avoid

[PATCH 00/22] Fix kernel-doc warnings at linux-next

2022-06-28 Thread Mauro Carvalho Chehab
As we're currently discussing making kernel-doc issues fatal when CONFIG_WERROR is enabled, let's fix all 60 kernel-doc warnings inside linux-next: arch/x86/include/uapi/asm/sgx.h:19: warning: Enum value 'SGX_PAGE_MEASURE' not described in enum 'sgx_page_flags'

[Bug][5.19-rc0] Between commits fdaf9a5840ac and babf0bb978e3 GPU stopped entering in graphic mode.

2022-06-28 Thread Mikhail Gavrilov
Hi guys. Between commits fdaf9a5840ac and babf0bb978e3 the GPU stopped entering graphics mode; instead I see a black screen with a constantly glowing cursor. Demonstration: https://youtu.be/rGL4LsHMae4 In the kernel logs there are references to hung processes: [ 149.363465] rfkill: input handler

Re: Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash

2022-06-28 Thread Thomas Zimmermann
Hi Am 27.06.22 um 19:25 schrieb Linus Torvalds: On Mon, Jun 27, 2022 at 1:02 AM Javier Martinez Canillas wrote: The flag was dropped because it was causing drivers that requested their memory resource with pci_request_region() to fail with -EBUSY (e.g: the vmwgfx driver):