Re: [PATCH v1 2/2] mm: remove extra ZONE_DEVICE struct page refcount

2021-10-14 Thread Sierra Guiza, Alejandro (Alex)
On 10/14/2021 3:57 PM, Ralph Campbell wrote: On 10/14/21 11:01 AM, Jason Gunthorpe wrote: On Thu, Oct 14, 2021 at 10:35:27AM -0700, Ralph Campbell wrote: I ran xfstests-dev using the kernel boot option to "fake" a pmem device when I first posted this patch. The tests ran OK (or at least

RE: [PATCH 1/5] drm/amdkfd: protect hawaii_device_info with CONFIG_DRM_AMDGPU_CIK

2021-10-14 Thread Quan, Evan
[AMD Official Use Only] Patches 1 & 2 are Reviewed-by: Evan Quan. Patches 3 - 5 are Acked-by: Evan Quan. > -----Original Message----- > From: amd-gfx On Behalf Of Alex > Deucher > Sent: Thursday, October 14, 2021 2:21 AM > To: Deucher, Alexander > Cc: amd-gfx list > Subject: Re: [PATCH 1/5]

RE: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-14 Thread Quan, Evan
[AMD Official Use Only] +Kent who maintains the Rocm tool From: amd-gfx On Behalf Of Lazar, Lijo Sent: Thursday, October 14, 2021 1:07 AM To: Tuikov, Luben ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency [AMD Official Use

RE: [PATCH] drm/amdgpu/gfx10: fix typo in gfx_v10_0_update_gfx_clock_gating()

2021-10-14 Thread Quan, Evan
[AMD Official Use Only] Reviewed-by: Evan Quan > -----Original Message----- > From: amd-gfx On Behalf Of Alex > Deucher > Sent: Wednesday, October 13, 2021 12:40 AM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander > Subject: [PATCH] drm/amdgpu/gfx10: fix typo in >

Re: [PATCH v1 2/2] mm: remove extra ZONE_DEVICE struct page refcount

2021-10-14 Thread Dan Williams
On Thu, Oct 14, 2021 at 4:06 PM Jason Gunthorpe wrote: > > On Thu, Oct 14, 2021 at 12:01:14PM -0700, Dan Williams wrote: > > > > Does anyone know why devmap is pte_special anyhow? > > > > It does not need to be special as mentioned here: > > > >

[PATCH 5/5] dpm/amd/pm: Navi10: 0 MHz is not a current clock frequency (v2)

2021-10-14 Thread Luben Tuikov
A current value of a clock frequency of 0, means that the IP block is in some kind of low power state. Ignore it and don't report it here. Here we only report the possible operating (non-zero) frequencies of the block requested. So, if the current clock value is 0, then print the DPM frequencies,

[PATCH 2/5] drm/amd/pm: Rename cur_value to curr_value

2021-10-14 Thread Luben Tuikov
Rename "cur_value", which stands for "cursor value" to "curr_value", which stands for "current value". Cc: Alex Deucher Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 12 ++-- .../drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 15 --- 2

[PATCH 4/5] dpm/amd/pm: Sienna: 0 MHz is not a current clock frequency (v2)

2021-10-14 Thread Luben Tuikov
A current value of a clock frequency of 0, means that the IP block is in some kind of low power state. Ignore it and don't report it here. Here we only report the possible operating (non-zero) frequencies of the block requested. So, if the current clock value is 0, then print the DPM frequencies,

[PATCH 3/5] drm/amd/pm: Rename freq_values --> freq_value

2021-10-14 Thread Luben Tuikov
By usage: read freq_values[x] to freq_value[x]. Cc: Alex Deucher Signed-off-by: Luben Tuikov --- .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c| 16 .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c| 18 +- 2 files changed, 17 insertions(+), 17 deletions(-)

[PATCH 1/5] drm/amd/pm: Slight function rename (v2)

2021-10-14 Thread Luben Tuikov
Rename sienna_cichlid_is_support_fine_grained_dpm() to sienna_cichlid_supports_fine_grained_dpm(). Rename navi10_is_support_fine_grained_dpm() to navi10_supports_fine_grained_dpm(). v2: Fix function name in commit message to reflect the change being done: add a missing 's'. Cc: Alex Deucher

[PATCH 0/5] 0 MHz is not a valid current frequency (v3)

2021-10-14 Thread Luben Tuikov
Some ASICs support low-power functionality for the whole ASIC or just an IP block. When in such low-power mode, some sysfs interfaces would report a frequency of 0, e.g.,
    $cat /sys/class/drm/card0/device/pp_dpm_sclk
    0: 500Mhz
    1: 0Mhz *
    2: 2200Mhz
    $_
An operating frequency of 0 MHz doesn't make
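The behaviour this cover letter asks for can be illustrated with a small userspace sketch. It is not the driver's code: `format_dpm_levels()` and its parameters are hypothetical names, and the output format only imitates the `pp_dpm_sclk` listing quoted above. The assumption is that a current value of 0 MHz is never printed as a level and never starred, so only the real DPM levels appear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the amdgpu driver): write a
 * pp_dpm_sclk-style listing of the DPM levels into buf.
 * A current frequency of 0 MHz means the block is in a low-power
 * state, so it is not listed and no level is marked with '*'.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int format_dpm_levels(char *buf, size_t size,
                      const uint32_t *levels_mhz, size_t count,
                      uint32_t curr_mhz)
{
    size_t used = 0;

    assert(buf != NULL);
    if (size == 0)
        return -1;
    buf[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        /* Star a level only when it matches a non-zero current value. */
        int is_current = curr_mhz != 0 && levels_mhz[i] == curr_mhz;
        int n = snprintf(buf + used, size - used, "%zu: %uMhz%s\n",
                         i, (unsigned int)levels_mhz[i],
                         is_current ? " *" : "");

        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

With levels of 500 and 2200 MHz and a current value of 0, the sketch lists both levels and stars neither, instead of inserting a "0Mhz *" row between them.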

Re: [PATCH 2/2] amd/amdgpu_dm: Verify Gamma and Degamma LUT sizes using DRM Core check

2021-10-14 Thread Sean Paul
On Wed, Oct 13, 2021 at 2:12 PM Mark Yacoub wrote: > > From: Mark Yacoub > > [Why] > drm_atomic_helper_check_crtc now verifies both legacy and non-legacy LUT > sizes. There is no need to check it within amdgpu_dm_atomic_check. > > [How] > Remove the local call to verify LUT sizes and use DRM

Re: [PATCH v1 2/2] mm: remove extra ZONE_DEVICE struct page refcount

2021-10-14 Thread Jason Gunthorpe
On Thu, Oct 14, 2021 at 12:01:14PM -0700, Dan Williams wrote: > > > Does anyone know why devmap is pte_special anyhow? > > It does not need to be special as mentioned here: > > https://lore.kernel.org/all/CAPcyv4iFeVDVPn6uc=aksyuvkiu3-fk-n16ijvzq3n8ot00...@mail.gmail.com/ I added a remark there

Re: [PATCH v1 2/2] mm: remove extra ZONE_DEVICE struct page refcount

2021-10-14 Thread Ralph Campbell
On 10/14/21 11:01 AM, Jason Gunthorpe wrote: On Thu, Oct 14, 2021 at 10:35:27AM -0700, Ralph Campbell wrote: I ran xfstests-dev using the kernel boot option to "fake" a pmem device when I first posted this patch. The tests ran OK (or at least the same tests passed with and without my patch).

Re: [PATCH v1 2/2] mm: remove extra ZONE_DEVICE struct page refcount

2021-10-14 Thread Dan Williams
On Thu, Oct 14, 2021 at 11:45 AM Matthew Wilcox wrote: > > > It would probably help if you cc'd Dan on this. Thanks. [..] > > On Thu, Oct 14, 2021 at 02:06:34PM -0300, Jason Gunthorpe wrote: > > On Thu, Oct 14, 2021 at 10:39:28AM -0500, Alex Sierra wrote: > > > From: Ralph Campbell > > > > > >

Re: [PATCH] drm/amdkfd: map gpu hive id to xgmi connected cpu

2021-10-14 Thread Felix Kuehling
On 2021-10-14 at 2:12 p.m., Jonathan Kim wrote: > ROCr needs to be able to identify all devices that have direct access to > fine grain memory, which should include CPUs that are connected to GPUs > over xGMI. The GPU hive ID can be mapped onto the CPU hive ID since the > CPU is part of the hive.

Re: [PATCH v1 2/2] mm: remove extra ZONE_DEVICE struct page refcount

2021-10-14 Thread Matthew Wilcox
It would probably help if you cc'd Dan on this. As far as I know he's the only person left who cares about GUP on DAX. On Thu, Oct 14, 2021 at 02:06:34PM -0300, Jason Gunthorpe wrote: > On Thu, Oct 14, 2021 at 10:39:28AM -0500, Alex Sierra wrote: > > From: Ralph Campbell > > > > ZONE_DEVICE

[PATCH] drm/amdkfd: map gpu hive id to xgmi connected cpu

2021-10-14 Thread Jonathan Kim
ROCr needs to be able to identify all devices that have direct access to fine grain memory, which should include CPUs that are connected to GPUs over xGMI. The GPU hive ID can be mapped onto the CPU hive ID since the CPU is part of the hive. v2: fixup to ensure all numa nodes get the hive id

Re: [PATCH] drm/amdkfd: map gpu hive id to xgmi connected cpu

2021-10-14 Thread Felix Kuehling
On 2021-10-14 at 1:44 p.m., Jonathan Kim wrote: > ROCr needs to be able to identify all devices that have direct access to > fine grain memory, which should include CPUs that are connected to GPUs > over xGMI. The GPU hive ID can be mapped onto the CPU hive ID since the > CPU is part of the hive.

[PATCH] drm/amdkfd: map gpu hive id to xgmi connected cpu

2021-10-14 Thread Jonathan Kim
ROCr needs to be able to identify all devices that have direct access to fine grain memory, which should include CPUs that are connected to GPUs over xGMI. The GPU hive ID can be mapped onto the CPU hive ID since the CPU is part of the hive. Signed-off-by: Jonathan Kim ---

Re: [PATCH v1 2/2] mm: remove extra ZONE_DEVICE struct page refcount

2021-10-14 Thread Ralph Campbell
On 10/14/21 10:06 AM, Jason Gunthorpe wrote: On Thu, Oct 14, 2021 at 10:39:28AM -0500, Alex Sierra wrote: From: Ralph Campbell ZONE_DEVICE struct pages have an extra reference count that complicates the code for put_page() and several places in the kernel that need to check the reference

[PATCH] drm/amdgpu: Warn when bad pages approaches threshold

2021-10-14 Thread Kent Russell
Currently dmesg doesn't warn when the number of bad pages approaches the threshold for page retirement. WARN when the number of bad pages is at 90% or greater for easier checks and planning, instead of waiting until the GPU is full of bad pages. Signed-off-by: Kent Russell ---
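The 90% check described in this patch can be sketched in integer arithmetic. This is an illustration only, not the patch itself: `bad_pages_near_threshold()` is a hypothetical name, and the assumption is that both the bad-page count and the retirement threshold are available as unsigned 32-bit values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the amdgpu driver): returns true once
 * the bad-page count has reached 90% of the retirement threshold.
 */
bool bad_pages_near_threshold(uint32_t bad_pages, uint32_t threshold)
{
    assert(threshold > 0);
    /*
     * Integer form of bad_pages / threshold >= 0.9. The operands are
     * widened to 64 bits so the multiplications cannot overflow.
     */
    return (uint64_t)bad_pages * 10 >= (uint64_t)threshold * 9;
}
```

Comparing `bad_pages * 10` against `threshold * 9` avoids floating point, which kernel code generally cannot use.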

Re: [PATCH] drm/amdkfd: Separate pinned BOs destruction from general routine

2021-10-14 Thread Christian König
On 14.10.21 at 12:14, Yu, Lang wrote: [AMD Official Use Only] -----Original Message----- From: Kuehling, Felix Sent: Wednesday, October 13, 2021 11:25 PM To: Yu, Lang; amd-gfx@lists.freedesktop.org Cc: Koenig, Christian; Deucher, Alexander; Huang, Ray Subject: Re: [PATCH] drm/amdkfd:

[PATCH v1 0/2] mm: remove extra ZONE_DEVICE struct page refcount

2021-10-14 Thread Alex Sierra
This patch cleans up ZONE_DEVICE page refcounting. Quoting Matthew Wilcox, "it removes a ton of cruft from every call to put_page()". This work was originally done by Ralph Campbell and submitted as an RFC. As of today, it has been acked by Theodore Ts'o / Darrick J. Wong, and reviewed by Christoph

[PATCH v1 2/2] mm: remove extra ZONE_DEVICE struct page refcount

2021-10-14 Thread Alex Sierra
From: Ralph Campbell ZONE_DEVICE struct pages have an extra reference count that complicates the code for put_page() and several places in the kernel that need to check the reference count to see that a page is not being used (gup, compaction, migration, etc.). Clean up the code so the reference

[PATCH v1 1/2] ext4/xfs: add page refcount helper

2021-10-14 Thread Alex Sierra
From: Ralph Campbell There are several places where ZONE_DEVICE struct pages assume a reference count == 1 means the page is idle and free. Instead of open coding this, add a helper function to hide this detail. Signed-off-by: Ralph Campbell Signed-off-by: Alex Sierra Reviewed-by: Christoph
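The idea of replacing open-coded `== 1` comparisons with a helper can be shown with a toy model. Everything here is an assumption for illustration: `struct toy_page` stands in for the kernel's `struct page`, and `toy_page_is_idle()` is a hypothetical name, not the helper the patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; only the reference count is modelled. */
struct toy_page {
    int refcount;
};

/*
 * Hypothetical helper: hides the convention that a reference count
 * of 1 means the page is idle and free, so callers never compare
 * against the magic number themselves.
 */
static inline bool toy_page_is_idle(const struct toy_page *page)
{
    assert(page != NULL);
    return page->refcount == 1;
}
```

Callers then write `if (toy_page_is_idle(page))` rather than comparing the count directly, so if the idle value ever changes, only the helper needs updating.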

[PATCH] amd/display: remove ChromeOS workaround

2021-10-14 Thread Simon Ser
This reverts commits ddab8bd788f5 ("drm/amd/display: Fix two cursor duplication when using overlay") and e7d9560aeae5 ("Revert "drm/amd/display: Fix overlay validation by considering cursors""). tl;dr ChromeOS uses the atomic interface for everything except the cursor. This is incorrect and

Re: [PATCH v5] amd/display: only require overlay plane to cover whole CRTC on ChromeOS

2021-10-14 Thread Simon Ser
On Tuesday, October 12th, 2021 at 23:03, Alex Deucher wrote: > > > @Harry Wentland, @Simon Ser Do you have a preference on whether we > > > apply this patch or revert ddab8bd788f5? I'm fine with either. I'd prefer to revert because (1) the ChromeOS team seems to be okay with that (2) they can

[pull] amdgpu, amdkfd drm-next-5.16

2021-10-14 Thread Alex Deucher
Hi Dave, Daniel, Bug fixes for 5.16. The following changes since commit 1176d15f0f6e556d54ced510ac4a91694960332b: Merge tag 'drm-intel-gt-next-2021-10-08' of git://anongit.freedesktop.org/drm/drm-intel into drm-next (2021-10-11 18:09:39 +1000) are available in the Git repository at:

Re: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")

2021-10-14 Thread Borislav Petkov
On Thu, Oct 14, 2021 at 02:02:48AM +, Quan, Evan wrote: > [Quan, Evan] Yes, but not(apply them) at the same time. One by one as you did > before. > - try the patch1 first Ok, first patch worked fine. > - undo the changes of patch1 and try patch2 Did that, worked fine too except after the

RE: [PATCH] drm/amdkfd: Separate pinned BOs destruction from general routine

2021-10-14 Thread Yu, Lang
[AMD Official Use Only] >-----Original Message----- >From: Kuehling, Felix >Sent: Wednesday, October 13, 2021 11:25 PM >To: Yu, Lang ; amd-gfx@lists.freedesktop.org >Cc: Koenig, Christian ; Deucher, Alexander >; Huang, Ray >Subject: Re: [PATCH] drm/amdkfd: Separate pinned BOs destruction from