Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Tom Lendacky

On 10/6/21 1:36 PM, Alex Deucher wrote:
> On Wed, Oct 6, 2021 at 2:21 PM Borislav Petkov  wrote:
>> On Wed, Oct 06, 2021 at 02:10:30PM -0400, Alex Deucher wrote:
>
> From the x86 model and family info?  I think Raven has different
> families from other Zen based CPUs.
>
>> And even if we do, that's still not accurate enough - we wanna know
>> whether the IOMMU works.
>
> Right.
>
>> So I guess we're all left to the user to decide. But I'm always open
>> to suggestions for solving things in sw and not requiring any user
>> interaction.
>
> @Tom Lendacky Any ideas?

I think user decision is probably going to be best. We have to enable
SME and encrypt the kernel so early in boot that, short of
family/model/stepping checks, there's not a lot that we can do.

Thanks,
Tom

> Alex
>
>>> Other than these comments, looks fine to me.
>>
>> Thx.
>>
>> --
>> Regards/Gruss,
>>  Boris.
>>
>> https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH 2/4] drm/amdkfd: handle partial migration cpages 0

2021-10-06 Thread Felix Kuehling
Am 2021-10-06 um 10:32 a.m. schrieb Philip Yang:
> migrate_vma_setup may return cpages == 0, meaning no pages can be
> migrated; treat this as an error case, skip the rest of the migration
> steps, and don't change the prange actual loc, to avoid the warning
> message "VRAM BO missing during validation".
>
> Signed-off-by: Philip Yang 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 48 ++--
>  1 file changed, 29 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> index 069422337cf7..9b68e3e8f2a1 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> @@ -409,20 +409,25 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, 
> struct svm_range *prange,
>   r, prange->svms, prange->start, prange->last);
>   goto out_free;
>   }
> - if (migrate.cpages != npages) {
> - pr_debug("Partial migration. 0x%lx/0x%llx pages can be 
> migrated\n",
> -  migrate.cpages,
> -  npages);
> - }
>  
> - if (migrate.cpages) {
> - r = svm_migrate_copy_to_vram(adev, prange, &migrate, &mfence,
> -  scratch);
> - migrate_vma_pages(&migrate);
> - svm_migrate_copy_done(adev, mfence);
> - migrate_vma_finalize(&migrate);
> + if (migrate.cpages != npages)
> + pr_debug("partial migration, 0x%lx/0x%llx pages migrated\n",
> +  migrate.cpages, npages);
> + else
> + pr_debug("0x%lx pages migrated\n", migrate.cpages);
> +
> + if (!migrate.cpages) {
> + pr_debug("failed collect migrate sys pages [0x%lx 0x%lx]\n",
> +  prange->start, prange->last);
> + r = -ENOMEM;

I think just returning an error here is incorrect. This error gets
handled in svm_migrate_ram_to_vram and prevents the following VMAs from
migrating as well (if the range spans multiple VMAs).

Maybe return the number of pages migrated, if successful. Then the
caller can add up all the successful migrations and update
prange->actual_loc only if the total is > 0.
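
Roughly, the idea in standalone C (illustrative names and types, not
the actual kfd_svm API):

	struct prange_sketch {
		unsigned long start, last;	/* page numbers, inclusive */
		int actual_loc;			/* 0 = system memory */
	};

	/* Per-VMA step: returns pages migrated in [start, end), or -errno. */
	typedef long (*migrate_vma_fn)(unsigned long start, unsigned long end);

	static int migrate_range_sketch(struct prange_sketch *p, int new_loc,
					migrate_vma_fn migrate_one_vma)
	{
		unsigned long addr = p->start;
		long total = 0;

		while (addr <= p->last) {
			/* the real code would advance one VMA at a time; one
			 * step covers the whole range here for brevity */
			unsigned long end = p->last + 1;
			long migrated = migrate_one_vma(addr, end);

			if (migrated < 0)
				return (int)migrated;	/* hard error: abort */
			total += migrated;	/* 0 is a benign partial result */
			addr = end;
		}
		if (total > 0)
			p->actual_loc = new_loc; /* only update on real progress */
		return 0;
	}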

Regards,
  Felix


> + goto out_free;
>   }
>  
> + r = svm_migrate_copy_to_vram(adev, prange, &migrate, &mfence, scratch);
> + migrate_vma_pages(&migrate);
> + svm_migrate_copy_done(adev, mfence);
> + migrate_vma_finalize(&migrate);
> +
>   svm_range_dma_unmap(adev->dev, scratch, 0, npages);
>   svm_range_free_dma_mappings(prange);
>  
> @@ -636,19 +641,24 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, 
> struct svm_range *prange,
>   goto out_free;
>   }
>  
> - pr_debug("cpages %ld\n", migrate.cpages);
> + if (migrate.cpages != npages)
> + pr_debug("partial migration, 0x%lx/0x%llx pages migrated\n",
> +  migrate.cpages, npages);
> + else
> + pr_debug("0x%lx pages migrated\n", migrate.cpages);
>  
> - if (migrate.cpages) {
> - r = svm_migrate_copy_to_ram(adev, prange, &migrate, &mfence,
> - scratch, npages);
> - migrate_vma_pages(&migrate);
> - svm_migrate_copy_done(adev, mfence);
> - migrate_vma_finalize(&migrate);
> - } else {
> + if (!migrate.cpages) {
>   pr_debug("failed collect migrate device pages [0x%lx 0x%lx]\n",
>prange->start, prange->last);
> + r = -ENOMEM;
> + goto out_free;
>   }
>  
> + r = svm_migrate_copy_to_ram(adev, prange, &migrate, &mfence,
> + scratch, npages);
> + migrate_vma_pages(&migrate);
> + svm_migrate_copy_done(adev, mfence);
> + migrate_vma_finalize(&migrate);
>   svm_range_dma_unmap(adev->dev, scratch, 0, npages);
>  
>  out_free:


Re: [PATCH 1/4] drm/amdkfd: ratelimited svm debug messages

2021-10-06 Thread Felix Kuehling
Am 2021-10-06 um 10:32 a.m. schrieb Philip Yang:
> No functional change: use pr_debug_ratelimited to avoid per-page debug
> messages overflowing the dmesg buffer and console log.
>
> Use dev_err to show error messages from unexpected situations, to
> provide a clue to help debug without enabling dynamic debug logging.

AFAIK, dev_err does not print function and line-number information. So
you probably need to provide a little more context in these messages. I
think this could be done with a

    #define pr_fmt(fmt) "kfd_migrate: " fmt

in kfd_migrate.c. I'll make a few more specific comments inline.
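
To illustrate the effect (prefix and message text illustrative; the
dev_* macros use the separate dev_fmt override if the prefix is wanted
there too):

	/* must come before any include that pulls in <linux/printk.h> */
	#define pr_fmt(fmt) "kfd_migrate: " fmt
	#define dev_fmt(fmt) "kfd_migrate: " fmt

	#include <linux/device.h>
	#include <linux/printk.h>

	static void log_example(struct device *dev, int r)
	{
		/* emits: "kfd_migrate: fail -12 create gart mapping" */
		pr_debug("fail %d create gart mapping\n", r);
		/* emits: "<device name>: kfd_migrate: fail -12 create gart mapping" */
		dev_err(dev, "fail %d create gart mapping\n", r);
	}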


>
> Signed-off-by: Philip Yang 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 34 
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 12 -
>  2 files changed, 23 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> index f53e17a94ad8..069422337cf7 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> @@ -151,14 +151,14 @@ svm_migrate_copy_memory_gart(struct amdgpu_device 
> *adev, dma_addr_t *sys,
>   gart_d = svm_migrate_direct_mapping_addr(adev, *vram);
>   }
>   if (r) {
> - pr_debug("failed %d to create gart mapping\n", r);
> + dev_err(adev->dev, "fail %d create gart mapping\n", r);
>   goto out_unlock;
>   }
>  
>   r = amdgpu_copy_buffer(ring, gart_s, gart_d, size * PAGE_SIZE,
>  NULL, &next, false, true, false);
>   if (r) {
> - pr_debug("failed %d to copy memory\n", r);
> + dev_err(adev->dev, "fail %d to copy memory\n", r);
>   goto out_unlock;
>   }
>  
> @@ -285,7 +285,7 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, 
> struct svm_range *prange,
>  
>   r = svm_range_vram_node_new(adev, prange, true);
>   if (r) {
> - pr_debug("failed %d get 0x%llx pages from vram\n", r, npages);
> + dev_err(adev->dev, "fail %d get %lld vram pages\n", r, npages);

This message is misleading. svm_range_vram_node_new doesn't care about
npages. It allocates memory for the whole range or reuses an existing
allocation. So I'd drop the npages from the message.
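
For example (wording illustrative only):

	dev_err(adev->dev, "failed %d to alloc vram for range\n", r);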


>   goto out;
>   }
>  
> @@ -305,7 +305,7 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, 
> struct svm_range *prange,
> DMA_TO_DEVICE);
>   r = dma_mapping_error(dev, src[i]);
>   if (r) {
> - pr_debug("failed %d dma_map_page\n", r);
> + dev_err(adev->dev, "fail %d dma_map_page\n", r);
>   goto out_free_vram_pages;
>   }
>   } else {
> @@ -325,8 +325,8 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, 
> struct svm_range *prange,
>   continue;
>   }
>  
> - pr_debug("dma mapping src to 0x%llx, page_to_pfn 0x%lx\n",
> -  src[i] >> PAGE_SHIFT, page_to_pfn(spage));
> + pr_debug_ratelimited("dma mapping src to 0x%llx, pfn 0x%lx\n",
> +  src[i] >> PAGE_SHIFT, page_to_pfn(spage));
>  
>   if (j >= (cursor.size >> PAGE_SHIFT) - 1 && i < npages - 1) {
>   r = svm_migrate_copy_memory_gart(adev, src + i - j,
> @@ -347,7 +347,7 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, 
> struct svm_range *prange,
>  
>  out_free_vram_pages:
>   if (r) {
> - pr_debug("failed %d to copy memory to vram\n", r);
> + dev_err(adev->dev, "fail %d to copy memory to vram\n", r);

I think you only get here if svm_migrate_copy_memory_gart failed. That
function already prints its own error messages, so this probably doesn't
need to be a dev_err.


>   while (i--) {
>   svm_migrate_put_vram_page(adev, dst[i]);
>   migrate->dst[i] = 0;
> @@ -405,8 +405,8 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, 
> struct svm_range *prange,
>  
>   r = migrate_vma_setup(&migrate);
>   if (r) {
> - pr_debug("failed %d prepare migrate svms 0x%p [0x%lx 0x%lx]\n",
> -  r, prange->svms, prange->start, prange->last);
> + dev_err(adev->dev, "fail %d vma setup 0x%p [0x%lx 0x%lx]\n",
> + r, prange->svms, prange->start, prange->last);
>   goto out_free;
>   }
>   if (migrate.cpages != npages) {
> @@ -506,7 +506,7 @@ static void svm_migrate_page_free(struct page *page)
>   struct svm_range_bo *svm_bo = page->zone_device_data;
>  
>   if (svm_bo) {
> - pr_debug("svm_bo ref left: %d\n", kref_read(_bo->kref));
> + 

[pull] amdgpu, amdkfd drm-fixes-5.15

2021-10-06 Thread Alex Deucher
Hi Dave, Daniel,

Fixes for 5.15.

The following changes since commit 9e1ff307c779ce1f0f810c7ecce3d95bbae40896:

  Linux 5.15-rc4 (2021-10-03 14:08:47 -0700)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git 
tags/amd-drm-fixes-5.15-2021-10-06

for you to fetch changes up to 5a1fef027846e7635b9d320b2cc0b416fd11a3be:

  drm/amd/display: Fix detection of 4 lane for DPALT (2021-10-06 16:14:17 -0400)


amd-drm-fixes-5.15-2021-10-06:

amdgpu:
- DCN 3.1 DP alt mode fixes
- S0ix gfxoff fix
- Fix DRM_AMD_DC_SI dependencies
- PCIe DPC handling fix
- DCN 3.1 scaling fix
- Documentation fix

amdkfd:
- Fix potential memory leak
- IOMMUv2 init fixes


Alex Deucher (2):
  Documentation/gpu: remove spurious "+" in amdgpu.rst
  drm/amdgpu/display: fix dependencies for DRM_AMD_DC_SI

George Shen (1):
  drm/amd/display: Skip override for preferred link settings during link 
training

Guchun Chen (1):
  drm/amdgpu: handle the case of pci_channel_io_frozen only in 
amdgpu_pci_resume

Hansen (1):
  drm/amd/display: Fix detection of 4 lane for DPALT

Jude Shih (1):
  drm/amd/display: USB4 bring up set correct address

Lang Yu (1):
  drm/amdkfd: fix a potential ttm->sg memory leak

Lijo Lazar (1):
  drm/amdgpu: During s0ix don't wait to signal GFXOFF

Liu, Zhan (2):
  drm/amd/display: Fix B0 USB-C DP Alt mode
  drm/amd/display: Fix DCN3 B0 DP Alt Mapping

Nikola Cornij (1):
  drm/amd/display: Limit display scaling to up to 4k for DCN 3.1

Yifan Zhang (2):
  drm/amdkfd: remove redundant iommu cleanup code
  drm/amdgpu: init iommu after amdkfd device init

 Documentation/gpu/amdgpu.rst   |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c| 14 -
 drivers/gpu/drm/amd/amdkfd/kfd_device.c|  8 +--
 drivers/gpu/drm/amd/display/Kconfig|  2 +
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c   |  6 --
 .../drm/amd/display/dc/dcn10/dcn10_link_encoder.h  |  1 +
 .../amd/display/dc/dcn31/dcn31_dio_link_encoder.c  | 66 +-
 .../amd/display/dc/dcn31/dcn31_dio_link_encoder.h  | 14 -
 .../gpu/drm/amd/display/dc/dcn31/dcn31_resource.c  |  8 ++-
 drivers/gpu/drm/amd/display/include/dal_asic_id.h  |  2 +-
 .../amd/include/asic_reg/dpcs/dpcs_4_2_0_offset.h  | 27 +
 14 files changed, 142 insertions(+), 26 deletions(-)


Re: [PATCH 3/4] drm/amdkfd: avoid svm conflicting with userptr address

2021-10-06 Thread philip yang

  


On 2021-10-06 2:23 p.m., Felix Kuehling wrote:
> Am 2021-10-06 um 2:09 p.m. schrieb philip yang:
>> On 2021-10-06 1:22 p.m., Felix Kuehling wrote:
>>> Am 2021-10-06 um 10:32 a.m. schrieb Philip Yang:
>>>> For ioctl_alloc_memory_of_gpu to alloc userptr bo, check userptr address
>>>> does not exist in svm->objects.
>>>>
>>>> For svm range allocation, look for address in the userptr range of
>>>> interval tree VA nodes.
>>>
>>> Why? The purpose of the check is to prevent the same GPU virtual address
>>> being managed by the two different memory managers. So checking
>>> args->va_addr should be correct even for userptr BOs. The CPU virtual
>>> address should be OK to be mapped with userptr and SVM at the same time
>>> as long as the userptr uses a distinct GPU virtual address.
>>
>> userptr cpu virtual address is registered to the MMU notifier; if an svm
>> range overlaps with a userptr, then migrating the svm range triggers the
>> mmu notifier to evict the userptr and evict user queues, which is not
>> correct for xnack on. And restoring the userptr will fail if the svm
>> range is migrated to vram, because hmm_range_fault fails to get system
>> pages; the app will soft hang as queues are not restored.
>
> OK. Then we need to check both the CPU and GPU virtual address ranges.
> Having userptr or SVM registrations fail like that is not ideal. It only
> changes the failure mode, but doesn't really fix applications affected
> by this.
>
> A real solution probably requires that we reimplement userptrs using
> the SVM API in the Thunk, when SVM is available. That way you avoid this
> conflict between the two memory managers.

I only found this issue in patch 4/4; it is not a real app issue right
now, but I realize it could happen and want to prevent this case. It is
a good idea to use the svm path for userptr if svm is supported. I will
drop this patch and resend v2 of patch 4/4.

Thanks,
Philip

> Regards,
>   Felix
>
>> Regards,
>>
>> Philip
>>
>>> Regards,
>>>   Felix
  
Change helper svm_range_check_vm to return amdgpu_bo, which will be used
to avoid creating new svm range overlap with bo later.

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 17 +---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 55 +++-
 2 files changed, 57 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index f1e7edeb4e6b..34dfa6a938bf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1255,6 +1255,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
 	long err;
 	uint64_t offset = args->mmap_offset;
 	uint32_t flags = args->flags;
+	unsigned long start, last;
 
 	if (args->size == 0)
 		return -EINVAL;
@@ -1266,11 +1267,17 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
 	svm_range_list_lock_and_flush_work(&p->svms, current->mm);
 	mutex_lock(&p->svms.lock);
 	mmap_write_unlock(current->mm);
-	if (interval_tree_iter_first(&p->svms.objects,
- args->va_addr >> PAGE_SHIFT,
- (args->va_addr + args->size - 1) >> PAGE_SHIFT)) {
-		pr_err("Address: 0x%llx already allocated by SVM\n",
-			args->va_addr);
+
+	if (flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
+		start = args->mmap_offset >> PAGE_SHIFT;
+		last = (args->mmap_offset + args->size - 1) >> PAGE_SHIFT;
+	} else {
+		start = args->va_addr >> PAGE_SHIFT;
+		last = (args->va_addr + args->size - 1) >> PAGE_SHIFT;
+	}
+
+	if (interval_tree_iter_first(&p->svms.objects, start, last)) {
+		pr_err("[0x%lx 0x%lx] already allocated by SVM\n", start, last);
 		mutex_unlock(&p->svms.lock);
 		return -EADDRINUSE;
 	}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 7f0743fcfcc3..d49c08618714 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2679,15 +2679,18 @@ int svm_range_list_init(struct kfd_process *p)
  *
  * Context: Process context
  *
- * Return 0 - OK, if the range is not mapped.
+ * Return NULL - if the range is not mapped.
+ * amdgpu_bo - if the range is mapped by old API
  * Otherwise error code:
- * -EADDRINUSE - if address is mapped already by kfd_ioctl_alloc_memory_of_gpu
  * -ERESTARTSYS - A wait for the buffer to become unreserved was interrupted by
  * a signal. Release all buffer reservations and return to user-space.
  */
-static int
+static struct amdgpu_bo *
 svm_range_check_vm(struct kfd_process *p, uint64_t start, uint64_t last)
 {
+	struct amdgpu_bo_va_mapping *mapping;
+	struct interval_tree_node *node;
+	struct amdgpu_bo *bo = NULL;
 	uint32_t i;
 	int r;
 
@@ -2700,16 +2703,42 @@ svm_range_check_vm(struct kfd_process *p, uint64_t start, uint64_t last)
 		vm = drm_priv_to_vm(p->pdds[i]->drm_priv);
 		r = amdgpu_bo_reserve(vm->root.bo, false);
 		if (r)
-			return r;
-		

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Wed, Oct 06, 2021 at 02:21:40PM -0400, Alex Deucher wrote:
> And just another general comment, swiotlb + bounce buffers isn't
> really useful on GPUs.  You may have 10-100s of MBs of memory mapped
> long term into the GPU's address space for random access.  E.g., you
> may have buffers in system memory that the display hardware is
> actively scanning out of.  For GPUs you should really only enable SME
> if IOMMU is enabled in remapping mode.  But that is probably beyond
> the discussion here.

Right, but insights into how these things work (or don't work) together
are always welcome. And yes, as 2cc13bb4f59f says:

"... The bounce buffer
code has an upper limit of 256kb for the size of DMA
allocations, which is too small for certain devices and
causes them to fail."

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Wed, Oct 06, 2021 at 02:36:56PM -0400, Alex Deucher wrote:
> From the x86 model and family info?  I think Raven has different
> families from other Zen based CPUs.

Yeah, I'd like to avoid a f/m/s mapping table, if possible. Those things
should be a last resort and they always need adjustment when new models
pop up.
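
For illustration, the kind of table being ruled out (model numbers are
placeholders, not a vetted list):

	#include <asm/processor.h>

	/* Illustrative only: a family/model check that needs a new entry
	 * for every future part - exactly why it should be a last resort.
	 */
	static bool is_client_apu_sketch(void)
	{
		if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
			return false;
		/* family 0x17 covers both Zen client APUs and server parts */
		if (boot_cpu_data.x86 != 0x17)
			return false;
		switch (boot_cpu_data.x86_model) {
		case 0x11:	/* placeholder model id */
		case 0x18:	/* placeholder model id */
			return true;
		default:
			return false;
		}
	}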

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Alex Deucher
On Wed, Oct 6, 2021 at 2:21 PM Borislav Petkov  wrote:
>
> On Wed, Oct 06, 2021 at 02:10:30PM -0400, Alex Deucher wrote:
> > This is not limited to Raven.
>
> That's what the innocuous "a.o." wanted to state. :)

Whoops, my eyes passed right over that.

>
> > All GPUs (and quite a few other
> > devices) have a limited DMA mask.  AMD GPUs have between 32 and 48
> > bits of DMA depending on what generation the hardware is.  So to
> > support SME, you either need swiotlb with bounce buffers or you need
> > IOMMU in remapping mode. The limitation with Raven is that if you want
> > to use it with the IOMMU enabled it requires the IOMMU to be set up in
> > passthrough mode to support IOMMUv2 functionality for compute support
> > and due to other hardware limitations on the display side. So for all
> > GPUs except raven, just having IOMMU enabled in remapping mode is
> > fine.  GPUs from other vendors would likely run into similar
> > limitations.  Raven just has further limitations.
>
> Hmm, and in passthrough mode it would use bounce buffers when SME is
> enabled. And when those 256K are not enough, it would fail there too,
> even with IOMMUv2. At least this is how it looks from here.
>
> Dunno, it feels like doing GPU compute and SME does not go hand-in-hand
> real smoothly currently but that probably doesn't matter all too much
> for both user camps. But that's just me with a hunch.

Well, this limitation only applies to Raven which is an integrated GPU
in client parts.  SME was initially productized on server parts so
there was not a lot of concern given to interactions with integrated
graphics at the time.  This has since been fixed in newer integrated
graphics.  dGPUs work fine as long as the IOMMU is in remapping mode
to handle the C bit.
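
To make the addressing constraint concrete, a sketch (the 44-bit mask
and the C-bit position are examples, not a statement about any specific
part):

	#include <linux/dma-mapping.h>
	#include <linux/pci.h>

	static int gpu_dma_mask_example(struct pci_dev *pdev)
	{
		/*
		 * A device that can only generate, say, 44-bit DMA
		 * addresses. With SME active, encrypted pages must be
		 * addressed with the C-bit set, and that bit lies above
		 * such a mask, so the device cannot reach encrypted memory
		 * directly: either SWIOTLB bounces the data through
		 * unencrypted buffers, or the IOMMU in remapping mode
		 * translates and applies the C-bit on the device's behalf.
		 */
		return dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(44));
	}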

>
> > Another option would be to enable SME by default on Epyc platforms,
> > but disabled by default on client APU platforms or even just raven.
>
> Thing is, we don't know at SME init time - very early during boot -
> whether we're Epyc or client. Can we find that out reliably from the hw?
>

From the x86 model and family info?  I think Raven has different
families from other Zen based CPUs.

> And even if we do, that's still not accurate enough - we wanna know
> whether the IOMMU works.

Right.

>
> So I guess we're all left to the user to decide. But I'm always open
> to suggestions for solving things in sw and not requiring any user
> interaction.

@Tom Lendacky Any ideas?

Alex

>
> > Other than these comments, looks fine to me.
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette


Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Wed, Oct 06, 2021 at 02:10:30PM -0400, Alex Deucher wrote:
> This is not limited to Raven.

That's what the innocuous "a.o." wanted to state. :)

> All GPUs (and quite a few other
> devices) have a limited DMA mask.  AMD GPUs have between 32 and 48
> bits of DMA depending on what generation the hardware is.  So to
> support SME, you either need swiotlb with bounce buffers or you need
> IOMMU in remapping mode. The limitation with Raven is that if you want
> to use it with the IOMMU enabled it requires the IOMMU to be set up in
> passthrough mode to support IOMMUv2 functionality for compute support
> and due to other hardware limitations on the display side. So for all
> GPUs except raven, just having IOMMU enabled in remapping mode is
> fine.  GPUs from other vendors would likely run into similar
> limitations.  Raven just has further limitations.

Hmm, and in passthrough mode it would use bounce buffers when SME is
enabled. And when those 256K are not enough, it would fail there too,
even with IOMMUv2. At least this is how it looks from here.

Dunno, it feels like doing GPU compute and SME does not go hand-in-hand
real smoothly currently but that probably doesn't matter all too much
for both user camps. But that's just me with a hunch.

> Another option would be to enable SME by default on Epyc platforms,
> but disabled by default on client APU platforms or even just raven.

Thing is, we don't know at SME init time - very early during boot -
whether we're Epyc or client. Can we find that out reliably from the hw?

And even if we do, that's still not accurate enough - we wanna know
whether the IOMMU works.

So I guess we're all left to the user to decide. But I'm always open
to suggestions for solving things in sw and not requiring any user
interaction.

> Other than these comments, looks fine to me.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH 3/4] drm/amdkfd: avoid svm conflicting with userptr address

2021-10-06 Thread Felix Kuehling
Am 2021-10-06 um 2:09 p.m. schrieb philip yang:
>
>
> On 2021-10-06 1:22 p.m., Felix Kuehling wrote:
>> Am 2021-10-06 um 10:32 a.m. schrieb Philip Yang:
>>> For ioctl_alloc_memory_of_gpu to alloc userptr bo, check userptr address
>>> does not exist in svm->objects.
>>>
>>> For svm range allocation, look for address in the userptr range of
>>> interval tree VA nodes.
>> Why? The purpose of the check is to prevent the same GPU virtual address
>> being managed by the two different memory managers. So checking
>> args->va_addr should be correct even for userptr BOs. The CPU virtual
>> address should be OK to be mapped with userptr and SVM at the same time
>> as long as the userptr uses a distinct GPU virtual address.
>
> userptr cpu virtual address is registered to the MMU notifier; if an svm
> range overlaps with a userptr, then migrating the svm range triggers the
> mmu notifier to evict the userptr and evict user queues, which is not
> correct for xnack on. And restoring the userptr will fail if the svm
> range is migrated to vram, because hmm_range_fault fails to get system
> pages; the app will soft hang as queues are not restored.
>
OK. Then we need to check both the CPU and GPU virtual address ranges.
Having userptr or SVM registrations fail like that is not ideal. It only
changes the failure mode, but doesn't really fix applications affected
by this.

A real solution probably requires that we reimplement userptrs using
the SVM API in the Thunk, when SVM is available. That way you avoid this
conflict between the two memory managers.

Regards,
  Felix


> Regards,
>
> Philip
>
>> Regards,
>>   Felix
>>
>>
>>> Change helper svm_range_check_vm to return amdgpu_bo, which will be used
>>> to avoid creating new svm range overlap with bo later.
>>>
>>> Signed-off-by: Philip Yang 
>>> ---
>>>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 17 +---
>>>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 55 +++-
>>>  2 files changed, 57 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>> index f1e7edeb4e6b..34dfa6a938bf 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>> @@ -1255,6 +1255,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file 
>>> *filep,
>>> long err;
>>> uint64_t offset = args->mmap_offset;
>>> uint32_t flags = args->flags;
>>> +   unsigned long start, last;
>>>  
>>> if (args->size == 0)
>>> return -EINVAL;
>>> @@ -1266,11 +1267,17 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct 
>>> file *filep,
>>> svm_range_list_lock_and_flush_work(&p->svms, current->mm);
>>> mutex_lock(&p->svms.lock);
>>> mmap_write_unlock(current->mm);
>>> -   if (interval_tree_iter_first(&p->svms.objects,
>>> -args->va_addr >> PAGE_SHIFT,
>>> -(args->va_addr + args->size - 1) >> 
>>> PAGE_SHIFT)) {
>>> -   pr_err("Address: 0x%llx already allocated by SVM\n",
>>> -   args->va_addr);
>>> +
>>> +   if (flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
>>> +   start = args->mmap_offset >> PAGE_SHIFT;
>>> +   last = (args->mmap_offset + args->size - 1) >> PAGE_SHIFT;
>>> +   } else {
>>> +   start = args->va_addr >> PAGE_SHIFT;
>>> +   last = (args->va_addr + args->size - 1) >> PAGE_SHIFT;
>>> +   }
>>> +
>>> +   if (interval_tree_iter_first(&p->svms.objects, start, last)) {
>>> +   pr_err("[0x%lx 0x%lx] already allocated by SVM\n", start, last);
>>> mutex_unlock(&p->svms.lock);
>>> return -EADDRINUSE;
>>> }
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> index 7f0743fcfcc3..d49c08618714 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> @@ -2679,15 +2679,18 @@ int svm_range_list_init(struct kfd_process *p)
>>>   *
>>>   * Context: Process context
>>>   *
>>> - * Return 0 - OK, if the range is not mapped.
>>> + * Return NULL - if the range is not mapped.
>>> + * amdgpu_bo - if the range is mapped by old API
>>>   * Otherwise error code:
>>> - * -EADDRINUSE - if address is mapped already by 
>>> kfd_ioctl_alloc_memory_of_gpu
>>>   * -ERESTARTSYS - A wait for the buffer to become unreserved was 
>>> interrupted by
>>>   * a signal. Release all buffer reservations and return to user-space.
>>>   */
>>> -static int
>>> +static struct amdgpu_bo *
>>>  svm_range_check_vm(struct kfd_process *p, uint64_t start, uint64_t last)
>>>  {
>>> +   struct amdgpu_bo_va_mapping *mapping;
>>> +   struct interval_tree_node *node;
>>> +   struct amdgpu_bo *bo = NULL;
>>> uint32_t i;
>>> int r;
>>>  
>>> @@ -2700,16 +2703,42 @@ svm_range_check_vm(struct kfd_process *p, uint64_t 
>>> start, uint64_t last)
>>> vm = drm_priv_to_vm(p->pdds[i]->drm_priv);
>>> r = amdgpu_bo_reserve(vm->root.bo, false);
>>>  

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Alex Deucher
And just another general comment, swiotlb + bounce buffers isn't
really useful on GPUs.  You may have 10-100s of MBs of memory mapped
long term into the GPU's address space for random access.  E.g., you
may have buffers in system memory that the display hardware is
actively scanning out of.  For GPUs you should really only enable SME
if IOMMU is enabled in remapping mode.  But that is probably beyond
the discussion here.

Alex

On Wed, Oct 6, 2021 at 2:10 PM Alex Deucher  wrote:
>
> On Wed, Oct 6, 2021 at 1:48 PM Borislav Petkov  wrote:
> >
> > Ok,
> >
> > so I sat down and wrote something and tried to capture all the stuff we
> > so talked about that it is clear in the future why we did it.
> >
> > Thoughts?
> >
> > ---
> > From: Borislav Petkov 
> > Date: Wed, 6 Oct 2021 19:34:55 +0200
> > Subject: [PATCH] x86/Kconfig: Do not enable 
> > AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
> >  automatically
> >
> > This Kconfig option was added initially so that memory encryption is
> > enabled by default on machines which support it.
> >
> > However, Raven-class GPUs, a.o., cannot handle DMA masks which are
> > shorter than the bit position of the encryption, aka C-bit. For that,
> > those devices need to have the IOMMU present.
>
> This is not limited to Raven.  All GPUs (and quite a few other
> devices) have a limited DMA mask.  AMD GPUs have between 32 and 48
> bits of DMA depending on what generation the hardware is.  So to
> support SME, you either need swiotlb with bounce buffers or you need
> IOMMU in remapping mode. The limitation with Raven is that if you want
> to use it with the IOMMU enabled it requires the IOMMU to be set up in
> passthrough mode to support IOMMUv2 functionality for compute support
> and due to other hardware limitations on the display side. So for all
> GPUs except raven, just having IOMMU enabled in remapping mode is
> fine.  GPUs from other vendors would likely run into similar
> limitations.  Raven just has further limitations.
>
>
> >
> > If the IOMMU is disabled or in passthrough mode, though, the kernel
> > would switch to SWIOTLB bounce-buffering for those transfers.
> >
> > In order to avoid that,
> >
> >2cc13bb4f59f ("iommu: Disable passthrough mode when SME is active")
> >
> > disables the default IOMMU passthrough mode so that devices for which
> > the default 256K DMA is insufficient, can use the IOMMU instead.
> >
> > However 2, there are cases where the IOMMU is disabled in the BIOS, etc,
> > think the usual hardware folk "oops, I dropped the ball there" cases.
> >
> > Which means, it can happen that there are systems out there with devices
> > which need the IOMMU to function properly with SME enabled but the IOMMU
> > won't necessarily be enabled.
> >
> > So in order for those devices to function, drop the "default y" for
> > the SME by default on option so that users who want to have SME, will
> > need to either enable it in their config or use "mem_encrypt=on" on the
> > kernel command line.
>
> Another option would be to enable SME by default on Epyc platforms,
> but disabled by default on client APU platforms or even just raven.
>
> Other than these comments, looks fine to me.
>
> Alex
>
> >
> > Fixes: 7744ccdbc16f ("x86/mm: Add Secure Memory Encryption (SME) support")
> > Reported-by: Paul Menzel 
> > Signed-off-by: Borislav Petkov 
> > Cc: 
> > Link: 
> > https://lkml.kernel.org/r/8bbacd0e-4580-3194-19d2-a0ecad7df...@molgen.mpg.de
> > ---
> >  arch/x86/Kconfig | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 8055da49f1c0..6a336b1f3f28 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1525,7 +1525,6 @@ config AMD_MEM_ENCRYPT
> >
> >  config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
> > bool "Activate AMD Secure Memory Encryption (SME) by default"
> > -   default y
> > depends on AMD_MEM_ENCRYPT
> > help
> >   Say yes to have system memory encrypted by default if running on
> > --
> > 2.29.2
> >
> >
> > --
> > Regards/Gruss,
> > Boris.
> >
> > https://people.kernel.org/tglx/notes-about-netiquette


Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Alex Deucher
On Wed, Oct 6, 2021 at 1:48 PM Borislav Petkov  wrote:
>
> Ok,
>
> so I sat down and wrote something and tried to capture all the stuff we
> so talked about that it is clear in the future why we did it.
>
> Thoughts?
>
> ---
> From: Borislav Petkov 
> Date: Wed, 6 Oct 2021 19:34:55 +0200
> Subject: [PATCH] x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
>  automatically
>
> This Kconfig option was added initially so that memory encryption is
> enabled by default on machines which support it.
>
> However, Raven-class GPUs, a.o., cannot handle DMA masks which are
> shorter than the bit position of the encryption, aka C-bit. For that,
> those devices need to have the IOMMU present.

This is not limited to Raven.  All GPUs (and quite a few other
devices) have a limited DMA mask.  AMD GPUs have between 32 and 48
bits of DMA depending on what generation the hardware is.  So to
support SME, you either need swiotlb with bounce buffers or you need
IOMMU in remapping mode. The limitation with Raven is that if you want
to use it with the IOMMU enabled it requires the IOMMU to be set up in
passthrough mode to support IOMMUv2 functionality for compute support
and due to other hardware limitations on the display side. So for all
GPUs except raven, just having IOMMU enabled in remapping mode is
fine.  GPUs from other vendors would likely run into similar
limitations.  Raven just has further limitations.


>
> If the IOMMU is disabled or in passthrough mode, though, the kernel
> would switch to SWIOTLB bounce-buffering for those transfers.
>
> In order to avoid that,
>
>2cc13bb4f59f ("iommu: Disable passthrough mode when SME is active")
>
> disables the default IOMMU passthrough mode so that devices for which
> the default 256K DMA is insufficient, can use the IOMMU instead.
>
> However 2, there are cases where the IOMMU is disabled in the BIOS, etc,
> think the usual hardware folk "oops, I dropped the ball there" cases.
>
> Which means, it can happen that there are systems out there with devices
> which need the IOMMU to function properly with SME enabled but the IOMMU
> won't necessarily be enabled.
>
> So in order for those devices to function, drop the "default y" for
> the SME by default on option so that users who want to have SME, will
> need to either enable it in their config or use "mem_encrypt=on" on the
> kernel command line.

Another option would be to enable SME by default on Epyc platforms,
but disabled by default on client APU platforms or even just raven.

Other than these comments, looks fine to me.

Alex

>
> Fixes: 7744ccdbc16f ("x86/mm: Add Secure Memory Encryption (SME) support")
> Reported-by: Paul Menzel 
> Signed-off-by: Borislav Petkov 
> Cc: 
> Link: 
> https://lkml.kernel.org/r/8bbacd0e-4580-3194-19d2-a0ecad7df...@molgen.mpg.de
> ---
>  arch/x86/Kconfig | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 8055da49f1c0..6a336b1f3f28 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1525,7 +1525,6 @@ config AMD_MEM_ENCRYPT
>
>  config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
> bool "Activate AMD Secure Memory Encryption (SME) by default"
> -   default y
> depends on AMD_MEM_ENCRYPT
> help
>   Say yes to have system memory encrypted by default if running on
> --
> 2.29.2
>
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH 3/4] drm/amdkfd: avoid svm conflicting with userptr address

2021-10-06 Thread philip yang

  


On 2021-10-06 1:22 p.m., Felix Kuehling wrote:
> Am 2021-10-06 um 10:32 a.m. schrieb Philip Yang:
>> For ioctl_alloc_memory_of_gpu to alloc userptr bo, check userptr address
>> does not exist in svm->objects.
>>
>> For svm range allocation, look for address in the userptr range of
>> interval tree VA nodes.
>
> Why? The purpose of the check is to prevent the same GPU virtual address
> being managed by the two different memory managers. So checking
> args->va_addr should be correct even for userptr BOs. The CPU virtual
> address should be OK to be mapped with userptr and SVM at the same time
> as long as the userptr uses a distinct GPU virtual address.

userptr cpu virtual address is registered to the MMU notifier; if an svm
range overlaps with a userptr, then migrating the svm range triggers the
mmu notifier to evict the userptr and evict user queues, which is not
correct for xnack on. And restoring the userptr will fail if the svm
range is migrated to vram, because hmm_range_fault fails to get system
pages; the app will soft hang as queues are not restored.

Regards,
Philip

> Regards,
>   Felix
Change helper svm_range_check_vm to return amdgpu_bo, which will be used
to avoid creating new svm range overlap with bo later.

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 17 +---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 55 +++-
 2 files changed, 57 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index f1e7edeb4e6b..34dfa6a938bf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1255,6 +1255,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
 	long err;
 	uint64_t offset = args->mmap_offset;
 	uint32_t flags = args->flags;
+	unsigned long start, last;
 
 	if (args->size == 0)
 		return -EINVAL;
@@ -1266,11 +1267,17 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
 	svm_range_list_lock_and_flush_work(&p->svms, current->mm);
 	mutex_lock(&p->svms.lock);
 	mmap_write_unlock(current->mm);
-	if (interval_tree_iter_first(&p->svms.objects,
- args->va_addr >> PAGE_SHIFT,
- (args->va_addr + args->size - 1) >> PAGE_SHIFT)) {
-		pr_err("Address: 0x%llx already allocated by SVM\n",
-			args->va_addr);
+
+	if (flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
+		start = args->mmap_offset >> PAGE_SHIFT;
+		last = (args->mmap_offset + args->size - 1) >> PAGE_SHIFT;
+	} else {
+		start = args->va_addr >> PAGE_SHIFT;
+		last = (args->va_addr + args->size - 1) >> PAGE_SHIFT;
+	}
+
+	if (interval_tree_iter_first(&p->svms.objects, start, last)) {
+		pr_err("[0x%lx 0x%lx] already allocated by SVM\n", start, last);
 		mutex_unlock(&p->svms.lock);
 		return -EADDRINUSE;
 	}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 7f0743fcfcc3..d49c08618714 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2679,15 +2679,18 @@ int svm_range_list_init(struct kfd_process *p)
  *
  * Context: Process context
  *
- * Return 0 - OK, if the range is not mapped.
+ * Return NULL - if the range is not mapped.
+ * amdgpu_bo - if the range is mapped by old API
  * Otherwise error code:
- * -EADDRINUSE - if address is mapped already by kfd_ioctl_alloc_memory_of_gpu
  * -ERESTARTSYS - A wait for the buffer to become unreserved was interrupted by
  * a signal. Release all buffer reservations and return to user-space.
  */
-static int
+static struct amdgpu_bo *
 svm_range_check_vm(struct kfd_process *p, uint64_t start, uint64_t last)
 {
+	struct amdgpu_bo_va_mapping *mapping;
+	struct interval_tree_node *node;
+	struct amdgpu_bo *bo = NULL;
 	uint32_t i;
 	int r;
 
@@ -2700,16 +2703,42 @@ svm_range_check_vm(struct kfd_process *p, uint64_t start, uint64_t last)
 		vm = drm_priv_to_vm(p->pdds[i]->drm_priv);
 		r = amdgpu_bo_reserve(vm->root.bo, false);
 		if (r)
-			return r;
-		if (interval_tree_iter_first(&vm->va, start, last)) {
-			pr_debug("Range [0x%llx 0x%llx] already mapped\n", start, last);
-			amdgpu_bo_unreserve(vm->root.bo);
-			return -EADDRINUSE;
+			return ERR_PTR(r);
+		node = interval_tree_iter_first(&vm->va, start, last);
+		if (node) {
+			pr_debug("range [0x%llx 0x%llx] already TTM mapped\n",
+ start, last);
+			mapping = container_of((struct rb_node *)node,
+	   struct amdgpu_bo_va_mapping, rb);
+			bo = mapping->bo_va->base.bo;
+			goto out_unreserve;
+		}
+
+		/* Check userptr by searching entire vm->va interval tree */
+		node = interval_tree_iter_first(&vm->va, 0, ~0ULL);
+		while (node) {
+			mapping = container_of((struct rb_node *)node,
+	   struct amdgpu_bo_va_mapping, rb);
+			bo = mapping->bo_va->base.bo;
+
+			if (amdgpu_ttm_tt_affect_userptr(bo->tbo.ttm,
+			 start << PAGE_SHIFT,
+			 last << PAGE_SHIFT)) {
+

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
Ok,

so I sat down and wrote something and tried to capture all the stuff we
so talked about that it is clear in the future why we did it.

Thoughts?

---
From: Borislav Petkov 
Date: Wed, 6 Oct 2021 19:34:55 +0200
Subject: [PATCH] x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
 automatically

This Kconfig option was added initially so that memory encryption is
enabled by default on machines which support it.

However, Raven-class GPUs, a.o., cannot handle DMA masks which are
shorter than the bit position of the encryption, aka C-bit. For that,
those devices need to have the IOMMU present.

If the IOMMU is disabled or in passthrough mode, though, the kernel
would switch to SWIOTLB bounce-buffering for those transfers.

In order to avoid that,

   2cc13bb4f59f ("iommu: Disable passthrough mode when SME is active")

disables the default IOMMU passthrough mode so that devices for which
the default 256K DMA is insufficient, can use the IOMMU instead.

However 2, there are cases where the IOMMU is disabled in the BIOS, etc,
think the usual hardware folk "oops, I dropped the ball there" cases.

Which means, it can happen that there are systems out there with devices
which need the IOMMU to function properly with SME enabled but the IOMMU
won't necessarily be enabled.

So in order for those devices to function, drop the "default y" for
the SME by default on option so that users who want to have SME, will
need to either enable it in their config or use "mem_encrypt=on" on the
kernel command line.

Fixes: 7744ccdbc16f ("x86/mm: Add Secure Memory Encryption (SME) support")
Reported-by: Paul Menzel 
Signed-off-by: Borislav Petkov 
Cc: 
Link: 
https://lkml.kernel.org/r/8bbacd0e-4580-3194-19d2-a0ecad7df...@molgen.mpg.de
---
 arch/x86/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8055da49f1c0..6a336b1f3f28 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1525,7 +1525,6 @@ config AMD_MEM_ENCRYPT
 
 config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
bool "Activate AMD Secure Memory Encryption (SME) by default"
-   default y
depends on AMD_MEM_ENCRYPT
help
  Say yes to have system memory encrypted by default if running on
-- 
2.29.2


-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
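
(For reference: with "default y" dropped, the opt-in the commit message
describes is just a command-line switch. An illustrative GRUB snippet,
with paths and any existing flags assumed:)

	# /etc/default/grub - then regenerate the grub config
	GRUB_CMDLINE_LINUX="... mem_encrypt=on"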


Re: [PATCH 3/4] drm/amdkfd: avoid svm conflicting with userptr address

2021-10-06 Thread Felix Kuehling
Am 2021-10-06 um 10:32 a.m. schrieb Philip Yang:
> For ioctl_alloc_memory_of_gpu to alloc userptr bo, check userptr address
> does not exist in svm->objects.
>
> For svm range allocation, look for address in the userptr range of
> interval tree VA nodes.

Why? The purpose of the check is to prevent the same GPU virtual address
being managed by the two different memory managers. So checking
args->va_addr should be correct even for userptr BOs. The CPU virtual
address should be OK to be mapped with userptr and SVM at the same time
as long as the userptr uses a distinct GPU virtual address.

Regards,
  Felix


>
> Change helper svm_range_check_vm to return amdgpu_bo, which will be used
> to avoid creating new svm range overlap with bo later.
>
> Signed-off-by: Philip Yang 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 17 +---
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 55 +++-
>  2 files changed, 57 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index f1e7edeb4e6b..34dfa6a938bf 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -1255,6 +1255,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file 
> *filep,
>   long err;
>   uint64_t offset = args->mmap_offset;
>   uint32_t flags = args->flags;
> + unsigned long start, last;
>  
>   if (args->size == 0)
>   return -EINVAL;
> @@ -1266,11 +1267,17 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file 
> *filep,
>   svm_range_list_lock_and_flush_work(&p->svms, current->mm);
>   mutex_lock(&p->svms.lock);
>   mmap_write_unlock(current->mm);
> - if (interval_tree_iter_first(&p->svms.objects,
> -  args->va_addr >> PAGE_SHIFT,
> -  (args->va_addr + args->size - 1) >> 
> PAGE_SHIFT)) {
> - pr_err("Address: 0x%llx already allocated by SVM\n",
> - args->va_addr);
> +
> + if (flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
> + start = args->mmap_offset >> PAGE_SHIFT;
> + last = (args->mmap_offset + args->size - 1) >> PAGE_SHIFT;
> + } else {
> + start = args->va_addr >> PAGE_SHIFT;
> + last = (args->va_addr + args->size - 1) >> PAGE_SHIFT;
> + }
> +
> + if (interval_tree_iter_first(&p->svms.objects, start, last)) {
> + pr_err("[0x%lx 0x%lx] already allocated by SVM\n", start, last);
>   mutex_unlock(&p->svms.lock);
>   return -EADDRINUSE;
>   }
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 7f0743fcfcc3..d49c08618714 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -2679,15 +2679,18 @@ int svm_range_list_init(struct kfd_process *p)
>   *
>   * Context: Process context
>   *
> - * Return 0 - OK, if the range is not mapped.
> + * Return NULL - if the range is not mapped.
> + * amdgpu_bo - if the range is mapped by old API
>   * Otherwise error code:
> - * -EADDRINUSE - if address is mapped already by 
> kfd_ioctl_alloc_memory_of_gpu
>   * -ERESTARTSYS - A wait for the buffer to become unreserved was interrupted 
> by
>   * a signal. Release all buffer reservations and return to user-space.
>   */
> -static int
> +static struct amdgpu_bo *
>  svm_range_check_vm(struct kfd_process *p, uint64_t start, uint64_t last)
>  {
> + struct amdgpu_bo_va_mapping *mapping;
> + struct interval_tree_node *node;
> + struct amdgpu_bo *bo = NULL;
>   uint32_t i;
>   int r;
>  
> @@ -2700,16 +2703,42 @@ svm_range_check_vm(struct kfd_process *p, uint64_t 
> start, uint64_t last)
>   vm = drm_priv_to_vm(p->pdds[i]->drm_priv);
>   r = amdgpu_bo_reserve(vm->root.bo, false);
>   if (r)
> - return r;
> - if (interval_tree_iter_first(&vm->va, start, last)) {
> - pr_debug("Range [0x%llx 0x%llx] already mapped\n", 
> start, last);
> - amdgpu_bo_unreserve(vm->root.bo);
> - return -EADDRINUSE;
> + return ERR_PTR(r);
> + node = interval_tree_iter_first(&vm->va, start, last);
> + if (node) {
> + pr_debug("range [0x%llx 0x%llx] already TTM mapped\n",
> +  start, last);
> + mapping = container_of((struct rb_node *)node,
> +struct amdgpu_bo_va_mapping, rb);
> + bo = mapping->bo_va->base.bo;
> + goto out_unreserve;
> + }
> +
> + /* Check userptr by searching entire vm->va interval tree */
> + node = interval_tree_iter_first(&vm->va, 0, ~0ULL);
> + while (node) {
> + mapping = container_of((struct rb_node 

[PATCH 1/1] drm/amdgpu: unify BO evicting method in amdgpu_ttm

2021-10-06 Thread Nirmoy Das
Unify BO evicting functionality for possible memory
types in amdgpu_ttm.c and remove corresponding function
from amdgpu_object.c.

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  8 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  | 23 
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |  1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 30 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  1 +
 6 files changed, 36 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 5497e2d31d1a..22f3de29d783 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1328,7 +1328,7 @@ static int amdgpu_debugfs_evict_vram(void *data, u64 *val)
return r;
}
 
-   *val = amdgpu_bo_evict_vram(adev);
+   *val = amdgpu_bo_evict_memory(adev, TTM_PL_VRAM);
 
pm_runtime_mark_last_busy(dev->dev);
pm_runtime_put_autosuspend(dev->dev);
@@ -1341,17 +1341,15 @@ static int amdgpu_debugfs_evict_gtt(void *data, u64 
*val)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)data;
struct drm_device *dev = adev_to_drm(adev);
-   struct ttm_resource_manager *man;
int r;
 
r = pm_runtime_get_sync(dev->dev);
if (r < 0) {
-   pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
+   pm_runtime_put_autosuspend(dev->dev);
return r;
}
 
-   man = ttm_manager_type(&adev->mman.bdev, TTM_PL_TT);
-   *val = ttm_resource_manager_evict_all(&adev->mman.bdev, man);
+   *val = amdgpu_bo_evict_memory(adev, TTM_PL_TT);
 
pm_runtime_mark_last_busy(dev->dev);
pm_runtime_put_autosuspend(dev->dev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 57638fe9cfc2..c441ebe9da11 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3921,7 +3921,7 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
fbcon)
amdgpu_amdkfd_suspend(adev, adev->in_runpm);
 
/* evict vram memory */
-   amdgpu_bo_evict_vram(adev);
+   amdgpu_bo_evict_memory(adev, TTM_PL_VRAM);
 
amdgpu_fence_driver_hw_fini(adev);
 
@@ -3930,7 +3930,7 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
fbcon)
 * This second call to evict vram is to evict the gart page table
 * using the CPU.
 */
-   amdgpu_bo_evict_vram(adev);
+   amdgpu_bo_evict_memory(adev, TTM_PL_VRAM);
 
return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 4ec904f36ceb..073ba2af0b9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1004,29 +1004,6 @@ void amdgpu_bo_unpin(struct amdgpu_bo *bo)
}
 }
 
-/**
- * amdgpu_bo_evict_vram - evict VRAM buffers
- * @adev: amdgpu device object
- *
- * Evicts all VRAM buffers on the lru list of the memory type.
- * Mainly used for evicting vram at suspend time.
- *
- * Returns:
- * 0 for success or a negative error code on failure.
- */
-int amdgpu_bo_evict_vram(struct amdgpu_device *adev)
-{
-   struct ttm_resource_manager *man;
-
-   if (adev->in_s3 && (adev->flags & AMD_IS_APU)) {
-   /* No need to evict vram on APUs for suspend to ram */
-   return 0;
-   }
-
-   man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
-   return ttm_resource_manager_evict_all(&adev->mman.bdev, man);
-}
-
 static const char *amdgpu_vram_names[] = {
"UNKNOWN",
"GDDR1",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 8ff61bad4138..d787e0e89e0b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -305,7 +305,6 @@ int amdgpu_bo_pin(struct amdgpu_bo *bo, u32 domain);
 int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,
 u64 min_offset, u64 max_offset);
 void amdgpu_bo_unpin(struct amdgpu_bo *bo);
-int amdgpu_bo_evict_vram(struct amdgpu_device *adev);
 int amdgpu_bo_init(struct amdgpu_device *adev);
 void amdgpu_bo_fini(struct amdgpu_device *adev);
 int amdgpu_bo_set_tiling_flags(struct amdgpu_bo *bo, u64 tiling_flags);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index e2896ac2c9ce..545b4bdeae07 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2034,6 +2034,36 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
return r;
 }
 
+/**
+ * amdgpu_bo_evict_memory - evict memory buffers
+ * @adev: amdgpu device object
+ * @mem_type: evicted BO's memory type
+ *
+ * Evicts all 

Re: [PATCH 1/1] drm/amdgpu: add and use amdgpu_bo_evict_gtt

2021-10-06 Thread Das, Nirmoy



On 10/6/2021 4:58 PM, Christian König wrote:

Am 06.10.21 um 16:45 schrieb Nirmoy Das:

Unify BO evicting functionality for VRAM and TT memory
types in amdgpu_object.c. Use amdgpu_bo_evict_gtt()
for evicting gtt memory similar to how we do that
for amdgpu_debugfs_evict_vram().

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  6 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  | 52 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |  1 +
  3 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c

index 5497e2d31d1a..67045983d63d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1341,17 +1341,15 @@ static int amdgpu_debugfs_evict_gtt(void 
*data, u64 *val)

  {
  struct amdgpu_device *adev = (struct amdgpu_device *)data;
  struct drm_device *dev = adev_to_drm(adev);
-    struct ttm_resource_manager *man;
  int r;
    r = pm_runtime_get_sync(dev->dev);
  if (r < 0) {
-    pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
+    pm_runtime_put_autosuspend(dev->dev);
  return r;
  }
  -    man = ttm_manager_type(&adev->mman.bdev, TTM_PL_TT);
-    *val = ttm_resource_manager_evict_all(&adev->mman.bdev, man);
+    *val = amdgpu_bo_evict_gtt(adev);
    pm_runtime_mark_last_busy(dev->dev);
  pm_runtime_put_autosuspend(dev->dev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

index 4ec904f36ceb..3b8c9cf44d74 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1005,10 +1005,37 @@ void amdgpu_bo_unpin(struct amdgpu_bo *bo)
  }
    /**
- * amdgpu_bo_evict_vram - evict VRAM buffers
+ * amdgpu_bo_evict_memory - evict memory buffers
   * @adev: amdgpu device object
+ * @mem_type: evicted BO's memory type
   *
- * Evicts all VRAM buffers on the lru list of the memory type.
+ * Evicts all @mem_type buffers on the lru list of the memory type.
+ *
+ * Returns:
+ * 0 for success or a negative error code on failure.
+ */
+static int amdgpu_bo_evict_memory(struct amdgpu_device *adev, int 
mem_type)


That function should probably be inside amdgpu_ttm.c instead.


+{
+    struct ttm_resource_manager *man;
+
+    switch (mem_type) {
+    case TTM_PL_VRAM:
+    case TTM_PL_TT:
+    man = ttm_manager_type(&adev->mman.bdev, mem_type);
+    break;
+    default:
+    DRM_ERROR("Trying to evict invalid memory type\n");
+    return -EINVAL;


At least in theory we could do that for OA, GWS and GDS as well.



I will add those and take care of other comments and resend v2.


Thanks,

Nirmoy





+    }
+
+    return ttm_resource_manager_evict_all(&adev->mman.bdev, man);
+}
+
+/**
+ * amdgpu_bo_evict_vram - evict vram buffers
+ * @adev: amdgpu device object
+ *
+ * Evicts all vram buffers on the lru list of the memory type.
   * Mainly used for evicting vram at suspend time.
   *
   * Returns:
@@ -1016,17 +1043,32 @@ void amdgpu_bo_unpin(struct amdgpu_bo *bo)
   */
  int amdgpu_bo_evict_vram(struct amdgpu_device *adev)
  {
-    struct ttm_resource_manager *man;
    if (adev->in_s3 && (adev->flags & AMD_IS_APU)) {
  /* No need to evict vram on APUs for suspend to ram */
  return 0;
  }
  -    man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
-    return ttm_resource_manager_evict_all(&adev->mman.bdev, man);
+    return amdgpu_bo_evict_memory(adev, TTM_PL_VRAM);
+}
+
+/**
+ * amdgpu_bo_evict_gtt - evict gtt buffers
+ * @adev: amdgpu device object
+ *
+ * Evicts all gtt buffers on the lru list of the memory type.
+ * Mainly used for evicting gtt buffers through debugfs.
+ *
+ * Returns:
+ * 0 for success or a negative error code on failure.
+ */
+
+int amdgpu_bo_evict_gtt(struct amdgpu_device *adev)
+{


I won't add a wrapper for that. This looks misplaced and like overkill.

Christian.


+    return amdgpu_bo_evict_memory(adev, TTM_PL_TT);
  }
  +
  static const char *amdgpu_vram_names[] = {
  "UNKNOWN",
  "GDDR1",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h

index 8ff61bad4138..5e9b7710b8e3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -306,6 +306,7 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo 
*bo, u32 domain,

   u64 min_offset, u64 max_offset);
  void amdgpu_bo_unpin(struct amdgpu_bo *bo);
  int amdgpu_bo_evict_vram(struct amdgpu_device *adev);
+int amdgpu_bo_evict_gtt(struct amdgpu_device *adev);
  int amdgpu_bo_init(struct amdgpu_device *adev);
  void amdgpu_bo_fini(struct amdgpu_device *adev);
  int amdgpu_bo_set_tiling_flags(struct amdgpu_bo *bo, u64 
tiling_flags);




Re: [PATCH 1/1] drm/amdgpu: add and use amdgpu_bo_evict_gtt

2021-10-06 Thread Christian König

Am 06.10.21 um 16:45 schrieb Nirmoy Das:

Unify BO evicting functionality for VRAM and TT memory
types in amdgpu_object.c. Use amdgpu_bo_evict_gtt()
for evicting gtt memory similar to how we do that
for amdgpu_debugfs_evict_vram().

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  6 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  | 52 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |  1 +
  3 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 5497e2d31d1a..67045983d63d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1341,17 +1341,15 @@ static int amdgpu_debugfs_evict_gtt(void *data, u64 
*val)
  {
struct amdgpu_device *adev = (struct amdgpu_device *)data;
struct drm_device *dev = adev_to_drm(adev);
-   struct ttm_resource_manager *man;
int r;
  
  	r = pm_runtime_get_sync(dev->dev);

if (r < 0) {
-   pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
+   pm_runtime_put_autosuspend(dev->dev);
return r;
}
  
-	man = ttm_manager_type(&adev->mman.bdev, TTM_PL_TT);

-   *val = ttm_resource_manager_evict_all(&adev->mman.bdev, man);
+   *val = amdgpu_bo_evict_gtt(adev);
  
  	pm_runtime_mark_last_busy(dev->dev);

pm_runtime_put_autosuspend(dev->dev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 4ec904f36ceb..3b8c9cf44d74 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1005,10 +1005,37 @@ void amdgpu_bo_unpin(struct amdgpu_bo *bo)
  }
  
  /**

- * amdgpu_bo_evict_vram - evict VRAM buffers
+ * amdgpu_bo_evict_memory - evict memory buffers
   * @adev: amdgpu device object
+ * @mem_type: evicted BO's memory type
   *
- * Evicts all VRAM buffers on the lru list of the memory type.
+ * Evicts all @mem_type buffers on the lru list of the memory type.
+ *
+ * Returns:
+ * 0 for success or a negative error code on failure.
+ */
+static int amdgpu_bo_evict_memory(struct amdgpu_device *adev, int mem_type)


That function should probably be inside amdgpu_ttm.c instead.


+{
+   struct ttm_resource_manager *man;
+
+   switch (mem_type) {
+   case TTM_PL_VRAM:
+   case TTM_PL_TT:
+   man = ttm_manager_type(&adev->mman.bdev, mem_type);
+   break;
+   default:
+   DRM_ERROR("Trying to evict invalid memory type\n");
+   return -EINVAL;


At least in theory we could do that for OA, GWS and GDS as well.


+   }
+
+   return ttm_resource_manager_evict_all(&adev->mman.bdev, man);
+}
+
+/**
+ * amdgpu_bo_evict_vram - evict vram buffers
+ * @adev: amdgpu device object
+ *
+ * Evicts all vram buffers on the lru list of the memory type.
   * Mainly used for evicting vram at suspend time.
   *
   * Returns:
@@ -1016,17 +1043,32 @@ void amdgpu_bo_unpin(struct amdgpu_bo *bo)
   */
  int amdgpu_bo_evict_vram(struct amdgpu_device *adev)
  {
-   struct ttm_resource_manager *man;
  
  	if (adev->in_s3 && (adev->flags & AMD_IS_APU)) {

/* No need to evict vram on APUs for suspend to ram */
return 0;
}
  
-	man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);

-   return ttm_resource_manager_evict_all(&adev->mman.bdev, man);
+   return amdgpu_bo_evict_memory(adev, TTM_PL_VRAM);
+}
+
+/**
+ * amdgpu_bo_evict_gtt - evict gtt buffers
+ * @adev: amdgpu device object
+ *
+ * Evicts all gtt buffers on the lru list of the memory type.
+ * Mainly used for evicting gtt buffers through debugfs.
+ *
+ * Returns:
+ * 0 for success or a negative error code on failure.
+ */
+
+int amdgpu_bo_evict_gtt(struct amdgpu_device *adev)
+{


I won't add a wrapper for that. This looks like misplaced and overkill.

Christian.


+   return amdgpu_bo_evict_memory(adev, TTM_PL_TT);
  }
  
+

  static const char *amdgpu_vram_names[] = {
"UNKNOWN",
"GDDR1",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 8ff61bad4138..5e9b7710b8e3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -306,6 +306,7 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
domain,
 u64 min_offset, u64 max_offset);
  void amdgpu_bo_unpin(struct amdgpu_bo *bo);
  int amdgpu_bo_evict_vram(struct amdgpu_device *adev);
+int amdgpu_bo_evict_gtt(struct amdgpu_device *adev);
  int amdgpu_bo_init(struct amdgpu_device *adev);
  void amdgpu_bo_fini(struct amdgpu_device *adev);
  int amdgpu_bo_set_tiling_flags(struct amdgpu_bo *bo, u64 tiling_flags);




[PATCH 1/1] drm/amdgpu: add and use amdgpu_bo_evict_gtt

2021-10-06 Thread Nirmoy Das
Unify BO evicting functionality for VRAM and TT memory
types in amdgpu_object.c. Use amdgpu_bo_evict_gtt()
for evicting gtt memory similar to how we do that
for amdgpu_debugfs_evict_vram().

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  6 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  | 52 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |  1 +
 3 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 5497e2d31d1a..67045983d63d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1341,17 +1341,15 @@ static int amdgpu_debugfs_evict_gtt(void *data, u64 
*val)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)data;
struct drm_device *dev = adev_to_drm(adev);
-   struct ttm_resource_manager *man;
int r;
 
r = pm_runtime_get_sync(dev->dev);
if (r < 0) {
-   pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
+   pm_runtime_put_autosuspend(dev->dev);
return r;
}
 
-   man = ttm_manager_type(&adev->mman.bdev, TTM_PL_TT);
-   *val = ttm_resource_manager_evict_all(&adev->mman.bdev, man);
+   *val = amdgpu_bo_evict_gtt(adev);
 
pm_runtime_mark_last_busy(dev->dev);
pm_runtime_put_autosuspend(dev->dev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 4ec904f36ceb..3b8c9cf44d74 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1005,10 +1005,37 @@ void amdgpu_bo_unpin(struct amdgpu_bo *bo)
 }
 
 /**
- * amdgpu_bo_evict_vram - evict VRAM buffers
+ * amdgpu_bo_evict_memory - evict memory buffers
  * @adev: amdgpu device object
+ * @mem_type: evicted BO's memory type
  *
- * Evicts all VRAM buffers on the lru list of the memory type.
+ * Evicts all @mem_type buffers on the lru list of the memory type.
+ *
+ * Returns:
+ * 0 for success or a negative error code on failure.
+ */
+static int amdgpu_bo_evict_memory(struct amdgpu_device *adev, int mem_type)
+{
+   struct ttm_resource_manager *man;
+
+   switch (mem_type) {
+   case TTM_PL_VRAM:
+   case TTM_PL_TT:
+   man = ttm_manager_type(&adev->mman.bdev, mem_type);
+   break;
+   default:
+   DRM_ERROR("Trying to evict invalid memory type\n");
+   return -EINVAL;
+   }
+
+   return ttm_resource_manager_evict_all(&adev->mman.bdev, man);
+}
+
+/**
+ * amdgpu_bo_evict_vram - evict vram buffers
+ * @adev: amdgpu device object
+ *
+ * Evicts all vram buffers on the lru list of the memory type.
  * Mainly used for evicting vram at suspend time.
  *
  * Returns:
@@ -1016,17 +1043,32 @@ void amdgpu_bo_unpin(struct amdgpu_bo *bo)
  */
 int amdgpu_bo_evict_vram(struct amdgpu_device *adev)
 {
-   struct ttm_resource_manager *man;
 
if (adev->in_s3 && (adev->flags & AMD_IS_APU)) {
/* No need to evict vram on APUs for suspend to ram */
return 0;
}
 
-   man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
-   return ttm_resource_manager_evict_all(&adev->mman.bdev, man);
+   return amdgpu_bo_evict_memory(adev, TTM_PL_VRAM);
+}
+
+/**
+ * amdgpu_bo_evict_gtt - evict gtt buffers
+ * @adev: amdgpu device object
+ *
+ * Evicts all gtt buffers on the lru list of the memory type.
+ * Mainly used for evicting gtt buffers through debugfs.
+ *
+ * Returns:
+ * 0 for success or a negative error code on failure.
+ */
+
+int amdgpu_bo_evict_gtt(struct amdgpu_device *adev)
+{
+   return amdgpu_bo_evict_memory(adev, TTM_PL_TT);
 }
 
+
 static const char *amdgpu_vram_names[] = {
"UNKNOWN",
"GDDR1",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 8ff61bad4138..5e9b7710b8e3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -306,6 +306,7 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
domain,
 u64 min_offset, u64 max_offset);
 void amdgpu_bo_unpin(struct amdgpu_bo *bo);
 int amdgpu_bo_evict_vram(struct amdgpu_device *adev);
+int amdgpu_bo_evict_gtt(struct amdgpu_device *adev);
 int amdgpu_bo_init(struct amdgpu_device *adev);
 void amdgpu_bo_fini(struct amdgpu_device *adev);
 int amdgpu_bo_set_tiling_flags(struct amdgpu_bo *bo, u64 tiling_flags);
-- 
2.32.0
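
For completeness, once applied the new path is still exercised from user
space through the existing debugfs entry, e.g. by reading
/sys/kernel/debug/dri/0/amdgpu_evict_gtt (the DRI minor number varies per
system, and the exact path is an assumption here).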



[PATCH 3/4] drm/amdkfd: avoid svm conflicting with userptr address

2021-10-06 Thread Philip Yang
For kfd_ioctl_alloc_memory_of_gpu to allocate a userptr bo, check that the
userptr address does not already exist in svms->objects.

For svm range allocation, look for address in the userptr range of
interval tree VA nodes.

Change helper svm_range_check_vm to return amdgpu_bo, which will later be
used to avoid creating a new svm range that overlaps with the bo.

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 17 +---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 55 +++-
 2 files changed, 57 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index f1e7edeb4e6b..34dfa6a938bf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1255,6 +1255,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file 
*filep,
long err;
uint64_t offset = args->mmap_offset;
uint32_t flags = args->flags;
+   unsigned long start, last;
 
if (args->size == 0)
return -EINVAL;
@@ -1266,11 +1267,17 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file 
*filep,
	svm_range_list_lock_and_flush_work(&p->svms, current->mm);
	mutex_lock(&p->svms.lock);
mmap_write_unlock(current->mm);
-   if (interval_tree_iter_first(&p->svms.objects,
-				 args->va_addr >> PAGE_SHIFT,
-				 (args->va_addr + args->size - 1) >> PAGE_SHIFT)) {
-   pr_err("Address: 0x%llx already allocated by SVM\n",
-   args->va_addr);
+
+   if (flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
+   start = args->mmap_offset >> PAGE_SHIFT;
+   last = (args->mmap_offset + args->size - 1) >> PAGE_SHIFT;
+   } else {
+   start = args->va_addr >> PAGE_SHIFT;
+   last = (args->va_addr + args->size - 1) >> PAGE_SHIFT;
+   }
+
+   if (interval_tree_iter_first(&p->svms.objects, start, last)) {
+   pr_err("[0x%lx 0x%lx] already allocated by SVM\n", start, last);
		mutex_unlock(&p->svms.lock);
return -EADDRINUSE;
}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 7f0743fcfcc3..d49c08618714 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2679,15 +2679,18 @@ int svm_range_list_init(struct kfd_process *p)
  *
  * Context: Process context
  *
- * Return 0 - OK, if the range is not mapped.
+ * Return NULL - if the range is not mapped.
+ * amdgpu_bo - if the range is mapped by old API
  * Otherwise error code:
- * -EADDRINUSE - if address is mapped already by kfd_ioctl_alloc_memory_of_gpu
  * -ERESTARTSYS - A wait for the buffer to become unreserved was interrupted by
  * a signal. Release all buffer reservations and return to user-space.
  */
-static int
+static struct amdgpu_bo *
 svm_range_check_vm(struct kfd_process *p, uint64_t start, uint64_t last)
 {
+   struct amdgpu_bo_va_mapping *mapping;
+   struct interval_tree_node *node;
+   struct amdgpu_bo *bo = NULL;
uint32_t i;
int r;
 
@@ -2700,16 +2703,42 @@ svm_range_check_vm(struct kfd_process *p, uint64_t 
start, uint64_t last)
vm = drm_priv_to_vm(p->pdds[i]->drm_priv);
r = amdgpu_bo_reserve(vm->root.bo, false);
if (r)
-   return r;
-   if (interval_tree_iter_first(&vm->va, start, last)) {
-   pr_debug("Range [0x%llx 0x%llx] already mapped\n", start, last);
-   amdgpu_bo_unreserve(vm->root.bo);
-   return -EADDRINUSE;
+   return ERR_PTR(r);
+   node = interval_tree_iter_first(&vm->va, start, last);
+   if (node) {
+   pr_debug("range [0x%llx 0x%llx] already TTM mapped\n",
+start, last);
+   mapping = container_of((struct rb_node *)node,
+  struct amdgpu_bo_va_mapping, rb);
+   bo = mapping->bo_va->base.bo;
+   goto out_unreserve;
+   }
+
+   /* Check userptr by searching entire vm->va interval tree */
+   node = interval_tree_iter_first(&vm->va, 0, ~0ULL);
+   while (node) {
+   mapping = container_of((struct rb_node *)node,
+  struct amdgpu_bo_va_mapping, rb);
+   bo = mapping->bo_va->base.bo;
+
+   if (amdgpu_ttm_tt_affect_userptr(bo->tbo.ttm,
+start << PAGE_SHIFT,
+last << PAGE_SHIFT)) {
+   pr_debug("[0x%llx 0x%llx] userptr mapped\n",
+start, last);
+ 

[PATCH 4/4] drm/amdkfd: create svm unregister range not overlap with userptr

2021-10-06 Thread Philip Yang
When creating a new svm range to recover a retry fault, keep the svm
range from overlapping with ranges or userptr ranges managed by TTM;
otherwise svm migration will trigger TTM or userptr eviction and evict
user queues unexpectedly.

Add helper amdgpu_ttm_tt_get_userptr because amdgpu_ttm_tt structure is
not accessed from outside of amdgpu_ttm.c.

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 13 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c| 28 -
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index e2896ac2c9ce..93952e1bce5b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1251,6 +1251,19 @@ bool amdgpu_ttm_tt_is_userptr(struct ttm_tt *ttm)
return true;
 }
 
+/*
+ * amdgpu_ttm_tt_get_userptr - get userptr of the address range
+ */
+uint64_t amdgpu_ttm_tt_get_userptr(struct ttm_tt *ttm)
+{
+   struct amdgpu_ttm_tt *gtt = (void *)ttm;
+
+   if (gtt == NULL)
+   return 0;
+   return  gtt->userptr;
+}
+
+
 /*
  * amdgpu_ttm_tt_is_readonly - Is the ttm_tt object read only?
  */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index e69f3e8e06e5..1dd1a882280d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -186,6 +186,7 @@ bool amdgpu_ttm_tt_affect_userptr(struct ttm_tt *ttm, 
unsigned long start,
 bool amdgpu_ttm_tt_userptr_invalidated(struct ttm_tt *ttm,
   int *last_invalidated);
 bool amdgpu_ttm_tt_is_userptr(struct ttm_tt *ttm);
+uint64_t amdgpu_ttm_tt_get_userptr(struct ttm_tt *ttm);
 bool amdgpu_ttm_tt_is_readonly(struct ttm_tt *ttm);
 uint64_t amdgpu_ttm_tt_pde_flags(struct ttm_tt *ttm, struct ttm_resource *mem);
 uint64_t amdgpu_ttm_tt_pte_flags(struct amdgpu_device *adev, struct ttm_tt 
*ttm,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index d49c08618714..a2eb21deb06f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -45,6 +45,8 @@ static bool
 svm_range_cpu_invalidate_pagetables(struct mmu_interval_notifier *mni,
const struct mmu_notifier_range *range,
unsigned long cur_seq);
+static struct amdgpu_bo *
+svm_range_check_vm(struct kfd_process *p, uint64_t start, uint64_t last);
 
 static const struct mmu_interval_notifier_ops svm_range_mn_ops = {
.invalidate = svm_range_cpu_invalidate_pagetables,
@@ -2308,6 +2310,7 @@ svm_range_best_restore_location(struct svm_range *prange,
 
return -1;
 }
+
 static int
 svm_range_get_range_boundaries(struct kfd_process *p, int64_t addr,
unsigned long *start, unsigned long *last)
@@ -2355,8 +2358,8 @@ svm_range_get_range_boundaries(struct kfd_process *p, 
int64_t addr,
  vma->vm_end >> PAGE_SHIFT, *last);
 
return 0;
-
 }
+
 static struct
 svm_range *svm_range_create_unregistered_range(struct amdgpu_device *adev,
struct kfd_process *p,
@@ -2366,10 +2369,33 @@ svm_range *svm_range_create_unregistered_range(struct 
amdgpu_device *adev,
struct svm_range *prange = NULL;
unsigned long start, last;
uint32_t gpuid, gpuidx;
+   struct amdgpu_bo *bo;
 
	if (svm_range_get_range_boundaries(p, addr, &start, &last))
return NULL;
 
+   bo = svm_range_check_vm(p, start, last);
+   if (bo) {
+   struct ttm_tt *ttm = bo->tbo.ttm;
+   unsigned long bo_s, bo_l;
+
+   if (amdgpu_ttm_tt_is_userptr(ttm)) {
+   bo_s = amdgpu_ttm_tt_get_userptr(ttm) >> PAGE_SHIFT;
+   bo_l = bo_s + ttm->num_pages - 1;
+   pr_debug("overlap userptr [0x%lx 0x%lx]\n", bo_s, bo_l);
+   } else {
+   bo_s = bo->kfd_bo->va;
+   bo_l = bo_s + ttm->num_pages - 1;
+   pr_debug("overlap range [0x%lx 0x%lx]\n", bo_s, bo_l);
+   }
+   if (addr >= bo_s && addr <= bo_l)
+   return NULL;
+   if (addr < bo_s)
+   last = bo_s - 1;
+   if (addr > bo_l)
+   start = bo_l + 1;
+   }
+
prange = svm_range_new(>svms, start, last);
if (!prange) {
pr_debug("Failed to create prange in address [0x%llx]\n", addr);
-- 
2.17.1
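
A quick illustration of the clipping logic above, with hypothetical page
numbers (not taken from the patch):

    /* Retry fault at addr 0x1200, candidate range [0x1000 0x1fff], and a
     * userptr BO occupying [0x1400 0x17ff]: addr < bo_s, so the new svm
     * range is clipped to [0x1000 0x13ff] and no longer overlaps the BO. */
    unsigned long start = 0x1000, last = 0x1fff, addr = 0x1200;
    unsigned long bo_s = 0x1400, bo_l = 0x17ff;

    if (addr < bo_s)
            last = bo_s - 1;        /* last becomes 0x13ff */
    else if (addr > bo_l)
            start = bo_l + 1;       /* not taken in this example */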



[PATCH 1/4] drm/amdkfd: ratelimited svm debug messages

2021-10-06 Thread Philip Yang
No functional change. Use pr_debug_ratelimited to avoid per-page debug
messages overflowing the dmesg buffer and console log.

Use dev_err to report error messages from unexpected situations, to
provide clues for debugging without enabling dynamic debug logging.

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 34 
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 12 -
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index f53e17a94ad8..069422337cf7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -151,14 +151,14 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, 
dma_addr_t *sys,
gart_d = svm_migrate_direct_mapping_addr(adev, *vram);
}
if (r) {
-   pr_debug("failed %d to create gart mapping\n", r);
+   dev_err(adev->dev, "fail %d create gart mapping\n", r);
goto out_unlock;
}
 
r = amdgpu_copy_buffer(ring, gart_s, gart_d, size * PAGE_SIZE,
			       NULL, &next, false, true, false);
if (r) {
-   pr_debug("failed %d to copy memory\n", r);
+   dev_err(adev->dev, "fail %d to copy memory\n", r);
goto out_unlock;
}
 
@@ -285,7 +285,7 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
 
r = svm_range_vram_node_new(adev, prange, true);
if (r) {
-   pr_debug("failed %d get 0x%llx pages from vram\n", r, npages);
+   dev_err(adev->dev, "fail %d get %lld vram pages\n", r, npages);
goto out;
}
 
@@ -305,7 +305,7 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
  DMA_TO_DEVICE);
r = dma_mapping_error(dev, src[i]);
if (r) {
-   pr_debug("failed %d dma_map_page\n", r);
+   dev_err(adev->dev, "fail %d dma_map_page\n", r);
goto out_free_vram_pages;
}
} else {
@@ -325,8 +325,8 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
continue;
}
 
-   pr_debug("dma mapping src to 0x%llx, page_to_pfn 0x%lx\n",
-src[i] >> PAGE_SHIFT, page_to_pfn(spage));
+   pr_debug_ratelimited("dma mapping src to 0x%llx, pfn 0x%lx\n",
+src[i] >> PAGE_SHIFT, page_to_pfn(spage));
 
if (j >= (cursor.size >> PAGE_SHIFT) - 1 && i < npages - 1) {
r = svm_migrate_copy_memory_gart(adev, src + i - j,
@@ -347,7 +347,7 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
 
 out_free_vram_pages:
if (r) {
-   pr_debug("failed %d to copy memory to vram\n", r);
+   dev_err(adev->dev, "fail %d to copy memory to vram\n", r);
while (i--) {
svm_migrate_put_vram_page(adev, dst[i]);
migrate->dst[i] = 0;
@@ -405,8 +405,8 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
 
	r = migrate_vma_setup(&migrate);
if (r) {
-   pr_debug("failed %d prepare migrate svms 0x%p [0x%lx 0x%lx]\n",
-r, prange->svms, prange->start, prange->last);
+   dev_err(adev->dev, "fail %d vma setup 0x%p [0x%lx 0x%lx]\n",
+   r, prange->svms, prange->start, prange->last);
goto out_free;
}
if (migrate.cpages != npages) {
@@ -506,7 +506,7 @@ static void svm_migrate_page_free(struct page *page)
struct svm_range_bo *svm_bo = page->zone_device_data;
 
if (svm_bo) {
-   pr_debug("svm_bo ref left: %d\n", kref_read(&svm_bo->kref));
+   pr_debug_ratelimited("ref: %d\n", kref_read(&svm_bo->kref));
svm_range_bo_unref(svm_bo);
}
 }
@@ -563,8 +563,8 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct 
svm_range *prange,
 
dpage = svm_migrate_get_sys_page(migrate->vma, addr);
if (!dpage) {
-   pr_debug("failed get page svms 0x%p [0x%lx 0x%lx]\n",
-prange->svms, prange->start, prange->last);
+   dev_err(adev->dev, "fail get page 0x%p [0x%lx 0x%lx]\n",
+   prange->svms, prange->start, prange->last);
r = -ENOMEM;
goto out_oom;
}
@@ -572,12 +572,12 @@ svm_migrate_copy_to_ram(struct amdgpu_device 

[PATCH 2/4] drm/amdkfd: handle partial migration cpages 0

2021-10-06 Thread Philip Yang
migrate_vma_setup may return cpages == 0, meaning no pages can be
migrated. Treat this as an error to skip the remaining migration steps,
and don't change the prange actual loc, to avoid the warning message
"VRAM BO missing during validation".

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 48 ++--
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 069422337cf7..9b68e3e8f2a1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -409,20 +409,25 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, 
struct svm_range *prange,
r, prange->svms, prange->start, prange->last);
goto out_free;
}
-   if (migrate.cpages != npages) {
-   pr_debug("Partial migration. 0x%lx/0x%llx pages can be 
migrated\n",
-migrate.cpages,
-npages);
-   }
 
-   if (migrate.cpages) {
-   r = svm_migrate_copy_to_vram(adev, prange, &migrate, &mfence,
-				scratch);
-   migrate_vma_pages(&migrate);
-   svm_migrate_copy_done(adev, mfence);
-   migrate_vma_finalize(&migrate);
+   if (migrate.cpages != npages)
+   pr_debug("partial migration, 0x%lx/0x%llx pages migrated\n",
+migrate.cpages, npages);
+   else
+   pr_debug("0x%lx pages migrated\n", migrate.cpages);
+
+   if (!migrate.cpages) {
+   pr_debug("failed collect migrate sys pages [0x%lx 0x%lx]\n",
+prange->start, prange->last);
+   r = -ENOMEM;
+   goto out_free;
}
 
+   r = svm_migrate_copy_to_vram(adev, prange, &migrate, &mfence, scratch);
+   migrate_vma_pages(&migrate);
+   svm_migrate_copy_done(adev, mfence);
+   migrate_vma_finalize(&migrate);
+
svm_range_dma_unmap(adev->dev, scratch, 0, npages);
svm_range_free_dma_mappings(prange);
 
@@ -636,19 +641,24 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct 
svm_range *prange,
goto out_free;
}
 
-   pr_debug("cpages %ld\n", migrate.cpages);
+   if (migrate.cpages != npages)
+   pr_debug("partial migration, 0x%lx/0x%llx pages migrated\n",
+migrate.cpages, npages);
+   else
+   pr_debug("0x%lx pages migrated\n", migrate.cpages);
 
-   if (migrate.cpages) {
-   r = svm_migrate_copy_to_ram(adev, prange, &migrate, &mfence,
-   scratch, npages);
-   migrate_vma_pages(&migrate);
-   svm_migrate_copy_done(adev, mfence);
-   migrate_vma_finalize(&migrate);
-   } else {
+   if (!migrate.cpages) {
pr_debug("failed collect migrate device pages [0x%lx 0x%lx]\n",
 prange->start, prange->last);
+   r = -ENOMEM;
+   goto out_free;
}
 
+   r = svm_migrate_copy_to_ram(adev, prange, &migrate, &mfence,
+   scratch, npages);
+   migrate_vma_pages(&migrate);
+   svm_migrate_copy_done(adev, mfence);
+   migrate_vma_finalize(&migrate);
svm_range_dma_unmap(adev->dev, scratch, 0, npages);
 
 out_free:
-- 
2.17.1
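
Reduced to a sketch, the control flow the patch settles on is (illustrative
only; the vram-to-ram and ram-to-vram variants differ in the copy helper):

    r = migrate_vma_setup(&migrate);
    if (r)
            goto out_free;          /* setup itself failed */
    if (!migrate.cpages) {
            r = -ENOMEM;            /* 0 of npages collected: bail out */
            goto out_free;          /* skip copy/pages/finalize entirely */
    }
    r = svm_migrate_copy_to_vram(adev, prange, &migrate, &mfence, scratch);
    migrate_vma_pages(&migrate);
    svm_migrate_copy_done(adev, mfence);
    migrate_vma_finalize(&migrate);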



Re: [PATCH v3 2/2] amd/display: only require overlay plane to cover whole CRTC on ChromeOS

2021-10-06 Thread Simon Ser
> current->comm is "DrmThread" on my ChromeOS system.

Oops! I forgot that current->comm could be overwritten by user-space. Sent v4
which should address this by checking the executable name too.
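
For context, current->comm is trivially spoofed from user space, which is
why the thread name alone is a weak signal:

    #include <sys/prctl.h>

    /* Any process can rename its own thread to "DrmThread". */
    prctl(PR_SET_NAME, "DrmThread", 0, 0, 0);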


[PATCH v4 2/2] amd/display: only require overlay plane to cover whole CRTC on ChromeOS

2021-10-06 Thread Simon Ser
Commit ddab8bd788f5 ("drm/amd/display: Fix two cursor duplication when
using overlay") changed the atomic validation code to forbid the
overlay plane from being used if it doesn't cover the whole CRTC. The
motivation is that ChromeOS uses the atomic API for everything except
the cursor plane (which uses the legacy API). Thus amdgpu must always
be prepared to enable/disable/move the cursor plane at any time without
failing (or else ChromeOS will trip over).

As discussed in [1], there's no reason why the ChromeOS limitation
should prevent other fully atomic users from taking advantage of the
overlay plane. Let's limit the check to ChromeOS.

v4: fix ChromeOS detection (Harry)

[1]: 
https://lore.kernel.org/amd-gfx/JIQ_93_cHcshiIDsrMU1huBzx9P9LVQxucx8hQArpQu7Wk5DrCl_vTXj_Q20m_L-8C8A5dSpNcSJ8ehfcCrsQpfB5QG_Spn14EYkH9chtg0=@emersion.fr/

Signed-off-by: Simon Ser 
Cc: Alex Deucher 
Cc: Harry Wentland 
Cc: Nicholas Kazlauskas 
Cc: Bas Nieuwenhuizen 
Cc: Rodrigo Siqueira 
Cc: Sean Paul 
Fixes: ddab8bd788f5 ("drm/amd/display: Fix two cursor duplication when using 
overlay")
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 24 +++
 1 file changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 5746980454e5..0b80f779e706 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -10595,6 +10595,26 @@ static int add_affected_mst_dsc_crtcs(struct 
drm_atomic_state *state, struct drm
 }
 #endif
 
+static bool is_chromeos(void)
+{
+   struct mm_struct *mm = current->mm;
+   struct file *exe_file;
+   bool ret;
+
+   /* ChromeOS renames its thread to DrmThread. Also check the executable
+* name. */
+   if (strcmp(current->comm, "DrmThread") != 0 || !mm)
+   return false;
+
+   exe_file = get_mm_exe_file(mm);
+   if (!exe_file)
+   return false;
+   ret = strcmp(exe_file->f_path.dentry->d_name.name, "chrome") == 0;
+   fput(exe_file);
+
+   return ret;
+}
+
 static int validate_overlay(struct drm_atomic_state *state)
 {
int i;
@@ -10602,6 +10622,10 @@ static int validate_overlay(struct drm_atomic_state 
*state)
struct drm_plane_state *new_plane_state;
struct drm_plane_state *primary_state, *overlay_state = NULL;
 
+   /* This is a workaround for ChromeOS only */
+   if (!is_chromeos())
+   return 0;
+
/* Check if primary plane is contained inside overlay */
for_each_new_plane_in_state_reverse(state, plane, new_plane_state, i) {
if (plane->type == DRM_PLANE_TYPE_OVERLAY) {
-- 
2.33.0




[PATCH v4 1/2] amd/display: check cursor plane matches underlying plane

2021-10-06 Thread Simon Ser
The current logic checks whether the cursor plane blending
properties match the primary plane's. However that's wrong,
because the cursor is painted on all planes underneath. If
the cursor is over the primary plane and the overlay plane,
it's painted on both pipes.

Iterate over the CRTC planes and check their scaling match
the cursor's.

v4: fix typo in commit message (Harry)

Signed-off-by: Simon Ser 
Cc: Alex Deucher 
Cc: Harry Wentland 
Cc: Nicholas Kazlauskas 
Cc: Bas Nieuwenhuizen 
Cc: Rodrigo Siqueira 
Cc: Sean Paul 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 49 +--
 1 file changed, 34 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index a399a984b8a6..5746980454e5 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -10517,18 +10517,18 @@ static int dm_check_crtc_cursor(struct 
drm_atomic_state *state,
struct drm_crtc *crtc,
struct drm_crtc_state *new_crtc_state)
 {
-   struct drm_plane_state *new_cursor_state, *new_primary_state;
-   int cursor_scale_w, cursor_scale_h, primary_scale_w, primary_scale_h;
+   struct drm_plane *cursor = crtc->cursor, *underlying;
+   struct drm_plane_state *new_cursor_state, *new_underlying_state;
+   int i;
+   int cursor_scale_w, cursor_scale_h, underlying_scale_w, 
underlying_scale_h;
 
/* On DCE and DCN there is no dedicated hardware cursor plane. We get a
 * cursor per pipe but it's going to inherit the scaling and
 * positioning from the underlying pipe. Check the cursor plane's
-* blending properties match the primary plane's. */
+* blending properties match the underlying planes'. */
 
-   new_cursor_state = drm_atomic_get_new_plane_state(state, crtc->cursor);
-   new_primary_state = drm_atomic_get_new_plane_state(state, 
crtc->primary);
-   if (!new_cursor_state || !new_primary_state ||
-   !new_cursor_state->fb || !new_primary_state->fb) {
+   new_cursor_state = drm_atomic_get_new_plane_state(state, cursor);
+   if (!new_cursor_state || !new_cursor_state->fb) {
return 0;
}
 
@@ -10537,15 +10537,34 @@ static int dm_check_crtc_cursor(struct 
drm_atomic_state *state,
cursor_scale_h = new_cursor_state->crtc_h * 1000 /
 (new_cursor_state->src_h >> 16);
 
-   primary_scale_w = new_primary_state->crtc_w * 1000 /
-(new_primary_state->src_w >> 16);
-   primary_scale_h = new_primary_state->crtc_h * 1000 /
-(new_primary_state->src_h >> 16);
+   for_each_new_plane_in_state_reverse(state, underlying, 
new_underlying_state, i) {
+   /* Narrow down to non-cursor planes on the same CRTC as the 
cursor */
+   if (new_underlying_state->crtc != crtc || underlying == 
crtc->cursor)
+   continue;
 
-   if (cursor_scale_w != primary_scale_w ||
-   cursor_scale_h != primary_scale_h) {
-   drm_dbg_atomic(crtc->dev, "Cursor plane scaling doesn't match 
primary plane\n");
-   return -EINVAL;
+   /* Ignore disabled planes */
+   if (!new_underlying_state->fb)
+   continue;
+
+   underlying_scale_w = new_underlying_state->crtc_w * 1000 /
+(new_underlying_state->src_w >> 16);
+   underlying_scale_h = new_underlying_state->crtc_h * 1000 /
+(new_underlying_state->src_h >> 16);
+
+   if (cursor_scale_w != underlying_scale_w ||
+   cursor_scale_h != underlying_scale_h) {
+   drm_dbg_atomic(crtc->dev,
+  "Cursor [PLANE:%d:%s] scaling doesn't 
match underlying [PLANE:%d:%s]\n",
+  cursor->base.id, cursor->name, 
underlying->base.id, underlying->name);
+   return -EINVAL;
+   }
+
+   /* If this plane covers the whole CRTC, no need to check planes 
underneath */
+   if (new_underlying_state->crtc_x <= 0 &&
+   new_underlying_state->crtc_y <= 0 &&
+   new_underlying_state->crtc_x + new_underlying_state->crtc_w 
>= new_crtc_state->mode.hdisplay &&
+   new_underlying_state->crtc_y + new_underlying_state->crtc_h 
>= new_crtc_state->mode.vdisplay)
+   break;
}
 
return 0;
-- 
2.33.0
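
A quick numeric check of the fixed-point scale computation above, using
hypothetical plane sizes (src_w/src_h are 16.16 fixed point, hence the
>> 16 in the patch):

    /* 64x64 cursor shown 1:1: 64 * 1000 / 64 = 1000.
     * Underlying plane sourcing 1920 pixels wide but scanned out at
     * 3840: 3840 * 1000 / 1920 = 2000. Mismatch -> -EINVAL. */
    int cursor_scale_w = 64 * 1000 / 64;            /* 1000 */
    int underlying_scale_w = 3840 * 1000 / 1920;    /* 2000 */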




Re: [PATCH v2 19/23] drm/amd/display: Add debug flags for USB4 DP link training

2021-10-06 Thread Harry Wentland



On 2021-10-06 06:14, Lin, Wayne wrote:
> [Public]
> 
>> -Original Message-
>> From: Wentland, Harry 
>> Sent: Wednesday, October 6, 2021 1:11 AM
>> To: Lin, Wayne ; amd-gfx@lists.freedesktop.org
>> Cc: Deucher, Alexander ; Kazlauskas, Nicholas 
>> ; Siqueira, Rodrigo
>> ; Wang, Chao-kai (Stylon) ; 
>> Shih, Jude ; Kizito, Jimmy
>> ; Somasundaram, Meenakshikumar 
>> ; Lei, Jun 
>> Subject: Re: [PATCH v2 19/23] drm/amd/display: Add debug flags for USB4 DP 
>> link training
>>
>>
>>
>> On 2021-10-05 03:52, Wayne Lin wrote:
>>> From: Jimmy Kizito 
>>>
>>> [Why & How]
>>> Additional debug flags that can be useful for testing USB4 DP link
>>> training.
>>>
>>> Add flags:
>>> - 0x2 : Forces USB4 DP link to non-LTTPR mode
>>> - 0x4 : Extends status read intervals to about 60s.
>>>
>>> Reviewed-by: Meenakshikumar Somasundaram
>>> 
>>> Reviewed-by: Jun Lei 
>>> Acked-by: Wayne Lin 
>>> Acked-by: Nicholas Kazlauskas 
>>> Signed-off-by: Jimmy Kizito 
>>> ---
>>>  drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c   | 6 ++
>>>  drivers/gpu/drm/amd/display/dc/core/dc_link_dpia.c | 6 ++
>>>  drivers/gpu/drm/amd/display/dc/dc.h| 4 +++-
>>>  drivers/gpu/drm/amd/display/dc/inc/dc_link_dpia.h  | 3 +++
>>>  4 files changed, 18 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
>>> b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
>>> index bfba1d2c6a18..423fbd2b9b39 100644
>>> --- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
>>> +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
>>> @@ -4528,6 +4528,12 @@ bool dp_retrieve_lttpr_cap(struct dc_link *link)
>>> else
>>> link->lttpr_mode = LTTPR_MODE_NON_TRANSPARENT;
>>> }
>>> +#if defined(CONFIG_DRM_AMD_DC_DCN)
>>
>> Why is this guarded with DC_DCN when all other DPIA code isn't?
>> It looks like it might be unnecessary.
> Thanks Harry.
> 
> Since declaration of dpia_debug variable is guarded by CONFIG_DRM_AMD_DC_DCN,
> we should keep this here.
> 

Ah, that's the one I was missing.

We could probably move it out of the DCN guard in patch 16 but
that can be done with a follow-up patch.

Technically DPIA only makes sense for DCN but there is no reason
to guard it specifically for DCN. The only reason we have the DCN
guard is to allow builds of our driver without floating point on
older HW. I wonder if that's even still needed since we now have
the fixups of the floating point stuff for PPC and ARM.

Harry

> Thanks!
>>
>>> +   /* Check DP tunnel LTTPR mode debug option. */
>>> +   if (link->ep_type == DISPLAY_ENDPOINT_USB4_DPIA &&
>>> +   link->dc->debug.dpia_debug.bits.force_non_lttpr)
>>> +   link->lttpr_mode = LTTPR_MODE_NON_LTTPR;
>>> +#endif
>>>
>>> if (link->lttpr_mode == LTTPR_MODE_NON_TRANSPARENT || link->lttpr_mode 
>>> == LTTPR_MODE_TRANSPARENT) {
>>> /* By reading LTTPR capability, RX assumes that we will enable 
>>> diff
>>> --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dpia.c
>>> b/drivers/gpu/drm/amd/display/dc/core/dc_link_dpia.c
>>> index 7407c755a73e..ce15a38c2aea 100644
>>> --- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dpia.c
>>> +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dpia.c
>>> @@ -528,6 +528,12 @@ static uint32_t dpia_get_eq_aux_rd_interval(const 
>>> struct dc_link *link,
>>> dp_translate_training_aux_read_interval(
>>> 
>>> link->dpcd_caps.lttpr_caps.aux_rd_interval[hop - 1]);
>>>
>>> +#if defined(CONFIG_DRM_AMD_DC_DCN)
>>
>> Same here. Please drop this guard if we don't need it.
>>
>> Harry
>>
>>> +   /* Check debug option for extending aux read interval. */
>>> +   if (link->dc->debug.dpia_debug.bits.extend_aux_rd_interval)
>>> +   wait_time_microsec = DPIA_DEBUG_EXTENDED_AUX_RD_INTERVAL_US;
>>> +#endif
>>> +
>>> return wait_time_microsec;
>>>  }
>>>
>>> diff --git a/drivers/gpu/drm/amd/display/dc/dc.h
>>> b/drivers/gpu/drm/amd/display/dc/dc.h
>>> index e3f884942e04..86fa94a2ef48 100644
>>> --- a/drivers/gpu/drm/amd/display/dc/dc.h
>>> +++ b/drivers/gpu/drm/amd/display/dc/dc.h
>>> @@ -499,7 +499,9 @@ union root_clock_optimization_options {  union
>>> dpia_debug_options {
>>> struct {
>>> uint32_t disable_dpia:1;
>>> -   uint32_t reserved:31;
>>> +   uint32_t force_non_lttpr:1;
>>> +   uint32_t extend_aux_rd_interval:1;
>>> +   uint32_t reserved:29;
>>> } bits;
>>> uint32_t raw;
>>>  };
>>> diff --git a/drivers/gpu/drm/amd/display/dc/inc/dc_link_dpia.h
>>> b/drivers/gpu/drm/amd/display/dc/inc/dc_link_dpia.h
>>> index 790b904e37e1..e3dfe4c89ce0 100644
>>> --- a/drivers/gpu/drm/amd/display/dc/inc/dc_link_dpia.h
>>> +++ b/drivers/gpu/drm/amd/display/dc/inc/dc_link_dpia.h
>>> @@ -34,6 +34,9 @@ struct dc_link_settings;
>>>  /* The approximate time (us) it takes to transmit 9 USB4 DP clock sync packets. */
>>>  #define DPIA_CLK_SYNC_DELAY 16000
>>>
>>> +/* Extend 

Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Tom Lendacky

On 10/6/21 8:23 AM, Alex Deucher wrote:

On Wed, Oct 6, 2021 at 5:42 AM Borislav Petkov  wrote:


On Tue, Oct 05, 2021 at 10:48:15AM -0400, Alex Deucher wrote:

It's not incompatible per se, but SME requires the IOMMU be enabled
because the C bit used for encryption is beyond the dma_mask of most
devices.  If the C bit is not set, the en/decryption for DMA doesn't
occur.  So you need IOMMU to be enabled in remapping mode to use SME
with most devices.  Raven has further requirements in that it requires
IOMMUv2 functionality to support some features which currently uses a
direct mapping in the IOMMU and hence the C bit is not properly
handled.


So lemme ask you this: do Raven-containing systems exist out there which
don't have IOMMUv2 functionality and which can cause boot failures when
SME is enabled in the kernel .config?


There could be some OEM systems that disable the IOMMU on the platform
and don't provide a switch in the bios to enable it.  The GPU driver
will still work in that case, it will just not be able to enable KFD
support for ROCm compute.  SME won't work for most devices in that
case however since most devices have a DMA mask too small to handle
the C bit for encryption.  SME should be dependent on IOMMU being
enabled.


That's not completely true. If the IOMMU is not enabled (off or in 
passthrough mode), then the DMA api will check the DMA mask and use 
SWIOTLB to bounce the DMA if the device doesn't support DMA at the 
position where the c-bit is located (see force_dma_unencrypted() in 
arch/x86/mm/mem_encrypt.c).
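
Roughly, that check compares the device's effective DMA mask against the
position of the C bit; a simplified sketch (not the verbatim kernel code):

	/* sme_me_mask has only the C bit set; a device whose mask cannot
	 * cover that bit must have its DMA bounced through SWIOTLB. */
	u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
	u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
					dev->bus_dma_limit);

	if (dma_dev_mask <= dma_enc_mask)
		return true;	/* force unencrypted (bounced) DMA */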


To avoid bounce buffering, though, commit 2cc13bb4f59f was introduced to 
disable passthrough mode when SME is active (unless iommu=pt was 
explicitly specified).


Thanks,
Tom





IOW, can we handle this at boot time properly, i.e., disable SME if we
detect Raven or IOMMUv2 support is missing?

If not, then we really will have to change the default.


I'm not an SME expert, but I thought that that was already the case.
We just added the error condition in the GPU driver to prevent the
driver from loading when the user forced SME on.  IIRC, there were
users that cared more about SME than graphics support.

Alex



Thx.

--
Regards/Gruss,
 Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Wed, Oct 06, 2021 at 09:23:22AM -0400, Alex Deucher wrote:
> There could be some OEM systems that disable the IOMMU on the platform
> and don't provide a switch in the bios to enable it.  The GPU driver
> will still work in that case, it will just not be able to enable KFD
> support for ROCm compute.  SME won't work for most devices in that
> case however since most devices have a DMA mask too small to handle
> the C bit for encryption.  SME should be dependent on IOMMU being
> enabled.

Yeah, I'd let you hash this out with Tom.

> I'm not an SME expert, but I thought that that was already the case.

Yeah, I think Paul wants this:

---
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b79e88ee6627..e94c2df7a043 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1518,7 +1518,6 @@ config AMD_MEM_ENCRYPT
 
 config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
bool "Activate AMD Secure Memory Encryption (SME) by default"
-   default y
depends on AMD_MEM_ENCRYPT
help
  Say yes to have system memory encrypted by default if running on

---

The reason we did this is so that you don't have to supply
mem_encrypt=on on the cmdline, but we didn't anticipate any such fun with
some devices.

> We just added the error condition in the GPU driver to prevent the
> driver from loading when the user forced SME on.  IIRC, there were
> users that cared more about SME than graphics support.

Well, it's a distro kernel so we should at least try to make everyone
happy. :)

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH 1/1] drm/amdgpu: return early if debugfs is not initialized

2021-10-06 Thread Das, Nirmoy



On 10/6/2021 1:55 PM, Lazar, Lijo wrote:



On 10/6/2021 3:21 PM, Nirmoy Das wrote:

Check first if debugfs is initialized before creating
amdgpu debugfs files.

References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686
Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 3 +++
  1 file changed, 3 insertions(+)



Sorry about another miss. There is one other option added in the patch.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a24c6f7bc923d5e2f3139855eb09b0d480d6b410 



"
config DEBUG_FS_DISALLOW_MOUNT
bool "Do not register debugfs as filesystem"
help
  The API is open but filesystem is not loaded. Clients can still 
do their work and read with debug tools that do not need debugfs 
filesystem.

"

This doesn't work under this mode. Guess we are not worried about this.



It does work with DEBUG_FS_DISALLOW_MOUNT. I tested it with that option.



Reviewed-by: Lijo Lazar 



Thanks,

Nirmoy



Thanks,
Lijo

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c

index 6611b3c7c149..5497e2d31d1a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1617,6 +1617,9 @@ int amdgpu_debugfs_init(struct amdgpu_device 
*adev)

  struct dentry *ent;
  int r, i;
  +    if (!debugfs_initialized())
+    return 0;
+
  ent = debugfs_create_file("amdgpu_preempt_ib", 0600, root, adev,
    &fops_ib_preempt);
  if (IS_ERR(ent)) {



Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Borislav Petkov
On Tue, Oct 05, 2021 at 10:48:15AM -0400, Alex Deucher wrote:
> It's not incompatible per se, but SME requires the IOMMU be enabled
> because the C bit used for encryption is beyond the dma_mask of most
> devices.  If the C bit is not set, the en/decryption for DMA doesn't
> occur.  So you need IOMMU to be enabled in remapping mode to use SME
> with most devices.  Raven has further requirements in that it requires
> IOMMUv2 functionality to support some features which currently uses a
> direct mapping in the IOMMU and hence the C bit is not properly
> handled.

So lemme ask you this: do Raven-containing systems exist out there which
don't have IOMMUv2 functionality and which can cause boot failures when
SME is enabled in the kernel .config?

IOW, can we handle this at boot time properly, i.e., disable SME if we
detect Raven or IOMMUv2 support is missing?

If not, then we really will have to change the default.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Alex Deucher
On Wed, Oct 6, 2021 at 5:42 AM Borislav Petkov  wrote:
>
> On Tue, Oct 05, 2021 at 10:48:15AM -0400, Alex Deucher wrote:
> > It's not incompatible per se, but SME requires the IOMMU be enabled
> > because the C bit used for encryption is beyond the dma_mask of most
> > devices.  If the C bit is not set, the en/decryption for DMA doesn't
> > occur.  So you need IOMMU to be enabled in remapping mode to use SME
> > with most devices.  Raven has further requirements in that it requires
> > IOMMUv2 functionality to support some features which currently uses a
> > direct mapping in the IOMMU and hence the C bit is not properly
> > handled.
>
> So lemme ask you this: do Raven-containing systems exist out there which
> don't have IOMMUv2 functionality and which can cause boot failures when
> SME is enabled in the kernel .config?

There could be some OEM systems that disable the IOMMU on the platform
and don't provide a switch in the bios to enable it.  The GPU driver
will still work in that case, it will just not be able to enable KFD
support for ROCm compute.  SME won't work for most devices in that
case however since most devices have a DMA mask too small to handle
the C bit for encryption.  SME should be dependent on IOMMU being
enabled.

>
> IOW, can we handle this at boot time properly, i.e., disable SME if we
> detect Raven or IOMMUv2 support is missing?
>
> If not, then we really will have to change the default.

I'm not an SME expert, but I thought that that was already the case.
We just added the error condition in the GPU driver to prevent the
driver from loading when the user forced SME on.  IIRC, there were
users that cared more about SME than graphics support.

Alex

>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH 1/1] drm/amdgpu: return early if debugfs is not initialized

2021-10-06 Thread Christian König

Am 06.10.21 um 13:55 schrieb Lazar, Lijo:



On 10/6/2021 3:21 PM, Nirmoy Das wrote:

Check first if debugfs is initialized before creating
amdgpu debugfs files.

References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686
Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 3 +++
  1 file changed, 3 insertions(+)



Sorry about another miss. There is one other option added in the patch.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a24c6f7bc923d5e2f3139855eb09b0d480d6b410 



"
config DEBUG_FS_DISALLOW_MOUNT
bool "Do not register debugfs as filesystem"
help
  The API is open but filesystem is not loaded. Clients can still 
do their work and read with debug tools that do not need debugfs 
filesystem.

"


I'm really wondering why it's useful to have debugfs enabled, but not 
mounted.




This doesn't work under this mode. Guess we are not worried about this.

Reviewed-by: Lijo Lazar 


Anyway Reviewed-by: Christian König 



Thanks,
Lijo

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c

index 6611b3c7c149..5497e2d31d1a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1617,6 +1617,9 @@ int amdgpu_debugfs_init(struct amdgpu_device 
*adev)

  struct dentry *ent;
  int r, i;
  +    if (!debugfs_initialized())
+    return 0;
+
  ent = debugfs_create_file("amdgpu_preempt_ib", 0600, root, adev,
    &fops_ib_preempt);
  if (IS_ERR(ent)) {





Re: [PATCH 1/1] drm/amdgpu: return early if debugfs is not initialized

2021-10-06 Thread Lazar, Lijo




On 10/6/2021 3:21 PM, Nirmoy Das wrote:

Check first if debugfs is initialized before creating
amdgpu debugfs files.

References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686
Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 3 +++
  1 file changed, 3 insertions(+)



Sorry about another miss. There is one other option added in the patch.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a24c6f7bc923d5e2f3139855eb09b0d480d6b410

"
config DEBUG_FS_DISALLOW_MOUNT
bool "Do not register debugfs as filesystem"
help
	  The API is open but filesystem is not loaded. Clients can still do 
their work and read with debug tools that do not need debugfs filesystem.

"

This doesn't work under this mode. Guess we are not worried about this.

Reviewed-by: Lijo Lazar 

Thanks,
Lijo


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 6611b3c7c149..5497e2d31d1a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1617,6 +1617,9 @@ int amdgpu_debugfs_init(struct amdgpu_device *adev)
struct dentry *ent;
int r, i;
  
+	if (!debugfs_initialized())
+		return 0;
+
ent = debugfs_create_file("amdgpu_preempt_ib", 0600, root, adev,
  &fops_ib_preempt);
if (IS_ERR(ent)) {



RE: [PATCH v2 00/23] USB4 DP tunneling

2021-10-06 Thread Lin, Wayne
[Public]

> -Original Message-
> From: Wentland, Harry 
> Sent: Wednesday, October 6, 2021 1:14 AM
> To: Lin, Wayne ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Kazlauskas, Nicholas 
> ; Siqueira, Rodrigo
> ; Wang, Chao-kai (Stylon) ; 
> Shih, Jude ; Kizito, Jimmy
> ; Somasundaram, Meenakshikumar 
> 
> Subject: Re: [PATCH v2 00/23] USB4 DP tunneling
>
> On 2021-10-05 03:51, Wayne Lin wrote:
> > USB4 runs over USB-C and can tunnel USB3, PCIe and DP protocols.
> >
> > A USB4 router is responsible for mapping Tunneled Protocol traffic to
> > USB4 packets and routing packets through the USB4 Fabric.
> > For this patchset, we have native DisplayPort able to be tunneled over
> > USB4 Fabric.
> >
> > E.g.
> > DP source -> DPIA (DP In Adapter) -> USB4 host router -> USB4 port ->
> > USB4 device router -> DPOA (DP Out Adapter) -> DPTX -> DP sink
> >
> > Briefly, there is a CM (Connection Manager) in the USB subsystem which
> > handles relevant USB4 channel configurations. Our DMCUB is responsible
> > for interacting with CM to control DPIA to enable Video Path & AUX
> > Path. Once DPIA gets into Paired state, DP source is then having a
> > constructed end-to-end path to interact with DP sink as the
> > conventional way.
> >
> > From DP Source perspective, the USB4 Fabric and the Adapters are
> > either totally transparent or act as an LTTPR. Besides, due to
> > constraints of USB4 protocols, AUX transactions under USB4 are now
> > handled by DMCUB to meet USB4 protocol requirements.
> >
> > Changes since v1:
> > * Give the description of rough working flow of USB4 DP tunneling
> >
>
> Thanks for the overview.
>
> Is DP-over-USB4, or our implementation of it, specific to AMD HW?
>
> Would it make sense to deal with DP-over-USB4 core functionality in DRM core 
> in a way that's common to all implementers?
Hi Harry,

In my opinion, it's just another connector interface. We already have
multiple interface types, HDMI/DP/VGA/Type-C (DP alt mode), and now we
have a new USB4 way to notify DP hotplug events.
Handling of DPIA/CM-related hotplug events should stay within USB4 scope,
so I think the current implementation should be fine.
>
> Have you run checkpatch.pl on all patches?
Yes, have run it.
>
> Patches 2-3, 9-18, 20-23 and
> Patches 1, 4-8 with the suggested title/description updates and Patch 19 with 
> the DCN guard dropped (or confirmation that it's needed) are
> Acked-by: Harry Wentland 
>
Thanks Harry!

> Harry
>
> > ---
> >
> > Jimmy Kizito (14):
> >   drm/amd/display: Update link encoder object creation.
> >   drm/amd/display: Support USB4 dynamic link encoder selection.
> >   drm/amd/display: Support USB4 for display endpoint control path.
> >   drm/amd/display: Support DP tunneling when DPRX detection
> >   drm/amd/display: Update training parameters for DPIA links
> >   drm/amd/display: Support USB4 when DP link training.
> >   drm/amd/display: Implement DPIA training loop
> >   drm/amd/display: Implement DPIA link configuration
> >   drm/amd/display: Implement DPIA clock recovery phase
> >   drm/amd/display: Implement DPIA equalisation phase
> >   drm/amd/display: Implement end of training for hop in DPIA display
> > path
> >   drm/amd/display: Read USB4 DP tunneling data from DPCD.
> >   drm/amd/display: Fix DIG_HPD_SELECT for USB4 display endpoints.
> >   drm/amd/display: Add debug flags for USB4 DP link training.
> >
> > Jude Shih (4):
> >   drm/amd/display: Support for SET_CONFIG processing with DMUB
> >   drm/amd/display: Deadlock/HPD Status/Crash Bug Fix
> >   drm/amd/display: Fix USB4 Aux via DMUB terminate unexpectedly
> >   drm/amd/display: USB4 bring up set correct address
> >
> > Meenakshikumar Somasundaram (5):
> >   drm/amd/display: USB4 DPIA enumeration and AUX Tunneling
> >   drm/amd/display: Support for DMUB HPD and HPD RX interrupt handling
> >   drm/amd/display: Support for SET_CONFIG processing with DMUB
> >   drm/amd/display: Add dpia debug options
> >   drm/amd/display: Fix for access for ddc pin and aux engine.
> >
> >  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 106 +-
> > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  12 +-
> > .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |  17 +-
> >  drivers/gpu/drm/amd/display/dc/Makefile   |   2 +-
> >  drivers/gpu/drm/amd/display/dc/core/dc.c  | 179 +++-
> >  drivers/gpu/drm/amd/display/dc/core/dc_link.c |  81 +-
> >  .../gpu/drm/amd/display/dc/core/dc_link_ddc.c |   9 +-
> >  .../gpu/drm/amd/display/dc/core/dc_link_dp.c  |  36 +-
> >  .../drm/amd/display/dc/core/dc_link_dpia.c| 945 ++
> >  drivers/gpu/drm/amd/display/dc/core/dc_stat.c |   8 +
> >  drivers/gpu/drm/amd/display/dc/dc.h   |  22 +
> >  drivers/gpu/drm/amd/display/dc/dc_dp_types.h  |  31 +
> >  drivers/gpu/drm/amd/display/dc/dc_types.h |   1 +
> >  drivers/gpu/drm/amd/display/dc/dce/dce_aux.c  |   3 +
> >  .../display/dc/dcn31/dcn31_dio_link_encoder.c | 126 ++-
> >  .../drm/amd/display/dc/dcn31/dcn31_hwseq.c|   6 +
> >  

RE: [PATCH v2 19/23] drm/amd/display: Add debug flags for USB4 DP link training

2021-10-06 Thread Lin, Wayne
[Public]

> -Original Message-
> From: Wentland, Harry 
> Sent: Wednesday, October 6, 2021 1:11 AM
> To: Lin, Wayne ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Kazlauskas, Nicholas 
> ; Siqueira, Rodrigo
> ; Wang, Chao-kai (Stylon) ; 
> Shih, Jude ; Kizito, Jimmy
> ; Somasundaram, Meenakshikumar 
> ; Lei, Jun 
> Subject: Re: [PATCH v2 19/23] drm/amd/display: Add debug flags for USB4 DP 
> link training
>
>
>
> On 2021-10-05 03:52, Wayne Lin wrote:
> > From: Jimmy Kizito 
> >
> > [Why & How]
> > Additional debug flags that can be useful for testing USB4 DP link
> > training.
> >
> > Add flags:
> > - 0x2 : Forces USB4 DP link to non-LTTPR mode
> > - 0x4 : Extends status read intervals to about 60s.
> >
> > Reviewed-by: Meenakshikumar Somasundaram
> > 
> > Reviewed-by: Jun Lei 
> > Acked-by: Wayne Lin 
> > Acked-by: Nicholas Kazlauskas 
> > Signed-off-by: Jimmy Kizito 
> > ---
> >  drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c   | 6 ++
> >  drivers/gpu/drm/amd/display/dc/core/dc_link_dpia.c | 6 ++
> >  drivers/gpu/drm/amd/display/dc/dc.h| 4 +++-
> >  drivers/gpu/drm/amd/display/dc/inc/dc_link_dpia.h  | 3 +++
> >  4 files changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> > b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> > index bfba1d2c6a18..423fbd2b9b39 100644
> > --- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> > +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> > @@ -4528,6 +4528,12 @@ bool dp_retrieve_lttpr_cap(struct dc_link *link)
> > else
> > link->lttpr_mode = LTTPR_MODE_NON_TRANSPARENT;
> > }
> > +#if defined(CONFIG_DRM_AMD_DC_DCN)
>
> Why is this guarded with DC_DCN when all other DPIA code isn't?
> It looks like it might be unnecessary.
Thanks Harry.

Since declaration of dpia_debug variable is guarded by CONFIG_DRM_AMD_DC_DCN,
we should keep this here.

Thanks!
>
> > +   /* Check DP tunnel LTTPR mode debug option. */
> > +   if (link->ep_type == DISPLAY_ENDPOINT_USB4_DPIA &&
> > +   link->dc->debug.dpia_debug.bits.force_non_lttpr)
> > +   link->lttpr_mode = LTTPR_MODE_NON_LTTPR;
> > +#endif
> >
> > if (link->lttpr_mode == LTTPR_MODE_NON_TRANSPARENT || link->lttpr_mode 
> > == LTTPR_MODE_TRANSPARENT) {
> > /* By reading LTTPR capability, RX assumes that we will enable 
> > diff
> > --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dpia.c
> > b/drivers/gpu/drm/amd/display/dc/core/dc_link_dpia.c
> > index 7407c755a73e..ce15a38c2aea 100644
> > --- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dpia.c
> > +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dpia.c
> > @@ -528,6 +528,12 @@ static uint32_t dpia_get_eq_aux_rd_interval(const 
> > struct dc_link *link,
> > dp_translate_training_aux_read_interval(
> > 
> > link->dpcd_caps.lttpr_caps.aux_rd_interval[hop - 1]);
> >
> > +#if defined(CONFIG_DRM_AMD_DC_DCN)
>
> Same here. Please drop this guard if we don't need it.
>
> Harry
>
> > +   /* Check debug option for extending aux read interval. */
> > +   if (link->dc->debug.dpia_debug.bits.extend_aux_rd_interval)
> > +   wait_time_microsec = DPIA_DEBUG_EXTENDED_AUX_RD_INTERVAL_US;
> > +#endif
> > +
> > return wait_time_microsec;
> >  }
> >
> > diff --git a/drivers/gpu/drm/amd/display/dc/dc.h
> > b/drivers/gpu/drm/amd/display/dc/dc.h
> > index e3f884942e04..86fa94a2ef48 100644
> > --- a/drivers/gpu/drm/amd/display/dc/dc.h
> > +++ b/drivers/gpu/drm/amd/display/dc/dc.h
> > @@ -499,7 +499,9 @@ union root_clock_optimization_options {  union
> > dpia_debug_options {
> > struct {
> > uint32_t disable_dpia:1;
> > -   uint32_t reserved:31;
> > +   uint32_t force_non_lttpr:1;
> > +   uint32_t extend_aux_rd_interval:1;
> > +   uint32_t reserved:29;
> > } bits;
> > uint32_t raw;
> >  };
> > diff --git a/drivers/gpu/drm/amd/display/dc/inc/dc_link_dpia.h
> > b/drivers/gpu/drm/amd/display/dc/inc/dc_link_dpia.h
> > index 790b904e37e1..e3dfe4c89ce0 100644
> > --- a/drivers/gpu/drm/amd/display/dc/inc/dc_link_dpia.h
> > +++ b/drivers/gpu/drm/amd/display/dc/inc/dc_link_dpia.h
> > @@ -34,6 +34,9 @@ struct dc_link_settings;
> >  /* The approximate time (us) it takes to transmit 9 USB4 DP clock sync packets. */
> >  #define DPIA_CLK_SYNC_DELAY 16000
> >
> > +/* Extend interval between training status checks for manual testing. */
> > +#define DPIA_DEBUG_EXTENDED_AUX_RD_INTERVAL_US 60000000
> > +
> >  /** @note Can remove once DP tunneling registers in upstream include/drm/drm_dp_helper.h */
> >  /* DPCD DP Tunneling over USB4 */
> >  #define DP_TUNNELING_CAPABILITIES_SUPPORT 0xe000d
> >
--
Regards,
Wayne
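
For anyone testing these flags, here is a self-contained sketch of how
the raw debug mask maps onto the union added to dc.h above (0x2 ->
force_non_lttpr, 0x4 -> extend_aux_rd_interval). The bitfield layout
shown assumes a typical little-endian GCC target, and the plumbing of
the raw value from amdgpu's module options into dc->debug is outside
this patch and not shown:

    #include <stdint.h>
    #include <stdio.h>

    /* Mirror of the dpia_debug_options union from the patch above. */
    union dpia_debug_options {
            struct {
                    uint32_t disable_dpia:1;           /* 0x1 */
                    uint32_t force_non_lttpr:1;        /* 0x2 */
                    uint32_t extend_aux_rd_interval:1; /* 0x4 */
                    uint32_t reserved:29;
            } bits;
            uint32_t raw;
    };

    int main(void)
    {
            union dpia_debug_options dbg = { .raw = 0x2 | 0x4 };

            printf("force_non_lttpr=%u extend_aux_rd_interval=%u\n",
                   (unsigned)dbg.bits.force_non_lttpr,
                   (unsigned)dbg.bits.extend_aux_rd_interval);
            return 0;
    }

Note that, as Wayne points out above, the union itself is only declared
under CONFIG_DRM_AMD_DC_DCN, which is why every new use of it carries
the same guard.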


[PATCH] drm/amd/display: move FPU associated DCN301 code to DML folder

2021-10-06 Thread Qingqing Zhuo
[Why & How]
As part of the FPU isolation work documented in
https://patchwork.freedesktop.org/series/93042/, isolate
code that uses the FPU in DCN301 to DML, where all FPU code
should live.

Cc: Christian König 
Cc: Harry Wentland 
Cc: Rodrigo Siqueira 
Tested-by: Zhan Liu 
Signed-off-by: Qingqing Zhuo 
---
 .../drm/amd/display/dc/dcn30/dcn30_resource.c |   2 +
 .../gpu/drm/amd/display/dc/dcn301/Makefile|  26 --
 .../amd/display/dc/dcn301/dcn301_resource.c   | 349 +---
 .../amd/display/dc/dcn301/dcn301_resource.h   |   3 +
 drivers/gpu/drm/amd/display/dc/dml/Makefile   |   3 +
 .../amd/display/dc/dml/dcn301/dcn301_fpu.c| 390 ++
 .../amd/display/dc/dml/dcn301/dcn301_fpu.h|  42 ++
 7 files changed, 450 insertions(+), 365 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.c
 create mode 100644 drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.h

diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
index 3a8a3214f770..dc9fa31f8d6c 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
@@ -2323,7 +2323,9 @@ bool dcn30_validate_bandwidth(struct dc *dc,
goto validate_out;
}
 
+   DC_FP_START();
dc->res_pool->funcs->calculate_wm_and_dlg(dc, context, pipes, pipe_cnt, vlevel);
+   DC_FP_END();
 
BW_VAL_TRACE_END_WATERMARKS();
 
diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/Makefile 
b/drivers/gpu/drm/amd/display/dc/dcn301/Makefile
index 09264716d1dc..7aa628c21973 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn301/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dcn301/Makefile
@@ -13,32 +13,6 @@
 DCN301 = dcn301_init.o dcn301_resource.o dcn301_dccg.o \
dcn301_dio_link_encoder.o dcn301_hwseq.o dcn301_panel_cntl.o 
dcn301_hubbub.o
 
-ifdef CONFIG_X86
-CFLAGS_$(AMDDALPATH)/dc/dcn301/dcn301_resource.o := -msse
-endif
-
-ifdef CONFIG_PPC64
-CFLAGS_$(AMDDALPATH)/dc/dcn301/dcn301_resource.o := -mhard-float -maltivec
-endif
-
-ifdef CONFIG_CC_IS_GCC
-ifeq ($(call cc-ifversion, -lt, 0701, y), y)
-IS_OLD_GCC = 1
-endif
-CFLAGS_$(AMDDALPATH)/dc/dcn301/dcn301_resource.o += -mhard-float
-endif
-
-ifdef CONFIG_X86
-ifdef IS_OLD_GCC
-# Stack alignment mismatch, proceed with caution.
-# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
-# (8B stack alignment).
-CFLAGS_$(AMDDALPATH)/dc/dcn301/dcn301_resource.o += -mpreferred-stack-boundary=4
-else
-CFLAGS_$(AMDDALPATH)/dc/dcn301/dcn301_resource.o += -msse2
-endif
-endif
-
 AMD_DAL_DCN301 = $(addprefix $(AMDDALPATH)/dc/dcn301/,$(DCN301))
 
 AMD_DISPLAY_FILES += $(AMD_DAL_DCN301)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
index 5350c93d7772..fbaa03f26d8b 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
@@ -82,6 +82,7 @@
 #include "dce/dce_i2c.h"
 
 #include "dml/dcn30/display_mode_vba_30.h"
+#include "dml/dcn301/dcn301_fpu.h"
 #include "vm_helper.h"
 #include "dcn20/dcn20_vmid.h"
 #include "amdgpu_socbb.h"
@@ -91,184 +92,6 @@
 
 #define DC_LOGGER_INIT(logger)
 
-struct _vcs_dpi_ip_params_st dcn3_01_ip = {
-   .odm_capable = 1,
-   .gpuvm_enable = 1,
-   .hostvm_enable = 1,
-   .gpuvm_max_page_table_levels = 1,
-   .hostvm_max_page_table_levels = 2,
-   .hostvm_cached_page_table_levels = 0,
-   .pte_group_size_bytes = 2048,
-   .num_dsc = 3,
-   .rob_buffer_size_kbytes = 184,
-   .det_buffer_size_kbytes = 184,
-   .dpte_buffer_size_in_pte_reqs_luma = 64,
-   .dpte_buffer_size_in_pte_reqs_chroma = 32,
-   .pde_proc_buffer_size_64k_reqs = 48,
-   .dpp_output_buffer_pixels = 2560,
-   .opp_output_buffer_lines = 1,
-   .pixel_chunk_size_kbytes = 8,
-   .meta_chunk_size_kbytes = 2,
-   .writeback_chunk_size_kbytes = 8,
-   .line_buffer_size_bits = 789504,
-   .is_line_buffer_bpp_fixed = 0,  // ?
-   .line_buffer_fixed_bpp = 48, // ?
-   .dcc_supported = true,
-   .writeback_interface_buffer_size_kbytes = 90,
-   .writeback_line_buffer_buffer_size = 656640,
-   .max_line_buffer_lines = 12,
-   .writeback_luma_buffer_size_kbytes = 12,  // writeback_line_buffer_buffer_size = 656640
-   .writeback_chroma_buffer_size_kbytes = 8,
-   .writeback_chroma_line_buffer_width_pixels = 4,
-   .writeback_max_hscl_ratio = 1,
-   .writeback_max_vscl_ratio = 1,
-   .writeback_min_hscl_ratio = 1,
-   .writeback_min_vscl_ratio = 1,
-   .writeback_max_hscl_taps = 1,
-   .writeback_max_vscl_taps = 1,
-   .writeback_line_buffer_luma_buffer_size = 0,
-   .writeback_line_buffer_chroma_buffer_size = 14643,
-   .cursor_buffer_size = 8,
-   .cursor_chunk_size = 2,
-   .max_num_otg = 4,
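
For context on the DC_FP_START()/DC_FP_END() pair added in
dcn30_validate_bandwidth above: on x86 these ultimately bracket the
floating-point code with kernel_fpu_begin()/kernel_fpu_end(), because
the kernel does not preserve FPU state for kernel code by default.
Below is a minimal sketch of the pattern with an illustrative helper
name; the real FPU users live under dml/ and are built with the
-msse/-mhard-float flags this patch moves out of the dcn301 Makefile:

    #include <asm/fpu/api.h>   /* kernel_fpu_begin()/kernel_fpu_end() */

    /* Any double math in kernel context must be bracketed like this;
     * DC_FP_START()/DC_FP_END() boil down to the same calls plus some
     * nesting bookkeeping. */
    static int example_khz_to_mhz_rounded(int clk_khz)
    {
            int mhz;

            kernel_fpu_begin();     /* save FPU state, disable preemption */
            mhz = (int)((clk_khz + 999.0) / 1000.0);
            kernel_fpu_end();       /* restore FPU state */

            return mhz;
    }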

[PATCH 1/1] drm/amdgpu: return early if debugfs is not initialized

2021-10-06 Thread Nirmoy Das
Check first whether debugfs is initialized before creating any
amdgpu debugfs files.

References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686
Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 6611b3c7c149..5497e2d31d1a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1617,6 +1617,9 @@ int amdgpu_debugfs_init(struct amdgpu_device *adev)
struct dentry *ent;
int r, i;
 
+   if (!debugfs_initialized())
+   return 0;
+
	ent = debugfs_create_file("amdgpu_preempt_ib", 0600, root, adev,
				  &fops_ib_preempt);
if (IS_ERR(ent)) {
-- 
2.32.0



Re: [PATCH 1/1] drm/amdgpu: ignore -EPERM error from debugfs

2021-10-06 Thread Das, Nirmoy



On 10/6/2021 8:59 AM, Christian König wrote:

Am 06.10.21 um 08:55 schrieb Lazar, Lijo:



On 10/6/2021 12:05 PM, Christian König wrote:

Am 06.10.21 um 08:32 schrieb Lazar, Lijo:



On 10/6/2021 11:49 AM, Christian König wrote:

Am 06.10.21 um 06:51 schrieb Lazar, Lijo:



On 10/5/2021 10:15 PM, Christian König wrote:

Am 05.10.21 um 15:49 schrieb Das, Nirmoy:


On 10/5/2021 3:22 PM, Christian König wrote:



Am 05.10.21 um 15:11 schrieb Nirmoy Das:

Debugfs core APIs will throw -EPERM when the user disables debugfs
using CONFIG_DEBUG_FS_ALLOW_NONE or with a kernel parameter. We
shouldn't treat that as an error. Also validate the drm root dentry
before creating amdgpu debugfs files.

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c

index 6611b3c7c149..d786072e918b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1617,6 +1617,16 @@ int amdgpu_debugfs_init(struct 
amdgpu_device *adev)

  struct dentry *ent;
  int r, i;
  +    if (IS_ERR(root)) {
+    /* When debugfs is disabled we get -EPERM which is not an
+     * error as this is user controllable.
+     */


Well setting primary->debugfs_root to an error code is 
probably not a good idea to begin with.


When debugfs is disabled that should most likely be NULL.



If we set primary->debugfs_root to NULL then we need to add a bunch
of NULL checks everywhere before creating any debugfs files, because
debugfs_create_{file|dir}() with a NULL root is still valid. I am
assuming a hypothetical case where debugfs_root dir creation fails
even though debugfs is enabled, but further calls are successful.
This won't be a problem if we propagate the error code.


Yeah, but an error code in members is ugly like hell and 
potentially causes crashes instead.


I strongly suggest fixing this so that root is NULL when debugfs
isn't available, and that we add proper checks for that instead.


This shouldn't be done. A NULL is a valid parent for the debugfs API.
An invalid parent is always checked like this:

  if (IS_ERR(parent))
    return parent;

Instead of adding redundant work like NULL checks, let the API do its
work and don't break the API contract. For an example of client
usage, you may look at the drm usage; it does the same.


Yeah, but that is horrible API design and should be avoided.

ERR_PTR(), PTR_ERR(), IS_ERR() and similar are meant as a way of
signaling errors through function return values, and should *never*
be used to signal errors in pointer members.




One escape route may be to add another export from debugfs, like
debugfs_is_valid_node(), which adheres to the current logic in the
debugfs API, and use that in client code. Whenever debugfs changes to
a logic different from IS_ERR, let that be changed.


Well, that would then rather be drm_is_debugfs_enabled(), because
the fact that we separate debugfs handling into a drm core and
individual drivers is drm specific.




Had one more look, and it looks like this will do the job. In other
cases, API usage is allowed.


if (!debugfs_initialized())
    return;


Yeah, that might work as well.

Potentially a good idea to add that to both the core drm function
and the amdgpu function, and not attempt to create debugfs files in
the first place.



Sounds good, I will send patches to add this check.


Thanks,

Nirmoy




Christian.



Thanks,
Lijo


Christian.



Thanks,
Lijo


Regards,
Christian.



Thanks,
Lijo



Regards,
Christian.




Regards,

Nirmoy



Regards,
Christian.


+    if (PTR_ERR(root) == -EPERM)
+        return 0;
+
+    return PTR_ERR(ent);
+    }
+
  ent = debugfs_create_file("amdgpu_preempt_ib", 0600, root, adev,
    &fops_ib_preempt);
  if (IS_ERR(ent)) {
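
To summarize where the thread lands, here is a minimal sketch of the
convention argued for above, with illustrative names: keep the
long-lived dentry member NULL (never ERR_PTR()-encoded) when debugfs
is unavailable, rely on debugfs_create_*() accepting a NULL parent,
and short-circuit with debugfs_initialized() when debugfs is disabled:

    #include <linux/debugfs.h>
    #include <linux/types.h>

    struct example_dev {
            struct dentry *debugfs_root; /* NULL when debugfs is unavailable */
            u32 counter;
    };

    static void example_debugfs_init(struct example_dev *edev)
    {
            if (!debugfs_initialized())  /* the check the thread settles on */
                    return;

            /* A NULL parent is valid: the file lands in the debugfs root. */
            debugfs_create_u32("example_counter", 0444, edev->debugfs_root,
                               &edev->counter);
    }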













Re: `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` causes AMDGPU to fail on Ryzen: amdgpu: SME is not compatible with RAVEN

2021-10-06 Thread Paul Menzel

Dear Borislav,


Thank you for your reply.

Am 05.10.21 um 16:38 schrieb Borislav Petkov:

On Tue, Oct 05, 2021 at 04:29:41PM +0200, Paul Menzel wrote:

Selecting the symbol `AMD_MEM_ENCRYPT` – as
done in Debian 5.13.9-1~exp1 [1] – also selects
`AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT`, as it defaults to yes,


I'm assuming that "selecting" is done automatically: alldefconfig,
olddefconfig?

Because CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT only depends on
CONFIG_AMD_MEM_ENCRYPT and former can be disabled in oldconfig or
menuconfig etc.


Sorry for being unclear. Distributions want to enable support for
that feature, but as long as it breaks systems, it should be opt-in
via the Linux kernel command line, and not opt-out. Also, the Kconfig
help texts do not mention anything about these problems, and the
AMDGPU log message is of level info, not error. It'd be even better
if the message contained the information on how to disable SME
(`mem_encrypt=off`); see the example below.



Kind regards,

Paul
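
For reference, the opt-out discussed above looks like this on a
GRUB-based install (the file path is the usual Debian/Ubuntu location
and is an assumption; `mem_encrypt=off` itself is the documented
kernel parameter):

    # /etc/default/grub
    GRUB_CMDLINE_LINUX_DEFAULT="quiet mem_encrypt=off"
    # then regenerate the config and reboot:
    #   update-grub && reboot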

