Re: [PATCH v2 0/2] drm/i915/mtl: Add PTE encode functions

2023-04-25 Thread Das, Nirmoy



On 4/24/2023 8:29 PM, fei.y...@intel.com wrote:

From: Fei Yang 

Extract PTE patch from https://patchwork.freedesktop.org/series/116868/
to fix MTL boot issue caused by MOCS/PAT update.

v2: address comment from Matt.

Fei Yang (2):
   drm/i915/mtl: Add PTE encode function
   drm/i915/mtl: workaround coherency issue for Media



Pushed this to drm-intel-gt-next. Thanks for unblocking MTL.





  drivers/gpu/drm/i915/display/intel_dpt.c  |  2 +-
  drivers/gpu/drm/i915/gem/i915_gem_pages.c |  5 ++-
  drivers/gpu/drm/i915/gt/gen8_ppgtt.c  | 45 +++
  drivers/gpu/drm/i915/gt/intel_ggtt.c  | 36 --
  drivers/gpu/drm/i915/gt/intel_gtt.h   | 12 +-
  drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +++
  drivers/gpu/drm/i915/gt/uc/intel_guc.c|  7 
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  6 +++
  8 files changed, 112 insertions(+), 14 deletions(-)



Re: [PATCH 0/2] Restore MTL boot

2023-04-24 Thread Das, Nirmoy



On 4/24/2023 6:09 PM, Andi Shyti wrote:

Hi,

The two patches reverted in this series are, together, preventing
MTL from booting.

Revert them until the fix is deployed.

Andi

Andi Shyti (2):
   Revert "drm/i915/mtl: fix mocs selftest"
   Revert "drm/i915/mtl: Define MOCS and PAT tables for MTL"



Series is Reviewed-by: Nirmoy Das 



  drivers/gpu/drm/i915/gt/intel_gt_regs.h |  6 +--
  drivers/gpu/drm/i915/gt/intel_gtt.c | 47 +
  drivers/gpu/drm/i915/gt/intel_gtt.h |  8 ---
  drivers/gpu/drm/i915/gt/intel_mocs.c| 70 +
  drivers/gpu/drm/i915/gt/selftest_mocs.c |  3 +-
  5 files changed, 4 insertions(+), 130 deletions(-)



Re: [Intel-gfx] [PATCH] drm/i915/mtl: workaround coherency issue for Media

2023-04-21 Thread Das, Nirmoy



On 4/20/2023 11:30 PM, Matt Roper wrote:

On Thu, Apr 20, 2023 at 01:38:59PM +0200, Nirmoy Das wrote:

From: Fei Yang 

This patch implements Wa_22016122933.

In MTL, memory writes initiated by Media tile update the whole
cache line even for partial writes. This creates a coherency
problem for cacheable memory if both CPU and GPU are writing data
to different locations within a single cache line. CTB communication
is impacted by this issue because the head and tail pointers are
adjacent words within a cache line (see struct guc_ct_buffer_desc),
where one is written by GuC and the other by the host.
This patch circumvents the issue by making CPU/GPU shared memory
uncacheable (WC on CPU side, and PAT index 2 for GPU). Also for
CTB which is being updated by both CPU and GuC, mfence instruction
is added to make sure the CPU writes are visible to GPU right away
(flush the write combining buffer).

I posted a note about the commit message here on the original series
about an hour ago:

https://lore.kernel.org/intel-gfx/20230420205238.ga4085...@mdroper-desk1.amr.corp.intel.com/

Patch itself looks fine, I just think the last sentence above should be
simplified to avoid inaccuracy.


Thanks for your review, Matt. I will resend with that fixed.


Nirmoy



Matt


While fixing the CTB issue, we noticed some random GSC firmware
loading failures because the shared buffers are cacheable (WB) on CPU
side but uncached on GPU side. To fix these issues we need to map
such shared buffers as WC on CPU side. Since such allocations are
not all done through GuC allocator, to avoid too many code changes,
the i915_coherent_map_type() is now hard coded to return WC for MTL.

BSpec: 45101

Signed-off-by: Fei Yang 
Reviewed-by: Andi Shyti 
Acked-by: Nirmoy Das 
Reviewed-by: Andrzej Hajda 
Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_pages.c |  5 -
  drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +
  drivers/gpu/drm/i915/gt/uc/intel_guc.c|  7 +++
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  6 ++
  4 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index ecd86130b74f..89fc8ea6bcfc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct 
drm_i915_private *i915,
  struct drm_i915_gem_object *obj,
  bool always_coherent)
  {
-   if (i915_gem_object_is_lmem(obj))
+   /*
+* Wa_22016122933: always return I915_MAP_WC for MTL
+*/
+   if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915))
return I915_MAP_WC;
if (HAS_LLC(i915) || always_coherent)
return I915_MAP_WB;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c 
b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
index 1d9fdfb11268..236673c02f9a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
@@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
if (obj->base.size < gsc->fw.size)
return -ENOSPC;
  
+	/*

+* Wa_22016122933: For MTL the shared memory needs to be mapped
+* as WC on CPU side and UC (PAT index 2) on GPU side
+*/
+   if (IS_METEORLAKE(i915))
+   i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
+
dst = i915_gem_object_pin_map_unlocked(obj,
   i915_coherent_map_type(i915, 
obj, true));
if (IS_ERR(dst))
@@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
memset(dst, 0, obj->base.size);
memcpy(dst, src, gsc->fw.size);
  
+	/*

+* Wa_22016122933: Making sure the data in dst is
+* visible to GSC right away
+*/
+   intel_guc_write_barrier(&gt->uc.guc);
+
i915_gem_object_unpin_map(gsc->fw.obj);
i915_gem_object_unpin_map(obj);
  
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c

index e89f16ecf1ae..c9f20385f6a0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -744,6 +744,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc 
*guc, u32 size)
if (IS_ERR(obj))
return ERR_CAST(obj);
  
+	/*

+* Wa_22016122933: For MTL the shared memory needs to be mapped
+* as WC on CPU side and UC (PAT index 2) on GPU side
+*/
+   if (IS_METEORLAKE(gt->i915))
+   i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
+
vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
if (IS_ERR(vma))
goto err;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 

Re: [Intel-gfx] [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media

2023-04-20 Thread Das, Nirmoy

This is an important fix and can be pushed without depending on this series.

I will send this out to the mailing list separately for CI.
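
As a rough illustration of the cache-line sharing problem described in the
quoted commit message below (the field layout here is only a sketch; see
struct guc_ct_buffer_desc in the GuC ABI headers for the real definition):

	/*
	 * Illustrative layout only. head and tail are adjacent 32-bit words,
	 * so they share one 64-byte cache line. Because the Media tile writes
	 * back whole cache lines even for partial writes, a GuC update of
	 * head can clobber a concurrent CPU update of tail (and vice versa)
	 * when the memory is cacheable. Mapping the buffer WC on the CPU side
	 * (UC, PAT index 2, on the GPU side) and flushing the write-combining
	 * buffer with an explicit barrier avoids that.
	 */
	struct example_ct_desc {
		u32 head;		/* written by one agent (e.g. GuC)  */
		u32 tail;		/* written by the other (the host)  */
		u32 status;
		u32 reserved[13];	/* pads the descriptor to 64 bytes  */
	} __packed;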


Regards,

Nirmoy

On 4/20/2023 1:00 AM, fei.y...@intel.com wrote:

From: Fei Yang 

This patch implements Wa_22016122933.

In MTL, memory writes initiated by Media tile update the whole
cache line even for partial writes. This creates a coherency
problem for cacheable memory if both CPU and GPU are writing data
to different locations within a single cache line. CTB communication
is impacted by this issue because the head and tail pointers are
adjacent words within a cache line (see struct guc_ct_buffer_desc),
where one is written by GuC and the other by the host.
This patch circumvents the issue by making CPU/GPU shared memory
uncacheable (WC on CPU side, and PAT index 2 for GPU). Also for
CTB which is being updated by both CPU and GuC, mfence instruction
is added to make sure the CPU writes are visible to GPU right away
(flush the write combining buffer).

While fixing the CTB issue, we noticed some random GSC firmware
loading failures because the shared buffers are cacheable (WB) on CPU
side but uncached on GPU side. To fix these issues we need to map
such shared buffers as WC on CPU side. Since such allocations are
not all done through GuC allocator, to avoid too many code changes,
the i915_coherent_map_type() is now hard coded to return WC for MTL.

BSpec: 45101

Signed-off-by: Fei Yang 
Reviewed-by: Andi Shyti 
Acked-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_pages.c |  5 -
  drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +
  drivers/gpu/drm/i915/gt/uc/intel_guc.c|  7 +++
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  6 ++
  4 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index ecd86130b74f..89fc8ea6bcfc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct 
drm_i915_private *i915,
  struct drm_i915_gem_object *obj,
  bool always_coherent)
  {
-   if (i915_gem_object_is_lmem(obj))
+   /*
+* Wa_22016122933: always return I915_MAP_WC for MTL
+*/
+   if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915))
return I915_MAP_WC;
if (HAS_LLC(i915) || always_coherent)
return I915_MAP_WB;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c 
b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
index 1d9fdfb11268..236673c02f9a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
@@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
if (obj->base.size < gsc->fw.size)
return -ENOSPC;
  
+	/*

+* Wa_22016122933: For MTL the shared memory needs to be mapped
+* as WC on CPU side and UC (PAT index 2) on GPU side
+*/
+   if (IS_METEORLAKE(i915))
+   i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
+
dst = i915_gem_object_pin_map_unlocked(obj,
   i915_coherent_map_type(i915, 
obj, true));
if (IS_ERR(dst))
@@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
memset(dst, 0, obj->base.size);
memcpy(dst, src, gsc->fw.size);
  
+	/*

+* Wa_22016122933: Making sure the data in dst is
+* visible to GSC right away
+*/
+   intel_guc_write_barrier(&gt->uc.guc);
+
i915_gem_object_unpin_map(gsc->fw.obj);
i915_gem_object_unpin_map(obj);
  
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c

index e89f16ecf1ae..c9f20385f6a0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -744,6 +744,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc 
*guc, u32 size)
if (IS_ERR(obj))
return ERR_CAST(obj);
  
+	/*

+* Wa_22016122933: For MTL the shared memory needs to be mapped
+* as WC on CPU side and UC (PAT index 2) on GPU side
+*/
+   if (IS_METEORLAKE(gt->i915))
+   i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
+
vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
if (IS_ERR(vma))
goto err;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 1803a633ed64..99a0a89091e7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -902,6 +902,12 @@ static int ct_read(struct intel_guc_ct *ct, struct 
ct_incoming_msg **msg)
/* now update descriptor */
WRITE_ONCE(desc->head, head);
  
+	/*

+* Wa_22016122933: 

Re: [Intel-gfx] [PATCH 1/8] drm/i915/mtl: Set has_llc=0

2023-04-20 Thread Das, Nirmoy
We have multiple bugs that require this, and it can be picked up
irrespective of this series. I have sent a trybot patch for this and,
once that passes, I will push this one.


https://patchwork.freedesktop.org/series/116746/


Nirmoy

On 4/20/2023 1:00 AM, fei.y...@intel.com wrote:

From: Fei Yang 

On MTL, LLC is not shared between GT and CPU, set has_llc=0.

Signed-off-by: Fei Yang 
Reviewed-by: Andi Shyti 
Reviewed-by: Andrzej Hajda 
Reviewed-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/i915_pci.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index d64e074d7457..272a8ba37b64 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1147,6 +1147,7 @@ static const struct intel_device_info mtl_info = {
.has_flat_ccs = 0,
.has_gmd_id = 1,
.has_guc_deprivilege = 1,
+   .has_llc = 0,
.has_mslice_steering = 0,
.has_snoop = 1,
.__runtime.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,


Re: [PATCH 4/8] drm/i915/mtl: workaround coherency issue for Media

2023-04-19 Thread Das, Nirmoy



On 4/17/2023 8:24 AM, fei.y...@intel.com wrote:

From: Fei Yang 

This patch implements Wa_22016122933.

In MTL, memory writes initiated by Media tile update the whole
cache line even for partial writes. This creates a coherency
problem for cacheable memory if both CPU and GPU are writing data
to different locations within a single cache line. CTB communication
is impacted by this issue because the head and tail pointers are
adjacent words within a cache line (see struct guc_ct_buffer_desc),
where one is written by GuC and the other by the host.
This patch circumvents the issue by making CPU/GPU shared memory
uncacheable (WC on CPU side, and PAT index 2 for GPU). Also for
CTB which is being updated by both CPU and GuC, mfence instruction
is added to make sure the CPU writes are visible to GPU right away
(flush the write combining buffer).

While fixing the CTB issue, we noticed some random GSC firmware
loading failures because the shared buffers are cacheable (WB) on CPU
side but uncached on GPU side. To fix these issues we need to map
such shared buffers as WC on CPU side. Since such allocations are
not all done through GuC allocator, to avoid too many code changes,
the i915_coherent_map_type() is now hard coded to return WC for MTL.

BSpec: 45101

Signed-off-by: Fei Yang 



This was a great find :)

Acked-by: Nirmoy Das 


---
  drivers/gpu/drm/i915/gem/i915_gem_pages.c |  5 -
  drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +
  drivers/gpu/drm/i915/gt/uc/intel_guc.c|  7 +++
  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 18 --
  4 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index ecd86130b74f..89fc8ea6bcfc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct 
drm_i915_private *i915,
  struct drm_i915_gem_object *obj,
  bool always_coherent)
  {
-   if (i915_gem_object_is_lmem(obj))
+   /*
+* Wa_22016122933: always return I915_MAP_WC for MTL
+*/
+   if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915))
return I915_MAP_WC;
if (HAS_LLC(i915) || always_coherent)
return I915_MAP_WB;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c 
b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
index 1d9fdfb11268..236673c02f9a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
@@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
if (obj->base.size < gsc->fw.size)
return -ENOSPC;
  
+	/*

+* Wa_22016122933: For MTL the shared memory needs to be mapped
+* as WC on CPU side and UC (PAT index 2) on GPU side
+*/
+   if (IS_METEORLAKE(i915))
+   i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
+
dst = i915_gem_object_pin_map_unlocked(obj,
   i915_coherent_map_type(i915, 
obj, true));
if (IS_ERR(dst))
@@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
memset(dst, 0, obj->base.size);
memcpy(dst, src, gsc->fw.size);
  
+	/*

+* Wa_22016122933: Making sure the data in dst is
+* visible to GSC right away
+*/
+   intel_guc_write_barrier(&gt->uc.guc);
+
i915_gem_object_unpin_map(gsc->fw.obj);
i915_gem_object_unpin_map(obj);
  
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c

index d76508fa3af7..f9bddaa876d9 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -743,6 +743,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc 
*guc, u32 size)
if (IS_ERR(obj))
return ERR_CAST(obj);
  
+	/*

+* Wa_22016122933: For MTL the shared memory needs to be mapped
+* as WC on CPU side and UC (PAT index 2) on GPU side
+*/
+   if (IS_METEORLAKE(gt->i915))
+   i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
+
vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
if (IS_ERR(vma))
goto err;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 1803a633ed64..98e682b7df07 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -415,12 +415,6 @@ static int ct_write(struct intel_guc_ct *ct,
}
GEM_BUG_ON(tail > size);
  
-	/*

-* make sure H2G buffer update and LRC tail update (if this triggering a
-* submission) are visible before updating the descriptor tail
-*/
-   intel_guc_write_barrier(ct_to_guc(ct));
-
/* 

Re: [PATCH 3/8] drm/i915/mtl: Add PTE encode function

2023-04-19 Thread Das, Nirmoy



On 4/17/2023 8:24 AM, fei.y...@intel.com wrote:

From: Fei Yang 

PTE encode functions are platform dependent. This patch implements
PTE functions for MTL, and ensures the correct PTE encode function
is used by calling pte_encode function pointer instead of the
hardcoded gen8 version of PTE encode.

Signed-off-by: Fei Yang 
---
  drivers/gpu/drm/i915/display/intel_dpt.c |  2 +-
  drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 45 
  drivers/gpu/drm/i915/gt/gen8_ppgtt.h |  3 ++
  drivers/gpu/drm/i915/gt/intel_ggtt.c | 36 +--
  4 files changed, 75 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c 
b/drivers/gpu/drm/i915/display/intel_dpt.c
index b8027392144d..c5eacfdba1a5 100644
--- a/drivers/gpu/drm/i915/display/intel_dpt.c
+++ b/drivers/gpu/drm/i915/display/intel_dpt.c
@@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
vm->vma_ops.bind_vma= dpt_bind_vma;
vm->vma_ops.unbind_vma  = dpt_unbind_vma;
  
-	vm->pte_encode = gen8_ggtt_pte_encode;

+   vm->pte_encode = vm->gt->ggtt->vm.pte_encode;
  
  	dpt->obj = dpt_obj;

dpt->obj->is_dpt = true;
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c 
b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 4daaa6f55668..11b91e0453c8 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
return pte;
  }
  
+static u64 mtl_pte_encode(dma_addr_t addr,

+ enum i915_cache_level level,
+ u32 flags)
+{
+   gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
+
+   if (unlikely(flags & PTE_READ_ONLY))
+   pte &= ~GEN8_PAGE_RW;
+
+   if (flags & PTE_LM)
+   pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
+
+   switch (level) {
+   case I915_CACHE_NONE:
+   pte |= GEN12_PPGTT_PTE_PAT1;
+   break;
+   case I915_CACHE_LLC:
+   case I915_CACHE_L3_LLC:
+   pte |= GEN12_PPGTT_PTE_PAT0 | GEN12_PPGTT_PTE_PAT1;
+   break;
+   case I915_CACHE_WT:
+   pte |= GEN12_PPGTT_PTE_PAT0;
+   break;
+   }
+
+   return pte;
+}
+
  static void gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create)
  {
struct drm_i915_private *i915 = ppgtt->vm.i915;
@@ -427,7 +455,7 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
  u32 flags)
  {
struct i915_page_directory *pd;
-   const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags);
+   const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, cache_level, 
flags);
gen8_pte_t *vaddr;
  
  	pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2));

@@ -580,7 +608,7 @@ static void gen8_ppgtt_insert_huge(struct 
i915_address_space *vm,
   enum i915_cache_level cache_level,
   u32 flags)
  {
-   const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags);
+   const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
unsigned int rem = sg_dma_len(iter->sg);
u64 start = vma_res->start;
  
@@ -743,7 +771,7 @@ static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,

GEM_BUG_ON(pt->is_compact);
  
  	vaddr = px_vaddr(pt);

-   vaddr[gen8_pd_index(idx, 0)] = gen8_pte_encode(addr, level, flags);
+   vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, level, flags);
drm_clflush_virt_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
  }
  
@@ -773,7 +801,7 @@ static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,

}
  
  	vaddr = px_vaddr(pt);

-   vaddr[gen8_pd_index(idx, 0) / 16] = gen8_pte_encode(addr, level, flags);
+   vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, level, flags);
  }
  
  static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm,

@@ -820,8 +848,8 @@ static int gen8_init_scratch(struct i915_address_space *vm)
pte_flags |= PTE_LM;
  
  	vm->scratch[0]->encode =

-   gen8_pte_encode(px_dma(vm->scratch[0]),
-   I915_CACHE_NONE, pte_flags);
+   vm->pte_encode(px_dma(vm->scratch[0]),
+  I915_CACHE_NONE, pte_flags);
  
  	for (i = 1; i <= vm->top; i++) {

struct drm_i915_gem_object *obj;
@@ -963,7 +991,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
 */
ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
  
-	ppgtt->vm.pte_encode = gen8_pte_encode;

+   if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
+   ppgtt->vm.pte_encode = mtl_pte_encode;
+   else
+   ppgtt->vm.pte_encode = gen8_pte_encode;
  
  	ppgtt->vm.bind_async_flags = I915_VMA_LOCAL_BIND;

ppgtt->vm.insert_entries = gen8_ppgtt_insert;
diff --git 

Re: [Intel-gfx] [PATCH 2/8] drm/i915/mtl: Define MOCS and PAT tables for MTL

2023-04-19 Thread Das, Nirmoy

On 4/17/2023 8:24 AM, fei.y...@intel.com wrote:

From: Madhumitha Tolakanahalli Pradeep 


On MTL, GT can no longer allocate on LLC - only the CPU can.
This, along with addition of support for L4 cache calls a

s/calls a/calls for a

MOCS/PAT table update.
Alos the PAT index registers are multicasted for primary GT,

s/Alos/Also

and there is an address jump from index 7 to 8. This patch
makes sure these registers are programmed in the proper way.


"Makes sure that"

With those minor nits fixed:

Reviewed-by: Nirmoy Das 



BSpec: 44509, 45101, 44235

Cc: Matt Roper 
Cc: Lucas De Marchi 
Signed-off-by: Madhumitha Tolakanahalli Pradeep 

Signed-off-by: Aravind Iddamsetty 
Signed-off-by: Nirmoy Das 
Signed-off-by: Fei Yang 
---
  drivers/gpu/drm/i915/gt/intel_gt_regs.h |  6 +-
  drivers/gpu/drm/i915/gt/intel_gtt.c | 62 
  drivers/gpu/drm/i915/gt/intel_gtt.h | 20 ++-
  drivers/gpu/drm/i915/gt/intel_mocs.c| 76 +++--
  drivers/gpu/drm/i915/gt/selftest_mocs.c |  2 +-
  5 files changed, 149 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h 
b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index fd1f9cd35e9d..e8c3b762a92a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -356,7 +356,11 @@
  #define GEN7_TLB_RD_ADDR  _MMIO(0x4700)
  
  #define GEN12_PAT_INDEX(index)			_MMIO(0x4800 + (index) * 4)

-#define XEHP_PAT_INDEX(index)  MCR_REG(0x4800 + (index) * 4)
+#define _PAT_INDEX(index)  _PICK_EVEN_2RANGES(index, 8, \
+  0x4800, 
0x4804, \
+  0x4848, 
0x484c)
+#define XEHP_PAT_INDEX(index)  MCR_REG(_PAT_INDEX(index))
+#define XELPMP_PAT_INDEX(index)_MMIO(_PAT_INDEX(index))
  
  #define XEHP_TILE0_ADDR_RANGE			MCR_REG(0x4900)

  #define   XEHP_TILE_LMEM_RANGE_SHIFT  8
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 4f436ba7a3c8..429f3971020d 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -468,6 +468,42 @@ void gtt_write_workarounds(struct intel_gt *gt)
}
  }
  
+static void xelpmp_setup_private_ppat(struct intel_uncore *uncore)

+{
+   intel_uncore_write(uncore, XELPMP_PAT_INDEX(0), MTL_PPAT_L4_0_WB);
+   intel_uncore_write(uncore, XELPMP_PAT_INDEX(1), MTL_PPAT_L4_1_WT);
+   intel_uncore_write(uncore, XELPMP_PAT_INDEX(2), MTL_PPAT_L4_3_UC);
+   intel_uncore_write(uncore, XELPMP_PAT_INDEX(3),
+  MTL_PPAT_L4_0_WB | MTL_2_COH_1W);
+   intel_uncore_write(uncore, XELPMP_PAT_INDEX(4),
+  MTL_PPAT_L4_0_WB | MTL_3_COH_2W);
+
+   /*
+* Remaining PAT entries are left at the hardware-default
+* fully-cached setting
+*/
+
+}
+
+static void xelpg_setup_private_ppat(struct intel_gt *gt)
+{
+   intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(0),
+MTL_PPAT_L4_0_WB);
+   intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(1),
+MTL_PPAT_L4_1_WT);
+   intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(2),
+MTL_PPAT_L4_3_UC);
+   intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(3),
+MTL_PPAT_L4_0_WB | MTL_2_COH_1W);
+   intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(4),
+MTL_PPAT_L4_0_WB | MTL_3_COH_2W);
+
+   /*
+* Remaining PAT entries are left at the hardware-default
+* fully-cached setting
+*/
+}
+
  static void tgl_setup_private_ppat(struct intel_uncore *uncore)
  {
/* TGL doesn't support LLC or AGE settings */
@@ -603,16 +639,22 @@ void setup_private_pat(struct intel_gt *gt)
  
  	GEM_BUG_ON(GRAPHICS_VER(i915) < 8);
  
-	if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))

-   xehp_setup_private_ppat(gt);
-   else if (GRAPHICS_VER(i915) >= 12)
-   tgl_setup_private_ppat(uncore);
-   else if (GRAPHICS_VER(i915) >= 11)
-   icl_setup_private_ppat(uncore);
-   else if (IS_CHERRYVIEW(i915) || IS_GEN9_LP(i915))
-   chv_setup_private_ppat(uncore);
-   else
-   bdw_setup_private_ppat(uncore);
+   if (gt->type == GT_MEDIA) {
+   xelpmp_setup_private_ppat(gt->uncore);
+   } else {
+   if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70))
+   xelpg_setup_private_ppat(gt);
+   else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
+   xehp_setup_private_ppat(gt);
+   else if (GRAPHICS_VER(i915) >= 12)
+   tgl_setup_private_ppat(uncore);
+   else if (GRAPHICS_VER(i915) >= 11)
+  

Re: [Intel-gfx] [PATCH 1/8] drm/i915/mtl: Set has_llc=0

2023-04-19 Thread Das, Nirmoy



On 4/17/2023 8:24 AM, fei.y...@intel.com wrote:

From: Fei Yang 

On MTL, GT is no longer allocated on LLC, set has_llc=0.



This statement is a bit unclear to me. I would say "On MTL, LLC is not
shared between GT and CPU".


Otherwise

Reviewed-by: Nirmoy Das 



Signed-off-by: Fei Yang 
---
  drivers/gpu/drm/i915/i915_pci.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index cddb6e197972..025d32c0b161 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1146,6 +1146,7 @@ static const struct intel_device_info mtl_info = {
.has_flat_ccs = 0,
.has_gmd_id = 1,
.has_guc_deprivilege = 1,
+   .has_llc = 0,
.has_mslice_steering = 0,
.has_snoop = 1,
.__runtime.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM,


Re: [PATCH v4 5/5] drm/i915/gt: Make sure that errors are propagated through request chains

2023-04-11 Thread Das, Nirmoy



On 3/8/2023 10:41 AM, Andi Shyti wrote:

Currently, when we perform operations such as clearing or copying
large blocks of memory, we generate multiple requests that are
executed in a chain.

However, if one of these requests fails, we may not realize it
unless it happens to be the last request in the chain. This is
because errors are not properly propagated.

To address this issue, we need to ensure that the chain of fence
notifications is always propagated so that we can reach the final
fence associated with the last request. By doing so, we will be
able to detect any memory operation failures and determine
whether the memory is still invalid.

On copy and clear migration, signal fences upon request
completion to ensure that we have a reliable perpetuation of the
operation outcome.

Fixes: cf586021642d80 ("drm/i915/gt: Pipelined page migration")
Reported-by: Matthew Auld 
Suggested-by: Chris Wilson 
Signed-off-by: Andi Shyti 
Cc: sta...@vger.kernel.org
Reviewed-by: Matthew Auld 
With  Matt's comment regarding missing lock in 
intel_context_migrate_clear addressed, this is:


Acked-by: Nirmoy Das 


---
  drivers/gpu/drm/i915/gt/intel_migrate.c | 41 ++---
  1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c 
b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 3f638f1987968..0031e7b1b4704 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -742,13 +742,19 @@ intel_context_migrate_copy(struct intel_context *ce,
dst_offset = 2 * CHUNK_SZ;
}
  
+	/*

+* While building the chain of requests, we need to ensure
+* that no one can sneak into the timeline unnoticed.
+*/
+   mutex_lock(&ce->timeline->mutex);
+
do {
int len;
  
-		rq = i915_request_create(ce);

+   rq = i915_request_create_locked(ce);
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
-   goto out_ce;
+   break;
}
  
  		if (deps) {

@@ -878,10 +884,14 @@ intel_context_migrate_copy(struct intel_context *ce,
  
  		/* Arbitration is re-enabled between requests. */

  out_rq:
-   if (*out)
+   i915_sw_fence_await(&rq->submit);
+   i915_request_get(rq);
+   i915_request_add_locked(rq);
+   if (*out) {
+   i915_sw_fence_complete(&(*out)->submit);
i915_request_put(*out);
-   *out = i915_request_get(rq);
-   i915_request_add(rq);
+   }
+   *out = rq;
  
  		if (err)

break;
@@ -905,7 +915,10 @@ intel_context_migrate_copy(struct intel_context *ce,
cond_resched();
} while (1);
  
-out_ce:

+   mutex_unlock(&ce->timeline->mutex);
+
+   if (*out)
+   i915_sw_fence_complete(&(*out)->submit);
return err;
  }
  
@@ -1005,7 +1018,7 @@ intel_context_migrate_clear(struct intel_context *ce,

rq = i915_request_create(ce);
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
-   goto out_ce;
+   break;
}
  
  		if (deps) {

@@ -1056,17 +1069,23 @@ intel_context_migrate_clear(struct intel_context *ce,
  
  		/* Arbitration is re-enabled between requests. */

  out_rq:
-   if (*out)
-   i915_request_put(*out);
-   *out = i915_request_get(rq);
+   i915_sw_fence_await(&rq->submit);
+   i915_request_get(rq);
i915_request_add(rq);
+   if (*out) {
+   i915_sw_fence_complete(&(*out)->submit);
+   i915_request_put(*out);
+   }
+   *out = rq;
+
if (err || !it.sg || !sg_dma_len(it.sg))
break;
  
  		cond_resched();

} while (1);
  
-out_ce:

+   if (*out)
+   i915_sw_fence_complete(&(*out)->submit);
return err;
  }
  


Re: [PATCH v4 3/5] drm/i915: Create the locked version of the request create

2023-04-11 Thread Das, Nirmoy



On 3/8/2023 10:41 AM, Andi Shyti wrote:

Make a version of the request creation that doesn't take any
lock.

Signed-off-by: Andi Shyti 
Cc: sta...@vger.kernel.org


Reviewed-by: Nirmoy Das 


---
  drivers/gpu/drm/i915/i915_request.c | 43 +++--
  drivers/gpu/drm/i915/i915_request.h |  2 ++
  2 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index 72aed544f8714..5ddb0e02b06b7 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1028,18 +1028,11 @@ __i915_request_create(struct intel_context *ce, gfp_t 
gfp)
return ERR_PTR(ret);
  }
  
-struct i915_request *

-i915_request_create(struct intel_context *ce)
+static struct i915_request *
+__i915_request_create_locked(struct intel_context *ce)
  {
struct i915_request *rq;
-   struct intel_timeline *tl;
-
-   if (intel_context_throttle(ce))
-   return ERR_PTR(-EINTR);
-
-   tl = intel_context_timeline_lock(ce);
-   if (IS_ERR(tl))
-   return ERR_CAST(tl);
+   struct intel_timeline *tl = ce->timeline;
  
  	/* Move our oldest request to the slab-cache (if not in use!) */

rq = list_first_entry(&tl->requests, typeof(*rq), link);
@@ -1049,16 +1042,38 @@ i915_request_create(struct intel_context *ce)
intel_context_enter(ce);
rq = __i915_request_create(ce, GFP_KERNEL);
intel_context_exit(ce); /* active reference transferred to request */
-   if (IS_ERR(rq))
-   goto err_unlock;
  
  	/* Check that we do not interrupt ourselves with a new request */

rq->cookie = lockdep_pin_lock(&tl->mutex);
  
  	return rq;

+}
+
+struct i915_request *
+i915_request_create_locked(struct intel_context *ce)
+{
+   intel_context_assert_timeline_is_locked(ce->timeline);
+
+   if (intel_context_throttle(ce))
+   return ERR_PTR(-EINTR);
+
+   return __i915_request_create_locked(ce);
+}
+
+struct i915_request *
+i915_request_create(struct intel_context *ce)
+{
+   struct i915_request *rq;
+   struct intel_timeline *tl;
+
+   tl = intel_context_timeline_lock(ce);
+   if (IS_ERR(tl))
+   return ERR_CAST(tl);
+
+   rq = __i915_request_create_locked(ce);
+   if (IS_ERR(rq))
+   intel_context_timeline_unlock(tl);
  
-err_unlock:

-   intel_context_timeline_unlock(tl);
return rq;
  }
  
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h

index f5e1bb5e857aa..bb48bd4605c03 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -374,6 +374,8 @@ struct i915_request * __must_check
  __i915_request_create(struct intel_context *ce, gfp_t gfp);
  struct i915_request * __must_check
  i915_request_create(struct intel_context *ce);
+struct i915_request * __must_check
+i915_request_create_locked(struct intel_context *ce);
  
  void __i915_request_skip(struct i915_request *rq);

  bool i915_request_set_error_once(struct i915_request *rq, int error);


Re: [PATCH v4 2/5] drm/i915/gt: Add intel_context_timeline_is_locked helper

2023-04-11 Thread Das, Nirmoy



On 3/8/2023 10:41 AM, Andi Shyti wrote:

We have:

  - intel_context_timeline_lock()
  - intel_context_timeline_unlock()

In the next patches we will also need:

  - intel_context_timeline_is_locked()

Add it.

Signed-off-by: Andi Shyti 
Cc: sta...@vger.kernel.org


Reviewed-by: Nirmoy Das 



---
  drivers/gpu/drm/i915/gt/intel_context.h | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
b/drivers/gpu/drm/i915/gt/intel_context.h
index f919a66cebf5b..87d5e2d60b6db 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -265,6 +265,12 @@ static inline void intel_context_timeline_unlock(struct 
intel_timeline *tl)
mutex_unlock(&tl->mutex);
  }
  
+static inline void intel_context_assert_timeline_is_locked(struct intel_timeline *tl)

+   __must_hold(&tl->mutex)
+{
+   lockdep_assert_held(&tl->mutex);
+}
+
  int intel_context_prepare_remote_request(struct intel_context *ce,
 struct i915_request *rq);
  


Re: [Intel-gfx] [PATCH 1/7] drm/i915/mtl: Define MOCS and PAT tables for MTL

2023-04-06 Thread Das, Nirmoy

Hi Fei,

On 4/6/2023 4:55 PM, Yang, Fei wrote:

> On 4/1/2023 8:38 AM, fei.y...@intel.com wrote:
>> From: Fei Yang 
>>
>> On MTL, GT can no longer allocate on LLC - only the CPU can.
>> This, along with addition of support for ADM/L4 cache calls a
>> MOCS/PAT table update.
>> Also add PTE encode functions for MTL as it has different PAT
>> index definition than previous platforms.
>>
>> BSpec: 44509, 45101, 44235
>>
>> Cc: Matt Roper 
>> Cc: Lucas De Marchi 
>> Signed-off-by: Madhumitha Tolakanahalli Pradeep 


>> Signed-off-by: Aravind Iddamsetty 
>> Signed-off-by: Fei Yang 
>> ---
>> drivers/gpu/drm/i915/display/intel_dpt.c |  2 +-
>> drivers/gpu/drm/i915/gt/gen8_ppgtt.c     | 43 --
>> drivers/gpu/drm/i915/gt/gen8_ppgtt.h     |  3 +
>> drivers/gpu/drm/i915/gt/intel_ggtt.c     | 36 ++-
>> drivers/gpu/drm/i915/gt/intel_gtt.c      | 23 ++-
>> drivers/gpu/drm/i915/gt/intel_gtt.h      | 20 ++-
>> drivers/gpu/drm/i915/gt/intel_mocs.c     | 76 ++--
>> drivers/gpu/drm/i915/gt/selftest_mocs.c  |  2 +-
>> drivers/gpu/drm/i915/i915_pci.c          |  1 +
>>   9 files changed, 189 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c 
b/drivers/gpu/drm/i915/display/intel_dpt.c

>> index b8027392144d..c5eacfdba1a5 100644
>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
>> @@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
>>  vm->vma_ops.bind_vma    = dpt_bind_vma;
>>  vm->vma_ops.unbind_vma  = dpt_unbind_vma;
>>
>> -     vm->pte_encode = gen8_ggtt_pte_encode;
>> +     vm->pte_encode = vm->gt->ggtt->vm.pte_encode;
>>
>>        dpt->obj = dpt_obj;
>>  dpt->obj->is_dpt = true;
>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c 
b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c

>> index 4daaa6f55668..4197b43150cc 100644
>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>> @@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
>>        return pte;
>>   }
>>
>> +static u64 mtl_pte_encode(dma_addr_t addr,
>> + enum i915_cache_level level,
>> +                       u32 flags)
>> +{
>> +     gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
>> +
>> +     if (unlikely(flags & PTE_READ_ONLY))
>> +             pte &= ~GEN8_PAGE_RW;
>> +
>> +     if (flags & PTE_LM)
>> +             pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
>> +
>> +     switch (level) {
>> +     case I915_CACHE_NONE:
>> +             pte |= GEN12_PPGTT_PTE_PAT1;
>> +             break;
>> +     case I915_CACHE_LLC:
>> +     case I915_CACHE_L3_LLC:
>> +             pte |= GEN12_PPGTT_PTE_PAT0 | GEN12_PPGTT_PTE_PAT1;
>> +             break;
>> +     case I915_CACHE_WT:
>> +             pte |= GEN12_PPGTT_PTE_PAT0;
>> +             break;
>> +     }
>> +
>> +     return pte;
>> +}
>> +
>>   static void gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool 
create)

>>   {
>>        struct drm_i915_private *i915 = ppgtt->vm.i915;
>> @@ -427,7 +455,7 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
>>                      u32 flags)
>>   {
>>        struct i915_page_directory *pd;
>> -     const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, 
flags);
>> +     const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, 
cache_level, flags);

>>        gen8_pte_t *vaddr;
>>
>>        pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2));
>> @@ -580,7 +608,7 @@ static void gen8_ppgtt_insert_huge(struct 
i915_address_space *vm,

>>       enum i915_cache_level cache_level,
>>       u32 flags)
>>   {
>> -     const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, 
flags);
>> +     const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, 
flags);

>>        unsigned int rem = sg_dma_len(iter->sg);
>>        u64 start = vma_res->start;
>>
>> @@ -743,7 +771,7 @@ static void gen8_ppgtt_insert_entry(struct 
i915_address_space *vm,

>>  GEM_BUG_ON(pt->is_compact);
>>
>>        vaddr = px_vaddr(pt);
>> - vaddr[gen8_pd_index(idx, 0)] = gen8_pte_encode(addr, level, flags);
>> + vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, level, flags);
>>  drm_clflush_virt_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
>>   }
>>
>> @@ -773,7 +801,7 @@ static void 
__xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,

>>        }
>>
>>        vaddr = px_vaddr(pt);
>> - vaddr[gen8_pd_index(idx, 0) / 16] = gen8_pte_encode(addr, level, 
flags);
>> + vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, level, 
flags);

>>   }
>>
>>   static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm,
>> @@ -820,7 +848,7 @@ static int gen8_init_scratch(struct 
i915_address_space *vm)

>>                pte_flags |= PTE_LM;
>>
>>  vm->scratch[0]->encode =
>> - gen8_pte_encode(px_dma(vm->scratch[0]),
>> + vm->pte_encode(px_dma(vm->scratch[0]),
>>    I915_CACHE_NONE, pte_flags);
>>
>>        for (i = 1; i <= vm->top; i++) {
>> @@ -963,7 +991,10 @@ struct i915_ppgtt 

Re: [Intel-gfx] [PATCH 1/7] drm/i915/mtl: Define MOCS and PAT tables for MTL

2023-04-06 Thread Das, Nirmoy

Hi Fei,

On 4/1/2023 8:38 AM, fei.y...@intel.com wrote:

From: Fei Yang 

On MTL, GT can no longer allocate on LLC - only the CPU can.
This, along with addition of support for ADM/L4 cache calls a
MOCS/PAT table update.
Also add PTE encode functions for MTL as it has different PAT
index definition than previous platforms.

BSpec: 44509, 45101, 44235

Cc: Matt Roper 
Cc: Lucas De Marchi 
Signed-off-by: Madhumitha Tolakanahalli Pradeep 

Signed-off-by: Aravind Iddamsetty 
Signed-off-by: Fei Yang 
---
  drivers/gpu/drm/i915/display/intel_dpt.c |  2 +-
  drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 43 --
  drivers/gpu/drm/i915/gt/gen8_ppgtt.h |  3 +
  drivers/gpu/drm/i915/gt/intel_ggtt.c | 36 ++-
  drivers/gpu/drm/i915/gt/intel_gtt.c  | 23 ++-
  drivers/gpu/drm/i915/gt/intel_gtt.h  | 20 ++-
  drivers/gpu/drm/i915/gt/intel_mocs.c | 76 ++--
  drivers/gpu/drm/i915/gt/selftest_mocs.c  |  2 +-
  drivers/gpu/drm/i915/i915_pci.c  |  1 +
  9 files changed, 189 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c 
b/drivers/gpu/drm/i915/display/intel_dpt.c
index b8027392144d..c5eacfdba1a5 100644
--- a/drivers/gpu/drm/i915/display/intel_dpt.c
+++ b/drivers/gpu/drm/i915/display/intel_dpt.c
@@ -300,7 +300,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
vm->vma_ops.bind_vma= dpt_bind_vma;
vm->vma_ops.unbind_vma  = dpt_unbind_vma;
  
-	vm->pte_encode = gen8_ggtt_pte_encode;

+   vm->pte_encode = vm->gt->ggtt->vm.pte_encode;
  
  	dpt->obj = dpt_obj;

dpt->obj->is_dpt = true;
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c 
b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 4daaa6f55668..4197b43150cc 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -55,6 +55,34 @@ static u64 gen8_pte_encode(dma_addr_t addr,
return pte;
  }
  
+static u64 mtl_pte_encode(dma_addr_t addr,

+ enum i915_cache_level level,
+ u32 flags)
+{
+   gen8_pte_t pte = addr | GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
+
+   if (unlikely(flags & PTE_READ_ONLY))
+   pte &= ~GEN8_PAGE_RW;
+
+   if (flags & PTE_LM)
+   pte |= GEN12_PPGTT_PTE_LM | GEN12_PPGTT_PTE_NC;
+
+   switch (level) {
+   case I915_CACHE_NONE:
+   pte |= GEN12_PPGTT_PTE_PAT1;
+   break;
+   case I915_CACHE_LLC:
+   case I915_CACHE_L3_LLC:
+   pte |= GEN12_PPGTT_PTE_PAT0 | GEN12_PPGTT_PTE_PAT1;
+   break;
+   case I915_CACHE_WT:
+   pte |= GEN12_PPGTT_PTE_PAT0;
+   break;
+   }
+
+   return pte;
+}
+
  static void gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create)
  {
struct drm_i915_private *i915 = ppgtt->vm.i915;
@@ -427,7 +455,7 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
  u32 flags)
  {
struct i915_page_directory *pd;
-   const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags);
+   const gen8_pte_t pte_encode = ppgtt->vm.pte_encode(0, cache_level, 
flags);
gen8_pte_t *vaddr;
  
  	pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2));

@@ -580,7 +608,7 @@ static void gen8_ppgtt_insert_huge(struct 
i915_address_space *vm,
   enum i915_cache_level cache_level,
   u32 flags)
  {
-   const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags);
+   const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
unsigned int rem = sg_dma_len(iter->sg);
u64 start = vma_res->start;
  
@@ -743,7 +771,7 @@ static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,

GEM_BUG_ON(pt->is_compact);
  
  	vaddr = px_vaddr(pt);

-   vaddr[gen8_pd_index(idx, 0)] = gen8_pte_encode(addr, level, flags);
+   vaddr[gen8_pd_index(idx, 0)] = vm->pte_encode(addr, level, flags);
drm_clflush_virt_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
  }
  
@@ -773,7 +801,7 @@ static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,

}
  
  	vaddr = px_vaddr(pt);

-   vaddr[gen8_pd_index(idx, 0) / 16] = gen8_pte_encode(addr, level, flags);
+   vaddr[gen8_pd_index(idx, 0) / 16] = vm->pte_encode(addr, level, flags);
  }
  
  static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm,

@@ -820,7 +848,7 @@ static int gen8_init_scratch(struct i915_address_space *vm)
pte_flags |= PTE_LM;
  
  	vm->scratch[0]->encode =

-   gen8_pte_encode(px_dma(vm->scratch[0]),
+   vm->pte_encode(px_dma(vm->scratch[0]),
I915_CACHE_NONE, pte_flags);
  
  	for (i = 1; i <= vm->top; i++) {

@@ -963,7 +991,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
 */
ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
  
-	ppgtt->vm.pte_encode = 

Re: [PATCH 1/5] drm/i915/ttm: Add I915_BO_PREALLOC

2023-04-05 Thread Das, Nirmoy

Hi Andi,

On 4/5/2023 1:53 PM, Andi Shyti wrote:

Hi Nirmoy,


Add a mechanism to keep existing data when creating
a ttm object with I915_BO_ALLOC_USER flag.

why do we need this mechanism? What was the logic behind? These
are all questions people might have when checking this commit.
Please be a bit more explicative.


Agreed, the commit message is a bit short. I will add more content in the
next revision.

you don't need to send a new version just for this commit log.

You could just propose a new commit log in the reply and if it's
OK, add it before pushing it.


Let me know what you think about:

Add a mechanism to preserve existing data when creating a TTM
object with the I915_BO_ALLOC_USER flag. This will be used in the subsequent
patch where the I915_BO_ALLOC_USER flag will be applied to the framebuffer
object. For a pre-allocated framebuffer without the I915_BO_PREALLOC flag,
TTM would clear the content, which is not desirable.
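
For illustration, the new flag would then be combined with I915_BO_ALLOC_USER
when the preallocated framebuffer object is created in the later patch,
roughly along these lines (only a sketch; the helper and call site used here
are an assumption, not part of this patch):

	/* Keep the firmware-written framebuffer contents intact. */
	obj = i915_gem_object_create_region_at(mem, phys_base, size,
					       I915_BO_ALLOC_USER |
					       I915_BO_PREALLOC);
	if (IS_ERR(obj))
		return ERR_CAST(obj);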

Thanks,

Nirmoy



As you wish.

Andi


Cc: Matthew Auld
Cc: Andi Shyti
Cc: Andrzej Hajda
Cc: Ville Syrjälä
Cc: Jani Nikula
Cc: Imre Deak
Signed-off-by: Nirmoy Das

Reviewed-by: Andi Shyti


Thanks,

Nirmoy


Thanks,
Andi

Re: [PATCH 1/5] drm/i915/ttm: Add I915_BO_PREALLOC

2023-04-05 Thread Das, Nirmoy



On 4/4/2023 6:23 PM, Andi Shyti wrote:

Hi Nirmoy,

On Tue, Apr 04, 2023 at 04:30:56PM +0200, Nirmoy Das wrote:

Add a mechanism to keep existing data when creating
a ttm object with I915_BO_ALLOC_USER flag.

why do we need this mechanism? What was the logic behind? These
are all questions people might have when checking this commit.
Please be a bit more explicative.



Agreed, the commit message is a bit short. I will add more content in the
next revision.





Cc: Matthew Auld 
Cc: Andi Shyti 
Cc: Andrzej Hajda 
Cc: Ville Syrjälä 
Cc: Jani Nikula 
Cc: Imre Deak 
Signed-off-by: Nirmoy Das 

Reviewed-by: Andi Shyti 



Thanks,

Nirmoy



Thanks,
Andi


Re: [PATCH 1/5] drm/i915/ttm: Add I915_BO_PREALLOC

2023-04-05 Thread Das, Nirmoy



On 4/4/2023 5:30 PM, Andrzej Hajda wrote:



On 04.04.2023 16:30, Nirmoy Das wrote:

Add a mechanism to keep existing data when creating
a ttm object with I915_BO_ALLOC_USER flag.

Cc: Matthew Auld 
Cc: Andi Shyti 
Cc: Andrzej Hajda 
Cc: Ville Syrjälä 
Cc: Jani Nikula 
Cc: Imre Deak 
Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 15 +++
  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  5 +++--
  2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h

index 5dcbbef31d44..830c11431ee8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -328,6 +328,12 @@ struct drm_i915_gem_object {
   */
  #define I915_BO_ALLOC_GPU_ONLY  BIT(6)
  #define I915_BO_ALLOC_CCS_AUX  BIT(7)
+/*
+ * Object is allowed to retain its initial data and will not be 
cleared on first

+ * access if used along with I915_BO_ALLOC_USER. This is mainly to keep
+ * preallocated framebuffer data intact while transitioning it to 
i915drmfb.

+ */
+#define I915_BO_PREALLOC  BIT(8)
  #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
   I915_BO_ALLOC_VOLATILE | \
   I915_BO_ALLOC_CPU_CLEAR | \
@@ -335,10 +341,11 @@ struct drm_i915_gem_object {
   I915_BO_ALLOC_PM_VOLATILE | \
   I915_BO_ALLOC_PM_EARLY | \
   I915_BO_ALLOC_GPU_ONLY | \
- I915_BO_ALLOC_CCS_AUX)
-#define I915_BO_READONLY  BIT(8)
-#define I915_TILING_QUIRK_BIT 9 /* unknown swizzling; do not 
release! */

-#define I915_BO_PROTECTED BIT(10)
+ I915_BO_ALLOC_CCS_AUX | \
+ I915_BO_PREALLOC)
+#define I915_BO_READONLY  BIT(9)
+#define I915_TILING_QUIRK_BIT 10 /* unknown swizzling; do not 
release! */

+#define I915_BO_PROTECTED BIT(11)
  /**
   * @mem_flags - Mutable placement-related flags
   *
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c

index dd188dfcc423..69eb20ed4d47 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -576,7 +576,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, 
bool evict,

  struct dma_fence *migration_fence = NULL;
  struct ttm_tt *ttm = bo->ttm;
  struct i915_refct_sgt *dst_rsgt;
-    bool clear;
+    bool clear, prealloc_bo;
  int ret;
    if (GEM_WARN_ON(i915_ttm_is_ghost_object(bo))) {
@@ -632,7 +632,8 @@ int i915_ttm_move(struct ttm_buffer_object *bo, 
bool evict,

  return PTR_ERR(dst_rsgt);
    clear = !i915_ttm_cpu_maps_iomem(bo->resource) && (!ttm || 
!ttm_tt_is_populated(ttm));
-    if (!(clear && ttm && !(ttm->page_flags & 
TTM_TT_FLAG_ZERO_ALLOC))) {

+    prealloc_bo = obj->flags & I915_BO_PREALLOC;
+    if (!(clear && ttm && !((ttm->page_flags & 
TTM_TT_FLAG_ZERO_ALLOC) && !prealloc_bo))) {


This looks like a school exercise in complicated usage of logical
operators, and I have a problem understanding this :)

Couldn't this be somehow simplified?


(I thought I sent this email yesterday but it was stuck at an OAuth pop-up
sign-in.)


Yes, this can be improved, I think; it took me a while too.
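
One equivalent form that might read a little easier is to name the
sub-conditions and let De Morgan's laws flatten the negation (untested
sketch, logically the same check as in the hunk above):

	/* Pages flagged for zero-on-allocation by TTM. */
	bool zeroed = ttm && (ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC);
	bool prealloc_bo = obj->flags & I915_BO_PREALLOC;

	/* Same test as !(clear && ttm && !(zeroed && !prealloc_bo)). */
	if (!clear || !ttm || (zeroed && !prealloc_bo)) {
		/* ...set up deps and issue the move as before... */
	}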



Anyway as the patch just reuses existing code:
Reviewed-by: Andrzej Hajda 



Thanks Andrzej,

Nirmoy



Regards
Andrzej



  struct i915_deps deps;
    i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | 
__GFP_NOWARN);




Re: [Intel-gfx] [PATCH v3] drm/i915/mtl: Disable stolen memory backed FB for A0

2023-04-04 Thread Das, Nirmoy



On 4/4/2023 8:27 PM, Ville Syrjälä wrote:

On Tue, Apr 04, 2023 at 08:13:42PM +0200, Nirmoy Das wrote:

Stolen memory is not usable for MTL A0 stepping beyond
certain access size and we have no control over userspace
access size of /dev/fb which can be backed by stolen memory.
So disable stolen memory backed fb by setting i915->dsm.usable_size
to zero.

v2: remove hsdes reference and fix commit message(Andi)
v3: use revid as we want to target SOC stepping(Radhakrishna)

Cc: Matthew Auld 
Cc: Andi Shyti 
Cc: Daniele Ceraolo Spurio 
Cc: Lucas De Marchi 
Cc: Radhakrishna Sripada 
Signed-off-by: Nirmoy Das 
Reviewed-by: Andi Shyti 
---
  drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c 
b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index 8ac376c24aa2..ee492d823f1b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -535,6 +535,14 @@ static int i915_gem_init_stolen(struct intel_memory_region 
*mem)
/* Basic memrange allocator for stolen space. */
drm_mm_init(&i915->mm.stolen, 0, i915->dsm.usable_size);
  
+	/*

+* Access to stolen lmem beyond certain size for MTL A0 stepping
+* would crash the machine. Disable stolen lmem for userspace access
+* by setting usable_size to zero.
+*/
+   if (IS_METEORLAKE(i915) && INTEL_REVID(i915) == 0x0)
+   i915->dsm.usable_size = 0;

That certainly won't prevent FBC from using stolen.
Are we sure that FBC accesses are fine?


I think so. I remember Jouni tested this patch internally to unblock an
FBC test.


Jouni, could you please share your thoughts. I can't seem to find the 
internal JIRA reference right now.



Regards,

Nirmoy




+
return 0;
  }
  
--

2.39.0


Re: [PATCH 3/5] drm/i915: Add a function to mmap framebuffer obj

2023-04-04 Thread Das, Nirmoy

Hi Andi,

On 4/4/2023 6:57 PM, Andi Shyti wrote:

Hi Nirmoy,

[...]


+int i915_gem_fb_mmap(struct drm_i915_gem_object *obj, struct vm_area_struct 
*vma)
+{
+   struct drm_i915_private *i915 = to_i915(obj->base.dev);
+   struct drm_device *dev = >drm;
+   struct i915_mmap_offset *mmo = NULL;
+   enum i915_mmap_type mmap_type;
+   struct i915_ggtt *ggtt = to_gt(i915)->ggtt;
+
+   if (drm_dev_is_unplugged(dev))
+   return -ENODEV;
+
+   /* handle ttm object */
+   if (obj->ops->mmap_ops) {
+   /*
+* ttm fault handler, ttm_bo_vm_fault_reserved() uses fake 
offset
+* to calculate page offset so set that up.
+*/
+   vma->vm_pgoff += drm_vma_node_start(&obj->base.vma_node);

you could have kept my r-b.



I wasn't sure, so I removed it :)



  Good work here!

Reviewed-by: Andi Shyti 



Thanks,

Nirmoy



Thanks,
Andi


+   } else {
+   /* handle stolen and smem objects */
+   mmap_type = i915_ggtt_has_aperture(ggtt) ? I915_MMAP_TYPE_GTT : 
I915_MMAP_TYPE_WC;
+   mmo = mmap_offset_attach(obj, mmap_type, NULL);
+   if (!mmo)
+   return -ENODEV;
+   }
+
+   /*
+* When we install vm_ops for mmap we are too late for
+* the vm_ops->open() which increases the ref_count of
+* this obj and then it gets decreased by the vm_ops->close().
+* To balance this increase the obj ref_count here.
+*/
+   obj = i915_gem_object_get(obj);
+   return i915_gem_object_mmap(obj, mmo, vma);
+}


Re: [PATCH] drm/i915/mtl: Fix MTL stolen memory GGTT mapping

2023-03-28 Thread Das, Nirmoy



On 3/28/2023 3:24 AM, Daniele Ceraolo Spurio wrote:

The PTEs expect the offset from the base of the fake LMEM region (i.e.
the base of stolen) and not from the base of the DSM. Quoting the specs:
"Driver will set the Device Memory bit = 1 in the PTE when pointing to a
page in DSM and program the PTE with offset from LMEM_BAR. Device Memory
Offset from LMEM_BAR is same as offset from BGSM."

DSM starts 8MBs from BGSM, so we set dsm_base = 8MB.

Signed-off-by: Daniele Ceraolo Spurio 
Cc: Aravind Iddamsetty 
Cc: Matt Roper 
Cc: Lucas De Marchi 
Cc: Jani Nikula 
Cc: Nirmoy Das 
Cc: Fei Yang 
Cc: Radhakrishna Sripada 

Reviewed-by: Nirmoy Das 

---

I've omitted the fixes tag from the commit message since MTL is still
under force_probe, so there isn't really any need to propagate the fixes,
but here it is for reference:

Fixes: dbb2ffbfd708 ("drm/i915/mtl: enable local stolen memory")

  drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 15 +++
  1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c 
b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index d8e06e783e30..8ac376c24aa2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -890,8 +890,9 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, 
u16 type,
if (HAS_LMEMBAR_SMEM_STOLEN(i915)) {
/*
 * MTL dsm size is in GGC register.
-* Also MTL uses offset to DSMBASE in ptes, so i915
-* uses dsm_base = 0 to setup stolen region.
+* Also MTL uses offset to GSMBASE in ptes, so i915
+* uses dsm_base = 8MBs to setup stolen region, since
+* DSMBASE = GSMBASE + 8MB.
 */
ret = mtl_get_gms_size(uncore);
if (ret < 0) {
@@ -899,11 +900,11 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, 
u16 type,
return ERR_PTR(ret);
}
  
-		dsm_base = 0;

+   dsm_base = SZ_8M;
dsm_size = (resource_size_t)(ret * SZ_1M);
  
  		GEM_BUG_ON(pci_resource_len(pdev, GEN12_LMEM_BAR) != SZ_256M);

-   GEM_BUG_ON((dsm_size + SZ_8M) > lmem_size);
+   GEM_BUG_ON((dsm_base + dsm_size) > lmem_size);
} else {
/* Use DSM base address instead for stolen memory */
dsm_base = intel_uncore_read64(uncore, GEN12_DSMBASE) & 
GEN12_BDSM_MASK;
@@ -912,14 +913,12 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, 
u16 type,
dsm_size = ALIGN_DOWN(lmem_size - dsm_base, SZ_1M);
}
  
-	io_size = dsm_size;

-   if (HAS_LMEMBAR_SMEM_STOLEN(i915)) {
-   io_start = pci_resource_start(pdev, GEN12_LMEM_BAR) + SZ_8M;
-   } else if (pci_resource_len(pdev, GEN12_LMEM_BAR) < lmem_size) {
+   if (pci_resource_len(pdev, GEN12_LMEM_BAR) < lmem_size) {
io_start = 0;
io_size = 0;
} else {
io_start = pci_resource_start(pdev, GEN12_LMEM_BAR) + dsm_base;
+   io_size = dsm_size;
}
  
  	min_page_size = HAS_64K_PAGES(i915) ? I915_GTT_PAGE_SIZE_64K :


Re: [PATCH] drm/i915/gem: Flush lmem contents after construction

2023-03-23 Thread Das, Nirmoy



On 3/16/2023 5:59 PM, Nirmoy Das wrote:

From: Chris Wilson 

i915_gem_object_create_lmem_from_data() lacks the flush of the data
written to lmem to ensure the object is marked as dirty and the writes
flushed to the backing store. Once created, we can immediately release
the obj->mm.mapping caching of the vmap.

Fixes: 7acbbc7cf485 ("drm/i915/guc: put all guc objects in lmem when available")
Cc: Matthew Auld 
Cc: Daniele Ceraolo Spurio 
Cc: Andi Shyti 
Cc: Matthew Brost 
Cc: John Harrison 
Signed-off-by: Chris Wilson 
Cc:  # v5.16+
Signed-off-by: Nirmoy Das 


Reviewed-by: Nirmoy Das 



---
  drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c 
b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
index 8949fb0a944f..3198b64ad7db 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
@@ -127,7 +127,8 @@ i915_gem_object_create_lmem_from_data(struct 
drm_i915_private *i915,
  
  	memcpy(map, data, size);
  
-	i915_gem_object_unpin_map(obj);

+   i915_gem_object_flush_map(obj);
+   __i915_gem_object_release_map(obj);
  
  	return obj;

  }


Re: [Intel-gfx] [PATCH v2: 1/3] drm/i915: Add a function to mmap framebuffer obj

2023-03-23 Thread Das, Nirmoy



On 3/20/2023 3:02 PM, Andrzej Hajda wrote:

On 20.03.2023 11:09, Nirmoy Das wrote:

Implement i915_gem_fb_mmap() to enable fb_ops.fb_mmap()
callback for i915's framebuffer objects.

v2: add a comment why i915_gem_object_get() needed(Andi).

Cc: Matthew Auld 
Cc: Andi Shyti 
Cc: Ville Syrjälä 
Cc: Jani Nikula 
Cc: Imre Deak 
Signed-off-by: Nirmoy Das 
Reviewed-by: Andi Shyti 


Reviewed-by: Andrzej Hajda 



Thanks, Andrzej.


Going to resend it without RFC now as there are two r-bs and no one 
complained.



Regards,

Nirmoy



Regards
Andrzej


---
  drivers/gpu/drm/i915/gem/i915_gem_mman.c | 127 +++
  drivers/gpu/drm/i915/gem/i915_gem_mman.h |   2 +-
  2 files changed, 83 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c

index d3c1dee16af2..341e952d3510 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -927,53 +927,15 @@ static struct file *mmap_singleton(struct 
drm_i915_private *i915)

  return file;
  }
  -/*
- * This overcomes the limitation in drm_gem_mmap's assignment of a
- * drm_gem_object as the vma->vm_private_data. Since we need to
- * be able to resolve multiple mmap offsets which could be tied
- * to a single gem object.
- */
-int i915_gem_mmap(struct file *filp, struct vm_area_struct *vma)
+static int
+i915_gem_object_mmap(struct drm_i915_gem_object *obj,
+ struct i915_mmap_offset *mmo,
+ struct vm_area_struct *vma)
  {
-    struct drm_vma_offset_node *node;
-    struct drm_file *priv = filp->private_data;
-    struct drm_device *dev = priv->minor->dev;
-    struct drm_i915_gem_object *obj = NULL;
-    struct i915_mmap_offset *mmo = NULL;
+    struct drm_i915_private *i915 = to_i915(obj->base.dev);
+    struct drm_device *dev = &i915->drm;
  struct file *anon;
  -    if (drm_dev_is_unplugged(dev))
-    return -ENODEV;
-
-    rcu_read_lock();
-    drm_vma_offset_lock_lookup(dev->vma_offset_manager);
-    node = drm_vma_offset_exact_lookup_locked(dev->vma_offset_manager,
-  vma->vm_pgoff,
-  vma_pages(vma));
-    if (node && drm_vma_node_is_allowed(node, priv)) {
-    /*
- * Skip 0-refcnted objects as it is in the process of being
- * destroyed and will be invalid when the vma manager lock
- * is released.
- */
-    if (!node->driver_private) {
-    mmo = container_of(node, struct i915_mmap_offset, 
vma_node);

-    obj = i915_gem_object_get_rcu(mmo->obj);
-
-    GEM_BUG_ON(obj && obj->ops->mmap_ops);
-    } else {
-    obj = i915_gem_object_get_rcu
-    (container_of(node, struct drm_i915_gem_object,
-  base.vma_node));
-
-    GEM_BUG_ON(obj && !obj->ops->mmap_ops);
-    }
-    }
-    drm_vma_offset_unlock_lookup(dev->vma_offset_manager);
-    rcu_read_unlock();
-    if (!obj)
-    return node ? -EACCES : -EINVAL;
-
  if (i915_gem_object_is_readonly(obj)) {
  if (vma->vm_flags & VM_WRITE) {
  i915_gem_object_put(obj);
@@ -1005,7 +967,7 @@ int i915_gem_mmap(struct file *filp, struct 
vm_area_struct *vma)

  if (obj->ops->mmap_ops) {
  vma->vm_page_prot = 
pgprot_decrypted(vm_get_page_prot(vma->vm_flags));

  vma->vm_ops = obj->ops->mmap_ops;
-    vma->vm_private_data = node->driver_private;
+    vma->vm_private_data = obj->base.vma_node.driver_private;
  return 0;
  }
  @@ -1043,6 +1005,81 @@ int i915_gem_mmap(struct file *filp, struct 
vm_area_struct *vma)

  return 0;
  }
  +/*
+ * This overcomes the limitation in drm_gem_mmap's assignment of a
+ * drm_gem_object as the vma->vm_private_data. Since we need to
+ * be able to resolve multiple mmap offsets which could be tied
+ * to a single gem object.
+ */
+int i915_gem_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+    struct drm_vma_offset_node *node;
+    struct drm_file *priv = filp->private_data;
+    struct drm_device *dev = priv->minor->dev;
+    struct drm_i915_gem_object *obj = NULL;
+    struct i915_mmap_offset *mmo = NULL;
+
+    if (drm_dev_is_unplugged(dev))
+    return -ENODEV;
+
+    rcu_read_lock();
+    drm_vma_offset_lock_lookup(dev->vma_offset_manager);
+    node = drm_vma_offset_exact_lookup_locked(dev->vma_offset_manager,
+  vma->vm_pgoff,
+  vma_pages(vma));
+    if (node && drm_vma_node_is_allowed(node, priv)) {
+    /*
+ * Skip 0-refcnted objects as it is in the process of being
+ * destroyed and will be invalid when the vma manager lock
+ * is released.
+ */
+    if (!node->driver_private) {
+    mmo = container_of(node, struct i915_mmap_offset, 
vma_node);

+    obj = i915_gem_object_get_rcu(mmo->obj);
+
+    GEM_BUG_ON(obj && obj->ops->mmap_ops);
+    } else {
+

Re: [RFC PATCH 1/2] drm/i915: Add a function to mmap framebuffer obj

2023-03-20 Thread Das, Nirmoy



On 3/20/2023 1:38 AM, Andi Shyti wrote:

Hi Nirmoy,

On Thu, Mar 16, 2023 at 06:22:19PM +0100, Nirmoy Das wrote:

Implement i915_gem_fb_mmap() to enable fb_ops.fb_mmap()
callback for i915's framebuffer objects.

v2: add a comment why i915_gem_object_get() needed(Andi).

Cc: Matthew Auld 
Cc: Andi Shyti 
Cc: Ville Syrjälä 
Cc: Jani Nikula 
Cc: Imre Deak 
Signed-off-by: Nirmoy Das 

I think you can fire the PATCH here instead of the RFC. Looks
good to me.

Reviewed-by: Andi Shyti 



Thanks, I will do that.

Nirmoy



Andi


Re: [PATCH v2 2/2] drm/i915/debugfs: Enable upper layer interfaces to act on all gt's

2023-03-17 Thread Das, Nirmoy



On 3/1/2023 12:02 PM, Andi Shyti wrote:

The commit 82a149a62b6b5 ('drm/i915/gt: move remaining debugfs
interfaces into gt') moved the gt-related debugfs files into the gtX/
directories so that they operate on individual gt's.

However, the original files were only functioning on the root
tile (tile 0) and have been left in the same location to maintain
compatibility with userspace users.

Add multiplexing functionality to the higher directories' files.
This enables the operations to be performed on all the tiles with
a single write. In the case of reads, the files provide an or'ed
value across all the tiles.

Signed-off-by: Andi Shyti 
Cc: Maciej Patelczyk 



Reviewed-by: Nirmoy Das 


---
  drivers/gpu/drm/i915/i915_debugfs.c | 38 ++---
  1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 45773ce1deac2..90663f251fd10 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -575,14 +575,36 @@ static int i915_wa_registers(struct seq_file *m, void 
*unused)
  static int i915_wedged_get(void *data, u64 *val)
  {
struct drm_i915_private *i915 = data;
+   struct intel_gt *gt;
+   unsigned int i;
  
-	return intel_gt_debugfs_reset_show(to_gt(i915), val);

+   *val = 0;
+
+   for_each_gt(gt, i915, i) {
+   int ret;
+   u64 v;
+
+		ret = intel_gt_debugfs_reset_show(gt, &v);
+   if (ret)
+   return ret;
+
+   /* at least one tile should be wedged */
+   *val |= !!v;
+   if (*val)
+   break;
+   }
+
+   return 0;
  }
  
  static int i915_wedged_set(void *data, u64 val)

  {
struct drm_i915_private *i915 = data;
-   intel_gt_debugfs_reset_store(to_gt(i915), val);
+   struct intel_gt *gt;
+   unsigned int i;
+
+   for_each_gt(gt, i915, i)
+   intel_gt_debugfs_reset_store(gt, val);
  
  	return 0;

  }
@@ -733,7 +755,11 @@ static int i915_sseu_status(struct seq_file *m, void 
*unused)
  static int i915_forcewake_open(struct inode *inode, struct file *file)
  {
struct drm_i915_private *i915 = inode->i_private;
-   intel_gt_pm_debugfs_forcewake_user_open(to_gt(i915));
+   struct intel_gt *gt;
+   unsigned int i;
+
+   for_each_gt(gt, i915, i)
+   intel_gt_pm_debugfs_forcewake_user_open(gt);
  
  	return 0;

  }
@@ -741,7 +767,11 @@ static int i915_forcewake_open(struct inode *inode, struct 
file *file)
  static int i915_forcewake_release(struct inode *inode, struct file *file)
  {
struct drm_i915_private *i915 = inode->i_private;
-   intel_gt_pm_debugfs_forcewake_user_release(to_gt(i915));
+   struct intel_gt *gt;
+   unsigned int i;
+
+   for_each_gt(gt, i915, i)
+   intel_gt_pm_debugfs_forcewake_user_release(gt);
  
  	return 0;

  }


Re: [RFC PATCH 2/2] drm/i915/display: Implement fb_mmap callback function

2023-03-17 Thread Das, Nirmoy

Hi Jani,

On 3/17/2023 10:39 AM, Jani Nikula wrote:

On Thu, 16 Mar 2023, Nirmoy Das  wrote:

If stolen memory allocation fails for fbdev, the driver will
fall back to system memory. The calculation of smem_start is wrong
for such framebuffer objs if the platform comes with no gmadr or
no aperture. Solve this by adding an fb_mmap callback which will
use the GTT if an aperture is available and otherwise use the CPU to
access the framebuffer.

Cc: Matthew Auld 
Cc: Andi Shyti 
Cc: Ville Syrjälä 
Cc: Jani Nikula 
Cc: Imre Deak 
Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/display/intel_fbdev.c | 13 +
  1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c 
b/drivers/gpu/drm/i915/display/intel_fbdev.c
index 673bcdfb7ff6..51d6fa034b00 100644
--- a/drivers/gpu/drm/i915/display/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/display/intel_fbdev.c
@@ -40,8 +40,10 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "gem/i915_gem_lmem.h"

+#include "gem/i915_gem_mman.h"
  
  #include "i915_drv.h"

  #include "intel_display_types.h"
@@ -120,6 +122,16 @@ static int intel_fbdev_pan_display(struct 
fb_var_screeninfo *var,
return ret;
  }
  
+#define to_intel_fbdev(x) container_of(x, struct intel_fbdev, helper)

I'd add that as a function (rather than a macro) in a separate patch,
converting the existing users while at it.



Now I do see there are 5 instance of  this conversion. I will convert 
that into a function in a separate patch.



Thanks,

Nirmoy



BR,
Jani.



+static int intel_fbdev_mmap(struct fb_info *info, struct vm_area_struct *vma)
+{
+   struct intel_fbdev *fbdev = to_intel_fbdev(info->par);
+	struct drm_gem_object *bo = drm_gem_fb_get_obj(&fbdev->fb->base, 0);
+   struct drm_i915_gem_object *obj = to_intel_bo(bo);
+
+   return i915_gem_fb_mmap(obj, vma);
+}
+
  static const struct fb_ops intelfb_ops = {
.owner = THIS_MODULE,
DRM_FB_HELPER_DEFAULT_OPS,
@@ -131,6 +143,7 @@ static const struct fb_ops intelfb_ops = {
.fb_imageblit = drm_fb_helper_cfb_imageblit,
.fb_pan_display = intel_fbdev_pan_display,
.fb_blank = intel_fbdev_blank,
+   .fb_mmap = intel_fbdev_mmap,
  };
  
  static int intelfb_alloc(struct drm_fb_helper *helper,


Re: [PATCH] drm/i915/gem: Clarify seemingly unaccounted obj refcount inc

2023-03-16 Thread Das, Nirmoy



On 3/15/2023 11:54 AM, Nirmoy Das wrote:

Add a comment explaining why there is an obj refcount increment before
installing the vm_ops for the mmap call. Also remove the invalid older
comment, as the drm API (drm_gem_prime_mmap()) will hold an obj reference
before calling this driver mmap callback, so we can't have a 0-refcnted
object here.

Cc: Matthew Auld 
Cc: Andi Shyti 
Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_mman.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index d3c1dee16af2..0bc8c3818443 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -952,9 +952,10 @@ int i915_gem_mmap(struct file *filp, struct vm_area_struct 
*vma)
  vma_pages(vma));
if (node && drm_vma_node_is_allowed(node, priv)) {
/*
-* Skip 0-refcnted objects as it is in the process of being
-* destroyed and will be invalid when the vma manager lock
-* is released.



This is valid. Matt pointed out a case where the user closes the obj and calls
mmap while the driver only has the fake offset to refer to the object,


and can end up calling this while the driver is freeing the obj.


I will resend with keeping the valid comment.


Nirmoy


+* When we install vm_ops for mmap we are too late for
+* the vm_ops->open() which increases the ref_count of
+* this obj and then it gets decreased by the vm_ops->close().
+* To balance this increase the obj ref_count here.
 */
if (!node->driver_private) {
mmo = container_of(node, struct i915_mmap_offset, 
vma_node);
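
To illustrate the comment being added above, a minimal sketch (not the i915 code; my_obj and its helpers are stand-ins for drm_i915_gem_object and i915_gem_object_get()/put()) of why the extra reference has to be taken at mmap time: ->open() only runs when an existing VMA is duplicated (fork or split), never for the initial mmap, while ->close() runs for every VMA, so the mmap path must take the reference that the final ->close() drops:

#include <linux/kref.h>
#include <linux/mm.h>

/* Sketch only: "my_obj" stands in for drm_i915_gem_object. */
struct my_obj {
	struct kref ref;
};

static void my_obj_release(struct kref *ref) { /* free the object here */ }
static void my_obj_get(struct my_obj *obj) { kref_get(&obj->ref); }
static void my_obj_put(struct my_obj *obj) { kref_put(&obj->ref, my_obj_release); }

static void sketch_vm_open(struct vm_area_struct *vma)
{
	/* Called when a VMA is duplicated (fork, split), not on the initial mmap. */
	my_obj_get(vma->vm_private_data);
}

static void sketch_vm_close(struct vm_area_struct *vma)
{
	/* Called once for every VMA, including the one created by mmap(). */
	my_obj_put(vma->vm_private_data);
}

static const struct vm_operations_struct sketch_vm_ops = {
	.open  = sketch_vm_open,
	.close = sketch_vm_close,
};

static int sketch_mmap(struct my_obj *obj, struct vm_area_struct *vma)
{
	/*
	 * The vm_ops are installed after the VMA already exists, so
	 * ->open() was never called for it; take the reference that the
	 * final ->close() will eventually drop.
	 */
	my_obj_get(obj);
	vma->vm_private_data = obj;
	vma->vm_ops = &sketch_vm_ops;
	return 0;
}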


Re: [PATCH] drm/i915: Simplify vcs/bsc engine selection

2023-03-16 Thread Das, Nirmoy



On 3/16/2023 3:27 PM, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

No need to look at the mask of present engines when we already have a
count stored ever since e2d0ff3525b9 ("drm/i915: Count engine instances
per uabi class").

Signed-off-by: Tvrtko Ursulin 
Cc: Jonathan Cavitt 



Reviewed-by: Nirmoy Das 


---
  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 10 +++---
  1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 9dce2957b4e5..3aeede6aee4d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2449,11 +2449,6 @@ static int eb_submit(struct i915_execbuffer *eb)
return err;
  }
  
-static int num_vcs_engines(struct drm_i915_private *i915)

-{
-   return hweight_long(VDBOX_MASK(to_gt(i915)));
-}
-
  /*
   * Find one BSD ring to dispatch the corresponding BSD command.
   * The engine index is returned.
@@ -2467,7 +2462,7 @@ gen8_dispatch_bsd_engine(struct drm_i915_private 
*dev_priv,
/* Check whether the file_priv has already selected one ring. */
if ((int)file_priv->bsd_engine < 0)
file_priv->bsd_engine =
-   get_random_u32_below(num_vcs_engines(dev_priv));
+   
get_random_u32_below(dev_priv->engine_uabi_class_count[I915_ENGINE_CLASS_VIDEO]);
  
  	return file_priv->bsd_engine;

  }
@@ -2655,7 +2650,8 @@ eb_select_legacy_ring(struct i915_execbuffer *eb)
return -1;
}
  
-	if (user_ring_id == I915_EXEC_BSD && num_vcs_engines(i915) > 1) {

+   if (user_ring_id == I915_EXEC_BSD &&
+   i915->engine_uabi_class_count[I915_ENGINE_CLASS_VIDEO] > 1) {
unsigned int bsd_idx = args->flags & I915_EXEC_BSD_MASK;
  
  		if (bsd_idx == I915_EXEC_BSD_DEFAULT) {


Re: [Intel-gfx] [PATCH v6 2/2] drm/i915: add guard page to ggtt->error_capture

2023-03-13 Thread Das, Nirmoy



On 3/10/2023 10:23 AM, Andrzej Hajda wrote:

Write-combining memory allows speculative reads by the CPU.
ggtt->error_capture is WC mapped for the CPU, so the CPU/MMU can try
to prefetch memory beyond the error_capture, i.e. it tries
to read the memory pointed to by the next PTE in the GGTT.
If this PTE points to an invalid address, DMAR errors will occur.
This behaviour was observed on ADL and RPL platforms.
To avoid it, a guard scratch page should be added after error_capture.
The patch fixes the most annoying issue with error capture, but
since WC reads are also used in other places, there is a risk that a
similar problem can affect them as well.

v2:
   - modified commit message (I hope the diagnosis is correct),
   - added bug checks to ensure scratch is initialized on gen3 platforms.
 CI produces strange stacktrace for it suggesting scratch[0] is NULL,
 to be removed after resolving the issue with gen3 platforms.
v3:
   - removed bug checks, replaced with gen check.
v4:
   - change code for scratch page insertion to support all platforms,
   - add info in commit message there could be more similar issues
v5:
   - check for nop_clear_range instead of gen8 (Tvrtko),
   - re-insert scratch pages on resume (Tvrtko)
v6:
   - use scratch_range callback to set scratch pages (Chris)

Signed-off-by: Andrzej Hajda 
Reviewed-by: Andi Shyti 

Acked-by: Nirmoy Das 

---
  drivers/gpu/drm/i915/gt/intel_ggtt.c | 20 
  1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 38e6f0b207fe0c..5ef7e03b11c8e6 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -572,8 +572,12 @@ static int init_ggtt(struct i915_ggtt *ggtt)
 * paths, and we trust that 0 will remain reserved. However,
 * the only likely reason for failure to insert is a driver
 * bug, which we expect to cause other failures...
+*
+* Since CPU can perform speculative reads on error capture
+* (write-combining allows it) add scratch page after error
+* capture to avoid DMAR errors.
 */
-   ggtt->error_capture.size = I915_GTT_PAGE_SIZE;
+   ggtt->error_capture.size = 2 * I915_GTT_PAGE_SIZE;
ggtt->error_capture.color = I915_COLOR_UNEVICTABLE;
	if (drm_mm_reserve_node(&ggtt->vm.mm, &ggtt->error_capture))
		drm_mm_insert_node_in_range(&ggtt->vm.mm,
@@ -583,11 +587,15 @@ static int init_ggtt(struct i915_ggtt *ggtt)
0, ggtt->mappable_end,
DRM_MM_INSERT_LOW);
}
-	if (drm_mm_node_allocated(&ggtt->error_capture))
+	if (drm_mm_node_allocated(&ggtt->error_capture)) {
+   u64 start = ggtt->error_capture.start;
+   u64 size = ggtt->error_capture.size;
+
+		ggtt->vm.scratch_range(&ggtt->vm, start, size);
		drm_dbg(&ggtt->vm.i915->drm,
"Reserved GGTT:[%llx, %llx] for use by error capture\n",
-   ggtt->error_capture.start,
-   ggtt->error_capture.start + ggtt->error_capture.size);
+   start, start + size);
+   }
  
  	/*

 * The upper portion of the GuC address space has a sizeable hole
@@ -1280,6 +1288,10 @@ void i915_ggtt_resume(struct i915_ggtt *ggtt)
  
	flush = i915_ggtt_resume_vm(&ggtt->vm);
  
+	if (drm_mm_node_allocated(&ggtt->error_capture))

+		ggtt->vm.scratch_range(&ggtt->vm, ggtt->error_capture.start,
+  ggtt->error_capture.size);
+
ggtt->invalidate(ggtt);
  
  	if (flush)




Re: [PATCH v6 1/2] drm/i915/gt: introduce vm->scratch_range callback

2023-03-13 Thread Das, Nirmoy



On 3/10/2023 10:23 AM, Andrzej Hajda wrote:

The callback will be responsible for setting scratch page PTEs for
specified range. In contrast to clear_range it cannot be optimized to nop.
It will be used by code adding guard pages.

Signed-off-by: Andrzej Hajda 

Reviewed-by: Nirmoy Das 

---
  drivers/gpu/drm/i915/gt/intel_ggtt.c  | 23 +++
  drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c |  1 +
  drivers/gpu/drm/i915/gt/intel_gtt.h   |  2 ++
  3 files changed, 26 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 842e69c7b21e49..38e6f0b207fe0c 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -291,6 +291,27 @@ static void gen8_ggtt_insert_entries(struct 
i915_address_space *vm,
ggtt->invalidate(ggtt);
  }
  
+static void gen8_ggtt_clear_range(struct i915_address_space *vm,

+ u64 start, u64 length)
+{
+   struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
+   unsigned int first_entry = start / I915_GTT_PAGE_SIZE;
+   unsigned int num_entries = length / I915_GTT_PAGE_SIZE;
+   const gen8_pte_t scratch_pte = vm->scratch[0]->encode;
+   gen8_pte_t __iomem *gtt_base =
+   (gen8_pte_t __iomem *)ggtt->gsm + first_entry;
+   const int max_entries = ggtt_total_entries(ggtt) - first_entry;
+   int i;
+
+   if (WARN(num_entries > max_entries,
+   "First entry = %d; Num entries = %d (max=%d)\n",
+   first_entry, num_entries, max_entries))
+   num_entries = max_entries;
+
+   for (i = 0; i < num_entries; i++)
+		gen8_set_pte(&gtt_base[i], scratch_pte);
+}
+
  static void gen6_ggtt_insert_page(struct i915_address_space *vm,
  dma_addr_t addr,
  u64 offset,
@@ -919,6 +940,7 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
ggtt->vm.cleanup = gen6_gmch_remove;
ggtt->vm.insert_page = gen8_ggtt_insert_page;
ggtt->vm.clear_range = nop_clear_range;
+   ggtt->vm.scratch_range = gen8_ggtt_clear_range;
  
  	ggtt->vm.insert_entries = gen8_ggtt_insert_entries;
  
@@ -1082,6 +1104,7 @@ static int gen6_gmch_probe(struct i915_ggtt *ggtt)

ggtt->vm.clear_range = nop_clear_range;
if (!HAS_FULL_PPGTT(i915))
ggtt->vm.clear_range = gen6_ggtt_clear_range;
+   ggtt->vm.scratch_range = gen6_ggtt_clear_range;
ggtt->vm.insert_page = gen6_ggtt_insert_page;
ggtt->vm.insert_entries = gen6_ggtt_insert_entries;
ggtt->vm.cleanup = gen6_gmch_remove;
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
index 77c793812eb46a..d6a74ae2527bd9 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
@@ -102,6 +102,7 @@ int intel_ggtt_gmch_probe(struct i915_ggtt *ggtt)
ggtt->vm.insert_page = gmch_ggtt_insert_page;
ggtt->vm.insert_entries = gmch_ggtt_insert_entries;
ggtt->vm.clear_range = gmch_ggtt_clear_range;
+   ggtt->vm.scratch_range = gmch_ggtt_clear_range;
ggtt->vm.cleanup = gmch_ggtt_remove;
  
  	ggtt->invalidate = gmch_ggtt_invalidate;

diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 5a775310d3fcb5..69ce55f517f567 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -298,6 +298,8 @@ struct i915_address_space {
  u64 start, u64 length);
void (*clear_range)(struct i915_address_space *vm,
u64 start, u64 length);
+   void (*scratch_range)(struct i915_address_space *vm,
+ u64 start, u64 length);
void (*insert_page)(struct i915_address_space *vm,
dma_addr_t addr,
u64 offset,



Re: [PATCH RFC 3/3] drm/i915/display: Implement fb_mmap callback function

2023-03-07 Thread Das, Nirmoy

Hi Ville,

On 3/6/2023 3:32 PM, Ville Syrjälä wrote:

On Mon, Mar 06, 2023 at 11:28:50AM +0100, Nirmoy Das wrote:

If stolen memory allocation fails for fbdev, the driver will
fall back to system memory. The calculation of smem_start is wrong
for such framebuffer objs if the platform comes with no gmadr or
no aperture. Solve this by adding an fb_mmap callback, which also gives
the driver more control.

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/display/intel_fbdev.c | 20 
  1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c 
b/drivers/gpu/drm/i915/display/intel_fbdev.c
index 98ae3a3a986a..ed0f9e2af3ed 100644
--- a/drivers/gpu/drm/i915/display/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/display/intel_fbdev.c
@@ -40,8 +40,10 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "gem/i915_gem_lmem.h"

+#include "gem/i915_gem_mman.h"
  
  #include "i915_drv.h"

  #include "intel_display_types.h"
@@ -120,6 +122,23 @@ static int intel_fbdev_pan_display(struct 
fb_var_screeninfo *var,
return ret;
  }
  
+#define to_intel_fbdev(x) container_of(x, struct intel_fbdev, helper)

+static int intel_fbdev_mmap(struct fb_info *info, struct vm_area_struct *vma)
+{
+   struct intel_fbdev *fbdev = to_intel_fbdev(info->par);
+	struct drm_gem_object *bo = drm_gem_fb_get_obj(&fbdev->fb->base, 0);
+   struct drm_i915_gem_object *obj = to_intel_bo(bo);
+   struct drm_device *dev = fbdev->helper.dev;

You seem to be missing the fb vs. mmio handling here entirely.



Could you please expand this more, I am not so familiar to fbdev code.





+
+   vma->vm_page_prot =
+   pgprot_writecombine(vm_get_page_prot(vma->vm_flags));

Does that do something sane on eg. !PAT?


+
+   if (obj->stolen)
+   return vm_iomap_memory(vma, info->fix.smem_start,
+  info->fix.smem_len);

Why doesn't i915_gem_object_mmap() know how to handle stolen?



Sent out another rfc series to address this.


Regards,

Nirmoy




+
+   return i915_gem_object_mmap(obj, vma);
+}
  static const struct fb_ops intelfb_ops = {
.owner = THIS_MODULE,
DRM_FB_HELPER_DEFAULT_OPS,
@@ -131,6 +150,7 @@ static const struct fb_ops intelfb_ops = {
.fb_imageblit = drm_fb_helper_cfb_imageblit,
.fb_pan_display = intel_fbdev_pan_display,
.fb_blank = intel_fbdev_blank,
+   .fb_mmap = intel_fbdev_mmap,
  };
  
  static int intelfb_alloc(struct drm_fb_helper *helper,

--
2.39.0


Re: [Intel-gfx] [PATCH 1/3] drm/i915: Set I915_BO_ALLOC_USER for framebuffer

2023-03-06 Thread Das, Nirmoy



On 3/6/2023 6:30 PM, Ville Syrjälä wrote:

On Mon, Mar 06, 2023 at 05:22:19PM +0100, Das, Nirmoy wrote:

On 3/6/2023 3:21 PM, Ville Syrjälä wrote:

On Mon, Mar 06, 2023 at 11:28:48AM +0100, Nirmoy Das wrote:

The framebuffer is exposed to userspace, so set the I915_BO_ALLOC_USER
flag for it. This also makes sure that ttm allocates an offset
for lmem objects.

I have no idea what that means.

Sorry for the poor explanation.

Without I915_BO_ALLOC_USER, ttm will assume the obj is a kernel buffer and
will not allocate the fake offset which I need for the fb_mmap callback to work.

So that's the fake vm_pgoff thing? Doesn't that exist just so
mmap() through /dev/dri* can be passed a "gem handle"?
With fbdev mmap we already know which BO we want to map so
why would any of that stuff even be needed?



I was mainly concentrating on using the drm mmap API to achieve fb_mmap,
which eventually will call i915_gem_mmap()


and expects a fake offset for the obj. I see your point: fb_mmap can be
done without using the drm mmap API, which should be much simpler. I will
look into this and resend.



Thanks,

Nirmoy


Regards,
Nirmoy


Signed-off-by: Nirmoy Das 
---
   drivers/gpu/drm/i915/display/intel_dpt.c   | 4 +++-
   drivers/gpu/drm/i915/display/intel_fbdev.c | 3 ++-
   drivers/gpu/drm/i915/display/intel_plane_initial.c | 3 ++-
   3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c 
b/drivers/gpu/drm/i915/display/intel_dpt.c
index ad1a37b515fb..2e6238881860 100644
--- a/drivers/gpu/drm/i915/display/intel_dpt.c
+++ b/drivers/gpu/drm/i915/display/intel_dpt.c
@@ -254,7 +254,9 @@ intel_dpt_create(struct intel_framebuffer *fb)
   
   	size = round_up(size * sizeof(gen8_pte_t), I915_GTT_PAGE_SIZE);
   
-	dpt_obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_CONTIGUOUS);

+   dpt_obj = i915_gem_object_create_lmem(i915, size,
+ I915_BO_ALLOC_CONTIGUOUS |
+ I915_BO_ALLOC_USER);
if (IS_ERR(dpt_obj) && i915_ggtt_has_aperture(to_gt(i915)->ggtt))
dpt_obj = i915_gem_object_create_stolen(i915, size);
if (IS_ERR(dpt_obj) && !HAS_LMEM(i915)) {
diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c 
b/drivers/gpu/drm/i915/display/intel_fbdev.c
index 3659350061a7..98ae3a3a986a 100644
--- a/drivers/gpu/drm/i915/display/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/display/intel_fbdev.c
@@ -163,7 +163,8 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
obj = ERR_PTR(-ENODEV);
if (HAS_LMEM(dev_priv)) {
obj = i915_gem_object_create_lmem(dev_priv, size,
- I915_BO_ALLOC_CONTIGUOUS);
+ I915_BO_ALLOC_CONTIGUOUS |
+ I915_BO_ALLOC_USER);
} else {
/*
 * If the FB is too big, just don't use it since fbdev is not 
very
diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c 
b/drivers/gpu/drm/i915/display/intel_plane_initial.c
index bb6ea7de5c61..4a3680f6a3f5 100644
--- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
+++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
@@ -110,7 +110,8 @@ initial_plane_vma(struct drm_i915_private *i915,
size * 2 > i915->dsm.usable_size)
return NULL;
   
-	obj = i915_gem_object_create_region_at(mem, phys_base, size, 0);

+   obj = i915_gem_object_create_region_at(mem, phys_base, size,
+  I915_BO_ALLOC_USER);
if (IS_ERR(obj))
return NULL;
   
--

2.39.0


Re: [Intel-gfx] [PATCH 1/3] drm/i915: Set I915_BO_ALLOC_USER for framebuffer

2023-03-06 Thread Das, Nirmoy



On 3/6/2023 3:21 PM, Ville Syrjälä wrote:

On Mon, Mar 06, 2023 at 11:28:48AM +0100, Nirmoy Das wrote:

The framebuffer is exposed to userspace, so set the I915_BO_ALLOC_USER
flag for it. This also makes sure that ttm allocates an offset
for lmem objects.

I have no idea what that means.


Sorry for the poor explanation.

Without I915_BO_ALLOC_USER, ttm will assume the obj is a kernel buffer and
will not allocate the fake offset which I need for the fb_mmap callback to work.


Regards,
Nirmoy




Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/display/intel_dpt.c   | 4 +++-
  drivers/gpu/drm/i915/display/intel_fbdev.c | 3 ++-
  drivers/gpu/drm/i915/display/intel_plane_initial.c | 3 ++-
  3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c 
b/drivers/gpu/drm/i915/display/intel_dpt.c
index ad1a37b515fb..2e6238881860 100644
--- a/drivers/gpu/drm/i915/display/intel_dpt.c
+++ b/drivers/gpu/drm/i915/display/intel_dpt.c
@@ -254,7 +254,9 @@ intel_dpt_create(struct intel_framebuffer *fb)
  
  	size = round_up(size * sizeof(gen8_pte_t), I915_GTT_PAGE_SIZE);
  
-	dpt_obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_CONTIGUOUS);

+   dpt_obj = i915_gem_object_create_lmem(i915, size,
+ I915_BO_ALLOC_CONTIGUOUS |
+ I915_BO_ALLOC_USER);
if (IS_ERR(dpt_obj) && i915_ggtt_has_aperture(to_gt(i915)->ggtt))
dpt_obj = i915_gem_object_create_stolen(i915, size);
if (IS_ERR(dpt_obj) && !HAS_LMEM(i915)) {
diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c 
b/drivers/gpu/drm/i915/display/intel_fbdev.c
index 3659350061a7..98ae3a3a986a 100644
--- a/drivers/gpu/drm/i915/display/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/display/intel_fbdev.c
@@ -163,7 +163,8 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
obj = ERR_PTR(-ENODEV);
if (HAS_LMEM(dev_priv)) {
obj = i915_gem_object_create_lmem(dev_priv, size,
- I915_BO_ALLOC_CONTIGUOUS);
+ I915_BO_ALLOC_CONTIGUOUS |
+ I915_BO_ALLOC_USER);
} else {
/*
 * If the FB is too big, just don't use it since fbdev is not 
very
diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c 
b/drivers/gpu/drm/i915/display/intel_plane_initial.c
index bb6ea7de5c61..4a3680f6a3f5 100644
--- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
+++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
@@ -110,7 +110,8 @@ initial_plane_vma(struct drm_i915_private *i915,
size * 2 > i915->dsm.usable_size)
return NULL;
  
-	obj = i915_gem_object_create_region_at(mem, phys_base, size, 0);

+   obj = i915_gem_object_create_region_at(mem, phys_base, size,
+  I915_BO_ALLOC_USER);
if (IS_ERR(obj))
return NULL;
  
--

2.39.0


Re: [PATCH 2/3] drm/i915: Add a helper func for gem obj mmap

2023-03-06 Thread Das, Nirmoy



On 3/6/2023 3:26 PM, Ville Syrjälä wrote:

On Mon, Mar 06, 2023 at 11:28:49AM +0100, Nirmoy Das wrote:

Move gem obj mmap code to i915_gem_object_mmap() so that
this can be used by others.

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 20 ++---
  drivers/gpu/drm/i915/gem/i915_gem_mman.c   | 25 ++
  drivers/gpu/drm/i915/gem/i915_gem_mman.h   |  1 +
  3 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index fd556a076d05..831dd8ebf819 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -12,6 +12,7 @@
  #include 
  
  #include "gem/i915_gem_dmabuf.h"

+#include "gem/i915_gem_mman.h"
  #include "i915_drv.h"
  #include "i915_gem_object.h"
  #include "i915_scatterlist.h"
@@ -94,27 +95,10 @@ static void i915_gem_dmabuf_vunmap(struct dma_buf *dma_buf,
  static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct 
vm_area_struct *vma)
  {
struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
-   struct drm_i915_private *i915 = to_i915(obj->base.dev);
-   int ret;
  
  	dma_resv_assert_held(dma_buf->resv);
  
-	if (obj->base.size < vma->vm_end - vma->vm_start)

-   return -EINVAL;
-
-   if (HAS_LMEM(i915))
-	return drm_gem_prime_mmap(&obj->base, vma);
-
-   if (!obj->base.filp)
-   return -ENODEV;
-
-   ret = call_mmap(obj->base.filp, vma);
-   if (ret)
-   return ret;
-
-   vma_set_file(vma, obj->base.filp);
-
-   return 0;
+   return i915_gem_object_mmap(obj, vma);
  }
  
  static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 2aac6bf78740..d378720ca626 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -11,6 +11,8 @@
  
  #include 
  
+#include "gem/i915_gem_lmem.h"

+
  #include "gt/intel_gt.h"
  #include "gt/intel_gt_requests.h"
  
@@ -1043,6 +1045,29 @@ int i915_gem_mmap(struct file *filp, struct vm_area_struct *vma)

return 0;
  }
  
+int i915_gem_object_mmap(struct drm_i915_gem_object *obj, struct vm_area_struct *vma)

+{
+   struct drm_i915_private *i915 = to_i915(obj->base.dev);
+   int ret;
+
+   if (obj->base.size < vma->vm_end - vma->vm_start)
+   return -EINVAL;
+
+   if (HAS_LMEM(i915))
+		return drm_gem_prime_mmap(&obj->base, vma);

Calling some prime stuff here doesn't smell right.


Yes, I should use drm_gem_mmap_obj() here.





+
+   if (obj->base.filp) {
+   ret = call_mmap(obj->base.filp, vma);
+   if (ret)
+   return ret;
+
+   vma_set_file(vma, obj->base.filp);
+   return 0;
+   }
+
+   return -ENODEV;
+}
+
  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
  #include "selftests/i915_gem_mman.c"
  #endif
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.h 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
index 1fa91b3033b3..303e81ddc5ba 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
@@ -30,4 +30,5 @@ void i915_gem_object_release_mmap_gtt(struct 
drm_i915_gem_object *obj);
  void i915_gem_object_runtime_pm_release_mmap_offset(struct 
drm_i915_gem_object *obj);
  void i915_gem_object_release_mmap_offset(struct drm_i915_gem_object *obj);
  
+int i915_gem_object_mmap(struct drm_i915_gem_object *obj, struct vm_area_struct *vma);

  #endif
--
2.39.0
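
Following up on the drm_gem_prime_mmap() comment and the drm_gem_mmap_obj() suggestion in this thread, a rough sketch of what the helper might look like with the prime call swapped out; this is an assumption for illustration, not the merged code:

/*
 * Rough sketch only: same shape as the i915_gem_object_mmap() helper above,
 * but using drm_gem_mmap_obj() (which sets up a VMA for an object we already
 * hold) instead of the dma-buf oriented drm_gem_prime_mmap().
 */
static int sketch_i915_gem_object_mmap(struct drm_i915_gem_object *obj,
					struct vm_area_struct *vma)
{
	struct drm_i915_private *i915 = to_i915(obj->base.dev);
	int ret;

	if (obj->base.size < vma->vm_end - vma->vm_start)
		return -EINVAL;

	if (HAS_LMEM(i915))
		return drm_gem_mmap_obj(&obj->base,
					vma->vm_end - vma->vm_start, vma);

	if (!obj->base.filp)
		return -ENODEV;

	ret = call_mmap(obj->base.filp, vma);
	if (ret)
		return ret;

	vma_set_file(vma, obj->base.filp);
	return 0;
}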


Re: [Intel-gfx] [PATCH 1/3] drm/i915: Set I915_BO_ALLOC_USER for framebuffer

2023-03-06 Thread Das, Nirmoy



On 3/6/2023 2:49 PM, Matthew Auld wrote:

On 06/03/2023 13:31, Das, Nirmoy wrote:

Hi Matt,

On 3/6/2023 1:25 PM, Matthew Auld wrote:

On 06/03/2023 12:07, Nirmoy Das wrote:

The framebuffer is exposed to userspace, so set the I915_BO_ALLOC_USER
flag for it. This also makes sure that ttm allocates an offset
for lmem objects.

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/display/intel_dpt.c   | 4 +++-
  drivers/gpu/drm/i915/display/intel_fbdev.c | 3 ++-
  drivers/gpu/drm/i915/display/intel_plane_initial.c | 3 ++-
  3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c 
b/drivers/gpu/drm/i915/display/intel_dpt.c

index ad1a37b515fb..2e6238881860 100644
--- a/drivers/gpu/drm/i915/display/intel_dpt.c
+++ b/drivers/gpu/drm/i915/display/intel_dpt.c
@@ -254,7 +254,9 @@ intel_dpt_create(struct intel_framebuffer *fb)
    size = round_up(size * sizeof(gen8_pte_t), 
I915_GTT_PAGE_SIZE);
  -    dpt_obj = i915_gem_object_create_lmem(i915, size, 
I915_BO_ALLOC_CONTIGUOUS);

+    dpt_obj = i915_gem_object_create_lmem(i915, size,
+  I915_BO_ALLOC_CONTIGUOUS |
+  I915_BO_ALLOC_USER);


AFAICT this is just some driver internal stuff for display 
page-table, which gets mapped through GGTT or something, and is not 
the actual fb. Is it really exposed to the user?



I misunderstood this for something else. I will remove this.



  if (IS_ERR(dpt_obj) && 
i915_ggtt_has_aperture(to_gt(i915)->ggtt))

  dpt_obj = i915_gem_object_create_stolen(i915, size);
  if (IS_ERR(dpt_obj) && !HAS_LMEM(i915)) {
diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c 
b/drivers/gpu/drm/i915/display/intel_fbdev.c

index 3659350061a7..98ae3a3a986a 100644
--- a/drivers/gpu/drm/i915/display/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/display/intel_fbdev.c
@@ -163,7 +163,8 @@ static int intelfb_alloc(struct drm_fb_helper 
*helper,

  obj = ERR_PTR(-ENODEV);
  if (HAS_LMEM(dev_priv)) {
  obj = i915_gem_object_create_lmem(dev_priv, size,
-  I915_BO_ALLOC_CONTIGUOUS);
+  I915_BO_ALLOC_CONTIGUOUS |
+  I915_BO_ALLOC_USER);
  } else {
  /*
   * If the FB is too big, just don't use it since fbdev is 
not very
diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c 
b/drivers/gpu/drm/i915/display/intel_plane_initial.c

index bb6ea7de5c61..4a3680f6a3f5 100644
--- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
+++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
@@ -110,7 +110,8 @@ initial_plane_vma(struct drm_i915_private *i915,
  size * 2 > i915->dsm.usable_size)
  return NULL;
  -    obj = i915_gem_object_create_region_at(mem, phys_base, size, 
0);

+    obj = i915_gem_object_create_region_at(mem, phys_base, size,
+   I915_BO_ALLOC_USER);


ALLOC_USER has the side effect of also zeroing the memory 
underneath, IIRC. However this here is the pre-allocated fb (will 
have some boot logo stuff), so we shouldn't ever clear it.



This was my concern.  I wonder if there is any other better way than 
to use a temp buffer to copy the pre-allocated content and put it 
back after getting i915_gem_object_create_region_at().


If we need ALLOC_USER for this buffer then maybe just a new flag like 
BO_PREALLOCATED which skips all the clearing?



Sounds good, I will look into that.


Thanks,

Nirmoy






Regards,

Nirmoy





  if (IS_ERR(obj))
  return NULL;
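
For what it's worth, a purely hypothetical sketch of the BO_PREALLOCATED idea floated above; the flag names and the helper are made up for illustration and are not existing i915 API:

#include <linux/bits.h>

/*
 * Hypothetical only: a BO_PREALLOCATED-style flag could keep the
 * "user visible, so give it an mmap offset" behaviour of ALLOC_USER
 * while skipping the clear that would wipe pre-written contents, such
 * as the firmware boot logo in the pre-allocated framebuffer.
 */
#define SKETCH_BO_ALLOC_USER	BIT(0)	/* user visible: clear + mmap offset */
#define SKETCH_BO_PREALLOCATED	BIT(1)	/* assumed new flag: keep contents */

static bool sketch_needs_clear(unsigned int flags)
{
	if (!(flags & SKETCH_BO_ALLOC_USER))
		return false;	/* kernel-internal buffer, nothing to hide */

	if (flags & SKETCH_BO_PREALLOCATED)
		return false;	/* contents were placed there before handover */

	return true;		/* normal userspace allocation: must be cleared */
}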


Re: [Intel-gfx] [PATCH 1/3] drm/i915: Set I915_BO_ALLOC_USER for framebuffer

2023-03-06 Thread Das, Nirmoy

Hi Matt,

On 3/6/2023 1:25 PM, Matthew Auld wrote:

On 06/03/2023 12:07, Nirmoy Das wrote:

The framebuffer is exposed to userspace, so set the I915_BO_ALLOC_USER
flag for it. This also makes sure that ttm allocates an offset
for lmem objects.

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/display/intel_dpt.c   | 4 +++-
  drivers/gpu/drm/i915/display/intel_fbdev.c | 3 ++-
  drivers/gpu/drm/i915/display/intel_plane_initial.c | 3 ++-
  3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c 
b/drivers/gpu/drm/i915/display/intel_dpt.c

index ad1a37b515fb..2e6238881860 100644
--- a/drivers/gpu/drm/i915/display/intel_dpt.c
+++ b/drivers/gpu/drm/i915/display/intel_dpt.c
@@ -254,7 +254,9 @@ intel_dpt_create(struct intel_framebuffer *fb)
    size = round_up(size * sizeof(gen8_pte_t), I915_GTT_PAGE_SIZE);
  -    dpt_obj = i915_gem_object_create_lmem(i915, size, 
I915_BO_ALLOC_CONTIGUOUS);

+    dpt_obj = i915_gem_object_create_lmem(i915, size,
+  I915_BO_ALLOC_CONTIGUOUS |
+  I915_BO_ALLOC_USER);


AFAICT this is just some driver internal stuff for display page-table, 
which gets mapped through GGTT or something, and is not the actual fb. 
Is it really exposed to the user?



I misunderstood this for something else. I will remove this.




  if (IS_ERR(dpt_obj) && i915_ggtt_has_aperture(to_gt(i915)->ggtt))
  dpt_obj = i915_gem_object_create_stolen(i915, size);
  if (IS_ERR(dpt_obj) && !HAS_LMEM(i915)) {
diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c 
b/drivers/gpu/drm/i915/display/intel_fbdev.c

index 3659350061a7..98ae3a3a986a 100644
--- a/drivers/gpu/drm/i915/display/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/display/intel_fbdev.c
@@ -163,7 +163,8 @@ static int intelfb_alloc(struct drm_fb_helper 
*helper,

  obj = ERR_PTR(-ENODEV);
  if (HAS_LMEM(dev_priv)) {
  obj = i915_gem_object_create_lmem(dev_priv, size,
-  I915_BO_ALLOC_CONTIGUOUS);
+  I915_BO_ALLOC_CONTIGUOUS |
+  I915_BO_ALLOC_USER);
  } else {
  /*
   * If the FB is too big, just don't use it since fbdev is 
not very
diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c 
b/drivers/gpu/drm/i915/display/intel_plane_initial.c

index bb6ea7de5c61..4a3680f6a3f5 100644
--- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
+++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
@@ -110,7 +110,8 @@ initial_plane_vma(struct drm_i915_private *i915,
  size * 2 > i915->dsm.usable_size)
  return NULL;
  -    obj = i915_gem_object_create_region_at(mem, phys_base, size, 0);
+    obj = i915_gem_object_create_region_at(mem, phys_base, size,
+   I915_BO_ALLOC_USER);


ALLOC_USER has the side effect of also zeroing the memory underneath, 
IIRC. However this here is the pre-allocated fb (will have some boot 
logo stuff), so we shouldn't ever clear it.



This was my concern.  I wonder if there is any other better way than to 
use a temp buffer to copy the pre-allocated content and put it back 
after getting i915_gem_object_create_region_at().



Regards,

Nirmoy





  if (IS_ERR(obj))
  return NULL;


Re: [PATCH] drm/i915: Make sure dsm_size has correct granularity

2023-02-03 Thread Das, Nirmoy

Hi Lucas,

On 2/3/2023 7:56 PM, Lucas De Marchi wrote:

On Thu, Feb 02, 2023 at 07:02:43PM +0100, Nirmoy Das wrote:

DSM granularity is 1MB so make sure we stick to that.


I think we need to be a bit more verbose here, because in future we may
need to refer to this commit if/when things change (e.g. the granularity
or the additional size needed on top of DSM).

The issue this is fixing is that the address set by firmware in 
GEN12_DSMBASE
and read here doesn't mean "anything above it until the end of lmem is 
part of DSM".

There may be a few KB that is not part of DSM. How large that space is
depends on the platform, but since it's always less than the DSM
granularity, it can be simplified by simply aligning the size like
is done here.



v2: replace "1 * SZ_1M" with SZ_1M (Andrzej).

Cc: Matthew Auld 
Suggested-by: Lucas De Marchi 
Signed-off-by: Nirmoy Das 
Reviewed-by: Andrzej Hajda 



Reviewed-by: Lucas De Marchi 

Are you ok with me amending the commit message and applying?



Yes, I'm fine with that, thanks for doing that. I agree this is a very short 
description that I wrote.





After this patch I think you can follow the process to request committer
access.



Looking forward to doing that :)


Nirmoy



Lucas De Marchi


---
drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c 
b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c

index 90a967374b1a..d8e06e783e30 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -909,7 +909,7 @@ i915_gem_stolen_lmem_setup(struct 
drm_i915_private *i915, u16 type,
    dsm_base = intel_uncore_read64(uncore, GEN12_DSMBASE) & 
GEN12_BDSM_MASK;

    if (WARN_ON(lmem_size < dsm_base))
    return ERR_PTR(-ENODEV);
-    dsm_size = lmem_size - dsm_base;
+    dsm_size = ALIGN_DOWN(lmem_size - dsm_base, SZ_1M);
}

io_size = dsm_size;
--
2.39.0
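
To make the effect of the alignment concrete, a small worked example; the sizes below are made up purely for illustration, and ALIGN_DOWN here is the power-of-two variant:

#include <stdio.h>

#define SZ_64K			(64ULL << 10)
#define SZ_1M			(1ULL << 20)
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))	/* a must be a power of two */

int main(void)
{
	/* Pretend the firmware leaves 64KB at the top of lmem outside DSM. */
	unsigned long long lmem_size = 8ULL << 30;		/* 8 GiB */
	unsigned long long dsm_base  = lmem_size - 64 * SZ_1M - SZ_64K;
	unsigned long long raw       = lmem_size - dsm_base;	/* 64 MiB + 64 KiB */
	unsigned long long aligned   = ALIGN_DOWN(raw, SZ_1M);	/* 64 MiB */

	/* The 1MB alignment trims the non-DSM tail off the computed size. */
	printf("raw=%llu aligned=%llu\n", raw, aligned);
	return 0;
}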



Re: [PATCH v2 5/6] drm/ttm: stop allocating a dummy resource for pipelined gutting

2023-01-30 Thread Das, Nirmoy



On 1/30/2023 1:06 PM, Matthew Auld wrote:

From: Christian König 

That should not be necessary any more when drivers should at least be
able to handle a move without a resource.

Signed-off-by: Christian König 
Reviewed-by: Matthew Auld 
Signed-off-by: Matthew Auld 

Acked-by: Nirmoy Das 

---
  drivers/gpu/drm/ttm/ttm_bo_util.c | 15 ++-
  1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 7635d7d6b13b..d9d2b0903b22 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -704,30 +704,23 @@ EXPORT_SYMBOL(ttm_bo_move_sync_cleanup);
   */
  int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo)
  {
-   static const struct ttm_place sys_mem = { .mem_type = TTM_PL_SYSTEM };
struct ttm_buffer_object *ghost;
-   struct ttm_resource *sys_res;
struct ttm_tt *ttm;
int ret;
  
-	ret = ttm_resource_alloc(bo, &sys_mem, &sys_res);

-   if (ret)
-   return ret;
-
/* If already idle, no need for ghost object dance. */
if (dma_resv_test_signaled(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP)) {
if (!bo->ttm) {
/* See comment below about clearing. */
ret = ttm_tt_create(bo, true);
if (ret)
-   goto error_free_sys_mem;
+   return ret;
} else {
ttm_tt_unpopulate(bo->bdev, bo->ttm);
if (bo->type == ttm_bo_type_device)
ttm_tt_mark_for_clear(bo->ttm);
}
		ttm_resource_free(bo, &bo->resource);
-   ttm_bo_assign_mem(bo, sys_res);
return 0;
}
  
@@ -744,7 +737,7 @@ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo)

ret = ttm_tt_create(bo, true);
swap(bo->ttm, ttm);
if (ret)
-   goto error_free_sys_mem;
+   return ret;
  
	ret = ttm_buffer_object_transfer(bo, &ghost);

if (ret)
@@ -760,13 +753,9 @@ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo)
	dma_resv_unlock(&ghost->base._resv);
ttm_bo_put(ghost);
bo->ttm = ttm;
-   ttm_bo_assign_mem(bo, sys_res);
return 0;
  
  error_destroy_tt:

ttm_tt_destroy(bo->bdev, ttm);
-
-error_free_sys_mem:
-	ttm_resource_free(bo, &sys_res);
return ret;
  }


Re: [PATCH v2 4/6] drm/ttm: stop allocating dummy resources during BO creation

2023-01-30 Thread Das, Nirmoy



On 1/30/2023 1:06 PM, Matthew Auld wrote:

From: Christian König 

That should not be necessary any more when drivers should at least be
able to handle the move without a resource.

Signed-off-by: Christian König 
Reviewed-by: Matthew Auld 
Signed-off-by: Matthew Auld 

Acked-by: Nirmoy Das 

---
  drivers/gpu/drm/ttm/ttm_bo.c | 7 ---
  1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 773080f48864..169818b32be2 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -952,7 +952,6 @@ int ttm_bo_init_reserved(struct ttm_device *bdev, struct 
ttm_buffer_object *bo,
 struct sg_table *sg, struct dma_resv *resv,
 void (*destroy) (struct ttm_buffer_object *))
  {
-   static const struct ttm_place sys_mem = { .mem_type = TTM_PL_SYSTEM };
int ret;
  
	kref_init(&bo->kref);

@@ -969,12 +968,6 @@ int ttm_bo_init_reserved(struct ttm_device *bdev, struct 
ttm_buffer_object *bo,
	bo->base.resv = &bo->base._resv;
	atomic_inc(&ttm_glob.bo_count);
  
-	ret = ttm_resource_alloc(bo, &sys_mem, &bo->resource);

-   if (unlikely(ret)) {
-   ttm_bo_put(bo);
-   return ret;
-   }
-
/*
 * For ttm_bo_type_device buffers, allocate
 * address space from the device.


Re: [PATCH v2 3/6] drm/ttm: clear the ttm_tt when bo->resource is NULL

2023-01-30 Thread Das, Nirmoy



On 1/30/2023 1:06 PM, Matthew Auld wrote:

In the next few patches, when initially creating a ttm BO, the
bo->resource is NULL, and the driver is then expected to handle the
initial dummy move.  However, if this is created as a system resource
the first ttm_tt we create will always have the clear value set to
false. Previously the initial ttm_tt would be created in
ttm_bo_validate() with the clear parameter always set to true.

Signed-off-by: Matthew Auld 
Cc: Christian König 
Reviewed-by: Christian König 

Acked-by: Nirmoy Das 

---
  drivers/gpu/drm/ttm/ttm_bo.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 326a3d13a829..773080f48864 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -120,8 +120,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object 
*bo,
bool old_use_tt, new_use_tt;
int ret;
  
-	old_use_tt = bo->resource &&

-   ttm_manager_type(bdev, bo->resource->mem_type)->use_tt;
+   old_use_tt = !bo->resource || ttm_manager_type(bdev, 
bo->resource->mem_type)->use_tt;
new_use_tt = ttm_manager_type(bdev, mem->mem_type)->use_tt;
  
  	ttm_bo_unmap_virtual(bo);


Re: [Intel-gfx] [PATCH v2 2/6] drm/i915/ttm: audit remaining bo->resource

2023-01-30 Thread Das, Nirmoy



On 1/30/2023 1:06 PM, Matthew Auld wrote:

In the near future TTM will have NULL bo->resource when the object is
initially created, plus after calling into pipeline-gutting. Try to
handle the remaining cases. In practice NULL bo->resource should be
taken to mean swapped-out or purged object.

v2 (Andrzej):
   - Rather make i915_ttm_cpu_maps_iomem() return false with NULL
 resource.

References: 516198d317d8 ("drm/i915: audit bo->resource usage v3")
Signed-off-by: Matthew Auld 
Cc: Christian König 
Cc: Nirmoy Das 
Reviewed-by: Andrzej Hajda 



Reviewed-by: Nirmoy Das 


---
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c  | 10 --
  drivers/gpu/drm/i915/gem/i915_gem_ttm.h  |  2 +-
  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  4 
  drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c   |  7 +--
  4 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 4758f21c91e1..341b94672abc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -472,7 +472,7 @@ static int i915_ttm_shrink(struct drm_i915_gem_object *obj, 
unsigned int flags)
struct ttm_placement place = {};
int ret;
  
-	if (!bo->ttm || bo->resource->mem_type != TTM_PL_SYSTEM)

+   if (!bo->ttm || i915_ttm_cpu_maps_iomem(bo->resource))
return 0;
  
  	GEM_BUG_ON(!i915_tt->is_shmem);

@@ -511,7 +511,13 @@ static void i915_ttm_delete_mem_notify(struct 
ttm_buffer_object *bo)
  {
struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
  
-	if (bo->resource && !i915_ttm_is_ghost_object(bo)) {

+   /*
+* This gets called twice by ttm, so long as we have a ttm resource or
+* ttm_tt then we can still safely call this. Due to pipeline-gutting,
+* we maybe have NULL bo->resource, but in that case we should always
+* have a ttm alive (like if the pages are swapped out).
+*/
+   if ((bo->resource || bo->ttm) && !i915_ttm_is_ghost_object(bo)) {
__i915_gem_object_pages_fini(obj);
i915_ttm_free_cached_io_rsgt(obj);
}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
index 2a94a99ef76b..f8f6bed1b297 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
@@ -98,7 +98,7 @@ static inline bool i915_ttm_gtt_binds_lmem(struct 
ttm_resource *mem)
  static inline bool i915_ttm_cpu_maps_iomem(struct ttm_resource *mem)
  {
/* Once / if we support GGTT, this is also false for cached ttm_tts */
-   return mem->mem_type != I915_PL_SYSTEM;
+   return mem && mem->mem_type != I915_PL_SYSTEM;
  }
  
  bool i915_ttm_resource_mappable(struct ttm_resource *res);

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 76dd9e5e1a8b..d030182ca176 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -711,6 +711,10 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst,
  
  	assert_object_held(dst);

assert_object_held(src);
+
+   if (GEM_WARN_ON(!src_bo->resource || !dst_bo->resource))
+   return -EINVAL;
+
	i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
  
  	ret = dma_resv_reserve_fences(src_bo->base.resv, 1);

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
index 7e67742bc65e..dfe39c8e74d8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
@@ -53,7 +53,7 @@ static int i915_ttm_backup(struct i915_gem_apply_to_region 
*apply,
unsigned int flags;
int err = 0;
  
-	if (bo->resource->mem_type == I915_PL_SYSTEM || obj->ttm.backup)

+   if (!i915_ttm_cpu_maps_iomem(bo->resource) || obj->ttm.backup)
return 0;
  
  	if (pm_apply->allow_gpu && i915_gem_object_evictable(obj))

@@ -187,7 +187,10 @@ static int i915_ttm_restore(struct 
i915_gem_apply_to_region *apply,
return err;
  
  	/* Content may have been swapped. */

-	err = ttm_tt_populate(backup_bo->bdev, backup_bo->ttm, &ctx);
+   if (!backup_bo->resource)
+   err = ttm_bo_validate(backup_bo, i915_ttm_sys_placement(), 
&ctx);
+   if (!err)
+		err = ttm_tt_populate(backup_bo->bdev, backup_bo->ttm, &ctx);
if (!err) {
err = i915_gem_obj_copy_ttm(obj, backup, pm_apply->allow_gpu,
false);


Re: [PATCH v2 1/6] drm/i915/ttm: fix sparse warning

2023-01-30 Thread Das, Nirmoy



On 1/30/2023 1:06 PM, Matthew Auld wrote:

Sparse complains with:

drivers/gpu/drm/i915/gem/i915_gem_ttm.c:1066:21: sparse:
expected restricted vm_fault_t [assigned] [usertype] ret
drivers/gpu/drm/i915/gem/i915_gem_ttm.c:1066:21: sparse: got int

Fixes: 516198d317d8 ("drm/i915: audit bo->resource usage v3")
Reported-by: kernel test robot 
Signed-off-by: Matthew Auld 
Cc: Christian König 
Reviewed-by: Andrzej Hajda 

Reviewed-by: Nirmoy Das 

---
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 7420276827a5..4758f21c91e1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1067,11 +1067,12 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
.interruptible = true,
.no_wait_gpu = true, /* should be idle already */
};
+   int err;
  
  		GEM_BUG_ON(!bo->ttm || !(bo->ttm->page_flags & TTM_TT_FLAG_SWAPPED));
  
-		ret = ttm_bo_validate(bo, i915_ttm_sys_placement(), &ctx);

-   if (ret) {
+		err = ttm_bo_validate(bo, i915_ttm_sys_placement(), &ctx);
+   if (err) {
dma_resv_unlock(bo->base.resv);
return VM_FAULT_SIGBUS;
}
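
For context on the warning itself, a small sketch of the pattern being fixed; sketch_do_validate() is an assumed helper returning 0 or a negative errno. vm_fault_t is declared as a sparse __bitwise type in current kernels, so it should only carry VM_FAULT_* codes, and errno-style results belong in a plain int, which is the split the patch makes:

#include <linux/mm.h>
#include <linux/mm_types.h>

static int sketch_do_validate(void);	/* assumed: returns 0 or -errno */

static vm_fault_t sketch_fault_handler(void)
{
	int err;

	/*
	 * Keep the errno in an int; assigning it to a vm_fault_t is what
	 * made sparse complain about the restricted __bitwise type.
	 */
	err = sketch_do_validate();
	if (err)
		return VM_FAULT_SIGBUS;	/* translate errno to a fault code */

	return VM_FAULT_NOPAGE;
}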


Re: [Intel-gfx] [PATCH 1/2] drm/drm_vma_manager: Add drm_vma_node_allow_once()

2023-01-19 Thread Das, Nirmoy



On 1/19/2023 2:25 PM, Maxime Ripard wrote:

On Tue, 17 Jan 2023 18:52:35 +0100, Nirmoy Das wrote:

Currently there is no easy way for a drm driver to safely check and allow
drm_vma_offset_node for a drm file just once. Allow drm drivers to call a
non-refcounted version of drm_vma_node_allow() so that a driver doesn't
need to keep track of each drm_vma_node_allow() call in order to issue a
matching drm_vma_node_revoke() to prevent a memory leak.

Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Tvrtko Ursulin 
Cc: Andi Shyti 

[...]

Applied to drm/drm-misc (drm-misc-fixes).



Thanks Maxime!



Thanks!
Maxime


Re: [Intel-gfx] [PATCH 2/2] drm/i915: Fix a memory leak with reused mmap_offset

2023-01-18 Thread Das, Nirmoy



On 1/18/2023 11:26 AM, Mirsad Todorovac wrote:

Hi,

On 1/18/23 10:19, Tvrtko Ursulin wrote:

Thanks for working on this, it looks good to me and it aligns with 
how i915 uses the facility.


Copying Mirsad who reported the issue in case he is still happy to 
give it a quick test. Mirsad, I don't know if you are subscribed to 
one of the two mailing lists where series was posted. In case not, 
you can grab both patches from 
https://patchwork.freedesktop.org/series/112952/.


Nirmoy - we also have an IGT written by Chuansheng - 
https://patchwork.freedesktop.org/patch/515720/?series=101035=4. 
A more generic one could be placed in gem_mmap_offset test but this 
one works too in my testing and is IMO better than nothing.


Finally, let me add some tags below:

On 17/01/2023 17:52, Nirmoy Das wrote:

drm_vma_node_allow() and drm_vma_node_revoke() should be called in
balanced pairs. We call drm_vma_node_allow() once per-file every time a
user calls mmap_offset, but only call drm_vma_node_revoke once per-file
on each mmap_offset. As the mmap_offset is reused by the client, the
per-file vm_count may remain non-zero and the rbtree leaked.

Call drm_vma_node_allow_once() instead to prevent that memory leak.

Cc: Tvrtko Ursulin 
Cc: Andi Shyti 


Fixes: 786555987207 ("drm/i915/gem: Store mmap_offsets in an rbtree 
rather than a plain list")

Reported-by: Chuansheng Liu 
Reported-by: Mirsad Todorovac 
Cc:  # v5.7+
Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko



Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_mman.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c

index 4f69bff63068..2aac6bf78740 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -697,7 +697,7 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
  GEM_BUG_ON(lookup_mmo(obj, mmap_type) != mmo);
  out:
  if (file)
-    drm_vma_node_allow(&mmo->vma_node, file);
+    drm_vma_node_allow_once(&mmo->vma_node, file);
  return mmo;
  err:


The drm/i915 patch seems OK and there are currently no memory leaks
reported by /sys/kernel/debug/kmemleak under the same Chrome load that 
triggered

the initial bug ...



Thanks, Mirsad for quickly checking this!


Nirmoy



Will post you if there are any changes.

Regards,
Mirsad
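
To spell out the imbalance described in the commit message, a schematic sketch of the per-file flow; the two functions below are placeholders for the i915 mmap_offset and file-close paths:

#include <drm/drm_file.h>
#include <drm/drm_vma_manager.h>

/*
 * Sketch only: userspace may ask for the same mmap offset many times,
 * but the file is closed (and revoke called) only once.  With the
 * ref-counted drm_vma_node_allow() the per-file vm_count could end up
 * at N while only one revoke ever ran, leaking the rbtree entry;
 * drm_vma_node_allow_once() keeps repeated calls at a count of one.
 */
static int sketch_mmap_offset(struct drm_vma_offset_node *node,
			      struct drm_file *file)
{
	/* Called on every MMAP_OFFSET request for this file. */
	return drm_vma_node_allow_once(node, file);
}

static void sketch_file_close(struct drm_vma_offset_node *node,
			      struct drm_file *file)
{
	/* Called exactly once per file. */
	drm_vma_node_revoke(node, file);
}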



Re: [Intel-gfx] [PATCH 1/2] drm/drm_vma_manager: Add drm_vma_node_allow_once()

2023-01-18 Thread Das, Nirmoy



On 1/18/2023 10:38 AM, Andi Shyti wrote:

On Tue, Jan 17, 2023 at 06:52:35PM +0100, Nirmoy Das wrote:

Currently there is no easy way for a drm driver to safely check and allow
drm_vma_offset_node for a drm file just once. Allow drm drivers to call a
non-refcounted version of drm_vma_node_allow() so that a driver doesn't
need to keep track of each drm_vma_node_allow() call in order to issue a
matching drm_vma_node_revoke() to prevent a memory leak.

Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Tvrtko Ursulin 
Cc: Andi Shyti 


Next time, please, don't leave any spaces between tags.



I will keep that in my mind.




Suggested-by: Chris Wilson 
Signed-off-by: Nirmoy Das 

Reviewed-by: Andi Shyti 



Thanks,

Nirmoy



Thanks,
Andi


---
  drivers/gpu/drm/drm_vma_manager.c | 76 ++-
  include/drm/drm_vma_manager.h |  1 +
  2 files changed, 55 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/drm_vma_manager.c 
b/drivers/gpu/drm/drm_vma_manager.c
index 7de37f8c68fd..83229a031af0 100644
--- a/drivers/gpu/drm/drm_vma_manager.c
+++ b/drivers/gpu/drm/drm_vma_manager.c
@@ -240,27 +240,8 @@ void drm_vma_offset_remove(struct drm_vma_offset_manager 
*mgr,
  }
  EXPORT_SYMBOL(drm_vma_offset_remove);
  
-/**

- * drm_vma_node_allow - Add open-file to list of allowed users
- * @node: Node to modify
- * @tag: Tag of file to remove
- *
- * Add @tag to the list of allowed open-files for this node. If @tag is
- * already on this list, the ref-count is incremented.
- *
- * The list of allowed-users is preserved across drm_vma_offset_add() and
- * drm_vma_offset_remove() calls. You may even call it if the node is currently
- * not added to any offset-manager.
- *
- * You must remove all open-files the same number of times as you added them
- * before destroying the node. Otherwise, you will leak memory.
- *
- * This is locked against concurrent access internally.
- *
- * RETURNS:
- * 0 on success, negative error code on internal failure (out-of-mem)
- */
-int drm_vma_node_allow(struct drm_vma_offset_node *node, struct drm_file *tag)
+static int vma_node_allow(struct drm_vma_offset_node *node,
+ struct drm_file *tag, bool ref_counted)
  {
struct rb_node **iter;
struct rb_node *parent = NULL;
@@ -282,7 +263,8 @@ int drm_vma_node_allow(struct drm_vma_offset_node *node, 
struct drm_file *tag)
entry = rb_entry(*iter, struct drm_vma_offset_file, vm_rb);
  
  		if (tag == entry->vm_tag) {

-   entry->vm_count++;
+   if (ref_counted)
+   entry->vm_count++;
goto unlock;
} else if (tag > entry->vm_tag) {
iter = &(*iter)->rb_right;
@@ -307,8 +289,58 @@ int drm_vma_node_allow(struct drm_vma_offset_node *node, 
struct drm_file *tag)
kfree(new);
return ret;
  }
+
+/**
+ * drm_vma_node_allow - Add open-file to list of allowed users
+ * @node: Node to modify
+ * @tag: Tag of file to remove
+ *
+ * Add @tag to the list of allowed open-files for this node. If @tag is
+ * already on this list, the ref-count is incremented.
+ *
+ * The list of allowed-users is preserved across drm_vma_offset_add() and
+ * drm_vma_offset_remove() calls. You may even call it if the node is currently
+ * not added to any offset-manager.
+ *
+ * You must remove all open-files the same number of times as you added them
+ * before destroying the node. Otherwise, you will leak memory.
+ *
+ * This is locked against concurrent access internally.
+ *
+ * RETURNS:
+ * 0 on success, negative error code on internal failure (out-of-mem)
+ */
+int drm_vma_node_allow(struct drm_vma_offset_node *node, struct drm_file *tag)
+{
+   return vma_node_allow(node, tag, true);
+}
  EXPORT_SYMBOL(drm_vma_node_allow);
  
+/**

+ * drm_vma_node_allow_once - Add open-file to list of allowed users
+ * @node: Node to modify
+ * @tag: Tag of file to remove
+ *
+ * Add @tag to the list of allowed open-files for this node.
+ *
+ * The list of allowed-users is preserved across drm_vma_offset_add() and
+ * drm_vma_offset_remove() calls. You may even call it if the node is currently
+ * not added to any offset-manager.
+ *
+ * This is not ref-counted unlike drm_vma_node_allow() hence 
drm_vma_node_revoke()
+ * should only be called once after this.
+ *
+ * This is locked against concurrent access internally.
+ *
+ * RETURNS:
+ * 0 on success, negative error code on internal failure (out-of-mem)
+ */
+int drm_vma_node_allow_once(struct drm_vma_offset_node *node, struct drm_file 
*tag)
+{
+   return vma_node_allow(node, tag, false);
+}
+EXPORT_SYMBOL(drm_vma_node_allow_once);
+
  /**
   * drm_vma_node_revoke - Remove open-file from list of allowed users
   * @node: Node to modify
diff --git a/include/drm/drm_vma_manager.h b/include/drm/drm_vma_manager.h
index 4f8c35206f7c..6c2a2f21dbf0 100644
--- 

Re: [PATCH 2/2] drm_print: Remove deprecated DRM_DEBUG_KMS_RATELIMITED()

2023-01-18 Thread Das, Nirmoy



On 1/18/2023 7:27 AM, Christian König wrote:



On 17.01.23 at 19:12, Das, Nirmoy wrote:

Hi Alex,

On 1/17/2023 7:06 PM, Alex Deucher wrote:
On Tue, Jan 17, 2023 at 1:05 PM Nirmoy Das  
wrote:

There are no current users of DRM_DEBUG_KMS_RATELIMITED()
so remove it.

Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sam Ravnborg 

Signed-off-by: Nirmoy Das 
Reviewed-by: Sam Ravnborg 

Series is:
Reviewed-by: Alex Deucher 

Feel free to take the patches through whatever tree you want.



Please help me with this, I don't have committer rights for any tree.


Going to push that into drm-misc-next later today.



Thanks, Christian.




Thanks,
Christian.




Nirmoy




Alex


---
  include/drm/drm_print.h | 3 ---
  1 file changed, 3 deletions(-)

diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
index a44fb7ef257f..c3753da97c4e 100644
--- a/include/drm/drm_print.h
+++ b/include/drm/drm_print.h
@@ -605,9 +605,6 @@ void __drm_err(const char *format, ...);
  #define drm_dbg_kms_ratelimited(drm, fmt, ...) \
 __DRM_DEFINE_DBG_RATELIMITED(KMS, drm, fmt, ## __VA_ARGS__)

-/* NOTE: this is deprecated in favor of 
drm_dbg_kms_ratelimited(NULL, ...). */
-#define DRM_DEBUG_KMS_RATELIMITED(fmt, ...) 
drm_dbg_kms_ratelimited(NULL, fmt, ## __VA_ARGS__)

-
  /*
   * struct drm_device based WARNs
   *
--
2.39.0





Re: [PATCH 2/2] drm/i915: Fix a memory leak with reused mmap_offset

2023-01-18 Thread Das, Nirmoy

Hi Tvrtko,

On 1/18/2023 10:19 AM, Tvrtko Ursulin wrote:



Hi,

Thanks for working on this, it looks good to me and it aligns with how 
i915 uses the facility.


Copying Mirsad, who reported the issue, in case he is still happy to 
give it a quick test. Mirsad, I don't know if you are subscribed to 
one of the two mailing lists where the series was posted. In case not, you 
can grab both patches from 
https://patchwork.freedesktop.org/series/112952/.


Nirmoy - we also have an IGT written by Chuansheng - 
https://patchwork.freedesktop.org/patch/515720/?series=101035=4. A 
more generic one could be placed in gem_mmap_offset test but this one 
works too in my testing and is IMO better than nothing.



This looks good to me. Let's get this merged and I can look into 
improving it at a later point.




Finally, let me add some tags below:

On 17/01/2023 17:52, Nirmoy Das wrote:

drm_vma_node_allow() and drm_vma_node_revoke() should be called in
balanced pairs. We call drm_vma_node_allow() once per-file every time a
user calls mmap_offset, but only call drm_vma_node_revoke() once per-file
on each mmap_offset. As the mmap_offset is reused by the client, the
per-file vm_count may remain non-zero and the rbtree leaked.

Call drm_vma_node_allow_once() instead to prevent that memory leak.

Cc: Tvrtko Ursulin 
Cc: Andi Shyti 


Fixes: 786555987207 ("drm/i915/gem: Store mmap_offsets in an rbtree 
rather than a plain list")

Reported-by: Chuansheng Liu 
Reported-by: Mirsad Todorovac 
Cc:  # v5.7+
Reviewed-by: Tvrtko Ursulin 



Thanks for your review and those extra tags.


Nirmoy



Regards,

Tvrtko



Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_mman.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c

index 4f69bff63068..2aac6bf78740 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -697,7 +697,7 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
  GEM_BUG_ON(lookup_mmo(obj, mmap_type) != mmo);
  out:
  if (file)
-    drm_vma_node_allow(&mmo->vma_node, file);
+    drm_vma_node_allow_once(&mmo->vma_node, file);
  return mmo;
    err:


Re: [PATCH 2/2] drm_print: Remove deprecated DRM_DEBUG_KMS_RATELIMITED()

2023-01-17 Thread Das, Nirmoy

Hi Alex,

On 1/17/2023 7:06 PM, Alex Deucher wrote:

On Tue, Jan 17, 2023 at 1:05 PM Nirmoy Das  wrote:

There are no current users of DRM_DEBUG_KMS_RATELIMITED()
so remove it.

Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sam Ravnborg 

Signed-off-by: Nirmoy Das 
Reviewed-by: Sam Ravnborg 

Series is:
Reviewed-by: Alex Deucher 

Feel free to take the patches through whatever tree you want.



Please help me with this, I don't have committer rights for any tree.


Nirmoy




Alex


---
  include/drm/drm_print.h | 3 ---
  1 file changed, 3 deletions(-)

diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
index a44fb7ef257f..c3753da97c4e 100644
--- a/include/drm/drm_print.h
+++ b/include/drm/drm_print.h
@@ -605,9 +605,6 @@ void __drm_err(const char *format, ...);
  #define drm_dbg_kms_ratelimited(drm, fmt, ...) \
 __DRM_DEFINE_DBG_RATELIMITED(KMS, drm, fmt, ## __VA_ARGS__)

-/* NOTE: this is deprecated in favor of drm_dbg_kms_ratelimited(NULL, ...). */
-#define DRM_DEBUG_KMS_RATELIMITED(fmt, ...) drm_dbg_kms_ratelimited(NULL, fmt, 
## __VA_ARGS__)
-
  /*
   * struct drm_device based WARNs
   *
--
2.39.0



Re: [PATCH 1/2] drm/radeon: Do not use deprecated drm log API

2023-01-17 Thread Das, Nirmoy



On 1/17/2023 6:48 PM, Alex Deucher wrote:

On Tue, Jan 17, 2023 at 12:45 PM Nirmoy Das  wrote:

Replace deprecated DRM_DEBUG_KMS_RATELIMITED() and DRM_ERROR()
with proper APIs.

Cc: Alex Deucher 
Cc: Christian König 

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/radeon/radeon_dp_auxch.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_dp_auxch.c 
b/drivers/gpu/drm/radeon/radeon_dp_auxch.c
index 69379b95146e..76ce66efb5f8 100644
--- a/drivers/gpu/drm/radeon/radeon_dp_auxch.c
+++ b/drivers/gpu/drm/radeon/radeon_dp_auxch.c
@@ -158,7 +158,7 @@ radeon_dp_aux_transfer_native(struct drm_dp_aux *aux, 
struct drm_dp_aux_msg *msg
 } while (retry_count++ < 1000);

 if (retry_count >= 1000) {
-   DRM_ERROR("auxch hw never signalled completion, error %08x\n", 
tmp);
+   pr_err("auxch hw never signalled completion, error %08x\n", 
tmp);

Please use dev_err() instead so we get device identification on error
messages.  Makes it much easier when you have multiple GPUs in a
system.



Thanks for your quick review, Alex. I will resend with dev_err().


Nirmoy



Alex


 ret = -EIO;
 goto done;
 }
@@ -168,8 +168,7 @@ radeon_dp_aux_transfer_native(struct drm_dp_aux *aux, 
struct drm_dp_aux_msg *msg
 goto done;
 }
 if (tmp & AUX_RX_ERROR_FLAGS) {
-   DRM_DEBUG_KMS_RATELIMITED("dp_aux_ch flags not zero: %08x\n",
- tmp);
+   drm_dbg_kms_ratelimited(dev, "dp_aux_ch flags not zero: 
%08x\n", tmp);
 ret = -EIO;
 goto done;
 }
--
2.39.0



Re: [PATCH v2] drm/i915/selftests: Unwind hugepages to drop wakeref on error

2023-01-17 Thread Das, Nirmoy

Reviewed-by: Nirmoy Das 

On 1/17/2023 1:32 PM, Nirmoy Das wrote:

From: Chris Wilson

Make sure that upon error after we have acquired the wakeref we do
release it again.

v2: add another missing "goto out_wf"(Andi).

Fixes: 027c38b4121e ("drm/i915/selftests: Grab the runtime pm in shrink_thp")
Cc: Andi Shyti
Reviewed-by: Matthew Auld
Reviewed-by: Andrzej Hajda
Signed-off-by: Chris Wilson
Signed-off-by: Nirmoy Das
---
  drivers/gpu/drm/i915/gem/selftests/huge_pages.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c 
b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index c281b0ec9e05..defece0bcb81 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1855,7 +1855,7 @@ static int igt_shrink_thp(void *arg)
I915_SHRINK_ACTIVE);
i915_vma_unpin(vma);
if (err)
-   goto out_put;
+   goto out_wf;
  
  	/*

 * Now that the pages are *unpinned* shrinking should invoke
@@ -1871,19 +1871,19 @@ static int igt_shrink_thp(void *arg)
pr_err("unexpected pages mismatch, should_swap=%s\n",
   str_yes_no(should_swap));
err = -EINVAL;
-   goto out_put;
+   goto out_wf;
}
  
  	if (should_swap == (obj->mm.page_sizes.sg || obj->mm.page_sizes.phys)) {

pr_err("unexpected residual page-size bits, should_swap=%s\n",
   str_yes_no(should_swap));
err = -EINVAL;
-   goto out_put;
+   goto out_wf;
}
  
  	err = i915_vma_pin(vma, 0, 0, flags);

if (err)
-   goto out_put;
+   goto out_wf;
  
  	while (n--) {

err = cpu_check(obj, n, 0xdeadbeaf);

Re: [PATCH 1/2] drm/print: Add drm_dbg_ratelimited

2023-01-17 Thread Das, Nirmoy

Hi Sam,

On 1/17/2023 3:49 PM, Sam Ravnborg wrote:

Hi Nirmoy

On Tue, Jan 17, 2023 at 12:53:49PM +0100, Nirmoy Das wrote:

Add a function for rate-limited debug printing.
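
A call site then looks like the existing drm_dbg() helpers - a hypothetical
example only, just to show the shape of the new macro:

	/* Emits at most one debug line per ratelimit window. */
	drm_dbg_ratelimited(&i915->drm,
			    "request retire still pending (err=%d)\n", err);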

Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Cc: David Airlie 
Cc: Daniel Vetter 
Reviewed-by: Matthew Auld 
Signed-off-by: Nirmoy Das 

Thanks for adding this.
The patch as-is is:
Reviewed-by: Sam Ravnborg 

It would have been nice to start adding kernel-doc to the
non-deprecated logging functions. But as everyone else is missing this,
it is OK that we miss it here.

A couple of nice follow-up patches would be to introduce a KMS variant
and replace the only user of DRM_DEBUG_KMS_RATELIMITED with the new
variant and remove the old one.

And maybe even update the remaining *ERROR_RATELIMITED users to a new
variant - and drop the deprecated ones.



Thanks for reviewing this. I can definitely work on your suggested 
follow-up patches.


Nirmoy



Sam


---
  include/drm/drm_print.h | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
index a44fb7ef257f..1d839f507319 100644
--- a/include/drm/drm_print.h
+++ b/include/drm/drm_print.h
@@ -602,6 +602,9 @@ void __drm_err(const char *format, ...);
drm_dev_printk(drm_ ? drm_->dev : NULL, KERN_DEBUG, fmt, ## 
__VA_ARGS__);\
  })
  
+#define drm_dbg_ratelimited(drm, fmt, ...) \

+   __DRM_DEFINE_DBG_RATELIMITED(DRIVER, drm, fmt, ## __VA_ARGS__)
+
  #define drm_dbg_kms_ratelimited(drm, fmt, ...) \
__DRM_DEFINE_DBG_RATELIMITED(KMS, drm, fmt, ## __VA_ARGS__)
  
--

2.39.0


Re: [PATCH] drm/ttm: fix some minor kerneldoc issues

2023-01-17 Thread Das, Nirmoy

Reviewed-by: Nirmoy Das 

On 1/17/2023 1:33 PM, Christian König wrote:

Pointed out by the kernel test robot while merging ttm_bo_api.h and
ttm_bo_driver.h.

Signed-off-by: Christian König 
Reported-by: kernel test robot 
---
  drivers/gpu/drm/ttm/ttm_bo_util.c | 13 ++---
  1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index d33bff038d3a..77b50875b99f 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -127,9 +127,8 @@ EXPORT_SYMBOL(ttm_move_memcpy);
   * ttm_bo_move_memcpy
   *
   * @bo: A pointer to a struct ttm_buffer_object.
- * @interruptible: Sleep interruptible if waiting.
- * @no_wait_gpu: Return immediately if the GPU is busy.
- * @new_mem: struct ttm_resource indicating where to move.
+ * @ctx: operation context
+ * @dst_mem: struct ttm_resource indicating where to move.
   *
   * Fallback move function for a mappable buffer object in mappable memory.
   * The function will, if successful,
@@ -281,8 +280,8 @@ static int ttm_buffer_object_transfer(struct 
ttm_buffer_object *bo,
  /**
   * ttm_io_prot
   *
- * bo: ttm buffer object
- * res: ttm resource object
+ * @bo: ttm buffer object
+ * @res: ttm resource object
   * @tmp: Page protection flag for a normal, cached mapping.
   *
   * Utility function that returns the pgprot_t that should be used for
@@ -621,7 +620,7 @@ static void ttm_bo_move_pipeline_evict(struct 
ttm_buffer_object *bo,
  }
  
  /**

- * ttm_bo_move_accel_cleanup.
+ * ttm_bo_move_accel_cleanup - cleanup helper for hw copies
   *
   * @bo: A pointer to a struct ttm_buffer_object.
   * @fence: A fence object that signals when moving is complete.
@@ -665,7 +664,7 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
  EXPORT_SYMBOL(ttm_bo_move_accel_cleanup);
  
  /**

- * ttm_bo_move_sync_cleanup.
+ * ttm_bo_move_sync_cleanup - cleanup by waiting for the move to finish
   *
   * @bo: A pointer to a struct ttm_buffer_object.
   * @new_mem: struct ttm_resource indicating where to move.


Re: [PATCH] drm/i915/selftests: Unwind hugepages to drop wakeref on error

2023-01-17 Thread Das, Nirmoy



On 1/16/2023 7:49 PM, Andi Shyti wrote:

Hi Nirmoy,

On Fri, Jan 13, 2023 at 01:00:53PM +0100, Nirmoy Das wrote:

From: Chris Wilson 

Make sure that upon error after we have acquired the wakeref we do
release it again.

Fixes: 027c38b4121e ("drm/i915/selftests: Grab the runtime pm in shrink_thp")
Reviewed-by: Matthew Auld 
Signed-off-by: Chris Wilson 
Signed-off-by: Nirmoy Das 
Cc:  # v6.0+
---
  drivers/gpu/drm/i915/gem/selftests/huge_pages.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c 
b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index c281b0ec9e05..295d6f2cc4ff 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1855,7 +1855,7 @@ static int igt_shrink_thp(void *arg)
I915_SHRINK_ACTIVE);
i915_vma_unpin(vma);
if (err)
-   goto out_put;
+   goto out_wf;
  
  	/*

 * Now that the pages are *unpinned* shrinking should invoke
@@ -1871,7 +1871,7 @@ static int igt_shrink_thp(void *arg)
pr_err("unexpected pages mismatch, should_swap=%s\n",
   str_yes_no(should_swap));
err = -EINVAL;
-   goto out_put;
+   goto out_wf;
}

aren't we missing here one out_put -> out_wf change?

This one:

@@ -1878,7 +1878,7 @@ static int igt_shrink_thp(void *arg)
 pr_err("unexpected residual page-size bits, should_swap=%s\n",
str_yes_no(should_swap));
 err = -EINVAL;
-   goto out_put;
+   goto out_wf;



Thanks for catching this. Yes, we need this too. I will resend.


Nirmoy


 }
  
 err = i915_vma_pin(vma, 0, 0, flags);


Andi

  
  	if (should_swap == (obj->mm.page_sizes.sg || obj->mm.page_sizes.phys)) {

@@ -1883,7 +1883,7 @@ static int igt_shrink_thp(void *arg)
  
  	err = i915_vma_pin(vma, 0, 0, flags);

if (err)
-   goto out_put;
+   goto out_wf;
  
  	while (n--) {

err = cpu_check(obj, n, 0xdeadbeaf);
--
2.39.0


Re: [PATCH] drm/i915/selftests: Unwind hugepages to drop wakeref on error

2023-01-13 Thread Das, Nirmoy



On 1/13/2023 1:05 PM, Matthew Auld wrote:

On 13/01/2023 12:02, Das, Nirmoy wrote:
Thanks Matt, I missed the Fixes tag, so I resent it with Fixes and Cc: 
stable added.


I don't think kernel selftests are really stable material. AFAIK it's 
not something normal users care about.



True, in that case please ignore the latest copy of this patch!


Thanks,

Nirmoy





On 1/13/2023 12:51 PM, Matthew Auld wrote:

On 13/01/2023 11:49, Nirmoy Das wrote:

From: Chris Wilson 

Make sure that upon error after we have acquired the wakeref we do
release it again.

Signed-off-by: Chris Wilson 
Signed-off-by: Nirmoy Das 

Reviewed-by: Matthew Auld 


Re: [PATCH] drm/i915/selftests: Unwind hugepages to drop wakeref on error

2023-01-13 Thread Das, Nirmoy
Thanks Matt, I missed the Fixes tag, so I resent it with Fixes and Cc: 
stable added.


On 1/13/2023 12:51 PM, Matthew Auld wrote:

On 13/01/2023 11:49, Nirmoy Das wrote:

From: Chris Wilson 

Make sure that upon error after we have acquired the wakeref we do
release it again.

Signed-off-by: Chris Wilson 
Signed-off-by: Nirmoy Das 

Reviewed-by: Matthew Auld 


Re: [Intel-gfx] [PATCH v3 11/11] drm/i915: replace Intel internal tracker with kernel core ref_tracker

2023-01-06 Thread Das, Nirmoy

Hi Andrzej,

On 2/22/2022 12:25 AM, Andrzej Hajda wrote:

Besides reusing existing code, the main advantage of ref_tracker is
per-instance tracking of wakerefs. It also allows catching a double
put.
On the other hand we lose information about the first acquire and
the last release, but the advantages outweigh that.

Signed-off-by: Andrzej Hajda 
Reviewed-by: Chris Wilson 
---
  drivers/gpu/drm/i915/Kconfig.debug|  11 +-
  drivers/gpu/drm/i915/Makefile |   3 -
  .../drm/i915/display/intel_display_power.c|   2 +-
  drivers/gpu/drm/i915/gt/intel_engine_pm.c |   2 +-
  drivers/gpu/drm/i915/gt/intel_gt_pm.c |   2 +-
  drivers/gpu/drm/i915/intel_runtime_pm.c   |  25 +-
  drivers/gpu/drm/i915/intel_runtime_pm.h   |   2 +-
  drivers/gpu/drm/i915/intel_wakeref.c  |   8 +-
  drivers/gpu/drm/i915/intel_wakeref.h  |  72 +-
  drivers/gpu/drm/i915/intel_wakeref_tracker.c  | 234 --
  drivers/gpu/drm/i915/intel_wakeref_tracker.h  |  76 --
  11 files changed, 87 insertions(+), 350 deletions(-)
  delete mode 100644 drivers/gpu/drm/i915/intel_wakeref_tracker.c
  delete mode 100644 drivers/gpu/drm/i915/intel_wakeref_tracker.h

diff --git a/drivers/gpu/drm/i915/Kconfig.debug 
b/drivers/gpu/drm/i915/Kconfig.debug
index 3bdc73f30a9e1..6c57f3e265f20 100644
--- a/drivers/gpu/drm/i915/Kconfig.debug
+++ b/drivers/gpu/drm/i915/Kconfig.debug
@@ -32,6 +32,7 @@ config DRM_I915_DEBUG
select DEBUG_FS
select PREEMPT_COUNT
select I2C_CHARDEV
+   select REF_TRACKER
select STACKDEPOT
select STACKTRACE
select DRM_DP_AUX_CHARDEV
@@ -46,7 +47,6 @@ config DRM_I915_DEBUG
select DRM_I915_DEBUG_GEM
select DRM_I915_DEBUG_GEM_ONCE
select DRM_I915_DEBUG_MMIO
-   select DRM_I915_TRACK_WAKEREF
select DRM_I915_DEBUG_RUNTIME_PM
select DRM_I915_DEBUG_WAKEREF
select DRM_I915_SW_FENCE_DEBUG_OBJECTS
@@ -238,18 +238,13 @@ config DRM_I915_DEBUG_VBLANK_EVADE
  
  	  If in doubt, say "N".
  
-config DRM_I915_TRACK_WAKEREF

-   depends on STACKDEPOT
-   depends on STACKTRACE
-   bool
-
  config DRM_I915_DEBUG_RUNTIME_PM
bool "Enable extra state checking for runtime PM"
depends on DRM_I915
default n
+   select REF_TRACKER
select STACKDEPOT
select STACKTRACE
-   select DRM_I915_TRACK_WAKEREF
help
  Choose this option to turn on extra state checking for the
  runtime PM functionality. This may introduce overhead during
@@ -263,9 +258,9 @@ config DRM_I915_DEBUG_WAKEREF
bool "Enable extra tracking for wakerefs"
depends on DRM_I915
default n
+   select REF_TRACKER
select STACKDEPOT
select STACKTRACE
-   select DRM_I915_TRACK_WAKEREF
help
  Choose this option to turn on extra state checking and usage
  tracking for the wakerefPM functionality. This may introduce
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 88a403d3294cb..1f8d71430e2e6 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -76,9 +76,6 @@ i915-$(CONFIG_DEBUG_FS) += \
display/intel_display_debugfs.o \
display/intel_pipe_crc.o
  
-i915-$(CONFIG_DRM_I915_TRACK_WAKEREF) += \

-   intel_wakeref_tracker.o
-
  i915-$(CONFIG_PERF_EVENTS) += i915_pmu.o
  
  # "Graphics Technology" (aka we talk to the gpu)

diff --git a/drivers/gpu/drm/i915/display/intel_display_power.c 
b/drivers/gpu/drm/i915/display/intel_display_power.c
index 9ebae7ac32356..0e1bf724f89b5 100644
--- a/drivers/gpu/drm/i915/display/intel_display_power.c
+++ b/drivers/gpu/drm/i915/display/intel_display_power.c
@@ -2107,7 +2107,7 @@ print_async_put_domains_state(struct i915_power_domains 
*power_domains)
 struct drm_i915_private,
 power_domains);
  
-	drm_dbg(&i915->drm, "async_put_wakeref %u\n",

+   drm_dbg(&i915->drm, "async_put_wakeref %lu\n",
power_domains->async_put_wakeref);
  
  	print_power_domains(power_domains, "async_put_domains[0]",

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 52e46e7830ff5..cf8cc348942cb 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -273,7 +273,7 @@ void intel_engine_init__pm(struct intel_engine_cs *engine)
  {
struct intel_runtime_pm *rpm = engine->uncore->rpm;
  
-	intel_wakeref_init(&engine->wakeref, rpm, &wf_ops);

+   intel_wakeref_init(&engine->wakeref, rpm, &wf_ops, engine->name);
intel_engine_init_heartbeat(engine);
  }
  
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c

index 7ee65a93f926f..01a055d0d0989 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -129,7 +129,7 @@ 

Re: [PATCH v2 2/2] drm/i915: Never return 0 if not all requests retired

2022-11-18 Thread Das, Nirmoy



On 11/18/2022 11:42 AM, Janusz Krzysztofik wrote:

Users of intel_gt_retire_requests_timeout() expect 0 return value on
success.  However, we have no protection from passing back 0 potentially
returned by a call to dma_fence_wait_timeout() when it succeeds right
after its timeout has expired.

Replace 0 with -ETIME before potentially using the timeout value as return
code, so -ETIME is returned if there are still some requests not retired
after timeout, 0 otherwise.

v2: Move the added lines down so flush_submission() is not affected.

Fixes: f33a8a51602c ("drm/i915: Merge wait_for_timelines with retire_request")
Signed-off-by: Janusz Krzysztofik 
Cc: sta...@vger.kernel.org # v5.5+
---
  drivers/gpu/drm/i915/gt/intel_gt_requests.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c 
b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index edb881d756309..3ac4603eeb4ee 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -199,6 +199,9 @@ out_active: spin_lock(&timelines->lock);
if (remaining_timeout)
*remaining_timeout = timeout;
  
+	if (!timeout)

+   timeout = -ETIME;


This will return an error (-ETIME) when a 0 timeout is passed in, e.g. from 
intel_gt_retire_requests().


We don't want that. I think you can use a separate variable to store the 
return value from dma_fence_wait_timeout().
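
Roughly along these lines - only a sketch of the idea, not a tested patch;
"wait_ret" is a made-up name and "fence"/"active_count" stand in for the
existing locals of intel_gt_retire_requests_timeout():

	/*
	 * Keep dma_fence_wait_timeout()'s result in its own variable
	 * instead of overwriting 'timeout', so a caller that passed
	 * timeout == 0 (non-blocking retire) never gets -ETIME back.
	 */
	long wait_ret = timeout;

	if (fence) {
		wait_ret = dma_fence_wait_timeout(fence, true, timeout);
		dma_fence_put(fence);
	}

	/* ... retire loop unchanged ... */

	if (!active_count)
		return 0;

	return timeout ? (wait_ret > 0 ? wait_ret : -ETIME) : 0;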



Regards,

Nirmoy


+
return active_count ? timeout : 0;
  }
  


Re: [Intel-gfx] [PATCH 1/3] drm/i915: Fix negative remaining time after retire requests

2022-11-17 Thread Das, Nirmoy

On 11/16/2022 12:25 PM, Janusz Krzysztofik wrote:


Commit b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work
with GuC") extended the API of intel_gt_retire_requests_timeout() with an
extra argument 'remaining_timeout', intended for passing back unconsumed
portion of requested timeout when 0 (success) is returned.  However, when
request retirement happens to succeed despite an error returned by
dma_fence_wait_timeout(), the error code (a negative value) is passed back
instead of remaining time.  If a user then passes that negative value
forward as requested timeout to another wait, an explicit WARN or BUG can
be triggered.

Instead of copying the value of timeout variable to *remaining_timeout
before return, update the *remaining_timeout after each DMA fence wait.



Thanks for the detailed comment, indeed we were not accounting for the 
return value of dma_fence_wait_timeout()


Acked-by: Nirmoy Das 


Thanks,

Nirmoy



Set it to 0 on -ETIME, -EINTR or -ERESTARTSYS, and assume no time has been
consumed on other errors returned from the wait.

Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with 
GuC")
Signed-off-by: Janusz Krzysztofik 
Cc: sta...@vger.kernel.org # v5.15+
---
  drivers/gpu/drm/i915/gt/intel_gt_requests.c | 23 ++---
  1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c 
b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index edb881d756309..ccaf2fd80625b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -138,6 +138,9 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, 
long timeout,
unsigned long active_count = 0;
LIST_HEAD(free);
  
+	if (remaining_timeout)

+   *remaining_timeout = timeout;
+
flush_submission(gt, timeout); /* kick the ksoftirqd tasklets */
	spin_lock(&timelines->lock);
	list_for_each_entry_safe(tl, tn, &timelines->active_list, link) {
@@ -163,6 +166,23 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, 
long timeout,
 timeout);
dma_fence_put(fence);
  
+if (remaining_timeout) {

+   /*
+* If we get an error here but request
+* retirement succeeds anyway
+* (!active_count) and we return 0, the
+* caller may want to spend remaining
+* time on waiting for other events.
+*/
+   if (timeout == -ETIME ||
+   timeout == -EINTR ||
+   timeout == -ERESTARTSYS)
+   *remaining_timeout = 0;
+   else if (timeout >= 0)
+   *remaining_timeout = timeout;
+   /* else assume no time consumed */
+   }
+
/* Retirement is best effort */
if (!mutex_trylock(>mutex)) {
active_count++;
@@ -196,9 +216,6 @@ out_active: spin_lock(&timelines->lock);
if (flush_submission(gt, timeout)) /* Wait, there's more! */
active_count++;
  
-	if (remaining_timeout)

-   *remaining_timeout = timeout;
-
return active_count ? timeout : 0;
  }
  


Re: [Intel-gfx] [PATCH 2/3] drm/i915: Never return 0 on timeout when retiring requests

2022-11-17 Thread Das, Nirmoy

Looks very relevant to our recent hangcheck failures.


Acked-by: Nirmoy Das 

On 11/16/2022 12:25 PM, Janusz Krzysztofik wrote:

Users of intel_gt_retire_requests_timeout() expect 0 return value on
success.  However, we have no protection from passing back 0 potentially
returned by dma_fence_wait_timeout() on timeout.

Replace 0 with -ETIME before using timeout as return value.

Fixes: f33a8a51602c ("drm/i915: Merge wait_for_timelines with retire_request")
Signed-off-by: Janusz Krzysztofik 
Cc: sta...@vger.kernel.org # v5.5+
---
  drivers/gpu/drm/i915/gt/intel_gt_requests.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c 
b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index ccaf2fd80625b..ac6b2b1861397 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -213,6 +213,9 @@ out_active: spin_lock(&timelines->lock);
	list_for_each_entry_safe(tl, tn, &free, link)
	__intel_timeline_free(&tl->kref);
  
+	if (!timeout)

+   timeout = -ETIME;
+
if (flush_submission(gt, timeout)) /* Wait, there's more! */
active_count++;
  


Re: [PATCH] drm/i915/guc: don't hardcode BCS0 in guc_hang selftest

2022-11-03 Thread Das, Nirmoy

LGTM Acked-by: Nirmoy Das 

On 11/2/2022 10:43 PM, Daniele Ceraolo Spurio wrote:

On MTL there are no BCS engines on the media GT, so we can't always use
BCS0 in the test. There is no actual reason to use a BCS engine over an
engine of a different class, so switch to using any available engine.

Signed-off-by: Daniele Ceraolo Spurio 
Cc: John Harrison 
---
  drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c 
b/drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c
index 01f8cd3c3134..d91b58f70403 100644
--- a/drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c
@@ -35,11 +35,14 @@ static int intel_hang_guc(void *arg)
struct i915_request *rq;
intel_wakeref_t wakeref;
	struct i915_gpu_error *global = &gt->i915->gpu_error;
-   struct intel_engine_cs *engine;
+   struct intel_engine_cs *engine = intel_selftest_find_any_engine(gt);
unsigned int reset_count;
u32 guc_status;
u32 old_beat;
  
+	if (!engine)

+   return 0;
+
ctx = kernel_context(gt->i915, NULL);
if (IS_ERR(ctx)) {
		drm_err(&gt->i915->drm, "Failed get kernel context: %ld\n", 
PTR_ERR(ctx));
@@ -48,14 +51,13 @@ static int intel_hang_guc(void *arg)
  
  	wakeref = intel_runtime_pm_get(gt->uncore->rpm);
  
-	ce = intel_context_create(gt->engine[BCS0]);

+   ce = intel_context_create(engine);
if (IS_ERR(ce)) {
ret = PTR_ERR(ce);
		drm_err(&gt->i915->drm, "Failed to create spinner request: 
%d\n", ret);
goto err;
}
  
-	engine = ce->engine;

reset_count = i915_reset_count(global);
  
  	old_beat = engine->props.heartbeat_interval_ms;


Re: [PATCH] drm/i915: Refactor ttm ghost obj detection

2022-10-14 Thread Das, Nirmoy



On 10/14/2022 4:58 PM, Matthew Auld wrote:

On 14/10/2022 14:14, Nirmoy Das wrote:

Currently i915_ttm_to_gem() returns NULL for a ttm ghost
object, which makes it unclear when we should add a NULL
check for a caller of i915_ttm_to_gem(), as ttm ghost
objects are expected behaviour in certain cases.

Create a separate function to detect ttm ghost objects and
use that in places where we expect a ghost obj from ttm.

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c  | 21 ++--
  drivers/gpu/drm/i915/gem/i915_gem_ttm.h  | 18 -
  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  2 +-
  3 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c

index 6b60b99461e2..0a85651c654d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -279,7 +279,7 @@ static struct ttm_tt *i915_ttm_tt_create(struct 
ttm_buffer_object *bo,

  struct i915_ttm_tt *i915_tt;
  int ret;
  -    if (!obj)
+    if (i915_ttm_is_ghost_object(bo))
  return NULL;
    i915_tt = kzalloc(sizeof(*i915_tt), GFP_KERNEL);
@@ -362,7 +362,7 @@ static bool i915_ttm_eviction_valuable(struct 
ttm_buffer_object *bo,

  {
  struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
  -    if (!obj)
+    if (i915_ttm_is_ghost_object(bo))
  return false;
    /*
@@ -511,7 +511,7 @@ static void i915_ttm_delete_mem_notify(struct 
ttm_buffer_object *bo)

  struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
  intel_wakeref_t wakeref = 0;
  -    if (bo->resource && likely(obj)) {
+    if (bo->resource && !i915_ttm_is_ghost_object(bo)) {
  /* ttm_bo_release() already has dma_resv_lock */
  if (i915_ttm_cpu_maps_iomem(bo->resource))
  wakeref = 
intel_runtime_pm_get(&to_i915(obj->base.dev)->runtime_pm);
@@ -624,7 +624,7 @@ static void i915_ttm_swap_notify(struct 
ttm_buffer_object *bo)

  struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
  int ret;
  -    if (!obj)
+    if (i915_ttm_is_ghost_object(bo))
  return;
    ret = i915_ttm_move_notify(bo);
@@ -657,7 +657,7 @@ static int i915_ttm_io_mem_reserve(struct 
ttm_device *bdev, struct ttm_resource

  struct drm_i915_gem_object *obj = i915_ttm_to_gem(mem->bo);
  bool unknown_state;
  -    if (!obj)
+    if (i915_ttm_is_ghost_object(mem->bo))
  return -EINVAL;
    if (!kref_get_unless_zero(>base.refcount))
@@ -690,7 +690,7 @@ static unsigned long i915_ttm_io_mem_pfn(struct 
ttm_buffer_object *bo,

  unsigned long base;
  unsigned int ofs;
  -    GEM_BUG_ON(!obj);
+    GEM_BUG_ON(i915_ttm_is_ghost_object(bo));
  GEM_WARN_ON(bo->ttm);
    base = obj->mm.region->iomap.base - 
obj->mm.region->region.start;
@@ -1035,13 +1035,12 @@ static vm_fault_t vm_fault_ttm(struct 
vm_fault *vmf)

  struct vm_area_struct *area = vmf->vma;
  struct ttm_buffer_object *bo = area->vm_private_data;
  struct drm_device *dev = bo->base.dev;
-    struct drm_i915_gem_object *obj;
+    struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
  intel_wakeref_t wakeref = 0;
  vm_fault_t ret;
  int idx;
  -    obj = i915_ttm_to_gem(bo);
-    if (!obj)
+    if (i915_ttm_is_ghost_object(bo))
  return VM_FAULT_SIGBUS;


I think this one can be dropped, maybe in a separate patch?



Yes, I can send a patch to fix that up.



Otherwise looks good to me,
Reviewed-by: Matthew Auld 



Thanks,


Nirmoy




    /* Sanity check that we allow writing into this object */
@@ -1141,7 +1140,7 @@ static void ttm_vm_open(struct vm_area_struct 
*vma)

  struct drm_i915_gem_object *obj =
  i915_ttm_to_gem(vma->vm_private_data);
  -    GEM_BUG_ON(!obj);
+ GEM_BUG_ON(i915_ttm_is_ghost_object(vma->vm_private_data));
  i915_gem_object_get(obj);
  }
  @@ -1150,7 +1149,7 @@ static void ttm_vm_close(struct 
vm_area_struct *vma)

  struct drm_i915_gem_object *obj =
  i915_ttm_to_gem(vma->vm_private_data);
  -    GEM_BUG_ON(!obj);
+ GEM_BUG_ON(i915_ttm_is_ghost_object(vma->vm_private_data));
  i915_gem_object_put(obj);
  }
  diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h

index e4842b4296fc..2a94a99ef76b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
@@ -27,19 +27,27 @@ i915_gem_to_ttm(struct drm_i915_gem_object *obj)
   */
  void i915_ttm_bo_destroy(struct ttm_buffer_object *bo);
  +/**
+ * i915_ttm_is_ghost_object - Check if the ttm bo is a ghost object.
+ * @bo: Pointer to the ttm buffer object
+ *
+ * Return: True if the ttm bo is not a i915 object but a ghost ttm 
object,

+ * False otherwise.
+ */
+static inline bool i915_ttm_is_ghost_object(struct ttm_buffer_object 
*bo)

+{
+    return bo->destroy != i915_ttm_bo_destroy;
+}
+
  /**
   * i915_ttm_to_gem - Convert a struct 

Re: [PATCH] drm/i915/slpc: Use platform limits for min/max frequency

2022-10-13 Thread Das, Nirmoy



On 10/12/2022 8:26 PM, Vinay Belgaumkar wrote:

GuC will set the min/max frequencies to theoretical max on
ATS-M. This will break kernel ABI, so limit min/max frequency
to RP0(platform max) instead.

Also modify the SLPC selftest to update the min frequency
when we have a server part so that we can iterate between
platform min and max.

Signed-off-by: Vinay Belgaumkar 
---
  drivers/gpu/drm/i915/gt/selftest_slpc.c   | 40 +--
  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   | 29 ++
  .../gpu/drm/i915/gt/uc/intel_guc_slpc_types.h |  3 ++
  3 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_slpc.c 
b/drivers/gpu/drm/i915/gt/selftest_slpc.c
index 4c6e9257e593..1f84362af737 100644
--- a/drivers/gpu/drm/i915/gt/selftest_slpc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_slpc.c
@@ -234,6 +234,7 @@ static int run_test(struct intel_gt *gt, int test_type)
enum intel_engine_id id;
struct igt_spinner spin;
u32 slpc_min_freq, slpc_max_freq;
+   u32 saved_min_freq;
int err = 0;
  
  	if (!intel_uc_uses_guc_slpc(>uc))

@@ -252,20 +253,35 @@ static int run_test(struct intel_gt *gt, int test_type)
return -EIO;
}
  
-	/*

-* FIXME: With efficient frequency enabled, GuC can request
-* frequencies higher than the SLPC max. While this is fixed
-* in GuC, we level set these tests with RPn as min.
-*/
-   err = slpc_set_min_freq(slpc, slpc->min_freq);
-   if (err)
-   return err;
-
if (slpc->min_freq == slpc->rp0_freq) {
-   pr_err("Min/Max are fused to the same value\n");
-   return -EINVAL;
+   /* Servers will have min/max clamped to RP0 */



This should be "server parts". Tested the patch with Riana's suggested 
changes.


Acked-by: Nirmoy Das  with above changes.


Nirmoy


+   if (slpc->min_is_rpmax) {
+   err = slpc_set_min_freq(slpc, slpc->min_freq);
+   if (err) {
+   pr_err("Unable to update min freq on server 
part");
+   return err;
+   }
+
+   } else {
+   pr_err("Min/Max are fused to the same value\n");
+   return -EINVAL;
+   }
+   } else {
+   /*
+* FIXME: With efficient frequency enabled, GuC can request
+* frequencies higher than the SLPC max. While this is fixed
+* in GuC, we level set these tests with RPn as min.
+*/
+   err = slpc_set_min_freq(slpc, slpc->min_freq);
+   if (err)
+   return err;
}
  
+	saved_min_freq = slpc_min_freq;

+
+   /* New temp min freq = RPn */
+   slpc_min_freq = slpc->min_freq;
+
intel_gt_pm_wait_for_idle(gt);
intel_gt_pm_get(gt);
for_each_engine(engine, gt, id) {
@@ -347,7 +363,7 @@ static int run_test(struct intel_gt *gt, int test_type)
  
  	/* Restore min/max frequencies */

slpc_set_max_freq(slpc, slpc_max_freq);
-   slpc_set_min_freq(slpc, slpc_min_freq);
+   slpc_set_min_freq(slpc, saved_min_freq);
  
  	if (igt_flush_test(gt->i915))

err = -EIO;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
index fdd895f73f9f..11613d373a49 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
@@ -263,6 +263,7 @@ int intel_guc_slpc_init(struct intel_guc_slpc *slpc)
  
  	slpc->max_freq_softlimit = 0;

slpc->min_freq_softlimit = 0;
+   slpc->min_is_rpmax = false;
  
  	slpc->boost_freq = 0;

	atomic_set(&slpc->num_waiters, 0);
@@ -588,6 +589,31 @@ static int slpc_set_softlimits(struct intel_guc_slpc *slpc)
return 0;
  }
  
+static bool is_slpc_min_freq_rpmax(struct intel_guc_slpc *slpc)

+{
+   int slpc_min_freq;
+
+   if (intel_guc_slpc_get_min_freq(slpc, &slpc_min_freq))
+   return false;
+
+   if (slpc_min_freq > slpc->rp0_freq)
+   return true;
+   else
+   return false;
+}
+
+static void update_server_min_softlimit(struct intel_guc_slpc *slpc)
+{
+   /* For server parts, SLPC min will be at RPMax.
+* Use min softlimit to clamp it to RP0 instead.
+*/
+   if (is_slpc_min_freq_rpmax(slpc) &&
+   !slpc->min_freq_softlimit) {
+   slpc->min_is_rpmax = true;
+   slpc->min_freq_softlimit = slpc->rp0_freq;
+   }
+}
+
  static int slpc_use_fused_rp0(struct intel_guc_slpc *slpc)
  {
/* Force SLPC to used platform rp0 */
@@ -647,6 +673,9 @@ int intel_guc_slpc_enable(struct intel_guc_slpc *slpc)
  
  	slpc_get_rp_values(slpc);
  
+	/* Handle the case where min=max=RPmax */

+   update_server_min_softlimit(slpc);
+
/* Set SLPC max limit 

Re: [PATCH v3 2/2] drm/i915/uapi: expose GTT alignment

2022-10-04 Thread Das, Nirmoy



On 10/4/2022 1:49 PM, Matthew Auld wrote:

On some platforms we potentially have different alignment restrictions
depending on the memory type. We also now have different alignment
restrictions for the same region across different kernel versions.
Extend the region query to return the minimum required GTT alignment.
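
For example, userspace could consume the new field roughly like this
(illustrative sketch only, not part of this series; the helper name and
the legacy fallback are invented):

/* Pick the GTT alignment to use for allocations in a given region. */
static uint64_t region_gtt_alignment(const struct drm_i915_memory_region_info *info,
				     uint64_t legacy_alignment)
{
	/* Zero means an older kernel that predates gtt_alignment. */
	return info->gtt_alignment ? info->gtt_alignment : legacy_alignment;
}

Any GTT offset chosen for objects in that region is then rounded up to the
returned value.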

Testcase: igt@gem_create@create-ext-placement-alignment
Testcase: igt@i915_query@query-regions-sanity-check
Suggested-by: Lionel Landwerlin 
Signed-off-by: Matthew Auld 
Cc: Michal Mrozek 
Cc: Thomas Hellström 
Cc: Stuart Summers 
Cc: Jordan Justen 
Cc: Yang A Shi 
Cc: Nirmoy Das 


Reviewed-by: Nirmoy Das 



Cc: Niranjana Vishwanathapura 
---
  drivers/gpu/drm/i915/i915_query.c |  1 +
  include/uapi/drm/i915_drm.h   | 29 +++--
  2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_query.c 
b/drivers/gpu/drm/i915/i915_query.c
index 6ec9c9fb7b0d..111377f210ed 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -498,6 +498,7 @@ static int query_memregion_info(struct drm_i915_private 
*i915,
info.region.memory_class = mr->type;
info.region.memory_instance = mr->instance;
info.probed_size = mr->total;
+   info.gtt_alignment = mr->min_page_size;
  
  		if (mr->type == INTEL_MEMORY_LOCAL)

info.probed_cpu_visible_size = mr->io_size;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 08d69e36fb66..2e613109356b 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3346,8 +3346,33 @@ struct drm_i915_memory_region_info {
/** @region: The class:instance pair encoding */
struct drm_i915_gem_memory_class_instance region;
  
-	/** @rsvd0: MBZ */

-   __u32 rsvd0;
+   union {
+   /** @rsvd0: MBZ */
+   __u32 rsvd0;
+   /**
+* @gtt_alignment:
+*
+* The minimum required GTT alignment for this type of memory.
+* When allocating a GTT address it must be aligned to this
+* value or larger. On some platforms the kernel might opt to
+* using 64K pages for I915_MEMORY_CLASS_DEVICE, where 64K GTT
+* pages can then be used if we also use 64K GTT alignment.
+*
+* NOTE: If this is zero then this must be an older
+* kernel which lacks support for this field.
+*
+* Side note: For larger objects (especially for
+* I915_MEMORY_CLASS_DEVICE), like 2M+ in size, userspace should
+* consider potentially bumping the GTT alignment to say 2M,
+* which could potentially increase the likelihood of the kernel
+* being able to utilise 2M GTT pages underneath, if the layout
+* of the physical pages allows it.  On some configurations we
+* can then also use a more efficient page-table layout, if we
+* can't use the more desirable 2M GTT page, so long as we know
+* that the entire page-table will be used by this object.
+*/
+   __u32 gtt_alignment;
+   };
  
  	/**

 * @probed_size: Memory probed by the driver


Re: [PATCH v2 1/2] drm/i915: enable PS64 support for DG2

2022-09-28 Thread Das, Nirmoy



On 9/27/2022 5:39 PM, Matthew Auld wrote:

It turns out that on production DG2/ATS HW we should have support for
PS64. This feature allows to provide a 64K TLB hint at the PTE level,
which is a lot more flexible than the current method of enabling 64K GTT
pages for the entire page-table, since that leads to all kinds of
annoying restrictions, as documented in:

commit caa574ffc4aaf4f29b890223878c63e2e7772f62
Author: Matthew Auld 
Date:   Sat Feb 19 00:17:49 2022 +0530

 drm/i915/uapi: document behaviour for DG2 64K support

 On discrete platforms like DG2, we need to support a minimum page size
 of 64K when dealing with device local-memory. This is quite tricky for
 various reasons, so try to document the new implicit uapi for this.

With PS64, we can now drop the 2M GTT alignment restriction, and instead
only require 64K or larger when dealing with lmem. We still use the
compact-pt layout when possible, but only when we are certain that this
doesn't interfere with userspace.

Note that this is a change in uAPI behaviour, but hopefully shouldn't be
a concern (IGT is at least able to autodetect the alignment), since we
are only making the GTT alignment constraint less restrictive.

Based on a patch from CQ Tang.

v2: update the comment wrt scratch page

Reported-by: Michal Mrozek 
Signed-off-by: Matthew Auld 
Cc: Lionel Landwerlin 
Cc: Thomas Hellström 
Cc: Stuart Summers 
Cc: Jordan Justen 
Cc: Yang A Shi 
Cc: Nirmoy Das 
---
  .../gpu/drm/i915/gem/selftests/huge_pages.c   | 159 +-
  drivers/gpu/drm/i915/gt/gen8_ppgtt.c  |  81 +
  drivers/gpu/drm/i915/gt/intel_gtt.c   |  21 +--
  drivers/gpu/drm/i915/gt/intel_gtt.h   |   1 +
  drivers/gpu/drm/i915/i915_drv.h   |   7 -
  drivers/gpu/drm/i915/i915_pci.c   |   2 -
  drivers/gpu/drm/i915/i915_vma.c   |   9 +-
  drivers/gpu/drm/i915/intel_device_info.h  |   1 -
  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   9 +-
  include/uapi/drm/i915_drm.h   |  36 ++--
  10 files changed, 220 insertions(+), 106 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c 
b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index c570cf780079..cc26c1293208 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1161,7 +1161,8 @@ static int igt_write_huge(struct drm_i915_private *i915,
GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
  
  	size = obj->base.size;

-   if (obj->mm.page_sizes.sg & I915_GTT_PAGE_SIZE_64K)
+   if (obj->mm.page_sizes.sg & I915_GTT_PAGE_SIZE_64K &&
+   !HAS_64K_PAGES(i915))
size = round_up(size, I915_GTT_PAGE_SIZE_2M);
  
  	n = 0;

@@ -1214,6 +1215,10 @@ static int igt_write_huge(struct drm_i915_private *i915,
 * size and ensure the vma offset is at the start of the pt
 * boundary, however to improve coverage we opt for testing both
 * aligned and unaligned offsets.
+*
+* With PS64 this is no longer the case, but to ensure we
+* sometimes get the compact layout for smaller objects, apply
+* the round_up anyway.
 */
if (obj->mm.page_sizes.sg & I915_GTT_PAGE_SIZE_64K)
offset_low = round_down(offset_low,
@@ -1411,6 +1416,7 @@ static int igt_ppgtt_sanity_check(void *arg)
{ SZ_2M + SZ_4K,SZ_64K | SZ_4K  },
{ SZ_2M + SZ_4K,SZ_2M  | SZ_4K  },
{ SZ_2M + SZ_64K,   SZ_2M  | SZ_64K },
+   { SZ_2M + SZ_64K,   SZ_64K  },
};
int i, j;
int err;
@@ -1540,6 +1546,156 @@ static int igt_ppgtt_compact(void *arg)
return err;
  }
  
+static int igt_ppgtt_mixed(void *arg)

+{
+   struct drm_i915_private *i915 = arg;
+   const unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
+   struct drm_i915_gem_object *obj, *on;
+   struct i915_gem_engines *engines;
+   struct i915_gem_engines_iter it;
+   struct i915_address_space *vm;
+   struct i915_gem_context *ctx;
+   struct intel_context *ce;
+   struct file *file;
+   I915_RND_STATE(prng);
+   LIST_HEAD(objects);
+   struct intel_memory_region *mr;
+   struct i915_vma *vma;
+   unsigned int count;
+   u32 i, rem, addr;
+   int *order;
+   int n, err;
+
+   /*
+* Sanity check mixing 4K and 64K pages within the same page-table via
+* the new PS64 TLB hint.
+*/
+
+   if (!HAS_64K_PAGES(i915)) {
+   pr_info("device lacks PS64, skipping\n");
+   return 0;
+   }
+
+   file = mock_file(i915);
+   if (IS_ERR(file))
+   return PTR_ERR(file);
+
+   ctx = hugepage_ctx(i915, file);
+   if (IS_ERR(ctx)) {
+   err = PTR_ERR(ctx);
+   

Re: [PATCH] drm/i915/selftests: Remove flush_scheduled_work() from live_execlists

2022-09-23 Thread Das, Nirmoy

Reviewed-by: Nirmoy Das 

On 6/30/2022 2:57 PM, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

There are ongoing efforts to remove usages of flush_scheduled_work() from
drivers in order to avoid several cases of potentential problems when
flushing is done from certain contexts.

Remove the call from the live_execlists selftest. Its purpose was to be
thorough and sync with the execlists capture state handling, but that is
not strictly required for the test to function and can be removed.

Signed-off-by: Tvrtko Ursulin 
Cc: Tetsuo Handa 
---
  drivers/gpu/drm/i915/gt/selftest_execlists.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c 
b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index 09f8cd2d0e2c..e62d089257ae 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -85,8 +85,6 @@ static int wait_for_reset(struct intel_engine_cs *engine,
break;
} while (time_before(jiffies, timeout));
  
-	flush_scheduled_work();

-
if (rq->fence.error != -EIO) {
pr_err("%s: hanging request %llx:%lld not reset\n",
   engine->name,


Re: [PATCH] drm/i915: Improve debug print in vm_fault_ttm

2022-09-23 Thread Das, Nirmoy



On 9/22/2022 6:38 PM, Matthew Auld wrote:

On 22/09/2022 13:09, Nirmoy Das wrote:

Print the error code returned by __i915_ttm_migrate()
for better debuggability.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/6889
Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c

index e3fc38dd5db0..9619c0fe1025 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1034,7 +1034,7 @@ static vm_fault_t vm_fault_ttm(struct vm_fault 
*vmf)

  }
    if (err) {
-    drm_dbg(dev, "Unable to make resource CPU accessible\n");
+    drm_dbg(dev, "Unable to make resource CPU accessible(err 
= %pe)\n", err);


Yeah, looks useful. I think for that bug the object is just too large 
for the mappable part of lmem, so this just gives -2big or similar on 
small-bar systems. I presume that the test needs to be updated to 
account for the cpu_size or so.



Yeah, can't think of any other case. The test needs to be updated; going 
to send out IGT fixes for this.




With the kernel test robot warning fixed:
Acked-by: Matthew Auld 



Thanks, I will resend an updated one.



I looked at the GEM_BUG_ON(rq->reserved_space > ring->space), and I 
think the issue is maybe with emit_pte() using ring->space to 
manually figure out the number of dwords it can emit (instead of the 
usual ring_begin()), which I guess works, but if we are unlucky and 
get interrupted (like with a very well timed SIGBUS here) while 
waiting for more ring space, and end up bailing early, we might have 
trampled over the reserved_space when submitting the request. I guess 
normally the next ring_begin() would take care of the reserved_space, 
like when constructing the actual copy packet.



I am not so familiar with the code, but that sounds logical.


Nirmoy




dma_resv_unlock(bo->base.resv);
  ret = VM_FAULT_SIGBUS;
  goto out_rpm;


Re: [Intel-gfx] [PATCH] drm/i915: Do not cleanup obj with NULL bo->resource

2022-09-21 Thread Das, Nirmoy

Hi Matt

On 9/20/2022 7:06 PM, Nirmoy Das wrote:

For delayed BO release, i915_ttm_delete_mem_notify()
gets called twice, once with a proper bo->resource and
another time with NULL. We shouldn't do anything the
second time, as we have already cleaned up the obj once.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/6850



Please add the below Fixes before merging, I missed that.

Fixes: ad74457a6b5a96 ("drm/i915/dgfx: Release mmap on rpm suspend")

Thanks,
Nirmoy


Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 0544b0a4a43a..e3fc38dd5db0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -511,7 +511,7 @@ static void i915_ttm_delete_mem_notify(struct 
ttm_buffer_object *bo)
struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
intel_wakeref_t wakeref = 0;
  
-	if (likely(obj)) {

+   if (bo->resource && likely(obj)) {
/* ttm_bo_release() already has dma_resv_lock */
if (i915_ttm_cpu_maps_iomem(bo->resource))
			wakeref = 
intel_runtime_pm_get(&to_i915(obj->base.dev)->runtime_pm);


Re: [PATCH] drm/i915: Do not dereference NULL bo->resource

2022-09-20 Thread Das, Nirmoy



On 9/19/2022 5:29 PM, Gupta, Anshuman wrote:



-Original Message-
From: Das, Nirmoy 
Sent: Monday, September 19, 2022 8:33 PM
To: intel-...@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org; Auld, Matthew
; Gupta, Anshuman 
Subject: [PATCH] drm/i915: Do not dereference NULL bo->resource

bo->resource could be NULL, hence add a NULL check for resource before
dereferencing it.

Will bo->resource be NULL only in case the object is in smem, or can it be NULL 
even in the lmem case as well?



It can happen with lmem too. I think we should just use 
i915_gem_object_is_lmem() instead of i915_ttm_cpu_maps_iomem() here.
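
That is, roughly (only a sketch of the idea, not a tested patch):

	if (likely(obj)) {
		/*
		 * Decide based on the object's placement rather than
		 * bo->resource, which can already be NULL at this point.
		 */
		if (i915_gem_object_is_lmem(obj))
			wakeref = intel_runtime_pm_get(&to_i915(obj->base.dev)->runtime_pm);

		/* ... rest of i915_ttm_delete_mem_notify() unchanged ... */
	}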



Nirmoy


Thanks,
Anshuman Gupta.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/6850
Fixes: ad74457a6b5a96 ("drm/i915/dgfx: Release mmap on rpm suspend")
Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 0544b0a4a43a..8608801cd9ab 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -513,7 +513,7 @@ static void i915_ttm_delete_mem_notify(struct
ttm_buffer_object *bo)

if (likely(obj)) {
/* ttm_bo_release() already has dma_resv_lock */
-   if (i915_ttm_cpu_maps_iomem(bo->resource))
+   if (bo->resource && i915_ttm_cpu_maps_iomem(bo->resource))

		wakeref = intel_runtime_pm_get(&to_i915(obj->base.dev)->runtime_pm);

__i915_gem_object_pages_fini(obj);
--
2.37.3


Re: [PATCH] drm/i915: consider HAS_FLAT_CCS() in needs_ccs_pages

2022-09-05 Thread Das, Nirmoy

LGTM Reviewed-by: Nirmoy Das 

On 9/5/2022 12:53 PM, Matthew Auld wrote:

Just move the HAS_FLAT_CCS() check into needs_ccs_pages. This also then
fixes i915_ttm_memcpy_allowed() which was incorrectly reporting true on
DG1, even though it doesn't have small-BAR or flat-CCS.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/6605
Fixes: efeb3caf4341 ("drm/i915/ttm: disallow CPU fallback mode for ccs pages")
Signed-off-by: Matthew Auld 
Cc: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_object.c | 3 +++
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 2 +-
  2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 389e9f157ca5..85482a04d158 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -723,6 +723,9 @@ bool i915_gem_object_needs_ccs_pages(struct 
drm_i915_gem_object *obj)
bool lmem_placement = false;
int i;
  
+	if (!HAS_FLAT_CCS(to_i915(obj->base.dev)))

+   return false;
+
for (i = 0; i < obj->mm.n_placements; i++) {
/* Compression is not allowed for the objects with smem 
placement */
if (obj->mm.placements[i]->type == INTEL_MEMORY_SYSTEM)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index bc9c432edffe..f64a3deb12fc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -297,7 +297,7 @@ static struct ttm_tt *i915_ttm_tt_create(struct 
ttm_buffer_object *bo,
i915_tt->is_shmem = true;
}
  
-	if (HAS_FLAT_CCS(i915) && i915_gem_object_needs_ccs_pages(obj))

+   if (i915_gem_object_needs_ccs_pages(obj))
ccs_pages = DIV_ROUND_UP(DIV_ROUND_UP(bo->base.size,
  NUM_BYTES_PER_CCS_BYTE),
 PAGE_SIZE);


Re: [Intel-gfx] [PATCH] drm/i915/ttm: Abort suspend on i915_ttm_backup failure

2022-09-01 Thread Das, Nirmoy



On 9/1/2022 5:57 PM, Andrzej Hajda wrote:

On 31.08.2022 18:18, Nirmoy Das wrote:

On system suspend, when system memory is low, i915_gem_obj_copy_ttm()
could fail trying to back up an lmem obj. GEM_WARN_ON() is not enough;
suspend shouldn't continue if i915_ttm_backup() throws an error.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/6529
Reviewed-by: Matthew Auld 
Suggested-by: Chris P Wilson 
Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c

index 9aad84059d56..6f5d5c0909b4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
@@ -79,7 +79,12 @@ static int i915_ttm_backup(struct 
i915_gem_apply_to_region *apply,

  goto out_no_populate;
    err = i915_gem_obj_copy_ttm(backup, obj, pm_apply->allow_gpu, 
false);

-    GEM_WARN_ON(err);
+    if (err) {
+    drm_err(&i915->drm,
+    "Unable to copy from device to system memory, err:%d\n",
+    err);


I wonder if %pe wouldn't be better here, up to you.



A more readable err should be useful; I'll resend with %pe.



Reviewed-by: Andrzej Hajda 



Thanks,

Nirmoy



Regards
Andrzej



+    goto out_no_populate;
+    }
  ttm_bo_wait_ctx(backup_bo, &ctx);
    obj->ttm.backup = backup;




Re: [Intel-gfx] [PATCH] drm/i915/ttm: Abort suspend on i915_ttm_backup failure

2022-08-31 Thread Das, Nirmoy



On 8/31/2022 5:50 PM, Matthew Auld wrote:

On 29/08/2022 13:04, Nirmoy Das wrote:

On system suspend when system memory is low then i915_gem_obj_copy_ttm()
could fail trying to backup a lmem obj. GEM_WARN_ON() is not enough,
suspend shouldn't continue if i915_ttm_backup() throws an error.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6529


Does this fix it? Does CI not complain about the drm_err? Also do we 
know what the actual error was?



The error isn't recurring, so the best guess is that a large framebuffer copy
took a long time and wait_for_suspend() timed out. This needs more coverage
from IGT and I am looking into that. Let's ignore the "Closes" tag from this
patch until I come up with an IGT test for this.


Nirmoy





Suggested-by: Chris P Wilson 
Signed-off-by: Nirmoy Das 


Passing the error along seems reasonable to me,
Reviewed-by: Matthew Auld 


---
  drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c

index 9aad84059d56..6f5d5c0909b4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
@@ -79,7 +79,12 @@ static int i915_ttm_backup(struct 
i915_gem_apply_to_region *apply,

  goto out_no_populate;
    err = i915_gem_obj_copy_ttm(backup, obj, pm_apply->allow_gpu, 
false);

-    GEM_WARN_ON(err);
+    if (err) {
+    drm_err(&i915->drm,
+    "Unable to copy from device to system memory, err:%d\n",
+    err);
+    goto out_no_populate;
+    }
  ttm_bo_wait_ctx(backup_bo, &ctx);
    obj->ttm.backup = backup;


Re: [PATCH] drm/i915/ttm: fix 32b build

2022-07-13 Thread Das, Nirmoy

Reviewed-by: Nirmoy Das 

On 7/12/2022 7:40 PM, Matthew Auld wrote:

Since segment_pages is no longer a compile time constant, it looks the
DIV_ROUND_UP(node->size, segment_pages) breaks the 32b build. Simplest
is just to use the ULL variant, but really we should need not need more
than u32 for the page alignment (also we are limited by that due to the
sg->length type), so also make it all u32.

Reported-by: Ville Syrjälä 
Fixes: bc99f1209f19 ("drm/i915/ttm: fix sg_table construction")
Signed-off-by: Matthew Auld 
Cc: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_region.c |  2 ++
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c|  2 +-
  drivers/gpu/drm/i915/i915_scatterlist.c| 16 
  drivers/gpu/drm/i915/i915_scatterlist.h|  4 ++--
  drivers/gpu/drm/i915/intel_region_ttm.c|  2 +-
  drivers/gpu/drm/i915/intel_region_ttm.h|  2 +-
  6 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_region.c 
b/drivers/gpu/drm/i915/gem/i915_gem_region.c
index f46ee16a323a..a4fb577eceb4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_region.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_region.c
@@ -60,6 +60,8 @@ __i915_gem_object_create_region(struct intel_memory_region 
*mem,
if (page_size)
default_page_size = page_size;
  
+	/* We should be able to fit a page within an sg entry */

+   GEM_BUG_ON(overflows_type(default_page_size, u32));
GEM_BUG_ON(!is_power_of_2_u64(default_page_size));
GEM_BUG_ON(default_page_size < PAGE_SIZE);
  
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c

index 053b0022ddd0..5a5cf332d8a5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -602,7 +602,7 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
 struct ttm_resource *res)
  {
struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
-   u64 page_alignment;
+   u32 page_alignment;
  
  	if (!i915_ttm_gtt_binds_lmem(res))

return i915_ttm_tt_get_st(bo->ttm);
diff --git a/drivers/gpu/drm/i915/i915_scatterlist.c 
b/drivers/gpu/drm/i915/i915_scatterlist.c
index f63b50b71e10..dcc081874ec8 100644
--- a/drivers/gpu/drm/i915/i915_scatterlist.c
+++ b/drivers/gpu/drm/i915/i915_scatterlist.c
@@ -79,10 +79,10 @@ void i915_refct_sgt_init(struct i915_refct_sgt *rsgt, 
size_t size)
   */
  struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct drm_mm_node *node,
  u64 region_start,
- u64 page_alignment)
+ u32 page_alignment)
  {
-   const u64 max_segment = round_down(UINT_MAX, page_alignment);
-   u64 segment_pages = max_segment >> PAGE_SHIFT;
+   const u32 max_segment = round_down(UINT_MAX, page_alignment);
+   const u32 segment_pages = max_segment >> PAGE_SHIFT;
u64 block_size, offset, prev_end;
struct i915_refct_sgt *rsgt;
struct sg_table *st;
@@ -96,7 +96,7 @@ struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct 
drm_mm_node *node,
  
  	i915_refct_sgt_init(rsgt, node->size << PAGE_SHIFT);

	st = &rsgt->table;
-   if (sg_alloc_table(st, DIV_ROUND_UP(node->size, segment_pages),
+   if (sg_alloc_table(st, DIV_ROUND_UP_ULL(node->size, segment_pages),
   GFP_KERNEL)) {
i915_refct_sgt_put(rsgt);
return ERR_PTR(-ENOMEM);
@@ -123,7 +123,7 @@ struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct 
drm_mm_node *node,
st->nents++;
}
  
-		len = min(block_size, max_segment - sg->length);

+   len = min_t(u64, block_size, max_segment - sg->length);
sg->length += len;
sg_dma_len(sg) += len;
  
@@ -155,11 +155,11 @@ struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct drm_mm_node *node,

   */
  struct i915_refct_sgt *i915_rsgt_from_buddy_resource(struct ttm_resource *res,
 u64 region_start,
-u64 page_alignment)
+u32 page_alignment)
  {
struct i915_ttm_buddy_resource *bman_res = to_ttm_buddy_resource(res);
const u64 size = res->num_pages << PAGE_SHIFT;
-   const u64 max_segment = round_down(UINT_MAX, page_alignment);
+   const u32 max_segment = round_down(UINT_MAX, page_alignment);
struct drm_buddy *mm = bman_res->mm;
struct list_head *blocks = _res->blocks;
struct drm_buddy_block *block;
@@ -207,7 +207,7 @@ struct i915_refct_sgt *i915_rsgt_from_buddy_resource(struct 
ttm_resource *res,
st->nents++;
}
  
-			len = min(block_size, max_segment - sg->length);

+   len = min_t(u64, 
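
For context, the 32-bit build breaks because a plain 64-bit division (hidden inside
DIV_ROUND_UP() on a u64) needs a libgcc helper the kernel does not link against, hence
the switch to u32 types and DIV_ROUND_UP_ULL(). A standalone sketch of the resulting
arithmetic, with assumed example values:

    #include <stdio.h>
    #include <stdint.h>
    #include <limits.h>

    #define PAGE_SHIFT 12
    #define round_down(x, y)       ((x) & ~((__typeof__(x))((y) - 1))) /* y: power of two */
    #define DIV_ROUND_UP_ULL(n, d) (((unsigned long long)(n) + (d) - 1) / (d))

    int main(void)
    {
        /* Assumed example values: a 64K-aligned region and an 8 GiB drm_mm node. */
        uint32_t page_alignment = 64 * 1024;
        uint64_t node_pages = (8ull << 30) >> PAGE_SHIFT;

        /* Both of these now fit in u32, as in the patch. */
        uint32_t max_segment = round_down(UINT_MAX, page_alignment);
        uint32_t segment_pages = max_segment >> PAGE_SHIFT;

        /* The only 64-bit division left is the rounded-up sg entry count. */
        unsigned long long nents = DIV_ROUND_UP_ULL(node_pages, segment_pages);

        printf("max_segment=%u bytes, segment_pages=%u, sg entries=%llu\n",
               max_segment, segment_pages, nents);
        return 0;
    }
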

Re: [PATCH v2 12/39] drm/i915: gem: add kernel-doc description for some function parameters

2022-07-13 Thread Das, Nirmoy

Reviewed-by: Nirmoy Das 

On 7/13/2022 10:12 AM, Mauro Carvalho Chehab wrote:

There are some parameters missing at the kernel-doc markups on
some gem files. Some of those are trivial enough to be added.

Document them.

Signed-off-by: Mauro Carvalho Chehab
---

To avoid mailbombing on a large number of people, only mailing lists were C/C 
on the cover.
See [PATCH v2 00/39] 
at:https://lore.kernel.org/all/cover.1657699522.git.mche...@kernel.org/

  drivers/gpu/drm/i915/gem/i915_gem_object.c   | 2 ++
  drivers/gpu/drm/i915/gem/i915_gem_ttm.h  | 1 +
  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 2 ++
  3 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index ccec4055fde3..b5dd43405355 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -820,6 +820,8 @@ int i915_gem_object_wait_moving_fence(struct 
drm_i915_gem_object *obj,
   * in an unknown_state. This means that userspace must NEVER be allowed to 
touch
   * the pages, with either the GPU or CPU.
   *
+ * @obj: The object to check its state.
+ *
   * ONLY valid to be called after ensuring that all kernel fences have 
signalled
   * (in particular the fence for moving/clearing the object).
   */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
index e4842b4296fc..64151f40098f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.h
@@ -30,6 +30,7 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo);
  /**
   * i915_ttm_to_gem - Convert a struct ttm_buffer_object to an embedding
   * struct drm_i915_gem_object.
+ * @bo: The ttm buffer object.
   *
   * Return: Pointer to the embedding struct ttm_buffer_object, or NULL
   * if the object was not an i915 ttm object.
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 9a7e50534b84..56217d324a9b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -237,6 +237,7 @@ static struct dma_fence *i915_ttm_accel_move(struct 
ttm_buffer_object *bo,
   * @_src_iter: Storage space for the source kmap iterator.
   * @dst_iter: Pointer to the destination kmap iterator.
   * @src_iter: Pointer to the source kmap iterator.
+ * @num_pages: Number of pages to copy or to be cleared.
   * @clear: Whether to clear instead of copy.
   * @src_rsgt: Refcounted scatter-gather list of source memory.
   * @dst_rsgt: Refcounted scatter-gather list of destination memory.
@@ -541,6 +542,7 @@ __i915_ttm_move(struct ttm_buffer_object *bo,
   * i915_ttm_move - The TTM move callback used by i915.
   * @bo: The buffer object.
   * @evict: Whether this is an eviction.
+ * @ctx: Pointer to a struct ttm_operation_ctx
   * @dst_mem: The destination ttm resource.
   * @hop: If we need multihop, what temporary memory type to move to.
   *



Re: [PATCH v2 10/39] drm/i915: i915_gem_ttm: fix a kernel-doc markup

2022-07-13 Thread Das, Nirmoy

Reviewed-by: Nirmoy Das 

On 7/13/2022 10:11 AM, Mauro Carvalho Chehab wrote:

Two new fields were added to __i915_gem_ttm_object_init() without
their corresponding documentation.

Document them.

Fixes: 9b78b5dade2d ("drm/i915: add i915_gem_object_create_region_at()")
Signed-off-by: Mauro Carvalho Chehab
---

To avoid mailbombing on a large number of people, only mailing lists were C/C 
on the cover.
See [PATCH v2 00/39] 
at:https://lore.kernel.org/all/cover.1657699522.git.mche...@kernel.org/

  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 053b0022ddd0..e8cfb47b5f5a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1187,7 +1187,9 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo)
   * __i915_gem_ttm_object_init - Initialize a ttm-backed i915 gem object
   * @mem: The initial memory region for the object.
   * @obj: The gem object.
+ * @offset: The range start.
   * @size: Object size in bytes.
+ * @page_size: The requested page size in bytes for this object.
   * @flags: gem object flags.
   *
   * Return: 0 on success, negative error code on failure.



Re: [PATCH v2 08/39] drm/i915: gem: fix some Kernel-doc issues

2022-07-13 Thread Das, Nirmoy


On 7/13/2022 10:11 AM, Mauro Carvalho Chehab wrote:

There are several trivial issueson kernel-doc markups at gem:

drivers/gpu/drm/i915/gem/i915_gem_create.c:146: warning: This comment 
starts with '/**', but isn't a kernel-doc comment. Refer 
Documentation/doc-guide/kernel-doc.rst
drivers/gpu/drm/i915/gem/i915_gem_create.c:217: warning: This comment 
starts with '/**', but isn't a kernel-doc comment. Refer 
Documentation/doc-guide/kernel-doc.rst
drivers/gpu/drm/i915/gem/i915_gem_create.c:401: warning: This comment 
starts with '/**', but isn't a kernel-doc comment. Refer 
Documentation/doc-guide/kernel-doc.rst
drivers/gpu/drm/i915/gem/i915_gem_domain.c:116: warning: This comment 
starts with '/**', but isn't a kernel-doc comment. Refer 
Documentation/doc-guide/kernel-doc.rst
drivers/gpu/drm/i915/gem/i915_gem_domain.c:177: warning: This comment 
starts with '/**', but isn't a kernel-doc comment. Refer 
Documentation/doc-guide/kernel-doc.rst
drivers/gpu/drm/i915/gem/i915_gem_domain.c:262: warning: expecting 
prototype for Changes the cache(). Prototype was for 
i915_gem_object_set_cache_level() instead
drivers/gpu/drm/i915/gem/i915_gem_domain.c:456: warning: This comment 
starts with '/**', but isn't a kernel-doc comment. Refer 
Documentation/doc-guide/kernel-doc.rst
drivers/gpu/drm/i915/gem/i915_gem_domain.c:500: warning: This comment 
starts with '/**', but isn't a kernel-doc comment. Refer 
Documentation/doc-guide/kernel-doc.rst
drivers/gpu/drm/i915/gem/i915_gem_object.h:110: warning: Function 
parameter or member 'file' not described in 'i915_gem_object_lookup_rcu'
drivers/gpu/drm/i915/gem/i915_gem_object.h:110: warning: Excess 
function parameter 'filp' description in 'i915_gem_object_lookup_rcu'
drivers/gpu/drm/i915/gem/i915_gem_region.h:35: warning: Function 
parameter or member 'process_obj' not described in 
'i915_gem_apply_to_region_ops'
drivers/gpu/drm/i915/gem/i915_gem_wait.c:130: warning: This comment 
starts with '/**', but isn't a kernel-doc comment. Refer 
Documentation/doc-guide/kernel-doc.rst

Caused by:
- lack of function name at the kernel-doc markup;
- renamed parameters.

Address them.

Signed-off-by: Mauro Carvalho Chehab


Reviewed-by: Nirmoy Das 


---

To avoid mailbombing on a large number of people, only mailing lists were C/C 
on the cover.
See [PATCH v2 00/39] 
at:https://lore.kernel.org/all/cover.1657699522.git.mche...@kernel.org/

  drivers/gpu/drm/i915/gem/i915_gem_create.c |  8 +---
  drivers/gpu/drm/i915/gem/i915_gem_domain.c | 17 +++--
  drivers/gpu/drm/i915/gem/i915_gem_object.h |  2 +-
  drivers/gpu/drm/i915/gem/i915_gem_wait.c   |  2 +-
  4 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index 33673fe7ee0a..8cb2eb092031 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -143,7 +143,8 @@ __i915_gem_object_create_user_ext(struct drm_i915_private 
*i915, u64 size,
  }
  
  /**

- * Creates a new object using the same path as DRM_I915_GEM_CREATE_EXT
+ * __i915_gem_object_create_user - Creates a new object using the same path
+ * as DRM_I915_GEM_CREATE_EXT
   * @i915: i915 private
   * @size: size of the buffer, in bytes
   * @placements: possible placement regions, in priority order
@@ -214,7 +215,7 @@ i915_gem_dumb_create(struct drm_file *file,
  }
  
  /**

- * Creates a new mm object and returns a handle to it.
+ * i915_gem_create_ioctl - Creates a new mm object and returns a handle to it.
   * @dev: drm device pointer
   * @data: ioctl data blob
   * @file: drm file pointer
@@ -398,7 +399,8 @@ static const i915_user_extension_fn create_extensions[] = {
  };
  
  /**

- * Creates a new mm object and returns a handle to it.
+ * i915_gem_create_ext_ioctl - Creates a new mm object and returns a handle
+ * to it.
   * @dev: drm device pointer
   * @data: ioctl data blob
   * @file: drm file pointer
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c 
b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index 1674b0c5802b..49d7841ba979 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -113,7 +113,8 @@ void i915_gem_object_flush_if_display_locked(struct 
drm_i915_gem_object *obj)
  }
  
  /**

- * Moves a single object to the WC read, and possibly write domain.
+ * i915_gem_object_set_to_wc_domain - Moves a single object to the WC read,
+ * and possibly write domain.
   * @obj: object to act on
   * @write: ask for write access or read only
   *
@@ -174,7 +175,8 @@ i915_gem_object_set_to_wc_domain(struct drm_i915_gem_object 
*obj, bool write)
  }
  
  /**

- * Moves a single object to the GTT read, and possibly write domain.
+ * i915_gem_object_set_to_gtt_domain - Moves a single object to the GTT read,
+ * and possibly 

Re: [PATCH v2] drm/syncobj: Fix sync syncobj issue

2022-07-13 Thread Das, Nirmoy

Hi Christian,

On 7/12/2022 12:26 PM, Christian König wrote:

Ping to the Intel guys here. Especially Lucas/Nirmoy/Lionel.

IIRC you stumbled over that problem as well, have you found any solution?


I might be wrong, but I think you are talking about the
igt@syncobj_timeline@transfer-timeline-point testcase, which seems to be
green in CI now:
https://intel-gfx-ci.01.org/tree/drm-tip/igt@syncobj_timel...@transfer-timeline-point.html


Lucas found out that the issue was fixed by ec8d985ff26f ("drm: use
dma_fence_unwrap_merge() in drm_syncobj").



Regards,

Nirmoy



Regards,
Christian.

On 07.07.22 at 12:29, jie1zhan wrote:

enable signaling after flatten dma_fence_chains on transfer

Signed-off-by: jie1zhan 
---
  drivers/gpu/drm/drm_syncobj.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index 7e48dcd1bee4..0d9d3577325f 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -920,6 +920,7 @@ static int 
drm_syncobj_transfer_to_timeline(struct drm_file *file_private,

  if (ret)
  goto err_free_fence;
  +    dma_fence_enable_sw_signaling(fence);
  chain = dma_fence_chain_alloc();
  if (!chain) {
  ret = -ENOMEM;




Re: [PATCH v3] drm/i915/ttm: fix sg_table construction

2022-07-11 Thread Das, Nirmoy



On 7/11/2022 10:58 AM, Matthew Auld wrote:

If we encounter some monster sized local-memory page that exceeds the
maximum sg length (UINT32_MAX), ensure that don't end up with some
misaligned address in the entry that follows, leading to fireworks
later. Also ensure we have some coverage of this in the selftests.

v2(Chris):
   - Use round_down consistently to avoid udiv errors
v3(Nirmoy):
   - Also update the max_segment in the selftest

Fixes: f701b16d4cc5 ("drm/i915/ttm: add i915_sg_from_buddy_resource")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6379
Signed-off-by: Matthew Auld 



Reviewed-by: Nirmoy Das 


Cc: Thomas Hellström 
Cc: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 11 --
  drivers/gpu/drm/i915/i915_scatterlist.c   | 19 +
  drivers/gpu/drm/i915/i915_scatterlist.h   |  6 --
  drivers/gpu/drm/i915/intel_region_ttm.c   | 10 ++---
  drivers/gpu/drm/i915/intel_region_ttm.h   |  3 ++-
  .../drm/i915/selftests/intel_memory_region.c  | 21 +--
  drivers/gpu/drm/i915/selftests/mock_region.c  |  3 ++-
  7 files changed, 58 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 7e1f8b83077f..c5c8aa1f8558 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -602,10 +602,15 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
 struct ttm_resource *res)
  {
struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
+   u64 page_alignment;
  
  	if (!i915_ttm_gtt_binds_lmem(res))

return i915_ttm_tt_get_st(bo->ttm);
  
+	page_alignment = bo->page_alignment << PAGE_SHIFT;

+   if (!page_alignment)
+   page_alignment = obj->mm.region->min_page_size;
+
/*
 * If CPU mapping differs, we need to add the ttm_tt pages to
 * the resulting st. Might make sense for GGTT.
@@ -616,7 +621,8 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
struct i915_refct_sgt *rsgt;
  
  			rsgt = intel_region_ttm_resource_to_rsgt(obj->mm.region,

-res);
+res,
+
page_alignment);
if (IS_ERR(rsgt))
return rsgt;
  
@@ -625,7 +631,8 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,

return i915_refct_sgt_get(obj->ttm.cached_io_rsgt);
}
  
-	return intel_region_ttm_resource_to_rsgt(obj->mm.region, res);

+   return intel_region_ttm_resource_to_rsgt(obj->mm.region, res,
+page_alignment);
  }
  
  static int i915_ttm_truncate(struct drm_i915_gem_object *obj)

diff --git a/drivers/gpu/drm/i915/i915_scatterlist.c 
b/drivers/gpu/drm/i915/i915_scatterlist.c
index 159571b9bd24..f63b50b71e10 100644
--- a/drivers/gpu/drm/i915/i915_scatterlist.c
+++ b/drivers/gpu/drm/i915/i915_scatterlist.c
@@ -68,6 +68,7 @@ void i915_refct_sgt_init(struct i915_refct_sgt *rsgt, size_t 
size)
   * drm_mm_node
   * @node: The drm_mm_node.
   * @region_start: An offset to add to the dma addresses of the sg list.
+ * @page_alignment: Required page alignment for each sg entry. Power of two.
   *
   * Create a struct sg_table, initializing it from a struct drm_mm_node,
   * taking a maximum segment length into account, splitting into segments
@@ -77,15 +78,18 @@ void i915_refct_sgt_init(struct i915_refct_sgt *rsgt, 
size_t size)
   * error code cast to an error pointer on failure.
   */
  struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct drm_mm_node *node,
- u64 region_start)
+ u64 region_start,
+ u64 page_alignment)
  {
-   const u64 max_segment = SZ_1G; /* Do we have a limit on this? */
+   const u64 max_segment = round_down(UINT_MAX, page_alignment);
u64 segment_pages = max_segment >> PAGE_SHIFT;
u64 block_size, offset, prev_end;
struct i915_refct_sgt *rsgt;
struct sg_table *st;
struct scatterlist *sg;
  
+	GEM_BUG_ON(!max_segment);

+
rsgt = kmalloc(sizeof(*rsgt), GFP_KERNEL);
if (!rsgt)
return ERR_PTR(-ENOMEM);
@@ -112,6 +116,8 @@ struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct 
drm_mm_node *node,
sg = __sg_next(sg);
  
  			sg_dma_address(sg) = region_start + offset;

+   GEM_BUG_ON(!IS_ALIGNED(sg_dma_address(sg),
+  page_alignment));
sg_dma_len(sg) = 0;
sg->length = 0;
st->nents++;
@@ 

Re: [PATCH] drm/i915/ttm: fix sg_table construction

2022-07-08 Thread Das, Nirmoy



On 7/8/2022 9:41 AM, Matthew Auld wrote:

If we encounter some monster sized local-memory page that exceeds the
maximum sg length (UINT32_MAX), ensure that don't end up with some
misaligned address in the entry that follows, leading to fireworks
later. Also ensure we have some coverage of this in the selftests.

v2(Chris): use round_down consistently to avoid udiv errors

Fixes: f701b16d4cc5 ("drm/i915/ttm: add i915_sg_from_buddy_resource")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6379
Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 11 +--
  drivers/gpu/drm/i915/i915_scatterlist.c   | 19 +++
  drivers/gpu/drm/i915/i915_scatterlist.h   |  6 --
  drivers/gpu/drm/i915/intel_region_ttm.c   | 10 +++---
  drivers/gpu/drm/i915/intel_region_ttm.h   |  3 ++-
  .../drm/i915/selftests/intel_memory_region.c  | 17 -
  drivers/gpu/drm/i915/selftests/mock_region.c  |  3 ++-
  7 files changed, 55 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 7e1f8b83077f..c5c8aa1f8558 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -602,10 +602,15 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
 struct ttm_resource *res)
  {
struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
+   u64 page_alignment;
  
  	if (!i915_ttm_gtt_binds_lmem(res))

return i915_ttm_tt_get_st(bo->ttm);
  
+	page_alignment = bo->page_alignment << PAGE_SHIFT;

+   if (!page_alignment)
+   page_alignment = obj->mm.region->min_page_size;
+
/*
 * If CPU mapping differs, we need to add the ttm_tt pages to
 * the resulting st. Might make sense for GGTT.
@@ -616,7 +621,8 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
struct i915_refct_sgt *rsgt;
  
  			rsgt = intel_region_ttm_resource_to_rsgt(obj->mm.region,

-res);
+res,
+
page_alignment);
if (IS_ERR(rsgt))
return rsgt;
  
@@ -625,7 +631,8 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,

return i915_refct_sgt_get(obj->ttm.cached_io_rsgt);
}
  
-	return intel_region_ttm_resource_to_rsgt(obj->mm.region, res);

+   return intel_region_ttm_resource_to_rsgt(obj->mm.region, res,
+page_alignment);
  }
  
  static int i915_ttm_truncate(struct drm_i915_gem_object *obj)

diff --git a/drivers/gpu/drm/i915/i915_scatterlist.c 
b/drivers/gpu/drm/i915/i915_scatterlist.c
index 159571b9bd24..f63b50b71e10 100644
--- a/drivers/gpu/drm/i915/i915_scatterlist.c
+++ b/drivers/gpu/drm/i915/i915_scatterlist.c
@@ -68,6 +68,7 @@ void i915_refct_sgt_init(struct i915_refct_sgt *rsgt, size_t 
size)
   * drm_mm_node
   * @node: The drm_mm_node.
   * @region_start: An offset to add to the dma addresses of the sg list.
+ * @page_alignment: Required page alignment for each sg entry. Power of two.
   *
   * Create a struct sg_table, initializing it from a struct drm_mm_node,
   * taking a maximum segment length into account, splitting into segments
@@ -77,15 +78,18 @@ void i915_refct_sgt_init(struct i915_refct_sgt *rsgt, 
size_t size)
   * error code cast to an error pointer on failure.
   */
  struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct drm_mm_node *node,
- u64 region_start)
+ u64 region_start,
+ u64 page_alignment)
  {
-   const u64 max_segment = SZ_1G; /* Do we have a limit on this? */
+   const u64 max_segment = round_down(UINT_MAX, page_alignment);
u64 segment_pages = max_segment >> PAGE_SHIFT;
u64 block_size, offset, prev_end;
struct i915_refct_sgt *rsgt;
struct sg_table *st;
struct scatterlist *sg;
  
+	GEM_BUG_ON(!max_segment);

+
rsgt = kmalloc(sizeof(*rsgt), GFP_KERNEL);
if (!rsgt)
return ERR_PTR(-ENOMEM);
@@ -112,6 +116,8 @@ struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct 
drm_mm_node *node,
sg = __sg_next(sg);
  
  			sg_dma_address(sg) = region_start + offset;

+   GEM_BUG_ON(!IS_ALIGNED(sg_dma_address(sg),
+  page_alignment));
sg_dma_len(sg) = 0;
sg->length = 0;
st->nents++;
@@ -138,6 +144,7 @@ struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct 
drm_mm_node *node,
   * 

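To illustrate the alignment issue both versions of this fix address: once max_segment is
rounded down to the region's page alignment, every sg entry boundary stays aligned even
for blocks larger than UINT_MAX. A standalone sketch with assumed values (64K minimum
page size, one oversized block):

    #include <stdio.h>
    #include <stdint.h>
    #include <limits.h>
    #include <assert.h>

    #define round_down(x, y) ((x) & ~((uint64_t)(y) - 1)) /* y is a power of two */

    /* Walk a contiguous block in max_segment-sized chunks and check that every
     * chunk starts on a page_alignment boundary, as the added GEM_BUG_ON does. */
    static void split(uint64_t start, uint64_t size, uint64_t max_segment,
                      uint64_t page_alignment)
    {
        for (uint64_t off = 0; off < size; off += max_segment) {
            uint64_t len = size - off < max_segment ? size - off : max_segment;

            assert((start + off) % page_alignment == 0);
            printf("  entry at %#llx, len %#llx\n",
                   (unsigned long long)(start + off), (unsigned long long)len);
        }
    }

    int main(void)
    {
        uint64_t page_alignment = 64 * 1024;   /* assumed min_page_size */
        uint64_t block = 6ull << 30;           /* one "monster" 6 GiB block */

        /* UINT_MAX itself is not 64K aligned, so the second entry would start
         * misaligned; rounding down first keeps every entry boundary aligned. */
        printf("with round_down(UINT_MAX, page_alignment):\n");
        split(0, block, round_down(UINT_MAX, page_alignment), page_alignment);
        return 0;
    }
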
Re: [PATCH v1] Fix: SYNCOBJ TIMELINE Test failed.

2022-06-30 Thread Das, Nirmoy



On 6/29/2022 11:12 AM, Christian König wrote:

On 29.06.22 at 08:02, jie1zhan wrote:

  The issue is caused by commit 721255b527 ("drm/syncobj: flatten
dma_fence_chains on transfer"), because it uses the dma_fence pointer
incorrectly.

Correct the dma_fence pointer by taking it from the fence array.


Well that patch is just utterly nonsense as far as I can see.



Signed-off-by: jie1zhan 

Reviewed-by: Christian König 

Reviewed-by: Nirmoy Das 


I have strong doubts that Nirmoy has reviewed this and I certainly 
haven't reviewed it.



I haven't  reviewed this either.


Nirmoy



Christian.


---
  drivers/gpu/drm/drm_syncobj.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index 7e48dcd1bee4..d5db818f1c76 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -887,7 +887,7 @@ static int drm_syncobj_flatten_chain(struct 
dma_fence **f)

  goto free_fences;
    dma_fence_put(*f);
-    *f = &array->base;
+    *f = array->fences[0];
  return 0;
    free_fences:




Re: [Intel-gfx] [PATCH 09/10] drm/i915: turn on small BAR support

2022-06-21 Thread Das, Nirmoy



On 6/21/2022 10:38 AM, Matthew Auld wrote:

On 17/06/2022 13:33, Thomas Hellström wrote:


On 5/25/22 20:43, Matthew Auld wrote:

With the uAPI in place we should now have enough in place to ensure a
working system on small BAR configurations.

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Tvrtko Ursulin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
---
  drivers/gpu/drm/i915/gt/intel_region_lmem.c | 10 --
  1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c 
b/drivers/gpu/drm/i915/gt/intel_region_lmem.c

index e9c12e0d6f59..6c6f8cbd7321 100644
--- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
+++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
@@ -111,12 +111,6 @@ static struct intel_memory_region 
*setup_lmem(struct intel_gt *gt)
  flat_ccs_base = intel_gt_read_register(gt, 
XEHPSDV_FLAT_CCS_BASE_ADDR);
  flat_ccs_base = (flat_ccs_base >> XEHPSDV_CCS_BASE_SHIFT) 
* SZ_64K;

-    /* FIXME: Remove this when we have small-bar enabled */
-    if (pci_resource_len(pdev, 2) < lmem_size) {
-    drm_err(>drm, "System requires small-BAR support, 
which is currently unsupported on this kernel\n");

-    return ERR_PTR(-EINVAL);
-    }
-
  if (GEM_WARN_ON(lmem_size < flat_ccs_base))
  return ERR_PTR(-EIO);
@@ -169,6 +163,10 @@ static struct intel_memory_region 
*setup_lmem(struct intel_gt *gt)

  drm_info(&i915->drm, "Local memory available: %pa\n",
	   &lmem_size);
+    if (io_size < lmem_size)
+    drm_info(&i915->drm, "Using a reduced BAR size of %lluMiB. 
Consider enabling the full BAR size if available in the BIOS.\n",

+ (u64)io_size >> 20);
+


Hmm. I wonder what BIOS UIs typically call the mappable portion of
VRAM. I'll see if I can check that on my DG1 system. Might be that an
average user misinterprets "full BAR".


"PCI Subsystem settings" -> "Above 4G memory [enabled/disabled]"

Sample size of one though.

Maybe s/full BAR size/full memory size/ ?



Or  s/full BAR size/re-sizable BAR/

In newer BIOSes there is a more direct option to enable the resizable BAR:
"Re-Size BAR"/"Resizable BAR".



Nirmoy





/Thomas




  return mem;
  err_region_put:


Re: [PATCH v2] drm/i915: Fix vm use-after-free in vma destruction

2022-06-20 Thread Das, Nirmoy

Acked-by: Nirmoy Das 

On 6/20/2022 2:36 PM, Thomas Hellström wrote:

In vma destruction, the following race may occur:

Thread 1: Thread 2:
i915_vma_destroy();

   ...
   list_del_init(vma->vm_link);
   ...
   mutex_unlock(vma->vm->mutex);
  __i915_vm_release();
release_references();

And in release_reference() we dereference vma->vm to get to the
vm gt pointer, leading to a use-after free.

However, __i915_vm_release() grabs the vm->mutex so the vm won't be
destroyed before vma->vm->mutex is released, so extract the gt pointer
under the vm->mutex to avoid the vma->vm dereference in
release_references().

v2: Fix a typo in the commit message (Andi Shyti)

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5944
Fixes: e1a7ab4fca ("drm/i915: Remove the vm open count")

Cc: Niranjana Vishwanathapura 
Cc: Matthew Auld 
Signed-off-by: Thomas Hellström 
---
  drivers/gpu/drm/i915/i915_vma.c | 12 
  1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 0bffb70b3c5f..04d12f278f57 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -1637,10 +1637,10 @@ static void force_unbind(struct i915_vma *vma)
	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
  }
  
-static void release_references(struct i915_vma *vma, bool vm_ddestroy)

+static void release_references(struct i915_vma *vma, struct intel_gt *gt,
+  bool vm_ddestroy)
  {
struct drm_i915_gem_object *obj = vma->obj;
-   struct intel_gt *gt = vma->vm->gt;
  
  	GEM_BUG_ON(i915_vma_is_active(vma));
  
@@ -1695,11 +1695,12 @@ void i915_vma_destroy_locked(struct i915_vma *vma)
  
  	force_unbind(vma);

	list_del_init(&vma->vm_link);
-   release_references(vma, false);
+   release_references(vma, vma->vm->gt, false);
  }
  
  void i915_vma_destroy(struct i915_vma *vma)

  {
+   struct intel_gt *gt;
bool vm_ddestroy;
  
	mutex_lock(&vma->vm->mutex);

@@ -1707,8 +1708,11 @@ void i915_vma_destroy(struct i915_vma *vma)
	list_del_init(&vma->vm_link);
vm_ddestroy = vma->vm_ddestroy;
vma->vm_ddestroy = false;
+
+   /* vma->vm may be freed when releasing vma->vm->mutex. */
+   gt = vma->vm->gt;
	mutex_unlock(&vma->vm->mutex);
-   release_references(vma, vm_ddestroy);
+   release_references(vma, gt, vm_ddestroy);
  }
  
  void i915_vma_parked(struct intel_gt *gt)
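
The general pattern behind the fix is worth spelling out: anything still needed after
dropping the lock that keeps an object alive has to be copied out while the lock is
held. A minimal userspace analogue (pthread-based, with hypothetical vm/gt/vma types;
this is not the driver code itself):

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical stand-ins for i915_address_space / intel_gt. */
    struct gt  { int id; };
    struct vm  { pthread_mutex_t mutex; struct gt *gt; };
    struct vma { struct vm *vm; };

    /* Only uses the gt pointer that was cached under vm->mutex; by the time we
     * get here, vma->vm may already have been torn down by another thread. */
    static void release_references(struct vma *vma, struct gt *gt)
    {
        printf("releasing vma, gt id %d\n", gt->id);
        free(vma);
    }

    static void vma_destroy(struct vma *vma)
    {
        struct gt *gt;

        pthread_mutex_lock(&vma->vm->mutex);
        /* Copy out everything we still need *before* dropping the lock that
         * keeps vma->vm alive. */
        gt = vma->vm->gt;
        pthread_mutex_unlock(&vma->vm->mutex);

        release_references(vma, gt);  /* no vma->vm dereference past this point */
    }

    int main(void)
    {
        struct gt gt = { .id = 0 };
        struct vm vm = { .gt = &gt };
        struct vma *vma = malloc(sizeof(*vma));

        pthread_mutex_init(&vm.mutex, NULL);
        vma->vm = &vm;
        vma_destroy(vma);
        return 0;
    }
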


Re: [PATCH 02/10] drm/i915/uapi: add probed_cpu_visible_size

2022-06-01 Thread Das, Nirmoy

Acked-by: Nirmoy Das 

On 5/25/2022 8:43 PM, Matthew Auld wrote:

Userspace wants to know the size of CPU visible portion of device
local-memory, and on small BAR devices the probed_size is no longer
enough. In Vulkan, for example, it would like to know the size in bytes
for CPU visible VkMemoryHeap. We already track the io_size for each
region, so it's just case of plumbing that through to the region query.

Testcase: igt@i915_query@query-regions-sanity-check
Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Tvrtko Ursulin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
---
  drivers/gpu/drm/i915/i915_query.c |  6 +++
  include/uapi/drm/i915_drm.h   | 74 +--
  2 files changed, 47 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_query.c 
b/drivers/gpu/drm/i915/i915_query.c
index 7584cec53d5d..9aa0b28aa6ee 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -496,6 +496,12 @@ static int query_memregion_info(struct drm_i915_private 
*i915,
info.region.memory_class = mr->type;
info.region.memory_instance = mr->instance;
info.probed_size = mr->total;
+
+   if (mr->type == INTEL_MEMORY_LOCAL)
+   info.probed_cpu_visible_size = mr->io_size;
+   else
+   info.probed_cpu_visible_size = mr->total;
+
info.unallocated_size = mr->avail;
  
		if (__copy_to_user(info_ptr, &info, sizeof(info)))

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index de49b68b4fc8..9df419a45244 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3207,36 +3207,6 @@ struct drm_i915_gem_memory_class_instance {
   * struct drm_i915_memory_region_info - Describes one region as known to the
   * driver.
   *
- * Note that we reserve some stuff here for potential future work. As an 
example
- * we might want expose the capabilities for a given region, which could 
include
- * things like if the region is CPU mappable/accessible, what are the supported
- * mapping types etc.
- *
- * Note that to extend struct drm_i915_memory_region_info and struct
- * drm_i915_query_memory_regions in the future the plan is to do the following:
- *
- * .. code-block:: C
- *
- * struct drm_i915_memory_region_info {
- * struct drm_i915_gem_memory_class_instance region;
- * union {
- * __u32 rsvd0;
- * __u32 new_thing1;
- * };
- * ...
- * union {
- * __u64 rsvd1[8];
- * struct {
- * __u64 new_thing2;
- * __u64 new_thing3;
- * ...
- * };
- * };
- * };
- *
- * With this things should remain source compatible between versions for
- * userspace, even as we add new fields.
- *
   * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
   * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS
   * at _i915_query_item.query_id.
@@ -3248,14 +3218,52 @@ struct drm_i915_memory_region_info {
/** @rsvd0: MBZ */
__u32 rsvd0;
  
-	/** @probed_size: Memory probed by the driver (-1 = unknown) */

+   /**
+* @probed_size: Memory probed by the driver (-1 = unknown)
+*
+* Note that it should not be possible to ever encounter a zero value
+* here, also note that no current region type will ever return -1 here.
+* Although for future region types, this might be a possibility. The
+* same applies to the other size fields.
+*/
__u64 probed_size;
  
  	/** @unallocated_size: Estimate of memory remaining (-1 = unknown) */

__u64 unallocated_size;
  
-	/** @rsvd1: MBZ */

-   __u64 rsvd1[8];
+   union {
+   /** @rsvd1: MBZ */
+   __u64 rsvd1[8];
+   struct {
+   /**
+* @probed_cpu_visible_size: Memory probed by the driver
+* that is CPU accessible. (-1 = unknown).
+*
+* This will be always be <= @probed_size, and the
+* remainder (if there is any) will not be CPU
+* accessible.
+*
+* On systems without small BAR, the @probed_size will
+* always equal the @probed_cpu_visible_size, since all
+* of it will be CPU accessible.
+*
+* Note this is only tracked for
+* I915_MEMORY_CLASS_DEVICE regions (for other types the
+* value here will always equal the 

Re: [Intel-gfx] [PATCH 07/10] drm/i915/error: skip non-mappable pages

2022-06-01 Thread Das, Nirmoy

Reviewed-by: Nirmoy Das 

On 5/25/2022 8:43 PM, Matthew Auld wrote:

Skip capturing any lmem pages that can't be copied using the CPU. This
in now only best effort on platforms that have small BAR.

Testcase: igt@gem-exec-capture@capture-invisible
Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Tvrtko Ursulin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
---
  drivers/gpu/drm/i915/i915_gpu_error.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c 
b/drivers/gpu/drm/i915/i915_gpu_error.c
index 0512c66fa4f3..77df899123c2 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1116,11 +1116,15 @@ i915_vma_coredump_create(const struct intel_gt *gt,
dma_addr_t dma;
  
  		for_each_sgt_daddr(dma, iter, vma_res->bi.pages) {

+   dma_addr_t offset = dma - mem->region.start;
void __iomem *s;
  
-			s = io_mapping_map_wc(&mem->iomap,

- dma - mem->region.start,
- PAGE_SIZE);
+   if (offset + PAGE_SIZE > mem->io_size) {
+   ret = -EINVAL;
+   break;
+   }
+
+   s = io_mapping_map_wc(&mem->iomap, offset, PAGE_SIZE);
ret = compress_page(compress,
(void __force *)s, dst,
true);


Re: [Intel-gfx] [PATCH 06/10] drm/i915/uapi: add NEEDS_CPU_ACCESS hint

2022-06-01 Thread Das, Nirmoy

LGTM Reviewed-by: Nirmoy Das 

On 5/25/2022 8:43 PM, Matthew Auld wrote:

If set, force the allocation to be placed in the mappable portion of
I915_MEMORY_CLASS_DEVICE. One big restriction here is that system memory
(i.e I915_MEMORY_CLASS_SYSTEM) must be given as a potential placement for the
object, that way we can always spill the object into system memory if we
can't make space.

Testcase: igt@gem-create@create-ext-cpu-access-sanity-check
Testcase: igt@gem-create@create-ext-cpu-access-big
Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
---
  drivers/gpu/drm/i915/gem/i915_gem_create.c | 26 ++---
  include/uapi/drm/i915_drm.h| 61 +++---
  2 files changed, 71 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index d094cae0ddf1..33673fe7ee0a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -241,6 +241,7 @@ struct create_ext {
struct drm_i915_private *i915;
struct intel_memory_region *placements[INTEL_REGION_UNKNOWN];
unsigned int n_placements;
+   unsigned int placement_mask;
unsigned long flags;
  };
  
@@ -337,6 +338,7 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args,

for (i = 0; i < args->num_regions; i++)
ext_data->placements[i] = placements[i];
  
+	ext_data->placement_mask = mask;

return 0;
  
  out_dump:

@@ -411,7 +413,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void 
*data,
struct drm_i915_gem_object *obj;
int ret;
  
-	if (args->flags)

+   if (args->flags & ~I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS)
return -EINVAL;
  
  	ret = i915_user_extensions(u64_to_user_ptr(args->extensions),

@@ -427,13 +429,21 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void 
*data,
ext_data.n_placements = 1;
}
  
-	/*

-* TODO: add a userspace hint to force CPU_ACCESS for the object, which
-* can override this.
-*/
-   if (ext_data.n_placements > 1 ||
-   ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM)
-   ext_data.flags |= I915_BO_ALLOC_GPU_ONLY;
+   if (args->flags & I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) {
+   if (ext_data.n_placements == 1)
+   return -EINVAL;
+
+   /*
+* We always need to be able to spill to system memory, if we
+* can't place in the mappable part of LMEM.
+*/
+   if (!(ext_data.placement_mask & BIT(INTEL_REGION_SMEM)))
+   return -EINVAL;
+   } else {
+   if (ext_data.n_placements > 1 ||
+   ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM)
+   ext_data.flags |= I915_BO_ALLOC_GPU_ONLY;
+   }
  
  	obj = __i915_gem_object_create_user_ext(i915, args->size,

ext_data.placements,
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index e30f31a440b3..5b0a10e6a1b8 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3366,11 +3366,11 @@ struct drm_i915_query_memory_regions {
   * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added
   * extension support using struct i915_user_extension.
   *
- * Note that in the future we want to have our buffer flags here, at least for
- * the stuff that is immutable. Previously we would have two ioctls, one to
- * create the object with gem_create, and another to apply various parameters,
- * however this creates some ambiguity for the params which are considered
- * immutable. Also in general we're phasing out the various SET/GET ioctls.
+ * Note that new buffer flags should be added here, at least for the stuff that
+ * is immutable. Previously we would have two ioctls, one to create the object
+ * with gem_create, and another to apply various parameters, however this
+ * creates some ambiguity for the params which are considered immutable. Also 
in
+ * general we're phasing out the various SET/GET ioctls.
   */
  struct drm_i915_gem_create_ext {
/**
@@ -3378,7 +3378,6 @@ struct drm_i915_gem_create_ext {
 *
 * The (page-aligned) allocated size for the object will be returned.
 *
-*
 * DG2 64K min page size implications:
 *
 * On discrete platforms, starting from DG2, we have to contend with GTT
@@ -3390,7 +3389,9 @@ struct drm_i915_gem_create_ext {
 *
 * Note that the returned size here will always reflect any required
 * rounding up done by the kernel, i.e 4K will now become 64K on devices
-* such as DG2.
+* such as DG2. The 

Re: [PATCH] drm/i915/gem: Make drop_pages() return bool

2022-05-03 Thread Das, Nirmoy



On 5/3/2022 8:15 AM, Lucas De Marchi wrote:

Commit e4e806253003 ("drm/i915: Change shrink ordering to use locking
around unbinding.") changed the return type to int without changing the
return values or their meaning to "0 is success". Move it back to
boolean.

Signed-off-by: Lucas De Marchi 



Reviewed-by: Nirmoy Das 


---
  drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c 
b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
index 6a6ff98a8746..1030053571a2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
@@ -36,7 +36,7 @@ static bool can_release_pages(struct drm_i915_gem_object *obj)
return swap_available() || obj->mm.madv == I915_MADV_DONTNEED;
  }
  
-static int drop_pages(struct drm_i915_gem_object *obj,

+static bool drop_pages(struct drm_i915_gem_object *obj,
   unsigned long shrink, bool trylock_vm)
  {
unsigned long flags;


Re: [PATCH 2/2] drm/i915/selftests: tweak the misaligned_case

2022-04-21 Thread Das, Nirmoy

LGTM Reviewed-by: Nirmoy Das 

On 4/6/2022 9:30 PM, Matthew Auld wrote:

The compact-pt layout restrictions should only apply to the ppGTT. Also
make this play nice on platforms that only have the 64K GTT restriction,
and not the compact-pt thing.

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Nirmoy Das 
---
  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index bccc49a8ab5e..8633bec18fa7 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -1112,10 +1112,16 @@ static int misaligned_case(struct i915_address_space 
*vm, struct intel_memory_re
expected_vma_size = round_up(size, 1 << 
(ffs(vma->resource->page_sizes_gtt) - 1));
expected_node_size = expected_vma_size;
  
-	if (NEEDS_COMPACT_PT(vm->i915) && i915_gem_object_is_lmem(obj)) {

-   /* compact-pt should expand lmem node to 2MB */
+   if (HAS_64K_PAGES(vm->i915) && i915_gem_object_is_lmem(obj)) {
+   /*
+* The compact-pt should expand lmem node to 2MB for the ppGTT,
+* for all other cases we should only expect 64K.
+*/
expected_vma_size = round_up(size, I915_GTT_PAGE_SIZE_64K);
-   expected_node_size = round_up(size, I915_GTT_PAGE_SIZE_2M);
+   if (NEEDS_COMPACT_PT(vm->i915) && !i915_is_ggtt(vm))
+   expected_node_size = round_up(size, 
I915_GTT_PAGE_SIZE_2M);
+   else
+   expected_node_size = round_up(size, 
I915_GTT_PAGE_SIZE_64K);
}
  
  	if (vma->size != expected_vma_size || vma->node.size != expected_node_size) {
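
For reference, the expectation arithmetic discussed above as a standalone sketch, with
assumed values: 64K GTT pages, and compact-pt padding nodes to 2M only for the ppGTT:

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define SZ_64K (64 * 1024ull)
    #define SZ_2M  (2 * 1024 * 1024ull)
    #define round_up(x, y) ((((x) + (y) - 1) / (y)) * (y))

    static void expectation(uint64_t size, bool compact_pt, bool is_ggtt)
    {
        uint64_t vma = round_up(size, SZ_64K);
        uint64_t node = (compact_pt && !is_ggtt) ? round_up(size, SZ_2M)
                                                 : round_up(size, SZ_64K);

        printf("size %#llx -> vma %#llx, node %#llx (%s)\n",
               (unsigned long long)size, (unsigned long long)vma,
               (unsigned long long)node, is_ggtt ? "ggtt" : "ppgtt");
    }

    int main(void)
    {
        expectation(4096, true, false);  /* compact-pt ppGTT: node padded to 2M */
        expectation(4096, true, true);   /* GGTT: only the 64K restriction */
        return 0;
    }
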


Re: [PATCH 1/2] drm/i915/selftests: fixup min_alignment usage

2022-04-21 Thread Das, Nirmoy

Reviewed-by: Nirmoy Das 

On 4/6/2022 9:30 PM, Matthew Auld wrote:

Trying to cast the region id into the region type doesn't work too well,
since the i915_vm_min_alignment() won't give us the correct value for
the stolen-lmem case.

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Nirmoy Das 
---
  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 5c9bfa409ff5..bccc49a8ab5e 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -1150,7 +1150,7 @@ static int misaligned_pin(struct i915_address_space *vm,
flags |= PIN_GLOBAL;
  
  	for_each_memory_region(mr, vm->i915, id) {

-   u64 min_alignment = i915_vm_min_alignment(vm, (enum 
intel_memory_type)id);
+   u64 min_alignment = i915_vm_min_alignment(vm, mr->type);
u64 size = min_alignment;
u64 addr = round_down(hole_start + (hole_size / 2), 
min_alignment);
  


Re: [PATCH 2/2] drm/i915/buddy: sanity check the size

2022-04-07 Thread Das, Nirmoy

Reviewed-by: Nirmoy Das 

On 4/7/2022 1:06 PM, Matthew Auld wrote:

Ensure we check that the size is compatible with the requested
page_size. For tiny objects that are automatically annotated with
TTM_PL_FLAG_CONTIGUOUS(since they fit within a single page), we
currently end up silently overriding the min_page_size, which ends up
hiding bugs elsewhere.

Signed-off-by: Matthew Auld
Cc: Thomas Hellström
Cc: Nirmoy Das
---
  drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c 
b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
index 8e4e3f72c1ef..a5109548abc0 100644
--- a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
+++ b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
@@ -70,6 +70,7 @@ static int i915_ttm_buddy_man_alloc(struct 
ttm_resource_manager *man,
min_page_size = bo->page_alignment << PAGE_SHIFT;
  
  	GEM_BUG_ON(min_page_size < mm->chunk_size);

+   GEM_BUG_ON(!IS_ALIGNED(size, min_page_size));
  
  	if (place->fpfn + bman_res->base.num_pages != place->lpfn &&

place->flags & TTM_PL_FLAG_CONTIGUOUS) {

Re: [PATCH] drm/i915: consider min_page_size when migrating

2022-04-07 Thread Das, Nirmoy

LGTM Reviewed-by: Nirmoy Das 

On 4/6/2022 8:19 PM, Matthew Auld wrote:

We can only force migrate an object if the existing object size is
compatible with the new destinations min_page_size for the region.
Currently we blow up with something like:

[ 2857.497462] kernel BUG at drivers/gpu/drm/i915/gt/intel_migrate.c:431!
[ 2857.497497] invalid opcode:  [#1] PREEMPT SMP NOPTI
[ 2857.497502] CPU: 1 PID: 8921 Comm: i915_selftest Tainted: G U  W 
5.18.0-rc1-drm-tip+ #27
[ 2857.497513] RIP: 0010:emit_pte.cold+0x11a/0x17e [i915]
[ 2857.497646] Code: 00 48 c7 c2 f0 cd c1 a0 48 c7 c7 e9 99 bd a0 e8 d2 77 5d e0 bf 
01 00 00 00 e8 08 47 5d e0 31 f6 bf 09 00 00 00 e8 3c 7b 4d e0 <0f> 0b 48 c7 c1 
e0 2a c5 a0 ba 34 00 00 00 48 c7 c6 00 ce c1 a0 48
[ 2857.497654] RSP: 0018:c90f7748 EFLAGS: 00010246
[ 2857.497658] RAX:  RBX: c90f77c8 RCX: 0006
[ 2857.497662] RDX:  RSI:  RDI: 0009
[ 2857.497665] RBP:  R08: 0001 R09: 0001
[ 2857.497668] R10: 00022302 R11: 88846dea08f0 R12: 0001
[ 2857.497672] R13: 0188 R14: 081b R15: 888106b7c040
[ 2857.497675] FS:  7f0d4c4e0600() GS:88845da8() 
knlGS:
[ 2857.497679] CS:  0010 DS:  ES:  CR0: 80050033
[ 2857.497682] CR2: 7f113966c088 CR3: 000211e60003 CR4: 003706e0
[ 2857.497686] DR0:  DR1:  DR2: 
[ 2857.497689] DR3:  DR6: fffe0ff0 DR7: 0400
[ 2857.497692] Call Trace:
[ 2857.497694]  
[ 2857.497697]  intel_context_migrate_copy+0x1e5/0x4f0 [i915]

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_object.c| 3 +++
  drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c | 4 +++-
  2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index c1c3b510b9e2..07e816ddfb3d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -606,6 +606,9 @@ bool i915_gem_object_can_migrate(struct drm_i915_gem_object 
*obj,
if (!mr)
return false;
  
+	if (!IS_ALIGNED(obj->base.size, mr->min_page_size))

+   return false;
+
if (obj->mm.region == mr)
return true;
  
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c

index 9922ac91ec71..6f98adb3a103 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
@@ -47,14 +47,16 @@ static int igt_create_migrate(struct intel_gt *gt, enum 
intel_region_id src,
  {
struct drm_i915_private *i915 = gt->i915;
struct intel_memory_region *src_mr = i915->mm.regions[src];
+   struct intel_memory_region *dst_mr = i915->mm.regions[dst];
struct drm_i915_gem_object *obj;
struct i915_gem_ww_ctx ww;
int err = 0;
  
  	GEM_BUG_ON(!src_mr);

+   GEM_BUG_ON(!dst_mr);
  
  	/* Switch object backing-store on create */

-   obj = i915_gem_object_create_region(src_mr, PAGE_SIZE, 0, 0);
+   obj = i915_gem_object_create_region(src_mr, dst_mr->min_page_size, 0, 
0);
if (IS_ERR(obj))
return PTR_ERR(obj);
  


Re: [PATCH 1/2] dma-buf/sync-file: fix logic error in new fence merge code

2022-03-29 Thread Das, Nirmoy
I finally managed to find a machine and tested this series. If it is not too late:


The series is Tested-by: Nirmoy Das 

On 3/29/2022 9:00 AM, Christian König wrote:

When the array is empty because everything is signaled we can't use
add_fence() to add something because that would filter the signaled
fence again.

Signed-off-by: Christian König 
Fixes: 519f490db07e ("dma-buf/sync-file: fix warning about fence containers")
---
  drivers/dma-buf/sync_file.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
index b8dea4ec123b..514d213261df 100644
--- a/drivers/dma-buf/sync_file.c
+++ b/drivers/dma-buf/sync_file.c
@@ -262,7 +262,7 @@ static struct sync_file *sync_file_merge(const char *name, 
struct sync_file *a,
}
  
  	if (index == 0)

-   add_fence(fences, &index, dma_fence_get_stub());
+   fences[index++] = dma_fence_get_stub();
  
  	if (num_fences > index) {

struct dma_fence **tmp;
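
The subtle part is that add_fence() deliberately filters out already-signalled fences,
and the stub fence is born signalled, so routing the stub through add_fence() would
leave the array empty again. A tiny standalone model of that logic (hypothetical types,
not the dma-fence code itself):

    #include <stdio.h>
    #include <stdbool.h>

    struct fence { bool signaled; const char *name; };

    /* Models sync_file's add_fence(): signalled fences don't advance the index. */
    static void add_fence(struct fence **fences, int *i, struct fence *fence)
    {
        if (!fence->signaled)
            fences[(*i)++] = fence;
    }

    int main(void)
    {
        struct fence stub = { .signaled = true, .name = "stub" };
        struct fence *fences[4];
        int index = 0;

        /* All real fences were signalled, so nothing was added so far ... */
        add_fence(fences, &index, &stub);
        printf("via add_fence(): %d fence(s)\n", index);   /* stays 0: the bug */

        fences[index++] = &stub;                            /* the fix */
        printf("direct assignment: %d fence(s), first is %s\n",
               index, fences[0]->name);
        return 0;
    }
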


Re: [Intel-gfx] [PATCH] drm/i915: fix remaining_timeout in intel_gt_retire_requests_timeout

2022-03-28 Thread Das, Nirmoy



On 3/25/2022 9:33 PM, Ceraolo Spurio, Daniele wrote:



On 3/25/2022 11:37 AM, Das, Nirmoy wrote:


On 3/25/2022 6:58 PM, Daniele Ceraolo Spurio wrote:

In intel_gt_wait_for_idle, we use the remaining timeout returned from
intel_gt_retire_requests_timeout to wait on the GuC being idle. 
However,

the returned variable can have a negative value if something goes wrong
during the wait, leading to us hitting a GEM_BUG_ON in the GuC wait
function.
To fix this, make sure to only return the timeout if it is positive.

Fixes: b97060a99b01b ("drm/i915/guc: Update intel_gt_wait_for_idle 
to work with GuC")

Signed-off-by: Daniele Ceraolo Spurio 
Cc: Matthew Brost 
Cc: John Harrison 
---
  drivers/gpu/drm/i915/gt/intel_gt_requests.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c 
b/drivers/gpu/drm/i915/gt/intel_gt_requests.c

index edb881d756309..ef70c209976d8 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -197,7 +197,7 @@ out_active: spin_lock(&timelines->lock);
  active_count++;
    if (remaining_timeout)
-    *remaining_timeout = timeout;
+    *remaining_timeout = timeout > 0 ? timeout : 0;



Should the last flush_submission() be "if (timeout > 0 && flush_submission(gt, timeout))"?


I considered it, but flush_submission only checks for timeout != 0 so 
it won't accidentally make use of a negative value thinking it's 
positive. I don't know if the flush is purposely done even if timeout 
is negative or if that's a mistake, but that code has been there long 
before we modified the function to return the remaining timeout and 
never seems to have caused issues, so I decided not to change it.



Yes, we need to clarify whether we really need the final flush if the timeout is
negative.


But this patch  is Acked-by: Nirmoy Das 

Nirmoy



Daniele




Nirmoy


    return active_count ? timeout : 0;
  }
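
A tiny standalone sketch of the control flow being discussed: the clamp only affects
what is reported back to the caller, while the flush still runs for any non-zero
(including negative) timeout:

    #include <stdio.h>

    static long report_remaining(long timeout)
    {
        return timeout > 0 ? timeout : 0;   /* the clamp added by the patch */
    }

    int main(void)
    {
        long timeout = -62;                 /* e.g. an error leaking out of the wait */

        if (timeout)                        /* flush still runs for negative values */
            printf("flush_submission() would run\n");
        printf("remaining_timeout reported as %ld\n", report_remaining(timeout));
        return 0;
    }
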




Re: [Intel-gfx] [PATCH] drm/i915: fix remaining_timeout in intel_gt_retire_requests_timeout

2022-03-25 Thread Das, Nirmoy



On 3/25/2022 6:58 PM, Daniele Ceraolo Spurio wrote:

In intel_gt_wait_for_idle, we use the remaining timeout returned from
intel_gt_retire_requests_timeout to wait on the GuC being idle. However,
the returned variable can have a negative value if something goes wrong
during the wait, leading to us hitting a GEM_BUG_ON in the GuC wait
function.
To fix this, make sure to only return the timeout if it is positive.

Fixes: b97060a99b01b ("drm/i915/guc: Update intel_gt_wait_for_idle to work with 
GuC")
Signed-off-by: Daniele Ceraolo Spurio 
Cc: Matthew Brost 
Cc: John Harrison 
---
  drivers/gpu/drm/i915/gt/intel_gt_requests.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c 
b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index edb881d756309..ef70c209976d8 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -197,7 +197,7 @@ out_active: spin_lock(&timelines->lock);
active_count++;
  
  	if (remaining_timeout)

-   *remaining_timeout = timeout;
+   *remaining_timeout = timeout > 0 ? timeout : 0;



Should the last flush_submission() be "if (timeout > 0 && flush_submission(gt, timeout))"?



Nirmoy

  
  	return active_count ? timeout : 0;

  }


Re: [Intel-gfx] [PATCH 1/2] drm/i915/ttm: limit where we apply TTM_PL_FLAG_CONTIGUOUS

2022-03-25 Thread Das, Nirmoy



On 3/25/2022 11:03 AM, Das, Nirmoy wrote:

Reviewed-by: Nirmoy Das 
Sorry, I meant this R-b for the 2nd patch; for this one, Acked-by: Nirmoy Das 


Re: [PATCH 1/2] drm/i915/ttm: limit where we apply TTM_PL_FLAG_CONTIGUOUS

2022-03-25 Thread Das, Nirmoy



On 3/25/2022 8:16 AM, Thomas Hellström wrote:


On 3/24/22 18:21, Matthew Auld wrote:

We only need this when allocating device local-memory, where this
influences the drm_buddy. Currently there is some funny behaviour where
an "in limbo" system memory object is lacking the relevant placement
flags etc. before we first allocate the ttm_tt, leading to ttm
performing a move when not needed, since the current placement is seen
as not compatible.

Suggested-by: Thomas Hellström 
Fixes: 2ed38cec5606 ("drm/i915: opportunistically apply 
ALLOC_CONTIGIOUS")

Signed-off-by: Matthew Auld 
Cc: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c

index e4a06fcf741a..97e648fa76bd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -133,6 +133,9 @@ i915_ttm_place_from_region(const struct 
intel_memory_region *mr,

  memset(place, 0, sizeof(*place));
  place->mem_type = intel_region_to_ttm_type(mr);
  +    if (mr->type == INTEL_MEMORY_SYSTEM)
+    return;
+


Reviewed-by: Thomas Hellström 



Reviewed-by: Nirmoy Das 





  if (flags & I915_BO_ALLOC_CONTIGUOUS)
  place->flags |= TTM_PL_FLAG_CONTIGUOUS;
  if (offset != I915_BO_INVALID_OFFSET) {


Re: [Intel-gfx] [PATCH] drm/i915/guc: Correctly free guc capture struct on error

2022-03-24 Thread Das, Nirmoy

Reviewed-by: Nirmoy Das 

On 3/24/2022 1:04 AM, Daniele Ceraolo Spurio wrote:

On error the "new" allocation is not freed, so add the required kfree.

Fixes: 247f8071d5893 ("drm/i915/guc: Pre-allocate output nodes for extraction")
Signed-off-by: Daniele Ceraolo Spurio 
Cc: Alan Previn 
Cc: John Harrison 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
index afdcbe63e9eb1..c4e25966d3e9f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
@@ -1040,6 +1040,7 @@ guc_capture_alloc_one_node(struct intel_guc *guc)
if (!new->reginfo[i].regs) {
while (i)
kfree(new->reginfo[--i].regs);
+   kfree(new);
return NULL;
}
}
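
For readers skimming the archive, this one-liner completes a standard
alloc-and-unwind pattern. A sketch of that pattern with hypothetical names
(not the real i915 structures):

	struct my_node {
		u32 *regs[4];	/* hypothetical per-type register arrays */
	};

	static struct my_node *my_alloc_one_node(size_t nregs)
	{
		struct my_node *new;
		int i;

		new = kzalloc(sizeof(*new), GFP_KERNEL);
		if (!new)
			return NULL;

		for (i = 0; i < ARRAY_SIZE(new->regs); i++) {
			new->regs[i] = kcalloc(nregs, sizeof(*new->regs[i]),
					       GFP_KERNEL);
			if (!new->regs[i]) {
				/* unwind the arrays already allocated ... */
				while (i)
					kfree(new->regs[--i]);
				/* ... and the container, the previously missing kfree */
				kfree(new);
				return NULL;
			}
		}

		return new;
	}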


Re: [Intel-gfx] [PATCH v4 6/8] drm/ttm: Add a parameter to add extra pages into ttm_tt

2022-03-21 Thread Das, Nirmoy
In the previous version I replied only to the mailing list email so 
probably my email slipped through.


Reviewed-by: Nirmoy Das for patches 6-7

On 3/19/2022 9:42 PM, Ramalingam C wrote:

Add a parameter called "extra_pages" for ttm_tt_init, to indicate that
driver needs extra pages in ttm_tt.

v2:
   Used imperative wording [Thomas and Christian]

Signed-off-by: Ramalingam C 
cc: Christian Koenig 
cc: Hellstrom Thomas 
Reviewed-by: Thomas Hellstrom 
Reviewed-by: Christian Konig 
---
  drivers/gpu/drm/drm_gem_vram_helper.c  |  2 +-
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c|  2 +-
  drivers/gpu/drm/qxl/qxl_ttm.c  |  2 +-
  drivers/gpu/drm/ttm/ttm_agp_backend.c  |  2 +-
  drivers/gpu/drm/ttm/ttm_tt.c   | 12 +++-
  drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c |  2 +-
  include/drm/ttm/ttm_tt.h   |  4 +++-
  7 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c 
b/drivers/gpu/drm/drm_gem_vram_helper.c
index dc7f938bfff2..123045b58fec 100644
--- a/drivers/gpu/drm/drm_gem_vram_helper.c
+++ b/drivers/gpu/drm/drm_gem_vram_helper.c
@@ -867,7 +867,7 @@ static struct ttm_tt *bo_driver_ttm_tt_create(struct 
ttm_buffer_object *bo,
if (!tt)
return NULL;
  
-	ret = ttm_tt_init(tt, bo, page_flags, ttm_cached);

+   ret = ttm_tt_init(tt, bo, page_flags, ttm_cached, 0);
if (ret < 0)
goto err_ttm_tt_init;
  
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c

index e4a06fcf741a..3b9f99c765c4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -290,7 +290,7 @@ static struct ttm_tt *i915_ttm_tt_create(struct 
ttm_buffer_object *bo,
i915_tt->is_shmem = true;
}
  
-	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching);

+   ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching, 0);
if (ret)
goto err_free;
  
diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c

index b2e33d5ba5d0..52156b54498f 100644
--- a/drivers/gpu/drm/qxl/qxl_ttm.c
+++ b/drivers/gpu/drm/qxl/qxl_ttm.c
@@ -113,7 +113,7 @@ static struct ttm_tt *qxl_ttm_tt_create(struct 
ttm_buffer_object *bo,
ttm = kzalloc(sizeof(struct ttm_tt), GFP_KERNEL);
if (ttm == NULL)
return NULL;
-   if (ttm_tt_init(ttm, bo, page_flags, ttm_cached)) {
+   if (ttm_tt_init(ttm, bo, page_flags, ttm_cached, 0)) {
kfree(ttm);
return NULL;
}
diff --git a/drivers/gpu/drm/ttm/ttm_agp_backend.c 
b/drivers/gpu/drm/ttm/ttm_agp_backend.c
index 6ddc16f0fe2b..d27691f2e451 100644
--- a/drivers/gpu/drm/ttm/ttm_agp_backend.c
+++ b/drivers/gpu/drm/ttm/ttm_agp_backend.c
@@ -134,7 +134,7 @@ struct ttm_tt *ttm_agp_tt_create(struct ttm_buffer_object 
*bo,
agp_be->mem = NULL;
agp_be->bridge = bridge;
  
-	if (ttm_tt_init(&agp_be->ttm, bo, page_flags, ttm_write_combined)) {

+   if (ttm_tt_init(&agp_be->ttm, bo, page_flags, ttm_write_combined, 0)) {
kfree(agp_be);
return NULL;
}
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index d234aab800a0..1a66d9fc589a 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -134,9 +134,10 @@ void ttm_tt_destroy(struct ttm_device *bdev, struct ttm_tt 
*ttm)
  static void ttm_tt_init_fields(struct ttm_tt *ttm,
   struct ttm_buffer_object *bo,
   uint32_t page_flags,
-  enum ttm_caching caching)
+  enum ttm_caching caching,
+  unsigned long extra_pages)
  {
-   ttm->num_pages = PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT;
+   ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + 
extra_pages;
ttm->caching = ttm_cached;
ttm->page_flags = page_flags;
ttm->dma_address = NULL;
@@ -146,9 +147,10 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
  }
  
  int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,

-   uint32_t page_flags, enum ttm_caching caching)
+   uint32_t page_flags, enum ttm_caching caching,
+   unsigned long extra_pages)
  {
-   ttm_tt_init_fields(ttm, bo, page_flags, caching);
+   ttm_tt_init_fields(ttm, bo, page_flags, caching, extra_pages);
  
  	if (ttm_tt_alloc_page_directory(ttm)) {

pr_err("Failed allocating page table\n");
@@ -180,7 +182,7 @@ int ttm_sg_tt_init(struct ttm_tt *ttm, struct 
ttm_buffer_object *bo,
  {
int ret;
  
-	ttm_tt_init_fields(ttm, bo, page_flags, caching);

+   ttm_tt_init_fields(ttm, bo, page_flags, caching, 0);
  
  	if (page_flags & TTM_TT_FLAG_EXTERNAL)

ret = ttm_sg_tt_alloc_page_directory(ttm);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c 
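
(The quoted patch is truncated here.) For illustration only, a hypothetical
driver-side caller of the extended interface; everything except ttm_tt_init()
and ttm_cached below is made up for the example:

	static struct ttm_tt *my_ttm_tt_create(struct ttm_buffer_object *bo,
					       uint32_t page_flags)
	{
		/* hypothetical: one extra page per 256 object pages, e.g. for
		 * driver metadata that must live alongside the object pages */
		unsigned long extra = DIV_ROUND_UP(bo->base.size >> PAGE_SHIFT, 256);
		struct ttm_tt *tt;

		tt = kzalloc(sizeof(*tt), GFP_KERNEL);
		if (!tt)
			return NULL;

		/* ttm->num_pages now covers the object pages plus "extra" */
		if (ttm_tt_init(tt, bo, page_flags, ttm_cached, extra)) {
			kfree(tt);
			return NULL;
		}

		return tt;
	}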

Re: [Intel-gfx] [PATCH v2 0/7] drm/i915: Use the memcpy_from_wc function from drm

2022-03-21 Thread Das, Nirmoy

Looks good to me overall, but I would get others' r-b as well.

Patches 1-3 Reviewed-by: Nirmoy Das 

Patches 4-7 Acked-by: Nirmoy Das 

On 03/03/2022 19:00, Balasubramani Vivekanandan wrote:

drm_memcpy_from_wc() performs fast copy from WC memory type using
non-temporal instructions. Now there are two similar implementations of
this function. One exists in drm_cache.c as drm_memcpy_from_wc() and
another implementation in i915/i915_memcpy.c as i915_memcpy_from_wc().
drm_memcpy_from_wc() was the recent addition through the series
https://patchwork.freedesktop.org/patch/436276/?series=90681=6

The goal of this patch series is to change all users of
i915_memcpy_from_wc() to drm_memcpy_from_wc() and a have common
implementation in drm and eventually remove the copy from i915.

Another benefit of using memcpy functions from drm is that
drm_memcpy_from_wc() is available for non-x86 architectures.
i915_memcpy_from_wc() is implemented only for x86 and prevents building
i915 for ARM64.
drm_memcpy_from_wc() does fast copy using non-temporal instructions for
x86 and for other architectures makes use of memcpy() family of
functions as fallback.

Another major difference is unlike i915_memcpy_from_wc(),
drm_memcpy_from_wc() will not fail if the passed address argument is not
aligned as required by the non-temporal load instructions or if the
platform lacks support for those instructions (non-temporal load
instructions are provided through SSE4.1 instruction set extension).
Instead drm_memcpy_from_wc() continues with fallback functions to
complete the copy.
This relieves the caller from checking the return value of
i915_memcpy_from_wc() and explicitly using a fallback.
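
For illustration, a before/after sketch of what this buys a caller, assuming
the iosys_map based prototype mentioned in the v2 note below; dst, src and
len stand for whatever pointers and length the caller already has:

	struct iosys_map dst_map, src_map;

	/* before: the caller must check the return value and fall back */
	if (!i915_memcpy_from_wc(dst, src, len))
		memcpy(dst, src, len);

	/* after: no return value to check, the drm helper falls back
	 * internally; wrap the raw pointers in iosys_map first (use
	 * iosys_map_set_vaddr_iomem() instead if src is io memory, per v2) */
	iosys_map_set_vaddr(&dst_map, dst);
	iosys_map_set_vaddr(&src_map, (void *)src);
	drm_memcpy_from_wc(&dst_map, &src_map, len);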

Follow up series will be created to remove the memcpy_from_wc functions
from i915 once the dependency is completely removed.

v2: Fixed missing check to find if the address is from system memory or
 io memory and use the right initialization function to construct the
 iosys_map structure (Review feedback from Lucas)

Cc: Jani Nikula
Cc: Lucas De Marchi
Cc: David Airlie
Cc: Daniel Vetter
Cc: Chris Wilson
Cc: Thomas Hellström
Cc: Joonas Lahtinen
Cc: Rodrigo Vivi
Cc: Tvrtko Ursulin
Cc: Nirmoy Das

Balasubramani Vivekanandan (7):
   drm: Relax alignment constraint for destination address
   drm: Add drm_memcpy_from_wc() variant which accepts destination
 address
   drm/i915: use the memcpy_from_wc call from the drm
   drm/i915/guc: use the memcpy_from_wc call from the drm
   drm/i915/selftests: use the memcpy_from_wc call from the drm
   drm/i915/gt: Avoid direct dereferencing of io memory
   drm/i915: Avoid dereferencing io mapped memory

  drivers/gpu/drm/drm_cache.c   | 98 +--
  drivers/gpu/drm/i915/gem/i915_gem_object.c|  6 +-
  drivers/gpu/drm/i915/gt/selftest_reset.c  | 21 ++--
  drivers/gpu/drm/i915/gt/uc/intel_guc_log.c| 15 ++-
  drivers/gpu/drm/i915/i915_gpu_error.c | 45 +
  .../drm/i915/selftests/intel_memory_region.c  | 41 +---
  include/drm/drm_cache.h   |  3 +
  7 files changed, 174 insertions(+), 55 deletions(-)



Re: [PATCH v3 7/7] drm/i915: fixup the initial fb base on DGFX

2022-03-15 Thread Das, Nirmoy
This seems more natural to me than the previous version. Acked-by: Nirmoy Das


Nirmoy

On 14/03/2022 12:28, Matthew Auld wrote:

On integrated it looks like the GGTT base should always 1:1 maps to
somewhere within DSM. On discrete the base seems to be pre-programmed with
a normal lmem address, and is not 1:1 mapped with the base address. On
such devices probe the lmem address directly from the PTE.

v2(Ville):
   - The base is actually the pre-programmed GGTT address, which is then
 meant to 1:1 map to somewhere inside dsm. In the case of dgpu the
 base looks to just be some offset within lmem, but this also happens
 to be the exact dsm start, on dg1. Therefore we should only need to
 fudge the physical address, before allocating from stolen.
   - Bail if it's not located in dsm.
v3:
   - Scratch that. There doesn't seem to be any relationship with the
 base and PTE address, on at least DG1. Let's instead just grab the
 lmem address from the PTE itself.

Signed-off-by: Matthew Auld
Cc: Thomas Hellström
Cc: Ville Syrjälä
Cc: Nirmoy Das
---
  .../drm/i915/display/intel_plane_initial.c| 50 ---
  1 file changed, 44 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c 
b/drivers/gpu/drm/i915/display/intel_plane_initial.c
index f797fcef18fc..7979929bb632 100644
--- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
+++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
@@ -47,17 +47,55 @@ static struct i915_vma *
  initial_plane_vma(struct drm_i915_private *i915,
  struct intel_initial_plane_config *plane_config)
  {
-   struct intel_memory_region *mem = i915->mm.stolen_region;
+   struct intel_memory_region *mem;
struct drm_i915_gem_object *obj;
struct i915_vma *vma;
+   resource_size_t phys_base;
u32 base, size;
u64 pinctl;
  
-	if (!mem || plane_config->size == 0)

+   if (plane_config->size == 0)
+   return NULL;
+
+   base = round_down(plane_config->base, I915_GTT_MIN_ALIGNMENT);
+   if (IS_DGFX(i915)) {
+   gen8_pte_t __iomem *gte = to_gt(i915)->ggtt->gsm;
+   gen8_pte_t pte;
+
+   gte += base / I915_GTT_PAGE_SIZE;
+
+   pte = ioread64(gte);
+   if (!(pte & GEN12_GGTT_PTE_LM)) {
+   drm_err(&i915->drm,
+   "Initial plane programming missing PTE_LM bit\n");
+   return NULL;
+   }
+
+   phys_base = pte & I915_GTT_PAGE_MASK;
+   mem = i915->mm.regions[INTEL_REGION_LMEM];
+
+   /*
+* We don't currently expect this to ever be placed in the
+* stolen portion.
+*/
+   if (phys_base >= resource_size(&mem->region)) {
+   drm_err(&i915->drm,
+   "Initial plane programming using invalid range, phys_base=%pa\n",
+   &phys_base);
+   return NULL;
+   }
+
+   drm_dbg(&i915->drm,
+   "Using phys_base=%pa, based on initial plane programming\n",
+   &phys_base);
+   } else {
+   phys_base = base;
+   mem = i915->mm.stolen_region;
+   }
+
+   if (!mem)
return NULL;
  
-	base = round_down(plane_config->base,

- I915_GTT_MIN_ALIGNMENT);
size = round_up(plane_config->base + plane_config->size,
mem->min_page_size);
size -= base;
@@ -68,11 +106,11 @@ initial_plane_vma(struct drm_i915_private *i915,
 * features.
 */
if (IS_ENABLED(CONFIG_FRAMEBUFFER_CONSOLE) &&
+   mem == i915->mm.stolen_region &&
size * 2 > i915->stolen_usable_size)
return NULL;
  
-	obj = i915_gem_object_create_region_at(i915->mm.stolen_region,

-  base, size, 0);
+   obj = i915_gem_object_create_region_at(mem, phys_base, size, 0);
if (IS_ERR(obj))
return NULL;
  

Re: [Intel-gfx] [PATCH v2 0/8] Some more bits for small BAR enabling

2022-03-11 Thread Das, Nirmoy

The series is Acked-by: Nirmoy Das 


On 10/03/2022 13:27, Matthew Auld wrote:

The leftover bits around dealing with stolen-local memory + small BAR, plus
some related fixes.

v2: some tweaks based on feedback from Ville



Re: [Intel-gfx] [PATCH] drm/i915/gtt: reduce overzealous alignment constraints for GGTT

2022-03-08 Thread Das, Nirmoy

Acked-by: Nirmoy Das

On 03/03/2022 11:02, Matthew Auld wrote:

Currently this will enforce both 2M alignment and padding for any LMEM
pages inserted into the GGTT. However, this was only meant to be applied
to the compact-pt layout with the ppGTT. For the GGTT we can reduce the
alignment and padding to 64K.

Bspec: 45015
Fixes: 87bd701ee268 ("drm/i915: enforce min GTT alignment for discrete cards")
Signed-off-by: Matthew Auld
Cc: Thomas Hellström
Cc: Robert Beckett
Cc: Ramalingam C
---
  drivers/gpu/drm/i915/gt/intel_gtt.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 4bcdfcab3642..a5f5b2dda332 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -234,7 +234,8 @@ void i915_address_space_init(struct i915_address_space *vm, 
int subclass)
memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
 ARRAY_SIZE(vm->min_alignment));
  
-	if (HAS_64K_PAGES(vm->i915) && NEEDS_COMPACT_PT(vm->i915)) {

+   if (HAS_64K_PAGES(vm->i915) && NEEDS_COMPACT_PT(vm->i915) &&
+   subclass == VM_CLASS_PPGTT) {
vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M;
vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = 
I915_GTT_PAGE_SIZE_2M;
} else if (HAS_64K_PAGES(vm->i915)) {

Re: [Intel-gfx] [PATCH 0/7] drm/i915: Use the memcpy_from_wc function from drm

2022-02-23 Thread Das, Nirmoy



On 23/02/2022 12:08, Balasubramani Vivekanandan wrote:

On 23.02.2022 10:02, Das, Nirmoy wrote:

On 22/02/2022 15:51, Balasubramani Vivekanandan wrote:

drm_memcpy_from_wc() performs fast copy from WC memory type using
non-temporal instructions. Now there are two similar implementations of
this function. One exists in drm_cache.c as drm_memcpy_from_wc() and
another implementation in i915/i915_memcpy.c as i915_memcpy_from_wc().
drm_memcpy_from_wc() was the recent addition through the series
https://patchwork.freedesktop.org/patch/436276/?series=90681=6

The goal of this patch series is to change all users of
i915_memcpy_from_wc() to drm_memcpy_from_wc() and a have common
implementation in drm and eventually remove the copy from i915.

Another benefit of using memcpy functions from drm is that
drm_memcpy_from_wc() is available for non-x86 architectures.
i915_memcpy_from_wc() is implemented only for x86 and prevents building
i915 for ARM64.
drm_memcpy_from_wc() does fast copy using non-temporal instructions for
x86 and for other architectures makes use of memcpy() family of
functions as fallback.

Another major difference is unlike i915_memcpy_from_wc(),
drm_memcpy_from_wc() will not fail if the passed address argument is not
aligned as required by the non-temporal load instructions or if the
platform lacks support for those instructions (non-temporal load
instructions are provided through SSE4.1 instruction set extension).
Instead drm_memcpy_from_wc() continues with fallback functions to
complete the copy.
This relieves the caller from checking the return value of
i915_memcpy_from_wc() and explicitly using a fallback.

Follow up series will be created to remove the memcpy_from_wc functions
from i915 once the dependency is completely removed.

Overall the series looks good to me, but I think you can add another patch to
remove i915_memcpy_from_wc(), as I don't see any other usages left after this
series; maybe I am missing something?

I have changed all users of i915_memcpy_from_wc() to the drm function. But
there is another function, i915_unaligned_memcpy_from_wc(), in i915_memcpy.c
which is blocking the complete elimination of the i915_memcpy.c file from
i915.
This function accepts an unaligned source address and does the fast copy only
for the aligned region of memory; the remaining part is copied with a plain
memcpy.
Either I can move i915_unaligned_memcpy_from_wc() to drm as well, though I am
concerned that it is rather platform-specific handling and may not make sense
to keep in drm.
Or else I can retain i915_unaligned_memcpy_from_wc() inside i915 and refactor
the function to use drm_memcpy_from_wc() instead of __memcpy_ntdqu().
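
To make the trade-off concrete, here is the rough shape of the split copy
described above; the names are placeholders and this is a paraphrase, not the
actual i915 implementation:

	static void unaligned_memcpy_from_wc(void *dst, const void *src,
					     unsigned long len)
	{
		unsigned long addr = (unsigned long)src;

		/* copy the unaligned head with a plain memcpy */
		if (addr & 15) {
			unsigned long head = min(ALIGN(addr, 16) - addr, len);

			memcpy(dst, src, head);
			dst += head;
			src += head;
			len -= head;
		}

		/* the rest is 16-byte aligned: hand it to the non-temporal
		 * fast path (__memcpy_ntdqu() in today's i915_memcpy.c) */
		if (len)
			fast_wc_copy_aligned(dst, src, len);
	}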



I think for completeness it makes sense to remove i915_memcpy_from_wc() and
its helper functions in this series. I don't think we can have
i915_unaligned_memcpy_from_wc() if we want i915 on ARM[0], so I think you can
remove the usages of i915_unaligned_memcpy_from_wc() as well.

[0] IIUC the CI_BUG_ON() check in i915_unaligned_memcpy_from_wc() will raise
a build error on ARM.



Regards,

Nirmoy



But before I could do more changes, I wanted feedback on the current
change. So I decided to go ahead with creating series for review.

Regards,
Bala


Regards,
Nirmoy


Cc: Jani Nikula 
Cc: Lucas De Marchi 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Chris Wilson 
Cc: Thomas Hellstr_m 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
Cc: Tvrtko Ursulin 

Balasubramani Vivekanandan (7):
drm: Relax alignment constraint for destination address
drm: Add drm_memcpy_from_wc() variant which accepts destination
  address
drm/i915: use the memcpy_from_wc call from the drm
drm/i915/guc: use the memcpy_from_wc call from the drm
drm/i915/selftests: use the memcpy_from_wc call from the drm
drm/i915/gt: Avoid direct dereferencing of io memory
drm/i915: Avoid dereferencing io mapped memory

   drivers/gpu/drm/drm_cache.c   | 98 +--
   drivers/gpu/drm/i915/gem/i915_gem_object.c|  8 +-
   drivers/gpu/drm/i915/gt/selftest_reset.c  | 21 ++--
   drivers/gpu/drm/i915/gt/uc/intel_guc_log.c| 11 ++-
   drivers/gpu/drm/i915/i915_gpu_error.c | 45 +
   .../drm/i915/selftests/intel_memory_region.c  |  8 +-
   include/drm/drm_cache.h   |  3 +
   7 files changed, 148 insertions(+), 46 deletions(-)


