Re: [Intel-gfx] [PATCH 14/14] Doc/gpu/rfc/i915: i915 DG2 uAPI

2021-10-11 Thread Tang, CQ



> -Original Message-
> From: C, Ramalingam 
> Sent: Monday, October 11, 2021 9:12 AM
> To: dri-devel ; intel-gfx  g...@lists.freedesktop.org>
> Cc: Daniel Vetter ; Auld, Matthew
> ; Tang, CQ ; Hellstrom,
> Thomas ; C, Ramalingam
> ; Daniel Vetter 
> Subject: [PATCH 14/14] Doc/gpu/rfc/i915: i915 DG2 uAPI
> 
> Details of the new features getting added as part of DG2 enabling and their
> implicit impact on the uAPI.
> 
> Signed-off-by: Ramalingam C 
> cc: Daniel Vetter 
> cc: Matthew Auld 
> ---
>  Documentation/gpu/rfc/i915_dg2.rst | 47
> ++
>  Documentation/gpu/rfc/index.rst|  3 ++
>  2 files changed, 50 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_dg2.rst
> 
> diff --git a/Documentation/gpu/rfc/i915_dg2.rst
> b/Documentation/gpu/rfc/i915_dg2.rst
> new file mode 100644
> index ..a83ca26cd758
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_dg2.rst
> @@ -0,0 +1,47 @@
> +
> +I915 DG2 RFC Section
> +
> +
> +Upstream plan
> +=
> +Plan to upstream the DG2 enabling is:
> +
> +* Merge basic HW enabling for DG2(Still without pciid)
> +* Merge the 64k support for lmem
> +* Merge the flat CCS enabling patches
> +* Add the pciid for DG2 and enable the DG2 in CI
> +
> +
> +64K page support for lmem
> +=
> +On DG2 hw, local-memory supports minimum GTT page size of 64k only. 4k
> is not supported anymore.
> +
> +DG2 hw dont support the 64k(lmem) and 4k(smem) pages in the same
> ppgtt
> +Page table. Refer the struct drm_i915_gem_create_ext for the implication
> of handling the 64k page size.
> +
> +.. kernel-doc:: include/uapi/drm/i915_drm.h
> +:functions: drm_i915_gem_create_ext
> +
> +
> +flat CCS support for lmem
> +=
> +Gen 12+ devices support 3D surfaces compression and compression
> +formats. This is accomplished by an additional compression control state
> (CCS) stored for each surface.

General introduction, OK.

> +
> +Gen 12 devices(TGL and DG1) stores compression state in a separate region
> of memory.
> +It is managed by userspace and has an associated set of userspace
> +managed page tables used by hardware for address translation.

I don't know the purpose of this paragraph; do we need to mention TGL/DG1? Also note
that this says "Gen 12", while the first paragraph says "Gen 12+".

> +
> +In Gen 12.5 devices(XEXPSDV and DG2) Flat CCS is introduced to replace
> +the userspace managed AUX pagetable with the flat indexed region of
> +device memory for storing the compression state

Because this is a DG2 document, do we need to mention XeHP SDV?


> +
> +GOP Driver steals a chunk of memory for the CCS surface corresponding
> +to the entire range of local memory. The memory required for the CCS of
> +the entire local memory is
> +1/256 of the main local memory. The Gop driver will also program a
> +secure register (XEHPSDV_FLAT_CCS_BASE_ADDR 0x4910) with this address
> value.

I think it is not necessary to mention the CCS base register; this is an internal
detail.

> +
> +So the Total local memory available for driver allocation is Total lmem
> +size - CCS data size

Well, we also need to subtract the GTT, lmem stolen (DG2 only), and WOPCM. Maybe just
say that the total local memory available is smaller because of other reserved regions.

> +
> +Flat CCS data needs to be cleared when a lmem object is allocated. And
> +CCS data can be copied in and out of CCS region through
> XY_CTRL_SURF_COPY_BLT.

OK.

--CQ

> diff --git a/Documentation/gpu/rfc/index.rst
> b/Documentation/gpu/rfc/index.rst index 91e93a705230..afb320ed4028
> 100644
> --- a/Documentation/gpu/rfc/index.rst
> +++ b/Documentation/gpu/rfc/index.rst
> @@ -20,6 +20,9 @@ host such documentation:
> 
>  i915_gem_lmem.rst
> 
> +.. toctree::
> +i915_dg2.rst
> +
>  .. toctree::
> 
>  i915_scheduler.rst
> --
> 2.20.1



Re: [Intel-gfx] [PATCH] drm/i915: Perform execbuffer object locking as a separate step

2021-06-17 Thread Tang, CQ


> -Original Message-
> From: Intel-gfx  On Behalf Of
> Thomas Hellström
> Sent: Tuesday, June 15, 2021 4:36 AM
> To: intel-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org
> Cc: Thomas Hellström ; Auld, Matthew
> 
> Subject: [Intel-gfx] [PATCH] drm/i915: Perform execbuffer object locking as a
> separate step
> 
> To help avoid evicting already resident buffers from the batch we're
> processing, perform locking as a separate step.
> 
> Signed-off-by: Thomas Hellström 
> ---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 25 --
> -
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 201fed19d120..394eb40c95b5 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -922,21 +922,38 @@ static int eb_lookup_vmas(struct i915_execbuffer
> *eb)
>   return err;
>  }
> 
> -static int eb_validate_vmas(struct i915_execbuffer *eb)
> +static int eb_lock_vmas(struct i915_execbuffer *eb)
>  {
>   unsigned int i;
>   int err;
> 
> - INIT_LIST_HEAD(>unbound);
> -
>   for (i = 0; i < eb->buffer_count; i++) {
> - struct drm_i915_gem_exec_object2 *entry = >exec[i];
>   struct eb_vma *ev = >vma[i];
>   struct i915_vma *vma = ev->vma;
> 
>   err = i915_gem_object_lock(vma->obj, >ww);
>   if (err)
>   return err;
> + }
> +
> + return 0;
> +}
> +
> +static int eb_validate_vmas(struct i915_execbuffer *eb) {
> + unsigned int i;
> + int err;
> +
> + INIT_LIST_HEAD(>unbound);
> +
> + err = eb_lock_vmas(eb);
> + if (err)
> + return err;
> +
> + for (i = 0; i < eb->buffer_count; i++) {
> + struct drm_i915_gem_exec_object2 *entry = >exec[i];
> + struct eb_vma *ev = >vma[i];
> + struct i915_vma *vma = ev->vma;
> 
>   err = eb_pin_vma(eb, entry, ev);
>   if (err == -EDEADLK)

Thomas, I just checked eb_pin_vma(); it calls i915_vma_pin_ww(). If the object is
already locked, under what conditions can these calls still return -EDEADLK?

--CQ

> --
> 2.31.1
> 
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v4 14/17] drm/i915/pxp: User interface for Protected buffer

2021-05-25 Thread Tang, CQ



> -Original Message-
> From: Intel-gfx  On Behalf Of
> Daniele Ceraolo Spurio
> Sent: Monday, May 24, 2021 10:48 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Vetter, Daniel ; Huang Sean Z
> ; dri-de...@lists.freedesktop.org; Chris Wilson
> ; Kondapally Kalyan
> ; Bommu, Krishnaiah
> 
> Subject: [Intel-gfx] [PATCH v4 14/17] drm/i915/pxp: User interface for
> Protected buffer
> 
> From: Bommu Krishnaiah 
> 
> This api allow user mode to create Protected buffers. Only contexts marked
> as protected are allowed to operate on protected buffers.
> 
> We only allow setting the flags at creation time.
> 
> All protected objects that have backing storage will be considered invalid
> when the session is destroyed and they won't be usable anymore.

Then these protected objects will hang around in the system until the user calls
gem_close() to free them? If the objects won't be usable anymore, why don't we
automatically free them when the session is destroyed?

How is a session started and destroyed? From the code, intel_pxp_init() is called
when the i915 driver loads, so I think the session lifetime is the same as the
driver's. Can we start multiple sessions after loading the driver?

--CQ

> 
> Given that the PXP HW supports multiple modes (but we currently only care
> about one), a flag variable has been reserved in the structure used in the
> create_ext ioctl for possible future updates.
> 
> This is a rework of the original code by Bommu Krishnaiah. I've kept
> authorship unchanged since significant chunks have not been modified.
> 
> v2: split context changes, fix defines and improve documentation (Chris),
> add object invalidation logic
> v3: fix spinlock definition and usage, only validate objects when
> they're first added to a context lut, only remove them once (Chris),
> make protected context flag not mandatory in protected object execbuf
> to avoid abuse (Lionel)
> v4: rebase to new gem_create_ext
> 
> Signed-off-by: Bommu Krishnaiah 
> Signed-off-by: Daniele Ceraolo Spurio 
> Cc: Telukuntla Sreedhar 
> Cc: Kondapally Kalyan 
> Cc: Gupta Anshuman 
> Cc: Huang Sean Z 
> Cc: Chris Wilson 
> Cc: Lionel Landwerlin 
> Cc: Jason Ekstrand 
> Cc: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_create.c| 26 
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 15 +++
>  drivers/gpu/drm/i915/gem/i915_gem_object.c|  6 +++
>  drivers/gpu/drm/i915/gem/i915_gem_object.h| 12 ++
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 13 ++
>  drivers/gpu/drm/i915/pxp/intel_pxp.c  | 41 +++
>  drivers/gpu/drm/i915/pxp/intel_pxp.h  | 13 ++
>  drivers/gpu/drm/i915/pxp/intel_pxp_types.h|  5 +++
>  include/uapi/drm/i915_drm.h   | 33 ++-
>  9 files changed, 163 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> index 548ddf39d853..c14be3882c35 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> @@ -6,6 +6,7 @@
>  #include "gem/i915_gem_ioctls.h"
>  #include "gem/i915_gem_lmem.h"
>  #include "gem/i915_gem_region.h"
> +#include "pxp/intel_pxp.h"
> 
>  #include "i915_drv.h"
>  #include "i915_trace.h"
> @@ -99,7 +100,11 @@ i915_gem_setup(struct drm_i915_gem_object *obj,
> u64 size)
> 
>   GEM_BUG_ON(size != obj->base.size);
> 
> + if (obj->user_flags & I915_GEM_OBJECT_PROTECTED)
> + intel_pxp_object_add(obj);
> +
>   trace_i915_gem_object_create(obj);
> +
>   return 0;
>  }
> 
> @@ -344,8 +349,29 @@ static int ext_set_placements(struct
> i915_user_extension __user *base,
>   return set_placements(, data);
>  }
> 
> +static int ext_set_protected(struct i915_user_extension __user *base,
> +void *data) {
> + struct drm_i915_gem_create_ext_protected_content ext;
> + struct create_ext *ext_data = data;
> +
> + if (copy_from_user(, base, sizeof(ext)))
> + return -EFAULT;
> +
> + if (ext.flags)
> + return -EINVAL;
> +
> + if (!intel_pxp_is_enabled(_data->i915->gt.pxp))
> + return -ENODEV;
> +
> + ext_data->vanilla_object->user_flags |=
> I915_GEM_OBJECT_PROTECTED;
> +
> + return 0;
> +}
> +
> +
>  static const i915_user_extension_fn create_extensions[] = {
>   [I915_GEM_CREATE_EXT_MEMORY_REGIONS] =
> ext_set_placements,
> + [I915_GEM_CREATE_EXT_PROTECTED_CONTENT] =
> ext_set_protected,
>  };
> 
>  /**
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index c08e28847064..5dd813d04a9f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -839,6 +839,21 @@ static struct i915_vma *eb_lookup_vma(struct
> i915_execbuffer *eb, u32 handle)
>   if (unlikely(!obj))
>   return ERR_PTR(-ENOENT);
> 
> +  

Re: [Intel-gfx] [PATCH v2 4/7] drm/i915/gtt/dgfx: place the PD in LMEM

2021-04-27 Thread Tang, CQ



> -Original Message-
> From: Intel-gfx  On Behalf Of
> Matthew Auld
> Sent: Tuesday, April 27, 2021 1:54 AM
> To: intel-gfx@lists.freedesktop.org
> Cc: dri-de...@lists.freedesktop.org
> Subject: [Intel-gfx] [PATCH v2 4/7] drm/i915/gtt/dgfx: place the PD in LMEM
> 
> It's a requirement that for dgfx we place all the paging structures in device
> local-memory.
> 
> v2: use i915_coherent_map_type()
> v3: improve the shared dma-resv object comment
> 
> Signed-off-by: Matthew Auld 
> Cc: Tvrtko Ursulin 
> ---
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c |  5 -
> drivers/gpu/drm/i915/gt/intel_gtt.c  | 30 +---
> drivers/gpu/drm/i915/gt/intel_gtt.h  |  1 +
>  3 files changed, 32 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index f83496836f0f..11fb5df45a0f 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -712,7 +712,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt
> *gt)
>*/
>   ppgtt->vm.has_read_only = !IS_GEN_RANGE(gt->i915, 11, 12);
> 
> - ppgtt->vm.alloc_pt_dma = alloc_pt_dma;
> + if (HAS_LMEM(gt->i915))
> + ppgtt->vm.alloc_pt_dma = alloc_pt_lmem;

Here we might want to allocate lmem from the 'gt' passed in as the argument; however,
inside alloc_pt_lmem() below, we always allocate lmem from tile0.
Is this desired?

--CQ

> + else
> + ppgtt->vm.alloc_pt_dma = alloc_pt_dma;
> 
>   err = gen8_init_scratch(>vm);
>   if (err)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index d386b89e2758..061c39d2ad51 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -7,10 +7,26 @@
> 
>  #include 
> 
> +#include "gem/i915_gem_lmem.h"
>  #include "i915_trace.h"
>  #include "intel_gt.h"
>  #include "intel_gtt.h"
> 
> +struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space
> +*vm, int sz) {
> + struct drm_i915_gem_object *obj;
> +
> + obj = i915_gem_object_create_lmem(vm->i915, sz, 0);
> + /*
> +  * Ensure all paging structures for this vm share the same dma-resv
> +  * object underneath, with the idea that one object_lock() will lock
> +  * them all at once.
> +  */
> + if (!IS_ERR(obj))
> + obj->base.resv = >resv;
> + return obj;
> +}
> +
>  struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm,
> int sz)  {
>   struct drm_i915_gem_object *obj;
> @@ -19,7 +35,11 @@ struct drm_i915_gem_object *alloc_pt_dma(struct
> i915_address_space *vm, int sz)
>   i915_gem_shrink_all(vm->i915);
> 
>   obj = i915_gem_object_create_internal(vm->i915, sz);
> - /* ensure all dma objects have the same reservation class */
> + /*
> +  * Ensure all paging structures for this vm share the same dma-resv
> +  * object underneath, with the idea that one object_lock() will lock
> +  * them all at once.
> +  */
>   if (!IS_ERR(obj))
>   obj->base.resv = >resv;
>   return obj;
> @@ -27,9 +47,11 @@ struct drm_i915_gem_object *alloc_pt_dma(struct
> i915_address_space *vm, int sz)
> 
>  int map_pt_dma(struct i915_address_space *vm, struct
> drm_i915_gem_object *obj)  {
> + enum i915_map_type type;
>   void *vaddr;
> 
> - vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB);
> + type = i915_coherent_map_type(vm->i915, obj, true);
> + vaddr = i915_gem_object_pin_map_unlocked(obj, type);
>   if (IS_ERR(vaddr))
>   return PTR_ERR(vaddr);
> 
> @@ -39,9 +61,11 @@ int map_pt_dma(struct i915_address_space *vm,
> struct drm_i915_gem_object *obj)
> 
>  int map_pt_dma_locked(struct i915_address_space *vm, struct
> drm_i915_gem_object *obj)  {
> + enum i915_map_type type;
>   void *vaddr;
> 
> - vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
> + type = i915_coherent_map_type(vm->i915, obj, true);
> + vaddr = i915_gem_object_pin_map(obj, type);
>   if (IS_ERR(vaddr))
>   return PTR_ERR(vaddr);
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
> b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 40e486704558..44ce27c51631 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -527,6 +527,7 @@ int setup_scratch_page(struct i915_address_space
> *vm);  void free_scratch(struct i915_address_space *vm);
> 
>  struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm,
> int sz);
> +struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space
> +*vm, int sz);
>  struct i915_page_table *alloc_pt(struct i915_address_space *vm);  struct
> i915_page_directory *alloc_pd(struct i915_address_space *vm);  struct
> i915_page_directory *__alloc_pd(int npde);
> --
> 2.26.3
> 
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> 

Re: [Intel-gfx] [PATCH v5] drm/i915/gt: Ratelimit heartbeat completion probing

2021-02-11 Thread Tang, CQ
Do you have a version of the patch that applies to DII?

--CQ

> -Original Message-
> From: Chris Wilson 
> Sent: Thursday, February 11, 2021 3:14 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Chris Wilson ; Tang, CQ 
> Subject: [PATCH v5] drm/i915/gt: Ratelimit heartbeat completion probing
> 
> The heartbeat runs through a few phases that we expect to complete within
> a certain number of heartbeat intervals. First we must submit the heartbeat
> to the queue, and if the queue is occupied it may take a couple of intervals
> before the heartbeat preempts the workload and is submitted to HW. Once
> running on HW, completion is not instantaneous as it may have to first reset
> the current workload before it itself runs through the empty request and
> signals completion. As such, we know that the heartbeat must take at least
> the preempt reset timeout and before we have had a chance to reset the
> engine, we do not want to issue a global reset ourselves (simply so that we
> only try to do one reset at a time and not confuse ourselves by resetting
> twice and hitting an innocent.)
> 
> So by taking into consideration that once running the request must take a
> finite amount of time, we can delay the final completion check to
> accommodate that and avoid checking too early (before we've had a chance
> to handle any engine resets required).
> 
> v3: Now with more tracking for selftests and detection of false/unexpected
> hang reports.
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2853
> Suggested-by: CQ Tang 
> Signed-off-by: Chris Wilson 
> ---
>  .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 92 +++---
> drivers/gpu/drm/i915/gt/intel_engine_types.h  |  7 ++
>  .../drm/i915/gt/selftest_engine_heartbeat.c   | 93 ---
>  drivers/gpu/drm/i915/gt/selftest_execlists.c  |  5 +-
>  4 files changed, 148 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> index 0b062fad1837..e14f9eab5779 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> @@ -20,6 +20,18 @@
>   * issue a reset -- in the hope that restores progress.
>   */
> 
> +#define HEARTBEAT_COMPLETION 50u /* milliseconds */
> +
> +static long completion_timeout(const struct intel_engine_cs *engine) {
> + long timeout = HEARTBEAT_COMPLETION;
> +
> + if (intel_engine_has_preempt_reset(engine))
> + timeout += READ_ONCE(engine-
> >props.preempt_timeout_ms);
> +
> + return msecs_to_jiffies(timeout);
> +}
> +
>  static bool next_heartbeat(struct intel_engine_cs *engine)  {
>   long delay;
> @@ -29,6 +41,26 @@ static bool next_heartbeat(struct intel_engine_cs
> *engine)
>   return false;
> 
>   delay = msecs_to_jiffies_timeout(delay);
> +
> + /*
> +  * Once we submit a heartbeat to the HW, we know that it will take
> +  * at least a certain amount of time to complete. On a hanging system
> +  * it will first have to wait for the preempt reset timeout, and
> +  * then it will take some time for the reset to resume with the
> +  * heartbeat and for it to complete. So once we have submitted the
> +  * heartbeat to HW, we can wait a while longer before declaring the
> +  * engine stuck and forcing a reset ourselves. If we do a reset
> +  * and the engine is also doing a reset, it is possible that we
> +  * reset the engine twice, harming an innocent.
> +  *
> +  * Before we have sumitted the heartbeat, we do not want to
> change
> +  * the interval as we want to promote the heartbeat and trigger
> +  * preemption in a deterministic time frame.
> +  */
> + if (engine->heartbeat.systole &&
> + i915_request_is_active(engine->heartbeat.systole))
> + delay = max(delay, completion_timeout(engine));
> +
>   if (delay >= HZ)
>   delay = round_jiffies_up_relative(delay);
>   mod_delayed_work(system_highpri_wq, >heartbeat.work,
> delay + 1); @@ -48,12 +80,39 @@ heartbeat_create(struct intel_context *ce,
> gfp_t gfp)
>   return rq;
>  }
> 
> +static void
> +untrack_heartbeat(struct intel_engine_cs *engine) {
> + struct i915_request *rq;
> +
> + rq = engine->heartbeat.systole;
> + if (!rq)
> + return;
> +
> + ENGINE_TRACE(engine, "heartbeat " RQ_FMT " completed\n",
> RQ_ARG(rq));
> + I915_SELFTEST_ONLY(engine->heartbeat.completed++);
> +
> + WRITE_ONCE(engine->heartbeat.systole, NULL);
> +

Re: [Intel-gfx] [PATCH] drm/i915/gt: Ratelimit heartbeat completion probing

2021-02-10 Thread Tang, CQ
Chris,
I applied this patch to DII. It did not apply cleanly, so I applied it manually. The
only difference is the following in the selftest code:
DII code:
+err_reset:
intel_engine_set_heartbeat(engine, CONFIG_DRM_I915_HEARTBEAT_INTERVAL);
your code:
 +err_reset:
reset_heartbeat(engine);

Anyway, I ran the patch on PVC Simics with the gt_heartbeat selftest, but I hit the
following error:
[   64.836061] i915: Running intel_heartbeat_live_selftests/live_heartbeat_fast
[   64.838307] heartbeat bcs0 heartbeat {seqno:1:5, prio:2147483646} not ticking
[   64.840308] heartbeatAwake? 3
[   64.841262] heartbeatBarriers?: no
[   64.842311] heartbeatLatency: 2260us
[   64.843422] heartbeatForcewake: 0 domains, 0 active
[   64.844857] heartbeatHeartbeat: 7 ms ago
[   64.846071] heartbeatReset count: -1 (global 0)
..

It detects a GPU hang.

What is the problem? Do you have a DII version of the patch for me to try?

--CQ

> -Original Message-
> From: Chris Wilson 
> Sent: Friday, February 5, 2021 6:32 AM
> To: intel-gfx@lists.freedesktop.org
> Cc: Chris Wilson ; Tang, CQ 
> Subject: [PATCH] drm/i915/gt: Ratelimit heartbeat completion probing
> 
> The heartbeat runs through a few phases that we expect to complete within
> a certain number of heartbeat intervals. First we must submit the heartbeat
> to the queue, and if the queue is occupied it may take a couple of intervals
> before the heartbeat preempts the workload and is submitted to HW. Once
> running on HW, completion is not instantaneous as it may have to first reset
> the current workload before it itself runs through the empty request and
> signals completion. As such, we know that the heartbeat must take at least
> the preempt reset timeout and before we have had a chance to reset the
> engine, we do not want to issue a global reset ourselves (simply so that we
> only try to do one reset at a time and not confuse ourselves by resetting
> twice and hitting an innocent.)
> 
> So by taking into consideration that once running the request must take a
> finite amount of time, we can delay the final completion check to
> accommodate that and avoid checking too early (before we've had a chance
> to handle any engine resets required).
> 
> v2: Attach a callback to flush the work immediately upon the heartbeat
> completion and insert the delay before the next.
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2853
> Suggested-by: CQ Tang 
> Signed-off-by: Chris Wilson 
> ---
>  .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 93 +--
> drivers/gpu/drm/i915/gt/intel_engine_types.h  |  1 +
>  .../drm/i915/gt/selftest_engine_heartbeat.c   | 65 ++---
>  drivers/gpu/drm/i915/gt/selftest_execlists.c  |  5 +-
>  4 files changed, 119 insertions(+), 45 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> index 93741a65924a..01d8a04f77b6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> @@ -20,6 +20,18 @@
>   * issue a reset -- in the hope that restores progress.
>   */
> 
> +#define HEARTBEAT_COMPLETION 50u /* milliseconds */
> +
> +static long completion_timeout(const struct intel_engine_cs *engine) {
> + long timeout = HEARTBEAT_COMPLETION;
> +
> + if (intel_engine_has_preempt_reset(engine))
> + timeout += READ_ONCE(engine-
> >props.preempt_timeout_ms);
> +
> + return msecs_to_jiffies(timeout);
> +}
> +
>  static bool next_heartbeat(struct intel_engine_cs *engine)  {
>   long delay;
> @@ -29,6 +41,26 @@ static bool next_heartbeat(struct intel_engine_cs
> *engine)
>   return false;
> 
>   delay = msecs_to_jiffies_timeout(delay);
> +
> + /*
> +  * Once we submit a heartbeat to the HW, we know that it will take
> +  * at least a certain amount of time to complete. On a hanging system
> +  * it will first have to wait for the preempt reset timeout, and
> +  * then it will take some time for the reset to resume with the
> +  * heartbeat and for it to complete. So once we have submitted the
> +  * heartbeat to HW, we can wait a while longer before declaring the
> +  * engine stuck and forcing a reset ourselves. If we do a reset
> +  * and the engine is also doing a reset, it is possible that we
> +  * reset the engine twice, harming an innocent.
> +  *
> +  * Before we have sumitted the heartbeat, we do not want to
> change
> +  * the interval as we to promote the heartbeat and trigger
> preemption
> +  * in a deterministic time frame.
> +  */
> + if (engine->heartbeat.systole &

Re: [Intel-gfx] [PATCH v3 3/3] drm/i915/gtt/dg1: add PTE_LM plumbing for GGTT

2021-02-03 Thread Tang, CQ



> -Original Message-
> From: Matthew Auld 
> Sent: Wednesday, February 3, 2021 9:03 AM
> To: Tang, CQ 
> Cc: Auld, Matthew ; intel-
> g...@lists.freedesktop.org; Chris Wilson 
> Subject: Re: [Intel-gfx] [PATCH v3 3/3] drm/i915/gtt/dg1: add PTE_LM
> plumbing for GGTT
> 
> On Wed, 3 Feb 2021 at 16:51, Tang, CQ  wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Intel-gfx  On Behalf
> > > Of Matthew Auld
> > > Sent: Wednesday, February 3, 2021 7:24 AM
> > > To: intel-gfx@lists.freedesktop.org
> > > Cc: Chris Wilson 
> > > Subject: [Intel-gfx] [PATCH v3 3/3] drm/i915/gtt/dg1: add PTE_LM
> > > plumbing for GGTT
> > >
> > > For the PTEs we get an LM bit, to signal whether the page resides in
> > > SMEM or LMEM.
> > >
> > > Based on a patch from Michel Thierry.
> > >
> > > Signed-off-by: Matthew Auld 
> > > Cc: Joonas Lahtinen 
> > > Signed-off-by: Daniele Ceraolo Spurio
> > > 
> > > Reviewed-by: Chris Wilson 
> > > ---
> > >  drivers/gpu/drm/i915/gt/intel_ggtt.c | 24 +++-
> > > drivers/gpu/drm/i915/gt/intel_gtt.h  |  4 +++-
> > >  2 files changed, 22 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > > b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > > index fc399ac16eda..b0b8ded834f0 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > > @@ -10,6 +10,8 @@
> > >
> > >  #include 
> > >
> > > +#include "gem/i915_gem_lmem.h"
> > > +
> > >  #include "intel_gt.h"
> > >  #include "i915_drv.h"
> > >  #include "i915_scatterlist.h"
> > > @@ -189,7 +191,12 @@ static u64 gen8_ggtt_pte_encode(dma_addr_t
> addr,
> > >   enum i915_cache_level level,
> > >   u32 flags)  {
> > > - return addr | _PAGE_PRESENT;
> > > + gen8_pte_t pte = addr | _PAGE_PRESENT;
> > > +
> > > + if (flags & PTE_LM)
> > > + pte |= GEN12_GGTT_PTE_LM;
> > > +
> > > + return pte;
> > >  }
> > >
> > >  static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte) @@
> > > -201,13
> > > +208,13 @@ static void gen8_ggtt_insert_page(struct
> > > +i915_address_space
> > > *vm,
> > > dma_addr_t addr,
> > > u64 offset,
> > > enum i915_cache_level level,
> > > -   u32 unused)
> > > +   u32 flags)
> > >  {
> > >   struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
> > >   gen8_pte_t __iomem *pte =
> > >   (gen8_pte_t __iomem *)ggtt->gsm + offset /
> > > I915_GTT_PAGE_SIZE;
> > >
> > > - gen8_set_pte(pte, gen8_ggtt_pte_encode(addr, level, 0));
> > > + gen8_set_pte(pte, gen8_ggtt_pte_encode(addr, level, flags));
> > >
> > >   ggtt->invalidate(ggtt);
> > >  }
> > > @@ -217,7 +224,7 @@ static void gen8_ggtt_insert_entries(struct
> > > i915_address_space *vm,
> > >enum i915_cache_level level,
> > >u32 flags)  {
> > > - const gen8_pte_t pte_encode = gen8_ggtt_pte_encode(0, level, 0);
> > > + const gen8_pte_t pte_encode = gen8_ggtt_pte_encode(0, level,
> > > flags);
> > >   struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
> > >   gen8_pte_t __iomem *gte;
> > >   gen8_pte_t __iomem *end;
> > > @@ -459,6 +466,8 @@ static void ggtt_bind_vma(struct
> > > i915_address_space *vm,
> > >   pte_flags = 0;
> > >   if (i915_gem_object_is_readonly(obj))
> > >   pte_flags |= PTE_READ_ONLY;
> > > + if (i915_gem_object_is_lmem(obj))
> > > + pte_flags |= PTE_LM;
> > >
> > >   vm->insert_entries(vm, vma, cache_level, pte_flags);
> > >   vma->page_sizes.gtt = I915_GTT_PAGE_SIZE; @@ -794,6 +803,7 @@
> > > static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
> > >   struct drm_i915_private *i915 = ggtt->vm.i915;
> > >   struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
> > >   phys_addr_t phys_

Re: [Intel-gfx] [PATCH v3 3/3] drm/i915/gtt/dg1: add PTE_LM plumbing for GGTT

2021-02-03 Thread Tang, CQ



> -Original Message-
> From: Intel-gfx  On Behalf Of
> Matthew Auld
> Sent: Wednesday, February 3, 2021 7:24 AM
> To: intel-gfx@lists.freedesktop.org
> Cc: Chris Wilson 
> Subject: [Intel-gfx] [PATCH v3 3/3] drm/i915/gtt/dg1: add PTE_LM plumbing
> for GGTT
> 
> For the PTEs we get an LM bit, to signal whether the page resides in SMEM or
> LMEM.
> 
> Based on a patch from Michel Thierry.
> 
> Signed-off-by: Matthew Auld 
> Cc: Joonas Lahtinen 
> Signed-off-by: Daniele Ceraolo Spurio 
> Reviewed-by: Chris Wilson 
> ---
>  drivers/gpu/drm/i915/gt/intel_ggtt.c | 24 +++-
> drivers/gpu/drm/i915/gt/intel_gtt.h  |  4 +++-
>  2 files changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index fc399ac16eda..b0b8ded834f0 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -10,6 +10,8 @@
> 
>  #include 
> 
> +#include "gem/i915_gem_lmem.h"
> +
>  #include "intel_gt.h"
>  #include "i915_drv.h"
>  #include "i915_scatterlist.h"
> @@ -189,7 +191,12 @@ static u64 gen8_ggtt_pte_encode(dma_addr_t addr,
>   enum i915_cache_level level,
>   u32 flags)
>  {
> - return addr | _PAGE_PRESENT;
> + gen8_pte_t pte = addr | _PAGE_PRESENT;
> +
> + if (flags & PTE_LM)
> + pte |= GEN12_GGTT_PTE_LM;
> +
> + return pte;
>  }
> 
>  static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte) @@ -201,13
> +208,13 @@ static void gen8_ggtt_insert_page(struct i915_address_space
> *vm,
> dma_addr_t addr,
> u64 offset,
> enum i915_cache_level level,
> -   u32 unused)
> +   u32 flags)
>  {
>   struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>   gen8_pte_t __iomem *pte =
>   (gen8_pte_t __iomem *)ggtt->gsm + offset /
> I915_GTT_PAGE_SIZE;
> 
> - gen8_set_pte(pte, gen8_ggtt_pte_encode(addr, level, 0));
> + gen8_set_pte(pte, gen8_ggtt_pte_encode(addr, level, flags));
> 
>   ggtt->invalidate(ggtt);
>  }
> @@ -217,7 +224,7 @@ static void gen8_ggtt_insert_entries(struct
> i915_address_space *vm,
>enum i915_cache_level level,
>u32 flags)
>  {
> - const gen8_pte_t pte_encode = gen8_ggtt_pte_encode(0, level, 0);
> + const gen8_pte_t pte_encode = gen8_ggtt_pte_encode(0, level,
> flags);
>   struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
>   gen8_pte_t __iomem *gte;
>   gen8_pte_t __iomem *end;
> @@ -459,6 +466,8 @@ static void ggtt_bind_vma(struct i915_address_space
> *vm,
>   pte_flags = 0;
>   if (i915_gem_object_is_readonly(obj))
>   pte_flags |= PTE_READ_ONLY;
> + if (i915_gem_object_is_lmem(obj))
> + pte_flags |= PTE_LM;
> 
>   vm->insert_entries(vm, vma, cache_level, pte_flags);
>   vma->page_sizes.gtt = I915_GTT_PAGE_SIZE; @@ -794,6 +803,7 @@
> static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
>   struct drm_i915_private *i915 = ggtt->vm.i915;
>   struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
>   phys_addr_t phys_addr;
> + u32 pte_flags;
>   int ret;
> 
>   /* For Modern GENs the PTEs and register space are split in the BAR
> */ @@ -823,9 +833,13 @@ static int ggtt_probe_common(struct i915_ggtt
> *ggtt, u64 size)
>   return ret;
>   }
> 
> + pte_flags = 0;
> + if (i915_gem_object_is_lmem(ggtt->vm.scratch[0]))
> + pte_flags |= PTE_LM;
> +
>   ggtt->vm.scratch[0]->encode =
>   ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]),
> - I915_CACHE_NONE, 0);
> + I915_CACHE_NONE, pte_flags);
> 
>   return 0;
>  }
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
> b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 0eef625dd787..24b5808df16d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -85,7 +85,9 @@ typedef u64 gen8_pte_t;
>  #define BYT_PTE_SNOOPED_BY_CPU_CACHES	REG_BIT(2)
>  #define BYT_PTE_WRITEABLE		REG_BIT(1)
> 
> -#define GEN12_PPGTT_PTE_LM BIT_ULL(11)
> +#define GEN12_PPGTT_PTE_LM   BIT_ULL(11)
> +
> +#define GEN12_GGTT_PTE_LM		BIT_ULL(1)

Where does the Bspec say bit-1 is for LMEM?

--CQ

> 
>  /*
>   * Cacheability Control is a 4-bit value. The low three bits are stored in 
> bits
> --
> 2.26.2
> 
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Fix mismatch between misplaced vma check and vma insert

2020-12-16 Thread Tang, CQ



> -Original Message-
> From: Chris Wilson 
> Sent: Wednesday, December 16, 2020 12:44 PM
> To: Tang, CQ ; intel-gfx@lists.freedesktop.org
> Cc: stable@ 
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Fix mismatch between misplaced
> vma check and vma insert
> 
> Quoting Tang, CQ (2020-12-16 17:27:40)
> >
> >
> > > -Original Message-
> > > From: Chris Wilson 
> > > Sent: Wednesday, December 16, 2020 12:43 AM
> > > To: Tang, CQ ; intel-gfx@lists.freedesktop.org
> > > Cc: stable@ 
> > > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Fix mismatch between
> > > misplaced vma check and vma insert
> > >
> > > Quoting Tang, CQ (2020-12-16 00:51:21)
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Chris Wilson 
> > > > > Sent: Tuesday, December 15, 2020 2:02 PM
> > > > > To: Tang, CQ ;
> > > > > intel-gfx@lists.freedesktop.org
> > > > > Cc: stable@ 
> > > > > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Fix mismatch between
> > > > > misplaced vma check and vma insert
> > > > >
> > > > > Quoting Tang, CQ (2020-12-15 21:50:53)
> > > > > >
> > > > > >
> > > > > > > -Original Message-
> > > > > > > From: Chris Wilson 
> > > > > > > Sent: Tuesday, December 15, 2020 12:31 PM
> > > > > > > To: intel-gfx@lists.freedesktop.org
> > > > > > > Cc: Chris Wilson ; Tang, CQ
> > > > > > > ; sta...@vger.kernel.org
> > > > > > > Subject: [PATCH] drm/i915: Fix mismatch between misplaced
> > > > > > > vma check and vma insert
> > > > > > >
> > > > > > > When inserting a VMA, we restrict the placement to the low
> > > > > > > 4G unless the caller opts into using the full range. This
> > > > > > > was done to allow userspace the opportunity to transition
> > > > > > > slowly from a 32b address space, and to avoid breaking
> > > > > > > inherent 32b assumptions of some
> > > > > commands.
> > > > > > >
> > > > > > > However, for insert we limited ourselves to 4G-4K, but on
> > > > > > > verification we allowed the full 4G. This causes some
> > > > > > > attempts to bind a new buffer to sporadically fail with
> > > > > > > -ENOSPC, but at other times be
> > > > > bound successfully.
> > > > > > >
> > > > > > > commit 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to
> > > 4GB
> > > > > > > - 1
> > > > > > > page") suggests that there is a genuine problem with
> > > > > > > stateless addressing that cannot utilize the last page in 4G
> > > > > > > and so we purposefully
> > > > > excluded it.
> > > > > > >
> > > > > > > Reported-by: CQ Tang 
> > > > > > > Fixes: 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to
> > > > > > > 4GB
> > > > > > > - 1
> > > > > > > page")
> > > > > > > Signed-off-by: Chris Wilson 
> > > > > > > Cc: CQ Tang 
> > > > > > > Cc: sta...@vger.kernel.org
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 +-
> > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > > > b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > > > index 193996144c84..2ff32daa50bd 100644
> > > > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > > > @@ -382,7 +382,7 @@ eb_vma_misplaced(const struct
> > > > > > > drm_i915_gem_exec_object2 *entry,
> > > > > > >   return true;
> > > > > > >
> > > > > > >   if (!(flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) &&
> > > > > > > - (vma->node.start + vma->node.size - 1) >> 32)
>

Re: [Intel-gfx] [PATCH i-g-t] i915/gem_softpin: Check the last 32b page is excluded

2020-12-16 Thread Tang, CQ



> -Original Message-
> From: Chris Wilson 
> Sent: Wednesday, December 16, 2020 12:53 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Chris Wilson ; Tang, CQ 
> Subject: [PATCH i-g-t] i915/gem_softpin: Check the last 32b page is excluded
> 
> In order to prevent issues with 32b stateless address, the last page under 4G
> is excluded for non-48b objects.
> 
> Signed-off-by: Chris Wilson 
> Cc: CQ Tang 
> ---
>  tests/i915/gem_softpin.c | 37
> +
>  1 file changed, 37 insertions(+)
> 
> diff --git a/tests/i915/gem_softpin.c b/tests/i915/gem_softpin.c index
> a3e6dcac3..703beb77d 100644
> --- a/tests/i915/gem_softpin.c
> +++ b/tests/i915/gem_softpin.c
> @@ -156,6 +156,39 @@ static void test_zero(int i915)
>   gem_close(i915, object.handle);
>  }
> 
> +static void test_32b_last_page(int i915)
> +{
> + uint64_t sz, gtt = gem_aperture_size(i915);
> + struct drm_i915_gem_exec_object2 object = {
> + .flags = EXEC_OBJECT_PINNED,
> + };
> + struct drm_i915_gem_execbuffer2 execbuf = {
> + .buffers_ptr = to_user_pointer(&object),
> + .buffer_count = 1,
> + };
> +
> + /*
> +  * The last page under 32b is excluded for !48b objects in order to
> +  * prevent issues with stateless addressing.
> +  */
> +
> + igt_require(gtt >= 1ull << 32);
> + object.handle = batch_create(i915, &sz);

Where is this batch_create() version?

--CQ

> +
> + object.offset = 1ull << 32;
> + object.offset -= sz;
> + igt_assert_f(__gem_execbuf(i915, &execbuf) == -EINVAL,
> +  "execbuf succeeded with object.offset=%llx + %"PRIx64"\n",
> +  object.offset, sz);
> +
> + object.offset -= 4096;
> + igt_assert_f(__gem_execbuf(i915, &execbuf) == 0,
> +  "execbuf failed with object.offset=%llx + %"PRIx64"\n",
> +  object.offset, sz);
> +
> + gem_close(i915, object.handle);
> +}
> +
>  static void test_softpin(int fd)
>  {
>   const uint32_t size = 1024 * 1024;
> @@ -622,6 +655,10 @@ igt_main
>   igt_require(gem_uses_full_ppgtt(fd));
>   test_zero(fd);
>   }
> + igt_subtest("32b-excludes-last-page") {
> + igt_require(gem_uses_full_ppgtt(fd));
> + test_32b_last_page(fd);
> + }
>   igt_subtest("softpin")
>   test_softpin(fd);
>   igt_subtest("overlap")
> --
> 2.29.2



Re: [Intel-gfx] [PATCH] drm/i915: Fix mismatch between misplaced vma check and vma insert

2020-12-16 Thread Tang, CQ



> -Original Message-
> From: Chris Wilson 
> Sent: Wednesday, December 16, 2020 12:43 AM
> To: Tang, CQ ; intel-gfx@lists.freedesktop.org
> Cc: stable@ 
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Fix mismatch between misplaced
> vma check and vma insert
> 
> Quoting Tang, CQ (2020-12-16 00:51:21)
> >
> >
> > > -Original Message-
> > > From: Chris Wilson 
> > > Sent: Tuesday, December 15, 2020 2:02 PM
> > > To: Tang, CQ ; intel-gfx@lists.freedesktop.org
> > > Cc: stable@ 
> > > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Fix mismatch between
> > > misplaced vma check and vma insert
> > >
> > > Quoting Tang, CQ (2020-12-15 21:50:53)
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Chris Wilson 
> > > > > Sent: Tuesday, December 15, 2020 12:31 PM
> > > > > To: intel-gfx@lists.freedesktop.org
> > > > > Cc: Chris Wilson ; Tang, CQ
> > > > > ; sta...@vger.kernel.org
> > > > > Subject: [PATCH] drm/i915: Fix mismatch between misplaced vma
> > > > > check and vma insert
> > > > >
> > > > > When inserting a VMA, we restrict the placement to the low 4G
> > > > > unless the caller opts into using the full range. This was done
> > > > > to allow userspace the opportunity to transition slowly from a
> > > > > 32b address space, and to avoid breaking inherent 32b
> > > > > assumptions of some
> > > commands.
> > > > >
> > > > > However, for insert we limited ourselves to 4G-4K, but on
> > > > > verification we allowed the full 4G. This causes some attempts
> > > > > to bind a new buffer to sporadically fail with -ENOSPC, but at
> > > > > other times be
> > > bound successfully.
> > > > >
> > > > > commit 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to
> 4GB
> > > > > - 1
> > > > > page") suggests that there is a genuine problem with stateless
> > > > > addressing that cannot utilize the last page in 4G and so we
> > > > > purposefully
> > > excluded it.
> > > > >
> > > > > Reported-by: CQ Tang 
> > > > > Fixes: 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to 4GB
> > > > > - 1
> > > > > page")
> > > > > Signed-off-by: Chris Wilson 
> > > > > Cc: CQ Tang 
> > > > > Cc: sta...@vger.kernel.org
> > > > > ---
> > > > >  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > index 193996144c84..2ff32daa50bd 100644
> > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > @@ -382,7 +382,7 @@ eb_vma_misplaced(const struct
> > > > > drm_i915_gem_exec_object2 *entry,
> > > > >   return true;
> > > > >
> > > > >   if (!(flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) &&
> > > > > - (vma->node.start + vma->node.size - 1) >> 32)
> > > > > + (vma->node.start + vma->node.size + 4095) >> 32)
> > > >
> > > > Why 4095 not 4096?
> > >
> > > It's the nature of the test that we need an inclusive bound.
> > >
> > > Consider an object of size 4G - 4K, that is allowed to fit within our 32b 
> > > GTT.
> > >
> > >   4G - 4k + 4K = 4G == 1 << 32: => vma misplaced
> > >
> > >   4G - 4k + 4k - 1 = 4G -1 = 0x => vma ok
> >
> > How do we trigger this code?  I ran gem_exec_params@larger-than-life-batch
> > but did not see this code executed.
> > Basically, how do we trigger the first attempt to pin the object in place?
> 
> It's the first pin tried, but the incoming execobj.offset must be available and
> the object itself must be ready to be pinned. That's true for the current tree
> on all current gen.

For the gem_exec_params@larger-than-life-batch subtest, I only see 
i915_vma_misplaced() called when the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag is 
specified, and the test passes.
I want to catch the bug before your fix here. So a 4GB object should be OK, 
because before your fix, i915_vma_misplaced() returns false.
I did specify execobj.offset=0, but the driver code goes to i915_vma_insert() 
directly and returns -ENOSPC.

How do I make the gem_exec_params@larger-than-life-batch code catch this bug?

--CQ


> -Chris


Re: [Intel-gfx] [PATCH] drm/i915: Fix mismatch between misplaced vma check and vma insert

2020-12-15 Thread Tang, CQ



> -Original Message-
> From: Chris Wilson 
> Sent: Tuesday, December 15, 2020 2:02 PM
> To: Tang, CQ ; intel-gfx@lists.freedesktop.org
> Cc: stable@ 
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Fix mismatch between misplaced
> vma check and vma insert
> 
> Quoting Tang, CQ (2020-12-15 21:50:53)
> >
> >
> > > -Original Message-
> > > From: Chris Wilson 
> > > Sent: Tuesday, December 15, 2020 12:31 PM
> > > To: intel-gfx@lists.freedesktop.org
> > > Cc: Chris Wilson ; Tang, CQ
> > > ; sta...@vger.kernel.org
> > > Subject: [PATCH] drm/i915: Fix mismatch between misplaced vma check
> > > and vma insert
> > >
> > > When inserting a VMA, we restrict the placement to the low 4G unless
> > > the caller opts into using the full range. This was done to allow
> > > userspace the opportunity to transition slowly from a 32b address
> > > space, and to avoid breaking inherent 32b assumptions of some
> commands.
> > >
> > > However, for insert we limited ourselves to 4G-4K, but on
> > > verification we allowed the full 4G. This causes some attempts to
> > > bind a new buffer to sporadically fail with -ENOSPC, but at other times be
> bound successfully.
> > >
> > > commit 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1
> > > page") suggests that there is a genuine problem with stateless
> > > addressing that cannot utilize the last page in 4G and so we purposefully
> excluded it.
> > >
> > > Reported-by: CQ Tang 
> > > Fixes: 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1
> > > page")
> > > Signed-off-by: Chris Wilson 
> > > Cc: CQ Tang 
> > > Cc: sta...@vger.kernel.org
> > > ---
> > >  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > index 193996144c84..2ff32daa50bd 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > @@ -382,7 +382,7 @@ eb_vma_misplaced(const struct
> > > drm_i915_gem_exec_object2 *entry,
> > >   return true;
> > >
> > >   if (!(flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) &&
> > > - (vma->node.start + vma->node.size - 1) >> 32)
> > > + (vma->node.start + vma->node.size + 4095) >> 32)
> >
> > Why 4095 not 4096?
> 
> It's the nature of the test that we need an inclusive bound.
> 
> Consider an object of size 4G - 4K, that is allowed to fit within our 32b GTT.
> 
>   4G - 4k + 4K = 4G == 1 << 32: => vma misplaced
> 
>   4G - 4k + 4k - 1 = 4G -1 = 0x => vma ok

How do we trigger this code?  I ran gem_exec_params@larger-than-life-batch but 
did not see this code executed.
Basically, how do we trigger the first attempt to pin the object in place?

--CQ

> 
> -Chris


Re: [Intel-gfx] [PATCH] drm/i915: Fix mismatch between misplaced vma check and vma insert

2020-12-15 Thread Tang, CQ



> -Original Message-
> From: Chris Wilson 
> Sent: Tuesday, December 15, 2020 2:02 PM
> To: Tang, CQ ; intel-gfx@lists.freedesktop.org
> Cc: stable@ 
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Fix mismatch between misplaced
> vma check and vma insert
> 
> Quoting Tang, CQ (2020-12-15 21:50:53)
> >
> >
> > > -Original Message-
> > > From: Chris Wilson 
> > > Sent: Tuesday, December 15, 2020 12:31 PM
> > > To: intel-gfx@lists.freedesktop.org
> > > Cc: Chris Wilson ; Tang, CQ
> > > ; sta...@vger.kernel.org
> > > Subject: [PATCH] drm/i915: Fix mismatch between misplaced vma check
> > > and vma insert
> > >
> > > When inserting a VMA, we restrict the placement to the low 4G unless
> > > the caller opts into using the full range. This was done to allow
> > > usersapce the opportunity to transition slowly from a 32b address
> > > space, and to avoid breaking inherent 32b assumptions of some
> commands.
> > >
> > > However, for insert we limited ourselves to 4G-4K, but on
> > > verification we allowed the full 4G. This causes some attempts to
> > > bind a new buffer to sporadically fail with -ENOSPC, but at other times be
> bound successfully.
> > >
> > > commit 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1
> > > page") suggests that there is a genuine problem with stateless
> > > addressing that cannot utilize the last page in 4G and so we purposefully
> excluded it.
> > >
> > > Reported-by: CQ Tang 
> > > Fixes: 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1
> > > page")
> > > Signed-off-by: Chris Wilson 
> > > Cc: CQ Tang 
> > > Cc: sta...@vger.kernel.org
> > > ---
> > >  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > index 193996144c84..2ff32daa50bd 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > @@ -382,7 +382,7 @@ eb_vma_misplaced(const struct
> > > drm_i915_gem_exec_object2 *entry,
> > >   return true;
> > >
> > >   if (!(flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) &&
> > > - (vma->node.start + vma->node.size - 1) >> 32)
> > > + (vma->node.start + vma->node.size + 4095) >> 32)
> >
> > Why 4095 not 4096?
> 
> It's the nature of the test that we need an inclusive bound.
> 
> Consider an object of size 4G - 4K, that is allowed to fit within our 32b GTT.
> 
>   4G - 4k + 4K = 4G == 1 << 32: => vma misplaced
> 
>   4G - 4k + 4k - 1 = 4G -1 = 0x => vma ok

Yes, the checking in i915_vma_insert() is ">" not ">=":

	if (size > end) {
		DRM_DEBUG("Attempting to bind an object larger than the aperture: request=%llu > %s aperture=%llu\n",
			  size, flags & PIN_MAPPABLE ? "mappable" : "total",
			  end);
		return -ENOSPC;
	}

Reviewed-by: CQ Tang 


> 
> -Chris


Re: [Intel-gfx] [PATCH] drm/i915: Fix mismatch between misplaced vma check and vma insert

2020-12-15 Thread Tang, CQ



> -Original Message-
> From: Chris Wilson 
> Sent: Tuesday, December 15, 2020 12:31 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Chris Wilson ; Tang, CQ ;
> sta...@vger.kernel.org
> Subject: [PATCH] drm/i915: Fix mismatch between misplaced vma check and
> vma insert
> 
> When inserting a VMA, we restrict the placement to the low 4G unless the
> caller opts into using the full range. This was done to allow userspace the
> opportunity to transition slowly from a 32b address space, and to avoid
> breaking inherent 32b assumptions of some commands.
> 
> However, for insert we limited ourselves to 4G-4K, but on verification we
> allowed the full 4G. This causes some attempts to bind a new buffer to
> sporadically fail with -ENOSPC, but at other times be bound successfully.
> 
> commit 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1
> page") suggests that there is a genuine problem with stateless addressing
> that cannot utilize the last page in 4G and so we purposefully excluded it.
> 
> Reported-by: CQ Tang 
> Fixes: 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1
> page")
> Signed-off-by: Chris Wilson 
> Cc: CQ Tang 
> Cc: sta...@vger.kernel.org
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 193996144c84..2ff32daa50bd 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -382,7 +382,7 @@ eb_vma_misplaced(const struct
> drm_i915_gem_exec_object2 *entry,
>   return true;
> 
>   if (!(flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) &&
> - (vma->node.start + vma->node.size - 1) >> 32)
> + (vma->node.start + vma->node.size + 4095) >> 32)

Why 4095 not 4096?

--CQ

>   return true;
> 
>   if (flags & __EXEC_OBJECT_NEEDS_MAP &&
> --
> 2.20.1



Re: [Intel-gfx] [PATCH i-g-t 1/2] i915/gem_exec_params: Assert a 4G object does _not_ fit without 48b

2020-12-15 Thread Tang, CQ



> -Original Message-
> From: Chris Wilson 
> Sent: Tuesday, December 15, 2020 1:07 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: igt-...@lists.freedesktop.org; Chris Wilson ;
> Tang, CQ 
> Subject: [PATCH i-g-t 1/2] i915/gem_exec_params: Assert a 4G object does
> _not_ fit without 48b
> 
> Without opting into 48B addressing, objects are strictly limited to being
> placed only in the first (4G - 4K). This is to avoid an issue with stateless 32b
> addressing being unable to access the last 32b page.
> Assert that we do indeed fail to fit in a 4G object without setting the
> EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag.
> 
> Reported-by: CQ Tang 
> References: 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to 4GB -
> 1 page")
> Signed-off-by: Chris Wilson 
> Cc: CQ Tang 
> ---
>  tests/i915/gem_exec_params.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/i915/gem_exec_params.c b/tests/i915/gem_exec_params.c
> index c405f4eb7..e679c512a 100644
> --- a/tests/i915/gem_exec_params.c
> +++ b/tests/i915/gem_exec_params.c
> @@ -340,7 +340,13 @@ static void test_larger_than_life_batch(int fd)
> for_each_engine(e, fd) {
>  /* Keep the batch_len implicit [0] */
>  execbuf.flags = eb_ring(e);
> -gem_execbuf(fd, &execbuf);
> +
> +/* non-48b objects are limited to the low (4G - 4K) */
> +igt_assert_eq(__gem_execbuf(fd, &execbuf), -ENOSPC);
> +
> +exec.flags = EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
> +igt_assert_eq(__gem_execbuf(fd, &execbuf), 0);
> +exec.flags = 0;

It is good to test both cases.
Reviewed-by: CQ Tang 


> }
> 
> gem_sync(fd, exec.handle);
> --
> 2.29.2



Re: [Intel-gfx] [PATCH] drm/i915/gt: Onion unwind for scratch page allocation failure

2020-10-20 Thread Tang, CQ
Reviewed-by: CQ Tang 

--CQ

> -Original Message-
> From: Intel-gfx  On Behalf Of
> Chris Wilson
> Sent: Monday, October 19, 2020 1:35 AM
> To: intel-gfx@lists.freedesktop.org
> Cc: Auld, Matthew ; Chris Wilson  wilson.co.uk>
> Subject: [Intel-gfx] [PATCH] drm/i915/gt: Onion unwind for scratch page
> allocation failure
> 
> In switching to using objects for our ppGTT scratch pages, care was not taken
> to avoid trying to unref NULL objects on failure. And for gen6 ppGTT, it
> appears we forgot entirely to unwind after a partial allocation failure.
> 
> Fixes: 89351925a477 ("drm/i915/gt: Switch to object allocations for page
> directories")
> Signed-off-by: Chris Wilson 
> Cc: Matthew Auld 
> Cc: Joonas Lahtinen 
> ---
>  drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 18 --
> drivers/gpu/drm/i915/gt/gen8_ppgtt.c |  3 ++-
>  2 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> index c0d17f87b00f..680bd9442eb0 100644
> --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> @@ -239,18 +239,24 @@ static int gen6_ppgtt_init_scratch(struct
> gen6_ppgtt *ppgtt)
>  I915_CACHE_NONE, PTE_READ_ONLY);
> 
>   vm->scratch[1] = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
> - if (IS_ERR(vm->scratch[1]))
> - return PTR_ERR(vm->scratch[1]);
> + if (IS_ERR(vm->scratch[1])) {
> + ret = PTR_ERR(vm->scratch[1]);
> + goto err_scratch0;
> + }
> 
>   ret = pin_pt_dma(vm, vm->scratch[1]);
> - if (ret) {
> - i915_gem_object_put(vm->scratch[1]);
> - return ret;
> - }
> + if (ret)
> + goto err_scratch1;
> 
>   fill32_px(vm->scratch[1], vm->scratch[0]->encode);
> 
>   return 0;
> +
> +err_scratch1:
> + i915_gem_object_put(vm->scratch[1]);
> +err_scratch0:
> + i915_gem_object_put(vm->scratch[0]);
> + return ret;
>  }
> 
>  static void gen6_ppgtt_free_pd(struct gen6_ppgtt *ppgtt) diff --git
> a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index b236aa046f91..a37c968ef8f7 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -607,7 +607,8 @@ static int gen8_init_scratch(struct i915_address_space
> *vm)
>   return 0;
> 
>  free_scratch:
> - free_scratch(vm);
> + while (i--)
> + i915_gem_object_put(vm->scratch[i]);
>   return -ENOMEM;
>  }
> 
> --
> 2.20.1
> 


Re: [Intel-gfx] [PATCH] drm/i915/gt: Onion unwind for scratch page allocation failure

2020-10-19 Thread Tang, CQ
I fixed this problem partially (not gen6 code) with patch:
http://intel-gfx-pw.fi.intel.com/series/5911/

Should I continue the fix on the internal tree?

--CQ


> -Original Message-
> From: Intel-gfx  On Behalf Of
> Chris Wilson
> Sent: Monday, October 19, 2020 1:35 AM
> To: intel-gfx@lists.freedesktop.org
> Cc: Auld, Matthew ; Chris Wilson  wilson.co.uk>
> Subject: [Intel-gfx] [PATCH] drm/i915/gt: Onion unwind for scratch page
> allocation failure
> 
> In switching to using objects for our ppGTT scratch pages, care was not taken
> to avoid trying to unref NULL objects on failure. And for gen6 ppGTT, it
> appears we forgot entirely to unwind after a partial allocation failure.
> 
> Fixes: 89351925a477 ("drm/i915/gt: Switch to object allocations for page
> directories")
> Signed-off-by: Chris Wilson 
> Cc: Matthew Auld 
> Cc: Joonas Lahtinen 
> ---
>  drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 18 --
> drivers/gpu/drm/i915/gt/gen8_ppgtt.c |  3 ++-
>  2 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> index c0d17f87b00f..680bd9442eb0 100644
> --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
> @@ -239,18 +239,24 @@ static int gen6_ppgtt_init_scratch(struct
> gen6_ppgtt *ppgtt)
>  I915_CACHE_NONE, PTE_READ_ONLY);
> 
>   vm->scratch[1] = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
> - if (IS_ERR(vm->scratch[1]))
> - return PTR_ERR(vm->scratch[1]);
> + if (IS_ERR(vm->scratch[1])) {
> + ret = PTR_ERR(vm->scratch[1]);
> + goto err_scratch0;
> + }
> 
>   ret = pin_pt_dma(vm, vm->scratch[1]);
> - if (ret) {
> - i915_gem_object_put(vm->scratch[1]);
> - return ret;
> - }
> + if (ret)
> + goto err_scratch1;
> 
>   fill32_px(vm->scratch[1], vm->scratch[0]->encode);
> 
>   return 0;
> +
> +err_scratch1:
> + i915_gem_object_put(vm->scratch[1]);
> +err_scratch0:
> + i915_gem_object_put(vm->scratch[0]);
> + return ret;
>  }
> 
>  static void gen6_ppgtt_free_pd(struct gen6_ppgtt *ppgtt) diff --git
> a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index b236aa046f91..a37c968ef8f7 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -607,7 +607,8 @@ static int gen8_init_scratch(struct i915_address_space
> *vm)
>   return 0;
> 
>  free_scratch:
> - free_scratch(vm);
> + while (i--)
> + i915_gem_object_put(vm->scratch[i]);
>   return -ENOMEM;
>  }
> 
> --
> 2.20.1
> 


Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority

2020-10-15 Thread Tang, CQ



> -Original Message-
> From: Chris Wilson 
> Sent: Thursday, October 15, 2020 1:33 PM
> To: Tang, CQ ; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue
> high priority
> > > We can also prune the free_list immediately, if we know we are
> > > outside of any critical section. (We do this before create ioctls,
> > > and I thought upon close(device), but I see that's just contexts.)
> > >
> > > > The worker launching time is delayed a lot; we call queue_work()
> > > > when we add the first object onto the empty 'free_list', but when the
> > > > worker is launched, the 'free_list' has sometimes accumulated 1M objects.
> > > > Maybe it is because of waiting for the currently running worker to finish?
> > >
> > > 1M is a lot more than is comfortable, and that's even with a
> > > high-priority worker.  The problem with objects being freed from any
> > > context is that we can't simply put a flush_work around there. (Not
> > > without ridding ourselves of a few mutexes at least.) We could try
> > > more than one worker, but it's no more effort to starve 2 cpus than it is
> > > to starve 1.
> > >
> > > No, with that much pressure the only option is to apply the
> > > backpressure at the point of allocation ala create_ioctl. i.e. find
> > > the hog, and look to see if there's a convenient spot before/after
> > > to call i915_gem_flush_free_objects(). Since you highlight the
> > > vma-stash as the likely culprit, and the free_pt_stash is unlikely
> > > to be inside any critical section, might as well try flushing from there
> > > for starters.
> >
> > I have not yet tested, but I guess calling i915_gem_flush_free_objects()
> > inside free_pt_stash() will solve the problem that gem_exec_gttfill has,
> > because it will give some back pressure on the system traffic.
> 
> Still I'm slightly concerned that so many PD objects are being created; it's
> not something that shows up in the smem ppgtt tests (or at least it's been
> dwarfed by other bottlenecks), and the set of vma (and so the
> PD) are meant to reach a steady state. You would need to be using a
> constant set of objects and recycling the vma, not to hit the create_ioctl 
> flush.
> However, it points back to the pressure point being around the vma bind.
> 
> > But this is only for the page table 4K lmem objects allocated/freed by
> > vma-stash. We might encounter the same situation with user space allocated
> > objects.
> 
> See gem_exec_create, it's mission is to cause memory starvation by creating
> as many new objects as it can and releasing them after a nop batch. That's
> why we have the freelist flush from create_ioctl.
> 
> Now I need to add a pass that tries to create as many vma from a few objects
> as is possible.
> 
> (And similarly why we try to free requests as they are created.)
> 
> One problem is that they will catch the client after the hog, not necessarily
> the hog themselves.
> 
> I'm optimistic we can make freeing the object atomic, even if that means
> pushing the pages onto some reclaim list. (Which is currently a really nasty
> drawback of the free worker, a trick lost with the removal of
> struct_mutex.)

On a DG1 system with 4GB lmem and 7.7GB system memory, I can easily hit a 
system OOM when running "gem_exec_gttfill --r all".

When I call i915_gem_flush_free_objects() inside i915_vm_free_pt_stash(), the 
test code passes without any problem.

With the simple "gem_exec_gttfill --r all" test, I am also concerned about why 
the driver creates/frees so many vma-stash objects; when the system OOM 
happened, a total of 1.5M stash objects had been freed.

--CQ

> -Chris


Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority

2020-10-15 Thread Tang, CQ



> -Original Message-
> From: Chris Wilson 
> Sent: Thursday, October 15, 2020 8:07 AM
> To: Tang, CQ ; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue
> high priority
> 
> Quoting Tang, CQ (2020-10-14 00:29:13)
> > i915_gem_free_object() is called by multiple threads/processes, they all
> add objects onto the same free_list. The free_list processing worker thread
> becomes bottle-neck. I see that the worker is mostly a single thread (with
> particular thread ID), but sometimes multiple threads are launched to
> process the 'free_list' work concurrently. But the processing speed is still
> slower than the multiple process's feeding speed, and 'free_list' is holding
> more and more memory.
> 
> We can also prune the free_list immediately, if we know we are outside of
> any critical section. (We do this before create ioctls, and I thought upon
> close(device), but I see that's just contexts.)
> 
> > The worker launching time is delayed a lot; we call queue_work() when we
> > add the first object onto the empty 'free_list', but when the worker is
> > launched, the 'free_list' has sometimes accumulated 1M objects. Maybe it is
> > because of waiting for the currently running worker to finish?
> 
> 1M is a lot more than is comfortable, and that's even with a high-priority
> worker.  The problem with objects being freed from any context is that we
> can't simply put a flush_work around there. (Not without ridding ourselves of
> a few mutexes at least.) We could try more than one worker, but it's no
> more effort to starve 2 cpus than it is to starve 1.
> 
> No, with that much pressure the only option is to apply the backpressure at
> the point of allocation ala create_ioctl, i.e. find the hog, and look to see
> if there's a convenient spot before/after to call
> i915_gem_flush_free_objects(). Since you highlight the vma-stash as the
> likely culprit, and the free_pt_stash is unlikely to be inside any critical
> section, might as well try flushing from there for starters.

I have not yet tested it, but I expect that calling i915_gem_flush_free_objects() 
inside free_pt_stash() will solve the problem gem_exec_gttfill hits, because it 
applies some backpressure on the allocation traffic.

But this only covers the 4K lmem page-table objects allocated/freed by the 
vma-stash. We might encounter the same situation with userspace-allocated 
objects.

--CQ

> 
> Hmm, actually we are tantalizing close to having dropped all mutexes (and
> similar global lock-like effects) from free_objects. That would be a nice
> victory.
> -Chris


Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority

2020-10-13 Thread Tang, CQ
i915_gem_free_object() is called by multiple threads/processes, and they all add 
objects onto the same free_list. The free_list processing worker thread becomes 
the bottleneck. I see that the worker is mostly a single thread (with a 
particular thread ID), but sometimes multiple threads are launched to process 
the 'free_list' work concurrently. Still, the processing speed is slower than 
the rate at which the multiple processes feed the list, so 'free_list' holds 
more and more memory.

The worker launch is delayed a lot: we call queue_work() when we add the first 
object onto the empty 'free_list', but by the time the worker runs, the 
'free_list' has sometimes accumulated 1M objects. Maybe it is waiting for the 
currently running worker to finish?

This happens with a direct call to __i915_gem_free_object_rcu() and no 
cond_resched().

--CQ

> -Original Message-
> From: Intel-gfx  On Behalf Of
> Tang, CQ
> Sent: Tuesday, October 13, 2020 9:41 AM
> To: Chris Wilson ; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue
> high priority
> 
> 
> 
> > -Original Message-
> > From: Chris Wilson 
> > Sent: Tuesday, October 13, 2020 9:25 AM
> > To: Tang, CQ ; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim
> > workqueue high priority
> >
> > Quoting Tang, CQ (2020-10-13 17:19:27)
> > > Chris,
> > > I tested this patch. It is still not enough, I keep catch running out
> > > of lmem. Every worker invocation takes larger and larger freeing
> > > object count.
> > >
> >
> > Was that with the immediate call (not via call_rcu) to
> > __i915_gem_free_object_rcu?
> >
> > If this brings the freelist under control, the next item is judicious
> > use of cond_synchronize_rcu(). We just have to make sure we penalize
> > the right hog.
> >
> > Otherwise, we have to shotgun apply i915_gem_flush_free_objects() and
> > still find somewhere to put the rcu sync.
> 
> This is with call_rcu().
> 
> Then I removed cond_resched(), it does not help, and then I call
> __i915_gem_free_object_rcu() directly, still the same error, However, I
> noticed that sometimes 'queue_work()' return false, which means the work
> is already queued, how? The worker had been called so 'free_list' is empty:
> 
> [  117.381888] queue_work: 107967, 107930; 1
> [  119.180230] queue_work: 125531, 125513; 1
> [  121.349308] queue_work: 155017, 154996; 1
> [  124.214885] queue_work: 193918, 193873; 1
> [  127.967260] queue_work: 256838, 256776; 1
> [  133.281045] queue_work: 345753, 345734; 1
> [  141.457995] queue_work: 516943, 516859; 1
> [  156.264420] queue_work: 863622, 863516; 1
> [  156.322619] queue_work: 865849, 3163; 0
> [  156.448551] queue_work: 865578, 7141; 0
> [  156.882985] queue_work: 866984, 24138; 0
> [  157.952163] queue_work: 862902, 53365; 0
> [  159.838412] queue_work: 842522, 95504; 0
> [  174.321508] queue_work: 937179, 657323; 0
> 
> --CQ
> 
> > -Chris


Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority

2020-10-13 Thread Tang, CQ



> -Original Message-
> From: Chris Wilson 
> Sent: Tuesday, October 13, 2020 9:25 AM
> To: Tang, CQ ; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue
> high priority
> 
> Quoting Tang, CQ (2020-10-13 17:19:27)
> > Chris,
> > I tested this patch. It is still not enough, I keep catch running out
> > of lmem. Every worker invocation takes larger and larger freeing object
> > count.
> >
> 
> Was that with the immediate call (not via call_rcu) to
> __i915_gem_free_object_rcu?
> 
> If this brings the freelist under control, the next item is judicious use of
> cond_synchronize_rcu(). We just have to make sure we penalize the right
> hog.
> 
> Otherwise, we have to shotgun apply i915_gem_flush_free_objects() and
> still find somewhere to put the rcu sync.

This is with call_rcu().

Then I removed cond_resched(); it did not help. I then called 
__i915_gem_free_object_rcu() directly and still hit the same error.
However, I noticed that 'queue_work()' sometimes returns false, which means the 
work is already queued. How? The worker had already been called, so 'free_list' 
should be empty:

[  117.381888] queue_work: 107967, 107930; 1
[  119.180230] queue_work: 125531, 125513; 1
[  121.349308] queue_work: 155017, 154996; 1
[  124.214885] queue_work: 193918, 193873; 1
[  127.967260] queue_work: 256838, 256776; 1
[  133.281045] queue_work: 345753, 345734; 1
[  141.457995] queue_work: 516943, 516859; 1
[  156.264420] queue_work: 863622, 863516; 1
[  156.322619] queue_work: 865849, 3163; 0
[  156.448551] queue_work: 865578, 7141; 0
[  156.882985] queue_work: 866984, 24138; 0
[  157.952163] queue_work: 862902, 53365; 0
[  159.838412] queue_work: 842522, 95504; 0
[  174.321508] queue_work: 937179, 657323; 0

--CQ

> -Chris


Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority

2020-10-13 Thread Tang, CQ
Chris,
I tested this patch. It is still not enough; I keep running out of lmem. Every 
worker invocation takes a larger and larger count of objects to free.

Here is my debugging code:

+static int obj_count = 0;
+
..
+   if (llist_add(&obj->freed, &i915->mm.free_list)) {
+   bool b;
+   b = queue_work(i915->wq, &i915->mm.free_work);
+   pr_err("queue_work: %d, %d; %d\n",
+  atomic_read(&i915->mm.free_count), obj_count, b);
+   obj_count = 1;
+   } else {
+   obj_count++;
+   }

And here  is the output:

[  821.213680] queue_work: 108068, 105764; 1
[  823.309468] queue_work: 148843, 147948; 1
[  826.453132] queue_work: 22, 218123; 1
[  831.522506] queue_work: 334812, 333773; 1
[  840.040571] queue_work: 539650, 538922; 1
[  860.337644] queue_work: 960811, 1017158; 1

The second number, 'obj_count', is the number of objects the last worker 
invocation took to free.

--CQ


> -Original Message-
> From: Chris Wilson 
> Sent: Tuesday, October 13, 2020 3:33 AM
> To: intel-gfx@lists.freedesktop.org
> Cc: Chris Wilson ; Tang, CQ 
> Subject: [PATCH] drm/i915: Make the GEM reclaim workqueue high priority
> 
> Since removing dev->struct_mutex usage, we only use i915->wq for batch
> freeing of GEM objects and ppGTT, it is essential for memory reclaim. If we
> let the workqueue dawdle, we trap excess amounts of memory, so give it a
> priority boost. Although since we no longer depend on a singular mutex, we
> could run unbounded, but first lets try to keep some constraint upon the
> worker.
> 
> Signed-off-by: Chris Wilson 
> Cc: CQ Tang 
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 16 +++-
>  1 file changed, 3 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c
> b/drivers/gpu/drm/i915/i915_drv.c index 8bb7e2dcfaaa..8c9198f0d2ad
> 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -219,20 +219,10 @@ intel_teardown_mchbar(struct drm_i915_private
> *dev_priv)  static int i915_workqueues_init(struct drm_i915_private
> *dev_priv)  {
>   /*
> -  * The i915 workqueue is primarily used for batched retirement of
> -  * requests (and thus managing bo) once the task has been
> completed
> -  * by the GPU. i915_retire_requests() is called directly when we
> -  * need high-priority retirement, such as waiting for an explicit
> -  * bo.
> -  *
> -  * It is also used for periodic low-priority events, such as
> -  * idle-timers and recording error state.
> -  *
> -  * All tasks on the workqueue are expected to acquire the dev mutex
> -  * so there is no point in running more than one instance of the
> -  * workqueue at any time.  Use an ordered one.
> +  * The i915 workqueue is primarily used for batched freeing of
> +  * GEM objects and ppGTT, and is essential for memory reclaim.
>*/
> - dev_priv->wq = alloc_ordered_workqueue("i915", 0);
> + dev_priv->wq = alloc_ordered_workqueue("i915", WQ_HIGHPRI);
>   if (dev_priv->wq == NULL)
>   goto out_err;
> 
> --
> 2.20.1



Re: [Intel-gfx] [RFC 53/60] drm/i915: Create stolen memory region from local memory

2020-08-07 Thread Tang, CQ



> -Original Message-
> From: Joonas Lahtinen 
> Sent: Friday, August 7, 2020 2:39 AM
> To: Tang, CQ ; Dave Airlie 
> Cc: Auld, Matthew ; Intel Graphics Development
> ; Abdiel Janulgue
> ; Wilson, Chris P
> ; Balestrieri, Francesco
> ; Vishwanathapura, Niranjana
> ; Dhanalakota, Venkata S
> ; Neel Desai ;
> Brost, Matthew ; Dutt, Sudeep
> ; De Marchi, Lucas 
> Subject: Re: [RFC 53/60] drm/i915: Create stolen memory region from local
> memory
> 
> Quoting Dave Airlie (2020-07-14 22:26:16)
> > On Wed, 15 Jul 2020 at 02:57, Tang, CQ  wrote:
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Auld, Matthew 
> > > > Sent: Tuesday, July 14, 2020 8:02 AM
> > > > To: Dave Airlie 
> > > > Cc: Intel Graphics Development ;
> > > > Tang, CQ ; Joonas Lahtinen
> > > > ;
> > > > Abdiel Janulgue ; Wilson, Chris P
> > > > ; Balestrieri, Francesco
> > > > ; Vishwanathapura, Niranjana
> > > > ; Dhanalakota, Venkata S
> > > > ; Neel Desai
> > > > ; Brost, Matthew ;
> > > > Dutt, Sudeep ; De Marchi, Lucas
> > > > 
> > > > Subject: Re: [RFC 53/60] drm/i915: Create stolen memory region
> > > > from local memory
> > > >
> > > > On 13/07/2020 05:48, Dave Airlie wrote:
> > > > > On Fri, 10 Jul 2020 at 22:01, Matthew Auld
> > > > > 
> > > > wrote:
> > > > >>
> > > > >> From: CQ Tang 
> > > > >>
> > > > >> Add "REGION_STOLEN" device info to dg1, create stolen memory
> > > > >> region from upper portion of local device memory, starting from
> DSMBASE.
> > > > >>
> > > > >> The memory region is marked with "is_devmem=true".
> > > > >
> > > > > So is stolen fake on LMEM devices? The concept of stolen doesn't
> > > > > seem to make much sense with VRAM, so please enlighten me.
> > > >
> > > > CQ, do we actually need an explicit stolen LMEM region? The idea
> > > > of having a DSM like stolen region for LMEM does sound
> > > > strange(outside of the usual reserved portions which are for HW
> > > > use etc), since the driver has complete control over LMEM. Is it
> > > > just a convenience thing to keep things working as-is for fbc, initial 
> > > > fb,
> etc. or is there more to it?
> > > > There is buddy_alloc_range() for LMEM which we could potentially
> > > > use to wrap an object around for things like the initial fb or similar.
> > >
> > > This is a natural extension from IGT stolen memory region into DGT, we
> want to allocate objects from stolen area. In DGT, we have one stolen area
> per tile so we create one region in each of these area. Using memory region
> is easier to manage objects allocation and free. Other than fbc and rc6, we
> have gt/ring allocate stolen memory objects when without LMEM, so only
> apply to IGT case:
> > >
> > > display/intel_display.c:obj =
> i915_gem_object_create_stolen_for_preallocated(dev_priv,
> > > display/intel_fbdev.c:  obj =
> i915_gem_object_create_stolen(dev_priv, size);
> > > display/intel_overlay.c:obj = i915_gem_object_create_stolen(i915,
> PAGE_SIZE);
> > > intel_rc6.c:pctx =
> i915_gem_object_create_stolen_for_preallocated(i915,
> > > intel_rc6.c:pctx = i915_gem_object_create_stolen(i915, pctx_size);
> > >
> > > intel_ring.c:   obj = 
> > > intel_gt_object_create_stolen(ggtt->vm.gt,
> size);
> > > intel_gt.c: obj = intel_gt_object_create_stolen(gt, size);
> > >
> > > For some reason, we don't use buddy allocator to manage the stolen
> memory, instead, we use drm_mm_node allocator directly, we have one-to-
> one mapping between drm_mm address space to dma address of the stolen
> memory. We also use contiguous allocation where an object always get a
> single contiguous block of pages.
> > >
> > > So fundamentally, we want to use the same code to work on both IGT
> stolen memory and DGT stolen memory.
> >
> > If this is fundamentally a software construct then it's horrible, if
> > the HW has a stolen base like Ville said, and it needs to be in a
> > chunk of VRAM, how do you go about sizing that, and carving it out
> > from the user?
> >
> > I don't think wanting to share the same codepaths here

Re: [Intel-gfx] [PATCH 1/3] drm: Restore driver.preclose() for all to use

2020-07-29 Thread Tang, CQ



> -Original Message-
> From: Chris Wilson 
> Sent: Tuesday, July 28, 2020 9:28 AM
> To: Daniel Vetter ; Dave Airlie 
> Cc: intel-gfx ; stable
> ; Gustavo Padovan
> ; Tang, CQ ; dri-
> devel ; Vetter, Daniel
> 
> Subject: Re: [PATCH 1/3] drm: Restore driver.preclose() for all to use
> 
> Quoting Daniel Vetter (2020-07-27 20:32:45)
> > On Thu, Jul 23, 2020 at 7:21 PM Chris Wilson 
> wrote:
> > >
> > > An unfortunate sequence of events, but it turns out there is a valid
> > > usecase for being able to free/decouple the driver objects before
> > > they are freed by the DRM core. In particular, if we have a pointer
> > > into a drm core object from inside a driver object, that pointer
> > > needs to be nerfed *before* it is freed so that concurrent access
> > > (e.g. debugfs) does not following the dangling pointer.
> > >
> > > The legacy marker was adding in the code movement from drp_fops.c to
> > > drm_file.c
> >
> > I might fumble a lot, but not this one:
> >
> > commit 45c3d213a400c952ab7119f394c5293bb6877e6b
> > Author: Daniel Vetter 
> > Date:   Mon May 8 10:26:33 2017 +0200
> >
> > drm: Nerf the preclose callback for modern drivers
> 
> Gah, when I going through the history it looked like it appeared out of
> nowhere.
> 
> > Also looking at the debugfs hook that has some rather adventurous
> > stuff going on I think, feels a bit like a kitchensink with batteries
> > included. If that's really all needed I'd say iterate the contexts by
> > first going over files, then the ctx (which arent shared anyway) and
> > the problem should also be gone.
> 
> Or we could cut out the middlelayer and put the release under the driver
> control with a call to the drm_release() when the driver is ready.

Chris, can you explain this idea, or post a patch?

--CQ

> -Chris


Re: [Intel-gfx] [PATCH 1/3] drm: Restore driver.preclose() for all to use

2020-07-27 Thread Tang, CQ



> -Original Message-
> From: Daniel Vetter 
> Sent: Monday, July 27, 2020 12:33 PM
> To: Chris Wilson ; Dave Airlie 
> Cc: intel-gfx ; stable
> ; Gustavo Padovan
> ; Tang, CQ ; dri-
> devel ; Vetter, Daniel
> 
> Subject: Re: [PATCH 1/3] drm: Restore driver.preclose() for all to use
> 
> On Thu, Jul 23, 2020 at 7:21 PM Chris Wilson 
> wrote:
> >
> > An unfortunate sequence of events, but it turns out there is a valid
> > usecase for being able to free/decouple the driver objects before they
> > are freed by the DRM core. In particular, if we have a pointer into a
> > drm core object from inside a driver object, that pointer needs to be
> > nerfed *before* it is freed so that concurrent access (e.g. debugfs)
> > does not following the dangling pointer.
> >
> > The legacy marker was adding in the code movement from drp_fops.c to
> > drm_file.c
> 
> I might fumble a lot, but not this one:
> 
> commit 45c3d213a400c952ab7119f394c5293bb6877e6b
> Author: Daniel Vetter 
> Date:   Mon May 8 10:26:33 2017 +0200
> 
> drm: Nerf the preclose callback for modern drivers
> 
> Also looking at the debugfs hook that has some rather adventurous stuff
> going on I think, feels a bit like a kitchensink with batteries included. If 
> that's
> really all needed I'd say iterate the contexts by first going over files, 
> then the
> ctx (which arent shared anyway) and the problem should also be gone.

Debugfs code can jump in after drm_gem_release() (where file->object_idr is 
destroyed) but before postclose(). In this window, everything is fine for 
debugfs context access except file->object_idr.

--CQ

> -Daniel
> 
> > References: 9acdac68bcdc ("drm: rename drm_fops.c to drm_file.c")
> > Signed-off-by: Chris Wilson 
> > Cc: Daniel Vetter 
> > Cc: Gustavo Padovan 
> > Cc: CQ Tang 
> > Cc:  # v4.12+
> > ---
> >  drivers/gpu/drm/drm_file.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> > index 0ac4566ae3f4..7b4258d6f7cc 100644
> > --- a/drivers/gpu/drm/drm_file.c
> > +++ b/drivers/gpu/drm/drm_file.c
> > @@ -258,8 +258,7 @@ void drm_file_free(struct drm_file *file)
> >   (long)old_encode_dev(file->minor->kdev->devt),
> >   atomic_read(&dev->open_count));
> >
> > -   if (drm_core_check_feature(dev, DRIVER_LEGACY) &&
> > -   dev->driver->preclose)
> > +   if (dev->driver->preclose)
> > dev->driver->preclose(dev, file);
> >
> > if (drm_core_check_feature(dev, DRIVER_LEGACY))
> > --
> > 2.20.1
> >
> > ___
> > dri-devel mailing list
> > dri-de...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
> 
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 2/3] drm/i915/gem: Move context decoupling from postclose to preclose

2020-07-23 Thread Tang, CQ



> -Original Message-
> From: Chris Wilson 
> Sent: Thursday, July 23, 2020 10:21 AM
> To: intel-gfx@lists.freedesktop.org
> Cc: dri-de...@lists.freedesktop.org; Chris Wilson ;
> Tang, CQ ; Vetter, Daniel ;
> sta...@vger.kernel.org
> Subject: [PATCH 2/3] drm/i915/gem: Move context decoupling from
> postclose to preclose
> 
> Since the GEM contexts refer to other GEM state, we need to nerf those
> pointers before that state is freed during drm_gem_release(). We need to
> move i915_gem_context_close() from the postclose callback to the preclose.
> 
> In particular, debugfs likes to peek into the GEM contexts, and from there
> peek at the drm core objects. If the context is closed during the peeking, we
> may attempt to dereference a stale core object.
> 
> Signed-off-by: Chris Wilson 
> Cc: CQ Tang 
> Cc: Daniel Vetter 
> Cc: sta...@vger.kernel.org
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c
> b/drivers/gpu/drm/i915/i915_drv.c index 5fd5af4bc855..15242a8c70f7 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1114,11 +1114,15 @@ static void i915_driver_lastclose(struct
> drm_device *dev)
>   vga_switcheroo_process_delayed_switch();
>  }
> 
> +static void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
> +{
> + i915_gem_context_close(file);
> +}
> +
>  static void i915_driver_postclose(struct drm_device *dev, struct drm_file *file)
>  {
>   struct drm_i915_file_private *file_priv = file->driver_priv;
> 
> - i915_gem_context_close(file);
>   i915_gem_release(dev, file);

Now that we separate i915_gem_context_close() from i915_gem_release() and the 
other freeing code in postclose(), is there any side effect from allowing code 
to run in between?
Can we move all of the postclose() code into preclose()?

--CQ

> 
>   kfree_rcu(file_priv, rcu);
> @@ -1850,6 +1854,7 @@ static struct drm_driver driver = {
>   .release = i915_driver_release,
>   .open = i915_driver_open,
>   .lastclose = i915_driver_lastclose,
> + .preclose  = i915_driver_preclose,
>   .postclose = i915_driver_postclose,
> 
>   .gem_close_object = i915_gem_close_object,
> --
> 2.20.1



Re: [Intel-gfx] [RFC 53/60] drm/i915: Create stolen memory region from local memory

2020-07-14 Thread Tang, CQ



> -Original Message-
> From: Auld, Matthew 
> Sent: Tuesday, July 14, 2020 8:02 AM
> To: Dave Airlie 
> Cc: Intel Graphics Development ; Tang, CQ
> ; Joonas Lahtinen ;
> Abdiel Janulgue ; Wilson, Chris P
> ; Balestrieri, Francesco
> ; Vishwanathapura, Niranjana
> ; Dhanalakota, Venkata S
> ; Neel Desai ;
> Brost, Matthew ; Dutt, Sudeep
> ; De Marchi, Lucas 
> Subject: Re: [RFC 53/60] drm/i915: Create stolen memory region from local
> memory
> 
> On 13/07/2020 05:48, Dave Airlie wrote:
> > On Fri, 10 Jul 2020 at 22:01, Matthew Auld 
> wrote:
> >>
> >> From: CQ Tang 
> >>
> >> Add "REGION_STOLEN" device info to dg1, create stolen memory region
> >> from upper portion of local device memory, starting from DSMBASE.
> >>
> >> The memory region is marked with "is_devmem=true".
> >
> > So is stolen fake on LMEM devices? The concept of stolen doesn't seem
> > to make much sense with VRAM, so please enlighten me.
> 
> CQ, do we actually need an explicit stolen LMEM region? The idea of having a
> DSM like stolen region for LMEM does sound strange(outside of the usual
> reserved portions which are for HW use etc), since the driver has complete
> control over LMEM. Is it just a convenience thing to keep things working as-is
> for fbc, initial fb, etc. or is there more to it?
> There is buddy_alloc_range() for LMEM which we could potentially use to
> wrap an object around for things like the initial fb or similar.

This is a natural extension from the IGT stolen memory region into DGT: we want 
to allocate objects from the stolen area. In DGT, we have one stolen area per 
tile, so we create one region in each of these areas. Using a memory region 
makes it easier to manage object allocation and free. Other than fbc and rc6, we 
have gt/ring allocate stolen memory objects when without LMEM, so those only 
apply to the IGT case:

display/intel_display.c:obj = i915_gem_object_create_stolen_for_preallocated(dev_priv,
display/intel_fbdev.c:  obj = i915_gem_object_create_stolen(dev_priv, size);
display/intel_overlay.c:obj = i915_gem_object_create_stolen(i915, PAGE_SIZE);
intel_rc6.c:pctx = i915_gem_object_create_stolen_for_preallocated(i915,
intel_rc6.c:pctx = i915_gem_object_create_stolen(i915, pctx_size);

intel_ring.c:   obj = intel_gt_object_create_stolen(ggtt->vm.gt, size);
intel_gt.c: obj = intel_gt_object_create_stolen(gt, size);

For some reason, we don't use the buddy allocator to manage the stolen memory; 
instead, we use the drm_mm_node allocator directly. We have a one-to-one mapping 
between the drm_mm address space and the dma address of the stolen memory. We 
also use contiguous allocation, where an object always gets a single contiguous 
block of pages.

So fundamentally, we want to use the same code to work on both IGT stolen 
memory and DGT stolen memory.

--CQ


> 
> >
> > Dave.
> >


Re: [Intel-gfx] [PATCH 1/3] drm/i915: make stolen more region centric

2020-01-14 Thread Tang, CQ
Reviewed-by: CQ Tang 

> -Original Message-
> From: Auld, Matthew 
> Sent: Friday, January 3, 2020 12:00 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Tang, CQ ; Chris Wilson 
> Subject: [PATCH 1/3] drm/i915: make stolen more region centric
> 
> From: CQ Tang 
> 
> Signed-off-by: CQ Tang 
> Signed-off-by: Matthew Auld 
> Cc: Chris Wilson 
> ---
>  drivers/gpu/drm/i915/display/intel_fbc.c  |  20 +--
>  drivers/gpu/drm/i915/display/intel_fbdev.c|   4 +-
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |   7 +-
>  drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 128 --
>  drivers/gpu/drm/i915/gem/i915_gem_stolen.h|   7 +-
>  drivers/gpu/drm/i915/gt/intel_rc6.c   |   4 +-
>  drivers/gpu/drm/i915/gt/intel_ring.c  |   2 +-
>  drivers/gpu/drm/i915/i915_debugfs.c   |   4 +-
>  drivers/gpu/drm/i915/i915_drv.h   |   6 -
>  drivers/gpu/drm/i915/intel_memory_region.h|   3 +
>  10 files changed, 90 insertions(+), 95 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c
> b/drivers/gpu/drm/i915/display/intel_fbc.c
> index a1048ece541e..3c4e70da717c 100644
> --- a/drivers/gpu/drm/i915/display/intel_fbc.c
> +++ b/drivers/gpu/drm/i915/display/intel_fbc.c
> @@ -420,6 +420,7 @@ static int find_compression_threshold(struct
> drm_i915_private *dev_priv,
> unsigned int size,
> unsigned int fb_cpp)
>  {
> + struct intel_memory_region *mem =
> +dev_priv->mm.regions[INTEL_REGION_STOLEN];
>   int compression_threshold = 1;
>   int ret;
>   u64 end;
> @@ -441,7 +442,7 @@ static int find_compression_threshold(struct
> drm_i915_private *dev_priv,
>*/
> 
>   /* Try to over-allocate to reduce reallocations and fragmentation. */
> - ret = i915_gem_stolen_insert_node_in_range(dev_priv, node, size
> <<= 1,
> + ret = i915_gem_stolen_insert_node_in_range(mem, node, size <<=
> 1,
>  4096, 0, end);
>   if (ret == 0)
>   return compression_threshold;
> @@ -452,7 +453,7 @@ static int find_compression_threshold(struct
> drm_i915_private *dev_priv,
>   (fb_cpp == 2 && compression_threshold == 2))
>   return 0;
> 
> - ret = i915_gem_stolen_insert_node_in_range(dev_priv, node,
> size >>= 1,
> + ret = i915_gem_stolen_insert_node_in_range(mem, node, size >>=
> 1,
>  4096, 0, end);
>   if (ret && INTEL_GEN(dev_priv) <= 4) {
>   return 0;
> @@ -467,6 +468,7 @@ static int find_compression_threshold(struct
> drm_i915_private *dev_priv,  static int intel_fbc_alloc_cfb(struct
> drm_i915_private *dev_priv,
>  unsigned int size, unsigned int fb_cpp)  {
> + struct intel_memory_region *mem =
> +dev_priv->mm.regions[INTEL_REGION_STOLEN];
>   struct intel_fbc *fbc = &dev_priv->fbc;
>   struct drm_mm_node *uninitialized_var(compressed_llb);
>   int ret;
> @@ -493,7 +495,7 @@ static int intel_fbc_alloc_cfb(struct drm_i915_private
> *dev_priv,
>   if (!compressed_llb)
>   goto err_fb;
> 
> - ret = i915_gem_stolen_insert_node(dev_priv,
> compressed_llb,
> + ret = i915_gem_stolen_insert_node(mem, compressed_llb,
> 4096, 4096);
>   if (ret)
>   goto err_fb;
> @@ -519,22 +521,23 @@ static int intel_fbc_alloc_cfb(struct
> drm_i915_private *dev_priv,
> 
>  err_fb:
>   kfree(compressed_llb);
> - i915_gem_stolen_remove_node(dev_priv, &fbc->compressed_fb);
> + i915_gem_stolen_remove_node(mem, &fbc->compressed_fb);
>  err_llb:
> - if (drm_mm_initialized(&dev_priv->mm.stolen))
> + if (drm_mm_initialized(&mem->stolen))
>   pr_info_once("drm: not enough stolen space for
> compressed buffer (need %d more bytes), disabling. Hint: you may be able
> to increase stolen memory size in the BIOS to avoid this.\n", size);
>   return -ENOSPC;
>  }
> 
>  static void __intel_fbc_cleanup_cfb(struct drm_i915_private *dev_priv)  {
> + struct intel_memory_region *mem =
> +dev_priv->mm.regions[INTEL_REGION_STOLEN];
>   struct intel_fbc *fbc = &dev_priv->fbc;
> 
>   if (drm_mm_node_allocated(&fbc->compressed_fb))
> - i915_gem_stolen_remove_node(dev_priv, &fbc->compressed_fb);
> + i915_gem_stolen_remove_node(mem, &fbc->compressed_fb);
> 
>   if (fbc->compressed_llb) {
> - i915_gem_

Re: [Intel-gfx] [PATCH 1/3] drm/i915: make stolen more region centric

2020-01-14 Thread Tang, CQ



> -Original Message-
> From: Chris Wilson 
> Sent: Friday, January 3, 2020 12:29 PM
> To: Auld, Matthew ; intel-
> g...@lists.freedesktop.org
> Cc: Tang, CQ 
> Subject: Re: [PATCH 1/3] drm/i915: make stolen more region centric
> 
> Quoting Matthew Auld (2020-01-03 20:00:28)
> > From: CQ Tang 
> 
> Just throwing the kitchen sink into intel_memory_region is not very
> appetizing. There seems to be no design behind this -- as foretold by the lack
> of rationale.

For a single stolen region, I agree with you. But this is done to match the 
other region designs, and to prepare for multiple stolen regions.

--CQ

> -Chris


Re: [Intel-gfx] [PATCH i-g-t 5/9] i915/gem_ctx_isolation: Check engine relative registers

2019-11-25 Thread Tang, CQ
Chris,
I applied your patches and tested them on DG1 hardware. The new code works 
well, except there are still two issues with this test (gem_ctx_isolation.c):

1. The test loops over all possible engines (... ccs0, ccs1, ccs2, ...). The 
legacy interface is used to check whether an engine is supported:
  gem_require_ring(fd, e->flags);

For unsupported engines, the IGT test system prints the following message:
Test requirement not met in function gem_require_ring, file 
../lib/ioctl_wrappers.c:1288:
Test requirement: gem_has_ring(fd, ring)

I think this is a minor issue; an enhancement to the IGT test code is expected 
to handle it.

2. All the xxx-nonpriv-switch subtests fail (where xxx is a supported engine); 
for example, for rcs0:

Starting subtest: rcs0-nonpriv-switch
(gem_ctx_isolation:16463) igt_aux-CRITICAL: Test assertion failure function 
sig_abort, file ../lib/igt_aux.c:502:
(gem_ctx_isolation:16463) igt_aux-CRITICAL: Failed assertion: !"GPU hung"
Stack trace:
  #0 ../lib/igt_core.c:1850 __igt_fail_assert()
  #1 ../lib/igt_aux.c:506 igt_fork_hang_detector()
  #2 [killpg+0x40]
  #3 [ioctl+0xb]
  #4 [drmIoctl+0x30]
  #5 ../lib/i915/gem_context.c:119 __gem_context_destroy()
  #6 ../lib/i915/gem_context.c:137 gem_context_destroy()
  #7 ../tests/i915/gem_ctx_isolation.c:666 nonpriv()
  #8 ../tests/i915/gem_ctx_isolation.c:947 __real_main893()
  #9 ../tests/i915/gem_ctx_isolation.c:893 main()
  #10 [__libc_start_main+0xf3]

I spent quite a while studying the source code, but I still don't understand 
what this subtest is doing. If you know, can you explain it a bit?

--CQ


> -Original Message-
> From: Intel-gfx  On Behalf Of
> Chris Wilson
> Sent: Wednesday, November 13, 2019 4:53 AM
> To: intel-gfx@lists.freedesktop.org
> Cc: igt-...@lists.freedesktop.org
> Subject: [Intel-gfx] [PATCH i-g-t 5/9] i915/gem_ctx_isolation: Check engine
> relative registers
> 
> Some of the non-privileged registers are at the same offset on each engine.
> We can improve our coverage for unknown HW layout by using the reported
> engine->mmio_base for relative offsets.
> 
> Signed-off-by: Chris Wilson 
> ---
>  tests/i915/gem_ctx_isolation.c | 164 -
>  1 file changed, 100 insertions(+), 64 deletions(-)
> 
> diff --git a/tests/i915/gem_ctx_isolation.c b/tests/i915/gem_ctx_isolation.c
> index 6aa27133c..546ffac3a 100644
> --- a/tests/i915/gem_ctx_isolation.c
> +++ b/tests/i915/gem_ctx_isolation.c
> @@ -70,6 +70,7 @@ static const struct named_register {
>   uint32_t ignore_bits;
>   uint32_t write_mask; /* some registers bits do not exist */
>   bool masked;
> + bool relative;
>  } nonpriv_registers[] = {
>   { "NOPID", NOCTX, RCS0, 0x2094 },
>   { "MI_PREDICATE_RESULT_2", NOCTX, RCS0, 0x23bc }, @@ -109,7
> +110,6 @@ static const struct named_register {
>   { "PS_DEPTH_COUNT_1", GEN8, RCS0, 0x22f8, 2 },
>   { "BB_OFFSET", GEN8, RCS0, 0x2158, .ignore_bits = 0x7 },
>   { "MI_PREDICATE_RESULT_1", GEN8, RCS0, 0x241c },
> - { "CS_GPR", GEN8, RCS0, 0x2600, 32 },
>   { "OA_CTX_CONTROL", GEN8, RCS0, 0x2360 },
>   { "OACTXID", GEN8, RCS0, 0x2364 },
>   { "PS_INVOCATION_COUNT_2", GEN8, RCS0, 0x2448, 2, .write_mask
> = ~0x3 }, @@ -138,79 +138,56 @@ static const struct named_register {
> 
>   { "CTX_PREEMPT", NOCTX /* GEN10 */, RCS0, 0x2248 },
>   { "CS_CHICKEN1", GEN11, RCS0, 0x2580, .masked = true },
> - { "HDC_CHICKEN1", GEN_RANGE(10, 10), RCS0, 0x7304, .masked =
> true },
> 
>   /* Privileged (enabled by w/a + FORCE_TO_NONPRIV) */
>   { "CTX_PREEMPT", NOCTX /* GEN9 */, RCS0, 0x2248 },
>   { "CS_CHICKEN1", GEN_RANGE(9, 10), RCS0, 0x2580, .masked = true },
>   { "COMMON_SLICE_CHICKEN2", GEN_RANGE(9, 9), RCS0,
> 0x7014, .masked = true },
> - { "HDC_CHICKEN1", GEN_RANGE(9, 9), RCS0, 0x7304, .masked =
> true },
> + { "HDC_CHICKEN1", GEN_RANGE(9, 10), RCS0, 0x7304, .masked =
> true },
>   { "SLICE_COMMON_ECO_CHICKEN1", GEN_RANGE(11, 11) /* + glk */,
> RCS0,  0x731c, .masked = true },
>   { "L3SQREG4", NOCTX /* GEN9:skl,kbl */, RCS0, 0xb118, .write_mask
> = ~0x10 },
>   { "HALF_SLICE_CHICKEN7", GEN_RANGE(11, 11), RCS0,
> 0xe194, .masked = true },
>   { "SAMPLER_MODE", GEN_RANGE(11, 11), RCS0, 0xe18c, .masked =
> true },
> 
> - { "BCS_GPR", GEN9, BCS0, 0x22600, 32 },
>   { "BCS_SWCTRL", GEN8, BCS0, 0x22200, .write_mask = 0x3, .masked =
> true },
> 
>   { "MFC_VDBOX1", NOCTX, VCS0, 0x12800, 64 },
>   { "MFC_VDBOX2", NOCTX, VCS1, 0x1c800, 64 },
> 
> - { "VCS0_GPR", GEN_RANGE(9, 10), VCS0, 0x12600, 32 },
> - { "VCS1_GPR", GEN_RANGE(9, 10), VCS1, 0x1c600, 32 },
> - { "VECS_GPR", GEN_RANGE(9, 10), VECS0, 0x1a600, 32 },
> -
> - { "VCS0_GPR", GEN11, VCS0, 0x1c0600, 32 },
> - { "VCS1_GPR", GEN11, VCS1, 0x1c4600, 32 },
> - { "VCS2_GPR", GEN11, VCS2, 0x1d0600, 32 },
> - { "VCS3_GPR", GEN11, VCS3, 0x1d4600, 32 },
> - { "VECS_GPR", GEN11, VECS0, 0x1c8600, 32 

Re: [Intel-gfx] [PATCH i-g-t 5/9] i915/gem_ctx_isolation: Check engine relative registers

2019-11-21 Thread Tang, CQ


> -Original Message-
> From: Chris Wilson 
> Sent: Thursday, November 21, 2019 3:45 PM
> To: Tang, CQ ; intel-gfx@lists.freedesktop.org
> Cc: igt-...@lists.freedesktop.org
> Subject: RE: [Intel-gfx] [PATCH i-g-t 5/9] i915/gem_ctx_isolation: Check
> engine relative registers
> 
> Quoting Tang, CQ (2019-11-21 21:07:13)
> >
> >
> > > -Original Message-
> > > From: Intel-gfx  On Behalf
> > > Of Chris Wilson
> > > Sent: Wednesday, November 13, 2019 4:53 AM
> > > To: intel-gfx@lists.freedesktop.org
> > > Cc: igt-...@lists.freedesktop.org
> > > Subject: [Intel-gfx] [PATCH i-g-t 5/9] i915/gem_ctx_isolation: Check
> > > engine @@ -248,6 +238,7 @@ static void tmpl_regs(int fd,  {
> > >   const unsigned int gen_bit = 1 <<
> > > intel_gen(intel_get_drm_devid(fd));
> > >   const unsigned int engine_bit = ENGINE(e->class, e->instance);
> > > + const uint32_t mmio_base = gem_engine_mmio_base(fd, e->name);
> >
> > Chris, I tried to test this patch, but "gem_engine_mmio_base()" above is
> not defined.
> > Can you check?
> 
> Did you perchance look at patch 4?

Thanks, I found it:
[i-g-t,4/9] i915: Start putting the mmio_base to wider use

--CQ

> -Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH i-g-t 5/9] i915/gem_ctx_isolation: Check engine relative registers

2019-11-21 Thread Tang, CQ


> -Original Message-
> From: Intel-gfx  On Behalf Of
> Chris Wilson
> Sent: Wednesday, November 13, 2019 4:53 AM
> To: intel-gfx@lists.freedesktop.org
> Cc: igt-...@lists.freedesktop.org
> Subject: [Intel-gfx] [PATCH i-g-t 5/9] i915/gem_ctx_isolation: Check engine
> relative registers
> 
> Some of the non-privileged registers are at the same offset on each engine.
> We can improve our coverage for unknown HW layout by using the reported
> engine->mmio_base for relative offsets.
> 
> Signed-off-by: Chris Wilson 
> ---
>  tests/i915/gem_ctx_isolation.c | 164 -
>  1 file changed, 100 insertions(+), 64 deletions(-)
> 
> diff --git a/tests/i915/gem_ctx_isolation.c b/tests/i915/gem_ctx_isolation.c
> index 6aa27133c..546ffac3a 100644
> --- a/tests/i915/gem_ctx_isolation.c
> +++ b/tests/i915/gem_ctx_isolation.c
> @@ -70,6 +70,7 @@ static const struct named_register {
>   uint32_t ignore_bits;
>   uint32_t write_mask; /* some registers bits do not exist */
>   bool masked;
> + bool relative;
>  } nonpriv_registers[] = {
>   { "NOPID", NOCTX, RCS0, 0x2094 },
>   { "MI_PREDICATE_RESULT_2", NOCTX, RCS0, 0x23bc }, @@ -109,7
> +110,6 @@ static const struct named_register {
>   { "PS_DEPTH_COUNT_1", GEN8, RCS0, 0x22f8, 2 },
>   { "BB_OFFSET", GEN8, RCS0, 0x2158, .ignore_bits = 0x7 },
>   { "MI_PREDICATE_RESULT_1", GEN8, RCS0, 0x241c },
> - { "CS_GPR", GEN8, RCS0, 0x2600, 32 },
>   { "OA_CTX_CONTROL", GEN8, RCS0, 0x2360 },
>   { "OACTXID", GEN8, RCS0, 0x2364 },
>   { "PS_INVOCATION_COUNT_2", GEN8, RCS0, 0x2448, 2, .write_mask
> = ~0x3 }, @@ -138,79 +138,56 @@ static const struct named_register {
> 
>   { "CTX_PREEMPT", NOCTX /* GEN10 */, RCS0, 0x2248 },
>   { "CS_CHICKEN1", GEN11, RCS0, 0x2580, .masked = true },
> - { "HDC_CHICKEN1", GEN_RANGE(10, 10), RCS0, 0x7304, .masked =
> true },
> 
>   /* Privileged (enabled by w/a + FORCE_TO_NONPRIV) */
>   { "CTX_PREEMPT", NOCTX /* GEN9 */, RCS0, 0x2248 },
>   { "CS_CHICKEN1", GEN_RANGE(9, 10), RCS0, 0x2580, .masked = true },
>   { "COMMON_SLICE_CHICKEN2", GEN_RANGE(9, 9), RCS0,
> 0x7014, .masked = true },
> - { "HDC_CHICKEN1", GEN_RANGE(9, 9), RCS0, 0x7304, .masked =
> true },
> + { "HDC_CHICKEN1", GEN_RANGE(9, 10), RCS0, 0x7304, .masked =
> true },
>   { "SLICE_COMMON_ECO_CHICKEN1", GEN_RANGE(11, 11) /* + glk */,
> RCS0,  0x731c, .masked = true },
>   { "L3SQREG4", NOCTX /* GEN9:skl,kbl */, RCS0, 0xb118, .write_mask
> = ~0x10 },
>   { "HALF_SLICE_CHICKEN7", GEN_RANGE(11, 11), RCS0,
> 0xe194, .masked = true },
>   { "SAMPLER_MODE", GEN_RANGE(11, 11), RCS0, 0xe18c, .masked =
> true },
> 
> - { "BCS_GPR", GEN9, BCS0, 0x22600, 32 },
>   { "BCS_SWCTRL", GEN8, BCS0, 0x22200, .write_mask = 0x3, .masked =
> true },
> 
>   { "MFC_VDBOX1", NOCTX, VCS0, 0x12800, 64 },
>   { "MFC_VDBOX2", NOCTX, VCS1, 0x1c800, 64 },
> 
> - { "VCS0_GPR", GEN_RANGE(9, 10), VCS0, 0x12600, 32 },
> - { "VCS1_GPR", GEN_RANGE(9, 10), VCS1, 0x1c600, 32 },
> - { "VECS_GPR", GEN_RANGE(9, 10), VECS0, 0x1a600, 32 },
> -
> - { "VCS0_GPR", GEN11, VCS0, 0x1c0600, 32 },
> - { "VCS1_GPR", GEN11, VCS1, 0x1c4600, 32 },
> - { "VCS2_GPR", GEN11, VCS2, 0x1d0600, 32 },
> - { "VCS3_GPR", GEN11, VCS3, 0x1d4600, 32 },
> - { "VECS_GPR", GEN11, VECS0, 0x1c8600, 32 },
> + { "xCS_GPR", GEN9, ALL, 0x600, 32, .relative = true },
> 
>   {}
>  }, ignore_registers[] = {
>   { "RCS timestamp", GEN6, ~0u, 0x2358 },
>   { "BCS timestamp", GEN7, ~0u, 0x22358 },
> 
> - { "VCS0 timestamp", GEN_RANGE(7, 10), ~0u, 0x12358 },
> - { "VCS1 timestamp", GEN_RANGE(7, 10), ~0u, 0x1c358 },
> - { "VECS timestamp", GEN_RANGE(8, 10), ~0u, 0x1a358 },
> -
> - { "VCS0 timestamp", GEN11, ~0u, 0x1c0358 },
> - { "VCS1 timestamp", GEN11, ~0u, 0x1c4358 },
> - { "VCS2 timestamp", GEN11, ~0u, 0x1d0358 },
> - { "VCS3 timestamp", GEN11, ~0u, 0x1d4358 },
> - { "VECS timestamp", GEN11, ~0u, 0x1c8358 },
> + { "xCS timestamp", GEN8, ALL, 0x358, .relative = true },
> 
>   /* huc read only */
> - { "BSD0 0x2000", GEN11, ~0u, 0x1c + 0x2000 },
> - { "BSD0 0x2000", GEN11, ~0u, 0x1c + 0x2014 },
> - { "BSD0 0x2000", GEN11, ~0u, 0x1c + 0x23b0 },
> -
> - { "BSD1 0x2000", GEN11, ~0u, 0x1c4000 + 0x2000 },
> - { "BSD1 0x2000", GEN11, ~0u, 0x1c4000 + 0x2014 },
> - { "BSD1 0x2000", GEN11, ~0u, 0x1c4000 + 0x23b0 },
> -
> - { "BSD2 0x2000", GEN11, ~0u, 0x1d + 0x2000 },
> - { "BSD2 0x2000", GEN11, ~0u, 0x1d + 0x2014 },
> - { "BSD2 0x2000", GEN11, ~0u, 0x1d + 0x23b0 },
> -
> - { "BSD3 0x2000", GEN11, ~0u, 0x1d4000 + 0x2000 },
> - { "BSD3 0x2000", GEN11, ~0u, 0x1d4000 + 0x2014 },
> - { "BSD3 0x2000", GEN11, ~0u, 0x1d4000 + 0x23b0 },
> + { "BSD 0x2000", GEN11, ALL, 0x2000, .relative = true },
> + { "BSD 0x2014", GEN11, ALL, 0x2014, .relative = true },
> + { "BSD 

Re: [Intel-gfx] [PATCH 1/3] drm/i915: Switch obj->mm.lock lockdep annotations on its head

2019-11-07 Thread Tang, CQ
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -22,6 +22,8 @@
>   *
>   */
> 
> +#include 
> +
>  #include "display/intel_frontbuffer.h"
>  #include "gt/intel_gt.h"
>  #include "i915_drv.h"
> @@ -52,6 +54,14 @@ void i915_gem_object_init(struct
> drm_i915_gem_object *obj,  {
>   __mutex_init(>mm.lock, "obj->mm.lock", key);
> 
> + if (IS_ENABLED(CONFIG_LOCKDEP)) {
> + mutex_lock_nested(>mm.lock,
> I915_MM_GET_PAGES);
> + fs_reclaim_acquire(GFP_KERNEL);
> + might_lock(>mm.lock);
> + fs_reclaim_release(GFP_KERNEL);
> + mutex_unlock(>mm.lock);
> + }
> +

I looked at the upstream code in drm-tip; I see the other changes, but not the
hunk above. Is this correct?

--CQ


>   spin_lock_init(>vma.lock);
>   INIT_LIST_HEAD(>vma.list);
> 
> @@ -186,7 +196,7 @@ static void __i915_gem_free_objects(struct
> drm_i915_private *i915,
>   GEM_BUG_ON(!list_empty(>lut_list));
> 
>   atomic_set(>mm.pages_pin_count, 0);
> - __i915_gem_object_put_pages(obj, I915_MM_NORMAL);
> + __i915_gem_object_put_pages(obj);
>   GEM_BUG_ON(i915_gem_object_has_pages(obj));
>   bitmap_free(obj->bit_17);
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index 458cd51331f1..edaf7126a84d 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -319,11 +319,22 @@ i915_gem_object_unpin_pages(struct
> drm_i915_gem_object *obj)
> 
>  enum i915_mm_subclass { /* lockdep subclass for obj-
> >mm.lock/struct_mutex */
>   I915_MM_NORMAL = 0,
> - I915_MM_SHRINKER /* called "recursively" from direct-reclaim-
> esque */
> + /*
> +  * Only used by struct_mutex, when called "recursively" from
> +  * direct-reclaim-esque. Safe because there is only every one
> +  * struct_mutex in the entire system.
> +  */
> + I915_MM_SHRINKER = 1,
> + /*
> +  * Used for obj->mm.lock when allocating pages. Safe because the
> object
> +  * isn't yet on any LRU, and therefore the shrinker can't deadlock on
> +  * it. As soon as the object has pages, obj->mm.lock nests within
> +  * fs_reclaim.
> +  */
> + I915_MM_GET_PAGES = 1,
>  };
> 
> -int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj,
> - enum i915_mm_subclass subclass);
> +int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj);
>  void i915_gem_object_truncate(struct drm_i915_gem_object *obj);  void
> i915_gem_object_writeback(struct drm_i915_gem_object *obj);
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 96008374a412..15f8297dc34e 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -162,7 +162,11 @@ struct drm_i915_gem_object {
>   atomic_t bind_count;
> 
>   struct {
> - struct mutex lock; /* protects the pages and their use */
> + /*
> +  * Protects the pages and their use. Do not use directly, but
> +  * instead go through the pin/unpin interfaces.
> +  */
> + struct mutex lock;
>   atomic_t pages_pin_count;
>   atomic_t shrink_pin;
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index 29f4c2850745..f402c2c415c2 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -106,7 +106,7 @@ int __i915_gem_object_get_pages(struct
> drm_i915_gem_object *obj)  {
>   int err;
> 
> - err = mutex_lock_interruptible(>mm.lock);
> + err = mutex_lock_interruptible_nested(>mm.lock,
> +I915_MM_GET_PAGES);
>   if (err)
>   return err;
> 
> @@ -190,8 +190,7 @@ __i915_gem_object_unset_pages(struct
> drm_i915_gem_object *obj)
>   return pages;
>  }
> 
> -int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj,
> - enum i915_mm_subclass subclass)
> +int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
>  {
>   struct sg_table *pages;
>   int err;
> @@ -202,7 +201,7 @@ int __i915_gem_object_put_pages(struct
> drm_i915_gem_object *obj,
>   GEM_BUG_ON(atomic_read(>bind_count));
> 
>   /* May be called by shrinker from within get_pages() (on another bo)
> */
> - mutex_lock_nested(>mm.lock, subclass);
> + mutex_lock(>mm.lock);
>   if (unlikely(atomic_read(>mm.pages_pin_count))) {
>   err = -EBUSY;
>   goto unlock;
> @@ -308,7 +307,7 @@ void *i915_gem_object_pin_map(struct
> drm_i915_gem_object *obj,
>   if (!i915_gem_object_type_has(obj, flags))
>   return ERR_PTR(-ENXIO);
> 
> - err = 

Re: [Intel-gfx] [PATCH v4 1/2] drm/i915: lookup for mem_region of a mem_type

2019-11-06 Thread Tang, CQ


> -Original Message-
> From: Intel-gfx  On Behalf Of
> Ramalingam C
> Sent: Wednesday, November 6, 2019 8:08 AM
> To: intel-gfx ; Chris Wilson  wilson.co.uk>
> Cc: Auld, Matthew 
> Subject: [Intel-gfx] [PATCH v4 1/2] drm/i915: lookup for mem_region of a
> mem_type
> 
> Lookup function to retrieve the pointer to a memory region of a mem_type.

We could have multiple regions with the same memory type. Your code just
returns the first one; is this desired?

--CQ

> 
> Signed-off-by: Ramalingam C 
> cc: Matthew Auld 
> ---
>  drivers/gpu/drm/i915/intel_memory_region.c | 12 
> drivers/gpu/drm/i915/intel_memory_region.h |  3 +++
>  2 files changed, 15 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_memory_region.c
> b/drivers/gpu/drm/i915/intel_memory_region.c
> index baaeaecc64af..ae899df7a1c2 100644
> --- a/drivers/gpu/drm/i915/intel_memory_region.c
> +++ b/drivers/gpu/drm/i915/intel_memory_region.c
> @@ -16,6 +16,18 @@ const u32 intel_region_map[] = {
>   [INTEL_REGION_STOLEN] = REGION_MAP(INTEL_MEMORY_STOLEN,
> 0),  };
> 
> +struct intel_memory_region *
> +intel_memory_region_lookup(struct drm_i915_private *i915,
> +enum intel_memory_type mem_type)
> +{
> + enum intel_region_id id;
> +
> + for (id = INTEL_REGION_SMEM; id < INTEL_REGION_UNKNOWN;
> id++)
> + if (i915->mm.regions[id]->type == mem_type)
> + return i915->mm.regions[id];
> + return NULL;
> +}
> +
>  static u64
>  intel_memory_region_free_pages(struct intel_memory_region *mem,
>  struct list_head *blocks)
> diff --git a/drivers/gpu/drm/i915/intel_memory_region.h
> b/drivers/gpu/drm/i915/intel_memory_region.h
> index 238722009677..d210936c4d72 100644
> --- a/drivers/gpu/drm/i915/intel_memory_region.h
> +++ b/drivers/gpu/drm/i915/intel_memory_region.h
> @@ -125,5 +125,8 @@ void intel_memory_region_put(struct
> intel_memory_region *mem);
> 
>  int intel_memory_regions_hw_probe(struct drm_i915_private *i915);  void
> intel_memory_regions_driver_release(struct drm_i915_private *i915);
> +struct intel_memory_region *
> +intel_memory_region_lookup(struct drm_i915_private *i915,
> +enum intel_memory_type mem_type);
> 
>  #endif
> --
> 2.20.1
> 
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Switch obj->mm.lock lockdep annotations on its head

2019-11-05 Thread Tang, CQ


> -Original Message-
> From: Daniel Vetter 
> Sent: Tuesday, November 5, 2019 2:02 AM
> To: Intel Graphics Development 
> Cc: Daniel Vetter ; Auld, Matthew
> ; Chris Wilson ; Tang,
> CQ ; Ursulin, Tvrtko ; Joonas
> Lahtinen ; Vetter, Daniel
> 
> Subject: [PATCH] drm/i915: Switch obj->mm.lock lockdep annotations on its
> head
> 
> The trouble with having a plain nesting flag for locks which do not naturally
> nest (unlike block devices and their partitions, which is the original 
> motivation
> for nesting levels) is that lockdep will never spot a true deadlock if you 
> screw
> up.
> 
> This patch is an attempt at trying better, by highlighting a bit more the 
> actual
> nature of the nesting that's going on. Essentially we have two kinds of
> objects:
> 
> - objects without pages allocated, which cannot be on any lru and are
>   hence inaccessible to the shrinker.
> 
> - objects which have pages allocated, which are on an lru, and which
>   the shrinker can decide to throw out.
> 
> For the former type of object, memory allcoations while holding
> obj->mm.lock are permissible. For the latter they are not. And
> get/put_pages transitions between the two types of objects.
> 
> This is still not entirely fool-proof since the rules might chance.
> But as long as we run such a code ever at runtime lockdep should be able to
> observe the inconsistency and complain (like with any other lockdep class
> that we've split up in multiple classes). But there are a few clear benefits:
> 
> - We can drop the nesting flag parameter from
>   __i915_gem_object_put_pages, because that function by definition is
>   never going allocate memory, and calling it on an object which
>   doesn't have its pages allocated would be a bug.
> 
> - We strictly catch more bugs, since there's not only one place in the
>   entire tree which is annotated with the special class. All the
>   other places that had explicit lockdep nesting annotations we're now
>   going to leave up to lockdep again.
> 
> - Specifically this catches stuff like calling get_pages from
>   put_pages (which isn't really a good idea, if we can call get_pages
>   so could the shrinker). I've seen patches do exactly that.

If we don't allow get_pages from put_pages, then we need to think of a new way
to swap out the pages freed by put_pages. In the lmem swapping case, put_pages
can't just free the pages; it needs to save them somewhere else.

The saving operation requires calling get_pages, because we need temporary
objects for the blitter engine to do the copying.

Can we use another thread to do the copying asynchronously?


--CQ


> 
> Of course I fully expect CI will show me for the fool I am with this one here 
> :-)
> 
> v2: There can only be one (lockdep only has a cache for the first subclass, 
> not
> for deeper ones, and we don't want to make these locks even slower). Still
> separate enums for better documentation.
> 
> Real fix: don forget about phys objs and pin_map(), and fix the shrinker to
> have the right annotations ... silly me.
> 
> v3: Forgot usertptr too ...
> 
> v4: Improve comment for pages_pin_count, drop the IMPORTANT comment
> and instead prime lockdep (Chris).
> 
> v5: Appease checkpatch, no double empty lines (Chris)
> 
> v6: More rebasing over selftest changes. Also somehow I forgot to push this
> patch :-/
> 
> Also format comments consistently while at it.
> 
> v7: Fix typo in commit message (Joonas)
> 
> Also drop the priming, with the lmem merge we now have allocations while
> holding the lmem lock, which wreaks the generic priming I've done in earlier
> patches. Should probably be resurrected when lmem is fixed. See
> 
> commit 232a6ebae419193f5b8da4fa869ae5089ab105c2
> Author: Matthew Auld 
> Date:   Tue Oct 8 17:01:14 2019 +0100
> 
> drm/i915: introduce intel_memory_region
> 
> I'm keeping the priming patch locally so it wont get lost.
> 
> Cc: Matthew Auld 
> Cc: Chris Wilson 
> Cc: "Tang, CQ" 
> Cc: Tvrtko Ursulin 
> Cc: Joonas Lahtinen 
> Reviewed-by: Chris Wilson  (v5)
> Reviewed-by: Joonas Lahtinen  (v6)
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c  |  4 +++-
>  drivers/gpu/drm/i915/gem/i915_gem_object.h  | 17 ++---
>  .../gpu/drm/i915/gem/i915_gem_object_types.h|  6 +-
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c   |  9 -
>  drivers/gpu/drm/i915/gem/i915_gem_phys.c|  2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_shrinker.c|  5 ++---
>  drivers/gpu/drm/i915/gem/i915_gem_userptr.c |  4 ++--
>  drivers/gpu/drm/i915/gem/selftests/huge_pages.c | 14 +++---
>  .../drm/i915/selftests/intel_memory_regi

Re: [Intel-gfx] [PATCH v2 13/22] drm/i915: treat stolen as a region

2019-10-03 Thread Tang, CQ


> -Original Message-
> From: Intel-gfx  On Behalf Of
> Matthew Auld
> Sent: Thursday, October 3, 2019 12:25 PM
> To: intel-gfx@lists.freedesktop.org
> Subject: [Intel-gfx] [PATCH v2 13/22] drm/i915: treat stolen as a region
> 
> Convert stolen memory over to a region object. Still leaves open the
> question with what to do with pre-allocated objects...
> 
> Signed-off-by: Matthew Auld 
> Cc: Joonas Lahtinen 
> Cc: Abdiel Janulgue 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 66
> +++---  drivers/gpu/drm/i915/gem/i915_gem_stolen.h
> |  3 +-
>  drivers/gpu/drm/i915/i915_gem_gtt.c| 14 +
>  drivers/gpu/drm/i915/i915_pci.c|  2 +-
>  4 files changed, 62 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> index 3dd295bb61f6..a91ef9fe98cd 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
> 
> +#include "gem/i915_gem_region.h"
>  #include "i915_drv.h"
>  #include "i915_gem_stolen.h"
> 
> @@ -150,7 +151,7 @@ static int i915_adjust_stolen(struct drm_i915_private
> *dev_priv,
>   return 0;
>  }
> 
> -void i915_gem_cleanup_stolen(struct drm_i915_private *dev_priv)
> +static void i915_gem_cleanup_stolen(struct drm_i915_private *dev_priv)
>  {
>   if (!drm_mm_initialized(_priv->mm.stolen))
>   return;
> @@ -355,7 +356,7 @@ static void icl_get_stolen_reserved(struct
> drm_i915_private *i915,
>   }
>  }
> 
> -int i915_gem_init_stolen(struct drm_i915_private *dev_priv)
> +static int i915_gem_init_stolen(struct drm_i915_private *dev_priv)
>  {
>   resource_size_t reserved_base, stolen_top;
>   resource_size_t reserved_total, reserved_size; @@ -539,6 +540,9
> @@ i915_gem_object_release_stolen(struct drm_i915_gem_object *obj)
> 
>   i915_gem_stolen_remove_node(dev_priv, stolen);
>   kfree(stolen);
> +
> + if (obj->mm.region)
> + i915_gem_object_release_memory_region(obj);

We fully support the memory region concept for both BIOS-stolen system memory
and local stolen memory, so we don't need this kind of check. This looks like
upstream code that predates the latest DG1 code; I am not sure we want to
upstream this intermediate version.

--CQ


>  }
> 
>  static const struct drm_i915_gem_object_ops i915_gem_object_stolen_ops
> = { @@ -548,8 +552,9 @@ static const struct drm_i915_gem_object_ops
> i915_gem_object_stolen_ops = {  };
> 
>  static struct drm_i915_gem_object *
> -_i915_gem_object_create_stolen(struct drm_i915_private *dev_priv,
> -struct drm_mm_node *stolen)
> +__i915_gem_object_create_stolen(struct drm_i915_private *dev_priv,
> + struct drm_mm_node *stolen,
> + struct intel_memory_region *mem)
>  {
>   struct drm_i915_gem_object *obj;
>   unsigned int cache_level;
> @@ -566,6 +571,9 @@ _i915_gem_object_create_stolen(struct
> drm_i915_private *dev_priv,
>   cache_level = HAS_LLC(dev_priv) ? I915_CACHE_LLC :
> I915_CACHE_NONE;
>   i915_gem_object_set_cache_coherency(obj, cache_level);
> 
> + if (mem)
> + i915_gem_object_init_memory_region(obj, mem, 0);
> +
>   if (i915_gem_object_pin_pages(obj))
>   goto cleanup;
> 
> @@ -576,10 +584,12 @@ _i915_gem_object_create_stolen(struct
> drm_i915_private *dev_priv,
>   return NULL;
>  }
> 
> -struct drm_i915_gem_object *
> -i915_gem_object_create_stolen(struct drm_i915_private *dev_priv,
> -   resource_size_t size)
> +static struct drm_i915_gem_object *
> +_i915_gem_object_create_stolen(struct intel_memory_region *mem,
> +resource_size_t size,
> +unsigned int flags)
>  {
> + struct drm_i915_private *dev_priv = mem->i915;
>   struct drm_i915_gem_object *obj;
>   struct drm_mm_node *stolen;
>   int ret;
> @@ -598,7 +608,7 @@ i915_gem_object_create_stolen(struct
> drm_i915_private *dev_priv,
>   if (ret)
>   goto err_free;
> 
> - obj = _i915_gem_object_create_stolen(dev_priv, stolen);
> + obj = __i915_gem_object_create_stolen(dev_priv, stolen, mem);
>   if (obj == NULL) {
>   ret = -ENOMEM;
>   goto err_remove;
> @@ -613,6 +623,44 @@ i915_gem_object_create_stolen(struct
> drm_i915_private *dev_priv,
>   return ERR_PTR(ret);
>  }
> 
> +struct drm_i915_gem_object *
> +i915_gem_object_create_stolen(struct drm_i915_private *dev_priv,
> +   resource_size_t size)
> +{
> +
> + return i915_gem_object_create_region(dev_priv-
> >mm.regions[INTEL_MEMORY_STOLEN],
> +  size,
> I915_BO_ALLOC_CONTIGUOUS); }
> +
> +static int init_stolen(struct intel_memory_region *mem) {
> + /*
> +  * Initialise stolen early so that we may reserve preallocated
> +

Re: [Intel-gfx] [PATCH v2 19/22] drm/i915: don't allocate the ring in stolen if we lack aperture

2019-10-03 Thread Tang, CQ


> -Original Message-
> From: Intel-gfx  On Behalf Of
> Matthew Auld
> Sent: Thursday, October 3, 2019 12:25 PM
> To: intel-gfx@lists.freedesktop.org
> Subject: [Intel-gfx] [PATCH v2 19/22] drm/i915: don't allocate the ring in
> stolen if we lack aperture
> 
> Since we have no way access it from the CPU. For such cases just fallback to
> internal objects.
> 
> Signed-off-by: Matthew Auld 
> ---
>  drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> index e220c09c6f32..c48f1d20af5f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> @@ -1273,7 +1273,9 @@ static struct i915_vma *create_ring_vma(struct
> i915_ggtt *ggtt, int size)
>   struct drm_i915_gem_object *obj;
>   struct i915_vma *vma;
> 
> - obj = i915_gem_object_create_stolen(i915, size);
> + obj = ERR_PTR(-ENODEV);
> + if (HAS_MAPPABLE_APERTURE(i915))
> + obj = i915_gem_object_create_stolen(i915, size);

Don't we already support the local stolen memory region? In that case, if the
device has an aperture it is BIOS-stolen system memory, and if not it is local
stolen memory; the same call to i915_gem_object_create_stolen() should work in
both cases.

--CQ


>   if (IS_ERR(obj))
>   obj = i915_gem_object_create_internal(i915, size);
>   if (IS_ERR(obj))
> --
> 2.20.1
> 
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Switch obj->mm.lock lockdep annotations on its head

2019-08-22 Thread Tang, CQ


> -Original Message-
> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch]
> Sent: Thursday, August 22, 2019 7:50 AM
> To: Intel Graphics Development 
> Cc: Daniel Vetter ; Chris Wilson  wilson.co.uk>; Tang, CQ ; Ursulin, Tvrtko
> ; Joonas Lahtinen
> ; Vetter, Daniel 
> Subject: [PATCH] drm/i915: Switch obj->mm.lock lockdep annotations on its
> head
> 
> The trouble with having a plain nesting flag for locks which do not naturally
> nest (unlike block devices and their partitions, which is the original 
> motivation
> for nesting levels) is that lockdep will never spot a true deadlock if you 
> screw
> up.
> 
> This patch is an attempt at trying better, by highlighting a bit more the 
> actual
> nature of the nesting that's going on. Essentially we have two kinds of
> objects:
> 
> - objects without pages allocated, which cannot be on any lru and are
>   hence inaccessible to the shrinker.
> 
> - objects which have pages allocated, which are on an lru, and which
>   the shrinker can decide to throw out.
> 
> For the former type of object, memory allcoations while holding
> obj->mm.lock are permissible. For the latter they are not. And
> get/put_pages transitions between the two types of objects.
> 
> This is still not entirely fool-proof since the rules might chance.
> But as long as we run such a code ever at runtime lockdep should be able to
> observe the inconsistency and complain (like with any other lockdep class
> that we've split up in multiple classes). But there are a few clear benefits:
> 
> - We can drop the nesting flag parameter from
>   __i915_gem_object_put_pages, because that function by definition is
>   never going allocate memory, and calling it on an object which
>   doesn't have its pages allocated would be a bug.
> 
> - We strictly catch more bugs, since there's not only one place in the
>   entire tree which is annotated with the special class. All the
>   other places that had explicit lockdep nesting annotations we're now
>   going to leave up to lockdep again.
> 
> - Specifically this catches stuff like calling get_pages from
>   put_pages (which isn't really a good idea, if we can call put_pages
>   so could the shrinker). I've seen patches do exactly that.
> 
> Of course I fully expect CI will show me for the fool I am with this one here 
> :-)
> 
> v2: There can only be one (lockdep only has a cache for the first subclass, 
> not
> for deeper ones, and we don't want to make these locks even slower). Still
> separate enums for better documentation.
> 
> Real fix: don forget about phys objs and pin_map(), and fix the shrinker to
> have the right annotations ... silly me.
> 
> v3: Forgot usertptr too ...
> 
> v4: Improve comment for pages_pin_count, drop the IMPORTANT comment
> and instead prime lockdep (Chris).
> 
> v5: Appease checkpatch, no double empty lines (Chris)

Suppose I want to apply the current swapping code on top of this patch; I will
have the following locks nested:

pin_pages(A); -->
mutex_lock_interruptible_nested(&A->mm.lock, I915_MM_GET_PAGES); -->
lock 'struct_mutex'; -->
find object B to swap out; -->
mutex_lock(&B->mm.lock); -->
create shadow object C; -->
mutex_lock_nested(&C->mm.lock, I915_MM_GET_PAGES);

Is this correct? Should I lock shadow object C with I915_MM_GET_PAGES? Then
A and C are nested with the same subclass.

--CQ


> 
> Cc: Chris Wilson 
> Cc: "Tang, CQ" 
> Cc: Tvrtko Ursulin 
> Cc: Joonas Lahtinen 
> Reviewed-by: Chris Wilson 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c   | 12 +++-
>  drivers/gpu/drm/i915/gem/i915_gem_object.h   | 16 +---
>  drivers/gpu/drm/i915/gem/i915_gem_object_types.h |  6 +-
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c|  9 -
>  drivers/gpu/drm/i915/gem/i915_gem_phys.c |  2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_shrinker.c |  5 ++---
>  drivers/gpu/drm/i915/gem/i915_gem_userptr.c  |  4 ++--
>  drivers/gpu/drm/i915/gem/selftests/huge_pages.c  | 12 ++--
>  8 files changed, 44 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index d7855dc5a5c5..1bdd7485bc72 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -22,6 +22,8 @@
>   *
>   */
> 
> +#include 
> +
>  #include "display/intel_frontbuffer.h"
>  #include "gt/intel_gt.h"
>  #include "i915_drv.h"
> @@ -51,6 +53,14 @@ void i915_gem_object_init(struct
> drm_i915_gem_object *obj,  {
>   mutex_init(>mm.lock);
> 
> +

Re: [Intel-gfx] [PATCH] drm/i915: Switch obj->mm.lock lockdep annotations on its head

2019-08-16 Thread Tang, CQ


> -Original Message-
> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch]
> Sent: Friday, August 16, 2019 3:08 PM
> To: Tang, CQ 
> Cc: Intel Graphics Development ; Chris
> Wilson ; Ursulin, Tvrtko
> ; Joonas Lahtinen
> ; Vetter, Daniel 
> Subject: Re: [PATCH] drm/i915: Switch obj->mm.lock lockdep annotations on
> its head
> 
> On Fri, Aug 16, 2019 at 9:23 PM Tang, CQ  wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch]
> > > Sent: Friday, August 16, 2019 11:24 AM
> > > To: Intel Graphics Development 
> > > Cc: Daniel Vetter ; Chris Wilson
> > > ; Tang, CQ ; Ursulin,
> > > Tvrtko ; Joonas Lahtinen
> > > ; Vetter, Daniel
> > > 
> > > Subject: [PATCH] drm/i915: Switch obj->mm.lock lockdep annotations
> > > on its head
> > >
> > > The trouble with having a plain nesting flag for locks which do not
> > > naturally nest (unlike block devices and their partitions, which is
> > > the original motivation for nesting levels) is that lockdep will
> > > never spot a true deadlock if you screw up.
> > >
> > > This patch is an attempt at trying better, by highlighting a bit
> > > more the actual nature of the nesting that's going on. Essentially
> > > we have two kinds of
> > > objects:
> > >
> > > - objects without pages allocated, which cannot be on any lru and are
> > >   hence inaccessible to the shrinker.
> > >
> > > - objects which have pages allocated, which are on an lru, and which
> > >   the shrinker can decide to throw out.
> > >
> > > For the former type of object, memory allocations while holding
> > > obj->mm.lock are permissible. For the latter they are not. And
> > > get/put_pages transitions between the two types of objects.
> > >
> > > This is still not entirely fool-proof since the rules might change.
> > > But as long as we run such a code ever at runtime lockdep should be
> > > able to observe the inconsistency and complain (like with any other
> > > lockdep class that we've split up in multiple classes). But there are a 
> > > few
> clear benefits:
> > >
> > > - We can drop the nesting flag parameter from
> > >   __i915_gem_object_put_pages, because that function by definition is
> > >   never going to allocate memory, and calling it on an object which
> > >   doesn't have its pages allocated would be a bug.
> > >
> > > - We strictly catch more bugs, since there's not only one place in the
> > >   entire tree which is annotated with the special class. All the
> > >   other places that had explicit lockdep nesting annotations we're now
> > >   going to leave up to lockdep again.
> > >
> > > - Specifically this catches stuff like calling get_pages from
> > >   put_pages (which isn't really a good idea, if we can call put_pages
> > >   so could the shrinker). I've seen patches do exactly that.
> > >
> > > Of course I fully expect CI will show me for the fool I am with this
> > > one here :-)
> > >
> > > v2: There can only be one (lockdep only has a cache for the first
> > > subclass, not for deeper ones, and we don't want to make these locks
> > > even slower). Still separate enums for better documentation.
> > >
> > > Real fix: don't forget about phys objs and pin_map(), and fix the
> > > shrinker to have the right annotations ... silly me.
> > >
> > > v3: Forgot userptr too ...
> > >
> > > v4: Improve comment for pages_pin_count, drop the IMPORTANT
> comment
> > > and instead prime lockdep (Chris).
> > >
> > > Cc: Chris Wilson 
> > > Cc: "Tang, CQ" 
> > > Cc: Tvrtko Ursulin 
> > > Cc: Joonas Lahtinen 
> > > Signed-off-by: Daniel Vetter 
> > > ---
> > >  drivers/gpu/drm/i915/gem/i915_gem_object.c   | 13 -
> > >  drivers/gpu/drm/i915/gem/i915_gem_object.h   | 16
> +---
> > >  drivers/gpu/drm/i915/gem/i915_gem_object_types.h |  6 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_pages.c|  9 -
> > >  drivers/gpu/drm/i915/gem/i915_gem_phys.c |  2 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_shrinker.c |  5 ++---
> > >  drivers/gpu/drm/i915/gem/i915_gem_userptr.c  |  4 ++--
> > >  drivers/gpu/drm/i915/gem/selftests/huge_pages.c  | 12 ++--

Re: [Intel-gfx] [PATCH] drm/i915: Switch obj->mm.lock lockdep annotations on its head

2019-08-16 Thread Tang, CQ


> -Original Message-
> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch]
> Sent: Friday, August 16, 2019 3:03 PM
> To: Chris Wilson 
> Cc: Intel Graphics Development ; Tang, CQ
> ; Ursulin, Tvrtko ; Joonas
> Lahtinen ; Vetter, Daniel
> 
> Subject: Re: [PATCH] drm/i915: Switch obj->mm.lock lockdep annotations on
> its head
> 
> On Fri, Aug 16, 2019 at 8:46 PM Chris Wilson 
> wrote:
> >
> > Quoting Daniel Vetter (2019-08-16 19:23:36)
> > > The trouble with having a plain nesting flag for locks which do not
> > > naturally nest (unlike block devices and their partitions, which is
> > > the original motivation for nesting levels) is that lockdep will
> > > never spot a true deadlock if you screw up.
> > >
> > > This patch is an attempt at trying better, by highlighting a bit
> > > more the actual nature of the nesting that's going on. Essentially
> > > we have two kinds of objects:
> > >
> > > - objects without pages allocated, which cannot be on any lru and are
> > >   hence inaccessible to the shrinker.
> > >
> > > - objects which have pages allocated, which are on an lru, and which
> > >   the shrinker can decide to throw out.
> > >
> > > > For the former type of object, memory allocations while holding
> > > obj->mm.lock are permissible. For the latter they are not. And
> > > get/put_pages transitions between the two types of objects.
> > >
> > > > This is still not entirely fool-proof since the rules might change.
> > > But as long as we run such a code ever at runtime lockdep should be
> > > able to observe the inconsistency and complain (like with any other
> > > lockdep class that we've split up in multiple classes). But there
> > > are a few clear benefits:
> > >
> > > - We can drop the nesting flag parameter from
> > >   __i915_gem_object_put_pages, because that function by definition is
> > > >   never going to allocate memory, and calling it on an object which
> > >   doesn't have its pages allocated would be a bug.
> > >
> > > - We strictly catch more bugs, since there's not only one place in the
> > >   entire tree which is annotated with the special class. All the
> > >   other places that had explicit lockdep nesting annotations we're now
> > >   going to leave up to lockdep again.
> > >
> > > - Specifically this catches stuff like calling get_pages from
> > >   put_pages (which isn't really a good idea, if we can call put_pages
> > >   so could the shrinker). I've seen patches do exactly that.
> > >
> > > Of course I fully expect CI will show me for the fool I am with this
> > > one here :-)
> > >
> > > v2: There can only be one (lockdep only has a cache for the first
> > > subclass, not for deeper ones, and we don't want to make these locks
> > > even slower). Still separate enums for better documentation.
> > >
> > > > Real fix: don't forget about phys objs and pin_map(), and fix the
> > > shrinker to have the right annotations ... silly me.
> > >
> > > > v3: Forgot userptr too ...
> > >
> > > v4: Improve comment for pages_pin_count, drop the IMPORTANT
> comment
> > > and instead prime lockdep (Chris).
> > >
> > > Cc: Chris Wilson 
> > > Cc: "Tang, CQ" 
> > > Cc: Tvrtko Ursulin 
> > > Cc: Joonas Lahtinen 
> > > Signed-off-by: Daniel Vetter 
> > > ---
> > >  drivers/gpu/drm/i915/gem/i915_gem_object.c   | 13 -
> > >  drivers/gpu/drm/i915/gem/i915_gem_object.h   | 16
> +---
> > >  drivers/gpu/drm/i915/gem/i915_gem_object_types.h |  6 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_pages.c|  9 -
> > >  drivers/gpu/drm/i915/gem/i915_gem_phys.c |  2 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_shrinker.c |  5 ++---
> > >  drivers/gpu/drm/i915/gem/i915_gem_userptr.c  |  4 ++--
> > >  drivers/gpu/drm/i915/gem/selftests/huge_pages.c  | 12 ++--
> > >  8 files changed, 45 insertions(+), 22 deletions(-)
> >
> > static inline int __must_check
> > i915_gem_object_pin_pages(struct drm_i915_gem_object *obj) {
> > might_lock(&obj->mm.lock);
> >
> > if (atomic_inc_not_zero(&obj->mm.pages_pin_count))
> > return 0;
> >
> > return __i915_gem_object_get_pages(obj); }
> >
> > is now testing the wrong lock class.
> 
> Unfortunately ther

Re: [Intel-gfx] [PATCH] drm/i915: Switch obj->mm.lock lockdep annotations on its head

2019-08-16 Thread Tang, CQ


> -Original Message-
> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch]
> Sent: Friday, August 16, 2019 11:24 AM
> To: Intel Graphics Development 
> Cc: Daniel Vetter ; Chris Wilson  wilson.co.uk>; Tang, CQ ; Ursulin, Tvrtko
> ; Joonas Lahtinen
> ; Vetter, Daniel 
> Subject: [PATCH] drm/i915: Switch obj->mm.lock lockdep annotations on its
> head
> 
> The trouble with having a plain nesting flag for locks which do not naturally
> nest (unlike block devices and their partitions, which is the original 
> motivation
> for nesting levels) is that lockdep will never spot a true deadlock if you 
> screw
> up.
> 
> This patch is an attempt at trying better, by highlighting a bit more the 
> actual
> nature of the nesting that's going on. Essentially we have two kinds of
> objects:
> 
> - objects without pages allocated, which cannot be on any lru and are
>   hence inaccessible to the shrinker.
> 
> - objects which have pages allocated, which are on an lru, and which
>   the shrinker can decide to throw out.
> 
> For the former type of object, memory allocations while holding
> obj->mm.lock are permissible. For the latter they are not. And
> get/put_pages transitions between the two types of objects.
> 
> This is still not entirely fool-proof since the rules might change.
> But as long as we run such a code ever at runtime lockdep should be able to
> observe the inconsistency and complain (like with any other lockdep class
> that we've split up in multiple classes). But there are a few clear benefits:
> 
> - We can drop the nesting flag parameter from
>   __i915_gem_object_put_pages, because that function by definition is
>   never going to allocate memory, and calling it on an object which
>   doesn't have its pages allocated would be a bug.
> 
> - We strictly catch more bugs, since there's not only one place in the
>   entire tree which is annotated with the special class. All the
>   other places that had explicit lockdep nesting annotations we're now
>   going to leave up to lockdep again.
> 
> - Specifically this catches stuff like calling get_pages from
>   put_pages (which isn't really a good idea, if we can call put_pages
>   so could the shrinker). I've seen patches do exactly that.
> 
> Of course I fully expect CI will show me for the fool I am with this one here 
> :-)
> 
> v2: There can only be one (lockdep only has a cache for the first subclass, 
> not
> for deeper ones, and we don't want to make these locks even slower). Still
> separate enums for better documentation.
> 
> Real fix: don't forget about phys objs and pin_map(), and fix the shrinker to
> have the right annotations ... silly me.
> 
> v3: Forgot userptr too ...
> 
> v4: Improve comment for pages_pin_count, drop the IMPORTANT comment
> and instead prime lockdep (Chris).
> 
> Cc: Chris Wilson 
> Cc: "Tang, CQ" 
> Cc: Tvrtko Ursulin 
> Cc: Joonas Lahtinen 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c   | 13 -
>  drivers/gpu/drm/i915/gem/i915_gem_object.h   | 16 +---
>  drivers/gpu/drm/i915/gem/i915_gem_object_types.h |  6 +-
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c|  9 -
>  drivers/gpu/drm/i915/gem/i915_gem_phys.c |  2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_shrinker.c |  5 ++---
>  drivers/gpu/drm/i915/gem/i915_gem_userptr.c  |  4 ++--
>  drivers/gpu/drm/i915/gem/selftests/huge_pages.c  | 12 ++--
>  8 files changed, 45 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 3929c3a6b281..d01258b175f5 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -22,6 +22,8 @@
>   *
>   */
> 
> +#include 
> +
>  #include "display/intel_frontbuffer.h"
>  #include "gt/intel_gt.h"
>  #include "i915_drv.h"
> @@ -61,6 +63,15 @@ void i915_gem_object_init(struct
> drm_i915_gem_object *obj,  {
>   mutex_init(&obj->mm.lock);
> 
> + if (IS_ENABLED(CONFIG_LOCKDEP)) {
> + mutex_lock_nested(&obj->mm.lock, I915_MM_GET_PAGES);
> + fs_reclaim_acquire(GFP_KERNEL);
> + might_lock(&obj->mm.lock);
> + fs_reclaim_release(GFP_KERNEL);
> + mutex_unlock(&obj->mm.lock);
> + }
> +
> +
>   spin_lock_init(&obj->vma.lock);
>   INIT_LIST_HEAD(&obj->vma.list);
> 
> @@ -191,7 +202,7 @@ static void __i915_gem_free_objects(struct
> drm_i915_private *i915,
>   GEM_BUG_ON(!list_empty(&obj->lut_list));
> 
>   atomic_se

Re: [Intel-gfx] [PATCH] RFC: drm/i915: Switch obj->mm.lock lockdep annotations on its head

2019-08-15 Thread Tang, CQ


> -Original Message-
> From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Daniel Vetter
> Sent: Wednesday, August 14, 2019 12:25 PM
> To: Intel Graphics Development 
> Cc: Daniel Vetter ; Vetter, Daniel
> 
> Subject: [Intel-gfx] [PATCH] RFC: drm/i915: Switch obj->mm.lock lockdep
> annotations on its head
> 
> The trouble with having a plain nesting flag for locks which do not naturally
> nest (unlike block devices and their partitions, which is the original 
> motivation
> for nesting levels) is that lockdep will never spot a true deadlock if you 
> screw
> up.
> 
> This patch is an attempt at trying better, by highlighting a bit more the 
> actual
> nature of the nesting that's going on. Essentially we have two kinds of
> objects:
> 
> - objects without pages allocated, which cannot be on any lru and are
>   hence inaccessible to the shrinker.
> 
> - objects which have pages allocated, which are on an lru, and which
>   the shrinker can decide to throw out.
> 
> For the former type of object, memory allocations while holding
> obj->mm.lock are permissible. For the latter they are not. And
> get/put_pages transitions between the two types of objects.
> 
> This is still not entirely fool-proof since the rules might change.
> But as long as we run such a code ever at runtime lockdep should be able to
> observe the inconsistency and complain (like with any other lockdep class
> that we've split up in multiple classes). But there are a few clear benefits:
> 
> - We can drop the nesting flag parameter from
>   __i915_gem_object_put_pages, because that function by definition is
>   never going to allocate memory, and calling it on an object which
>   doesn't have its pages allocated would be a bug.
> 
> - We strictly catch more bugs, since there's not only one place in the
>   entire tree which is annotated with the special class. All the
>   other places that had explicit lockdep nesting annotations we're now
>   going to leave up to lockdep again.
> 
> - Specifically this catches stuff like calling get_pages from
>   put_pages (which isn't really a good idea, if we can call put_pages
>   so could the shrinker). I've seen patches do exactly that.
> 
> Of course I fully expect CI will show me for the fool I am with this one here 
> :-)
> 
> v2: There can only be one (lockdep only has a cache for the first subclass, 
> not
> for deeper ones, and we don't want to make these locks even slower). Still
> separate enums for better documentation.
> 
> Real fix: don't forget about phys objs and pin_map(), and fix the shrinker to
> have the right annotations ... silly me.
> 
> v3: Forgot userptr too ...

I eventually looked at this patch. My question is about the shrinker calling stack:

Pin_pages(A)-->get_page(A)-->lock(objA->mm.lock, 
I915_MM_GET_PAGES)-->i915_gem_shrink()-->
Lock(struct_mutex)-->put_pages(B)-->lock(objB->mm.lock)

objA is locked with: mutex_lock_interruptible_nested(&obj->mm.lock, I915_MM_GET_PAGES);

objB is locked with: mutex_lock(&obj->mm.lock);
My understanding is that objB locking is equivalent to:
mutex_lock_nested(&obj->mm.lock, I915_MM_NORMAL);

So you lock subclass=2 first on A, then subclass=0 next on B, i.e. the reverse
order. Doesn't this cause a lockdep warning?

--CQ


> 
> Cc: Chris Wilson 
> Cc: Tvrtko Ursulin 
> Cc: Joonas Lahtinen 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c   |  2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_object.h   | 16 +---
>  drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 10 +-
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c|  9 -
>  drivers/gpu/drm/i915/gem/i915_gem_phys.c |  2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_shrinker.c |  5 ++---
>  drivers/gpu/drm/i915/gem/i915_gem_userptr.c  |  4 ++--
>  drivers/gpu/drm/i915/gem/selftests/huge_pages.c  | 12 ++--
>  8 files changed, 38 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 3929c3a6b281..a1a835539e09 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -191,7 +191,7 @@ static void __i915_gem_free_objects(struct
> drm_i915_private *i915,
>   GEM_BUG_ON(!list_empty(&obj->lut_list));
> 
>   atomic_set(&obj->mm.pages_pin_count, 0);
> - __i915_gem_object_put_pages(obj, I915_MM_NORMAL);
> + __i915_gem_object_put_pages(obj);
>   GEM_BUG_ON(i915_gem_object_has_pages(obj));
>   bitmap_free(obj->bit_17);
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index 3714cf234d64..5ce511ca7fa8 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -281,11 +281,21 @@ i915_gem_object_unpin_pages(struct
> drm_i915_gem_object *obj)
> 
>  enum i915_mm_subclass { /* 

Re: [Intel-gfx] [PATCH v2 03/37] drm/i915/region: support basic eviction

2019-08-15 Thread Tang, CQ


> -Original Message-
> From: Daniel Vetter [mailto:dan...@ffwll.ch]
> Sent: Thursday, August 15, 2019 9:21 AM
> To: Tang, CQ 
> Cc: Matthew Auld ; Intel Graphics
> Development ; Auld, Matthew
> 
> Subject: Re: [Intel-gfx] [PATCH v2 03/37] drm/i915/region: support basic
> eviction
> 
> On Thu, Aug 15, 2019 at 4:58 PM Tang, CQ  wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On
> > > Behalf Of Daniel Vetter
> > > Sent: Thursday, August 15, 2019 7:27 AM
> > > To: Matthew Auld 
> > > Cc: Intel Graphics Development ;
> > > Auld, Matthew 
> > > Subject: Re: [Intel-gfx] [PATCH v2 03/37] drm/i915/region: support
> > > basic eviction
> > >
> > > On Thu, Aug 15, 2019 at 12:48 PM Matthew Auld
> > >  wrote:
> > > >
> > > > On Tue, 30 Jul 2019 at 17:26, Daniel Vetter  wrote:
> > > > >
> > > > > On Thu, Jun 27, 2019 at 09:55:59PM +0100, Matthew Auld wrote:
> > > > > > Support basic eviction for regions.
> > > > > >
> > > > > > Signed-off-by: Matthew Auld 
> > > > > > Cc: Joonas Lahtinen 
> > > > > > Cc: Abdiel Janulgue 
> > > > >
> > > > > So from a very high level this looks like it was largely
> > > > > modelled after i915_gem_shrink.c and not i915_gem_evict.c (our
> > > > > other "make room, we're running out of stuff" code). Any specific
> reasons?
> > > >
> > > > IIRC I think it was originally based on the patches that exposed
> > > > stolen-memory to userspace from a few years ago.
> > > >
> > > > >
> > > > > I think i915_gem_evict is a lot closer match for what we want
> > > > > for vram (it started out to manage severely limitted GTT on
> > > > > gen2/3/4) after all. With the complication that we'll have to
> > > > > manage physical memory with multiple virtual mappings of it on
> > > > > top, so unfortunately we can't just reuse the locking patter
> > > > > Chris has come up with in his
> > > struct_mutex-removal branch.
> > > > > But at least conceptually it should be a lot closer.
> > > >
> > > > When you say make it more like i915_gem_evict, what does that mean?
> > > > Are you talking about the eviction roster stuff, or the
> > > > placement/locking of the eviction logic, with it being deep down
> > > > in get_pages?
> > >
> > > So there's kinda two aspects here that I meant.
> > >
> > > First is the high-level approach of the shrinker, which is a direct
> > > reflection of core mm low memory handling principles: Core mm just
> > > tries to equally shrink everyone when there's low memory, which is
> > > managed by watermarks, and a few other tricks. This is all only
> > > best-effort, and if multiple threads want a lot of memory at the
> > > same time then it's all going to fail with ENOMEM.
> > >
> > > On gpus otoh, and what we do in i915_gem_eviction.c for gtt (and
> > > very much needed with the tiny gtt for everything in gen2/3/4/5) is
> > > that when we run out of space, we stall, throw out everyone else,
> > > and have exclusive access to the entire gpu space. Then the next
> > > batchbuffer goes through the same dance. With this you guarantee
> > > that if you have a series of batchbuffers which all need e.g. 60% of
> > > lmem, they will all be able to execute. With the shrinker-style of
> > > low-memory handling eventually you're unlucky, both threads will only
> get up to 50%, fail with ENOSPC, and userspace crashes.
> > > Which is not good.
> > >
> > > The other bit is locking. Since we need to free pages from the
> > > shrinker there's tricky locking rules involved. Worse, we cannot
> > > back off from the shrinker down to e.g. the kmalloc or alloc_pages
> > > called that put us into reclaim. Which means the usual deadlock
> > > avoidance trick of having a slowpath where you first drop all the
> > > locks, then reacquire them in the right order doesn't work - in some
> > > cases the caller of kmalloc or alloc_pages could be holding a lock
> > > that we'd need to unlock first. Hence why the shrinker uses the best-
> effort-might-fail solution of trylocks, encoded in shrinker_lock.
> > >
> > > But for lmem we don't have such an excuse, b

Re: [Intel-gfx] [PATCH v2 03/37] drm/i915/region: support basic eviction

2019-08-15 Thread Tang, CQ


> -Original Message-
> From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Daniel Vetter
> Sent: Thursday, August 15, 2019 7:27 AM
> To: Matthew Auld 
> Cc: Intel Graphics Development ; Auld,
> Matthew 
> Subject: Re: [Intel-gfx] [PATCH v2 03/37] drm/i915/region: support basic
> eviction
> 
> On Thu, Aug 15, 2019 at 12:48 PM Matthew Auld
>  wrote:
> >
> > On Tue, 30 Jul 2019 at 17:26, Daniel Vetter  wrote:
> > >
> > > On Thu, Jun 27, 2019 at 09:55:59PM +0100, Matthew Auld wrote:
> > > > Support basic eviction for regions.
> > > >
> > > > Signed-off-by: Matthew Auld 
> > > > Cc: Joonas Lahtinen 
> > > > Cc: Abdiel Janulgue 
> > >
> > > So from a very high level this looks like it was largely modelled
> > > after i915_gem_shrink.c and not i915_gem_evict.c (our other "make
> > > room, we're running out of stuff" code). Any specific reasons?
> >
> > IIRC I think it was originally based on the patches that exposed
> > stolen-memory to userspace from a few years ago.
> >
> > >
> > > I think i915_gem_evict is a lot closer match for what we want for
> > > vram (it started out to manage severely limited GTT on gen2/3/4)
> > > after all. With the complication that we'll have to manage physical
> > > memory with multiple virtual mappings of it on top, so unfortunately
> > > we can't just reuse the locking pattern Chris has come up with in his
> struct_mutex-removal branch.
> > > But at least conceptually it should be a lot closer.
> >
> > When you say make it more like i915_gem_evict, what does that mean?
> > Are you talking about the eviction roster stuff, or the
> > placement/locking of the eviction logic, with it being deep down in
> > get_pages?
> 
> So there's kinda two aspects here that I meant.
> 
> First is the high-level approach of the shrinker, which is a direct 
> reflection of
> core mm low memory handling principles: Core mm just tries to equally
> shrink everyone when there's low memory, which is managed by
> watermarks, and a few other tricks. This is all only best-effort, and if 
> multiple
> threads want a lot of memory at the same time then it's all going to fail with
> ENOMEM.
> 
> On gpus otoh, and what we do in i915_gem_eviction.c for gtt (and very much
> needed with the tiny gtt for everything in gen2/3/4/5) is that when we run
> out of space, we stall, throw out everyone else, and have exclusive access to
> the entire gpu space. Then the next batchbuffer goes through the same
> dance. With this you guarantee that if you have a series of batchbuffers
> which all need e.g. 60% of lmem, they will all be able to execute. With the
> shrinker-style of low-memory handling eventually you're unlucky, both
> threads will only get up to 50%, fail with ENOSPC, and userspace crashes.
> Which is not good.
> 
> The other bit is locking. Since we need to free pages from the shrinker
> there's tricky locking rules involved. Worse, we cannot back off from the
> shrinker down to e.g. the kmalloc or alloc_pages called that put us into
> reclaim. Which means the usual deadlock avoidance trick of having a
> slowpath where you first drop all the locks, then reacquire them in the right
> order doesn't work - in some cases the caller of kmalloc or alloc_pages could
> be holding a lock that we'd need to unlock first. Hence why the shrinker uses
> the best-effort-might-fail solution of trylocks, encoded in shrinker_lock.
> 
> But for lmem we don't have such an excuse, because it's all our own code.
> The locking design can (and should!) assume that it can get out of any
> deadlock and always acquire all the locks it needs. Without that you can't
> achive the first part about guaranteeing execution of batches which
> collectively need more than 100% of lmem, but individually all fit. As an
> example if you look at the amdgpu command submission ioctl, that passes
> around ttm_operation_ctx which tracks a few things about locks and other
> bits, and if they hit a possible deadlock situation they can unwind the entire
> CS and restart by taking the locks in the right order.

Thank you for the explanation.

What exactly does our 'struct_mutex' protect? As an example, I see that when 
blitter engine functions are called, we hold 'struct_mutex' first.

Can we replace 'struct_mutex' with some fine-grained locks, so that we can lock 
obj->mm.lock first and then take those fine-grained locks?

I need some background info about the 'struct_mutex' design.

--CQ

> 
> I thought I typed that up somewhere, but I guess it got lost ...
> 
> Cheers, Daniel
> 
> >
> > >
> > > But I might be entirely off the track with reconstructing how this
> > > code came to be, so please elaborate a bit.
> > >
> > > Thanks, Daniel
> 
> 
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH v3 03/37] drm/i915/region: support basic eviction

2019-08-10 Thread Tang, CQ


> -Original Message-
> From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Chris Wilson
> Sent: Saturday, August 10, 2019 3:19 AM
> To: Auld, Matthew ; intel-
> g...@lists.freedesktop.org
> Cc: dri-de...@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH v3 03/37] drm/i915/region: support basic
> eviction
> 
> Quoting Matthew Auld (2019-08-09 23:26:09)
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c
> > b/drivers/gpu/drm/i915/i915_gem.c index 6ff01a404346..8735dea74809
> > 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1105,6 +1105,23 @@ i915_gem_madvise_ioctl(struct drm_device
> *dev, void *data,
> > !i915_gem_object_has_pages(obj))
> > i915_gem_object_truncate(obj);
> >
> > +   if (obj->mm.region) {
> > +   mutex_lock(&obj->mm.region->obj_lock);
> > +
> > +   switch (obj->mm.madv) {
> > +   case I915_MADV_WILLNEED:
> > +   list_move(&obj->mm.region_link,
> > + &obj->mm.region->objects);
> > +   break;
> > +   default:
> > +   list_move(&obj->mm.region_link,
> > + &obj->mm.region->purgeable);
> > +   break;
> > +   }
> > +
> > +   mutex_unlock(&obj->mm.region->obj_lock);
> > +   }
> > +
> > args->retained = obj->mm.madv != __I915_MADV_PURGED;
> 
> Little bit of an impedance mismatch, I hope this turns out fine when
> everything is a memory region.
> 
> > mutex_unlock(&obj->mm.lock);
> >
> > diff --git a/drivers/gpu/drm/i915/intel_memory_region.c
> > b/drivers/gpu/drm/i915/intel_memory_region.c
> > index ef12e462acb8..3a3caaadea1f 100644
> > --- a/drivers/gpu/drm/i915/intel_memory_region.c
> > +++ b/drivers/gpu/drm/i915/intel_memory_region.c
> > @@ -12,6 +12,51 @@ const u32 intel_region_map[] = {
> > [INTEL_MEMORY_STOLEN] = BIT(INTEL_STOLEN +
> > INTEL_MEMORY_TYPE_SHIFT) | BIT(0),  };
> >
> > +static int
> > +intel_memory_region_evict(struct intel_memory_region *mem,
> > + resource_size_t target,
> > + unsigned int flags) {
> > +   struct drm_i915_gem_object *obj;
> > +   resource_size_t found;
> > +   int err;
> > +
> > +   err = 0;
> > +   found = 0;
> > +
> > +   mutex_lock(&mem->obj_lock);
> > +   list_for_each_entry(obj, &mem->purgeable, mm.region_link) {
> > +   if (!i915_gem_object_has_pages(obj))
> > +   continue;
> > +
> > +   if (READ_ONCE(obj->pin_global))
> > +   continue;
> > +
> > +   if (atomic_read(&obj->bind_count))
> > +   continue;
> > +
> > +   mutex_unlock(&mem->obj_lock);
> > +
> > +   __i915_gem_object_put_pages(obj, I915_MM_SHRINKER);
> 
> So we don't really care about the object being bound then? As all we care
> about is the page's pin_count.
> 
> So instead of obj->pin_global, obj->bind_bound, you just want
> 
> > if (atomic_read(&obj->pages.pin_count))
>   continue;
> 
> as the quick check to see if it is worth proceeding.
> 
> > +   mutex_lock_nested(&obj->mm.lock, I915_MM_SHRINKER);
> > +   if (!i915_gem_object_has_pages(obj)) {
> > +   obj->mm.madv = __I915_MADV_PURGED;
> > +   found += obj->base.size;
> > +   }
> > +   mutex_unlock(&obj->mm.lock);
> 
> The locking here accomplishes what? You just want a boolean from
> put_pages().

I have the same question. But looking at the i915_gem_shrink() function, it has 
similar code. Do we prevent any race condition here?
I want to use this function for swapping, so I hope to understand more.

--CQ

> 
> > +
> > +   if (found >= target)
> > +   return 0;
> > +
> > +   mutex_lock(&mem->obj_lock);
> > +   }
> > +
> > +   err = -ENOSPC;
> > +   mutex_unlock(&mem->obj_lock);
> > +   return err;
> > +}
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 3/3] drm/i915/blt: bump the size restriction

2019-08-09 Thread Tang, CQ


> -Original Message-
> From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Tang, CQ
> Sent: Friday, August 9, 2019 3:21 PM
> To: Chris Wilson ; Auld, Matthew
> ; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 3/3] drm/i915/blt: bump the size restriction
> 
> 
> 
> > -Original Message-
> > From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On
> > Behalf Of Chris Wilson
> > Sent: Friday, August 9, 2019 2:18 PM
> > To: Auld, Matthew ; intel-
> > g...@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 3/3] drm/i915/blt: bump the size
> > restriction
> >
> > Quoting Matthew Auld (2019-08-09 21:29:26)
> > > As pointed out by Chris, with our current approach we are actually
> > > limited to S16_MAX * PAGE_SIZE for our size when using the blt to
> > > clear pages. Keeping things simple try to fix this by reducing the
> > > copy to a sequence of S16_MAX * PAGE_SIZE blocks.
> 
Just a general question: if the address space is limited to 1G, for example,
and we want to copy between two 1G-size objects, do we do the fragmentation
inside the blitter copying routine?
> 
> This could happen during swapping. We pre-allocate 1G PPGTT, and reserve
> the rest of address space, so only 1G space can be used.

Or, can we add an offset and size to the copy call, the way pread/pwrite 
specify an offset/size to read/write?
Then we can do the loop in the caller. But I would prefer to do the fragmentation inside.

--CQ

> 
> --CQ
> 
> 
> > >
> > > Reported-by: Chris Wilson 
> > > Signed-off-by: Matthew Auld 
> > > Cc: Chris Wilson 
> > > ---
> > >  .../gpu/drm/i915/gem/i915_gem_client_blt.c|  31 +++-
> > >  .../gpu/drm/i915/gem/i915_gem_object_blt.c| 139 ++----
> > >  .../gpu/drm/i915/gem/i915_gem_object_blt.h|   9 +-
> > >  .../i915/gem/selftests/i915_gem_client_blt.c  |  16 +-
> > > .../i915/gem/selftests/i915_gem_object_blt.c  |  22 ++-
> > >  5 files changed, 170 insertions(+), 47 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
> > > b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
> > > index 08a84c940d8d..4b096309a97e 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
> > > @@ -5,6 +5,8 @@
> > >
> > >  #include "i915_drv.h"
> > >  #include "gt/intel_context.h"
> > > +#include "gt/intel_engine_pm.h"
> > > +#include "gt/intel_engine_pool.h"
> > >  #include "i915_gem_client_blt.h"
> > >  #include "i915_gem_object_blt.h"
> > >
> > > @@ -156,7 +158,9 @@ static void clear_pages_worker(struct
> > > work_struct
> > *work)
> > > struct drm_i915_private *i915 = w->ce->engine->i915;
> > > struct drm_i915_gem_object *obj = w->sleeve->vma->obj;
> > > struct i915_vma *vma = w->sleeve->vma;
> > > +   struct intel_engine_pool_node *pool;
> > > struct i915_request *rq;
> > > +   struct i915_vma *batch;
> > > int err = w->dma.error;
> > >
> > > if (unlikely(err))
> > > @@ -176,10 +180,17 @@ static void clear_pages_worker(struct
> > work_struct *work)
> > > if (unlikely(err))
> > > goto out_unlock;
> > >
> > > +   intel_engine_pm_get(w->ce->engine);
> > > +   batch = intel_emit_vma_fill_blt(&pool, w->ce, vma, w->value);
> >
> > I had to search for where pool was being set!
> >
> > Hmm, batch is from pool right? So we are the owner of the batch, and
> > we could set batch->private = pool.
> >
> > > +   if (IS_ERR(batch)) {
> > > +   err = PTR_ERR(batch);
> > > +   goto out_unpin;
> > > +   }
> > > +
> > > rq = intel_context_create_request(w->ce);
> > > if (IS_ERR(rq)) {
> > > err = PTR_ERR(rq);
> > > -   goto out_unpin;
> > > +   goto out_batch;
> > > }
> > >
> > > /* There's no way the fence has signalled */ @@ -187,6
> > > +198,16 @@ static void clear_pages_worker(struct work_struct *work)
> > >clear_pages_dma_fence_cb))
> > > GEM_BUG_ON(1);

Re: [Intel-gfx] [PATCH 3/3] drm/i915/blt: bump the size restriction

2019-08-09 Thread Tang, CQ


> -Original Message-
> From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Chris Wilson
> Sent: Friday, August 9, 2019 2:18 PM
> To: Auld, Matthew ; intel-g...@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH 3/3] drm/i915/blt: bump the size restriction
> 
> Quoting Matthew Auld (2019-08-09 21:29:26)
> > As pointed out by Chris, with our current approach we are actually
> > limited to S16_MAX * PAGE_SIZE for our size when using the blt to
> > clear pages. Keeping things simple try to fix this by reducing the
> > copy to a sequence of S16_MAX * PAGE_SIZE blocks.

Just a general question: if the address space is limited to 1G, for example,
and we want to copy between two 1G-size objects, do we do the fragmentation
inside the blitter copy routine?

This can happen during swapping: we pre-allocate a 1G PPGTT and reserve the
rest of the address space, so only 1G of space can be used.

--CQ


> >
> > Reported-by: Chris Wilson 
> > Signed-off-by: Matthew Auld 
> > Cc: Chris Wilson 
> > ---
> >  .../gpu/drm/i915/gem/i915_gem_client_blt.c|  31 +++-
> >  .../gpu/drm/i915/gem/i915_gem_object_blt.c| 139 ++----
> >  .../gpu/drm/i915/gem/i915_gem_object_blt.h|   9 +-
> >  .../i915/gem/selftests/i915_gem_client_blt.c  |  16 +-
> > .../i915/gem/selftests/i915_gem_object_blt.c  |  22 ++-
> >  5 files changed, 170 insertions(+), 47 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
> > b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
> > index 08a84c940d8d..4b096309a97e 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
> > @@ -5,6 +5,8 @@
> >
> >  #include "i915_drv.h"
> >  #include "gt/intel_context.h"
> > +#include "gt/intel_engine_pm.h"
> > +#include "gt/intel_engine_pool.h"
> >  #include "i915_gem_client_blt.h"
> >  #include "i915_gem_object_blt.h"
> >
> > @@ -156,7 +158,9 @@ static void clear_pages_worker(struct work_struct
> *work)
> > struct drm_i915_private *i915 = w->ce->engine->i915;
> > struct drm_i915_gem_object *obj = w->sleeve->vma->obj;
> > struct i915_vma *vma = w->sleeve->vma;
> > +   struct intel_engine_pool_node *pool;
> > struct i915_request *rq;
> > +   struct i915_vma *batch;
> > int err = w->dma.error;
> >
> > if (unlikely(err))
> > @@ -176,10 +180,17 @@ static void clear_pages_worker(struct
> work_struct *work)
> > if (unlikely(err))
> > goto out_unlock;
> >
> > +   intel_engine_pm_get(w->ce->engine);
> > +   batch = intel_emit_vma_fill_blt(&pool, w->ce, vma, w->value);
> 
> I had to search for where pool was being set!
> 
> Hmm, batch is from pool right? So we are the owner of the batch, and we
> could set batch->private = pool.
> 
> > +   if (IS_ERR(batch)) {
> > +   err = PTR_ERR(batch);
> > +   goto out_unpin;
> > +   }
> > +
> > rq = intel_context_create_request(w->ce);
> > if (IS_ERR(rq)) {
> > err = PTR_ERR(rq);
> > -   goto out_unpin;
> > +   goto out_batch;
> > }
> >
> > /* There's no way the fence has signalled */ @@ -187,6 +198,16
> > @@ static void clear_pages_worker(struct work_struct *work)
> >clear_pages_dma_fence_cb))
> > GEM_BUG_ON(1);
> >
> > +   i915_vma_lock(batch);
> > +   err = i915_vma_move_to_active(batch, rq, 0);
> > +   i915_vma_unlock(batch);
> > +   if (unlikely(err))
> > +   goto out_request;
> > +
> > +   err = intel_engine_pool_mark_active(pool, rq);
> > +   if (unlikely(err))
> > +   goto out_request;
> > +
> > if (w->ce->engine->emit_init_breadcrumb) {
> > err = w->ce->engine->emit_init_breadcrumb(rq);
> > if (unlikely(err))
> > @@ -202,7 +223,9 @@ static void clear_pages_worker(struct work_struct
> *work)
> > if (err)
> > goto out_request;
> >
> > -   err = intel_emit_vma_fill_blt(rq, vma, w->value);
> > +   err = w->ce->engine->emit_bb_start(rq,
> > +  batch->node.start, batch->node.size,
> > +  0);
> >  out_request:
> > if (unlikely(err)) {
> > i915_request_skip(rq, err); @@ -210,7 +233,11 @@
> > static void clear_pages_worker(struct work_struct *work)
> > }
> >
> > i915_request_add(rq);
> > +out_batch:
> > +   i915_vma_unpin(batch);
> > +   intel_engine_pool_put(pool);
> >  out_unpin:
> > +   intel_engine_pm_put(w->ce->engine);
> > i915_vma_unpin(vma);
> >  out_unlock:
> > mutex_unlock(&i915->drm.struct_mutex);
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c
> > b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c
> > index fa90c38c8b07..c1e5edd1e359 100644
> > ---