from:"Lionel Landwerlin"

Re: Intel clc dependency

2024-04-11 Thread Lionel Landwerlin


On 11/04/2024 06:05, Tapani Pälli wrote:


On 11.4.2024 1.15, Brian Paul wrote:

On 4/10/24 13:53, Timo Aaltonen wrote:

Brian Paul kirjoitti 6.4.2024 klo 1.05:
I'm trying to build the Intel Vulkan driver.  First time in a few 
months.  I'm having build problems related to clc.  I'm on Ubuntu 
22.04



[...]
[1347/3181] Generating src/intel/vulkan/...om command (wrapped by 
meson to set env)
FAILED: 
src/intel/vulkan/grl/gfx125_bvh_build_BFS_BFS_pass1_indexed_batchable.h 

env MESA_SHADER_CACHE_DISABLE=true MESA_SPIRV_LOG_LEVEL=error 
/home/brianp/build3/mesa/build/src/intel/compiler/intel_clc -p dg2 
--prefix gfx125_bvh_build_BFS_BFS_pass1_indexed_batchable -e 
BFS_pass1_indexed_batchable --in 
../src/intel/vulkan/grl/gpu/bvh_build_BFS.cl --in 
/home/brianp/build3/mesa/src/intel/vulkan/grl/gpu/libs/lsc_intrinsics_fallback.cl 
-o 
src/intel/vulkan/grl/gfx125_bvh_build_BFS_BFS_pass1_indexed_batchable.h 
-- -cl-std=cl2.0 -D__OPENCL_VERSION__=200 -DMAX_HW_SIMD_WIDTH=16 
-DMAX_WORKGROUP_SIZE=16 
-I/home/brianp/build3/mesa/src/intel/vulkan/grl/gpu 
-I/home/brianp/build3/mesa/src/intel/vulkan/grl/include

ERROR: libclc shader missing. Consider installing the libclc package
Aborted (core dumped)

I've installed every clc-related package I could find.  I've tried 
several options for the 'intel-clc' option without luck.


BTW, the description of intel-clc in meson_options.txt looks suspect:

option(
   'intel-clc',
   type : 'combo',
   deprecated: {'true': 'enabled', 'false': 'disabled'},
   value : 'disabled',
   choices : [
 'enabled', 'disabled', 'system',
   ],
   description : 'Build the intel-clc compiler (enables Vulkan 
Intel ' +

 'Ray Tracing on supported hardware).'
)

The default is 'disabled' but that's deprecated?  Choices include 
'enabled' but that's deprecated too?


Any tips for building the ToT Intel Vulkan driver?

-Brian



You need to have libclc-NN-dev installed matching with the llvm 
version, which on stock 22.04 would be libclc-13-dev.


I'm using llvm 15 and have libclc-15-dev installed.  I get the 
"ERROR: libclc shader missing. Consider installing the libclc 
package" issue I quoted above.




You'll need to install libclc-15 too. I think libclc-15-dev package is 
missing dependency to libclc-15 .. did not verify this but I've been 
able to install libclc-15-dev without the library package getting 
installed.



libclang-common-15-dev installed too?





I have llvm 15 installed because I also want to build the radv Vulkan 
driver.


-Brian

Re: [PATCH v2 01/12] drm/doc: add rfc section for small BAR uapi

2022-06-21 Thread Lionel Landwerlin


On 21/06/2022 13:44, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)
v3:
   - Drop the vma query for now.
   - Add unallocated_cpu_visible_size as part of the region query.
   - Improve the docs some more, including documenting the expected
 behaviour on older kernels, since this came up in some offline
 discussion.
v4:
   - Various improvements all over. (Tvrtko)

v5:
   - Include newer integrated platforms when applying the non-recoverable
 context and error capture restriction. (Thomas)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Tvrtko Ursulin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
Acked-by: Tvrtko Ursulin 
Acked-by: Akeem G Abodunrin 



With Jordan with have changes for Anv/Iris : 
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16739


Acked-by: Lionel Landwerlin 



---
  Documentation/gpu/rfc/i915_small_bar.h   | 189 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  47 ++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 240 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..752bb2ceb399
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,189 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS
+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+   /** @region: The class:instance pair encoding */
+   struct drm_i915_gem_memory_class_instance region;
+
+   /** @rsvd0: MBZ */
+   __u32 rsvd0;
+
+   /**
+* @probed_size: Memory probed by the driver (-1 = unknown)
+*
+* Note that it should not be possible to ever encounter a zero value
+* here, also note that no current region type will ever return -1 here.
+* Although for future region types, this might be a possibility. The
+* same applies to the other size fields.
+*/
+   __u64 probed_size;
+
+   /**
+* @unallocated_size: Estimate of memory remaining (-1 = unknown)
+*
+* Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting.
+* Without this (or if this is an older kernel) the value here will
+* always equal the @probed_size. Note this is only currently tracked
+* for I915_MEMORY_CLASS_DEVICE regions (for other types the value here
+* will always equal the @probed_size).
+*/
+   __u64 unallocated_size;
+
+   union {
+   /** @rsvd1: MBZ */
+   __u64 rsvd1[8];
+   struct {
+   /**
+* @probed_cpu_visible_size: Memory probed by the driver
+* that is CPU accessible. (-1 = unknown).
+*
+* This will be always be <= @probed_size, and the
+* remainder (if there is any) will not be CPU
+* accessible.
+*
+* On systems without small BAR, the @probed_size will
+* always equal the @probed_cpu_visible_size, since all
+* of it will be CPU accessible.
+*
+* Note this is only tracked for
+* I915_MEMORY_CLASS_DEVICE regions (for other types the
+* value here will always equal the @probed_size).
+*
+* Note that if the value returned here is zero, then
+* this must be an old kernel which lacks the relevant
+* small-bar uAPI support (including
+* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on
+* such systems we should never actually end up with a
+* small BAR configuration, assuming we are able to load
+* the kernel module. Hence it should be safe to treat
+* this the same as when @probed_cpu_visible_size ==
+* @probed_size.
+*/
+   __u64 probed_cpu_visib

Re: [PATCH v3] drm/doc: add rfc section for small BAR uapi

2022-05-17 Thread Lionel Landwerlin


On 17/05/2022 12:23, Tvrtko Ursulin wrote:


On 17/05/2022 09:55, Lionel Landwerlin wrote:

On 17/05/2022 11:29, Tvrtko Ursulin wrote:


On 16/05/2022 19:11, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)
v3:
   - Drop the vma query for now.
   - Add unallocated_cpu_visible_size as part of the region query.
   - Improve the docs some more, including documenting the expected
 behaviour on older kernels, since this came up in some offline
 discussion.

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Tvrtko Ursulin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jon Bloomfield 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 164 
+++

  Documentation/gpu/rfc/i915_small_bar.rst |  47 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 215 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h

new file mode 100644
index ..4079d287750b
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,164 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as 
known to the

+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS

+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;


Is -1 possible today or when it will be? For system memory it 
appears zeroes are returned today so that has to stay I think. Does 
it effectively mean userspace has to consider both 0 and -1 as 
unknown is the question.



I raised this on v2. As far as I can tell there are no situation 
where we would get -1.


Is it really probed_size=0 on smem?? It's not the case on the 
internal branch.


My bad, I misread the arguments to intel_memory_region_create while 
grepping:


struct intel_memory_region *i915_gem_shmem_setup(struct 
drm_i915_private *i915,

 u16 type, u16 instance)
{
return intel_memory_region_create(i915, 0,
  totalram_pages() << PAGE_SHIFT,
  PAGE_SIZE, 0, 0,
  type, instance,
  _region_ops);

I saw "0, 0" and wrongly assumed that would be the data, since it 
matched with my mental model and the comment against unallocated_size 
saying it's only tracked for device memory.


Although I'd say it is questionable for i915 to return this data. I 
wonder it use case is possible where it would even be wrong but don't 
know. I guess the cat is out of the bag now.



Not sure how questionable that is. There are a bunch of tools reporting 
the amount of memory available (free, top, htop, etc...).


It might not be totalram_pages() but probably something close to it.

Having a non 0 & non -1 value is useful.


-Lionel




If the situation is -1 for unknown and some valid size (not zero) I 
don't think there is a problem here.


Regards,

Tvrtko


Anv is not currently handling that case.


I would very much like to not deal with 0 for smem.

It really makes it easier for userspace rather than having to fish 
information from 2 different places and on top of dealing with 
multiple kernel versions.



-Lionel





+
+    /**
+ * @unallocated_size: Estimate of memory remaining (-1 = unknown)
+ *
+ * Note this is only currently tracked for 
I915_MEMORY_CLASS_DEVICE

+ * regions, and also requires CAP_PERFMON or CAP_SYS_ADMIN to get
+ * reliable accounting. Without this(or if this an older 
kernel) the


s/if this an/if this is an/

Also same question as above about -1.


+ * value here will always match the @probed_size.
+ */
+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the driver
+ * that is CPU accessible. (-1 = unknown).


Also question about -1. In this case this could be done since the 
field is yet to be added but I am curious if it ever can be -1.



+ *
+ * This will be always be <= @probed_size, and the
+

Re: [PATCH v3] drm/doc: add rfc section for small BAR uapi

2022-05-17 Thread Lionel Landwerlin


On 17/05/2022 11:29, Tvrtko Ursulin wrote:


On 16/05/2022 19:11, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)
v3:
   - Drop the vma query for now.
   - Add unallocated_cpu_visible_size as part of the region query.
   - Improve the docs some more, including documenting the expected
 behaviour on older kernels, since this came up in some offline
 discussion.

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Tvrtko Ursulin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jon Bloomfield 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 164 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  47 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 215 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h

new file mode 100644
index ..4079d287750b
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,164 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as 
known to the

+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS

+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;


Is -1 possible today or when it will be? For system memory it appears 
zeroes are returned today so that has to stay I think. Does it 
effectively mean userspace has to consider both 0 and -1 as unknown is 
the question.



I raised this on v2. As far as I can tell there are no situation where 
we would get -1.


Is it really probed_size=0 on smem?? It's not the case on the internal 
branch.


Anv is not currently handling that case.


I would very much like to not deal with 0 for smem.

It really makes it easier for userspace rather than having to fish 
information from 2 different places and on top of dealing with multiple 
kernel versions.



-Lionel





+
+    /**
+ * @unallocated_size: Estimate of memory remaining (-1 = unknown)
+ *
+ * Note this is only currently tracked for I915_MEMORY_CLASS_DEVICE
+ * regions, and also requires CAP_PERFMON or CAP_SYS_ADMIN to get
+ * reliable accounting. Without this(or if this an older kernel) 
the


s/if this an/if this is an/

Also same question as above about -1.


+ * value here will always match the @probed_size.
+ */
+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the driver
+ * that is CPU accessible. (-1 = unknown).


Also question about -1. In this case this could be done since the 
field is yet to be added but I am curious if it ever can be -1.



+ *
+ * This will be always be <= @probed_size, and the
+ * remainder(if there is any) will not be CPU
+ * accessible.
+ *
+ * On systems without small BAR, the @probed_size will
+ * always equal the @probed_cpu_visible_size, since all
+ * of it will be CPU accessible.
+ *
+ * Note that if the value returned here is zero, then
+ * this must be an old kernel which lacks the relevant
+ * small-bar uAPI support(including


I have noticed you prefer no space before parentheses throughout the 
text so I guess it's just my preference to have it. Very nitpicky even 
if I am right so up to you.



+ * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on
+ * such systems we should never actually end up with a
+ * small BAR configuration, assuming we are able to load
+ * the kernel module. Hence it should be safe to treat
+ * this the same as when @probed_cpu_visible_size ==
+ * @probed_size.
+ */
+    __u64 probed_cpu_visible_size;
+
+    /**
+ * @unallocated_cpu_visible_size: Estimate of CPU
+ * visible memory remaining (-1 = unknown).
+ *
+ * Note this is only currently

Re: [PATCH v3] uapi/drm/i915: Document memory residency and Flat-CCS capability of obj

2022-05-16 Thread Lionel Landwerlin


On 14/05/2022 00:06, Jordan Justen wrote:

On 2022-05-13 05:31:00, Lionel Landwerlin wrote:

On 02/05/2022 17:15, Ramalingam C wrote:

Capture the impact of memory region preference list of the objects, on
their memory residency and Flat-CCS capability.

v2:
Fix the Flat-CCS capability of an obj with {lmem, smem} preference
list [Thomas]
v3:
Reworded the doc [Matt]

Signed-off-by: Ramalingam C
cc: Matthew Auld
cc: Thomas Hellstrom
cc: Daniel Vetter
cc: Jon Bloomfield
cc: Lionel Landwerlin
cc: Kenneth Graunke
cc:mesa-dev@lists.freedesktop.org
cc: Jordan Justen
cc: Tony Ye
Reviewed-by: Matthew Auld
---
   include/uapi/drm/i915_drm.h | 16 
   1 file changed, 16 insertions(+)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a2def7b27009..b7e1c2fe08dc 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3443,6 +3443,22 @@ struct drm_i915_gem_create_ext {
* At which point we get the object handle in 
_i915_gem_create_ext.handle,
* along with the final object size in _i915_gem_create_ext.size, which
* should account for any rounding up, if required.
+ *
+ * Note that userspace has no means of knowing the current backing region
+ * for objects where @num_regions is larger than one. The kernel will only
+ * ensure that the priority order of the @regions array is honoured, either
+ * when initially placing the object, or when moving memory around due to
+ * memory pressure
+ *
+ * On Flat-CCS capable HW, compression is supported for the objects residing
+ * in I915_MEMORY_CLASS_DEVICE. When such objects (compressed) has other
+ * memory class in @regions and migrated (by I915, due to memory
+ * constrain) to the non I915_MEMORY_CLASS_DEVICE region, then I915 needs to
+ * decompress the content. But I915 dosen't have the required information to
+ * decompress the userspace compressed objects.
+ *
+ * So I915 supports Flat-CCS, only on the objects which can reside only on
+ * I915_MEMORY_CLASS_DEVICE regions.

I think it's fine to assume Flat-CSS surface will always be in lmem.

I see no issue for the Anv Vulkan driver.

Maybe Nanley or Ken can speak for the Iris GL driver?


Acked-by: Jordan Justen

I think Nanley has accounted for this on iris with:

https://gitlab.freedesktop.org/mesa/mesa/-/commit/42a865730ef72574e179b56a314f30fdccc6cba8

-Jordan


Thanks Jordan,


We might want to through in an additional : assert((|flags 
&||BO_ALLOC_SMEM) == 0); in the CCS case

|

|
|

|-Lionel
|

Re: [PATCH v3] uapi/drm/i915: Document memory residency and Flat-CCS capability of obj

2022-05-13 Thread Lionel Landwerlin


On 02/05/2022 17:15, Ramalingam C wrote:

Capture the impact of memory region preference list of the objects, on
their memory residency and Flat-CCS capability.

v2:
   Fix the Flat-CCS capability of an obj with {lmem, smem} preference
   list [Thomas]
v3:
   Reworded the doc [Matt]

Signed-off-by: Ramalingam C 
cc: Matthew Auld 
cc: Thomas Hellstrom 
cc: Daniel Vetter 
cc: Jon Bloomfield 
cc: Lionel Landwerlin 
cc: Kenneth Graunke 
cc: mesa-dev@lists.freedesktop.org
cc: Jordan Justen 
cc: Tony Ye 
Reviewed-by: Matthew Auld 
---
  include/uapi/drm/i915_drm.h | 16 
  1 file changed, 16 insertions(+)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a2def7b27009..b7e1c2fe08dc 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3443,6 +3443,22 @@ struct drm_i915_gem_create_ext {
   * At which point we get the object handle in _i915_gem_create_ext.handle,
   * along with the final object size in _i915_gem_create_ext.size, which
   * should account for any rounding up, if required.
+ *
+ * Note that userspace has no means of knowing the current backing region
+ * for objects where @num_regions is larger than one. The kernel will only
+ * ensure that the priority order of the @regions array is honoured, either
+ * when initially placing the object, or when moving memory around due to
+ * memory pressure
+ *
+ * On Flat-CCS capable HW, compression is supported for the objects residing
+ * in I915_MEMORY_CLASS_DEVICE. When such objects (compressed) has other
+ * memory class in @regions and migrated (by I915, due to memory
+ * constrain) to the non I915_MEMORY_CLASS_DEVICE region, then I915 needs to
+ * decompress the content. But I915 dosen't have the required information to
+ * decompress the userspace compressed objects.
+ *
+ * So I915 supports Flat-CCS, only on the objects which can reside only on
+ * I915_MEMORY_CLASS_DEVICE regions.



I think it's fine to assume Flat-CSS surface will always be in lmem.

I see no issue for the Anv Vulkan driver.


Maybe Nanley or Ken can speak for the Iris GL driver?


-Lionel



   */
  struct drm_i915_gem_create_ext_memory_regions {
/** @base: Extension link. See struct i915_user_extension. */

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-05-03 Thread Lionel Landwerlin


On 03/05/2022 17:27, Matthew Auld wrote:

On 03/05/2022 11:39, Lionel Landwerlin wrote:

On 03/05/2022 13:22, Matthew Auld wrote:

On 02/05/2022 09:53, Lionel Landwerlin wrote:

On 02/05/2022 10:54, Lionel Landwerlin wrote:

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 
+++

  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h

new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region 
as known to the

+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS

+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;
+
+    /** @unallocated_size: Estimate of memory remaining (-1 = 
unknown) */

+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the 
driver

+ * that is CPU accessible. (-1 = unknown).
+ *
+ * This will be always be <= @probed_size, and the
+ * remainder(if there is any) will not be CPU
+ * accessible.
+ */
+    __u64 probed_cpu_visible_size;
+    };



Trying to implement userspace support in Vulkan for this, I have 
an additional question about the value of probed_cpu_visible_size.


When is it set to -1?

I'm guessing before there is support for this value it'll be 0 (MBZ).

After after it should either be the entire lmem or something smaller.


-Lionel



Other pain point of this new uAPI, previously we could query the 
unallocated size for each heap.


unallocated_size should always give the same value as probed_size. 
We have the avail tracking, but we don't currently expose that 
through unallocated_size, due to lack of real userspace/user etc.




Now lmem is effectively divided into 2 heaps, but unallocated_size 
is tracking allocation from both parts of lmem.


Yeah, if we ever properly expose the unallocated_size, then we could 
also just add unallocated_cpu_visible_size.




Is adding new I915_MEMORY_CLASS_DEVICE_NON_MAPPABLE out of question?


I don't think it's out of the question...

I guess user-space should be able to get the current flag behaviour 
just by specifying: device, system. And it does give more flexibly 
to allow something like: device, device-nm, smem.


We can also drop the probed_cpu_visible_size, which would now just 
be the probed_size with device/device-nm. And if we lack device-nm, 
then the entire thing must be CPU mappable.


One of the downsides though, is that we can no longer easily mix 
object pages from both device + device-nm, which we could previously 
do when we didn't specify the flag. At least according to the 
current design/behaviour for @regions that would not be allowed. I 
guess some kind of new flag like ALLOC_MIXED or so? Although 
currently that is only possible with device + device-nm in ttm/i915.



Thanks, I wasn't aware of the restrictions.

Adding unallocated_cpu_visible_size would be great.


So do we want this in the next version? i.e we already have a current 
real use case in mind for unallocated_size where probed_size is not 
good enough?



Yeah in the  next iteration.

We're using unallocated_size to implement VK_EXT_memory_budget and since 
I'm going to expose lmem mappable/unmappable as 2 different heaps on 
Vulkan, I would use that there too.



-Lionel







-Lionel







-Lionel






+    };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create 
behaviour, with added

+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should b

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-05-03 Thread Lionel Landwerlin


On 03/05/2022 13:22, Matthew Auld wrote:

On 02/05/2022 09:53, Lionel Landwerlin wrote:

On 02/05/2022 10:54, Lionel Landwerlin wrote:

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 
+++

  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h

new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as 
known to the

+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS

+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;
+
+    /** @unallocated_size: Estimate of memory remaining (-1 = 
unknown) */

+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the driver
+ * that is CPU accessible. (-1 = unknown).
+ *
+ * This will be always be <= @probed_size, and the
+ * remainder(if there is any) will not be CPU
+ * accessible.
+ */
+    __u64 probed_cpu_visible_size;
+    };



Trying to implement userspace support in Vulkan for this, I have an 
additional question about the value of probed_cpu_visible_size.


When is it set to -1?

I'm guessing before there is support for this value it'll be 0 (MBZ).

After after it should either be the entire lmem or something smaller.


-Lionel



Other pain point of this new uAPI, previously we could query the 
unallocated size for each heap.


unallocated_size should always give the same value as probed_size. We 
have the avail tracking, but we don't currently expose that through 
unallocated_size, due to lack of real userspace/user etc.




Now lmem is effectively divided into 2 heaps, but unallocated_size is 
tracking allocation from both parts of lmem.


Yeah, if we ever properly expose the unallocated_size, then we could 
also just add unallocated_cpu_visible_size.




Is adding new I915_MEMORY_CLASS_DEVICE_NON_MAPPABLE out of question?


I don't think it's out of the question...

I guess user-space should be able to get the current flag behaviour 
just by specifying: device, system. And it does give more flexibly to 
allow something like: device, device-nm, smem.


We can also drop the probed_cpu_visible_size, which would now just be 
the probed_size with device/device-nm. And if we lack device-nm, then 
the entire thing must be CPU mappable.


One of the downsides though, is that we can no longer easily mix 
object pages from both device + device-nm, which we could previously 
do when we didn't specify the flag. At least according to the current 
design/behaviour for @regions that would not be allowed. I guess some 
kind of new flag like ALLOC_MIXED or so? Although currently that is 
only possible with device + device-nm in ttm/i915.



Thanks, I wasn't aware of the restrictions.

Adding unallocated_cpu_visible_size would be great.


-Lionel







-Lionel






+    };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create 
behaviour, with added

+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for 
the stuff that
+ * is immutable. Previously we would have two ioctls, one to 
create the object
+ * with gem_create, and another to apply various parameters, 
however this
+ * creates some ambiguity for the params which are considered 
immutable. Also in

+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+    /**
+ * @size: Requested size for the object.
+ *
+ * The (page-aligned) al

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-05-03 Thread Lionel Landwerlin


On 03/05/2022 12:07, Matthew Auld wrote:

On 02/05/2022 19:03, Lionel Landwerlin wrote:

On 02/05/2022 20:58, Abodunrin, Akeem G wrote:



-Original Message-
From: Landwerlin, Lionel G 
Sent: Monday, May 2, 2022 12:55 AM
To: Auld, Matthew ; 
intel-...@lists.freedesktop.org

Cc: dri-de...@lists.freedesktop.org; Thomas Hellström
; Bloomfield, Jon
; Daniel Vetter ; 
Justen,

Jordan L ; Kenneth Graunke
; Abodunrin, Akeem G
; mesa-dev@lists.freedesktop.org
Subject: Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
    - Some spelling fixes and other small tweaks. (Akeem & Thomas)
    - Rework error capture interactions, including no longer needing
  NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
    - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
   Documentation/gpu/rfc/i915_small_bar.h   | 190

+++

Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
   Documentation/gpu/rfc/index.rst  |   4 +
   3 files changed, 252 insertions(+)
   create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
   create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h
b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as
+known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct

drm_i915_query.

+ * For this new query we are adding the new query id
+DRM_I915_QUERY_MEMORY_REGIONS
+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+   /** @region: The class:instance pair encoding */
+   struct drm_i915_gem_memory_class_instance region;
+
+   /** @rsvd0: MBZ */
+   __u32 rsvd0;
+
+   /** @probed_size: Memory probed by the driver (-1 = unknown) */
+   __u64 probed_size;
+
+   /** @unallocated_size: Estimate of memory remaining (-1 = 
unknown)

*/

+   __u64 unallocated_size;
+
+   union {
+   /** @rsvd1: MBZ */
+   __u64 rsvd1[8];
+   struct {
+   /**
+    * @probed_cpu_visible_size: Memory probed by the

driver

+    * that is CPU accessible. (-1 = unknown).
+    *
+    * This will be always be <= @probed_size, and 
the

+    * remainder(if there is any) will not be CPU
+    * accessible.
+    */
+   __u64 probed_cpu_visible_size;
+   };


Trying to implement userspace support in Vulkan for this, I have an 
additional

question about the value of probed_cpu_visible_size.

When is it set to -1?
I believe it is set to -1 if it is unknown, and/or not cpu 
accessible...


Cheers!
~Akeem



So what should I expect on system memory?


I guess just probed_cpu_visible_size == probed_size. Or maybe we can 
just use -1 here?




What value is returned when all of probed_size is CPU visible on 
local memory?


probed_size == probed_cpu_visible_size.



Thanks, looks good to me.

Then maybe we should update the comment to say that.

Looks like there are no cases where we'll get -1.


-Lionel







Thanks,


-Lionel



I'm guessing before there is support for this value it'll be 0 (MBZ).

After after it should either be the entire lmem or something smaller.


-Lionel



+   };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour,
+with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the
+stuff that
+ * is immutable. Previously we would have two ioctls, one to create
+the object
+ * with gem_create, and another to apply various parameters, however
+this
+ * creates some ambiguity for the params which are considered
+immutable. Also in
+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+   /**
+    * @size: Requested size for the object.
+    *
+    * The (page-aligned) allocated size for the object will be 
returned.

+    *
+    * Note that for some devices we have might have further minimum
+    * page-size restrictions(larger than 4K), like for device 
local-memory.
+    * However in general the final size here should always 
reflect any

+    * rounding up, if for example using the

I915_GEM_CREATE_EXT_MEMORY_REGIONS

+    * extension to place the object in device local-memory.
+    */
+   __u64 size;
+   /**
+    * @handle: Returned handle for the object.
+    *
+    * Object handles are nonzero.
+    */
+   __u32

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-05-02 Thread Lionel Landwerlin


On 02/05/2022 20:58, Abodunrin, Akeem G wrote:



-Original Message-
From: Landwerlin, Lionel G 
Sent: Monday, May 2, 2022 12:55 AM
To: Auld, Matthew ; intel-...@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org; Thomas Hellström
; Bloomfield, Jon
; Daniel Vetter ; Justen,
Jordan L ; Kenneth Graunke
; Abodunrin, Akeem G
; mesa-dev@lists.freedesktop.org
Subject: Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
- Some spelling fixes and other small tweaks. (Akeem & Thomas)
- Rework error capture interactions, including no longer needing
  NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
- Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
   Documentation/gpu/rfc/i915_small_bar.h   | 190

+++

   Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
   Documentation/gpu/rfc/index.rst  |   4 +
   3 files changed, 252 insertions(+)
   create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
   create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h
b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as
+known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct

drm_i915_query.

+ * For this new query we are adding the new query id
+DRM_I915_QUERY_MEMORY_REGIONS
+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+   /** @region: The class:instance pair encoding */
+   struct drm_i915_gem_memory_class_instance region;
+
+   /** @rsvd0: MBZ */
+   __u32 rsvd0;
+
+   /** @probed_size: Memory probed by the driver (-1 = unknown) */
+   __u64 probed_size;
+
+   /** @unallocated_size: Estimate of memory remaining (-1 = unknown)

*/

+   __u64 unallocated_size;
+
+   union {
+   /** @rsvd1: MBZ */
+   __u64 rsvd1[8];
+   struct {
+   /**
+* @probed_cpu_visible_size: Memory probed by the

driver

+* that is CPU accessible. (-1 = unknown).
+*
+* This will be always be <= @probed_size, and the
+* remainder(if there is any) will not be CPU
+* accessible.
+*/
+   __u64 probed_cpu_visible_size;
+   };


Trying to implement userspace support in Vulkan for this, I have an additional
question about the value of probed_cpu_visible_size.

When is it set to -1?

I believe it is set to -1 if it is unknown, and/or not cpu accessible...

Cheers!
~Akeem



So what should I expect on system memory?

What value is returned when all of probed_size is CPU visible on local 
memory?



Thanks,


-Lionel



I'm guessing before there is support for this value it'll be 0 (MBZ).

After after it should either be the entire lmem or something smaller.


-Lionel



+   };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour,
+with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the
+stuff that
+ * is immutable. Previously we would have two ioctls, one to create
+the object
+ * with gem_create, and another to apply various parameters, however
+this
+ * creates some ambiguity for the params which are considered
+immutable. Also in
+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+   /**
+* @size: Requested size for the object.
+*
+* The (page-aligned) allocated size for the object will be returned.
+*
+* Note that for some devices we have might have further minimum
+* page-size restrictions(larger than 4K), like for device local-memory.
+* However in general the final size here should always reflect any
+* rounding up, if for example using the

I915_GEM_CREATE_EXT_MEMORY_REGIONS

+* extension to place the object in device local-memory.
+*/
+   __u64 size;
+   /**
+* @handle: Returned handle for the object.
+*
+* Object handles are nonzero.
+*/
+   __u32 handle;
+   /**
+* @flags: Optional flags.
+*
+* Supported values:
+*
+* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the

kernel that

+* the object will need to be accessed via the CPU.
+*
+* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE,

and

+* only strictly required on platforms where only some of the device
+* memory is d

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-05-02 Thread Lionel Landwerlin


On 02/05/2022 10:54, Lionel Landwerlin wrote:

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h

new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as 
known to the

+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS

+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;
+
+    /** @unallocated_size: Estimate of memory remaining (-1 = 
unknown) */

+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the driver
+ * that is CPU accessible. (-1 = unknown).
+ *
+ * This will be always be <= @probed_size, and the
+ * remainder(if there is any) will not be CPU
+ * accessible.
+ */
+    __u64 probed_cpu_visible_size;
+    };



Trying to implement userspace support in Vulkan for this, I have an 
additional question about the value of probed_cpu_visible_size.


When is it set to -1?

I'm guessing before there is support for this value it'll be 0 (MBZ).

After after it should either be the entire lmem or something smaller.


-Lionel



Other pain point of this new uAPI, previously we could query the 
unallocated size for each heap.


Now lmem is effectively divided into 2 heaps, but unallocated_size is 
tracking allocation from both parts of lmem.


Is adding new I915_MEMORY_CLASS_DEVICE_NON_MAPPABLE out of question?


-Lionel






+    };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, 
with added

+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the 
stuff that
+ * is immutable. Previously we would have two ioctls, one to create 
the object
+ * with gem_create, and another to apply various parameters, however 
this
+ * creates some ambiguity for the params which are considered 
immutable. Also in

+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+    /**
+ * @size: Requested size for the object.
+ *
+ * The (page-aligned) allocated size for the object will be 
returned.

+ *
+ * Note that for some devices we have might have further minimum
+ * page-size restrictions(larger than 4K), like for device 
local-memory.

+ * However in general the final size here should always reflect any
+ * rounding up, if for example using the 
I915_GEM_CREATE_EXT_MEMORY_REGIONS

+ * extension to place the object in device local-memory.
+ */
+    __u64 size;
+    /**
+ * @handle: Returned handle for the object.
+ *
+ * Object handles are nonzero.
+ */
+    __u32 handle;
+    /**
+ * @flags: Optional flags.
+ *
+ * Supported values:
+ *
+ * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the 
kernel that

+ * the object will need to be accessed via the CPU.
+ *
+ * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
+ * only strictly required on platforms where only some of the 
device
+ * memory is directly visible or mappable through the CPU, like 
on DG2+.

+ *
+ * One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
+ * ensure we can always spill the allocation to system memory, 
if we

+ * can't place the object in the mappable part of
+ * I915_MEMORY_CLASS_DEVICE.
+ *
+ * Note that since the kernel only supports f

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-05-02 Thread Lionel Landwerlin


On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS
+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+   /** @region: The class:instance pair encoding */
+   struct drm_i915_gem_memory_class_instance region;
+
+   /** @rsvd0: MBZ */
+   __u32 rsvd0;
+
+   /** @probed_size: Memory probed by the driver (-1 = unknown) */
+   __u64 probed_size;
+
+   /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
+   __u64 unallocated_size;
+
+   union {
+   /** @rsvd1: MBZ */
+   __u64 rsvd1[8];
+   struct {
+   /**
+* @probed_cpu_visible_size: Memory probed by the driver
+* that is CPU accessible. (-1 = unknown).
+*
+* This will be always be <= @probed_size, and the
+* remainder(if there is any) will not be CPU
+* accessible.
+*/
+   __u64 probed_cpu_visible_size;
+   };



Trying to implement userspace support in Vulkan for this, I have an 
additional question about the value of probed_cpu_visible_size.


When is it set to -1?

I'm guessing before there is support for this value it'll be 0 (MBZ).

After after it should either be the entire lmem or something smaller.


-Lionel



+   };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the stuff that
+ * is immutable. Previously we would have two ioctls, one to create the object
+ * with gem_create, and another to apply various parameters, however this
+ * creates some ambiguity for the params which are considered immutable. Also 
in
+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+   /**
+* @size: Requested size for the object.
+*
+* The (page-aligned) allocated size for the object will be returned.
+*
+* Note that for some devices we have might have further minimum
+* page-size restrictions(larger than 4K), like for device local-memory.
+* However in general the final size here should always reflect any
+* rounding up, if for example using the 
I915_GEM_CREATE_EXT_MEMORY_REGIONS
+* extension to place the object in device local-memory.
+*/
+   __u64 size;
+   /**
+* @handle: Returned handle for the object.
+*
+* Object handles are nonzero.
+*/
+   __u32 handle;
+   /**
+* @flags: Optional flags.
+*
+* Supported values:
+*
+* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
+* the object will need to be accessed via the CPU.
+*
+* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
+* only strictly required on platforms where only some of the device
+* memory is directly visible or mappable through the CPU, like on DG2+.
+*
+* One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
+* ensure we can always spill the allocation to system memory, if we
+* can't place the object in the mappable part of
+* I915_MEMORY_CLASS_DEVICE.
+*
+* Note that since the kernel only supports flat-CCS on objects that can
+* *only* be placed in I915_MEMORY_CLASS_DEVICE, we therefor

Re: Mesa 20.0 backlog

2022-04-28 Thread Lionel Landwerlin


On 22/04/2022 07:56, Dylan Baker wrote:

Hi all,

I've spent a good deal of time this week crushing the backlog of
patches on the mesa 20.0 series before making the release today. As such
there are not only a dozen outstanding patches, mostly for zink, which I
couldn't figure out how to correctly backport.

I've touched base with Lionel about the anv patches, but the remaning
patches I'd appreciate guidance on what you'd like to do with them.

2022-03-14 FIXES  d5870c45ae panfrost: Optimise recalculation of max sampler 
view
2022-03-24 FIXES  f348103fce anv: fix dynamic state emission
2022-03-24 FIXES  a4f502de32 anv: fix VK_DYNAMIC_STATE_COLOR_WRITE_ENABLE_EXT 
state
2022-03-24 FIXES  1d250b7b95 anv: fix color write enable interaction with color 
mask



Those 3 fixes are unfortunately bringing perf regressions. So I think we 
should drop them from the list.


Thanks,

-Lionel



2022-03-30 CC 4eca6e3e5d lavapipe: fix xfb availability query copying
2022-04-05 CC 3dcb80da9d zink: fix barrier generation for ssbo descriptors
2022-04-11 FIXES  dd078d13cb zink: fix tessellation shader key matching.
2022-04-13 FIXES  bbdf22ce13 radv: Fix barriers with cp dma
2022-04-18 CC 8806f444a5 zink: fix extended restart prim types without 
dynamic state2
2022-04-21 CC 373c8001d6 zink: set VK_QUERY_RESULT_WAIT_BIT when copying to 
qbo
2022-04-21 CC a056cbc691 zink: fix synchronization when drawing from 
streamout
2022-04-21 CC fc5edf9b68 zink: fix xfb counter buffer barriers
2022-04-21 CC e509598470 zink: remove xfb_barrier flag

Cheers,
Dylan

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-04-27 Thread Lionel Landwerlin


On 27/04/2022 18:18, Matthew Auld wrote:

On 27/04/2022 07:48, Lionel Landwerlin wrote:
One question though, how do we detect that this flag 
(I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) is accepted on a given 
kernel?
I assume older kernels are going to reject object creation if we use 
this flag?


From some offline discussion with Lionel, the plan here is to just do 
a dummy gem_create_ext to check if the kernel throws an error with the 
new flag or not.




I didn't plan to use __drm_i915_query_vma_info, but isn't it 
inconsistent to select the placement on the GEM object and then query 
whether it's mappable by address?
You made a comment stating this is racy, wouldn't querying on the GEM 
object prevent this?


Since mesa at this time doesn't currently have a use for this one, 
then I guess we should maybe just drop this part of the uapi, in this 
version at least, if no objections.



Just repeating what we discussed (maybe I missed some other discussion 
and that's why I was confused) :



The way I was planning to use this is to have 3 heaps in Vulkan :

    - heap0: local only, no cpu visible

    - heap1: system, cpu visible

    - heap2: local & cpu visible


With heap2 having the reported probed_cpu_visible_size size.

It is an error for the application to map from heap0 [1].


With that said, it means if we created a GEM BO without 
I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS, we'll never mmap it.


So why the query?

I guess it would be useful when we import a buffer from another 
application. But in that case, why not have the query on the BO?



-Lionel


[1] : 
https://www.khronos.org/registry/vulkan/specs/1.3-extensions/man/html/vkMapMemory.html 
(VUID-vkMapMemory-memory-00682)






Thanks,

-Lionel

On 27/04/2022 09:35, Lionel Landwerlin wrote:

Hi Matt,


The proposal looks good to me.

Looking forward to try it on drm-tip.


-Lionel

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 
+++

  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h

new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as 
known to the

+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS

+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;
+
+    /** @unallocated_size: Estimate of memory remaining (-1 = 
unknown) */

+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the driver
+ * that is CPU accessible. (-1 = unknown).
+ *
+ * This will be always be <= @probed_size, and the
+ * remainder(if there is any) will not be CPU
+ * accessible.
+ */
+    __u64 probed_cpu_visible_size;
+    };
+    };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create 
behaviour, with added

+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for 
the stuff that
+ * is immutable. Previously we would have two ioctls, one to 
create the object
+ * with gem_create, and another to apply various parameters, 
however this
+ * creates some ambiguity for the params which are considered 
immutable. Also in

+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+    /**
+ * @size: Requested size for the object.
+ *
+ * The (page-aligned) allocated size for the object will be 
returned.

+ *
+ * Note that for some devices we have might have furt

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-04-27 Thread Lionel Landwerlin

One question though, how do we detect that this flag 
(I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) is accepted on a given kernel?
I assume older kernels are going to reject object creation if we use 
this flag?


I didn't plan to use __drm_i915_query_vma_info, but isn't it 
inconsistent to select the placement on the GEM object and then query 
whether it's mappable by address?
You made a comment stating this is racy, wouldn't querying on the GEM 
object prevent this?


Thanks,

-Lionel

On 27/04/2022 09:35, Lionel Landwerlin wrote:

Hi Matt,


The proposal looks good to me.

Looking forward to try it on drm-tip.


-Lionel

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h

new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as 
known to the

+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS

+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;
+
+    /** @unallocated_size: Estimate of memory remaining (-1 = 
unknown) */

+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the driver
+ * that is CPU accessible. (-1 = unknown).
+ *
+ * This will be always be <= @probed_size, and the
+ * remainder(if there is any) will not be CPU
+ * accessible.
+ */
+    __u64 probed_cpu_visible_size;
+    };
+    };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, 
with added

+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the 
stuff that
+ * is immutable. Previously we would have two ioctls, one to create 
the object
+ * with gem_create, and another to apply various parameters, however 
this
+ * creates some ambiguity for the params which are considered 
immutable. Also in

+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+    /**
+ * @size: Requested size for the object.
+ *
+ * The (page-aligned) allocated size for the object will be 
returned.

+ *
+ * Note that for some devices we have might have further minimum
+ * page-size restrictions(larger than 4K), like for device 
local-memory.

+ * However in general the final size here should always reflect any
+ * rounding up, if for example using the 
I915_GEM_CREATE_EXT_MEMORY_REGIONS

+ * extension to place the object in device local-memory.
+ */
+    __u64 size;
+    /**
+ * @handle: Returned handle for the object.
+ *
+ * Object handles are nonzero.
+ */
+    __u32 handle;
+    /**
+ * @flags: Optional flags.
+ *
+ * Supported values:
+ *
+ * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the 
kernel that

+ * the object will need to be accessed via the CPU.
+ *
+ * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
+ * only strictly required on platforms where only some of the 
device
+ * memory is directly visible or mappable through the CPU, like 
on DG2+.

+ *
+ * One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
+ * ensure we can always spill the allocation to system memory, 
if we

+ * can't place the object in the mappable part of
+ * I915_MEMORY_CLASS_DEVICE.
+ *
+ * Note that since the kernel only supports flat-CCS on objects 
that can

+

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-04-27 Thread Lionel Landwerlin


Hi Matt,


The proposal looks good to me.

Looking forward to try it on drm-tip.


-Lionel

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS
+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+   /** @region: The class:instance pair encoding */
+   struct drm_i915_gem_memory_class_instance region;
+
+   /** @rsvd0: MBZ */
+   __u32 rsvd0;
+
+   /** @probed_size: Memory probed by the driver (-1 = unknown) */
+   __u64 probed_size;
+
+   /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
+   __u64 unallocated_size;
+
+   union {
+   /** @rsvd1: MBZ */
+   __u64 rsvd1[8];
+   struct {
+   /**
+* @probed_cpu_visible_size: Memory probed by the driver
+* that is CPU accessible. (-1 = unknown).
+*
+* This will be always be <= @probed_size, and the
+* remainder(if there is any) will not be CPU
+* accessible.
+*/
+   __u64 probed_cpu_visible_size;
+   };
+   };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the stuff that
+ * is immutable. Previously we would have two ioctls, one to create the object
+ * with gem_create, and another to apply various parameters, however this
+ * creates some ambiguity for the params which are considered immutable. Also 
in
+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+   /**
+* @size: Requested size for the object.
+*
+* The (page-aligned) allocated size for the object will be returned.
+*
+* Note that for some devices we have might have further minimum
+* page-size restrictions(larger than 4K), like for device local-memory.
+* However in general the final size here should always reflect any
+* rounding up, if for example using the 
I915_GEM_CREATE_EXT_MEMORY_REGIONS
+* extension to place the object in device local-memory.
+*/
+   __u64 size;
+   /**
+* @handle: Returned handle for the object.
+*
+* Object handles are nonzero.
+*/
+   __u32 handle;
+   /**
+* @flags: Optional flags.
+*
+* Supported values:
+*
+* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
+* the object will need to be accessed via the CPU.
+*
+* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
+* only strictly required on platforms where only some of the device
+* memory is directly visible or mappable through the CPU, like on DG2+.
+*
+* One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
+* ensure we can always spill the allocation to system memory, if we
+* can't place the object in the mappable part of
+* I915_MEMORY_CLASS_DEVICE.
+*
+* Note that since the kernel only supports flat-CCS on objects that can
+* *only* be placed in I915_MEMORY_CLASS_DEVICE, we therefore don't
+* support I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS together with
+* flat-CCS.
+*
+* Without this hint, the kernel will assume that non-mappable
+* I915_MEMORY_CLASS_DEVICE is pr

Re: [Intel-gfx] [PATCH 2/2] drm/doc: add rfc section for small BAR uapi

2022-03-18 Thread Lionel Landwerlin


Hey Matthew, all,

This sounds like a good thing to have.
There are a number of DG2 machines where we have a small BAR and this is 
causing more apps to fail.


Anv currently reports 3 memory heaps to the app :

    - local device only (not host visible) -> mapped to lmem
    - device/cpu -> mapped to smem
    - local device but also host visible -> mapped to lmem

So we could use this straight away, by just not putting the 
I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS flag on the allocation of the 
first heap.


One thing I don't see in this proposal is how can we get the size of the 
2 lmem heap : cpu visible, cpu not visible

We could use that to report the appropriate size to the app.
We probably want to report a new drm_i915_memory_region_info and either :
    - put one of the reserve field to use to indicate : cpu visible
    - or define a new enum value in drm_i915_gem_memory_class

Cheers,

-Lionel


On 18/02/2022 13:22, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 153 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  40 ++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 197 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..fa65835fd608
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,153 @@
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that in the future we want to have our buffer flags here, at least for
+ * the stuff that is immutable. Previously we would have two ioctls, one to
+ * create the object with gem_create, and another to apply various parameters,
+ * however this creates some ambiguity for the params which are considered
+ * immutable. Also in general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+   /**
+* @size: Requested size for the object.
+*
+* The (page-aligned) allocated size for the object will be returned.
+*
+* Note that for some devices we have might have further minimum
+* page-size restrictions(larger than 4K), like for device local-memory.
+* However in general the final size here should always reflect any
+* rounding up, if for example using the 
I915_GEM_CREATE_EXT_MEMORY_REGIONS
+* extension to place the object in device local-memory.
+*/
+   __u64 size;
+   /**
+* @handle: Returned handle for the object.
+*
+* Object handles are nonzero.
+*/
+   __u32 handle;
+   /**
+* @flags: Optional flags.
+*
+* Supported values:
+*
+* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
+* the object will need to be accessed via the CPU.
+*
+* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
+* only strictly required on platforms where only some of the device
+* memory is directly visible or mappable through the CPU, like on DG2+.
+*
+* One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
+* ensure we can always spill the allocation to system memory, if we
+* can't place the object in the mappable part of
+* I915_MEMORY_CLASS_DEVICE.
+*
+* Note that buffers that need to be captured with EXEC_OBJECT_CAPTURE,
+* will need to enable this hint, if the object can also be placed in
+* I915_MEMORY_CLASS_DEVICE, starting from DG2+. The execbuf call will
+* throw an error otherwise. This also means that such objects will need
+* I915_MEMORY_CLASS_SYSTEM set as a possible placement.
+*
+* Without this hint, the kernel will assume that non-mappable
+* I915_MEMORY_CLASS_DEVICE is preferred for this object. Note that the
+* kernel can still migrate the object to the mappable part, as a last
+* resort, if userspace ever CPU faults this object, but this might be
+* expensive, and so ideally should be avoided.
+*/
+#define I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (1 << 0)
+   __u32 flags;
+   /**
+* @extensions: The chain of extensions to apply to this object.
+*
+* This will be useful in the future when we need to support several
+* different extensions, and we need to apply more than one when
+* creating the object. See struct i915_user_extension.
+*
+* If we

Re: [Mesa-dev] Mesa 21.0 release

2021-03-11 Thread Lionel Landwerlin


On 10/03/2021 22:39, Dylan Baker wrote:

Hi list,

I think we're just about ready for the mesa 21.0 release. Sorry I've
been really about this. Here's a lis tof all outstanding patches that
either don't apply, or don't compile. If you see something here you'd
like to backport for 21.0.0 please let me know. Otherwise I'm planning
to make the release tomorrow.

Cheers,
Dylan



Thanks for looking at the stable release!


-Lionel




  
8955d179d3 anv: fix MI_PREDICATE_RESULT write

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9522

2c02740a8c intel/mi_builder: Use AddCSMMIOStartOffset for LRI
As far as I can tell this isn't needed on 21.0, probably targeted the 
wrong commit :(

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] Threaded Vulkan WSI

2021-02-22 Thread Lionel Landwerlin


Hi all,

I'm currently looking at implementing some new features in Vulkan WSI.
One particular feature I would like to add would need to do some processing
on events coming from the compositor/display without waiting for the
application to give us the opportunity to do so (by entering the WSI code
which currently happens on 2 entry points vkAcquireNextImageKHR &
vkQueuePresentKHR).

In other words, this would require a thread to process events.

Looking at the code we currently have for the various backends it seems :

  - the x11 backend will sometimes use threads depending on setup options

  - the wayland backend doesn't use any thread

  - the display backend uses a thread but only to listen to KMS events

I don't have a perfect understanding of the all the WSI issues, but so far
I can't think of a reason why we couldn't have a threaded event processing
for all backends.

My current thinking is that it should be possible to have some kind of
generic messaging mechanism between the Vulkan WSI API and threads
that implement acquire/present operations with a specific window system
API.

Does anybody see a problem with this approach?

Thanks a lot,

-Lionel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-14 Thread Lionel Landwerlin


On 14/02/2021 00:47, Rob Clark wrote:

On Sat, Feb 13, 2021 at 12:52 PM Lionel Landwerlin
 wrote:

On 13/02/2021 18:52, Rob Clark wrote:

On Sat, Feb 13, 2021 at 12:04 AM Lionel Landwerlin
 wrote:

On 13/02/2021 04:20, Rob Clark wrote:

On Fri, Feb 12, 2021 at 5:56 PM Lionel Landwerlin
 wrote:

On 13/02/2021 03:38, Rob Clark wrote:

On Fri, Feb 12, 2021 at 5:08 PM Lionel Landwerlin
 wrote:

We're kind of in the same boat for Intel.

Access to GPU perf counters is exclusive to a single process if you want
to build a timeline of the work (because preemption etc...).

ugg, does that mean extensions like AMD_performance_monitor doesn't
actually work on intel?

It work,s but only a single app can use it at a time.


I see.. on the freedreno side we haven't really gone down the
preemption route yet, but we have a way to hook in some safe/restore
cmdstream

That's why I think, for Intel HW, something like gfx-pps is probably
best to pull out all the data on a timeline for the entire system.

Then the drivers could just provide timestamp on the timeline to
annotate it.


Looking at gfx-pps, my question is why is this not part of the mesa
tree?  That way I could use it for freedreno (either as stand-alone
process or part of driver) without duplicating all the perfcntr
tables, and information about different variants of a given generation
needed to interpret the raw counters into something useful for a
human.

Pulling gfx-pps into mesa seems like a sensible way forward.

BR,
-R

Yeah, I guess it depends on how your stack looks.

If the counters cover more than 3d and you have video drivers out of the
mesa tree, it might make sense to have it somewhere else.

Pulling intel video drivers into mesa and bringing some sanity to that
side of things would make more than a few people happy (but I digress
;-))

Until then, exporting a library from the mesa build might be an
option?  And possibly something like that would work for virgl host
side stuff as well?



For Intel, we have a small library in IGT [1], it's not big and most of 
that logic also exists in src/intel/perf [2] too.


So definitely possible.

Might also be a good occasion to try to make a common sublibrary for 
Intel drivers so we don't carry the statically link code from 
src/intel/perf in each driver.



-Lionel





Anyway I didn't want to sound negative, having a daemon like gfx-pps
thing in mesa to report system wide counters works for me :)

yeah, I think having it in the mesa tree is good for code-reuse (from
my experience, pulling external freedreno related tools into mesa
turned out to be a good thing)

At some point, whether the collection is in-process or not could just
be an implementation detail

BR,
-R


Then we can look into how to have each intel driver add annotation on
the timeline.


-Lionel



[1] : 
https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/lib/i915/perf.c


[2] : https://gitlab.freedesktop.org/mesa/mesa/-/tree/master/src/intel/perf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-13 Thread Lionel Landwerlin


On 13/02/2021 18:52, Rob Clark wrote:

On Sat, Feb 13, 2021 at 12:04 AM Lionel Landwerlin
 wrote:

On 13/02/2021 04:20, Rob Clark wrote:

On Fri, Feb 12, 2021 at 5:56 PM Lionel Landwerlin
 wrote:

On 13/02/2021 03:38, Rob Clark wrote:

On Fri, Feb 12, 2021 at 5:08 PM Lionel Landwerlin
 wrote:

We're kind of in the same boat for Intel.

Access to GPU perf counters is exclusive to a single process if you want
to build a timeline of the work (because preemption etc...).

ugg, does that mean extensions like AMD_performance_monitor doesn't
actually work on intel?

It work,s but only a single app can use it at a time.


I see.. on the freedreno side we haven't really gone down the
preemption route yet, but we have a way to hook in some safe/restore
cmdstream


That's why I think, for Intel HW, something like gfx-pps is probably
best to pull out all the data on a timeline for the entire system.

Then the drivers could just provide timestamp on the timeline to
annotate it.


Looking at gfx-pps, my question is why is this not part of the mesa
tree?  That way I could use it for freedreno (either as stand-alone
process or part of driver) without duplicating all the perfcntr
tables, and information about different variants of a given generation
needed to interpret the raw counters into something useful for a
human.

Pulling gfx-pps into mesa seems like a sensible way forward.

BR,
-R


Yeah, I guess it depends on how your stack looks.

If the counters cover more than 3d and you have video drivers out of the 
mesa tree, it might make sense to have it somewhere else.



Anyway I didn't want to sound negative, having a daemon like gfx-pps 
thing in mesa to report system wide counters works for me :)


Then we can look into how to have each intel driver add annotation on 
the timeline.



-Lionel

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-13 Thread Lionel Landwerlin

On 13/02/2021 04:20, Rob Clark wrote:

On Fri, Feb 12, 2021 at 5:56 PM Lionel Landwerlin
wrote:

On 13/02/2021 03:38, Rob Clark wrote:

On Fri, Feb 12, 2021 at 5:08 PM Lionel Landwerlin
wrote:

We're kind of in the same boat for Intel.

Access to GPU perf counters is exclusive to a single process if you want
to build a timeline of the work (because preemption etc...).

ugg, does that mean extensions like AMD_performance_monitor doesn't
actually work on intel?

It work,s but only a single app can use it at a time.

I see.. on the freedreno side we haven't really gone down the
preemption route yet, but we have a way to hook in some safe/restore
cmdstream

That's why I think, for Intel HW, something like gfx-pps is probably
best to pull out all the data on a timeline for the entire system.

Then the drivers could just provide timestamp on the timeline to
annotate it.

-Lionel

The best information we could add from mesa would a timestamp of when a
particular drawcall started.
But that's pretty much when timestamps queries are.

Were you thinking of particular GPU generated data you don't get from
gfx-pps?

>From the looks of it, currently I don't get *any* GPU generated data
from gfx-pps ;-)

Maybe file a bug? :
https://gitlab.freedesktop.org/Fahien/gfx-pps/-/blob/master/src/gpu/intel/intel_driver.cc

We can ofc sample counters from a separate process as well... I have a
curses tool (fdperf) which does this.. but running outside of gpu
cmdstream plus counters losing context across suspend/resume makes it
less than perfect.

Our counters are global so to give per application values, we need to
post process a stream of HW counter snapshots.

And something that works the same way as
AMD_performance_monitor under the hook gives a more precise look at
which shaders (for ex) are consuming the most cycles.

In our implementation that precision (in particular when a drawcall
ends) comes at a stalling cost unfortunately.

yeah, stalling on our end too for per-draw counter snapshots.. but if
you are looking for which shaders to optimize that doesn't matter
*that* much.. they'll be some overhead, but it's not really going to
change which draws/shaders are expensive.. just mean that you lose out
on pipelining of the state changes

BR,
-R

For cases where
we can profile a trace, frameretrace and related tools is pretty
great.. but it would be nice to have similar visibility for actual
games (which for me, mostly means android games, since so far no
aarch64 steam store), but also give game developers good tools (or at
least the same tools that they get with other closed src drivers on
android).

Sure, but frame analysis is different than live monitoring of the system.

On Intel's HW you don't get the same level of details in both cases, and
apart for a few timestamps, I think gfx-pps is as good as you gonna get
for live stuff.

-Lionel

BR,
-R

Thanks,

-Lionel

On 13/02/2021 00:12, Alyssa Rosenzweig wrote:

My 2c for Mali/Panfrost --

For us, capturing GPU perf counters is orthogonal to rendering. It's
expected (e.g. with Arm's tools) to do this from a separate process.
Neither Mesa nor the DDK should require custom instrumentation for the
low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
Perfetto as it is. So for us I don't see the value in modifying Mesa for
tracing.

On Fri, Feb 12, 2021 at 01:34:51PM -0800, John Bates wrote:

(responding from correct address this time)

On Fri, Feb 12, 2021 at 12:03 PM Mark Janes wrote:

I've recently been using GPUVis to look at trace events. On Intel
platforms, GPUVis incorporates ftrace events from the i915 driver,
performance metrics from igt-gpu-tools, and userspace ftrace markers
that I locally hack up in Mesa.

GPUVis is great. I would love to see that data combined with
userspace events without any need for local hacks. Perfetto provides
on-demand trace events with lower overhead compared to ftrace, so for
example it is acceptable to have production trace instrumentation that can
be captured without dev builds. To do that with ftrace it may require a way
to enable and disable the ftrace file writes to avoid the overhead when
tracing is not in use. This is what Android does with systrace/atrace, for
example, it uses Binder to notify processes about trace sessions. Perfetto
does that in a more portable way.

It is very easy to compile the GPUVis UI. Userspace instrumentation
requires a single C/C++ header. You don't have to access an external
web service to analyze trace data (a big no-no for devs working on
preproduction hardware).

Is it possible to build and run the Perfetto UI locally?

Yes, local UI builds are possible
<https://github.com/google/perfetto/blob/5ff758df67da94d17734c2e70eb6738c4902953e/ui/README.md>.
Also confirmed with the perfetto team <https://discord.gg/35ShE3A> that
trace data is not uploaded unless you use the 'share' feature.

Can it display
arbitra

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Lionel Landwerlin