Re: Intel clc dependency

2024-04-11 Thread Lionel Landwerlin

On 11/04/2024 06:05, Tapani Pälli wrote:


On 11.4.2024 1.15, Brian Paul wrote:

On 4/10/24 13:53, Timo Aaltonen wrote:

Brian Paul kirjoitti 6.4.2024 klo 1.05:
I'm trying to build the Intel Vulkan driver.  First time in a few 
months.  I'm having build problems related to clc.  I'm on Ubuntu 
22.04



[...]
[1347/3181] Generating src/intel/vulkan/...om command (wrapped by 
meson to set env)
FAILED: 
src/intel/vulkan/grl/gfx125_bvh_build_BFS_BFS_pass1_indexed_batchable.h 

env MESA_SHADER_CACHE_DISABLE=true MESA_SPIRV_LOG_LEVEL=error 
/home/brianp/build3/mesa/build/src/intel/compiler/intel_clc -p dg2 
--prefix gfx125_bvh_build_BFS_BFS_pass1_indexed_batchable -e 
BFS_pass1_indexed_batchable --in 
../src/intel/vulkan/grl/gpu/bvh_build_BFS.cl --in 
/home/brianp/build3/mesa/src/intel/vulkan/grl/gpu/libs/lsc_intrinsics_fallback.cl 
-o 
src/intel/vulkan/grl/gfx125_bvh_build_BFS_BFS_pass1_indexed_batchable.h 
-- -cl-std=cl2.0 -D__OPENCL_VERSION__=200 -DMAX_HW_SIMD_WIDTH=16 
-DMAX_WORKGROUP_SIZE=16 
-I/home/brianp/build3/mesa/src/intel/vulkan/grl/gpu 
-I/home/brianp/build3/mesa/src/intel/vulkan/grl/include

ERROR: libclc shader missing. Consider installing the libclc package
Aborted (core dumped)

I've installed every clc-related package I could find.  I've tried 
several options for the 'intel-clc' option without luck.


BTW, the description of intel-clc in meson_options.txt looks suspect:

option(
   'intel-clc',
   type : 'combo',
   deprecated: {'true': 'enabled', 'false': 'disabled'},
   value : 'disabled',
   choices : [
 'enabled', 'disabled', 'system',
   ],
   description : 'Build the intel-clc compiler (enables Vulkan 
Intel ' +

 'Ray Tracing on supported hardware).'
)

The default is 'disabled' but that's deprecated?  Choices include 
'enabled' but that's deprecated too?


Any tips for building the ToT Intel Vulkan driver?

-Brian



You need to have libclc-NN-dev installed matching with the llvm 
version, which on stock 22.04 would be libclc-13-dev.


I'm using llvm 15 and have libclc-15-dev installed.  I get the 
"ERROR: libclc shader missing. Consider installing the libclc 
package" issue I quoted above.




You'll need to install libclc-15 too. I think libclc-15-dev package is 
missing dependency to libclc-15 .. did not verify this but I've been 
able to install libclc-15-dev without the library package getting 
installed.



libclang-common-15-dev installed too?





I have llvm 15 installed because I also want to build the radv Vulkan 
driver.


-Brian






Re: [PATCH v2 01/12] drm/doc: add rfc section for small BAR uapi

2022-06-21 Thread Lionel Landwerlin

On 21/06/2022 13:44, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)
v3:
   - Drop the vma query for now.
   - Add unallocated_cpu_visible_size as part of the region query.
   - Improve the docs some more, including documenting the expected
 behaviour on older kernels, since this came up in some offline
 discussion.
v4:
   - Various improvements all over. (Tvrtko)

v5:
   - Include newer integrated platforms when applying the non-recoverable
 context and error capture restriction. (Thomas)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Tvrtko Ursulin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
Acked-by: Tvrtko Ursulin 
Acked-by: Akeem G Abodunrin 



With Jordan with have changes for Anv/Iris : 
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16739


Acked-by: Lionel Landwerlin 



---
  Documentation/gpu/rfc/i915_small_bar.h   | 189 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  47 ++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 240 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..752bb2ceb399
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,189 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS
+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+   /** @region: The class:instance pair encoding */
+   struct drm_i915_gem_memory_class_instance region;
+
+   /** @rsvd0: MBZ */
+   __u32 rsvd0;
+
+   /**
+* @probed_size: Memory probed by the driver (-1 = unknown)
+*
+* Note that it should not be possible to ever encounter a zero value
+* here, also note that no current region type will ever return -1 here.
+* Although for future region types, this might be a possibility. The
+* same applies to the other size fields.
+*/
+   __u64 probed_size;
+
+   /**
+* @unallocated_size: Estimate of memory remaining (-1 = unknown)
+*
+* Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting.
+* Without this (or if this is an older kernel) the value here will
+* always equal the @probed_size. Note this is only currently tracked
+* for I915_MEMORY_CLASS_DEVICE regions (for other types the value here
+* will always equal the @probed_size).
+*/
+   __u64 unallocated_size;
+
+   union {
+   /** @rsvd1: MBZ */
+   __u64 rsvd1[8];
+   struct {
+   /**
+* @probed_cpu_visible_size: Memory probed by the driver
+* that is CPU accessible. (-1 = unknown).
+*
+* This will be always be <= @probed_size, and the
+* remainder (if there is any) will not be CPU
+* accessible.
+*
+* On systems without small BAR, the @probed_size will
+* always equal the @probed_cpu_visible_size, since all
+* of it will be CPU accessible.
+*
+* Note this is only tracked for
+* I915_MEMORY_CLASS_DEVICE regions (for other types the
+* value here will always equal the @probed_size).
+*
+* Note that if the value returned here is zero, then
+* this must be an old kernel which lacks the relevant
+* small-bar uAPI support (including
+* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on
+* such systems we should never actually end up with a
+* small BAR configuration, assuming we are able to load
+* the kernel module. Hence it should be safe to treat
+* this the same as when @probed_cpu_visible_size ==
+* @probed_size.
+*/
+   __u64 probed_cpu_visib

Re: [PATCH v3] drm/doc: add rfc section for small BAR uapi

2022-05-17 Thread Lionel Landwerlin

On 17/05/2022 12:23, Tvrtko Ursulin wrote:


On 17/05/2022 09:55, Lionel Landwerlin wrote:

On 17/05/2022 11:29, Tvrtko Ursulin wrote:


On 16/05/2022 19:11, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)
v3:
   - Drop the vma query for now.
   - Add unallocated_cpu_visible_size as part of the region query.
   - Improve the docs some more, including documenting the expected
 behaviour on older kernels, since this came up in some offline
 discussion.

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Tvrtko Ursulin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jon Bloomfield 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 164 
+++

  Documentation/gpu/rfc/i915_small_bar.rst |  47 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 215 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h

new file mode 100644
index ..4079d287750b
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,164 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as 
known to the

+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS

+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;


Is -1 possible today or when it will be? For system memory it 
appears zeroes are returned today so that has to stay I think. Does 
it effectively mean userspace has to consider both 0 and -1 as 
unknown is the question.



I raised this on v2. As far as I can tell there are no situation 
where we would get -1.


Is it really probed_size=0 on smem?? It's not the case on the 
internal branch.


My bad, I misread the arguments to intel_memory_region_create while 
grepping:


struct intel_memory_region *i915_gem_shmem_setup(struct 
drm_i915_private *i915,

 u16 type, u16 instance)
{
return intel_memory_region_create(i915, 0,
  totalram_pages() << PAGE_SHIFT,
  PAGE_SIZE, 0, 0,
  type, instance,
  _region_ops);

I saw "0, 0" and wrongly assumed that would be the data, since it 
matched with my mental model and the comment against unallocated_size 
saying it's only tracked for device memory.


Although I'd say it is questionable for i915 to return this data. I 
wonder it use case is possible where it would even be wrong but don't 
know. I guess the cat is out of the bag now.



Not sure how questionable that is. There are a bunch of tools reporting 
the amount of memory available (free, top, htop, etc...).


It might not be totalram_pages() but probably something close to it.

Having a non 0 & non -1 value is useful.


-Lionel




If the situation is -1 for unknown and some valid size (not zero) I 
don't think there is a problem here.


Regards,

Tvrtko


Anv is not currently handling that case.


I would very much like to not deal with 0 for smem.

It really makes it easier for userspace rather than having to fish 
information from 2 different places and on top of dealing with 
multiple kernel versions.



-Lionel





+
+    /**
+ * @unallocated_size: Estimate of memory remaining (-1 = unknown)
+ *
+ * Note this is only currently tracked for 
I915_MEMORY_CLASS_DEVICE

+ * regions, and also requires CAP_PERFMON or CAP_SYS_ADMIN to get
+ * reliable accounting. Without this(or if this an older 
kernel) the


s/if this an/if this is an/

Also same question as above about -1.


+ * value here will always match the @probed_size.
+ */
+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the driver
+ * that is CPU accessible. (-1 = unknown).


Also question about -1. In this case this could be done since the 
field is yet to be added but I am curious if it ever can be -1.



+ *
+ * This will be always be <= @probed_size, and the
+ 

Re: [PATCH v3] drm/doc: add rfc section for small BAR uapi

2022-05-17 Thread Lionel Landwerlin

On 17/05/2022 11:29, Tvrtko Ursulin wrote:


On 16/05/2022 19:11, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)
v3:
   - Drop the vma query for now.
   - Add unallocated_cpu_visible_size as part of the region query.
   - Improve the docs some more, including documenting the expected
 behaviour on older kernels, since this came up in some offline
 discussion.

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Tvrtko Ursulin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jon Bloomfield 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 164 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  47 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 215 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h

new file mode 100644
index ..4079d287750b
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,164 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as 
known to the

+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS

+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;


Is -1 possible today or when it will be? For system memory it appears 
zeroes are returned today so that has to stay I think. Does it 
effectively mean userspace has to consider both 0 and -1 as unknown is 
the question.



I raised this on v2. As far as I can tell there are no situation where 
we would get -1.


Is it really probed_size=0 on smem?? It's not the case on the internal 
branch.


Anv is not currently handling that case.


I would very much like to not deal with 0 for smem.

It really makes it easier for userspace rather than having to fish 
information from 2 different places and on top of dealing with multiple 
kernel versions.



-Lionel





+
+    /**
+ * @unallocated_size: Estimate of memory remaining (-1 = unknown)
+ *
+ * Note this is only currently tracked for I915_MEMORY_CLASS_DEVICE
+ * regions, and also requires CAP_PERFMON or CAP_SYS_ADMIN to get
+ * reliable accounting. Without this(or if this an older kernel) 
the


s/if this an/if this is an/

Also same question as above about -1.


+ * value here will always match the @probed_size.
+ */
+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the driver
+ * that is CPU accessible. (-1 = unknown).


Also question about -1. In this case this could be done since the 
field is yet to be added but I am curious if it ever can be -1.



+ *
+ * This will be always be <= @probed_size, and the
+ * remainder(if there is any) will not be CPU
+ * accessible.
+ *
+ * On systems without small BAR, the @probed_size will
+ * always equal the @probed_cpu_visible_size, since all
+ * of it will be CPU accessible.
+ *
+ * Note that if the value returned here is zero, then
+ * this must be an old kernel which lacks the relevant
+ * small-bar uAPI support(including


I have noticed you prefer no space before parentheses throughout the 
text so I guess it's just my preference to have it. Very nitpicky even 
if I am right so up to you.



+ * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on
+ * such systems we should never actually end up with a
+ * small BAR configuration, assuming we are able to load
+ * the kernel module. Hence it should be safe to treat
+ * this the same as when @probed_cpu_visible_size ==
+ * @probed_size.
+ */
+    __u64 probed_cpu_visible_size;
+
+    /**
+ * @unallocated_cpu_visible_size: Estimate of CPU
+ * visible memory remaining (-1 = unknown).
+ *
+ * Note this is only currently 

Re: [PATCH v3] uapi/drm/i915: Document memory residency and Flat-CCS capability of obj

2022-05-16 Thread Lionel Landwerlin

On 14/05/2022 00:06, Jordan Justen wrote:

On 2022-05-13 05:31:00, Lionel Landwerlin wrote:

On 02/05/2022 17:15, Ramalingam C wrote:

Capture the impact of memory region preference list of the objects, on
their memory residency and Flat-CCS capability.

v2:
Fix the Flat-CCS capability of an obj with {lmem, smem} preference
list [Thomas]
v3:
Reworded the doc [Matt]

Signed-off-by: Ramalingam C
cc: Matthew Auld
cc: Thomas Hellstrom
cc: Daniel Vetter
cc: Jon Bloomfield
cc: Lionel Landwerlin
cc: Kenneth Graunke
cc:mesa-dev@lists.freedesktop.org
cc: Jordan Justen
cc: Tony Ye
Reviewed-by: Matthew Auld
---
   include/uapi/drm/i915_drm.h | 16 
   1 file changed, 16 insertions(+)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a2def7b27009..b7e1c2fe08dc 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3443,6 +3443,22 @@ struct drm_i915_gem_create_ext {
* At which point we get the object handle in 
_i915_gem_create_ext.handle,
* along with the final object size in _i915_gem_create_ext.size, which
* should account for any rounding up, if required.
+ *
+ * Note that userspace has no means of knowing the current backing region
+ * for objects where @num_regions is larger than one. The kernel will only
+ * ensure that the priority order of the @regions array is honoured, either
+ * when initially placing the object, or when moving memory around due to
+ * memory pressure
+ *
+ * On Flat-CCS capable HW, compression is supported for the objects residing
+ * in I915_MEMORY_CLASS_DEVICE. When such objects (compressed) has other
+ * memory class in @regions and migrated (by I915, due to memory
+ * constrain) to the non I915_MEMORY_CLASS_DEVICE region, then I915 needs to
+ * decompress the content. But I915 dosen't have the required information to
+ * decompress the userspace compressed objects.
+ *
+ * So I915 supports Flat-CCS, only on the objects which can reside only on
+ * I915_MEMORY_CLASS_DEVICE regions.

I think it's fine to assume Flat-CSS surface will always be in lmem.

I see no issue for the Anv Vulkan driver.

Maybe Nanley or Ken can speak for the Iris GL driver?


Acked-by: Jordan Justen

I think Nanley has accounted for this on iris with:

https://gitlab.freedesktop.org/mesa/mesa/-/commit/42a865730ef72574e179b56a314f30fdccc6cba8

-Jordan


Thanks Jordan,


We might want to through in an additional : assert((|flags 
&||BO_ALLOC_SMEM) == 0); in the CCS case

|

|
|

|-Lionel
|


Re: [PATCH v3] uapi/drm/i915: Document memory residency and Flat-CCS capability of obj

2022-05-13 Thread Lionel Landwerlin

On 02/05/2022 17:15, Ramalingam C wrote:

Capture the impact of memory region preference list of the objects, on
their memory residency and Flat-CCS capability.

v2:
   Fix the Flat-CCS capability of an obj with {lmem, smem} preference
   list [Thomas]
v3:
   Reworded the doc [Matt]

Signed-off-by: Ramalingam C 
cc: Matthew Auld 
cc: Thomas Hellstrom 
cc: Daniel Vetter 
cc: Jon Bloomfield 
cc: Lionel Landwerlin 
cc: Kenneth Graunke 
cc: mesa-dev@lists.freedesktop.org
cc: Jordan Justen 
cc: Tony Ye 
Reviewed-by: Matthew Auld 
---
  include/uapi/drm/i915_drm.h | 16 
  1 file changed, 16 insertions(+)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a2def7b27009..b7e1c2fe08dc 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3443,6 +3443,22 @@ struct drm_i915_gem_create_ext {
   * At which point we get the object handle in _i915_gem_create_ext.handle,
   * along with the final object size in _i915_gem_create_ext.size, which
   * should account for any rounding up, if required.
+ *
+ * Note that userspace has no means of knowing the current backing region
+ * for objects where @num_regions is larger than one. The kernel will only
+ * ensure that the priority order of the @regions array is honoured, either
+ * when initially placing the object, or when moving memory around due to
+ * memory pressure
+ *
+ * On Flat-CCS capable HW, compression is supported for the objects residing
+ * in I915_MEMORY_CLASS_DEVICE. When such objects (compressed) has other
+ * memory class in @regions and migrated (by I915, due to memory
+ * constrain) to the non I915_MEMORY_CLASS_DEVICE region, then I915 needs to
+ * decompress the content. But I915 dosen't have the required information to
+ * decompress the userspace compressed objects.
+ *
+ * So I915 supports Flat-CCS, only on the objects which can reside only on
+ * I915_MEMORY_CLASS_DEVICE regions.



I think it's fine to assume Flat-CSS surface will always be in lmem.

I see no issue for the Anv Vulkan driver.


Maybe Nanley or Ken can speak for the Iris GL driver?


-Lionel



   */
  struct drm_i915_gem_create_ext_memory_regions {
/** @base: Extension link. See struct i915_user_extension. */





Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-05-03 Thread Lionel Landwerlin

On 03/05/2022 17:27, Matthew Auld wrote:

On 03/05/2022 11:39, Lionel Landwerlin wrote:

On 03/05/2022 13:22, Matthew Auld wrote:

On 02/05/2022 09:53, Lionel Landwerlin wrote:

On 02/05/2022 10:54, Lionel Landwerlin wrote:

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 
+++

  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h

new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region 
as known to the

+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS

+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;
+
+    /** @unallocated_size: Estimate of memory remaining (-1 = 
unknown) */

+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the 
driver

+ * that is CPU accessible. (-1 = unknown).
+ *
+ * This will be always be <= @probed_size, and the
+ * remainder(if there is any) will not be CPU
+ * accessible.
+ */
+    __u64 probed_cpu_visible_size;
+    };



Trying to implement userspace support in Vulkan for this, I have 
an additional question about the value of probed_cpu_visible_size.


When is it set to -1?

I'm guessing before there is support for this value it'll be 0 (MBZ).

After after it should either be the entire lmem or something smaller.


-Lionel



Other pain point of this new uAPI, previously we could query the 
unallocated size for each heap.


unallocated_size should always give the same value as probed_size. 
We have the avail tracking, but we don't currently expose that 
through unallocated_size, due to lack of real userspace/user etc.




Now lmem is effectively divided into 2 heaps, but unallocated_size 
is tracking allocation from both parts of lmem.


Yeah, if we ever properly expose the unallocated_size, then we could 
also just add unallocated_cpu_visible_size.




Is adding new I915_MEMORY_CLASS_DEVICE_NON_MAPPABLE out of question?


I don't think it's out of the question...

I guess user-space should be able to get the current flag behaviour 
just by specifying: device, system. And it does give more flexibly 
to allow something like: device, device-nm, smem.


We can also drop the probed_cpu_visible_size, which would now just 
be the probed_size with device/device-nm. And if we lack device-nm, 
then the entire thing must be CPU mappable.


One of the downsides though, is that we can no longer easily mix 
object pages from both device + device-nm, which we could previously 
do when we didn't specify the flag. At least according to the 
current design/behaviour for @regions that would not be allowed. I 
guess some kind of new flag like ALLOC_MIXED or so? Although 
currently that is only possible with device + device-nm in ttm/i915.



Thanks, I wasn't aware of the restrictions.

Adding unallocated_cpu_visible_size would be great.


So do we want this in the next version? i.e we already have a current 
real use case in mind for unallocated_size where probed_size is not 
good enough?



Yeah in the  next iteration.

We're using unallocated_size to implement VK_EXT_memory_budget and since 
I'm going to expose lmem mappable/unmappable as 2 different heaps on 
Vulkan, I would use that there too.



-Lionel







-Lionel







-Lionel






+    };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create 
behaviour, with added

+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should b

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-05-03 Thread Lionel Landwerlin

On 03/05/2022 13:22, Matthew Auld wrote:

On 02/05/2022 09:53, Lionel Landwerlin wrote:

On 02/05/2022 10:54, Lionel Landwerlin wrote:

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 
+++

  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h

new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as 
known to the

+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS

+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;
+
+    /** @unallocated_size: Estimate of memory remaining (-1 = 
unknown) */

+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the driver
+ * that is CPU accessible. (-1 = unknown).
+ *
+ * This will be always be <= @probed_size, and the
+ * remainder(if there is any) will not be CPU
+ * accessible.
+ */
+    __u64 probed_cpu_visible_size;
+    };



Trying to implement userspace support in Vulkan for this, I have an 
additional question about the value of probed_cpu_visible_size.


When is it set to -1?

I'm guessing before there is support for this value it'll be 0 (MBZ).

After after it should either be the entire lmem or something smaller.


-Lionel



Other pain point of this new uAPI, previously we could query the 
unallocated size for each heap.


unallocated_size should always give the same value as probed_size. We 
have the avail tracking, but we don't currently expose that through 
unallocated_size, due to lack of real userspace/user etc.




Now lmem is effectively divided into 2 heaps, but unallocated_size is 
tracking allocation from both parts of lmem.


Yeah, if we ever properly expose the unallocated_size, then we could 
also just add unallocated_cpu_visible_size.




Is adding new I915_MEMORY_CLASS_DEVICE_NON_MAPPABLE out of question?


I don't think it's out of the question...

I guess user-space should be able to get the current flag behaviour 
just by specifying: device, system. And it does give more flexibly to 
allow something like: device, device-nm, smem.


We can also drop the probed_cpu_visible_size, which would now just be 
the probed_size with device/device-nm. And if we lack device-nm, then 
the entire thing must be CPU mappable.


One of the downsides though, is that we can no longer easily mix 
object pages from both device + device-nm, which we could previously 
do when we didn't specify the flag. At least according to the current 
design/behaviour for @regions that would not be allowed. I guess some 
kind of new flag like ALLOC_MIXED or so? Although currently that is 
only possible with device + device-nm in ttm/i915.



Thanks, I wasn't aware of the restrictions.

Adding unallocated_cpu_visible_size would be great.


-Lionel







-Lionel






+    };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create 
behaviour, with added

+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for 
the stuff that
+ * is immutable. Previously we would have two ioctls, one to 
create the object
+ * with gem_create, and another to apply various parameters, 
however this
+ * creates some ambiguity for the params which are considered 
immutable. Also in

+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+    /**
+ * @size: Requested size for the object.
+ *
+ * The (page-aligned) al

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-05-03 Thread Lionel Landwerlin

On 03/05/2022 12:07, Matthew Auld wrote:

On 02/05/2022 19:03, Lionel Landwerlin wrote:

On 02/05/2022 20:58, Abodunrin, Akeem G wrote:



-Original Message-
From: Landwerlin, Lionel G 
Sent: Monday, May 2, 2022 12:55 AM
To: Auld, Matthew ; 
intel-...@lists.freedesktop.org

Cc: dri-de...@lists.freedesktop.org; Thomas Hellström
; Bloomfield, Jon
; Daniel Vetter ; 
Justen,

Jordan L ; Kenneth Graunke
; Abodunrin, Akeem G
; mesa-dev@lists.freedesktop.org
Subject: Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
    - Some spelling fixes and other small tweaks. (Akeem & Thomas)
    - Rework error capture interactions, including no longer needing
  NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
    - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
   Documentation/gpu/rfc/i915_small_bar.h   | 190

+++

Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
   Documentation/gpu/rfc/index.rst  |   4 +
   3 files changed, 252 insertions(+)
   create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
   create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h
b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as
+known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct

drm_i915_query.

+ * For this new query we are adding the new query id
+DRM_I915_QUERY_MEMORY_REGIONS
+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+   /** @region: The class:instance pair encoding */
+   struct drm_i915_gem_memory_class_instance region;
+
+   /** @rsvd0: MBZ */
+   __u32 rsvd0;
+
+   /** @probed_size: Memory probed by the driver (-1 = unknown) */
+   __u64 probed_size;
+
+   /** @unallocated_size: Estimate of memory remaining (-1 = 
unknown)

*/

+   __u64 unallocated_size;
+
+   union {
+   /** @rsvd1: MBZ */
+   __u64 rsvd1[8];
+   struct {
+   /**
+    * @probed_cpu_visible_size: Memory probed by the

driver

+    * that is CPU accessible. (-1 = unknown).
+    *
+    * This will be always be <= @probed_size, and 
the

+    * remainder(if there is any) will not be CPU
+    * accessible.
+    */
+   __u64 probed_cpu_visible_size;
+   };


Trying to implement userspace support in Vulkan for this, I have an 
additional

question about the value of probed_cpu_visible_size.

When is it set to -1?
I believe it is set to -1 if it is unknown, and/or not cpu 
accessible...


Cheers!
~Akeem



So what should I expect on system memory?


I guess just probed_cpu_visible_size == probed_size. Or maybe we can 
just use -1 here?




What value is returned when all of probed_size is CPU visible on 
local memory?


probed_size == probed_cpu_visible_size.



Thanks, looks good to me.

Then maybe we should update the comment to say that.

Looks like there are no cases where we'll get -1.


-Lionel







Thanks,


-Lionel



I'm guessing before there is support for this value it'll be 0 (MBZ).

After after it should either be the entire lmem or something smaller.


-Lionel



+   };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour,
+with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the
+stuff that
+ * is immutable. Previously we would have two ioctls, one to create
+the object
+ * with gem_create, and another to apply various parameters, however
+this
+ * creates some ambiguity for the params which are considered
+immutable. Also in
+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+   /**
+    * @size: Requested size for the object.
+    *
+    * The (page-aligned) allocated size for the object will be 
returned.

+    *
+    * Note that for some devices we have might have further minimum
+    * page-size restrictions(larger than 4K), like for device 
local-memory.
+    * However in general the final size here should always 
reflect any

+    * rounding up, if for example using the

I915_GEM_CREATE_EXT_MEMORY_REGIONS

+    * extension to place the object in device local-memory.
+    */
+   __u64 size;
+   /**
+    * @handle: Returned handle for the object.
+    *
+    * Object handles are nonzero.
+    */
+   __u32

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-05-02 Thread Lionel Landwerlin

On 02/05/2022 20:58, Abodunrin, Akeem G wrote:



-Original Message-
From: Landwerlin, Lionel G 
Sent: Monday, May 2, 2022 12:55 AM
To: Auld, Matthew ; intel-...@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org; Thomas Hellström
; Bloomfield, Jon
; Daniel Vetter ; Justen,
Jordan L ; Kenneth Graunke
; Abodunrin, Akeem G
; mesa-dev@lists.freedesktop.org
Subject: Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
- Some spelling fixes and other small tweaks. (Akeem & Thomas)
- Rework error capture interactions, including no longer needing
  NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
- Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
   Documentation/gpu/rfc/i915_small_bar.h   | 190

+++

   Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
   Documentation/gpu/rfc/index.rst  |   4 +
   3 files changed, 252 insertions(+)
   create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
   create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h
b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as
+known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct

drm_i915_query.

+ * For this new query we are adding the new query id
+DRM_I915_QUERY_MEMORY_REGIONS
+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+   /** @region: The class:instance pair encoding */
+   struct drm_i915_gem_memory_class_instance region;
+
+   /** @rsvd0: MBZ */
+   __u32 rsvd0;
+
+   /** @probed_size: Memory probed by the driver (-1 = unknown) */
+   __u64 probed_size;
+
+   /** @unallocated_size: Estimate of memory remaining (-1 = unknown)

*/

+   __u64 unallocated_size;
+
+   union {
+   /** @rsvd1: MBZ */
+   __u64 rsvd1[8];
+   struct {
+   /**
+* @probed_cpu_visible_size: Memory probed by the

driver

+* that is CPU accessible. (-1 = unknown).
+*
+* This will be always be <= @probed_size, and the
+* remainder(if there is any) will not be CPU
+* accessible.
+*/
+   __u64 probed_cpu_visible_size;
+   };


Trying to implement userspace support in Vulkan for this, I have an additional
question about the value of probed_cpu_visible_size.

When is it set to -1?

I believe it is set to -1 if it is unknown, and/or not cpu accessible...

Cheers!
~Akeem



So what should I expect on system memory?

What value is returned when all of probed_size is CPU visible on local 
memory?



Thanks,


-Lionel



I'm guessing before there is support for this value it'll be 0 (MBZ).

After after it should either be the entire lmem or something smaller.


-Lionel



+   };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour,
+with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the
+stuff that
+ * is immutable. Previously we would have two ioctls, one to create
+the object
+ * with gem_create, and another to apply various parameters, however
+this
+ * creates some ambiguity for the params which are considered
+immutable. Also in
+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+   /**
+* @size: Requested size for the object.
+*
+* The (page-aligned) allocated size for the object will be returned.
+*
+* Note that for some devices we have might have further minimum
+* page-size restrictions(larger than 4K), like for device local-memory.
+* However in general the final size here should always reflect any
+* rounding up, if for example using the

I915_GEM_CREATE_EXT_MEMORY_REGIONS

+* extension to place the object in device local-memory.
+*/
+   __u64 size;
+   /**
+* @handle: Returned handle for the object.
+*
+* Object handles are nonzero.
+*/
+   __u32 handle;
+   /**
+* @flags: Optional flags.
+*
+* Supported values:
+*
+* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the

kernel that

+* the object will need to be accessed via the CPU.
+*
+* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE,

and

+* only strictly required on platforms where only some of the device
+* memory is d

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-05-02 Thread Lionel Landwerlin

On 02/05/2022 10:54, Lionel Landwerlin wrote:

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h

new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as 
known to the

+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS

+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;
+
+    /** @unallocated_size: Estimate of memory remaining (-1 = 
unknown) */

+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the driver
+ * that is CPU accessible. (-1 = unknown).
+ *
+ * This will be always be <= @probed_size, and the
+ * remainder(if there is any) will not be CPU
+ * accessible.
+ */
+    __u64 probed_cpu_visible_size;
+    };



Trying to implement userspace support in Vulkan for this, I have an 
additional question about the value of probed_cpu_visible_size.


When is it set to -1?

I'm guessing before there is support for this value it'll be 0 (MBZ).

After after it should either be the entire lmem or something smaller.


-Lionel



Other pain point of this new uAPI, previously we could query the 
unallocated size for each heap.


Now lmem is effectively divided into 2 heaps, but unallocated_size is 
tracking allocation from both parts of lmem.


Is adding new I915_MEMORY_CLASS_DEVICE_NON_MAPPABLE out of question?


-Lionel






+    };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, 
with added

+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the 
stuff that
+ * is immutable. Previously we would have two ioctls, one to create 
the object
+ * with gem_create, and another to apply various parameters, however 
this
+ * creates some ambiguity for the params which are considered 
immutable. Also in

+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+    /**
+ * @size: Requested size for the object.
+ *
+ * The (page-aligned) allocated size for the object will be 
returned.

+ *
+ * Note that for some devices we have might have further minimum
+ * page-size restrictions(larger than 4K), like for device 
local-memory.

+ * However in general the final size here should always reflect any
+ * rounding up, if for example using the 
I915_GEM_CREATE_EXT_MEMORY_REGIONS

+ * extension to place the object in device local-memory.
+ */
+    __u64 size;
+    /**
+ * @handle: Returned handle for the object.
+ *
+ * Object handles are nonzero.
+ */
+    __u32 handle;
+    /**
+ * @flags: Optional flags.
+ *
+ * Supported values:
+ *
+ * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the 
kernel that

+ * the object will need to be accessed via the CPU.
+ *
+ * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
+ * only strictly required on platforms where only some of the 
device
+ * memory is directly visible or mappable through the CPU, like 
on DG2+.

+ *
+ * One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
+ * ensure we can always spill the allocation to system memory, 
if we

+ * can't place the object in the mappable part of
+ * I915_MEMORY_CLASS_DEVICE.
+ *
+ * Note that since the kernel only supports f

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-05-02 Thread Lionel Landwerlin

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS
+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+   /** @region: The class:instance pair encoding */
+   struct drm_i915_gem_memory_class_instance region;
+
+   /** @rsvd0: MBZ */
+   __u32 rsvd0;
+
+   /** @probed_size: Memory probed by the driver (-1 = unknown) */
+   __u64 probed_size;
+
+   /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
+   __u64 unallocated_size;
+
+   union {
+   /** @rsvd1: MBZ */
+   __u64 rsvd1[8];
+   struct {
+   /**
+* @probed_cpu_visible_size: Memory probed by the driver
+* that is CPU accessible. (-1 = unknown).
+*
+* This will be always be <= @probed_size, and the
+* remainder(if there is any) will not be CPU
+* accessible.
+*/
+   __u64 probed_cpu_visible_size;
+   };



Trying to implement userspace support in Vulkan for this, I have an 
additional question about the value of probed_cpu_visible_size.


When is it set to -1?

I'm guessing before there is support for this value it'll be 0 (MBZ).

After after it should either be the entire lmem or something smaller.


-Lionel



+   };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the stuff that
+ * is immutable. Previously we would have two ioctls, one to create the object
+ * with gem_create, and another to apply various parameters, however this
+ * creates some ambiguity for the params which are considered immutable. Also 
in
+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+   /**
+* @size: Requested size for the object.
+*
+* The (page-aligned) allocated size for the object will be returned.
+*
+* Note that for some devices we have might have further minimum
+* page-size restrictions(larger than 4K), like for device local-memory.
+* However in general the final size here should always reflect any
+* rounding up, if for example using the 
I915_GEM_CREATE_EXT_MEMORY_REGIONS
+* extension to place the object in device local-memory.
+*/
+   __u64 size;
+   /**
+* @handle: Returned handle for the object.
+*
+* Object handles are nonzero.
+*/
+   __u32 handle;
+   /**
+* @flags: Optional flags.
+*
+* Supported values:
+*
+* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
+* the object will need to be accessed via the CPU.
+*
+* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
+* only strictly required on platforms where only some of the device
+* memory is directly visible or mappable through the CPU, like on DG2+.
+*
+* One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
+* ensure we can always spill the allocation to system memory, if we
+* can't place the object in the mappable part of
+* I915_MEMORY_CLASS_DEVICE.
+*
+* Note that since the kernel only supports flat-CCS on objects that can
+* *only* be placed in I915_MEMORY_CLASS_DEVICE, we therefor

Re: Mesa 20.0 backlog

2022-04-28 Thread Lionel Landwerlin

On 22/04/2022 07:56, Dylan Baker wrote:

Hi all,

I've spent a good deal of time this week crushing the backlog of
patches on the mesa 20.0 series before making the release today. As such
there are not only a dozen outstanding patches, mostly for zink, which I
couldn't figure out how to correctly backport.

I've touched base with Lionel about the anv patches, but the remaning
patches I'd appreciate guidance on what you'd like to do with them.

2022-03-14 FIXES  d5870c45ae panfrost: Optimise recalculation of max sampler 
view
2022-03-24 FIXES  f348103fce anv: fix dynamic state emission
2022-03-24 FIXES  a4f502de32 anv: fix VK_DYNAMIC_STATE_COLOR_WRITE_ENABLE_EXT 
state
2022-03-24 FIXES  1d250b7b95 anv: fix color write enable interaction with color 
mask



Those 3 fixes are unfortunately bringing perf regressions. So I think we 
should drop them from the list.


Thanks,

-Lionel



2022-03-30 CC 4eca6e3e5d lavapipe: fix xfb availability query copying
2022-04-05 CC 3dcb80da9d zink: fix barrier generation for ssbo descriptors
2022-04-11 FIXES  dd078d13cb zink: fix tessellation shader key matching.
2022-04-13 FIXES  bbdf22ce13 radv: Fix barriers with cp dma
2022-04-18 CC 8806f444a5 zink: fix extended restart prim types without 
dynamic state2
2022-04-21 CC 373c8001d6 zink: set VK_QUERY_RESULT_WAIT_BIT when copying to 
qbo
2022-04-21 CC a056cbc691 zink: fix synchronization when drawing from 
streamout
2022-04-21 CC fc5edf9b68 zink: fix xfb counter buffer barriers
2022-04-21 CC e509598470 zink: remove xfb_barrier flag

Cheers,
Dylan





Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-04-27 Thread Lionel Landwerlin

On 27/04/2022 18:18, Matthew Auld wrote:

On 27/04/2022 07:48, Lionel Landwerlin wrote:
One question though, how do we detect that this flag 
(I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) is accepted on a given 
kernel?
I assume older kernels are going to reject object creation if we use 
this flag?


From some offline discussion with Lionel, the plan here is to just do 
a dummy gem_create_ext to check if the kernel throws an error with the 
new flag or not.




I didn't plan to use __drm_i915_query_vma_info, but isn't it 
inconsistent to select the placement on the GEM object and then query 
whether it's mappable by address?
You made a comment stating this is racy, wouldn't querying on the GEM 
object prevent this?


Since mesa at this time doesn't currently have a use for this one, 
then I guess we should maybe just drop this part of the uapi, in this 
version at least, if no objections.



Just repeating what we discussed (maybe I missed some other discussion 
and that's why I was confused) :



The way I was planning to use this is to have 3 heaps in Vulkan :

    - heap0: local only, no cpu visible

    - heap1: system, cpu visible

    - heap2: local & cpu visible


With heap2 having the reported probed_cpu_visible_size size.

It is an error for the application to map from heap0 [1].


With that said, it means if we created a GEM BO without 
I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS, we'll never mmap it.


So why the query?

I guess it would be useful when we import a buffer from another 
application. But in that case, why not have the query on the BO?



-Lionel


[1] : 
https://www.khronos.org/registry/vulkan/specs/1.3-extensions/man/html/vkMapMemory.html 
(VUID-vkMapMemory-memory-00682)






Thanks,

-Lionel

On 27/04/2022 09:35, Lionel Landwerlin wrote:

Hi Matt,


The proposal looks good to me.

Looking forward to try it on drm-tip.


-Lionel

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 
+++

  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h

new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as 
known to the

+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS

+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;
+
+    /** @unallocated_size: Estimate of memory remaining (-1 = 
unknown) */

+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the driver
+ * that is CPU accessible. (-1 = unknown).
+ *
+ * This will be always be <= @probed_size, and the
+ * remainder(if there is any) will not be CPU
+ * accessible.
+ */
+    __u64 probed_cpu_visible_size;
+    };
+    };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create 
behaviour, with added

+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for 
the stuff that
+ * is immutable. Previously we would have two ioctls, one to 
create the object
+ * with gem_create, and another to apply various parameters, 
however this
+ * creates some ambiguity for the params which are considered 
immutable. Also in

+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+    /**
+ * @size: Requested size for the object.
+ *
+ * The (page-aligned) allocated size for the object will be 
returned.

+ *
+ * Note that for some devices we have might have furt

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-04-27 Thread Lionel Landwerlin
One question though, how do we detect that this flag 
(I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) is accepted on a given kernel?
I assume older kernels are going to reject object creation if we use 
this flag?


I didn't plan to use __drm_i915_query_vma_info, but isn't it 
inconsistent to select the placement on the GEM object and then query 
whether it's mappable by address?
You made a comment stating this is racy, wouldn't querying on the GEM 
object prevent this?


Thanks,

-Lionel

On 27/04/2022 09:35, Lionel Landwerlin wrote:

Hi Matt,


The proposal looks good to me.

Looking forward to try it on drm-tip.


-Lionel

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h

new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as 
known to the

+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS

+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;
+
+    /** @unallocated_size: Estimate of memory remaining (-1 = 
unknown) */

+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the driver
+ * that is CPU accessible. (-1 = unknown).
+ *
+ * This will be always be <= @probed_size, and the
+ * remainder(if there is any) will not be CPU
+ * accessible.
+ */
+    __u64 probed_cpu_visible_size;
+    };
+    };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, 
with added

+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the 
stuff that
+ * is immutable. Previously we would have two ioctls, one to create 
the object
+ * with gem_create, and another to apply various parameters, however 
this
+ * creates some ambiguity for the params which are considered 
immutable. Also in

+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+    /**
+ * @size: Requested size for the object.
+ *
+ * The (page-aligned) allocated size for the object will be 
returned.

+ *
+ * Note that for some devices we have might have further minimum
+ * page-size restrictions(larger than 4K), like for device 
local-memory.

+ * However in general the final size here should always reflect any
+ * rounding up, if for example using the 
I915_GEM_CREATE_EXT_MEMORY_REGIONS

+ * extension to place the object in device local-memory.
+ */
+    __u64 size;
+    /**
+ * @handle: Returned handle for the object.
+ *
+ * Object handles are nonzero.
+ */
+    __u32 handle;
+    /**
+ * @flags: Optional flags.
+ *
+ * Supported values:
+ *
+ * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the 
kernel that

+ * the object will need to be accessed via the CPU.
+ *
+ * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
+ * only strictly required on platforms where only some of the 
device
+ * memory is directly visible or mappable through the CPU, like 
on DG2+.

+ *
+ * One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
+ * ensure we can always spill the allocation to system memory, 
if we

+ * can't place the object in the mappable part of
+ * I915_MEMORY_CLASS_DEVICE.
+ *
+ * Note that since the kernel only supports flat-CCS on objects 
that can

+ 

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-04-27 Thread Lionel Landwerlin

Hi Matt,


The proposal looks good to me.

Looking forward to try it on drm-tip.


-Lionel

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS
+ * at _i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+   /** @region: The class:instance pair encoding */
+   struct drm_i915_gem_memory_class_instance region;
+
+   /** @rsvd0: MBZ */
+   __u32 rsvd0;
+
+   /** @probed_size: Memory probed by the driver (-1 = unknown) */
+   __u64 probed_size;
+
+   /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
+   __u64 unallocated_size;
+
+   union {
+   /** @rsvd1: MBZ */
+   __u64 rsvd1[8];
+   struct {
+   /**
+* @probed_cpu_visible_size: Memory probed by the driver
+* that is CPU accessible. (-1 = unknown).
+*
+* This will be always be <= @probed_size, and the
+* remainder(if there is any) will not be CPU
+* accessible.
+*/
+   __u64 probed_cpu_visible_size;
+   };
+   };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the stuff that
+ * is immutable. Previously we would have two ioctls, one to create the object
+ * with gem_create, and another to apply various parameters, however this
+ * creates some ambiguity for the params which are considered immutable. Also 
in
+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+   /**
+* @size: Requested size for the object.
+*
+* The (page-aligned) allocated size for the object will be returned.
+*
+* Note that for some devices we have might have further minimum
+* page-size restrictions(larger than 4K), like for device local-memory.
+* However in general the final size here should always reflect any
+* rounding up, if for example using the 
I915_GEM_CREATE_EXT_MEMORY_REGIONS
+* extension to place the object in device local-memory.
+*/
+   __u64 size;
+   /**
+* @handle: Returned handle for the object.
+*
+* Object handles are nonzero.
+*/
+   __u32 handle;
+   /**
+* @flags: Optional flags.
+*
+* Supported values:
+*
+* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
+* the object will need to be accessed via the CPU.
+*
+* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
+* only strictly required on platforms where only some of the device
+* memory is directly visible or mappable through the CPU, like on DG2+.
+*
+* One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
+* ensure we can always spill the allocation to system memory, if we
+* can't place the object in the mappable part of
+* I915_MEMORY_CLASS_DEVICE.
+*
+* Note that since the kernel only supports flat-CCS on objects that can
+* *only* be placed in I915_MEMORY_CLASS_DEVICE, we therefore don't
+* support I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS together with
+* flat-CCS.
+*
+* Without this hint, the kernel will assume that non-mappable
+* I915_MEMORY_CLASS_DEVICE is pr

Re: [Intel-gfx] [PATCH 2/2] drm/doc: add rfc section for small BAR uapi

2022-03-18 Thread Lionel Landwerlin

Hey Matthew, all,

This sounds like a good thing to have.
There are a number of DG2 machines where we have a small BAR and this is 
causing more apps to fail.


Anv currently reports 3 memory heaps to the app :

    - local device only (not host visible) -> mapped to lmem
    - device/cpu -> mapped to smem
    - local device but also host visible -> mapped to lmem

So we could use this straight away, by just not putting the 
I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS flag on the allocation of the 
first heap.


One thing I don't see in this proposal is how can we get the size of the 
2 lmem heap : cpu visible, cpu not visible

We could use that to report the appropriate size to the app.
We probably want to report a new drm_i915_memory_region_info and either :
    - put one of the reserve field to use to indicate : cpu visible
    - or define a new enum value in drm_i915_gem_memory_class

Cheers,

-Lionel


On 18/02/2022 13:22, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: mesa-dev@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 153 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  40 ++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 197 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..fa65835fd608
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,153 @@
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that in the future we want to have our buffer flags here, at least for
+ * the stuff that is immutable. Previously we would have two ioctls, one to
+ * create the object with gem_create, and another to apply various parameters,
+ * however this creates some ambiguity for the params which are considered
+ * immutable. Also in general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+   /**
+* @size: Requested size for the object.
+*
+* The (page-aligned) allocated size for the object will be returned.
+*
+* Note that for some devices we have might have further minimum
+* page-size restrictions(larger than 4K), like for device local-memory.
+* However in general the final size here should always reflect any
+* rounding up, if for example using the 
I915_GEM_CREATE_EXT_MEMORY_REGIONS
+* extension to place the object in device local-memory.
+*/
+   __u64 size;
+   /**
+* @handle: Returned handle for the object.
+*
+* Object handles are nonzero.
+*/
+   __u32 handle;
+   /**
+* @flags: Optional flags.
+*
+* Supported values:
+*
+* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
+* the object will need to be accessed via the CPU.
+*
+* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
+* only strictly required on platforms where only some of the device
+* memory is directly visible or mappable through the CPU, like on DG2+.
+*
+* One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
+* ensure we can always spill the allocation to system memory, if we
+* can't place the object in the mappable part of
+* I915_MEMORY_CLASS_DEVICE.
+*
+* Note that buffers that need to be captured with EXEC_OBJECT_CAPTURE,
+* will need to enable this hint, if the object can also be placed in
+* I915_MEMORY_CLASS_DEVICE, starting from DG2+. The execbuf call will
+* throw an error otherwise. This also means that such objects will need
+* I915_MEMORY_CLASS_SYSTEM set as a possible placement.
+*
+* Without this hint, the kernel will assume that non-mappable
+* I915_MEMORY_CLASS_DEVICE is preferred for this object. Note that the
+* kernel can still migrate the object to the mappable part, as a last
+* resort, if userspace ever CPU faults this object, but this might be
+* expensive, and so ideally should be avoided.
+*/
+#define I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (1 << 0)
+   __u32 flags;
+   /**
+* @extensions: The chain of extensions to apply to this object.
+*
+* This will be useful in the future when we need to support several
+* different extensions, and we need to apply more than one when
+* creating the object. See struct i915_user_extension.
+*
+* If we 

Re: [Mesa-dev] Mesa 21.0 release

2021-03-11 Thread Lionel Landwerlin

On 10/03/2021 22:39, Dylan Baker wrote:

Hi list,

I think we're just about ready for the mesa 21.0 release. Sorry I've
been really about this. Here's a lis tof all outstanding patches that
either don't apply, or don't compile. If you see something here you'd
like to backport for 21.0.0 please let me know. Otherwise I'm planning
to make the release tomorrow.

Cheers,
Dylan



Thanks for looking at the stable release!


-Lionel




  
8955d179d3 anv: fix MI_PREDICATE_RESULT write

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9522

2c02740a8c intel/mi_builder: Use AddCSMMIOStartOffset for LRI
As far as I can tell this isn't needed on 21.0, probably targeted the 
wrong commit :(

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Threaded Vulkan WSI

2021-02-22 Thread Lionel Landwerlin

Hi all,

I'm currently looking at implementing some new features in Vulkan WSI.
One particular feature I would like to add would need to do some processing
on events coming from the compositor/display without waiting for the
application to give us the opportunity to do so (by entering the WSI code
which currently happens on 2 entry points vkAcquireNextImageKHR &
vkQueuePresentKHR).

In other words, this would require a thread to process events.

Looking at the code we currently have for the various backends it seems :

  - the x11 backend will sometimes use threads depending on setup options

  - the wayland backend doesn't use any thread

  - the display backend uses a thread but only to listen to KMS events

I don't have a perfect understanding of the all the WSI issues, but so far
I can't think of a reason why we couldn't have a threaded event processing
for all backends.

My current thinking is that it should be possible to have some kind of
generic messaging mechanism between the Vulkan WSI API and threads
that implement acquire/present operations with a specific window system
API.

Does anybody see a problem with this approach?

Thanks a lot,

-Lionel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-14 Thread Lionel Landwerlin

On 14/02/2021 00:47, Rob Clark wrote:

On Sat, Feb 13, 2021 at 12:52 PM Lionel Landwerlin
 wrote:

On 13/02/2021 18:52, Rob Clark wrote:

On Sat, Feb 13, 2021 at 12:04 AM Lionel Landwerlin
 wrote:

On 13/02/2021 04:20, Rob Clark wrote:

On Fri, Feb 12, 2021 at 5:56 PM Lionel Landwerlin
 wrote:

On 13/02/2021 03:38, Rob Clark wrote:

On Fri, Feb 12, 2021 at 5:08 PM Lionel Landwerlin
 wrote:

We're kind of in the same boat for Intel.

Access to GPU perf counters is exclusive to a single process if you want
to build a timeline of the work (because preemption etc...).

ugg, does that mean extensions like AMD_performance_monitor doesn't
actually work on intel?

It work,s but only a single app can use it at a time.


I see.. on the freedreno side we haven't really gone down the
preemption route yet, but we have a way to hook in some safe/restore
cmdstream

That's why I think, for Intel HW, something like gfx-pps is probably
best to pull out all the data on a timeline for the entire system.

Then the drivers could just provide timestamp on the timeline to
annotate it.


Looking at gfx-pps, my question is why is this not part of the mesa
tree?  That way I could use it for freedreno (either as stand-alone
process or part of driver) without duplicating all the perfcntr
tables, and information about different variants of a given generation
needed to interpret the raw counters into something useful for a
human.

Pulling gfx-pps into mesa seems like a sensible way forward.

BR,
-R

Yeah, I guess it depends on how your stack looks.

If the counters cover more than 3d and you have video drivers out of the
mesa tree, it might make sense to have it somewhere else.

Pulling intel video drivers into mesa and bringing some sanity to that
side of things would make more than a few people happy (but I digress
;-))

Until then, exporting a library from the mesa build might be an
option?  And possibly something like that would work for virgl host
side stuff as well?



For Intel, we have a small library in IGT [1], it's not big and most of 
that logic also exists in src/intel/perf [2] too.


So definitely possible.

Might also be a good occasion to try to make a common sublibrary for 
Intel drivers so we don't carry the statically link code from 
src/intel/perf in each driver.



-Lionel





Anyway I didn't want to sound negative, having a daemon like gfx-pps
thing in mesa to report system wide counters works for me :)

yeah, I think having it in the mesa tree is good for code-reuse (from
my experience, pulling external freedreno related tools into mesa
turned out to be a good thing)

At some point, whether the collection is in-process or not could just
be an implementation detail

BR,
-R


Then we can look into how to have each intel driver add annotation on
the timeline.


-Lionel



[1] : 
https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/lib/i915/perf.c


[2] : https://gitlab.freedesktop.org/mesa/mesa/-/tree/master/src/intel/perf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-13 Thread Lionel Landwerlin

On 13/02/2021 18:52, Rob Clark wrote:

On Sat, Feb 13, 2021 at 12:04 AM Lionel Landwerlin
 wrote:

On 13/02/2021 04:20, Rob Clark wrote:

On Fri, Feb 12, 2021 at 5:56 PM Lionel Landwerlin
 wrote:

On 13/02/2021 03:38, Rob Clark wrote:

On Fri, Feb 12, 2021 at 5:08 PM Lionel Landwerlin
 wrote:

We're kind of in the same boat for Intel.

Access to GPU perf counters is exclusive to a single process if you want
to build a timeline of the work (because preemption etc...).

ugg, does that mean extensions like AMD_performance_monitor doesn't
actually work on intel?

It work,s but only a single app can use it at a time.


I see.. on the freedreno side we haven't really gone down the
preemption route yet, but we have a way to hook in some safe/restore
cmdstream


That's why I think, for Intel HW, something like gfx-pps is probably
best to pull out all the data on a timeline for the entire system.

Then the drivers could just provide timestamp on the timeline to
annotate it.


Looking at gfx-pps, my question is why is this not part of the mesa
tree?  That way I could use it for freedreno (either as stand-alone
process or part of driver) without duplicating all the perfcntr
tables, and information about different variants of a given generation
needed to interpret the raw counters into something useful for a
human.

Pulling gfx-pps into mesa seems like a sensible way forward.

BR,
-R


Yeah, I guess it depends on how your stack looks.

If the counters cover more than 3d and you have video drivers out of the 
mesa tree, it might make sense to have it somewhere else.



Anyway I didn't want to sound negative, having a daemon like gfx-pps 
thing in mesa to report system wide counters works for me :)


Then we can look into how to have each intel driver add annotation on 
the timeline.



-Lionel

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-13 Thread Lionel Landwerlin

On 13/02/2021 04:20, Rob Clark wrote:

On Fri, Feb 12, 2021 at 5:56 PM Lionel Landwerlin
 wrote:

On 13/02/2021 03:38, Rob Clark wrote:

On Fri, Feb 12, 2021 at 5:08 PM Lionel Landwerlin
 wrote:

We're kind of in the same boat for Intel.

Access to GPU perf counters is exclusive to a single process if you want
to build a timeline of the work (because preemption etc...).

ugg, does that mean extensions like AMD_performance_monitor doesn't
actually work on intel?


It work,s but only a single app can use it at a time.


I see.. on the freedreno side we haven't really gone down the
preemption route yet, but we have a way to hook in some safe/restore
cmdstream



That's why I think, for Intel HW, something like gfx-pps is probably 
best to pull out all the data on a timeline for the entire system.


Then the drivers could just provide timestamp on the timeline to 
annotate it.



-Lionel






The best information we could add from mesa would a timestamp of when a
particular drawcall started.
But that's pretty much when timestamps queries are.

Were you thinking of particular GPU generated data you don't get from
gfx-pps?

>From the looks of it, currently I don't get *any* GPU generated data
from gfx-pps ;-)


Maybe file a bug? :
https://gitlab.freedesktop.org/Fahien/gfx-pps/-/blob/master/src/gpu/intel/intel_driver.cc



We can ofc sample counters from a separate process as well... I have a
curses tool (fdperf) which does this.. but running outside of gpu
cmdstream plus counters losing context across suspend/resume makes it
less than perfect.


Our counters are global so to give per application values, we need to
post process a stream of HW counter snapshots.



And something that works the same way as
AMD_performance_monitor under the hook gives a more precise look at
which shaders (for ex) are consuming the most cycles.


In our implementation that precision (in particular when a drawcall
ends) comes at a stalling cost unfortunately.

yeah, stalling on our end too for per-draw counter snapshots.. but if
you are looking for which shaders to optimize that doesn't matter
*that* much.. they'll be some overhead, but it's not really going to
change which draws/shaders are expensive.. just mean that you lose out
on pipelining of the state changes

BR,
-R


For cases where
we can profile a trace, frameretrace and related tools is pretty
great.. but it would be nice to have similar visibility for actual
games (which for me, mostly means android games, since so far no
aarch64 steam store), but also give game developers good tools (or at
least the same tools that they get with other closed src drivers on
android).


Sure, but frame analysis is different than live monitoring of the system.

On Intel's HW you don't get the same level of details in both cases, and
apart for a few timestamps, I think gfx-pps is as good as you gonna get
for live stuff.


-Lionel



BR,
-R


Thanks,

-Lionel


On 13/02/2021 00:12, Alyssa Rosenzweig wrote:

My 2c for Mali/Panfrost --

For us, capturing GPU perf counters is orthogonal to rendering. It's
expected (e.g. with Arm's tools) to do this from a separate process.
Neither Mesa nor the DDK should require custom instrumentation for the
low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
Perfetto as it is. So for us I don't see the value in modifying Mesa for
tracing.

On Fri, Feb 12, 2021 at 01:34:51PM -0800, John Bates wrote:

(responding from correct address this time)

On Fri, Feb 12, 2021 at 12:03 PM Mark Janes  wrote:


I've recently been using GPUVis to look at trace events.  On Intel
platforms, GPUVis incorporates ftrace events from the i915 driver,
performance metrics from igt-gpu-tools, and userspace ftrace markers
that I locally hack up in Mesa.


GPUVis is great. I would love to see that data combined with
userspace events without any need for local hacks. Perfetto provides
on-demand trace events with lower overhead compared to ftrace, so for
example it is acceptable to have production trace instrumentation that can
be captured without dev builds. To do that with ftrace it may require a way
to enable and disable the ftrace file writes to avoid the overhead when
tracing is not in use. This is what Android does with systrace/atrace, for
example, it uses Binder to notify processes about trace sessions. Perfetto
does that in a more portable way.



It is very easy to compile the GPUVis UI.  Userspace instrumentation
requires a single C/C++ header.  You don't have to access an external
web service to analyze trace data (a big no-no for devs working on
preproduction hardware).

Is it possible to build and run the Perfetto UI locally?

Yes, local UI builds are possible
<https://github.com/google/perfetto/blob/5ff758df67da94d17734c2e70eb6738c4902953e/ui/README.md>.
Also confirmed with the perfetto team <https://discord.gg/35ShE3A> that
trace data is not uploaded unless you use the 'share' feature.



 Can it display
arbitra

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Lionel Landwerlin

On 13/02/2021 03:38, Rob Clark wrote:

On Fri, Feb 12, 2021 at 5:08 PM Lionel Landwerlin
 wrote:

We're kind of in the same boat for Intel.

Access to GPU perf counters is exclusive to a single process if you want
to build a timeline of the work (because preemption etc...).

ugg, does that mean extensions like AMD_performance_monitor doesn't
actually work on intel?



It work,s but only a single app can use it at a time.





The best information we could add from mesa would a timestamp of when a
particular drawcall started.
But that's pretty much when timestamps queries are.

Were you thinking of particular GPU generated data you don't get from
gfx-pps?

>From the looks of it, currently I don't get *any* GPU generated data
from gfx-pps ;-)



Maybe file a bug? : 
https://gitlab.freedesktop.org/Fahien/gfx-pps/-/blob/master/src/gpu/intel/intel_driver.cc





We can ofc sample counters from a separate process as well... I have a
curses tool (fdperf) which does this.. but running outside of gpu
cmdstream plus counters losing context across suspend/resume makes it
less than perfect.



Our counters are global so to give per application values, we need to 
post process a stream of HW counter snapshots.




   And something that works the same way as
AMD_performance_monitor under the hook gives a more precise look at
which shaders (for ex) are consuming the most cycles.



In our implementation that precision (in particular when a drawcall 
ends) comes at a stalling cost unfortunately.




   For cases where
we can profile a trace, frameretrace and related tools is pretty
great.. but it would be nice to have similar visibility for actual
games (which for me, mostly means android games, since so far no
aarch64 steam store), but also give game developers good tools (or at
least the same tools that they get with other closed src drivers on
android).



Sure, but frame analysis is different than live monitoring of the system.

On Intel's HW you don't get the same level of details in both cases, and 
apart for a few timestamps, I think gfx-pps is as good as you gonna get 
for live stuff.



-Lionel




BR,
-R


Thanks,

-Lionel


On 13/02/2021 00:12, Alyssa Rosenzweig wrote:

My 2c for Mali/Panfrost --

For us, capturing GPU perf counters is orthogonal to rendering. It's
expected (e.g. with Arm's tools) to do this from a separate process.
Neither Mesa nor the DDK should require custom instrumentation for the
low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
Perfetto as it is. So for us I don't see the value in modifying Mesa for
tracing.

On Fri, Feb 12, 2021 at 01:34:51PM -0800, John Bates wrote:

(responding from correct address this time)

On Fri, Feb 12, 2021 at 12:03 PM Mark Janes  wrote:


I've recently been using GPUVis to look at trace events.  On Intel
platforms, GPUVis incorporates ftrace events from the i915 driver,
performance metrics from igt-gpu-tools, and userspace ftrace markers
that I locally hack up in Mesa.


GPUVis is great. I would love to see that data combined with
userspace events without any need for local hacks. Perfetto provides
on-demand trace events with lower overhead compared to ftrace, so for
example it is acceptable to have production trace instrumentation that can
be captured without dev builds. To do that with ftrace it may require a way
to enable and disable the ftrace file writes to avoid the overhead when
tracing is not in use. This is what Android does with systrace/atrace, for
example, it uses Binder to notify processes about trace sessions. Perfetto
does that in a more portable way.



It is very easy to compile the GPUVis UI.  Userspace instrumentation
requires a single C/C++ header.  You don't have to access an external
web service to analyze trace data (a big no-no for devs working on
preproduction hardware).

Is it possible to build and run the Perfetto UI locally?

Yes, local UI builds are possible
<https://github.com/google/perfetto/blob/5ff758df67da94d17734c2e70eb6738c4902953e/ui/README.md>.
Also confirmed with the perfetto team <https://discord.gg/35ShE3A> that
trace data is not uploaded unless you use the 'share' feature.



Can it display
arbitrary trace events that are written to
/sys/kernel/tracing/trace_marker ?

Yes, I believe it does support that via linux.ftrace data source
<https://perfetto.dev/docs/quickstart/linux-tracing>. We use that for
example to overlay CPU sched data to show what process is on each core
throughout the timeline. There are many ftrace event types
<https://github.com/google/perfetto/tree/5ff758df67da94d17734c2e70eb6738c4902953e/protos/perfetto/trace/ftrace>
in
the perfetto protos.



Can it be extended to show i915 and
i915-perf-recorder events?


It can be extended to consume custom data sources. One way this is done is
via a bridge daemon, such as traced_probes which is responsible for
capturing data from ftrace and /proc during a trace session and sending it
to traced. traced i

Re: [Mesa-dev] Perfetto CPU/GPU tracing

2021-02-12 Thread Lionel Landwerlin

We're kind of in the same boat for Intel.

Access to GPU perf counters is exclusive to a single process if you want 
to build a timeline of the work (because preemption etc...).


The best information we could add from mesa would a timestamp of when a 
particular drawcall started.

But that's pretty much when timestamps queries are.

Were you thinking of particular GPU generated data you don't get from 
gfx-pps?


Thanks,

-Lionel


On 13/02/2021 00:12, Alyssa Rosenzweig wrote:

My 2c for Mali/Panfrost --

For us, capturing GPU perf counters is orthogonal to rendering. It's
expected (e.g. with Arm's tools) to do this from a separate process.
Neither Mesa nor the DDK should require custom instrumentation for the
low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
Perfetto as it is. So for us I don't see the value in modifying Mesa for
tracing.

On Fri, Feb 12, 2021 at 01:34:51PM -0800, John Bates wrote:

(responding from correct address this time)

On Fri, Feb 12, 2021 at 12:03 PM Mark Janes  wrote:


I've recently been using GPUVis to look at trace events.  On Intel
platforms, GPUVis incorporates ftrace events from the i915 driver,
performance metrics from igt-gpu-tools, and userspace ftrace markers
that I locally hack up in Mesa.


GPUVis is great. I would love to see that data combined with
userspace events without any need for local hacks. Perfetto provides
on-demand trace events with lower overhead compared to ftrace, so for
example it is acceptable to have production trace instrumentation that can
be captured without dev builds. To do that with ftrace it may require a way
to enable and disable the ftrace file writes to avoid the overhead when
tracing is not in use. This is what Android does with systrace/atrace, for
example, it uses Binder to notify processes about trace sessions. Perfetto
does that in a more portable way.



It is very easy to compile the GPUVis UI.  Userspace instrumentation
requires a single C/C++ header.  You don't have to access an external
web service to analyze trace data (a big no-no for devs working on
preproduction hardware).

Is it possible to build and run the Perfetto UI locally?


Yes, local UI builds are possible
.
Also confirmed with the perfetto team  that
trace data is not uploaded unless you use the 'share' feature.



   Can it display
arbitrary trace events that are written to
/sys/kernel/tracing/trace_marker ?


Yes, I believe it does support that via linux.ftrace data source
. We use that for
example to overlay CPU sched data to show what process is on each core
throughout the timeline. There are many ftrace event types

in
the perfetto protos.



Can it be extended to show i915 and
i915-perf-recorder events?


It can be extended to consume custom data sources. One way this is done is
via a bridge daemon, such as traced_probes which is responsible for
capturing data from ftrace and /proc during a trace session and sending it
to traced. traced is the main perfetto tracing daemon that notifies all
trace data sources to start/stop tracing and communicates with user tracing
requests via the 'perfetto' command.




John Bates  writes:


I recently opened issue 4262
 to begin the
discussion on integrating perfetto into mesa.

*Background*

System-wide tracing is an invaluable tool for developers to find and fix
performance problems. The perfetto project enables a combined view of

trace

data from kernel ftrace, GPU driver and various manually-instrumented
tracepoints throughout the application and system. This helps developers
quickly answer questions like:

- How long are frames taking?
- What caused a particular frame drop?
- Is it CPU bound or GPU bound?
- Did a CPU core frequency drop cause something to go slower than

usual?

- Is something else running that is stealing CPU or GPU time? Could I
fix that with better thread/context priorities?
- Are all CPU cores being used effectively? Do I need

sched_setaffinity

to keep my thread on a big or little core?
- What’s the latency between CPU frame submit and GPU start?

*What Does Mesa + Perfetto Provide?*

Mesa is in a unique position to produce GPU trace data for several GPU
vendors without requiring the developer to build and install additional
tools like gfx-pps .

The key is making it easy for developers to use. Ideally, perfetto is
eventually available by default in mesa so that if your system has

perfetto

traced running, you just need to run perfetto (perhaps along with setting
an environment variable) with the mesa categories to see:

- GPU processing timeline 

Re: [Mesa-dev] building Zink

2021-02-05 Thread Lionel Landwerlin

I'm using these meson options :

-Dgles2=true -Ddri-drivers= -Dglx=dri -Dplatforms=x11,wayland 
-Dgallium-drivers=zink -Dvulkan-drivers=intel -Dgallium-vdpau=false 
-Dglvnd=true


-Lionel

On 04/02/2021 19:10, Mark Segal wrote:
I'm interested in using Zink to run an OpenGL app on Vulkan. I'm 
having difficulty even figuring out how to configure and build it from 
the documentation I've been able to find. Can someone point me in the 
right direction? I'm also unsure if I should use the main Mesa3D 
branch or use Mike Blumenkrantz's zink-wip clone.


Once I understand configuration and building, I'll likely have to do 
some modification: the system I'm targeting doesn't have X11.


Thanks in advance for any pointers.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] question about MR8409 status

2021-02-02 Thread Lionel Landwerlin

Hi Andrii,

Just assigned it to Marge. Sorry for the delay.

-Lionel

On 02/02/2021 13:57, asimiklit wrote:

Hello,

Are there some issues/blockers which prevent MR8409 from being merged?

This MR already has r-b and I suspect that it was simply forgotten or 
there are some issues with it I am not aware of. I don't have merge 
access that is why I am asking about it.


Thanks,
Andrii.


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Crash inside iris_dri.so when using the OpenGL backend in Cairo on Ubuntu 20.04

2020-09-07 Thread Lionel Landwerlin

Hi Filip,

Did you file an issue about this?

If not, could you open one with the details of the platform you're 
experiencing the crash on and the steps to reproduce?


Thanks,

-Lionel

On 04/09/2020 11:37, Filip Strömbäck wrote:

Dear mesa-dev,

I recently updated my system, and it seems like Mesa is now using the 
new iris driver for my Intel GPU (a Lenovo Thinkpad), which makes a 
program I'm developing crash somewhere inside iris_dri.so (it is 
drawing using the GLX backend in Cairo). Forcing the old i965 driver 
with the environment variable solves the problem, which makes me 
suspect a bug (or at least more strict behavior) in the new iris 
driver compared to the old one.


Is this a known issue, or should I report it as a bug? It might as 
well be the case that Cairo is misbehaving subtly, but that the 
behavior was tolerated by the old i965 driver.


My short-term solution (which is perhaps better anyway) is to migrate 
away from the experimental GL backend in Cairo, but as this could 
indicate a bug in the iris driver I thought it might be interesting 
for you to be aware of.


The program is available in compiled form here: 
https://www.ida.liu.se/~TDDI16/2020/tutorials/trees.tar (I might 
update it during the coming days to work around the bug). I can 
provide both source and compiled binaries that reproduce the problem 
more consistently if you wish to investigate further. The linked 
program requires some interaction to crash, but crashes fairly 
consistently.


With kind regards,
Filip Strömbäck
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Lionel Landwerlin

On 28/02/2020 13:46, Michel Dänzer wrote:

On 2020-02-28 12:02 p.m., Erik Faye-Lund wrote:

On Fri, 2020-02-28 at 10:43 +, Daniel Stone wrote:

On Fri, 28 Feb 2020 at 10:06, Erik Faye-Lund
 wrote:

On Fri, 2020-02-28 at 11:40 +0200, Lionel Landwerlin wrote:

Yeah, changes on vulkan drivers or backend compilers should be
fairly
sandboxed.

We also have tools that only work for intel stuff, that should
never
trigger anything on other people's HW.

Could something be worked out using the tags?

I think so! We have the pre-defined environment variable
CI_MERGE_REQUEST_LABELS, and we can do variable conditions:

https://docs.gitlab.com/ee/ci/yaml/#onlyvariablesexceptvariables

That sounds like a pretty neat middle-ground to me. I just hope
that
new pipelines are triggered if new labels are added, because not
everyone is allowed to set labels, and sometimes people forget...

There's also this which is somewhat more robust:
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2569

I'm not sure it's more robust, but yeah that a useful tool too.

The reason I'm skeptical about the robustness is that we'll miss
testing if this misses a path.

Surely missing a path will be less likely / often to happen compared to
an MR missing a label. (Users which aren't members of the project can't
even set labels for an MR)



Sounds like a good alternative to tags.


-Lionel

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Lionel Landwerlin

On 28/02/2020 11:28, Erik Faye-Lund wrote:

On Fri, 2020-02-28 at 13:37 +1000, Dave Airlie wrote:

On Fri, 28 Feb 2020 at 07:27, Daniel Vetter 
wrote:

Hi all,

You might have read the short take in the X.org board meeting
minutes
already, here's the long version.

The good news: gitlab.fd.o has become very popular with our
communities, and is used extensively. This especially includes all
the
CI integration. Modern development process and tooling, yay!

The bad news: The cost in growth has also been tremendous, and it's
breaking our bank account. With reasonable estimates for continued
growth we're expecting hosting expenses totalling 75k USD this
year,
and 90k USD next year. With the current sponsors we've set up we
can't
sustain that. We estimate that hosting expenses for gitlab.fd.o
without any of the CI features enabled would total 30k USD, which
is
within X.org's ability to support through various sponsorships,
mostly
through XDC.

Note that X.org does no longer sponsor any CI runners themselves,
we've stopped that. The huge additional expenses are all just in
storing and serving build artifacts and images to outside CI
runners
sponsored by various companies. A related topic is that with the
growth in fd.o it's becoming infeasible to maintain it all on
volunteer admin time. X.org is therefore also looking for admin
sponsorship, at least medium term.

Assuming that we want cash flow reserves for one year of
gitlab.fd.o
(without CI support) and a trimmed XDC and assuming no sponsor
payment
meanwhile, we'd have to cut CI services somewhere between May and
June
this year. The board is of course working on acquiring sponsors,
but
filling a shortfall of this magnitude is neither easy nor quick
work,
and we therefore decided to give an early warning as soon as
possible.
Any help in finding sponsors for fd.o is very much appreciated.

a) Ouch.

b) we probably need to take a large step back here.


I kinda agree, but maybe the step doesn't have to be *too* large?

I wonder if we could solve this by restructuring the project a bit. I'm
talking purely from a Mesa point of view here, so it might not solve
the full problem, but:

1. It feels silly that we need to test changes to e.g the i965 driver
on dragonboards. We only have a big "do not run CI at all" escape-
hatch.



Yeah, changes on vulkan drivers or backend compilers should be fairly 
sandboxed.


We also have tools that only work for intel stuff, that should never 
trigger anything on other people's HW.


Could something be worked out using the tags?


-Lionel

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Querying Vulkan driver Options

2019-12-11 Thread Lionel Landwerlin

On 11/12/2019 09:28, Jean Hertel wrote:

From: Lionel Landwerlin 

- In case the options are different (which is quite likely), how do we query 
the available options?

For OpenGL we have glXGetScreenDriver/glXGetDriverConfig

For EGL we have eglGetDisplayDriverName/eglGetDisplayDriverConfig [2]

What about Vulkan? How can we query such options?

I guess you'll have to add an extension to provide the same feature.

Do we have any documentation or high level steps on writing a Vulkan extension?
>From mesa side, providing a Merge Request with the necessary changes is enough?



Here are a few examples of adding a Vulkan extension to the specification :

https://github.com/KhronosGroup/Vulkan-Docs/pull/966

https://github.com/KhronosGroup/Vulkan-Docs/pull/1001

https://github.com/KhronosGroup/Vulkan-Docs/pull/87/


You can open a Mesa MR in parallel to the extension PR and once the 
extension is in public release of the Vulkan spec, the Mesa MR can be 
merged.



Thanks,


-Lionel




Kind regards,
Jean Hertel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Querying Vulkan driver Options

2019-12-08 Thread Lionel Landwerlin

On 08/12/2019 16:04, Jean Hertel wrote:

Dear Mesa developers,

I'm looking into further improving adriconf[1] as a tool to configure mesa 
driver options.
Vulkan mesa drivers can now read their configuration options from the .drirc 
configuration file.
With this in mind I have the following question to mesa developers:

- Are Vulkan drivers using the same options as their OpenGL drivers?
   eg: Does Intel ANV uses the same options as i965?



You can check the options used by anv at the top of 
src/intel/vulkan/anv_device.c


Almost nothing in common with i965.




- In case the options are different (which is quite likely), how do we query 
the available options?
For OpenGL we have glXGetScreenDriver/glXGetDriverConfig
For EGL we have eglGetDisplayDriverName/eglGetDisplayDriverConfig [2]
What about Vulkan? How can we query such options?


I guess you'll have to add an extension to provide the same feature.


-Lionel



Kind regards,
Jean Hertel

[1]: https://github.com/jlHertel/adriconf
[2]: 
https://www.khronos.org/registry/EGL/extensions/MESA/EGL_MESA_query_driver.txt
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965: remove second prototype of brw_batch_reloc

2019-12-05 Thread Lionel Landwerlin

Hmm... I don't see it.

Are you not confused by brw_batch_reloc/brw_state_reloc?

-Lionel

On 05/12/2019 06:56, Ilia Mirkin wrote:

Signed-off-by: Ilia Mirkin 
---
  src/mesa/drivers/dri/i965/intel_batchbuffer.h | 5 -
  1 file changed, 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
index 91720dad5b4..137158f168a 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
@@ -67,11 +67,6 @@ uint64_t brw_batch_reloc(struct intel_batchbuffer *batch,
   struct brw_bo *target,
   uint32_t target_offset,
   unsigned flags);
-uint64_t brw_state_reloc(struct intel_batchbuffer *batch,
- uint32_t batch_offset,
- struct brw_bo *target,
- uint32_t target_offset,
- unsigned flags);
  
  #define USED_BATCH(_batch) \

 ((uintptr_t)((_batch).map_next - (_batch).batch.map))



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] GL_INTEL_performance_query & null hw query

2019-12-03 Thread Lionel Landwerlin

Meant to Cc Mark too :)

On 03/12/2019 21:24, Lionel Landwerlin wrote:

Hi all,

Our Windows drivers ships with a particular query in 
GL_INTEL_performance_query called Intel_Null_Hardware_Query.
The query doesn't report any counter. The query isn't even listed if 
you go through the list using glGetNextPerfQueryIdINTEL().

It's only returned using glGetPerfQueryIdByNameINTEL().
When active, this query turns the draw operations & compute dispatches 
into noop at the command streamer level.


Applications like GPA [1] have been using this query to detect 
bottlenecks on the CPU side since there is no work going beyond the 
front end of the rendering engine. So you still exercise the full 
stack (in our case i965/iris + i915).


To minimize the work of getting GPA fully supported on i965/iris we 
could implement the same behavior.
Last time I asked (probably on IRC), it was kind of a no-go because 
that is somewhat abusing the meaning/goal of queries (at the time, 
applications like frameretrace didn't deal very well with a query 
having no counters).
I'm not particularly kind on having this either but I would like to 
know other people's feel.


Alternatively we have another extension providing the same feature [2].
I sent patches for i965 a bit over 18months ago, but because people 
started to using a patch implementing the Windows driver behavior, 
this didn't get much traction.

This will a require a change on the GPA side but

Which way should we go?

-Lionel

[1] : https://software.intel.com/en-us/gpa
[2] : 
https://www.khronos.org/registry/OpenGL/extensions/INTEL/INTEL_blackhole_render.txt


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] GL_INTEL_performance_query & null hw query

2019-12-03 Thread Lionel Landwerlin

Hi all,

Our Windows drivers ships with a particular query in 
GL_INTEL_performance_query called Intel_Null_Hardware_Query.
The query doesn't report any counter. The query isn't even listed if you 
go through the list using glGetNextPerfQueryIdINTEL().

It's only returned using glGetPerfQueryIdByNameINTEL().
When active, this query turns the draw operations & compute dispatches 
into noop at the command streamer level.


Applications like GPA [1] have been using this query to detect 
bottlenecks on the CPU side since there is no work going beyond the 
front end of the rendering engine. So you still exercise the full stack 
(in our case i965/iris + i915).


To minimize the work of getting GPA fully supported on i965/iris we 
could implement the same behavior.
Last time I asked (probably on IRC), it was kind of a no-go because that 
is somewhat abusing the meaning/goal of queries (at the time, 
applications like frameretrace didn't deal very well with a query having 
no counters).
I'm not particularly kind on having this either but I would like to know 
other people's feel.


Alternatively we have another extension providing the same feature [2].
I sent patches for i965 a bit over 18months ago, but because people 
started to using a patch implementing the Windows driver behavior, this 
didn't get much traction.

This will a require a change on the GPA side but

Which way should we go?

-Lionel

[1] : https://software.intel.com/en-us/gpa
[2] : 
https://www.khronos.org/registry/OpenGL/extensions/INTEL/INTEL_blackhole_render.txt


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] iris now officially OpenGL 4.6 conformant

2019-11-21 Thread Lionel Landwerlin

W \o/

Nice work!

-Lionel

On 20/11/2019 20:31, Kenneth Graunke wrote:

Hi all,

iris is now officially a conformant OpenGL 4.6 implementation!

https://www.khronos.org/conformance/adopters/conformant-products/opengl#submission_253

This is on Gen9.  I've also submitted for Gen8, but that's still under
review; Gen11 is 99% of the way there but I'm hunting down one last bug.

--Ken

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] intel: Load the driver even if I915_PARAM_REVISION is not found.

2019-08-19 Thread Lionel Landwerlin

On 19/08/2019 21:28, Rafael Antognolli wrote:

This param is only available starting on kernel 4.16. Use a default
value of 0 if it is not found instead.



I trace the param to :


commit 27cd44618b92fc8c6889e4628407791e45422bac
Author: Neil Roberts 
Date:   Wed Mar 4 14:41:16 2015 +

    drm/i915: Add I915_PARAM_REVISION


That seems to be back into 4.1. Could it be another issue?


-Lionel




Cc: Jordan Justen 
Cc: Mark Janes 
---
  src/intel/dev/gen_device_info.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/dev/gen_device_info.c b/src/intel/dev/gen_device_info.c
index 3953a1f4af3..375d13630a5 100644
--- a/src/intel/dev/gen_device_info.c
+++ b/src/intel/dev/gen_device_info.c
@@ -1366,7 +1366,7 @@ gen_get_device_info_from_fd(int fd, struct 
gen_device_info *devinfo)
return false;
  
 if (!getparam(fd, I915_PARAM_REVISION, >revision))

-   return false;
+  devinfo->revision = 0;
  
 if (!query_topology(devinfo, fd)) {

if (devinfo->gen >= 10) {



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] android: intel/perf: fix missing include path in makefile

2019-08-09 Thread Lionel Landwerlin

Hey Mauro,

I was kind of surprised that u_math.h would pull a gallium header file.
So I pulled the thread a bit and came up with this MR : 
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1625


Sorry it's all over the place :(

-Lionel

On 09/08/2019 15:06, Mauro Rossi wrote:

Fixes the following building error:

In file included from external/mesa/src/intel/perf/gen_perf.c:42:
external/mesa/src/util/u_math.h:42:10:
fatal error: 'pipe/p_compiler.h' file not found
  ^~~
1 error generated.

Fixes: 018f9b8 ("intel/perf: refactor gen_perf_begin_query into gen_perf")
Signed-off-by: Mauro Rossi 
---
  src/intel/Android.perf.mk | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/intel/Android.perf.mk b/src/intel/Android.perf.mk
index 0d7d746a63..c99d84c4be 100644
--- a/src/intel/Android.perf.mk
+++ b/src/intel/Android.perf.mk
@@ -31,7 +31,9 @@ LOCAL_MODULE_CLASS := STATIC_LIBRARIES
  
  intermediates := $(call local-generated-sources-dir)
  
-LOCAL_C_INCLUDES := $(MESA_TOP)/include/drm-uapi

+LOCAL_C_INCLUDES := \
+   $(MESA_TOP)/include/drm-uapi \
+   $(MESA_TOP)/src/gallium/include
  
  LOCAL_SRC_FILES := $(GEN_PERF_FILES)
  



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] android: vulkan/util: fix generating vk_enum_to_str.*

2019-07-05 Thread Lionel Landwerlin
Thanks this looks good to me : Reviewed-by: Lionel Landwerlin 



On 05/07/2019 11:52, Chih-Wei Huang wrote:

The gen_enum_to_str.py generates vk_enum_to_str.c and its header at once.
However, the makefiles incorrectly list both files parallel with the same
recipes. That means both two files may be generated simultaneously by two
processes. The generating files may be truncated by another process, as
shown below:

$ cd $OUT/obj/STATIC_LIBRARIES/libmesa_vulkan_util_intermediates/util
$ ls -l

-rw-rw-r-- 1 lh lh 193713 Jul  5 13:31 vk_enum_to_str.c
-rw-rw-r-- 1 lh lh   4609 Jul  5 13:31 vk_enum_to_str.d
-rw-rw-r-- 1 lh lh  0 Jul  5 16:21 vk_enum_to_str.h

Let one file depends on the other with empty recipe to avoid the issue.

Signed-off-by: Chih-Wei Huang 
---
  src/vulkan/Android.mk | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/vulkan/Android.mk b/src/vulkan/Android.mk
index c3230d8..71aa5e5 100644
--- a/src/vulkan/Android.mk
+++ b/src/vulkan/Android.mk
@@ -54,14 +54,16 @@ LOCAL_SRC_FILES := $(VULKAN_UTIL_FILES) $(VULKAN_WSI_FILES)
  
  vulkan_api_xml = $(MESA_TOP)/src/vulkan/registry/vk.xml
  
-$(LOCAL_GENERATED_SOURCES): $(MESA_TOP)/src/vulkan/util/gen_enum_to_str.py \

+$(firstword $(LOCAL_GENERATED_SOURCES)): 
$(MESA_TOP)/src/vulkan/util/gen_enum_to_str.py \
$(vulkan_api_xml)
@echo "target Generated: $(PRIVATE_MODULE) <= $(notdir $(@))"
@mkdir -p $(dir $@)
-   $(hide) $(MESA_PYTHON2) $(MESA_TOP)/src/vulkan/util/gen_enum_to_str.py \
+   $(hide) $(MESA_PYTHON2) $< \
--xml $(vulkan_api_xml) \
--outdir $(dir $@)
  
+$(lastword $(LOCAL_GENERATED_SOURCES)): $(firstword $(LOCAL_GENERATED_SOURCES))

+
  LOCAL_EXPORT_C_INCLUDE_DIRS := $(intermediates)/util
  
  ifeq ($(filter $(MESA_ANDROID_MAJOR_VERSION), 4 5 6 7),)



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/4] imgui: fix undefined behaviour bitshift.

2019-05-17 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

And reported upstream : https://github.com/ocornut/imgui/pull/2561

On 17/05/2019 03:22, Dave Airlie wrote:

From: Dave Airlie 

imgui_draw.cpp:1781: error[shiftTooManyBitsSigned]: Shifting signed 32-bit 
value by 31 bits is undefined behaviour

Reported by coverity
---
  src/imgui/imgui_draw.cpp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/imgui/imgui_draw.cpp b/src/imgui/imgui_draw.cpp
index 1d4e1d51692..c69c01ee801 100644
--- a/src/imgui/imgui_draw.cpp
+++ b/src/imgui/imgui_draw.cpp
@@ -1778,7 +1778,7 @@ static void UnpackBoolVectorToFlatIndexList(const 
ImBoolVector* in, ImVectorpush_back((int)((it - it_begin) << 5) + bit_n);
  }
  



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] anv/query: ensure visibility of reset before copying query results

2019-05-01 Thread Lionel Landwerlin
I've uploaded this MR : 
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/776


Which actually highlights a bigger issue with our queries.

-Lionel

On 01/05/2019 11:58, Lionel Landwerlin wrote:
Experimenting a bit with the visibility of PIPE_CONTROL writes & MI_* 
commands I realized those are not coherent.
Essentially anything written with a PIPE_CONTROL will be visible to 
MI_* commands only after some time.
In my experiment I had to insert 2 MI instructions after the 
PIPE_CONTROL write to actually have the following MI_COPY_MEM_MEM 
command pick up what was written by the pipe control.


So my recommendation would be to always write the availability with 
the same part of the command streamer (PIPE_CONTROL).
So that the sequences of writes to the same location are not subject 
to complicated synchronization rules.


Essentially just make emit_query_availability() take an additional 
boolean and then use it in CmdResetQueryPool().


-Lionel

On 30/04/2019 18:18, Lionel Landwerlin wrote:

Let me check the new tests and see if where the problem is.
Thanks for letting us know!

-Lionel

On 30/04/2019 13:43, Iago Toral Quiroga wrote:

Specifically, vkCmdCopyQueryPoolResults is required to see the effect
of a previous vkCmdResetQueryPool. This may not work currently when
query execution is still on going, as some of the queries may become
available asynchronously after the reset.

Fixes new CTS tests:
dEQP-VK.query_pool.statistics_query.reset_before_copy.*
---

Jason, do you have any better ideas?

  src/intel/vulkan/genX_query.c | 13 +
  1 file changed, 13 insertions(+)

diff --git a/src/intel/vulkan/genX_query.c 
b/src/intel/vulkan/genX_query.c

index 146435c3f8f..08b013f6351 100644
--- a/src/intel/vulkan/genX_query.c
+++ b/src/intel/vulkan/genX_query.c
@@ -383,6 +383,19 @@ void genX(CmdResetQueryPool)(
 ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer);
 ANV_FROM_HANDLE(anv_query_pool, pool, queryPool);
  +   /* From the Vulkan spec:
+    *
+    *    "vkCmdCopyQueryPoolResults is guaranteed to see the effect of
+    * previous uses of vkCmdResetQueryPool in the same queue, 
without
+    * any additional synchronization. Thus, the results will 
always

+    * reflect the most recent use of the query."
+    *
+    * So we need to make sure that any on-going queries are 
finished by

+    * the time we emit the reset.
+    */
+   cmd_buffer->state.pending_pipe_bits |= ANV_PIPE_CS_STALL_BIT;
+   genX(cmd_buffer_apply_pipe_flushes)(cmd_buffer);
+
 for (uint32_t i = 0; i < queryCount; i++) {
    anv_batch_emit(_buffer->batch, GENX(MI_STORE_DATA_IMM), 
sdm) {

   sdm.Address = anv_query_address(pool, firstQuery + i);



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] anv/query: ensure visibility of reset before copying query results

2019-05-01 Thread Lionel Landwerlin
Experimenting a bit with the visibility of PIPE_CONTROL writes & MI_* 
commands I realized those are not coherent.
Essentially anything written with a PIPE_CONTROL will be visible to MI_* 
commands only after some time.
In my experiment I had to insert 2 MI instructions after the 
PIPE_CONTROL write to actually have the following MI_COPY_MEM_MEM 
command pick up what was written by the pipe control.


So my recommendation would be to always write the availability with the 
same part of the command streamer (PIPE_CONTROL).
So that the sequences of writes to the same location are not subject to 
complicated synchronization rules.


Essentially just make emit_query_availability() take an additional 
boolean and then use it in CmdResetQueryPool().


-Lionel

On 30/04/2019 18:18, Lionel Landwerlin wrote:

Let me check the new tests and see if where the problem is.
Thanks for letting us know!

-Lionel

On 30/04/2019 13:43, Iago Toral Quiroga wrote:

Specifically, vkCmdCopyQueryPoolResults is required to see the effect
of a previous vkCmdResetQueryPool. This may not work currently when
query execution is still on going, as some of the queries may become
available asynchronously after the reset.

Fixes new CTS tests:
dEQP-VK.query_pool.statistics_query.reset_before_copy.*
---

Jason, do you have any better ideas?

  src/intel/vulkan/genX_query.c | 13 +
  1 file changed, 13 insertions(+)

diff --git a/src/intel/vulkan/genX_query.c 
b/src/intel/vulkan/genX_query.c

index 146435c3f8f..08b013f6351 100644
--- a/src/intel/vulkan/genX_query.c
+++ b/src/intel/vulkan/genX_query.c
@@ -383,6 +383,19 @@ void genX(CmdResetQueryPool)(
 ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer);
 ANV_FROM_HANDLE(anv_query_pool, pool, queryPool);
  +   /* From the Vulkan spec:
+    *
+    *    "vkCmdCopyQueryPoolResults is guaranteed to see the effect of
+    * previous uses of vkCmdResetQueryPool in the same queue, 
without

+    * any additional synchronization. Thus, the results will always
+    * reflect the most recent use of the query."
+    *
+    * So we need to make sure that any on-going queries are finished by
+    * the time we emit the reset.
+    */
+   cmd_buffer->state.pending_pipe_bits |= ANV_PIPE_CS_STALL_BIT;
+   genX(cmd_buffer_apply_pipe_flushes)(cmd_buffer);
+
 for (uint32_t i = 0; i < queryCount; i++) {
    anv_batch_emit(_buffer->batch, GENX(MI_STORE_DATA_IMM), 
sdm) {

   sdm.Address = anv_query_address(pool, firstQuery + i);



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] anv/query: ensure visibility of reset before copying query results

2019-04-30 Thread Lionel Landwerlin

Let me check the new tests and see if where the problem is.
Thanks for letting us know!

-Lionel

On 30/04/2019 13:43, Iago Toral Quiroga wrote:

Specifically, vkCmdCopyQueryPoolResults is required to see the effect
of a previous vkCmdResetQueryPool. This may not work currently when
query execution is still on going, as some of the queries may become
available asynchronously after the reset.

Fixes new CTS tests:
dEQP-VK.query_pool.statistics_query.reset_before_copy.*
---

Jason, do you have any better ideas?

  src/intel/vulkan/genX_query.c | 13 +
  1 file changed, 13 insertions(+)

diff --git a/src/intel/vulkan/genX_query.c b/src/intel/vulkan/genX_query.c
index 146435c3f8f..08b013f6351 100644
--- a/src/intel/vulkan/genX_query.c
+++ b/src/intel/vulkan/genX_query.c
@@ -383,6 +383,19 @@ void genX(CmdResetQueryPool)(
 ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer);
 ANV_FROM_HANDLE(anv_query_pool, pool, queryPool);
  
+   /* From the Vulkan spec:

+*
+*"vkCmdCopyQueryPoolResults is guaranteed to see the effect of
+* previous uses of vkCmdResetQueryPool in the same queue, without
+* any additional synchronization. Thus, the results will always
+* reflect the most recent use of the query."
+*
+* So we need to make sure that any on-going queries are finished by
+* the time we emit the reset.
+*/
+   cmd_buffer->state.pending_pipe_bits |= ANV_PIPE_CS_STALL_BIT;
+   genX(cmd_buffer_apply_pipe_flushes)(cmd_buffer);
+
 for (uint32_t i = 0; i < queryCount; i++) {
anv_batch_emit(_buffer->batch, GENX(MI_STORE_DATA_IMM), sdm) {
   sdm.Address = anv_query_address(pool, firstQuery + i);



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/1] anv/ehl: 36bits ppgtt support

2019-04-11 Thread Lionel Landwerlin
I started this MR : 
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/637


On 11/04/2019 13:06, Lionel Landwerlin wrote:
Sorry, upon rereading the code of the various drivers, it seems 
i965/iris handle this properly already.


I have some comments below.

On 11/04/2019 11:36, Lionel Landwerlin wrote:

Hi James,

Thanks a lot for reporting this.

I think this is something we should store in the gen_device_info and 
update with kernel ioctl when supported.

This affects other drivers, not just anv.

-Lionel

On 10/04/2019 23:55, James Xiong wrote:

From: "Xiong, James" 

The vma high heap's capacity and maximum address were pre-defined based
on 48bits ppgtt support, and the buffers allocated from the vma high 
heap

had invalid vma addresses to the ehl platform that only supports 36bits
ppgtt. As a result, KMD rejected all batchbuffers submitted by vulkan.

This patch:
1) initializes the vma high heap by retrieving the gtt capacity from 
KMD

and calculating the size and max address on the fly.
2) enables softpin when full ppgtt is enabled

V2: change commit messages and comments to refect the changes [Bob, 
Jason]

 remove define HIGH_HEAP_SIZE [Bob]
 make sure there's enough space to enable softspin [Jason]

Signed-off-by: Xiong, James 
---
  src/intel/vulkan/anv_device.c  | 30 +++---
  src/intel/vulkan/anv_private.h |  7 ---
  2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c 
b/src/intel/vulkan/anv_device.c

index 88b34c4..c3eff1c 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -434,7 +434,12 @@ anv_physical_device_init(struct 
anv_physical_device *device,

anv_gem_supports_syncobj_wait(fd);
 device->has_context_priority = anv_gem_has_context_priority(fd);
  +   /*
+    * make sure there are enough VA space(i.e. 32+bit support) and 
full ggtt

+    * is enabled.
+    */
 device->use_softpin = anv_gem_get_param(fd, 
I915_PARAM_HAS_EXEC_SOFTPIN)

+  && (anv_gem_get_param(fd, I915_PARAM_HAS_ALIASING_PPGTT) > 1)
    && device->supports_48bit_addresses;
   device->has_context_isolation =
@@ -1981,14 +1986,25 @@ VkResult anv_CreateDevice(
    device->vma_lo_available =
physical_device->memory.heaps[physical_device->memory.heap_count - 
1].size;
  -  /* Leave the last 4GiB out of the high vma range, so that 
no state base
-   * address + size can overflow 48 bits. For more information 
see the

-   * comment about Wa32bitGeneralStateOffset in anv_allocator.c
-   */
-  util_vma_heap_init(>vma_hi, HIGH_HEAP_MIN_ADDRESS,
- HIGH_HEAP_SIZE);
    device->vma_hi_available = 
physical_device->memory.heap_count == 1 ? 0 :

   physical_device->memory.heaps[0].size;
+
+  /* Retrieve the GTT's capacity and leave the last 4GiB out of 
the high vma
+   * range, so that no state base address + size can overflow 
the vma range. For
+   * more information see the comment about 
Wa32bitGeneralStateOffset in

+   * anv_allocator.c
+   */
+  uint64_t size = 0;
+  anv_gem_get_context_param(device->fd, 0, 
I915_CONTEXT_PARAM_GTT_SIZE,

+    );



I don't think you need to requery the gtt size, this is already done 
when initializing the physical device.


I think we can do something better by storing the bounds in the 
physical device and just reusing that at logical device creation.




+  if(size > HIGH_HEAP_MIN_ADDRESS + (1ull<<32)) {
+ size -= HIGH_HEAP_MIN_ADDRESS + (1ull<<32);
+ device->vma_hi_max_addr = HIGH_HEAP_MIN_ADDRESS + size - 1;
+  } else {
+ size = device->vma_hi_max_addr = 0;
+  }
+
+  util_vma_heap_init(>vma_hi, HIGH_HEAP_MIN_ADDRESS, 
size);

 }
   /* As per spec, the driver implementation may deny requests 
to acquire
@@ -2456,7 +2472,7 @@ anv_vma_free(struct anv_device *device, struct 
anv_bo *bo)

    device->vma_lo_available += bo->size;
 } else {
    assert(addr_48b >= HIGH_HEAP_MIN_ADDRESS &&
- addr_48b <= HIGH_HEAP_MAX_ADDRESS);
+ addr_48b <= device->vma_hi_max_addr);
    util_vma_heap_free(>vma_hi, addr_48b, bo->size);
    device->vma_hi_available += bo->size;
 }
diff --git a/src/intel/vulkan/anv_private.h 
b/src/intel/vulkan/anv_private.h

index 1664918..ef9b012 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -109,6 +109,9 @@ struct gen_l3_config;
   * heap. Various hardware units will read past the end of an 
object for
   * various reasons. This healthy margin prevents reads from 
wrapping around

   * 48-bit addresses.
+ *
+ * (4) the high vma heap size and max address are calculated based 
on the

+ * gtt capacity retrieved from KMD.
   */
  #define LOW_HEAP_MIN_ADDRESS  

Re: [Mesa-dev] [PATCH 2/2] intel/compiler: fix uninit non-static variable.

2019-04-11 Thread Lionel Landwerlin

To be honest we're not initializing nir_locals either :/

Reviewed-by: Lionel Landwerlin 

On 11/04/2019 11:32, Dave Airlie wrote:

From: Dave Airlie 

Pointed out by coverity.
---
  src/intel/compiler/brw_vec4_visitor.cpp | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/src/intel/compiler/brw_vec4_visitor.cpp 
b/src/intel/compiler/brw_vec4_visitor.cpp
index 16ee31d730a..fa3d7fc13b7 100644
--- a/src/intel/compiler/brw_vec4_visitor.cpp
+++ b/src/intel/compiler/brw_vec4_visitor.cpp
@@ -1887,6 +1887,8 @@ vec4_visitor::vec4_visitor(const struct brw_compiler 
*compiler,
 this->max_grf = devinfo->gen >= 7 ? GEN7_MRF_HACK_START : BRW_MAX_GRF;
  
 this->uniforms = 0;

+
+   this->nir_ssa_values = NULL;
  }
  
  



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/1] anv/ehl: 36bits ppgtt support

2019-04-11 Thread Lionel Landwerlin
Sorry, upon rereading the code of the various drivers, it seems 
i965/iris handle this properly already.


I have some comments below.

On 11/04/2019 11:36, Lionel Landwerlin wrote:

Hi James,

Thanks a lot for reporting this.

I think this is something we should store in the gen_device_info and 
update with kernel ioctl when supported.

This affects other drivers, not just anv.

-Lionel

On 10/04/2019 23:55, James Xiong wrote:

From: "Xiong, James" 

The vma high heap's capacity and maximum address were pre-defined based
on 48bits ppgtt support, and the buffers allocated from the vma high 
heap

had invalid vma addresses to the ehl platform that only supports 36bits
ppgtt. As a result, KMD rejected all batchbuffers submitted by vulkan.

This patch:
1) initializes the vma high heap by retrieving the gtt capacity from KMD
and calculating the size and max address on the fly.
2) enables softpin when full ppgtt is enabled

V2: change commit messages and comments to refect the changes [Bob, 
Jason]

 remove define HIGH_HEAP_SIZE [Bob]
 make sure there's enough space to enable softspin [Jason]

Signed-off-by: Xiong, James 
---
  src/intel/vulkan/anv_device.c  | 30 +++---
  src/intel/vulkan/anv_private.h |  7 ---
  2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c 
b/src/intel/vulkan/anv_device.c

index 88b34c4..c3eff1c 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -434,7 +434,12 @@ anv_physical_device_init(struct 
anv_physical_device *device,

anv_gem_supports_syncobj_wait(fd);
 device->has_context_priority = anv_gem_has_context_priority(fd);
  +   /*
+    * make sure there are enough VA space(i.e. 32+bit support) and 
full ggtt

+    * is enabled.
+    */
 device->use_softpin = anv_gem_get_param(fd, 
I915_PARAM_HAS_EXEC_SOFTPIN)

+  && (anv_gem_get_param(fd, I915_PARAM_HAS_ALIASING_PPGTT) > 1)
    && device->supports_48bit_addresses;
   device->has_context_isolation =
@@ -1981,14 +1986,25 @@ VkResult anv_CreateDevice(
    device->vma_lo_available =
physical_device->memory.heaps[physical_device->memory.heap_count - 
1].size;
  -  /* Leave the last 4GiB out of the high vma range, so that no 
state base
-   * address + size can overflow 48 bits. For more information 
see the

-   * comment about Wa32bitGeneralStateOffset in anv_allocator.c
-   */
-  util_vma_heap_init(>vma_hi, HIGH_HEAP_MIN_ADDRESS,
- HIGH_HEAP_SIZE);
    device->vma_hi_available = physical_device->memory.heap_count 
== 1 ? 0 :

   physical_device->memory.heaps[0].size;
+
+  /* Retrieve the GTT's capacity and leave the last 4GiB out of 
the high vma
+   * range, so that no state base address + size can overflow 
the vma range. For
+   * more information see the comment about 
Wa32bitGeneralStateOffset in

+   * anv_allocator.c
+   */
+  uint64_t size = 0;
+  anv_gem_get_context_param(device->fd, 0, 
I915_CONTEXT_PARAM_GTT_SIZE,

+    );



I don't think you need to requery the gtt size, this is already done 
when initializing the physical device.


I think we can do something better by storing the bounds in the physical 
device and just reusing that at logical device creation.




+  if(size > HIGH_HEAP_MIN_ADDRESS + (1ull<<32)) {
+ size -= HIGH_HEAP_MIN_ADDRESS + (1ull<<32);
+ device->vma_hi_max_addr = HIGH_HEAP_MIN_ADDRESS + size - 1;
+  } else {
+ size = device->vma_hi_max_addr = 0;
+  }
+
+  util_vma_heap_init(>vma_hi, HIGH_HEAP_MIN_ADDRESS, size);
 }
   /* As per spec, the driver implementation may deny requests to 
acquire
@@ -2456,7 +2472,7 @@ anv_vma_free(struct anv_device *device, struct 
anv_bo *bo)

    device->vma_lo_available += bo->size;
 } else {
    assert(addr_48b >= HIGH_HEAP_MIN_ADDRESS &&
- addr_48b <= HIGH_HEAP_MAX_ADDRESS);
+ addr_48b <= device->vma_hi_max_addr);
    util_vma_heap_free(>vma_hi, addr_48b, bo->size);
    device->vma_hi_available += bo->size;
 }
diff --git a/src/intel/vulkan/anv_private.h 
b/src/intel/vulkan/anv_private.h

index 1664918..ef9b012 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -109,6 +109,9 @@ struct gen_l3_config;
   * heap. Various hardware units will read past the end of an object 
for
   * various reasons. This healthy margin prevents reads from 
wrapping around

   * 48-bit addresses.
+ *
+ * (4) the high vma heap size and max address are calculated based 
on the

+ * gtt capacity retrieved from KMD.
   */
  #define LOW_HEAP_MIN_ADDRESS   0x1000ULL /* 4 
KiB */

  #define LOW_HEAP_MAX_ADDRESS   0xbfffULL
@@ -121,12 +124,9 @@ struct gen_l3_co

Re: [Mesa-dev] [PATCH 1/1] anv/ehl: 36bits ppgtt support

2019-04-11 Thread Lionel Landwerlin

Hi James,

Thanks a lot for reporting this.

I think this is something we should store in the gen_device_info and 
update with kernel ioctl when supported.

This affects other drivers, not just anv.

-Lionel

On 10/04/2019 23:55, James Xiong wrote:

From: "Xiong, James" 

The vma high heap's capacity and maximum address were pre-defined based
on 48bits ppgtt support, and the buffers allocated from the vma high heap
had invalid vma addresses to the ehl platform that only supports 36bits
ppgtt. As a result, KMD rejected all batchbuffers submitted by vulkan.

This patch:
1) initializes the vma high heap by retrieving the gtt capacity from KMD
and calculating the size and max address on the fly.
2) enables softpin when full ppgtt is enabled

V2: change commit messages and comments to refect the changes [Bob, Jason]
 remove define HIGH_HEAP_SIZE [Bob]
 make sure there's enough space to enable softspin [Jason]

Signed-off-by: Xiong, James 
---
  src/intel/vulkan/anv_device.c  | 30 +++---
  src/intel/vulkan/anv_private.h |  7 ---
  2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 88b34c4..c3eff1c 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -434,7 +434,12 @@ anv_physical_device_init(struct anv_physical_device 
*device,
anv_gem_supports_syncobj_wait(fd);
 device->has_context_priority = anv_gem_has_context_priority(fd);
  
+   /*

+* make sure there are enough VA space(i.e. 32+bit support) and full ggtt
+* is enabled.
+*/
 device->use_softpin = anv_gem_get_param(fd, I915_PARAM_HAS_EXEC_SOFTPIN)
+  && (anv_gem_get_param(fd, I915_PARAM_HAS_ALIASING_PPGTT) > 1)
&& device->supports_48bit_addresses;
  
 device->has_context_isolation =

@@ -1981,14 +1986,25 @@ VkResult anv_CreateDevice(
device->vma_lo_available =
   physical_device->memory.heaps[physical_device->memory.heap_count - 
1].size;
  
-  /* Leave the last 4GiB out of the high vma range, so that no state base

-   * address + size can overflow 48 bits. For more information see the
-   * comment about Wa32bitGeneralStateOffset in anv_allocator.c
-   */
-  util_vma_heap_init(>vma_hi, HIGH_HEAP_MIN_ADDRESS,
- HIGH_HEAP_SIZE);
device->vma_hi_available = physical_device->memory.heap_count == 1 ? 0 :
   physical_device->memory.heaps[0].size;
+
+  /* Retrieve the GTT's capacity and leave the last 4GiB out of the high 
vma
+   * range, so that no state base address + size can overflow the vma 
range. For
+   * more information see the comment about Wa32bitGeneralStateOffset in
+   * anv_allocator.c
+   */
+  uint64_t size = 0;
+  anv_gem_get_context_param(device->fd, 0, I915_CONTEXT_PARAM_GTT_SIZE,
+);
+  if(size > HIGH_HEAP_MIN_ADDRESS + (1ull<<32)) {
+ size -= HIGH_HEAP_MIN_ADDRESS + (1ull<<32);
+ device->vma_hi_max_addr = HIGH_HEAP_MIN_ADDRESS + size - 1;
+  } else {
+ size = device->vma_hi_max_addr = 0;
+  }
+
+  util_vma_heap_init(>vma_hi, HIGH_HEAP_MIN_ADDRESS, size);
 }
  
 /* As per spec, the driver implementation may deny requests to acquire

@@ -2456,7 +2472,7 @@ anv_vma_free(struct anv_device *device, struct anv_bo *bo)
device->vma_lo_available += bo->size;
 } else {
assert(addr_48b >= HIGH_HEAP_MIN_ADDRESS &&
- addr_48b <= HIGH_HEAP_MAX_ADDRESS);
+ addr_48b <= device->vma_hi_max_addr);
util_vma_heap_free(>vma_hi, addr_48b, bo->size);
device->vma_hi_available += bo->size;
 }
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 1664918..ef9b012 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -109,6 +109,9 @@ struct gen_l3_config;
   * heap. Various hardware units will read past the end of an object for
   * various reasons. This healthy margin prevents reads from wrapping around
   * 48-bit addresses.
+ *
+ * (4) the high vma heap size and max address are calculated based on the
+ * gtt capacity retrieved from KMD.
   */
  #define LOW_HEAP_MIN_ADDRESS   0x1000ULL /* 4 KiB */
  #define LOW_HEAP_MAX_ADDRESS   0xbfffULL
@@ -121,12 +124,9 @@ struct gen_l3_config;
  #define INSTRUCTION_STATE_POOL_MIN_ADDRESS 0x00018000ULL /* 6 GiB */
  #define INSTRUCTION_STATE_POOL_MAX_ADDRESS 0x0001bfffULL
  #define HIGH_HEAP_MIN_ADDRESS  0x0001c000ULL /* 7 GiB */
-#define HIGH_HEAP_MAX_ADDRESS  0xfffeULL
  
  #define LOW_HEAP_SIZE   \

 (LOW_HEAP_MAX_ADDRESS - LOW_HEAP_MIN_ADDRESS + 1)
-#define HIGH_HEAP_SIZE  \
-   (HIGH_HEAP_MAX_ADDRESS - HIGH_HEAP_MIN_ADDRESS + 1)
  #define DYNAMIC_STATE_POOL_SIZE \
 

Re: [Mesa-dev] [PATCH v3] wsi: allow to override the present mode with MESA_VK_WSI_PRESENT_MODE

2019-04-09 Thread Lionel Landwerlin

There is a small nit below which you don't have to apply.

With the correct bug reference :

Reviewed-by: Lionel Landwerlin 

On 09/04/2019 15:18, Samuel Pitoiset wrote:

This is common to all Vulkan drivers and all WSI.

v3: - move the invalid check in wsi_device_init()
 - use VK_PRESENT_MODE_MAX_ENUM_KHR as default value

v2: - store the override in wsi_device_init()
 - do not abort when an invalid value is detected
 - check supported present modes

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107391



Not sure this is the right bug?



Signed-off-by: Samuel Pitoiset 
---
  src/vulkan/wsi/wsi_common.c | 68 +
  src/vulkan/wsi/wsi_common.h |  1 +
  src/vulkan/wsi/wsi_common_display.c |  2 +-
  src/vulkan/wsi/wsi_common_private.h |  4 ++
  src/vulkan/wsi/wsi_common_wayland.c |  2 +-
  src/vulkan/wsi/wsi_common_x11.c |  2 +-
  6 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/src/vulkan/wsi/wsi_common.c b/src/vulkan/wsi/wsi_common.c
index 3cba0a4b06e..48d7ed53a0d 100644
--- a/src/vulkan/wsi/wsi_common.c
+++ b/src/vulkan/wsi/wsi_common.c
@@ -29,6 +29,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 
  
  VkResult

  wsi_device_init(struct wsi_device *wsi,
@@ -37,6 +39,7 @@ wsi_device_init(struct wsi_device *wsi,
  const VkAllocationCallbacks *alloc,
  int display_fd)
  {
+   const char *present_mode;
 VkResult result;
  
 memset(wsi, 0, sizeof(*wsi));

@@ -60,6 +63,7 @@ wsi_device_init(struct wsi_device *wsi,
 GetPhysicalDeviceProperties2(pdevice, );
  
 wsi->maxImageDimension2D = pdp2.properties.limits.maxImageDimension2D;

+   wsi->override_present_mode = VK_PRESENT_MODE_MAX_ENUM_KHR;
  
 GetPhysicalDeviceMemoryProperties(pdevice, >memory_props);

 GetPhysicalDeviceQueueFamilyProperties(pdevice, >queue_family_count, 
NULL);
@@ -112,6 +116,19 @@ wsi_device_init(struct wsi_device *wsi,
goto fail;
  #endif
  
+   present_mode = getenv("MESA_VK_WSI_PRESENT_MODE");

+   if (present_mode) {
+  if (!strcmp(present_mode, "fifo")) {
+ wsi->override_present_mode = VK_PRESENT_MODE_FIFO_KHR;
+  } else if (!strcmp(present_mode, "mailbox")) {
+ wsi->override_present_mode = VK_PRESENT_MODE_MAILBOX_KHR;
+  } else if (!strcmp(present_mode, "immediate")) {
+ wsi->override_present_mode = VK_PRESENT_MODE_IMMEDIATE_KHR;
+  } else {
+ fprintf(stderr, "Invalid MESA_VK_WSI_PRESENT_MODE value!\n");
+  }
+   }
+
 return VK_SUCCESS;
  
  fail:

@@ -202,6 +219,57 @@ fail:
 return result;
  }
  
+static bool

+wsi_swapchain_is_present_mode_supported(struct wsi_device *wsi,
+const VkSwapchainCreateInfoKHR 
*pCreateInfo,
+VkPresentModeKHR mode)
+{
+  ICD_FROM_HANDLE(VkIcdSurfaceBase, surface, pCreateInfo->surface);
+  struct wsi_interface *iface = wsi->wsi[surface->platform];
+  VkPresentModeKHR *present_modes;
+  uint32_t present_mode_count;
+  bool supported = false;
+  VkResult result;
+
+  result = iface->get_present_modes(surface, _mode_count, NULL);
+  if (result != VK_SUCCESS)
+ return supported;
+
+  present_modes = malloc(present_mode_count * sizeof(*present_modes));
+  if (!present_modes)
+ return supported;
+
+  result = iface->get_present_modes(surface, _mode_count,
+present_modes);
+  if (result != VK_SUCCESS)
+ goto fail;
+
+  for (uint32_t i = 0; i < present_mode_count; i++) {
+ if (present_modes[i] == mode)
+supported = true;



You can break once it's found ;)



+  }
+
+fail:
+  free(present_modes);
+  return supported;
+}
+
+enum VkPresentModeKHR
+wsi_swapchain_get_present_mode(struct wsi_device *wsi,
+   const VkSwapchainCreateInfoKHR *pCreateInfo)
+{
+   if (wsi->override_present_mode == VK_PRESENT_MODE_MAX_ENUM_KHR)
+  return pCreateInfo->presentMode;
+
+   if (!wsi_swapchain_is_present_mode_supported(wsi, pCreateInfo,
+wsi->override_present_mode)) {
+  fprintf(stderr, "Unsupported MESA_VK_WSI_PRESENT_MODE value!\n");
+  return pCreateInfo->presentMode;
+   }
+
+   return wsi->override_present_mode;
+}
+
  void
  wsi_swapchain_finish(struct wsi_swapchain *chain)
  {
diff --git a/src/vulkan/wsi/wsi_common.h b/src/vulkan/wsi/wsi_common.h
index e693e2be425..6446320b95d 100644
--- a/src/vulkan/wsi/wsi_common.h
+++ b/src/vulkan/wsi/wsi_common.h
@@ -101,6 +101,7 @@ struct wsi_device {
  
 bool supports_modifiers;

 uint32_t maxImageDimension2D;
+   VkPresentModeKHR override_present_mode;
  
 uint64_t (*image_get_modifier)(VkImage image);
  
diff --git a/src/vulkan/wsi/

Re: [Mesa-dev] [PATCH v2] wsi: allow to override the present mode with MESA_VK_WSI_PRESENT_MODE

2019-04-09 Thread Lionel Landwerlin

On 09/04/2019 14:31, Samuel Pitoiset wrote:

This is common to all Vulkan drivers and all WSI.

v2: - store the override in wsi_device_init()
 - do not abort when an invalid value is detected
 - check supported present modes

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107391
Signed-off-by: Samuel Pitoiset 
---
  src/vulkan/wsi/wsi_common.c | 67 +
  src/vulkan/wsi/wsi_common.h |  1 +
  src/vulkan/wsi/wsi_common_display.c |  2 +-
  src/vulkan/wsi/wsi_common_private.h |  4 ++
  src/vulkan/wsi/wsi_common_wayland.c |  2 +-
  src/vulkan/wsi/wsi_common_x11.c |  2 +-
  6 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/src/vulkan/wsi/wsi_common.c b/src/vulkan/wsi/wsi_common.c
index 3cba0a4b06e..1d70b04e18b 100644
--- a/src/vulkan/wsi/wsi_common.c
+++ b/src/vulkan/wsi/wsi_common.c
@@ -29,6 +29,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 
  
  VkResult

  wsi_device_init(struct wsi_device *wsi,
@@ -112,6 +114,8 @@ wsi_device_init(struct wsi_device *wsi,
goto fail;
  #endif
  
+   wsi->override_present_mode = getenv("MESA_VK_WSI_PRESENT_MODE");

+
 return VK_SUCCESS;
  
  fail:

@@ -202,6 +206,69 @@ fail:
 return result;
  }
  
+static bool

+wsi_swapchain_is_present_mode_supported(struct wsi_device *wsi,
+const VkSwapchainCreateInfoKHR 
*pCreateInfo,
+VkPresentModeKHR mode)
+{
+  ICD_FROM_HANDLE(VkIcdSurfaceBase, surface, pCreateInfo->surface);
+  struct wsi_interface *iface = wsi->wsi[surface->platform];
+  VkPresentModeKHR *present_modes;
+  uint32_t present_mode_count;
+  bool supported = false;
+  VkResult result;
+
+  result = iface->get_present_modes(surface, _mode_count, NULL);
+  if (result != VK_SUCCESS)
+ return supported;
+
+  present_modes = malloc(present_mode_count * sizeof(*present_modes));
+  if (!present_modes)
+ return supported;
+
+  result = iface->get_present_modes(surface, _mode_count,
+present_modes);
+  if (result != VK_SUCCESS)
+ goto fail;
+
+  for (uint32_t i = 0; i < present_mode_count; i++) {
+ if (present_modes[i] == mode)
+supported = true;
+  }
+
+fail:
+  free(present_modes);
+  return supported;
+}
+
+enum VkPresentModeKHR
+wsi_swapchain_get_present_mode(struct wsi_device *wsi,
+   const VkSwapchainCreateInfoKHR *pCreateInfo)
+{
+   VkPresentModeKHR mode;
+
+   if (!wsi->override_present_mode)
+  return pCreateInfo->presentMode;
+
+   if (!strcmp(wsi->override_present_mode, "fifo")) {
+  mode = VK_PRESENT_MODE_FIFO_KHR;
+   } else if (!strcmp(wsi->override_present_mode, "mailbox")) {
+  mode = VK_PRESENT_MODE_MAILBOX_KHR;
+   } else if (!strcmp(wsi->override_present_mode, "immediate")) {
+  mode = VK_PRESENT_MODE_IMMEDIATE_KHR;
+   } else {
+  fprintf(stderr, "Invalid MESA_VK_WSI_PRESENT_MODE value!\n");
+  return pCreateInfo->presentMode;
+   }



I would put the conversion of string -> VkPresentModeKHR in the init 
function.


You can use VK_PRESENT_MODE_MAX_ENUM_KHR as an unset value.


Otherwise looks good, thanks for the update :)


-Lionel



+
+   if (!wsi_swapchain_is_present_mode_supported(wsi, pCreateInfo, mode)) {
+  fprintf(stderr, "Unsupported MESA_VK_WSI_PRESENT_MODE value!\n");
+  return pCreateInfo->presentMode;
+   }
+
+   return mode;
+}
+
  void
  wsi_swapchain_finish(struct wsi_swapchain *chain)
  {
diff --git a/src/vulkan/wsi/wsi_common.h b/src/vulkan/wsi/wsi_common.h
index e693e2be425..5ca376b4c49 100644
--- a/src/vulkan/wsi/wsi_common.h
+++ b/src/vulkan/wsi/wsi_common.h
@@ -101,6 +101,7 @@ struct wsi_device {
  
 bool supports_modifiers;

 uint32_t maxImageDimension2D;
+   const char *override_present_mode;
  
 uint64_t (*image_get_modifier)(VkImage image);
  
diff --git a/src/vulkan/wsi/wsi_common_display.c b/src/vulkan/wsi/wsi_common_display.c

index 09c18315623..74ed36ed646 100644
--- a/src/vulkan/wsi/wsi_common_display.c
+++ b/src/vulkan/wsi/wsi_common_display.c
@@ -1757,7 +1757,7 @@ wsi_display_surface_create_swapchain(
 chain->base.get_wsi_image = wsi_display_get_wsi_image;
 chain->base.acquire_next_image = wsi_display_acquire_next_image;
 chain->base.queue_present = wsi_display_queue_present;
-   chain->base.present_mode = create_info->presentMode;
+   chain->base.present_mode = wsi_swapchain_get_present_mode(wsi_device, 
create_info);
 chain->base.image_count = num_images;
  
 chain->wsi = wsi;

diff --git a/src/vulkan/wsi/wsi_common_private.h 
b/src/vulkan/wsi/wsi_common_private.h
index a6f49fc3124..6d8f4b7a0e4 100644
--- a/src/vulkan/wsi/wsi_common_private.h
+++ b/src/vulkan/wsi/wsi_common_private.h
@@ -79,6 +79,10 @@ wsi_swapchain_init(const struct wsi_device *wsi,
 const VkSwapchainCreateInfoKHR 

Re: [Mesa-dev] [PATCH] wsi: allow to override the present mode with MESA_VK_WSI_PRESENT_MODE

2019-04-09 Thread Lionel Landwerlin

On 09/04/2019 13:29, Samuel Pitoiset wrote:


On 4/9/19 1:10 PM, Lionel Landwerlin wrote:

On 09/04/2019 08:08, Samuel Pitoiset wrote:

This is common to all Vulkan drivers and all WSI.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107391
Signed-off-by: Samuel Pitoiset 
---
  src/vulkan/wsi/wsi_common.c | 19 +++
  src/vulkan/wsi/wsi_common_display.c |  2 ++
  src/vulkan/wsi/wsi_common_private.h |  2 ++
  src/vulkan/wsi/wsi_common_wayland.c |  2 ++
  src/vulkan/wsi/wsi_common_x11.c |  2 ++
  5 files changed, 27 insertions(+)

diff --git a/src/vulkan/wsi/wsi_common.c b/src/vulkan/wsi/wsi_common.c
index 3cba0a4b06e..77f369f686d 100644
--- a/src/vulkan/wsi/wsi_common.c
+++ b/src/vulkan/wsi/wsi_common.c
@@ -29,6 +29,7 @@
  #include 
  #include 
  #include 
+#include 
    VkResult
  wsi_device_init(struct wsi_device *wsi,
@@ -202,6 +203,24 @@ fail:
 return result;
  }
  +void
+wsi_swapchain_override_present_mode(struct wsi_swapchain *swapchain)
+{
+   if (getenv("MESA_VK_WSI_PRESENT_MODE")) {
+  const char *mode = getenv("MESA_VK_WSI_PRESENT_MODE");
+
+  if (!strcmp(mode, "fifo")) {
+ swapchain->present_mode = VK_PRESENT_MODE_FIFO_KHR;
+  } else if (!strcmp(mode, "mailbox")) {
+ swapchain->present_mode = VK_PRESENT_MODE_MAILBOX_KHR;
+  } else if (!strcmp(mode, "immediate")) {
+ swapchain->present_mode = VK_PRESENT_MODE_IMMEDIATE_KHR;
+  } else {
+ unreachable("Invalid MESA_VK_WSI_PRESENT_MODE value");
+  }
+   }
+}
+



I would make this function a bit more robust.

Not all backends support all present modes.

Wayland only supports 2, while X11 supports 3.


There is some checking required to validate that we're not passing 
something invalid no?



I would also store the override in the wsi_device and only do the 
getenv() in wsi_device_init().


So it's only calling getenv once.


And then at swapchain creation do something like this maybe :


chain->base.present_mode =wsi_swapchain_get_present_mode(wsi_device, 
pCreateInfo->presentMode);
The whole approach looks better to me. Though, the only way to get the 
list of supported present modes is to call get_present_mode() but it 
requires a surface and we don't know the platform at device creation.



I forgot about the surface :)

Then checking should be done by 
wsi_swapchain_get_present_mode(wsi_device, pCreateInfo) at swapchain 
creation (the surface is available in pCreateInfo).


wsi_device_init() would just store the override.


-Lionel





-Lionel



  void
  wsi_swapchain_finish(struct wsi_swapchain *chain)
  {
diff --git a/src/vulkan/wsi/wsi_common_display.c 
b/src/vulkan/wsi/wsi_common_display.c

index 09c18315623..36ddb07a5bf 100644
--- a/src/vulkan/wsi/wsi_common_display.c
+++ b/src/vulkan/wsi/wsi_common_display.c
@@ -1765,6 +1765,8 @@ wsi_display_surface_create_swapchain(
   chain->surface = (VkIcdSurfaceDisplay *) icd_surface;
  + wsi_swapchain_override_present_mode(>base);
+
 for (uint32_t image = 0; image < chain->base.image_count; 
image++) {

    result = wsi_display_image_init(device, >base,
    create_info, allocator,
diff --git a/src/vulkan/wsi/wsi_common_private.h 
b/src/vulkan/wsi/wsi_common_private.h

index a6f49fc3124..5ed4029ed70 100644
--- a/src/vulkan/wsi/wsi_common_private.h
+++ b/src/vulkan/wsi/wsi_common_private.h
@@ -78,6 +78,8 @@ wsi_swapchain_init(const struct wsi_device *wsi,
 VkDevice device,
 const VkSwapchainCreateInfoKHR *pCreateInfo,
 const VkAllocationCallbacks *pAllocator);
+void
+wsi_swapchain_override_present_mode(struct wsi_swapchain *swapchain);
    void wsi_swapchain_finish(struct wsi_swapchain *chain);
  diff --git a/src/vulkan/wsi/wsi_common_wayland.c 
b/src/vulkan/wsi/wsi_common_wayland.c

index 03a47028ef2..dcb2f8733f6 100644
--- a/src/vulkan/wsi/wsi_common_wayland.c
+++ b/src/vulkan/wsi/wsi_common_wayland.c
@@ -1015,6 +1015,8 @@ 
wsi_wl_surface_create_swapchain(VkIcdSurfaceBase *icd_surface,

 chain->vk_format = pCreateInfo->imageFormat;
 chain->drm_format = 
wl_drm_format_for_vk_format(chain->vk_format, alpha);

  + wsi_swapchain_override_present_mode(>base);
+
 if (pCreateInfo->oldSwapchain) {
    /* If we have an oldSwapchain parameter, copy the display 
struct over

 * from the old one so we don't have to fully re-initialize it.
diff --git a/src/vulkan/wsi/wsi_common_x11.c 
b/src/vulkan/wsi/wsi_common_x11.c

index c87b9312636..f4dafb3b55e 100644
--- a/src/vulkan/wsi/wsi_common_x11.c
+++ b/src/vulkan/wsi/wsi_common_x11.c
@@ -1373,6 +1373,8 @@ x11_surface_create_swapchain(VkIcdSurfaceBase 
*icd_surface,

 chain->status = VK_SUCCESS;
 chain->has_dri3_modifiers = wsi_conn->has_dri3_modifiers;
  + wsi_swapchain_override_present_mode(>base);
+
 /* 

Re: [Mesa-dev] [PATCH] wsi: allow to override the present mode with MESA_VK_WSI_PRESENT_MODE

2019-04-09 Thread Lionel Landwerlin

On 09/04/2019 08:08, Samuel Pitoiset wrote:

This is common to all Vulkan drivers and all WSI.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107391
Signed-off-by: Samuel Pitoiset 
---
  src/vulkan/wsi/wsi_common.c | 19 +++
  src/vulkan/wsi/wsi_common_display.c |  2 ++
  src/vulkan/wsi/wsi_common_private.h |  2 ++
  src/vulkan/wsi/wsi_common_wayland.c |  2 ++
  src/vulkan/wsi/wsi_common_x11.c |  2 ++
  5 files changed, 27 insertions(+)

diff --git a/src/vulkan/wsi/wsi_common.c b/src/vulkan/wsi/wsi_common.c
index 3cba0a4b06e..77f369f686d 100644
--- a/src/vulkan/wsi/wsi_common.c
+++ b/src/vulkan/wsi/wsi_common.c
@@ -29,6 +29,7 @@
  #include 
  #include 
  #include 
+#include 
  
  VkResult

  wsi_device_init(struct wsi_device *wsi,
@@ -202,6 +203,24 @@ fail:
 return result;
  }
  
+void

+wsi_swapchain_override_present_mode(struct wsi_swapchain *swapchain)
+{
+   if (getenv("MESA_VK_WSI_PRESENT_MODE")) {
+  const char *mode = getenv("MESA_VK_WSI_PRESENT_MODE");
+
+  if (!strcmp(mode, "fifo")) {
+ swapchain->present_mode = VK_PRESENT_MODE_FIFO_KHR;
+  } else if (!strcmp(mode, "mailbox")) {
+ swapchain->present_mode = VK_PRESENT_MODE_MAILBOX_KHR;
+  } else if (!strcmp(mode, "immediate")) {
+ swapchain->present_mode = VK_PRESENT_MODE_IMMEDIATE_KHR;
+  } else {
+ unreachable("Invalid MESA_VK_WSI_PRESENT_MODE value");
+  }
+   }
+}
+



I would make this function a bit more robust.

Not all backends support all present modes.

Wayland only supports 2, while X11 supports 3.


There is some checking required to validate that we're not passing 
something invalid no?



I would also store the override in the wsi_device and only do the 
getenv() in wsi_device_init().


So it's only calling getenv once.


And then at swapchain creation do something like this maybe :


chain->base.present_mode =wsi_swapchain_get_present_mode(wsi_device, 
pCreateInfo->presentMode);



-Lionel



  void
  wsi_swapchain_finish(struct wsi_swapchain *chain)
  {
diff --git a/src/vulkan/wsi/wsi_common_display.c 
b/src/vulkan/wsi/wsi_common_display.c
index 09c18315623..36ddb07a5bf 100644
--- a/src/vulkan/wsi/wsi_common_display.c
+++ b/src/vulkan/wsi/wsi_common_display.c
@@ -1765,6 +1765,8 @@ wsi_display_surface_create_swapchain(
  
 chain->surface = (VkIcdSurfaceDisplay *) icd_surface;
  
+   wsi_swapchain_override_present_mode(>base);

+
 for (uint32_t image = 0; image < chain->base.image_count; image++) {
result = wsi_display_image_init(device, >base,
create_info, allocator,
diff --git a/src/vulkan/wsi/wsi_common_private.h 
b/src/vulkan/wsi/wsi_common_private.h
index a6f49fc3124..5ed4029ed70 100644
--- a/src/vulkan/wsi/wsi_common_private.h
+++ b/src/vulkan/wsi/wsi_common_private.h
@@ -78,6 +78,8 @@ wsi_swapchain_init(const struct wsi_device *wsi,
 VkDevice device,
 const VkSwapchainCreateInfoKHR *pCreateInfo,
 const VkAllocationCallbacks *pAllocator);
+void
+wsi_swapchain_override_present_mode(struct wsi_swapchain *swapchain);
  
  void wsi_swapchain_finish(struct wsi_swapchain *chain);
  
diff --git a/src/vulkan/wsi/wsi_common_wayland.c b/src/vulkan/wsi/wsi_common_wayland.c

index 03a47028ef2..dcb2f8733f6 100644
--- a/src/vulkan/wsi/wsi_common_wayland.c
+++ b/src/vulkan/wsi/wsi_common_wayland.c
@@ -1015,6 +1015,8 @@ wsi_wl_surface_create_swapchain(VkIcdSurfaceBase 
*icd_surface,
 chain->vk_format = pCreateInfo->imageFormat;
 chain->drm_format = wl_drm_format_for_vk_format(chain->vk_format, alpha);
  
+   wsi_swapchain_override_present_mode(>base);

+
 if (pCreateInfo->oldSwapchain) {
/* If we have an oldSwapchain parameter, copy the display struct over
 * from the old one so we don't have to fully re-initialize it.
diff --git a/src/vulkan/wsi/wsi_common_x11.c b/src/vulkan/wsi/wsi_common_x11.c
index c87b9312636..f4dafb3b55e 100644
--- a/src/vulkan/wsi/wsi_common_x11.c
+++ b/src/vulkan/wsi/wsi_common_x11.c
@@ -1373,6 +1373,8 @@ x11_surface_create_swapchain(VkIcdSurfaceBase 
*icd_surface,
 chain->status = VK_SUCCESS;
 chain->has_dri3_modifiers = wsi_conn->has_dri3_modifiers;
  
+   wsi_swapchain_override_present_mode(>base);

+
 /* If we are reallocating from an old swapchain, then we inherit its
  * last completion mode, to ensure we don't get into reallocation
  * cycles. If we are starting anew, we set 'COPY', as that is the only



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] drm-shim for faking GEM drivers using simulators or noop.

2019-03-22 Thread Lionel Landwerlin

On 21/03/2019 19:04, Eric Anholt wrote:

I've just put up a new repo that I'm hoping to use to enable automatic
shader-db results for various drivers on Gitlab CI merge requests (and
maybe with some more work, delete the simulator backends of vc4 and v3d).

https://gitlab.freedesktop.org/anholt/drm-shim

I've copied a bit of Mesa stuff out for convenience, but that ends up
actually being most of the code in the repo since it's not really that
much work.  Alternatively, I have a branch of most of it, inside of
Mesa, if we wanted to just add it in as one of the tools.

Does anyone have any strong feelings about where it should go?
mesa/mesa, mesa/drm-shim, or anholt/drm-shim?



I've been thinking about writing a validation layer for commands 
submitted by anv/i965/iris.


Because it would need to decode the stream and parse it, genxml is kind 
of needed, so obviously mesa/mesa would be easier.



-Lionel

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nir: Constant values are per-column not per-component

2019-03-20 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

On 19/03/2019 19:15, Jason Ekstrand wrote:

---
  src/compiler/nir/nir.h | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 67304af1d64..e4f012809e5 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -59,6 +59,7 @@ extern "C" {
  #define NIR_FALSE 0u
  #define NIR_TRUE (~0u)
  #define NIR_MAX_VEC_COMPONENTS 4
+#define NIR_MAX_MATRIX_COLUMNS 4
  typedef uint8_t nir_component_mask_t;
  
  /** Defines a cast function

@@ -141,7 +142,7 @@ typedef struct nir_constant {
  * by the type associated with the \c nir_variable.  Constants may be
  * scalars, vectors, or matrices.
  */
-   nir_const_value values[NIR_MAX_VEC_COMPONENTS];
+   nir_const_value values[NIR_MAX_MATRIX_COLUMNS];
  
 /* we could get this from the var->type but makes clone *much* easier to

  * not have to care about the type.



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] anv: Bump maxComputeWorkgroupInvocations

2019-03-19 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

On 19/03/2019 17:08, Jason Ekstrand wrote:

We initially set this lower because we didn't have SIMD32 support yet
but we've supported SIMD32 for quite some time now.  We should bump it
up to the real limit.
---
  src/intel/vulkan/anv_device.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index a6ecc657f4b..7ba2e802c25 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -1132,7 +1132,7 @@ void anv_GetPhysicalDeviceProperties(
.maxFragmentCombinedOutputResources   = 8,
.maxComputeSharedMemorySize   = 32768,
.maxComputeWorkGroupCount = { 65535, 65535, 65535 },
-  .maxComputeWorkGroupInvocations   = 16 * devinfo->max_cs_threads,
+  .maxComputeWorkGroupInvocations   = 32 * devinfo->max_cs_threads,
.maxComputeWorkGroupSize = {
   16 * devinfo->max_cs_threads,
   16 * devinfo->max_cs_threads,



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] vulkan/util: meson build - add wayland client include

2019-03-16 Thread Lionel Landwerlin
There is merge request opened about this issue : 
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/429


I think the deps need to be moved from src/vulkan/wsi/meson.build into 
src/vulkan/meson.build as they apply to the overlay, utils & wsi.


Thanks,

-Lionel

On 16/03/2019 18:56, Tobias Klausmann wrote:

Without this the build breaks with:

In file included from ../src/vulkan/util/vk_util.h:32,
  from ../src/vulkan/util/vk_util.c:28:
../include/vulkan/vulkan.h:51:10: fatal error: wayland-client.h: No such file or
directory
  #include 
   ^~
compilation terminated.

The above misses the include directory for wayland:
-I/usr/include/wayland

Signed-off-by: Tobias Klausmann 
---
  src/vulkan/util/meson.build | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/vulkan/util/meson.build b/src/vulkan/util/meson.build
index 6aba265cc81..b845f57a660 100644
--- a/src/vulkan/util/meson.build
+++ b/src/vulkan/util/meson.build
@@ -36,10 +36,17 @@ vk_enum_to_str = custom_target(
],
  )
  
+vulkan_util_deps = []

+
+if with_platform_wayland
+  vulkan_util_deps += dep_wayland_client
+endif
+
  libvulkan_util = static_library(
'vulkan_util',
[files_vulkan_util, vk_enum_to_str],
-  include_directories : inc_common,
+  include_directories : [inc_common],
+  dependencies : [vulkan_util_deps],
c_args : [c_vis_args, vulkan_wsi_args],
build_by_default : false,
  )



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] anv: Stop using VK_TRUE/FALSE

2019-03-13 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

On 12/03/2019 20:24, Jason Ekstrand wrote:

We've been fairly inconsistent about this so we should really choose
whether we're going to use VK_TRUE/FALSE or the C boolean values.  The
Vulkan #defines are set to 1 and 0 respectively so it's the same value
as C gives you when you cast a boolean expression to an integer.  Since
there are several places where we set a VkBool32 to a C logical
expression, let's just embrace C booleans and stop using the VK defines.
---
  src/intel/vulkan/anv_device.c | 42 +--
  1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 729cceb3e32..83fa3936c19 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -833,7 +833,7 @@ VkResult anv_EnumeratePhysicalDeviceGroups(
memset(p->physicalDevices, 0, sizeof(p->physicalDevices));
p->physicalDevices[0] =
   anv_physical_device_to_handle(>physicalDevice);
-  p->subsetAllocation = VK_FALSE;
+  p->subsetAllocation = false;
  
vk_foreach_struct(ext, p->pNext)

   anv_debug_ignored_stype(ext->sType);
@@ -967,7 +967,7 @@ void anv_GetPhysicalDeviceFeatures2(
case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DEPTH_CLIP_ENABLE_FEATURES_EXT: {
   VkPhysicalDeviceDepthClipEnableFeaturesEXT *features =
  (VkPhysicalDeviceDepthClipEnableFeaturesEXT *)ext;
- features->depthClipEnable = VK_TRUE;
+ features->depthClipEnable = true;
   break;
}
  
@@ -990,7 +990,7 @@ void anv_GetPhysicalDeviceFeatures2(
  
case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROTECTED_MEMORY_FEATURES: {

   VkPhysicalDeviceProtectedMemoryFeatures *features = (void *)ext;
- features->protectedMemory = VK_FALSE;
+ features->protectedMemory = false;
   break;
}
  
@@ -1024,23 +1024,23 @@ void anv_GetPhysicalDeviceFeatures2(

case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_TRANSFORM_FEEDBACK_FEATURES_EXT: 
{
   VkPhysicalDeviceTransformFeedbackFeaturesEXT *features =
  (VkPhysicalDeviceTransformFeedbackFeaturesEXT *)ext;
- features->transformFeedback = VK_TRUE;
- features->geometryStreams = VK_TRUE;
+ features->transformFeedback = true;
+ features->geometryStreams = true;
   break;
}
  
case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VERTEX_ATTRIBUTE_DIVISOR_FEATURES_EXT: {

   VkPhysicalDeviceVertexAttributeDivisorFeaturesEXT *features =
  (VkPhysicalDeviceVertexAttributeDivisorFeaturesEXT *)ext;
- features->vertexAttributeInstanceRateDivisor = VK_TRUE;
- features->vertexAttributeInstanceRateZeroDivisor = VK_TRUE;
+ features->vertexAttributeInstanceRateDivisor = true;
+ features->vertexAttributeInstanceRateZeroDivisor = true;
   break;
}
  
case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_YCBCR_IMAGE_ARRAYS_FEATURES_EXT: {

   VkPhysicalDeviceYcbcrImageArraysFeaturesEXT *features =
  (VkPhysicalDeviceYcbcrImageArraysFeaturesEXT *)ext;
- features->ycbcrImageArrays = VK_TRUE;
+ features->ycbcrImageArrays = true;
   break;
}
  
@@ -1234,8 +1234,8 @@ void anv_GetPhysicalDeviceProperties2(

 VK_RESOLVE_MODE_MAX_BIT_KHR;
   }
  
- props->independentResolveNone = VK_TRUE;

- props->independentResolve = VK_TRUE;
+ props->independentResolveNone = true;
+ props->independentResolve = true;
   break;
}
  
@@ -1372,7 +1372,7 @@ void anv_GetPhysicalDeviceProperties2(

 
VK_SUBGROUP_FEATURE_SHUFFLE_RELATIVE_BIT |
 VK_SUBGROUP_FEATURE_CLUSTERED_BIT |
 VK_SUBGROUP_FEATURE_QUAD_BIT;
- properties->quadOperationsInAllStages = VK_TRUE;
+ properties->quadOperationsInAllStages = true;
   break;
}
  
@@ -1386,10 +1386,10 @@ void anv_GetPhysicalDeviceProperties2(

   props->maxTransformFeedbackStreamDataSize = 128 * 4;
   props->maxTransformFeedbackBufferDataSize = 128 * 4;
   props->maxTransformFeedbackBufferDataStride = 2048;
- props->transformFeedbackQueries = VK_TRUE;
- props->transformFeedbackStreamsLinesTriangles = VK_FALSE;
- props->transformFeedbackRasterizationStreamSelect = VK_FALSE;
- props->transformFeedbackDraw = VK_TRUE;
+ props->transformFeedbackQueries = true;
+ props->transformFeedbackStreamsLinesTriangles = false;
+ props->transformFeedbackRasterizationStreamSelect = false;
+ props->transformFeedbackDraw = true;
   break;

Re: [Mesa-dev] [PATCH 9/9] anv: Enabled the VK_EXT_sample_locations extension

2019-03-11 Thread Lionel Landwerlin

On 11/03/2019 15:04, Eleni Maria Stea wrote:

Enabled the VK_EXT_sample_locations for Intel Gen >= 7.
---
  src/intel/vulkan/anv_extensions.py | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_extensions.py 
b/src/intel/vulkan/anv_extensions.py
index 9e4e03e46df..99007544732 100644
--- a/src/intel/vulkan/anv_extensions.py
+++ b/src/intel/vulkan/anv_extensions.py
@@ -129,7 +129,7 @@ EXTENSIONS = [
  Extension('VK_EXT_inline_uniform_block',  1, True),
  Extension('VK_EXT_pci_bus_info',  2, True),
  Extension('VK_EXT_post_depth_coverage',   1, 'device->info.gen 
>= 9'),
-Extension('VK_EXT_sample_locations',  1, False),
+Extension('VK_EXT_sample_locations',  1, 'device->info.gen 
>= 7'),
  Extension('VK_EXT_sampler_filter_minmax', 1, 'device->info.gen 
>= 9'),
  Extension('VK_EXT_scalar_block_layout',   1, True),
  Extension('VK_EXT_shader_stencil_export', 1, 'device->info.gen 
>= 9'),



Hi Eleni,

Anv doesn't support anything below gen7 so I could just put True there ;)

-Lionel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] intel/compiler: silence unitialized variable warning in opt_vector_float()

2019-03-08 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

Thanks!

On 08/03/2019 15:52, Brian Paul wrote:

---
  src/intel/compiler/brw_vec4.cpp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_vec4.cpp b/src/intel/compiler/brw_vec4.cpp
index fe36851..2e9de29 100644
--- a/src/intel/compiler/brw_vec4.cpp
+++ b/src/intel/compiler/brw_vec4.cpp
@@ -414,7 +414,7 @@ vec4_visitor::opt_vector_float()
  
foreach_inst_in_block_safe(vec4_instruction, inst, block) {

   int vf = -1;
- enum brw_reg_type need_type;
+ enum brw_reg_type need_type = BRW_REGISTER_TYPE_LAST;
  
   /* Look for unconditional MOVs from an immediate with a partial

* writemask.  Skip type-conversion MOVs other than integer 0,



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] intel/decoders: silence uninitialized variable warnings in gen_print_batch()

2019-03-08 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

On 08/03/2019 15:52, Brian Paul wrote:

---
  src/intel/common/gen_batch_decoder.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/intel/common/gen_batch_decoder.c 
b/src/intel/common/gen_batch_decoder.c
index ff898d8..5cac983 100644
--- a/src/intel/common/gen_batch_decoder.c
+++ b/src/intel/common/gen_batch_decoder.c
@@ -874,9 +874,9 @@ gen_print_batch(struct gen_batch_decode_ctx *ctx,
}
  
if (strcmp(inst_name, "MI_BATCH_BUFFER_START") == 0) {

- uint64_t next_batch_addr;
+ uint64_t next_batch_addr = 0;
   bool ppgtt = false;
- bool second_level;
+ bool second_level = false;
   struct gen_field_iterator iter;
   gen_field_iterator_init(, inst, p, 0, false);
   while (gen_field_iterator_next()) {



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/dri: allow direct UYVY import

2019-02-27 Thread Lionel Landwerlin

On 27/02/2019 06:55, Kenneth Graunke wrote:

On Tuesday, February 26, 2019 9:41:07 AM PST Christian Gmeiner wrote:

Push this format to the pipe driver unchanged.

Signed-off-by: Christian Gmeiner 
---
  include/GL/internal/dri_interface.h   | 1 +
  src/gallium/state_trackers/dri/dri2.c | 2 ++
  2 files changed, 3 insertions(+)

diff --git a/include/GL/internal/dri_interface.h 
b/include/GL/internal/dri_interface.h
index 6d134e3a40f..83aad4100a4 100644
--- a/include/GL/internal/dri_interface.h
+++ b/include/GL/internal/dri_interface.h
@@ -1290,6 +1290,7 @@ struct __DRIdri2ExtensionRec {
  #define __DRI_IMAGE_FORMAT_XBGR2101010  0x1010
  #define __DRI_IMAGE_FORMAT_ABGR2101010  0x1011
  #define __DRI_IMAGE_FORMAT_SABGR8   0x1012
+#define __DRI_IMAGE_FORMAT_UYVY 0x1013
  
  #define __DRI_IMAGE_USE_SHARE		0x0001

  #define __DRI_IMAGE_USE_SCANOUT   0x0002
diff --git a/src/gallium/state_trackers/dri/dri2.c 
b/src/gallium/state_trackers/dri/dri2.c
index 668d177c371..acdf19e1f5d 100644
--- a/src/gallium/state_trackers/dri/dri2.c
+++ b/src/gallium/state_trackers/dri/dri2.c
@@ -110,6 +110,8 @@ static const struct dri2_format_mapping dri2_format_table[] 
= {
  __DRI_IMAGE_COMPONENTS_Y_UV,  PIPE_FORMAT_NV12 },
{ __DRI_IMAGE_FOURCC_YUYV,  __DRI_IMAGE_FORMAT_YUYV,
  __DRI_IMAGE_COMPONENTS_Y_XUXV,PIPE_FORMAT_YUYV },
+  { __DRI_IMAGE_FOURCC_UYVY,  __DRI_IMAGE_FORMAT_UYVY,
+__DRI_IMAGE_COMPONENTS_Y_UXVX,PIPE_FORMAT_UYVY },
  };
  
  static const struct dri2_format_mapping *



Looks good to me, though I'm no expert on planar formats.

Reviewed-by: Kenneth Graunke 



I'm pretty sure our HW can't do that until ICL.

2 different views on the same buffer will have the sampler pollute its 
cache with invalid data from the other view.


The fact that we support this in i965 and that is passes the tests might 
be pure luck (2x2 images might not be enough for the issue to manifest 
itself).



I should probably try to update the tests to generate bigger images.


-Lionel

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] anv: Explicitly set 3DSTATE_CLIP::VertexSubPixelPrecisionSelect

2019-02-22 Thread Lionel Landwerlin

On 22/02/2019 16:02, Jason Ekstrand wrote:

This field was added on gen8 even though there's an identically defined
one in 3DSTATE_SF.
---
  src/intel/vulkan/genX_pipeline.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index 6255e5d83c5..192a121324d 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -1077,6 +1077,10 @@ emit_3dstate_clip(struct anv_pipeline *pipeline,
clip.APIMode  = APIMODE_D3D,
clip.ViewportXYClipTestEnable = true;
  
+#if GEN_GEN >= 8

+  clip.VertexSubPixelPrecisionSelect = _8Bit;
+#endif
+
clip.ClipMode = CLIPMODE_NORMAL;
  
clip.TriangleStripListProvokingVertexSelect = 0;



Well spotted.


Reviewed-by: Lionel Landwerlin 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v3 2/2] anv: advertise 8 subpixel precision bits

2019-02-22 Thread Lionel Landwerlin

On 22/02/2019 15:51, Juan A. Suarez Romero wrote:

On one side, when emitting 3DSTATE_SF, VertexSubPixelPrecisionSelect is
used to select between 8 bit subpixel precision (value 0) or 4 bit
subpixel precision (value 1). As this value is not set, means it is
taking the value 0, so 8 bit are used.

On the other side, in the Vulkan CTS tests, if the reference rasterizer,
which uses 8 bit precision, as it is used to check what should be the
expected value for the tests, is changed to use 4 bit as ANV was
advertising so far, some of the tests will fail.

So it seems ANV is actually using 8 bits.

v2: explicitly set 3DSTATE_SF::VertexSubPixelPrecisionSelect (Jason)
v3: use _8Bit definition as value (Jason)

CC: Jason Ekstrand 
CC: Kenneth Graunke 
Signed-off-by: Juan A. Suarez Romero 



Reviewed-by: Lionel Landwerlin 


Cc: stable?



---
  src/intel/vulkan/anv_device.c| 2 +-
  src/intel/vulkan/genX_pipeline.c | 1 +
  2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 3120865466a..95224407318 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -1095,7 +1095,7 @@ void anv_GetPhysicalDeviceProperties(
   16 * devinfo->max_cs_threads,
   16 * devinfo->max_cs_threads,
},
-  .subPixelPrecisionBits= 4 /* FIXME */,
+  .subPixelPrecisionBits= 8,
.subTexelPrecisionBits= 4 /* FIXME */,
.mipmapPrecisionBits  = 4 /* FIXME */,
.maxDrawIndexedIndexValue = UINT32_MAX,
diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index 6255e5d83c5..3d36bb773e1 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -464,6 +464,7 @@ emit_rs_state(struct anv_pipeline *pipeline,
 sf.TriangleStripListProvokingVertexSelect = 0;
 sf.LineStripListProvokingVertexSelect = 0;
 sf.TriangleFanProvokingVertexSelect = 1;
+   sf.VertexSubPixelPrecisionSelect = _8Bit;
  
 const struct brw_vue_prog_data *last_vue_prog_data =

anv_pipeline_get_last_vue_prog_data(pipeline);



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v3 1/2] genxml: add missing field values for 3DSTATE_SF

2019-02-22 Thread Lionel Landwerlin

On 22/02/2019 15:51, Juan A. Suarez Romero wrote:

Fill out "Vertex Sub Pixel Precision Select" possible values.

Signed-off-by: Juan A. Suarez Romero 



Reviewed-by: Lionel Landwerlin 



---
  src/intel/genxml/gen10.xml | 5 -
  src/intel/genxml/gen11.xml | 5 -
  src/intel/genxml/gen7.xml  | 5 -
  src/intel/genxml/gen75.xml | 5 -
  src/intel/genxml/gen8.xml  | 5 -
  src/intel/genxml/gen9.xml  | 5 -
  6 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/src/intel/genxml/gen10.xml b/src/intel/genxml/gen10.xml
index 284633aedd4..4cb1f05ae25 100644
--- a/src/intel/genxml/gen10.xml
+++ b/src/intel/genxml/gen10.xml
@@ -2043,7 +2043,10 @@

  
  
-
+
+  
+  
+
  


diff --git a/src/intel/genxml/gen11.xml b/src/intel/genxml/gen11.xml
index 95a84a2f597..a7c06c5ab60 100644
--- a/src/intel/genxml/gen11.xml
+++ b/src/intel/genxml/gen11.xml
@@ -2063,7 +2063,10 @@

  
  
-
+
+  
+  
+
  


diff --git a/src/intel/genxml/gen7.xml b/src/intel/genxml/gen7.xml
index 363fd8664bf..1b2c7d996f9 100644
--- a/src/intel/genxml/gen7.xml
+++ b/src/intel/genxml/gen7.xml
@@ -1399,7 +1399,10 @@
  

  
-
+
+  
+  
+
  


diff --git a/src/intel/genxml/gen75.xml b/src/intel/genxml/gen75.xml
index a1da9cae041..95b306139eb 100644
--- a/src/intel/genxml/gen75.xml
+++ b/src/intel/genxml/gen75.xml
@@ -1713,7 +1713,10 @@
  

  
-
+
+  
+  
+
  


diff --git a/src/intel/genxml/gen8.xml b/src/intel/genxml/gen8.xml
index 4676d9bca9c..0226d7c0c66 100644
--- a/src/intel/genxml/gen8.xml
+++ b/src/intel/genxml/gen8.xml
@@ -1816,7 +1816,10 @@

  
  
-
+
+  
+  
+
  


diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml
index 8afa986df55..88fc2da7885 100644
--- a/src/intel/genxml/gen9.xml
+++ b/src/intel/genxml/gen9.xml
@@ -1995,7 +1995,10 @@

  
  
-
+
+  
+  
+
  





___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 2/2] isl: the display engine requires 64B alignment for linear surfaces

2019-02-22 Thread Lionel Landwerlin

Pushed with the PRM quote, thanks!

On 22/02/2019 06:16, Samuel Iglesias Gonsálvez wrote:

Lionel, are you going to push it with this quote? I can add it
otherwise.

Sam

On Thu, 2019-02-21 at 13:41 +, Lionel Landwerlin wrote:

On 21/02/2019 13:30, Chris Wilson wrote:

Quoting Lionel Landwerlin (2019-02-21 12:57:09)

I did not find the PRM bit that says it must be 64b aligned, but
I can
see that's what i915 checks.

Chris: If you have a pointer to it, I could add the quote.

In amongst the register specs,
PLANE_STRIDE:
For Linear memory, this field specifies the stride in chunks of 64
bytes (1 cache line).
-Chris


Thanks a lot!




___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nir/lower_clip_cull: Fix an incorrect assert

2019-02-21 Thread Lionel Landwerlin

On 21/02/2019 16:55, Jason Ekstrand wrote:

Copy+paste error.  It was supposed to test cull and not clip.

Fixes: 4e69fba534e "nir: Rewrite lower_clip_cull_distance_arrays..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109717
Cc: Kenneth Graunke 



Trivial, I really thought it was harder than that :)

Thanks!


Reviewed-by: Lionel Landwerlin 



---
  src/compiler/nir/nir_lower_clip_cull_distance_arrays.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compiler/nir/nir_lower_clip_cull_distance_arrays.c 
b/src/compiler/nir/nir_lower_clip_cull_distance_arrays.c
index d98ffa69596..70578d6f3fd 100644
--- a/src/compiler/nir/nir_lower_clip_cull_distance_arrays.c
+++ b/src/compiler/nir/nir_lower_clip_cull_distance_arrays.c
@@ -101,7 +101,7 @@ combine_clip_cull(nir_shader *nir,
 }
  
 if (cull) {

-  assert(clip->data.compact);
+  assert(cull->data.compact);
cull->data.how_declared = nir_var_hidden;
cull->data.location = VARYING_SLOT_CLIP_DIST0 + clip_array_size / 4;
cull->data.location_frac = clip_array_size % 4;



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 2/2] isl: the display engine requires 64B alignment for linear surfaces

2019-02-21 Thread Lionel Landwerlin

On 21/02/2019 13:30, Chris Wilson wrote:

Quoting Lionel Landwerlin (2019-02-21 12:57:09)

I did not find the PRM bit that says it must be 64b aligned, but I can
see that's what i915 checks.

Chris: If you have a pointer to it, I could add the quote.

In amongst the register specs,
PLANE_STRIDE:
For Linear memory, this field specifies the stride in chunks of 64 bytes (1 
cache line).
-Chris


Thanks a lot!

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 2/2] isl: the display engine requires 64B alignment for linear surfaces

2019-02-21 Thread Lionel Landwerlin
I did not find the PRM bit that says it must be 64b aligned, but I can 
see that's what i915 checks.


Chris: If you have a pointer to it, I could add the quote.

Thanks!

Reviewed-by: Lionel Landwerlin 

On 19/02/2019 12:06, Samuel Iglesias Gonsálvez wrote:

Signed-off-by: Samuel Iglesias Gonsálvez 
---
  src/intel/isl/isl.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
index 5c34efb9a13..7fb469687fa 100644
--- a/src/intel/isl/isl.c
+++ b/src/intel/isl/isl.c
@@ -1519,6 +1519,9 @@ isl_surf_init_s(const struct isl_device *dev,
   }
}
base_alignment_B = isl_round_up_to_power_of_two(base_alignment_B);
+  /* The display engine requires 64B alignment for linear surfaces.  */
+  if (isl_surf_usage_is_display(info->usage))
+ base_alignment_B = MAX(base_alignment_B, 64);
 } else {
const uint32_t total_h_tl =
   isl_align_div(phys_total_el.h, tile_info.logical_extent_el.height);



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] isl: remove the cache line size alignment requirement

2019-02-18 Thread Lionel Landwerlin

On 18/02/2019 15:08, Chris Wilson wrote:

Quoting Lionel Landwerlin (2019-02-18 15:06:15)

On 15/02/2019 14:43, Samuel Iglesias Gonsálvez wrote:

There are formats which bpp are not aligned to a power-of-two and
that can cause problems in the checks we do.

The cacheline size was a requirement for using the BLT engine, which
we don't use anymore except for a few things on old HW, so we drop it.

Fixes CTS's CL#3500 test:

dEQP-VK.api.image_clearing.core.clear_color_image.2d.linear.single_layer.r8g8b8_unorm

Signed-off-by: Samuel Iglesias Gonsálvez 


That looks good to me :
Reviewed-by: Lionel Landwerlin 

I'm doing a CI run just to convince myself, so if you can wait for that.

Is scanout a concern? The display engine also requires 64B alignment for
linear.
-Chris


Thanks for reminding us :)


I guess we need an additional check with if 
(isl_surf_usage_is_display(info->usage)) base_alignment_B = 
MAX(base_alignment_B, 64);


-Lionel

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] isl: remove the cache line size alignment requirement

2019-02-18 Thread Lionel Landwerlin

On 15/02/2019 14:43, Samuel Iglesias Gonsálvez wrote:

There are formats which bpp are not aligned to a power-of-two and
that can cause problems in the checks we do.

The cacheline size was a requirement for using the BLT engine, which
we don't use anymore except for a few things on old HW, so we drop it.

Fixes CTS's CL#3500 test:

dEQP-VK.api.image_clearing.core.clear_color_image.2d.linear.single_layer.r8g8b8_unorm

Signed-off-by: Samuel Iglesias Gonsálvez 



That looks good to me :
Reviewed-by: Lionel Landwerlin 

I'm doing a CI run just to convince myself, so if you can wait for that.

Thanks,

-Lionel



---
  src/intel/isl/isl.c | 21 -
  1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
index eaaa28014a3..7f1f2339931 100644
--- a/src/intel/isl/isl.c
+++ b/src/intel/isl/isl.c
@@ -1381,20 +1381,6 @@ isl_calc_row_pitch(const struct isl_device *dev,
 uint32_t alignment_B =
isl_calc_row_pitch_alignment(surf_info, tile_info);
  
-   /* If pitch isn't given and it can be chosen freely, align it by cache line

-* allowing one to use blit engine on the surface.
-*/
-   if (surf_info->row_pitch_B == 0 && tile_info->tiling == ISL_TILING_LINEAR) {
-  /* From the Broadwell PRM docs for XY_SRC_COPY_BLT::SourceBaseAddress:
-   *
-   *"Base address of the destination surface: X=0, Y=0. Lower 32bits
-   *of the 48bit addressing. When Src Tiling is enabled (Bit_15
-   *enabled), this address must be 4KB-aligned. When Tiling is not
-   *enabled, this address should be CL (64byte) aligned."
-   */
-  alignment_B = MAX2(alignment_B, 64);
-   }
-
 const uint32_t min_row_pitch_B =
isl_calc_min_row_pitch(dev, surf_info, tile_info, phys_total_el,
   alignment_B);
@@ -1527,12 +1513,13 @@ isl_surf_init_s(const struct isl_device *dev,
base_alignment_B = MAX(1, info->min_alignment_B);
if (info->usage & ISL_SURF_USAGE_RENDER_TARGET_BIT) {
   if (isl_format_is_yuv(info->format)) {
-base_alignment_B = MAX(base_alignment_B, fmtl->bpb / 4);
+base_alignment_B = isl_align_npot(base_alignment_B, fmtl->bpb / 4);
   } else {
-base_alignment_B = MAX(base_alignment_B, fmtl->bpb / 8);
+base_alignment_B = isl_align_npot(base_alignment_B, fmtl->bpb / 8);
   }
+  } else {
+ base_alignment_B = isl_round_up_to_power_of_two(base_alignment_B);
}
-  base_alignment_B = isl_round_up_to_power_of_two(base_alignment_B);
 } else {
const uint32_t total_h_tl =
   isl_align_div(phys_total_el.h, tile_info.logical_extent_el.height);



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] anv/image: fix offset's alignment to the surface alignment

2019-02-18 Thread Lionel Landwerlin

On 15/02/2019 14:43, Samuel Iglesias Gonsálvez wrote:

Signed-off-by: Samuel Iglesias Gonsálvez 



Hey Samuel,


Thanks for this change. Would you mind changing the align_u32 in the 
if() branch too?


It won't fix anything but that's just to be consistent.


With that :

Reviewed-by: Lionel Landwerlin 



---
  src/intel/vulkan/anv_image.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
index 3999c7399d0..f4a65044a3b 100644
--- a/src/intel/vulkan/anv_image.c
+++ b/src/intel/vulkan/anv_image.c
@@ -142,7 +142,7 @@ add_surface(struct anv_image *image, struct anv_surface 
*surf, uint32_t plane)
 surf->isl.alignment_B);
/* Plane offset is always 0 when it's disjoint. */
 } else {
-  surf->offset = align_u32(image->size, surf->isl.alignment_B);
+  surf->offset = util_align_npot(image->size, surf->isl.alignment_B);
/* Determine plane's offset only once when the first surface is added. 
*/
if (image->planes[plane].size == 0)
   image->planes[plane].offset = image->size;



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] intel/fs: Bail in optimize_extract_to_float if we have modifiers

2019-02-14 Thread Lionel Landwerlin

On 12/02/2019 04:44, Jason Ekstrand wrote:

This fixes a bug in runscape where we were optimizing x >> 16 to an
extract and then negating and converting to float.  The NIR to fs pass
was dropping the negate on the floor breaking a geometry shader and
causing it to render nothing.

Fixes: 1f862e923cb "i965/fs: Optimize float conversions of byte/word..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109601
Cc: Matt Turner 



Tested-by: Lionel Landwerlin 


With a new piglit test : https://patchwork.freedesktop.org/patch/286177/



---
  src/intel/compiler/brw_fs_nir.cpp | 5 +
  1 file changed, 5 insertions(+)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index b80f4351b49..204640ac726 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -510,6 +510,11 @@ fs_visitor::optimize_extract_to_float(nir_alu_instr *instr,
 src0->op != nir_op_extract_i8 && src0->op != nir_op_extract_i16)
return false;
  
+   /* If either opcode has source modifiers, bail. */

+   if (instr->src[0].abs || instr->src[0].negate ||
+src0->src[0].abs || src0->src[0].negate)
+  return false;
+
 unsigned element = nir_src_as_uint(src0->src[1].src);
  
 /* Element type to extract.*/



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] intel/dump_gpu: Disambiguate between BOs from different GEM handle spaces.

2019-02-11 Thread Lionel Landwerlin

On 08/02/2019 04:18, Francisco Jerez wrote:

This fixes a rather astonishing problem that came up while debugging
an issue in the Vulkan CTS.  Apparently the Vulkan CTS framework has
the tendency to create multiple VkDevices, each one with a separate
DRM device FD and therefore a disjoint GEM buffer object handle space.
Because the intel_dump_gpu tool wasn't making any distinction between
buffers from the different handle spaces, it was confusing the
instruction state pools from both devices, which happened to have the
exact same GEM handle and PPGTT virtual address, but completely
different shader contents.  This was causing the simulator to believe
that the ver



Thanks for finding this.


Reviewed-by: Lionel Landwerlin 



tex pipeline was executing a fragment shader, which didn't
end up well.
---
  src/intel/tools/intel_dump_gpu.c | 41 ++--
  1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/src/intel/tools/intel_dump_gpu.c b/src/intel/tools/intel_dump_gpu.c
index ffe49b10108..19e054c894c 100644
--- a/src/intel/tools/intel_dump_gpu.c
+++ b/src/intel/tools/intel_dump_gpu.c
@@ -58,6 +58,7 @@ static FILE *output_file = NULL;
  static int verbose = 0;
  static bool device_override;
  
+#define MAX_FD_COUNT 64

  #define MAX_BO_COUNT 64 * 1024
  
  struct bo {

@@ -94,12 +95,13 @@ fail_if(int cond, const char *format, ...)
  }
  
  static struct bo *

-get_bo(uint32_t handle)
+get_bo(unsigned fd, uint32_t handle)
  {
 struct bo *bo;
  
 fail_if(handle >= MAX_BO_COUNT, "bo handle too large\n");

-   bo = [handle];
+   fail_if(fd >= MAX_FD_COUNT, "bo fd too large\n");
+   bo = [handle + fd * MAX_BO_COUNT];
  
 return bo;

  }
@@ -115,7 +117,7 @@ static uint32_t device = 0;
  static struct aub_file aub_file;
  
  static void *

-relocate_bo(struct bo *bo, const struct drm_i915_gem_execbuffer2 *execbuffer2,
+relocate_bo(int fd, struct bo *bo, const struct drm_i915_gem_execbuffer2 
*execbuffer2,
  const struct drm_i915_gem_exec_object2 *obj)
  {
 const struct drm_i915_gem_exec_object2 *exec_objects =
@@ -137,7 +139,7 @@ relocate_bo(struct bo *bo, const struct 
drm_i915_gem_execbuffer2 *execbuffer2,
   handle = relocs[i].target_handle;
  
aub_write_reloc(, ((char *)relocated) + relocs[i].offset,

-  get_bo(handle)->offset + relocs[i].delta);
+  get_bo(fd, handle)->offset + relocs[i].delta);
 }
  
 return relocated;

@@ -226,7 +228,7 @@ dump_execbuffer2(int fd, struct drm_i915_gem_execbuffer2 
*execbuffer2)
  
 for (uint32_t i = 0; i < execbuffer2->buffer_count; i++) {

obj = _objects[i];
-  bo = get_bo(obj->handle);
+  bo = get_bo(fd, obj->handle);
  
/* If bo->size == 0, this means they passed us an invalid

 * buffer.  The kernel will reject it and so should we.
@@ -262,13 +264,13 @@ dump_execbuffer2(int fd, struct drm_i915_gem_execbuffer2 
*execbuffer2)
  
 batch_index = (execbuffer2->flags & I915_EXEC_BATCH_FIRST) ? 0 :

execbuffer2->buffer_count - 1;
-   batch_bo = get_bo(exec_objects[batch_index].handle);
+   batch_bo = get_bo(fd, exec_objects[batch_index].handle);
 for (uint32_t i = 0; i < execbuffer2->buffer_count; i++) {
obj = _objects[i];
-  bo = get_bo(obj->handle);
+  bo = get_bo(fd, obj->handle);
  
if (obj->relocation_count > 0)

- data = relocate_bo(bo, execbuffer2, obj);
+ data = relocate_bo(fd, bo, execbuffer2, obj);
else
   data = bo->map;
  
@@ -306,11 +308,12 @@ dump_execbuffer2(int fd, struct drm_i915_gem_execbuffer2 *execbuffer2)

  }
  
  static void

-add_new_bo(int handle, uint64_t size, void *map)
+add_new_bo(unsigned fd, int handle, uint64_t size, void *map)
  {
-   struct bo *bo = [handle];
+   struct bo *bo = [handle + fd * MAX_BO_COUNT];
  
 fail_if(handle >= MAX_BO_COUNT, "bo handle out of range\n");

+   fail_if(fd >= MAX_FD_COUNT, "bo fd out of range\n");
 fail_if(size == 0, "bo size is invalid\n");
  
 bo->size = size;

@@ -318,9 +321,9 @@ add_new_bo(int handle, uint64_t size, void *map)
  }
  
  static void

-remove_bo(int handle)
+remove_bo(int fd, int handle)
  {
-   struct bo *bo = get_bo(handle);
+   struct bo *bo = get_bo(fd, handle);
  
 if (bo->map && !IS_USERPTR(bo->map))

munmap(bo->map, bo->size);
@@ -383,7 +386,7 @@ maybe_init(void)
 }
 fclose(config);
  
-   bos = calloc(MAX_BO_COUNT, sizeof(bos[0]));

+   bos = calloc(MAX_FD_COUNT * MAX_BO_COUNT, sizeof(bos[0]));
 fail_if(bos == NULL, "out of memory\n");
  }
  
@@ -455,7 +458,7 @@ ioctl(int fd, unsigned long request, ...)
  
   ret = libc_ioctl(fd, request, argp);

   if (ret == 0)
-add_new_bo(create->handle, create->size, NULL);
+add_new_bo(fd, cr

Re: [Mesa-dev] [PATCH libdrm] intel: sync i915_pciids.h with kernel

2019-02-04 Thread Lionel Landwerlin

On 02/02/2019 08:07, Rodrigo Vivi wrote:

Straight copy from the kernel file.

Add more PCI Device IDs for Coffee Lake, Ice Lake,
and Amber Lake. It also include a reorg on Whiskey Lake IDs.

Align with kernel commits:

5e0f5a58b167 ("drm/i915/cfl: Adding another PCI Device ID.")
03ca3cf8e9aa ("drm/i915/icl: Adding few more device IDs for Ice Lake")
c0c46ca461f1 ("drm/i915/aml: Add new Amber Lake PCI ID")
c1c8f6fa731b ("drm/i915: Redefine some Whiskey Lake SKUs")

Cc: José Roberto de Souza 
Signed-off-by: Rodrigo Vivi 



Acked-by: Lionel Landwerlin 




---
  intel/i915_pciids.h | 29 +
  1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/intel/i915_pciids.h b/intel/i915_pciids.h
index fd965ffb..d2fad7b0 100644
--- a/intel/i915_pciids.h
+++ b/intel/i915_pciids.h
@@ -365,16 +365,20 @@
INTEL_VGA_DEVICE(0x593B, info) /* Halo GT4 */
  
  /* AML/KBL Y GT2 */

-#define INTEL_AML_GT2_IDS(info) \
+#define INTEL_AML_KBL_GT2_IDS(info) \
INTEL_VGA_DEVICE(0x591C, info),  /* ULX GT2 */ \
INTEL_VGA_DEVICE(0x87C0, info) /* ULX GT2 */
  
+/* AML/CFL Y GT2 */

+#define INTEL_AML_CFL_GT2_IDS(info) \
+   INTEL_VGA_DEVICE(0x87CA, info)
+
  #define INTEL_KBL_IDS(info) \
INTEL_KBL_GT1_IDS(info), \
INTEL_KBL_GT2_IDS(info), \
INTEL_KBL_GT3_IDS(info), \
INTEL_KBL_GT4_IDS(info), \
-   INTEL_AML_GT2_IDS(info)
+   INTEL_AML_KBL_GT2_IDS(info)
  
  /* CFL S */

  #define INTEL_CFL_S_GT1_IDS(info) \
@@ -390,6 +394,9 @@
INTEL_VGA_DEVICE(0x3E9A, info)  /* SRV GT2 */
  
  /* CFL H */

+#define INTEL_CFL_H_GT1_IDS(info) \
+   INTEL_VGA_DEVICE(0x3E9C, info)
+
  #define INTEL_CFL_H_GT2_IDS(info) \
INTEL_VGA_DEVICE(0x3E9B, info), /* Halo GT2 */ \
INTEL_VGA_DEVICE(0x3E94, info)  /* Halo GT2 */
@@ -407,27 +414,29 @@
  
  /* WHL/CFL U GT1 */

  #define INTEL_WHL_U_GT1_IDS(info) \
-   INTEL_VGA_DEVICE(0x3EA1, info)
+   INTEL_VGA_DEVICE(0x3EA1, info), \
+   INTEL_VGA_DEVICE(0x3EA4, info)
  
  /* WHL/CFL U GT2 */

  #define INTEL_WHL_U_GT2_IDS(info) \
-   INTEL_VGA_DEVICE(0x3EA0, info)
+   INTEL_VGA_DEVICE(0x3EA0, info), \
+   INTEL_VGA_DEVICE(0x3EA3, info)
  
  /* WHL/CFL U GT3 */

  #define INTEL_WHL_U_GT3_IDS(info) \
-   INTEL_VGA_DEVICE(0x3EA2, info), \
-   INTEL_VGA_DEVICE(0x3EA3, info), \
-   INTEL_VGA_DEVICE(0x3EA4, info)
+   INTEL_VGA_DEVICE(0x3EA2, info)
  
  #define INTEL_CFL_IDS(info)	   \

INTEL_CFL_S_GT1_IDS(info), \
INTEL_CFL_S_GT2_IDS(info), \
+   INTEL_CFL_H_GT1_IDS(info), \
INTEL_CFL_H_GT2_IDS(info), \
INTEL_CFL_U_GT2_IDS(info), \
INTEL_CFL_U_GT3_IDS(info), \
INTEL_WHL_U_GT1_IDS(info), \
INTEL_WHL_U_GT2_IDS(info), \
-   INTEL_WHL_U_GT3_IDS(info)
+   INTEL_WHL_U_GT3_IDS(info), \
+   INTEL_AML_CFL_GT2_IDS(info)
  
  /* CNL */

  #define INTEL_CNL_IDS(info) \
@@ -452,9 +461,13 @@
INTEL_VGA_DEVICE(0x8A51, info), \
INTEL_VGA_DEVICE(0x8A5C, info), \
INTEL_VGA_DEVICE(0x8A5D, info), \
+   INTEL_VGA_DEVICE(0x8A59, info), \
+   INTEL_VGA_DEVICE(0x8A58, info), \
INTEL_VGA_DEVICE(0x8A52, info), \
INTEL_VGA_DEVICE(0x8A5A, info), \
INTEL_VGA_DEVICE(0x8A5B, info), \
+   INTEL_VGA_DEVICE(0x8A57, info), \
+   INTEL_VGA_DEVICE(0x8A56, info), \
INTEL_VGA_DEVICE(0x8A71, info), \
INTEL_VGA_DEVICE(0x8A70, info)
  



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel: Add more PCI Device IDs for Coffee Lake and Ice Lake.

2019-02-04 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

On 02/02/2019 08:07, Rodrigo Vivi wrote:

Align with kernel commits:

5e0f5a58b167 ("drm/i915/cfl: Adding another PCI Device ID.")
03ca3cf8e9aa ("drm/i915/icl: Adding few more device IDs for Ice Lake")

Cc: José Roberto de Souza 
Cc: Kenneth Graunke 
Cc: Anuj Phogat 
Signed-off-by: Rodrigo Vivi 
---
  include/pci_ids/i965_pci_ids.h | 5 +
  1 file changed, 5 insertions(+)

diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
index 7201562d82..b91abd7a3f 100644
--- a/include/pci_ids/i965_pci_ids.h
+++ b/include/pci_ids/i965_pci_ids.h
@@ -171,6 +171,7 @@ CHIPSET(0x3185, glk_2x6, "Intel(R) UHD Graphics 600 (Geminilake 
2x6)")
  CHIPSET(0x3E90, cfl_gt1, "Intel(R) UHD Graphics 610 (Coffeelake 2x6 GT1)")
  CHIPSET(0x3E93, cfl_gt1, "Intel(R) UHD Graphics 610 (Coffeelake 2x6 GT1)")
  CHIPSET(0x3E99, cfl_gt1, "Intel(R) HD Graphics (Coffeelake 2x6 GT1)")
+CHIPSET(0x3E9C, cfl_gt1, "Intel(R) HD Graphics (Coffeelake 2x6 GT1)")
  CHIPSET(0x3E91, cfl_gt2, "Intel(R) UHD Graphics 630 (Coffeelake 3x8 GT2)")
  CHIPSET(0x3E92, cfl_gt2, "Intel(R) UHD Graphics 630 (Coffeelake 3x8 GT2)")
  CHIPSET(0x3E96, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
@@ -203,6 +204,10 @@ CHIPSET(0x5A54, cnl_5x8, "Intel(R) HD Graphics (Cannonlake 5x8 
GT2)")
  CHIPSET(0x8A50, icl_8x8, "Intel(R) HD Graphics (Ice Lake 8x8 GT2)")
  CHIPSET(0x8A51, icl_8x8, "Intel(R) HD Graphics (Ice Lake 8x8 GT2)")
  CHIPSET(0x8A52, icl_8x8, "Intel(R) HD Graphics (Ice Lake 8x8 GT2)")
+CHIPSET(0x8A56, icl_4x8, "Intel(R) HD Graphics (Ice Lake 4x8 GT1)")
+CHIPSET(0x8A57, icl_6x8, "Intel(R) HD Graphics (Ice Lake 6x8 GT1.5)")
+CHIPSET(0x8A58, icl_4x8, "Intel(R) HD Graphics (Ice Lake 4x8 GT1)")
+CHIPSET(0x8A59, icl_6x8, "Intel(R) HD Graphics (Ice Lake 6x8 GT1.5)")
  CHIPSET(0x8A5A, icl_6x8, "Intel(R) HD Graphics (Ice Lake 6x8 GT1.5)")
  CHIPSET(0x8A5B, icl_4x8, "Intel(R) HD Graphics (Ice Lake 4x8 GT1)")
  CHIPSET(0x8A5C, icl_6x8, "Intel(R) HD Graphics (Ice Lake 6x8 GT1.5)")



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir/deref: Drop zero ptr_as_array derefs

2019-02-03 Thread Lionel Landwerlin

On 03/02/2019 16:04, Jason Ekstrand wrote:

They are effectively ()[0] or * which does nothing.



Reviewed-by: Lionel Landwerlin 



---
  src/compiler/nir/nir_deref.c | 21 +
  1 file changed, 21 insertions(+)

diff --git a/src/compiler/nir/nir_deref.c b/src/compiler/nir/nir_deref.c
index 2f5fda643ca..13aa10c7532 100644
--- a/src/compiler/nir/nir_deref.c
+++ b/src/compiler/nir/nir_deref.c
@@ -670,6 +670,27 @@ opt_deref_ptr_as_array(nir_builder *b, nir_deref_instr 
*deref)
 assert(deref->deref_type == nir_deref_type_ptr_as_array);
  
 nir_deref_instr *parent = nir_deref_instr_parent(deref);

+
+   if (nir_src_is_const(deref->arr.index) &&
+   nir_src_as_int(deref->arr.index) == 0) {
+  /* If it's a ptr_as_array deref with an index of 0, it does nothing
+   * and we can just replace its uses with its parent.
+   *
+   * The source of a ptr_as_array deref always has a deref_type of
+   * nir_deref_type_array or nir_deref_type_cast.  If it's a cast, it
+   * may be trivial and we may be able to get rid of that too.  Any
+   * trivial cast of trivial cast cases should be handled already by
+   * opt_deref_cast() above.
+   */
+  if (parent->deref_type == nir_deref_type_cast &&
+  is_trivial_deref_cast(parent))
+ parent = nir_deref_instr_parent(parent);
+  nir_ssa_def_rewrite_uses(>dest.ssa,
+   nir_src_for_ssa(>dest.ssa));
+  nir_instr_remove(>instr);
+  return true;
+   }
+
 if (parent->deref_type != nir_deref_type_array &&
 parent->deref_type != nir_deref_type_ptr_as_array)
return false;



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] anv: update TODO

2019-01-31 Thread Lionel Landwerlin
Signed-off-by: Lionel Landwerlin 
---
 src/intel/vulkan/TODO | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/src/intel/vulkan/TODO b/src/intel/vulkan/TODO
index b4da05de2b1..5b27cb2bd63 100644
--- a/src/intel/vulkan/TODO
+++ b/src/intel/vulkan/TODO
@@ -7,7 +7,4 @@ Missing Features:
 
 Performance:
  - Multi-{sampled/gen8,LOD} HiZ
- - MSAA fast clears
- - Pushing pieces of UBOs?
  - Enable guardband clipping
- - Use soft-pin to avoid relocations
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] vulkan: make generated enum to strings helpers available from c++

2019-01-22 Thread Lionel Landwerlin

Thanks Caio, pushed to master.

On 22/01/2019 18:13, Caio Marcelo de Oliveira Filho wrote:

Reviewed-by: Caio Marcelo de Oliveira Filho



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] MR: WIP: vulkan overlay layer

2019-01-22 Thread Lionel Landwerlin

https://gitlab.freedesktop.org/mesa/mesa/merge_requests/139

The start of a Vulkan layer to display some basic swapchain/draws/submit 
information.

Looks like this : https://i.imgur.com/4zyIiVb.png

There is probably plenty of improvements that can be made to get closer 
to the gallium HUD.


-Lionel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] vulkan: make generated enum to strings helpers available from c++

2019-01-22 Thread Lionel Landwerlin
Signed-off-by: Lionel Landwerlin 
---
 src/vulkan/util/gen_enum_to_str.py | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/vulkan/util/gen_enum_to_str.py 
b/src/vulkan/util/gen_enum_to_str.py
index fb9ecd65c6d..06f74eb487c 100644
--- a/src/vulkan/util/gen_enum_to_str.py
+++ b/src/vulkan/util/gen_enum_to_str.py
@@ -101,6 +101,10 @@ H_TEMPLATE = Template(textwrap.dedent(u"""\
 #include 
 #include 
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 % for ext in extensions:
 #define _${ext.name}_number (${ext.number})
 % endfor
@@ -109,6 +113,10 @@ H_TEMPLATE = Template(textwrap.dedent(u"""\
 const char * vk_${enum.name[2:]}_to_str(${enum.name} input);
 % endfor
 
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
 #endif"""),
 output_encoding='utf-8')
 
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv: Only parse pImmutableSamplers if the descriptor has samplers

2019-01-21 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

On 19/01/2019 19:04, Jason Ekstrand wrote:

---
  src/intel/vulkan/anv_descriptor_set.c | 43 +++
  1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/src/intel/vulkan/anv_descriptor_set.c 
b/src/intel/vulkan/anv_descriptor_set.c
index a308fbf8540..a4e466cf3dd 100644
--- a/src/intel/vulkan/anv_descriptor_set.c
+++ b/src/intel/vulkan/anv_descriptor_set.c
@@ -94,7 +94,22 @@ VkResult anv_CreateDescriptorSetLayout(
 uint32_t immutable_sampler_count = 0;
 for (uint32_t j = 0; j < pCreateInfo->bindingCount; j++) {
max_binding = MAX2(max_binding, pCreateInfo->pBindings[j].binding);
-  if (pCreateInfo->pBindings[j].pImmutableSamplers)
+
+  /* From the Vulkan 1.1.97 spec for VkDescriptorSetLayoutBinding:
+   *
+   *"If descriptorType specifies a VK_DESCRIPTOR_TYPE_SAMPLER or
+   *VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER type descriptor, then
+   *pImmutableSamplers can be used to initialize a set of immutable
+   *samplers. [...]  If descriptorType is not one of these descriptor
+   *types, then pImmutableSamplers is ignored.
+   *
+   * We need to be careful here and only parse pImmutableSamplers if we
+   * have one of the right descriptor types.
+   */
+  VkDescriptorType desc_type = pCreateInfo->pBindings[j].descriptorType;
+  if ((desc_type == VK_DESCRIPTOR_TYPE_SAMPLER ||
+   desc_type == VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER) &&
+  pCreateInfo->pBindings[j].pImmutableSamplers)
   immutable_sampler_count += pCreateInfo->pBindings[j].descriptorCount;
 }
  
@@ -153,6 +168,12 @@ VkResult anv_CreateDescriptorSetLayout(

if (binding == NULL)
   continue;
  
+  /* We temporarily stashed the pointer to the binding in the

+   * immutable_samplers pointer.  Now that we've pulled it back out
+   * again, we reset immutable_samplers to NULL.
+   */
+  set_layout->binding[b].immutable_samplers = NULL;
+
if (binding->descriptorCount == 0)
   continue;
  
@@ -170,6 +191,15 @@ VkResult anv_CreateDescriptorSetLayout(

  set_layout->binding[b].stage[s].sampler_index = sampler_count[s];
  sampler_count[s] += binding->descriptorCount;
   }
+
+ if (binding->pImmutableSamplers) {
+set_layout->binding[b].immutable_samplers = samplers;
+samplers += binding->descriptorCount;
+
+for (uint32_t i = 0; i < binding->descriptorCount; i++)
+   set_layout->binding[b].immutable_samplers[i] =
+  anv_sampler_from_handle(binding->pImmutableSamplers[i]);
+ }
   break;
default:
   break;
@@ -221,17 +251,6 @@ VkResult anv_CreateDescriptorSetLayout(
   break;
}
  
-  if (binding->pImmutableSamplers) {

- set_layout->binding[b].immutable_samplers = samplers;
- samplers += binding->descriptorCount;
-
- for (uint32_t i = 0; i < binding->descriptorCount; i++)
-set_layout->binding[b].immutable_samplers[i] =
-   anv_sampler_from_handle(binding->pImmutableSamplers[i]);
-  } else {
- set_layout->binding[b].immutable_samplers = NULL;
-  }
-
set_layout->shader_stages |= binding->stageFlags;
 }
  



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/genxml: add missing MI_PREDICATE compare operations

2019-01-19 Thread Lionel Landwerlin

Thanks, pushed to master.

On 18/01/2019 18:09, Rafael Antognolli wrote:

Reviewed-by: Rafael Antognolli

On Fri, Jan 18, 2019 at 05:01:58PM +, Lionel Landwerlin wrote:

Doesn't save us a great deal of lines but at least they get decoded in
aubinators.

Signed-off-by: Lionel Landwerlin
---
  src/intel/genxml/gen10.xml | 2 ++
  src/intel/genxml/gen11.xml | 2 ++
  src/intel/genxml/gen7.xml  | 2 ++
  src/intel/genxml/gen75.xml | 2 ++
  src/intel/genxml/gen8.xml  | 2 ++
  src/intel/genxml/gen9.xml  | 2 ++
  src/intel/vulkan/genX_cmd_buffer.c | 1 -
  7 files changed, 12 insertions(+), 1 deletion(-)



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] intel/genxml: add missing MI_PREDICATE compare operations

2019-01-18 Thread Lionel Landwerlin
Doesn't save us a great deal of lines but at least they get decoded in
aubinators.

Signed-off-by: Lionel Landwerlin 
---
 src/intel/genxml/gen10.xml | 2 ++
 src/intel/genxml/gen11.xml | 2 ++
 src/intel/genxml/gen7.xml  | 2 ++
 src/intel/genxml/gen75.xml | 2 ++
 src/intel/genxml/gen8.xml  | 2 ++
 src/intel/genxml/gen9.xml  | 2 ++
 src/intel/vulkan/genX_cmd_buffer.c | 1 -
 7 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/intel/genxml/gen10.xml b/src/intel/genxml/gen10.xml
index 9ec311d6cc5..7043ab8995d 100644
--- a/src/intel/genxml/gen10.xml
+++ b/src/intel/genxml/gen10.xml
@@ -3047,6 +3047,8 @@
   
 
 
+  
+  
   
   
 
diff --git a/src/intel/genxml/gen11.xml b/src/intel/genxml/gen11.xml
index 6ab1f965650..3af80a6ed3d 100644
--- a/src/intel/genxml/gen11.xml
+++ b/src/intel/genxml/gen11.xml
@@ -3042,6 +3042,8 @@
   
 
 
+  
+  
   
   
 
diff --git a/src/intel/genxml/gen7.xml b/src/intel/genxml/gen7.xml
index 893c12b8af9..3c445757300 100644
--- a/src/intel/genxml/gen7.xml
+++ b/src/intel/genxml/gen7.xml
@@ -2051,6 +2051,8 @@
   
 
 
+  
+  
   
   
 
diff --git a/src/intel/genxml/gen75.xml b/src/intel/genxml/gen75.xml
index 009a123ad69..3df7dc29939 100644
--- a/src/intel/genxml/gen75.xml
+++ b/src/intel/genxml/gen75.xml
@@ -2462,6 +2462,8 @@
   
 
 
+  
+  
   
   
 
diff --git a/src/intel/genxml/gen8.xml b/src/intel/genxml/gen8.xml
index fd19b0c8b33..4d1488dae62 100644
--- a/src/intel/genxml/gen8.xml
+++ b/src/intel/genxml/gen8.xml
@@ -2690,6 +2690,8 @@
   
 
 
+  
+  
   
   
 
diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml
index 706d398babb..3f02e866d0c 100644
--- a/src/intel/genxml/gen9.xml
+++ b/src/intel/genxml/gen9.xml
@@ -2973,6 +2973,8 @@
   
 
 
+  
+  
   
   
 
diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 6fb19661ebb..544e2929990 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -3310,7 +3310,6 @@ void genX(CmdDispatchIndirect)(
}
 
/* predicate = !predicate; */
-#define COMPARE_FALSE   1
anv_batch_emit(batch, GENX(MI_PREDICATE), mip) {
   mip.LoadOperation= LOAD_LOADINV;
   mip.CombineOperation = COMBINE_OR;
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv: Re-sort the extensions list

2019-01-18 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

On 18/01/2019 16:24, Jason Ekstrand wrote:

I like to keep things in good order so that you can find them.
---
  src/intel/vulkan/anv_extensions.py | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/intel/vulkan/anv_extensions.py 
b/src/intel/vulkan/anv_extensions.py
index 2ea4cab0e97..2d212361955 100644
--- a/src/intel/vulkan/anv_extensions.py
+++ b/src/intel/vulkan/anv_extensions.py
@@ -71,8 +71,8 @@ MAX_API_VERSION = None # Computed later
  EXTENSIONS = [
  Extension('VK_ANDROID_external_memory_android_hardware_buffer', 3, 
'ANDROID'),
  Extension('VK_ANDROID_native_buffer', 5, 'ANDROID'),
-Extension('VK_KHR_16bit_storage', 1, 'device->info.gen 
>= 8'),
  Extension('VK_KHR_8bit_storage',  1, 'device->info.gen 
>= 8'),
+Extension('VK_KHR_16bit_storage', 1, 'device->info.gen 
>= 8'),
  Extension('VK_KHR_bind_memory2',  1, True),
  Extension('VK_KHR_create_renderpass2',1, True),
  Extension('VK_KHR_dedicated_allocation',  1, True),
@@ -80,6 +80,7 @@ EXTENSIONS = [
  Extension('VK_KHR_descriptor_update_template',1, True),
  Extension('VK_KHR_device_group',  1, True),
  Extension('VK_KHR_device_group_creation', 1, True),
+Extension('VK_KHR_display',  23, 
'VK_USE_PLATFORM_DISPLAY_KHR'),
  Extension('VK_KHR_driver_properties', 1, True),
  Extension('VK_KHR_external_fence',1,
'device->has_syncobj_wait'),
@@ -101,6 +102,7 @@ EXTENSIONS = [
  Extension('VK_KHR_maintenance1',  1, True),
  Extension('VK_KHR_maintenance2',  1, True),
  Extension('VK_KHR_maintenance3',  1, True),
+Extension('VK_KHR_multiview', 1, True),
  Extension('VK_KHR_push_descriptor',   1, True),
  Extension('VK_KHR_relaxed_block_layout',  1, True),
  Extension('VK_KHR_sampler_mirror_clamp_to_edge',  1, True),
@@ -113,9 +115,8 @@ EXTENSIONS = [
  Extension('VK_KHR_wayland_surface',   6, 
'VK_USE_PLATFORM_WAYLAND_KHR'),
  Extension('VK_KHR_xcb_surface',   6, 
'VK_USE_PLATFORM_XCB_KHR'),
  Extension('VK_KHR_xlib_surface',  6, 
'VK_USE_PLATFORM_XLIB_KHR'),
-Extension('VK_KHR_multiview', 1, True),
-Extension('VK_KHR_display',  23, 
'VK_USE_PLATFORM_DISPLAY_KHR'),
  Extension('VK_EXT_acquire_xlib_display',  1, 
'VK_USE_PLATFORM_XLIB_XRANDR_EXT'),
+Extension('VK_EXT_calibrated_timestamps', 1, True),
  Extension('VK_EXT_debug_report',  8, True),
  Extension('VK_EXT_direct_mode_display',   1, 
'VK_USE_PLATFORM_DISPLAY_KHR'),
  Extension('VK_EXT_display_control',   1, 
'VK_USE_PLATFORM_DISPLAY_KHR'),
@@ -124,13 +125,12 @@ EXTENSIONS = [
  Extension('VK_EXT_global_priority',   1,
'device->has_context_priority'),
  Extension('VK_EXT_pci_bus_info',  2, True),
+Extension('VK_EXT_post_depth_coverage',   1, 'device->info.gen 
>= 9'),
+Extension('VK_EXT_sampler_filter_minmax', 1, 'device->info.gen 
>= 9'),
  Extension('VK_EXT_scalar_block_layout',   1, True),
  Extension('VK_EXT_shader_viewport_index_layer',   1, True),
  Extension('VK_EXT_shader_stencil_export', 1, 'device->info.gen 
>= 9'),
  Extension('VK_EXT_vertex_attribute_divisor',  3, True),
-Extension('VK_EXT_post_depth_coverage',   1, 'device->info.gen 
>= 9'),
-Extension('VK_EXT_sampler_filter_minmax', 1, 'device->info.gen 
>= 9'),
-Extension('VK_EXT_calibrated_timestamps', 1, True),
  Extension('VK_GOOGLE_decorate_string',1, True),
  Extension('VK_GOOGLE_hlsl_functionality1',1, True),
  ]



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] MR: Anv: document & improve pipeline flushes/invalidates

2019-01-18 Thread Lionel Landwerlin

https://gitlab.freedesktop.org/mesa/mesa/merge_requests/132

2 change in this MR :

    * add some documentation to clarify how we choose pipeline flushes 
invalidations
    * narrow the CS stall & RT flushes for the query copies to track 
only operations that write a destination buffer


For the second change we have crucible test bug.108909 to verify that 
this is still correct.


-
Lionel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4] anv/device: fix maximum number of images supported

2019-01-17 Thread Lionel Landwerlin
Looking at the change the binding table emission, I think the image++ 
has been moved such that it doesn't produce the same tables anymore.
Trying this change on CI : 
https://github.com/djdeath/mesa/commit/a6b8eaf1325389d94d1d8a5b3bb952a362125eb2


On 17/01/2019 18:19, Clayton Craft wrote:

Quoting Mark Janes (2019-01-17 10:13:37)

Hi Iago,

It looks like you tested this patch in CI and got the same failures that
we are seeing on master:

http://mesa-ci-results.jf.intel.com/itoral/builds/263/group/63a9f0ea7bb98050796b649e85481845

The correct link is:
https://mesa-ci.01.org/itoral/builds/263/group/63a9f0ea7bb98050796b649e85481845


Why was this patch pushed?

-Mark

Mark Janes  writes:


This patch regresses thousands of tests on BDW and HSW:

http://mesa-ci-results.jf.intel.com/vulkancts/builds/10035/group/63a9f0ea7bb98050796b649e85481845#fails

https://mesa-ci.01.org/vulkancts/builds/10035/group/63a9f0ea7bb98050796b649e85481845#fails


I'll revert it as soon as my testing completes.

Iago Toral Quiroga  writes:


We had defined MAX_IMAGES as 8, which we used to size the array for
image push constant data. The comment there stated that this was for
gen8, but anv_nir_apply_pipeline_layout runs for all gens and writes
that array, asserting that we don't exceed that number of images,
which imposes a limit of MAX_IMAGES on all gens.

Furthermore, despite this, we are exposing up to 64 images per shader
stage on all gens, gen8 included.

This patch lowers the number of images we expose in gen8 to 8 and
keeps 64 images for gen9+ while making sure that only pre-SKL gens
use push constant space to handle images.

v2:
  - <= instead of < in the assert (Eric, Lionel)
  - Change the way the assertion is written (Eric)

v3:
  - Revert the way the assertion is written to the form it had in v1,
the version in v2 was not equivalent and was incorrect. (Lionel)

v4:
  - gen9+ doesn't need push constants for images at all (Jason)
---
  src/intel/vulkan/anv_device.c |  7 --
  .../vulkan/anv_nir_apply_pipeline_layout.c|  4 +--
  src/intel/vulkan/anv_private.h|  5 ++--
  src/intel/vulkan/genX_cmd_buffer.c| 25 +--
  4 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 523f1483e29..f85458b672e 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -987,9 +987,12 @@ void anv_GetPhysicalDeviceProperties(
 const uint32_t max_samplers = (devinfo->gen >= 8 || devinfo->is_haswell) ?
   128 : 16;
  
+   const uint32_t max_images = devinfo->gen < 9 ? MAX_GEN8_IMAGES : MAX_IMAGES;

+
 VkSampleCountFlags sample_counts =
isl_device_get_sample_counts(>isl_dev);
  
+

 VkPhysicalDeviceLimits limits = {
.maxImageDimension1D  = (1 << 14),
.maxImageDimension2D  = (1 << 14),
@@ -1009,7 +1012,7 @@ void anv_GetPhysicalDeviceProperties(
.maxPerStageDescriptorUniformBuffers  = 64,
.maxPerStageDescriptorStorageBuffers  = 64,
.maxPerStageDescriptorSampledImages   = max_samplers,
-  .maxPerStageDescriptorStorageImages   = 64,
+  .maxPerStageDescriptorStorageImages   = max_images,
.maxPerStageDescriptorInputAttachments= 64,
.maxPerStageResources = 250,
.maxDescriptorSetSamplers = 6 * max_samplers, /* number 
of stages * maxPerStageDescriptorSamplers */
@@ -1018,7 +1021,7 @@ void anv_GetPhysicalDeviceProperties(
.maxDescriptorSetStorageBuffers   = 6 * 64,   /* number 
of stages * maxPerStageDescriptorStorageBuffers */
.maxDescriptorSetStorageBuffersDynamic= MAX_DYNAMIC_BUFFERS / 2,
.maxDescriptorSetSampledImages= 6 * max_samplers, /* number 
of stages * maxPerStageDescriptorSampledImages */
-  .maxDescriptorSetStorageImages= 6 * 64,   /* number 
of stages * maxPerStageDescriptorStorageImages */
+  .maxDescriptorSetStorageImages= 6 * max_images,   /* number 
of stages * maxPerStageDescriptorStorageImages */
.maxDescriptorSetInputAttachments = 256,
.maxVertexInputAttributes = MAX_VBS,
.maxVertexInputBindings   = MAX_VBS,
diff --git a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c 
b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
index b3daf702bc0..623984b0f8c 100644
--- a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
+++ b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
@@ -528,8 +528,8 @@ anv_nir_apply_pipeline_layout(const struct 
anv_physical_device *pdevice,
}
 }
  
-   if (map->image_count > 0) {

-  assert(map->image_count <= MAX_IMAGES);
+   if (map->image_count > 0 && pdevice->compiler->devinfo->gen < 9) {
+  assert(map->image_count <= MAX_GEN8_IMAGES);

Re: [Mesa-dev] [PATCH 2/2] i965/compute: Emit GPGPU_WALKER in genX_state_upload

2019-01-17 Thread Lionel Landwerlin

Put a few nits below, but otherwise looks good.

-
Lionel

On 13/11/2018 00:05, Jordan Justen wrote:

Signed-off-by: Jordan Justen 
---
  src/mesa/drivers/dri/i965/brw_compute.c   | 131 +-
  src/mesa/drivers/dri/i965/brw_context.h   |   2 +
  src/mesa/drivers/dri/i965/genX_state_upload.c | 102 ++
  3 files changed, 105 insertions(+), 130 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_compute.c 
b/src/mesa/drivers/dri/i965/brw_compute.c
index de08fc3ac16..6e4f5b593aa 100644
--- a/src/mesa/drivers/dri/i965/brw_compute.c
+++ b/src/mesa/drivers/dri/i965/brw_compute.c
@@ -34,135 +34,6 @@
  #include "brw_defines.h"
  
  
-static void

-prepare_indirect_gpgpu_walker(struct brw_context *brw)
-{
-   const struct gen_device_info *devinfo = >screen->devinfo;
-   GLintptr indirect_offset = brw->compute.num_work_groups_offset;
-   struct brw_bo *bo = brw->compute.num_work_groups_bo;
-
-   brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMX, bo, indirect_offset + 
0);
-   brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMY, bo, indirect_offset + 
4);
-   brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMZ, bo, indirect_offset + 
8);
-
-   if (devinfo->gen > 7)
-  return;
-
-   /* Clear upper 32-bits of SRC0 and all 64-bits of SRC1 */
-   BEGIN_BATCH(7);
-   OUT_BATCH(MI_LOAD_REGISTER_IMM | (7 - 2));
-   OUT_BATCH(MI_PREDICATE_SRC0 + 4);
-   OUT_BATCH(0u);
-   OUT_BATCH(MI_PREDICATE_SRC1 + 0);
-   OUT_BATCH(0u);
-   OUT_BATCH(MI_PREDICATE_SRC1 + 4);
-   OUT_BATCH(0u);
-   ADVANCE_BATCH();
-
-   /* Load compute_dispatch_indirect_x_size into SRC0 */
-   brw_load_register_mem(brw, MI_PREDICATE_SRC0, bo, indirect_offset + 0);
-
-   /* predicate = (compute_dispatch_indirect_x_size == 0); */
-   BEGIN_BATCH(1);
-   OUT_BATCH(GEN7_MI_PREDICATE |
- MI_PREDICATE_LOADOP_LOAD |
- MI_PREDICATE_COMBINEOP_SET |
- MI_PREDICATE_COMPAREOP_SRCS_EQUAL);
-   ADVANCE_BATCH();
-
-   /* Load compute_dispatch_indirect_y_size into SRC0 */
-   brw_load_register_mem(brw, MI_PREDICATE_SRC0, bo, indirect_offset + 4);
-
-   /* predicate |= (compute_dispatch_indirect_y_size == 0); */
-   BEGIN_BATCH(1);
-   OUT_BATCH(GEN7_MI_PREDICATE |
- MI_PREDICATE_LOADOP_LOAD |
- MI_PREDICATE_COMBINEOP_OR |
- MI_PREDICATE_COMPAREOP_SRCS_EQUAL);
-   ADVANCE_BATCH();
-
-   /* Load compute_dispatch_indirect_z_size into SRC0 */
-   brw_load_register_mem(brw, MI_PREDICATE_SRC0, bo, indirect_offset + 8);
-
-   /* predicate |= (compute_dispatch_indirect_z_size == 0); */
-   BEGIN_BATCH(1);
-   OUT_BATCH(GEN7_MI_PREDICATE |
- MI_PREDICATE_LOADOP_LOAD |
- MI_PREDICATE_COMBINEOP_OR |
- MI_PREDICATE_COMPAREOP_SRCS_EQUAL);
-   ADVANCE_BATCH();
-
-   /* predicate = !predicate; */
-   BEGIN_BATCH(1);
-   OUT_BATCH(GEN7_MI_PREDICATE |
- MI_PREDICATE_LOADOP_LOADINV |
- MI_PREDICATE_COMBINEOP_OR |
- MI_PREDICATE_COMPAREOP_FALSE);
-   ADVANCE_BATCH();
-}
-
-static void
-brw_emit_gpgpu_walker(struct brw_context *brw)
-{
-   const struct gen_device_info *devinfo = >screen->devinfo;
-   const struct brw_cs_prog_data *prog_data =
-  brw_cs_prog_data(brw->cs.base.prog_data);
-
-   const GLuint *num_groups = brw->compute.num_work_groups;
-   uint32_t indirect_flag;
-
-   if (brw->compute.num_work_groups_bo == NULL) {
-  indirect_flag = 0;
-   } else {
-  indirect_flag =
- GEN7_GPGPU_INDIRECT_PARAMETER_ENABLE |
- (devinfo->gen == 7 ? GEN7_GPGPU_PREDICATE_ENABLE : 0);
-  prepare_indirect_gpgpu_walker(brw);
-   }
-
-   const unsigned simd_size = prog_data->simd_size;
-   unsigned group_size = prog_data->local_size[0] *
-  prog_data->local_size[1] * prog_data->local_size[2];
-   unsigned thread_width_max =
-  (group_size + simd_size - 1) / simd_size;
-
-   uint32_t right_mask = 0xu >> (32 - simd_size);
-   const unsigned right_non_aligned = group_size & (simd_size - 1);
-   if (right_non_aligned != 0)
-  right_mask >>= (simd_size - right_non_aligned);
-
-   uint32_t dwords = devinfo->gen < 8 ? 11 : 15;
-   BEGIN_BATCH(dwords);
-   OUT_BATCH(GPGPU_WALKER << 16 | (dwords - 2) | indirect_flag);
-   OUT_BATCH(0);
-   if (devinfo->gen >= 8) {
-  OUT_BATCH(0); /* Indirect Data Length */
-  OUT_BATCH(0); /* Indirect Data Start Address */
-   }
-   assert(thread_width_max <= brw->screen->devinfo.max_cs_threads);
-   OUT_BATCH(SET_FIELD(simd_size / 16, GPGPU_WALKER_SIMD_SIZE) |
- SET_FIELD(thread_width_max - 1, GPGPU_WALKER_THREAD_WIDTH_MAX));
-   OUT_BATCH(0);/* Thread Group ID Starting X */
-   if (devinfo->gen >= 8)
-  OUT_BATCH(0); /* MBZ */
-   OUT_BATCH(num_groups[0]);/* Thread Group ID X Dimension */
-   OUT_BATCH(0);/* Thread Group ID Starting Y */
-   if (devinfo->gen >= 8)
-  OUT_BATCH(0); 

Re: [Mesa-dev] [PATCH 1/2] i965/genX_state: Add register access functions

2019-01-17 Thread Lionel Landwerlin

Sorry, for replying so late, going through my unread emails :(

We already have functions for doing this :

brw_load_register_reg
brw_load_register_imm32/64
brw_load_register_mem

Why not use those?

-
Lionel

On 13/11/2018 00:05, Jordan Justen wrote:

Signed-off-by: Jordan Justen 
---
  src/mesa/drivers/dri/i965/genX_state_upload.c | 31 +++
  1 file changed, 31 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c 
b/src/mesa/drivers/dri/i965/genX_state_upload.c
index 5acd0922922..6495862e700 100644
--- a/src/mesa/drivers/dri/i965/genX_state_upload.c
+++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
@@ -197,6 +197,37 @@ KSP(UNUSED struct brw_context *brw, uint32_t offset)
  _brw_cmd_pack(cmd)(brw, (void *)_dst, ),  \
  _dst = NULL)
  
+#if GEN_GEN >= 7

+static void
+emit_lrm(struct brw_context *brw, uint32_t reg, struct brw_address addr)
+{
+   brw_batch_emit(brw, GENX(MI_LOAD_REGISTER_MEM), lrm) {
+  lrm.RegisterAddress  = reg;
+  lrm.MemoryAddress= addr;
+   }
+}
+#endif
+
+MAYBE_UNUSED static void
+emit_lri(struct brw_context *brw, uint32_t reg, uint32_t imm)
+{
+   brw_batch_emit(brw, GENX(MI_LOAD_REGISTER_IMM), lri) {
+  lri.RegisterOffset   = reg;
+  lri.DataDWord= imm;
+   }
+}
+
+#if GEN_IS_HASWELL || GEN_GEN >= 8
+MAYBE_UNUSED static void
+emit_lrr(struct brw_context *brw, uint32_t dst, uint32_t src)
+{
+   brw_batch_emit(brw, GENX(MI_LOAD_REGISTER_REG), lrr) {
+  lrr.SourceRegisterAddress= src;
+  lrr.DestinationRegisterAddress   = dst;
+   }
+}
+#endif
+
  /**
   * Polygon stipple packet
   */



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-16 Thread Lionel Landwerlin

On 16/01/2019 14:01, Daniel Stone wrote:

Hi,

On Wed, 16 Jan 2019 at 13:01, Lionel Landwerlin
 wrote:

- It seems we only get notifications when adding to an MR, I could like to 
subscribe to particular tags

If you go to https://gitlab.freedesktop.org/mesa/mesa/labels/ then you
can subscribe to things per-label. That applies to both issues and
MRs, but might help.

Cheers,
Daniel


Duh, I think I was pointed to this at some point...

Awesome, thanks for the reminder!


-

Lionel

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-16 Thread Lionel Landwerlin
- I'm pretty happy with the discussion on a particular point/location of 
a change.

  A lot more readable than a long chain of email.

- Having issues with the comments not always showing up on a particular 
commit of an MR (but it looks like gitlab is aware of that issue)


- The Rb/Ab tags were nice to have with patchwork but did not always 
work and also I tended to add them manually.
  So not really a problem, it's just so tempting to press the merge 
button without adding them first :)


- It seems we only get notifications when adding to an MR, I could like 
to subscribe to particular tags


Overall pretty happy :)

-
Lionel

On 11/01/2019 17:23, Danylo Piliaiev wrote:

My small thoughts/questions:

- First of all discussions are really much more convenient.
- Several (mine) merge requests were "Closed" and merged (not just 
merged, they are under "Closed" category), am I missing something?
- Is there a way to grant rights to creator of merge request to 
add/change tags if he doesn't have "Developer" role?

- Maybe adding more tags/more granular tags would be a good idea.
- Could Intel CI be integrated in some way with gitlab?

Overall as someone who didn't interact with mailing lists workflow 
much Gitlab is definitely a win.


On 1/11/19 6:57 PM, Jason Ekstrand wrote:

All,

The mesa project has now hit 100 merge requests (36 are still open).  I
(and I'm sure others) would be curious to hear people's initial thoughts on
the process.  What's working well?  What's not working?  Is it total fail
and should we go back to mailing lists?

--Jason


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/4] intel/sanitize_gpu: add help/gdb options to wrapper

2019-01-16 Thread Lionel Landwerlin

On 16/01/2019 01:31, Matt Turner wrote:

On Mon, Oct 29, 2018 at 11:16 AM Lionel Landwerlin
 wrote:

Signed-off-by: Lionel Landwerlin 
---
  src/intel/tools/intel_sanitize_gpu.in | 55 ++-
  1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/src/intel/tools/intel_sanitize_gpu.in 
b/src/intel/tools/intel_sanitize_gpu.in
index 3dac954c408..7e4c96d8738 100755
--- a/src/intel/tools/intel_sanitize_gpu.in
+++ b/src/intel/tools/intel_sanitize_gpu.in
@@ -1,4 +1,57 @@
  #!/bin/bash
  # -*- mode: sh -*-

-LD_PRELOAD="@install_libexecdir@/libintel_sanitize_gpu.so${LD_PRELOAD:+:$LD_PRELOAD}" 
exec "$@"
+function show_help() {
+cat <
No idea why this patch never landed, but

s/intel_aubdump/intel_sanitize_gpu/

(I just came across it when trying to figure out whether we ever moved
intel_aubdump from igt into Mesa.


I think it did land as (with your correction!) :

commit c5fca35af1694bd515883ade4b4ab723fe7fcad0
Author: Lionel Landwerlin 
Date:   Mon Oct 29 18:14:45 2018 +

    intel/sanitize_gpu: add help/gdb options to wrapper

    Signed-off-by: Lionel Landwerlin 
    Reviewed-by: Tapani Pälli 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir/lower_tex: Fix the channel ordering during conversion of AYUV images

2019-01-14 Thread Lionel Landwerlin
When writing this I used this page to figure the bytes' ordering : 
https://docs.microsoft.com/en-us/windows/desktop/medfound/recommended-8-bit-yuv-formats-for-video-rendering#ayuv

Of course endianess confuses everything :(

sunxi seems to support AYUV & VUYA : 
https://github.com/allwinner-zh/linux-3.4-sunxi/blob/master/include/video/sunxi_display2.h#L40


Finally this patch (and its gstreamer comments) confuses me even more : 
https://patchwork.freedesktop.org/patch/255529/


I really don't know what's right or wrong here...

-
Lionel

On 15/01/2019 00:49, Vivek Kasireddy wrote:

From: "Kasireddy, Vivek" 

The channel ordering should be 1230 instead of 2103.

While displaying the packed YUV buffers generated by the Vivid (Virtual
Video) driver on Weston, it was observed that AYUV images were not
displayed correctly. Changing the ordering to 1230 makes AYUV buffers
display as expected.

CC: Lionel Landwerlin 
CC: Tapani Palli 
Signed-off-by: Vivek Kasireddy 
---
  src/compiler/nir/nir_lower_tex.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/compiler/nir/nir_lower_tex.c b/src/compiler/nir/nir_lower_tex.c
index a618b86b34c..7058c54f17c 100644
--- a/src/compiler/nir/nir_lower_tex.c
+++ b/src/compiler/nir/nir_lower_tex.c
@@ -434,10 +434,10 @@ lower_ayuv_external(nir_builder *b, nir_tex_instr *tex)
nir_ssa_def *ayuv = sample_plane(b, tex, 0);
  
convert_yuv_to_rgb(b, tex,

- nir_channel(b, ayuv, 2),
   nir_channel(b, ayuv, 1),
- nir_channel(b, ayuv, 0),
- nir_channel(b, ayuv, 3));
+ nir_channel(b, ayuv, 2),
+ nir_channel(b, ayuv, 3),
+ nir_channel(b, ayuv, 0));
  }
  
  /*



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] autotools: Deprecate the use of autotools

2019-01-13 Thread Lionel Landwerlin

On 12/01/2019 18:24, Ilia Mirkin wrote:

On Sat, Jan 12, 2019 at 9:40 AM Gert Wollny  wrote:

I will not push it with
the strong NAK you gave, Ilia. To me consensus means that all who
contribute significantly to the project (like you certainly do) agree
or abstain, but don't object.

A single actor should not prevent a group from adopting something he
doesn't like. Given that I had all the (in my consideration) major
issues that I did, I assumed that some others had run into them as
well and were equally (or at least partially) concerned. It seems like
that theory has not been borne out -- while I did get some private
feedback about this, no one seems to be publicly willing expend
political capital on this issue.

Perhaps my workflow is different, or perhaps my tolerance for BS is
lower. Probably some combination... I build for 3 different
cross-environments in addition to the native build (arm, arm64,
ppc64), and even x86_32 on the rare occasion. It all works out OK, and
I don't have to spend a ton of time remembering how each one works in
the intervening 6-12 months between each occasion that I spend time on
a particular platform. I suspect the majority of people just build one
thing and it's no trouble to remember how it works since you're doing
it every day.



Just throwing my 2 cents on cross compiling stuff :

For gputop (https://github.com/rib/gputop) I'm cross compiling C/C++ to 
javascript using emscripten.


Apart from figuring out the couple of entries in the cross compile meson 
files, I didn't run into any issue.


So hopefully it won't be more 20mn for you.


-Lionel



A switch to meson will require a re-investment of time into figuring
out how to make those various cases work properly, and all the issues
I ran into with meson (no-longer-working build dirs, inability to
retrieve configuration args, etc) conspire to require that
reinvestment to happen each time rather than once-per-lifetime. But
perhaps these will be fixed by then? Let's hope.

Cheers,

   -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] spirv: Emit switch conditions on-the-fly

2019-01-12 Thread Lionel Landwerlin

On 12/01/2019 15:29, Jason Ekstrand wrote:
On January 12, 2019 03:06:07 Lionel Landwerlin 
 wrote:



On 12/01/2019 03:45, Jason Ekstrand wrote:

Instead of emitting all of the conditions for the cases of a switch
statement up-front, emit them on-the-fly as we emit the code for each
case.  The original justification for this was that we were going to
have to build a default case anyway which would need them all.  
However,

we can just trust CSE to clean up the mess in that case. Emitting each
condition right before the if statement that uses it reduces register
pressure and, in one customer benchmark, gets rid of spilling and
improves performance by about 2x.
---
src/compiler/spirv/vtn_cfg.c | 62 +++-
1 file changed, 26 insertions(+), 36 deletions(-)

diff --git a/src/compiler/spirv/vtn_cfg.c 
b/src/compiler/spirv/vtn_cfg.c

index aef1b7e18fb..34a910accc6 100644
--- a/src/compiler/spirv/vtn_cfg.c
+++ b/src/compiler/spirv/vtn_cfg.c
@@ -852,6 +852,30 @@ vtn_emit_branch(struct vtn_builder *b, enum 
vtn_branch_type branch_type,

}
}

+static nir_ssa_def *
+vtn_switch_case_condition(struct vtn_builder *b, struct vtn_switch 
*swtch,

+  nir_ssa_def *sel, struct vtn_case *cse)
+{
+   if (cse->is_default) {
+  nir_ssa_def *any = nir_imm_false(>nb);
+  list_for_each_entry(struct vtn_case, other, >cases, 
link) {

+ if (cse->is_default)
+    continue;
+
+ any = nir_ior(>nb, any,
+   vtn_switch_case_condition(b, swtch, sel, 
other));

+  }
+  return any;



return nir_inot(>nb, any); here?


Yup. As you can probably guess, I was lazy and didn't send it through 
Jenkins before sending.  I'll get that fixed.



With that fixed and CI happy:


Reviewed-by: Lionel Landwerlin 








+   } else {
+  nir_ssa_def *cond = nir_imm_false(>nb);
+  util_dynarray_foreach(>values, uint64_t, val) {
+ nir_ssa_def *imm = nir_imm_intN_t(>nb, *val, 
sel->bit_size);

+ cond = nir_ior(>nb, cond, nir_ieq(>nb, sel, imm));
+  }
+  return cond;
+   }
+}
+
static void
vtn_emit_cf_list(struct vtn_builder *b, struct list_head *cf_list,
  nir_variable *switch_fall_var, bool *has_switch_break,
@@ -978,46 +1002,13 @@ vtn_emit_cf_list(struct vtn_builder *b, 
struct list_head *cf_list,

 nir_local_variable_create(b->nb.impl, glsl_bool_type(), "fall");
  nir_store_var(>nb, fall_var, nir_imm_false(>nb), 1);

- /* Next, we gather up all of the conditions.  We have to 
do this
-  * up-front because we also need to build an "any" 
condition so

-  * that we can use !any for default.
-  */
- const int num_cases = list_length(_switch->cases);
- NIR_VLA(nir_ssa_def *, conditions, num_cases);
-
  nir_ssa_def *sel = vtn_ssa_value(b, vtn_switch->selector)->def;
- /* An accumulation of all conditions.  Used for the 
default */

- nir_ssa_def *any = NULL;
-
- int i = 0;
- list_for_each_entry(struct vtn_case, cse, 
_switch->cases, link) {

-    if (cse->is_default) {
-   conditions[i++] = NULL;
-   continue;
-    }
-
-    nir_ssa_def *cond = NULL;
-    util_dynarray_foreach(>values, uint64_t, val) {
-   nir_ssa_def *imm = nir_imm_intN_t(>nb, *val, 
sel->bit_size);

-   nir_ssa_def *is_val = nir_ieq(>nb, sel, imm);
-
-   cond = cond ? nir_ior(>nb, cond, is_val) : is_val;
-    }
-
-    any = any ? nir_ior(>nb, any, cond) : cond;
-    conditions[i++] = cond;
- }
- vtn_assert(i == num_cases);

  /* Now we can walk the list of cases and actually emit code */
- i = 0;
  list_for_each_entry(struct vtn_case, cse, _switch->cases, link) {
 /* Figure out the condition */
-    nir_ssa_def *cond = conditions[i++];
-    if (cse->is_default) {
-   vtn_assert(cond == NULL);
-   cond = nir_inot(>nb, any);
-    }
+    nir_ssa_def *cond =
+   vtn_switch_case_condition(b, vtn_switch, sel, cse);
 /* Take fallthrough into account */
 cond = nir_ior(>nb, cond, nir_load_var(>nb, fall_var));

@@ -1030,7 +1021,6 @@ vtn_emit_cf_list(struct vtn_builder *b, struct 
list_head *cf_list,


 nir_pop_if(>nb, case_if);
  }
- vtn_assert(i == num_cases);

  break;
}







___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] spirv: Emit switch conditions on-the-fly

2019-01-12 Thread Lionel Landwerlin

On 12/01/2019 03:45, Jason Ekstrand wrote:

Instead of emitting all of the conditions for the cases of a switch
statement up-front, emit them on-the-fly as we emit the code for each
case.  The original justification for this was that we were going to
have to build a default case anyway which would need them all.  However,
we can just trust CSE to clean up the mess in that case.  Emitting each
condition right before the if statement that uses it reduces register
pressure and, in one customer benchmark, gets rid of spilling and
improves performance by about 2x.
---
  src/compiler/spirv/vtn_cfg.c | 62 +++-
  1 file changed, 26 insertions(+), 36 deletions(-)

diff --git a/src/compiler/spirv/vtn_cfg.c b/src/compiler/spirv/vtn_cfg.c
index aef1b7e18fb..34a910accc6 100644
--- a/src/compiler/spirv/vtn_cfg.c
+++ b/src/compiler/spirv/vtn_cfg.c
@@ -852,6 +852,30 @@ vtn_emit_branch(struct vtn_builder *b, enum 
vtn_branch_type branch_type,
 }
  }
  
+static nir_ssa_def *

+vtn_switch_case_condition(struct vtn_builder *b, struct vtn_switch *swtch,
+  nir_ssa_def *sel, struct vtn_case *cse)
+{
+   if (cse->is_default) {
+  nir_ssa_def *any = nir_imm_false(>nb);
+  list_for_each_entry(struct vtn_case, other, >cases, link) {
+ if (cse->is_default)
+continue;
+
+ any = nir_ior(>nb, any,
+   vtn_switch_case_condition(b, swtch, sel, other));
+  }
+  return any;



return nir_inot(>nb, any); here?



+   } else {
+  nir_ssa_def *cond = nir_imm_false(>nb);
+  util_dynarray_foreach(>values, uint64_t, val) {
+ nir_ssa_def *imm = nir_imm_intN_t(>nb, *val, sel->bit_size);
+ cond = nir_ior(>nb, cond, nir_ieq(>nb, sel, imm));
+  }
+  return cond;
+   }
+}
+
  static void
  vtn_emit_cf_list(struct vtn_builder *b, struct list_head *cf_list,
   nir_variable *switch_fall_var, bool *has_switch_break,
@@ -978,46 +1002,13 @@ vtn_emit_cf_list(struct vtn_builder *b, struct list_head 
*cf_list,
  nir_local_variable_create(b->nb.impl, glsl_bool_type(), "fall");
   nir_store_var(>nb, fall_var, nir_imm_false(>nb), 1);
  
- /* Next, we gather up all of the conditions.  We have to do this

-  * up-front because we also need to build an "any" condition so
-  * that we can use !any for default.
-  */
- const int num_cases = list_length(_switch->cases);
- NIR_VLA(nir_ssa_def *, conditions, num_cases);
-
   nir_ssa_def *sel = vtn_ssa_value(b, vtn_switch->selector)->def;
- /* An accumulation of all conditions.  Used for the default */
- nir_ssa_def *any = NULL;
-
- int i = 0;
- list_for_each_entry(struct vtn_case, cse, _switch->cases, link) {
-if (cse->is_default) {
-   conditions[i++] = NULL;
-   continue;
-}
-
-nir_ssa_def *cond = NULL;
-util_dynarray_foreach(>values, uint64_t, val) {
-   nir_ssa_def *imm = nir_imm_intN_t(>nb, *val, sel->bit_size);
-   nir_ssa_def *is_val = nir_ieq(>nb, sel, imm);
-
-   cond = cond ? nir_ior(>nb, cond, is_val) : is_val;
-}
-
-any = any ? nir_ior(>nb, any, cond) : cond;
-conditions[i++] = cond;
- }
- vtn_assert(i == num_cases);
  
   /* Now we can walk the list of cases and actually emit code */

- i = 0;
   list_for_each_entry(struct vtn_case, cse, _switch->cases, link) {
  /* Figure out the condition */
-nir_ssa_def *cond = conditions[i++];
-if (cse->is_default) {
-   vtn_assert(cond == NULL);
-   cond = nir_inot(>nb, any);
-}
+nir_ssa_def *cond =
+   vtn_switch_case_condition(b, vtn_switch, sel, cse);
  /* Take fallthrough into account */
  cond = nir_ior(>nb, cond, nir_load_var(>nb, fall_var));
  
@@ -1030,7 +1021,6 @@ vtn_emit_cf_list(struct vtn_builder *b, struct list_head *cf_list,
  
  nir_pop_if(>nb, case_if);

   }
- vtn_assert(i == num_cases);
  
   break;

}



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir/gcm: Support deref instructions

2019-01-12 Thread Lionel Landwerlin

On 12/01/2019 08:04, Jason Ekstrand wrote:

Even though no one's been brave enough to ever use this pass, I like to
keep it functionally working.



Reviewed-by: Lionel Landwerlin 



---
  src/compiler/nir/nir_opt_gcm.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/src/compiler/nir/nir_opt_gcm.c b/src/compiler/nir/nir_opt_gcm.c
index 879a77a884b..e7d3f8ec424 100644
--- a/src/compiler/nir/nir_opt_gcm.c
+++ b/src/compiler/nir/nir_opt_gcm.c
@@ -128,6 +128,10 @@ gcm_pin_instructions_block(nir_block *block, struct 
gcm_state *state)
   }
   break;
  
+  case nir_instr_type_deref:

+ instr->pass_flags = 0;
+ break;
+
case nir_instr_type_tex:
   switch (nir_instr_as_tex(instr)->op) {
   case nir_texop_tex:



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] spirv: Whack sampler/image pointers to uniform

2019-01-12 Thread Lionel Landwerlin

For the series :

Reviewed-by: Lionel Landwerlin 

On 11/01/2019 21:05, Jason Ekstrand wrote:

A long time in a galaxy far far away, there was a GLSLang bug with how
it handled samplers passed in as function parameters.  (The bug can be
found here: https://github.com/KhronosGroup/glslang/issues/179.)
Unfortunately, that version was shipped in several apps and has been
causing heartburn for our SPIR-V parser ever since.

Recent changes to NIR uncovered a moderately old bug in how we work
around this issue.  In particular, we ended up with a deref_cast from
uniform to local which is not a no-op cast so nir_opt_deref wasn't
getting rid of the cast.  The only reason why it worked before was
because someone just happened to call nir_fixup_deref_modes which
"fixed" the cast (that shouldn't be happening) and then a later round of
copy-prop would get rid of it.  The fact that the deref_cast survived
that long without causing trouble for other parts of NIR is a bit
surprising.

Just whacking the mode of the pointer seems to fix it fairly
unobtrusively.  Currently, only apps with this bug will have a local
variable containing an image or sampler.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109304
---
  src/compiler/spirv/vtn_variables.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/src/compiler/spirv/vtn_variables.c 
b/src/compiler/spirv/vtn_variables.c
index 50cd1b42c69..6036295e61c 100644
--- a/src/compiler/spirv/vtn_variables.c
+++ b/src/compiler/spirv/vtn_variables.c
@@ -1800,6 +1800,18 @@ vtn_pointer_from_ssa(struct vtn_builder *b, nir_ssa_def 
*ssa,
 ptr->type = ptr_type->deref;
 ptr->ptr_type = ptr_type;
  
+   /* To work around https://github.com/KhronosGroup/glslang/issues/179 we

+* need to whack the mode because it creates a function parameter with the
+* Function storage class even though it's a pointer to a sampler.  If we
+* don't do this, then NIR won't get rid of the deref_cast for us.
+*/
+   if (ptr->mode == vtn_variable_mode_function &&
+   (ptr->type->base_type == vtn_base_type_sampler ||
+ptr->type->base_type == vtn_base_type_sampled_image)) {
+  ptr->mode = vtn_variable_mode_uniform;
+  nir_mode = nir_var_uniform;
+   }
+
 if (vtn_pointer_uses_ssa_offset(b, ptr)) {
/* This pointer type needs to have actual storage */
vtn_assert(ptr_type->type);



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3] anv/device: fix maximum number of images supported

2019-01-11 Thread Lionel Landwerlin

On 11/01/2019 15:12, Iago Toral Quiroga wrote:

We had defined MAX_IMAGES as 8, which we used to size the array for
image push constant data. The comment there stated that this was for
gen8, but anv_nir_apply_pipeline_layout runs for all gens and writes
that array, asserting that we don't exceed that number of images,
which imposes a limit of MAX_IMAGES on all gens.

Furthermore, despite this, we are exposing up to 64 images per shader
stage on all gens, gen8 included.

This patch lowers the number of images we expose in gen8 to 8 and
keeps 64 images for gen9+. It also sets MAX_IMAGES to 64 so we can
actually use up to 64 images in shaders, and then adds an assert
specific to gen8 to check that we never exceed 8.

v2:
  - <= instead of < in the assert (Eric, Lionel)
  - Change the way the assertion is written (Eric)

v3:
  - Revert the way the assertion is written to the for it had in v1,
the version in v2 was not equivalent and was incorrect. (Lionel)



Thanks! Do you think this should go to stable?

Anyway :


Reviewed-by: Lionel Landwerlin 




---
  src/intel/vulkan/anv_device.c| 7 +--
  src/intel/vulkan/anv_nir_apply_pipeline_layout.c | 4 +++-
  src/intel/vulkan/anv_private.h   | 4 ++--
  3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 523f1483e2..f85458b672 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -987,9 +987,12 @@ void anv_GetPhysicalDeviceProperties(
 const uint32_t max_samplers = (devinfo->gen >= 8 || devinfo->is_haswell) ?
   128 : 16;
  
+   const uint32_t max_images = devinfo->gen < 9 ? MAX_GEN8_IMAGES : MAX_IMAGES;

+
 VkSampleCountFlags sample_counts =
isl_device_get_sample_counts(>isl_dev);
  
+

 VkPhysicalDeviceLimits limits = {
.maxImageDimension1D  = (1 << 14),
.maxImageDimension2D  = (1 << 14),
@@ -1009,7 +1012,7 @@ void anv_GetPhysicalDeviceProperties(
.maxPerStageDescriptorUniformBuffers  = 64,
.maxPerStageDescriptorStorageBuffers  = 64,
.maxPerStageDescriptorSampledImages   = max_samplers,
-  .maxPerStageDescriptorStorageImages   = 64,
+  .maxPerStageDescriptorStorageImages   = max_images,
.maxPerStageDescriptorInputAttachments= 64,
.maxPerStageResources = 250,
.maxDescriptorSetSamplers = 6 * max_samplers, /* number 
of stages * maxPerStageDescriptorSamplers */
@@ -1018,7 +1021,7 @@ void anv_GetPhysicalDeviceProperties(
.maxDescriptorSetStorageBuffers   = 6 * 64,   /* number 
of stages * maxPerStageDescriptorStorageBuffers */
.maxDescriptorSetStorageBuffersDynamic= MAX_DYNAMIC_BUFFERS / 2,
.maxDescriptorSetSampledImages= 6 * max_samplers, /* number 
of stages * maxPerStageDescriptorSampledImages */
-  .maxDescriptorSetStorageImages= 6 * 64,   /* number 
of stages * maxPerStageDescriptorStorageImages */
+  .maxDescriptorSetStorageImages= 6 * max_images,   /* number 
of stages * maxPerStageDescriptorStorageImages */
.maxDescriptorSetInputAttachments = 256,
.maxVertexInputAttributes = MAX_VBS,
.maxVertexInputBindings   = MAX_VBS,
diff --git a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c 
b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
index b3daf702bc..ae5c19578c 100644
--- a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
+++ b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
@@ -529,7 +529,9 @@ anv_nir_apply_pipeline_layout(const struct 
anv_physical_device *pdevice,
 }
  
 if (map->image_count > 0) {

-  assert(map->image_count <= MAX_IMAGES);
+  assert(map->image_count <= MAX_IMAGES &&
+ (pdevice->compiler->devinfo->gen > 8 ||
+  map->image_count <= MAX_GEN8_IMAGES));
assert(shader->num_uniforms == prog_data->nr_params * 4);
state.first_image_uniform = shader->num_uniforms;
uint32_t *param = brw_stage_prog_data_add_params(prog_data,
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 770254e93e..9ddd41b1fa 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -157,7 +157,8 @@ struct gen_l3_config;
  #define MAX_SCISSORS16
  #define MAX_PUSH_CONSTANTS_SIZE 128
  #define MAX_DYNAMIC_BUFFERS 16
-#define MAX_IMAGES 8
+#define MAX_IMAGES 64
+#define MAX_GEN8_IMAGES 8
  #define MAX_PUSH_DESCRIPTORS 32 /* Minimum requirement */
  
  /* The kernel relocation API has a limitation of a 32-bit delta value

@@ -1882,7 +1883,6 @@ struct anv_push_constants {
 /* Used for vkCmdDispatchBase */
 uint32_t base_work_group_id

Re: [Mesa-dev] [PATCH v2] anv/device: fix maximum number of images supported

2019-01-11 Thread Lionel Landwerlin

On 11/01/2019 12:40, Iago Toral wrote:

On Fri, 2019-01-11 at 12:31 +, Lionel Landwerlin wrote:

On 11/01/2019 12:12, Iago Toral Quiroga wrote:

We had defined MAX_IMAGES as 8, which we used to size the array for
image push constant data. The comment there stated that this was
for
gen8, but anv_nir_apply_pipeline_layout runs for all gens and
writes
that array, asserting that we don't exceed that number of images,
which imposes a limit of MAX_IMAGES on all gens.

Furthermore, despite this, we are exposing up to 64 images per
shader
stage on all gens, gen8 included.

This patch lowers the number of images we expose in gen8 to 8 and
keeps 64 images for gen9+. It also sets MAX_IMAGES to 64 so we can
actually use up to 64 images in shaders, and then adds an assert
specific to gen8 to check that we never exceed 8.

v2:
   - <= instead of < in the assert (Eric, Lionel)
   - Change the way the assertion is written (Eric)
---
   src/intel/vulkan/anv_device.c| 7 +--
   src/intel/vulkan/anv_nir_apply_pipeline_layout.c | 3 ++-
   src/intel/vulkan/anv_private.h   | 4 ++--
   3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c
b/src/intel/vulkan/anv_device.c
index 523f1483e2..f85458b672 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -987,9 +987,12 @@ void anv_GetPhysicalDeviceProperties(
  const uint32_t max_samplers = (devinfo->gen >= 8 || devinfo-

is_haswell) ?

128 : 16;
   
+   const uint32_t max_images = devinfo->gen < 9 ? MAX_GEN8_IMAGES

: MAX_IMAGES;
+
  VkSampleCountFlags sample_counts =
 isl_device_get_sample_counts(>isl_dev);
   
+

  VkPhysicalDeviceLimits limits = {
 .maxImageDimension1D  = (1 << 14),
 .maxImageDimension2D  = (1 << 14),
@@ -1009,7 +1012,7 @@ void anv_GetPhysicalDeviceProperties(
 .maxPerStageDescriptorUniformBuffers  = 64,
 .maxPerStageDescriptorStorageBuffers  = 64,
 .maxPerStageDescriptorSampledImages   = max_samplers,
-  .maxPerStageDescriptorStorageImages   = 64,
+  .maxPerStageDescriptorStorageImages   = max_images,
 .maxPerStageDescriptorInputAttachments= 64,
 .maxPerStageResources = 250,
 .maxDescriptorSetSamplers = 6 *
max_samplers, /* number of stages * maxPerStageDescriptorSamplers
*/
@@ -1018,7 +1021,7 @@ void anv_GetPhysicalDeviceProperties(
 .maxDescriptorSetStorageBuffers   = 6 *
64,   /* number of stages *
maxPerStageDescriptorStorageBuffers */
 .maxDescriptorSetStorageBuffersDynamic=
MAX_DYNAMIC_BUFFERS / 2,
 .maxDescriptorSetSampledImages= 6 *
max_samplers, /* number of stages *
maxPerStageDescriptorSampledImages */
-  .maxDescriptorSetStorageImages= 6 *
64,   /* number of stages *
maxPerStageDescriptorStorageImages */
+  .maxDescriptorSetStorageImages= 6 *
max_images,   /* number of stages *
maxPerStageDescriptorStorageImages */
 .maxDescriptorSetInputAttachments = 256,
 .maxVertexInputAttributes = MAX_VBS,
 .maxVertexInputBindings   = MAX_VBS,
diff --git a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
index b3daf702bc..53de347b3c 100644
--- a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
+++ b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
@@ -529,7 +529,8 @@ anv_nir_apply_pipeline_layout(const struct
anv_physical_device *pdevice,
  }
   
  if (map->image_count > 0) {

-  assert(map->image_count <= MAX_IMAGES);
+  assert((pdevice->compiler->devinfo->gen <= 8 && map-

image_count <= MAX_GEN8_IMAGES) ||

+  map->image_count <= MAX_IMAGES);


I'm not sure why it wasn't < instead of <= previously, sounds like
it
should be <.

I think <= is fine, map->image_count is not an index but the total
count so its value should be fine up to the maximum allowed, right?



Oh dear, sorry, you're right.





Also I don't think this version works, your previous version was okay
:


image_count < MAX_GEN8_IMAGE || (gen > 8 && image_count < MAX_IMAGES)

Right, I didn't read Eric's proposal properly, and it would not work
for gen8. I'll revert the form of the assertion to the previous
version.


 assert(shader->num_uniforms == prog_data->nr_params * 4);
 state.first_image_uniform = shader->num_uniforms;
 uint32_t *param = brw_stage_prog_data_add_params(prog_data,
diff --git a/src/intel/vulkan/anv_private.h
b/src/intel/vulkan/anv_private.h
index 770254e93e..9ddd41b1fa 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -15

Re: [Mesa-dev] [PATCH v2] anv/device: fix maximum number of images supported

2019-01-11 Thread Lionel Landwerlin

On 11/01/2019 12:12, Iago Toral Quiroga wrote:

We had defined MAX_IMAGES as 8, which we used to size the array for
image push constant data. The comment there stated that this was for
gen8, but anv_nir_apply_pipeline_layout runs for all gens and writes
that array, asserting that we don't exceed that number of images,
which imposes a limit of MAX_IMAGES on all gens.

Furthermore, despite this, we are exposing up to 64 images per shader
stage on all gens, gen8 included.

This patch lowers the number of images we expose in gen8 to 8 and
keeps 64 images for gen9+. It also sets MAX_IMAGES to 64 so we can
actually use up to 64 images in shaders, and then adds an assert
specific to gen8 to check that we never exceed 8.

v2:
  - <= instead of < in the assert (Eric, Lionel)
  - Change the way the assertion is written (Eric)
---
  src/intel/vulkan/anv_device.c| 7 +--
  src/intel/vulkan/anv_nir_apply_pipeline_layout.c | 3 ++-
  src/intel/vulkan/anv_private.h   | 4 ++--
  3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 523f1483e2..f85458b672 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -987,9 +987,12 @@ void anv_GetPhysicalDeviceProperties(
 const uint32_t max_samplers = (devinfo->gen >= 8 || devinfo->is_haswell) ?
   128 : 16;
  
+   const uint32_t max_images = devinfo->gen < 9 ? MAX_GEN8_IMAGES : MAX_IMAGES;

+
 VkSampleCountFlags sample_counts =
isl_device_get_sample_counts(>isl_dev);
  
+

 VkPhysicalDeviceLimits limits = {
.maxImageDimension1D  = (1 << 14),
.maxImageDimension2D  = (1 << 14),
@@ -1009,7 +1012,7 @@ void anv_GetPhysicalDeviceProperties(
.maxPerStageDescriptorUniformBuffers  = 64,
.maxPerStageDescriptorStorageBuffers  = 64,
.maxPerStageDescriptorSampledImages   = max_samplers,
-  .maxPerStageDescriptorStorageImages   = 64,
+  .maxPerStageDescriptorStorageImages   = max_images,
.maxPerStageDescriptorInputAttachments= 64,
.maxPerStageResources = 250,
.maxDescriptorSetSamplers = 6 * max_samplers, /* number 
of stages * maxPerStageDescriptorSamplers */
@@ -1018,7 +1021,7 @@ void anv_GetPhysicalDeviceProperties(
.maxDescriptorSetStorageBuffers   = 6 * 64,   /* number 
of stages * maxPerStageDescriptorStorageBuffers */
.maxDescriptorSetStorageBuffersDynamic= MAX_DYNAMIC_BUFFERS / 2,
.maxDescriptorSetSampledImages= 6 * max_samplers, /* number 
of stages * maxPerStageDescriptorSampledImages */
-  .maxDescriptorSetStorageImages= 6 * 64,   /* number 
of stages * maxPerStageDescriptorStorageImages */
+  .maxDescriptorSetStorageImages= 6 * max_images,   /* number 
of stages * maxPerStageDescriptorStorageImages */
.maxDescriptorSetInputAttachments = 256,
.maxVertexInputAttributes = MAX_VBS,
.maxVertexInputBindings   = MAX_VBS,
diff --git a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c 
b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
index b3daf702bc..53de347b3c 100644
--- a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
+++ b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
@@ -529,7 +529,8 @@ anv_nir_apply_pipeline_layout(const struct 
anv_physical_device *pdevice,
 }
  
 if (map->image_count > 0) {

-  assert(map->image_count <= MAX_IMAGES);
+  assert((pdevice->compiler->devinfo->gen <= 8 && map->image_count <= 
MAX_GEN8_IMAGES) ||
+  map->image_count <= MAX_IMAGES);



I'm not sure why it wasn't < instead of <= previously, sounds like it 
should be <.



Also I don't think this version works, your previous version was okay :


image_count < MAX_GEN8_IMAGE || (gen > 8 && image_count < MAX_IMAGES)



assert(shader->num_uniforms == prog_data->nr_params * 4);
state.first_image_uniform = shader->num_uniforms;
uint32_t *param = brw_stage_prog_data_add_params(prog_data,
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 770254e93e..9ddd41b1fa 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -157,7 +157,8 @@ struct gen_l3_config;
  #define MAX_SCISSORS16
  #define MAX_PUSH_CONSTANTS_SIZE 128
  #define MAX_DYNAMIC_BUFFERS 16
-#define MAX_IMAGES 8
+#define MAX_IMAGES 64
+#define MAX_GEN8_IMAGES 8
  #define MAX_PUSH_DESCRIPTORS 32 /* Minimum requirement */
  
  /* The kernel relocation API has a limitation of a 32-bit delta value

@@ -1882,7 +1883,6 @@ struct anv_push_constants {
 /* Used for vkCmdDispatchBase */
 uint32_t base_work_group_id[3];
  
-   /* Image data for image_load_store on pre-SKL */

 struct brw_image_param images[MAX_IMAGES];

Re: [Mesa-dev] [PATCH] anv/pipeline_cache: free NIR shader cache

2019-01-11 Thread Lionel Landwerlin

On 11/01/2019 12:05, Iago Toral Quiroga wrote:

Fixes: f6aa9f718516 'anv/pipeline_cache: Add support for caching NIR'



Thanks a bunch :


Reviewed-by: Lionel Landwerlin 



---
  src/intel/vulkan/anv_pipeline_cache.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/src/intel/vulkan/anv_pipeline_cache.c 
b/src/intel/vulkan/anv_pipeline_cache.c
index f9733c5309..d96102c287 100644
--- a/src/intel/vulkan/anv_pipeline_cache.c
+++ b/src/intel/vulkan/anv_pipeline_cache.c
@@ -258,6 +258,13 @@ anv_pipeline_cache_finish(struct anv_pipeline_cache *cache)
  
_mesa_hash_table_destroy(cache->cache, NULL);

 }
+
+   if (cache->nir_cache) {
+  hash_table_foreach(cache->nir_cache, entry)
+ ralloc_free(entry->data);
+
+  _mesa_hash_table_destroy(cache->nir_cache, NULL);
+   }
  }
  
  static struct anv_shader_bin *



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv/pipeline_cache: fix incorrect guards for NIR cache

2019-01-11 Thread Lionel Landwerlin

On 11/01/2019 11:41, Iago Toral wrote:

On Fri, 2019-01-11 at 11:13 +, Lionel Landwerlin wrote:

On 11/01/2019 10:50, Iago Toral Quiroga wrote:

Fixes: f6aa9f718516 'anv/pipeline_cache: Add support for caching
NIR'
---
   src/intel/vulkan/anv_pipeline_cache.c | 5 +++--
   1 file changed, 3 insertions(+), 2 deletions(-)

Or maybe just check cache->cache instead, which I guess was the
original
intention, but I kind of prefer having all fields initialized.


While looking at this patch, I noticed that we're not freeing
nir_cache
in anv_pipeline_cache_finish.

Right, I'll send a separate patch for that.

Iago



Thanks a lot :)

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


  1   2   3   4   5   6   7   8   9   10   >