Re: [Mesa-dev] [PATCH] anv/pipeline: do not disable depth writes if depth testing is disabled

2017-05-18 Thread Kenneth Graunke
On Thursday, May 18, 2017 9:28:17 PM PDT Kenneth Graunke wrote:
> On Tuesday, March 28, 2017 11:58:54 PM PDT Iago Toral Quiroga wrote:
> > Writing and testing are two different things and they can be set separately
> > by the application. If an application wants to record depth data without
> > caring for the depth test, it can enable depth test and set the depth
> > compare function to VK_COMPARE_OP_ALWAYS or it can simply disable
> > depth testing altogether. Some CTS tests do the latter.
> > 
> > Fixes all multisample tests with depth-only formats in:
> > dEQP-VK.renderpass.multisample.*
> > ---
> >  src/intel/vulkan/genX_pipeline.c | 4 
> >  1 file changed, 4 deletions(-)
> > 
> > diff --git a/src/intel/vulkan/genX_pipeline.c 
> > b/src/intel/vulkan/genX_pipeline.c
> > index 85a9e4f..dc393cb 100644
> > --- a/src/intel/vulkan/genX_pipeline.c
> > +++ b/src/intel/vulkan/genX_pipeline.c
> > @@ -728,10 +728,6 @@ 
> > sanitize_ds_state(VkPipelineDepthStencilStateCreateInfo *state,
> >  {
> > *stencilWriteEnable = state->stencilTestEnable;
> >  
> > -   /* If the depth test is disabled, we won't be writing anything. */
> > -   if (!state->depthTestEnable)
> > -  state->depthWriteEnable = false;
> > -
> > /* The Vulkan spec requires that if either depth or stencil is not 
> > present,
> >  * the pipeline is to act as if the test silently passes.
> >  */
> > 
> 
> I think those tests are broken.
> 
> According to section 25.10 (Depth Test) of Vulkan 1.0.49...
> https://www.khronos.org/registry/vulkan/specs/1.0/html/vkspec.html#fragops-depth
> 
> "The comparison is enabled or disabled with the depthTestEnable member
>  of the VkPipelineDepthStencilStateCreateInfo structure. When disabled,
>  the depth comparison and subsequent possible updates to the value of the
>  depth component of the depth/stencil attachment are bypassed and the
>  fragment is passed to the next operation. The stencil value, however,
>  can be modified as indicated above as if the depth test passed. If
>  enabled, the comparison takes place and the depth/stencil attachment
>  value can subsequently be modified."
> 
> So it sounds like depth writes aren't supposed to happen if the depth
> test is disabled - but stencil writes may.  FWIW, that's how it works
> in OpenGL as well.  (Thanks to Ilia for confirming this with me on IRC.)

Oh, sorry...I see now that you figured this out a month ago.  I'd made a
note to look at this patch in my branch to port this depth/stencil
sanitizing stuff to GL.  For some reason my email client didn't load the
rest of the thread...

--Ken


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Add RGBX8888 and RGBA8888 to EGL configure and reorder the list

2017-05-18 Thread Tapani Pälli



On 05/19/2017 04:19 AM, Xu, Randy wrote:

-Original Message-
From: Rob Herring [mailto:r...@kernel.org]
Sent: Friday, May 19, 2017 6:01 AM
To: Emil Velikov 
Cc: Chih-Wei Huang ; ML mesa-dev ; Xu, Randy 
Subject: Re: [Mesa-dev] [PATCH] i965: Add RGBX and RGBA to EGL
configure and reorder the list

On Thu, May 18, 2017 at 5:25 AM, Emil Velikov 
wrote:

On 18 May 2017 at 05:10, Chih-Wei Huang 

wrote:

2017-05-18 12:01 GMT+08:00 Xu, Randy :



-Original Message-
From: Palli, Tapani


It's just applied. Isn't it?
Yesterday without this patch
the color format is mismatch apparently.


Yeah we do need this. TBH I don't quite understand why but will try
to figure it out. I remember we used to have a patch in
Surfaceflinger at one point because visual was hardcoded there and this

might be related.


// Tapani


Sorry, that's for different issue, I mix it with RGB565 blending one.
This patch is required because some Android GLES test apps, like

gl2_basic, need to create RGBA surface.


Indeed it is.

As Emil pointed out, the patch was merged before but reverted later
since it broke desktop.

So what's the current upstreaming plan?


No idea about a plan, but how you can fix it once and for all:

Extend the loader extension(s) to have a get_caps() callback,
analogous to __DRIimageExtension::getCapabilities.
Then the DRI module will query [the loader] and advertise the
RGBX/RGBA visuals when possible.


Could we do something compile time instead to enable just for Android?
Seems like a lot of complexity when the platform needing these pixel formats
doesn't need the backwards compatibility. Or maybe there are other things
needing this interface?


I also wish that, can we use macro, like "#if ANDROID" ?
It's not necessary to enable these two visuals for all platform, but it's 
really important for Android.


IMO we should fix Android instead, there is no reason why it would 
*require* these formats.


// Tapani
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv/pipeline: do not disable depth writes if depth testing is disabled

2017-05-18 Thread Kenneth Graunke
On Tuesday, March 28, 2017 11:58:54 PM PDT Iago Toral Quiroga wrote:
> Writing and testing are two different things and they can be set separately
> by the application. If an application wants to record depth data without
> caring for the depth test, it can enable depth test and set the depth
> compare function to VK_COMPARE_OP_ALWAYS or it can simply disable
> depth testing altogether. Some CTS tests do the latter.
> 
> Fixes all multisample tests with depth-only formats in:
> dEQP-VK.renderpass.multisample.*
> ---
>  src/intel/vulkan/genX_pipeline.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/src/intel/vulkan/genX_pipeline.c 
> b/src/intel/vulkan/genX_pipeline.c
> index 85a9e4f..dc393cb 100644
> --- a/src/intel/vulkan/genX_pipeline.c
> +++ b/src/intel/vulkan/genX_pipeline.c
> @@ -728,10 +728,6 @@ sanitize_ds_state(VkPipelineDepthStencilStateCreateInfo 
> *state,
>  {
> *stencilWriteEnable = state->stencilTestEnable;
>  
> -   /* If the depth test is disabled, we won't be writing anything. */
> -   if (!state->depthTestEnable)
> -  state->depthWriteEnable = false;
> -
> /* The Vulkan spec requires that if either depth or stencil is not 
> present,
>  * the pipeline is to act as if the test silently passes.
>  */
> 

I think those tests are broken.

According to section 25.10 (Depth Test) of Vulkan 1.0.49...
https://www.khronos.org/registry/vulkan/specs/1.0/html/vkspec.html#fragops-depth

"The comparison is enabled or disabled with the depthTestEnable member
 of the VkPipelineDepthStencilStateCreateInfo structure. When disabled,
 the depth comparison and subsequent possible updates to the value of the
 depth component of the depth/stencil attachment are bypassed and the
 fragment is passed to the next operation. The stencil value, however,
 can be modified as indicated above as if the depth test passed. If
 enabled, the comparison takes place and the depth/stencil attachment
 value can subsequently be modified."

So it sounds like depth writes aren't supposed to happen if the depth
test is disabled - but stencil writes may.  FWIW, that's how it works
in OpenGL as well.  (Thanks to Ilia for confirming this with me on IRC.)


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa: add APPLE_vertex_array_object stubs

2017-05-18 Thread Timothy Arceri
APPLE_vertex_array_object support was removed in 7927d0378fc7.
However it turns out we can't remove the functions because this
can cause issues when libglapi is used together with DRI1 drivers
which were branched off from master a few years ago.
---
 src/mapi/glapi/gen/APPLE_vertex_array_object.xml | 27 
 src/mapi/glapi/gen/Makefile.am   |  1 +
 src/mapi/glapi/gen/gl_API.xml|  2 +-
 src/mapi/glapi/tests/check_table.cpp |  2 ++
 src/mesa/main/arrayobj.c | 16 ++
 src/mesa/main/arrayobj.h |  4 
 src/mesa/main/tests/dispatch_sanity.cpp  |  2 ++
 7 files changed, 53 insertions(+), 1 deletion(-)
 create mode 100644 src/mapi/glapi/gen/APPLE_vertex_array_object.xml

diff --git a/src/mapi/glapi/gen/APPLE_vertex_array_object.xml 
b/src/mapi/glapi/gen/APPLE_vertex_array_object.xml
new file mode 100644
index 000..7312f9b
--- /dev/null
+++ b/src/mapi/glapi/gen/APPLE_vertex_array_object.xml
@@ -0,0 +1,27 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+   
+
+
+
+
+   
+
+
+
+
+   
+
+
+
diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
index ecd1c71..33139bd 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -182,20 +182,21 @@ API_XML = \
ARB_texture_view.xml \
ARB_uniform_buffer_object.xml \
ARB_vertex_array_object.xml \
ARB_vertex_attrib_64bit.xml \
ARB_vertex_attrib_binding.xml \
ARB_viewport_array.xml \
AMD_draw_buffers_blend.xml \
AMD_performance_monitor.xml \
ARB_vertex_type_2_10_10_10_rev.xml \
APPLE_object_purgeable.xml \
+   APPLE_vertex_array_object.xml \
EXT_draw_buffers2.xml \
EXT_framebuffer_object.xml \
EXT_gpu_shader4.xml \
EXT_packed_depth_stencil.xml \
EXT_provoking_vertex.xml \
EXT_separate_shader_objects.xml \
EXT_texture_array.xml \
EXT_texture_integer.xml \
EXT_transform_feedback.xml \
EXT_window_rectangles.xml \
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index 762fb5a..630d6b8 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -12524,21 +12524,21 @@
 
 
 
 
 
 
 
 
 
 http://www.w3.org/2001/XInclude"/>
-
+http://www.w3.org/2001/XInclude"/>
 
 
 
 
 
 
 
 
 
 
diff --git a/src/mapi/glapi/tests/check_table.cpp 
b/src/mapi/glapi/tests/check_table.cpp
index a1041bce..09bf4f3 100644
--- a/src/mapi/glapi/tests/check_table.cpp
+++ b/src/mapi/glapi/tests/check_table.cpp
@@ -1397,21 +1397,23 @@ const struct name_offset known_dispatch[] = {
{ "glColorFragmentOp3ATI", _O(ColorFragmentOp3ATI) },
{ "glDeleteFragmentShaderATI", _O(DeleteFragmentShaderATI) },
{ "glEndFragmentShaderATI", _O(EndFragmentShaderATI) },
{ "glGenFragmentShadersATI", _O(GenFragmentShadersATI) },
{ "glPassTexCoordATI", _O(PassTexCoordATI) },
{ "glSampleMapATI", _O(SampleMapATI) },
{ "glSetFragmentShaderConstantATI", _O(SetFragmentShaderConstantATI) },
{ "glPointParameteri", _O(PointParameteri) },
{ "glPointParameteriv", _O(PointParameteriv) },
{ "glActiveStencilFaceEXT", _O(ActiveStencilFaceEXT) },
+   { "glBindVertexArrayAPPLE", _O(BindVertexArrayAPPLE) },
{ "glDeleteVertexArrays", _O(DeleteVertexArrays) },
+   { "glGenVertexArraysAPPLE", _O(GenVertexArraysAPPLE) },
{ "glIsVertexArray", _O(IsVertexArray) },
{ "glGetProgramNamedParameterdvNV", _O(GetProgramNamedParameterdvNV) },
{ "glGetProgramNamedParameterfvNV", _O(GetProgramNamedParameterfvNV) },
{ "glProgramNamedParameter4dNV", _O(ProgramNamedParameter4dNV) },
{ "glProgramNamedParameter4dvNV", _O(ProgramNamedParameter4dvNV) },
{ "glProgramNamedParameter4fNV", _O(ProgramNamedParameter4fNV) },
{ "glProgramNamedParameter4fvNV", _O(ProgramNamedParameter4fvNV) },
{ "glPrimitiveRestartIndex", _O(PrimitiveRestartIndex) },
{ "glPrimitiveRestartNV", _O(PrimitiveRestartNV) },
{ "glDepthBoundsEXT", _O(DepthBoundsEXT) },
diff --git a/src/mesa/main/arrayobj.c b/src/mesa/main/arrayobj.c
index 82c00fb..b986229 100644
--- a/src/mesa/main/arrayobj.c
+++ b/src/mesa/main/arrayobj.c
@@ -466,20 +466,28 @@ _mesa_BindVertexArray( GLuint id )
*/
   ctx->Array._DrawArrays = NULL;
   ctx->Array.DrawMethod = DRAW_NONE;
}
 
ctx->NewState |= _NEW_ARRAY;
_mesa_reference_vao(ctx, >Array.VAO, newObj);
 }
 
 
+void GLAPIENTRY
+_mesa_BindVertexArrayAPPLE(GLuint id)
+{
+   GET_CURRENT_CONTEXT(ctx);
+   _mesa_problem(ctx, "APPLE_vertex_array_object is not supported!");
+}
+
+
 /**
  * Delete a set of array objects.
  *
  * \param n  Number of array objects to delete.
  * \param idsArray of \c n array object IDs.
  */
 void GLAPIENTRY
 

[Mesa-dev] [RFC v4 2/2] drm/i915: Select engines via class and instance in execbuffer2

2017-05-18 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Building on top of the previous patch which exported the concept
of engine classes and instances, we can also use this instead of
the current awkward engine selection uAPI.

This is primarily interesting for the VCS engine selection which
is a) currently done via disjoint set of flags, and b) the
current I915_EXEC_BSD flags has different semantics depending on
the underlying hardware which is bad.

Proposed idea here is to reserve 8-bits of flags, to pass in the
engine instance, re-use the existing engine selection bits for
the class selection, and a new flag named
I915_EXEC_CLASS_INSTANCE to tell the kernel this new engine
selection API is in use.

The new uAPI also removes access to the weak VCS engine
balancing as currently existing in the driver.

Example usage to send a command to VCS0:

  eb.flags = i915_execbuffer2_engine(I915_ENGINE_CLASS_VIDEO_DECODE, 0);

Or to send a command to VCS1:

  eb.flags = i915_execbuffer2_engine(I915_ENGINE_CLASS_VIDEO_DECODE, 1);

v2:
 * Fix unknown flags mask.
 * Use I915_EXEC_RING_MASK for class. (Chris Wilson)

v3:
 * Add a map for fast class-instance engine lookup. (Chris Wilson)

v4:
 * Update commit to reflect v3.
 * Export intel_engine_lookup for other users. (Chris Wilson)
 * Split out some warns. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin 
Cc: Ben Widawsky 
Cc: Chris Wilson 
Cc: Daniel Vetter 
Cc: Joonas Lahtinen 
Cc: Jon Bloomfield 
Cc: Daniel Charles 
Cc: "Rogozhkin, Dmitry V" 
Cc: Oscar Mateo 
Cc: "Gong, Zhipeng" 
Cc: intel-vaapi-me...@lists.01.org
Cc: mesa-dev@lists.freedesktop.org
---
 drivers/gpu/drm/i915/i915_drv.h|  1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 20 
 drivers/gpu/drm/i915/i915_reg.h|  3 +++
 drivers/gpu/drm/i915/intel_engine_cs.c | 20 
 drivers/gpu/drm/i915/intel_ringbuffer.h|  3 +++
 include/uapi/drm/i915_drm.h| 12 +++-
 6 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1e08b82c4823..53b41963f672 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2115,6 +2115,7 @@ struct drm_i915_private {
struct pci_dev *bridge_dev;
struct i915_gem_context *kernel_context;
struct intel_engine_cs *engine[I915_NUM_ENGINES];
+   struct intel_engine_cs *engine_class[MAX_ENGINE_CLASS + 
1][MAX_ENGINE_INSTANCE + 1];
struct i915_vma *semaphore;
 
struct drm_dma_handle *status_page_dmah;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index af1965774e7b..006c8046af5f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1492,6 +1492,23 @@ gen8_dispatch_bsd_engine(struct drm_i915_private 
*dev_priv,
return file_priv->bsd_engine;
 }
 
+static struct intel_engine_cs *
+eb_select_engine_class_instance(struct drm_i915_private *i915,
+   struct drm_i915_gem_execbuffer2 *args)
+{
+   u8 class = args->flags & I915_EXEC_RING_MASK;
+   extern u8 user_class_map[I915_ENGINE_CLASS_MAX];
+   u8 instance;
+
+   if (class >= ARRAY_SIZE(user_class_map))
+   return NULL;
+
+   instance = (args->flags >> I915_EXEC_INSTANCE_SHIFT) &&
+  I915_EXEC_INSTANCE_MASK;
+
+   return intel_engine_lookup(i915, user_class_map[class], instance);
+}
+
 #define I915_USER_RINGS (4)
 
 static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
@@ -1510,6 +1527,9 @@ eb_select_engine(struct drm_i915_private *dev_priv,
unsigned int user_ring_id = args->flags & I915_EXEC_RING_MASK;
struct intel_engine_cs *engine;
 
+   if (args->flags & I915_EXEC_CLASS_INSTANCE)
+   return eb_select_engine_class_instance(dev_priv, args);
+
if (user_ring_id > I915_USER_RINGS) {
DRM_DEBUG("execbuf with unknown ring: %u\n", user_ring_id);
return NULL;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 89888adb9af1..fa4a3841c4af 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -92,6 +92,9 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define VIDEO_ENHANCEMENT_CLASS2
 #define COPY_ENGINE_CLASS  3
 #define OTHER_CLASS4
+#define MAX_ENGINE_CLASS   4
+
+#define MAX_ENGINE_INSTANCE1
 
 /* PCI config space */
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c 
b/drivers/gpu/drm/i915/intel_engine_cs.c
index 8d370c5d07bc..8f31514b01d5 100644
--- 

[Mesa-dev] [RFC v3 1/2] drm/i915: Engine discovery uAPI

2017-05-18 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Engine discovery uAPI allows userspace to probe for engine
configuration and features without needing to maintain the
internal PCI id based database.

This enables removal of code duplications across userspace
components.

Probing is done via the new DRM_IOCTL_I915_GEM_ENGINE_INFO ioctl
which returns the number and information on the specified engine
class.

Currently only general engine configuration and HEVC feature of
the VCS engine can be probed but the uAPI is designed to be
generic and extensible.

Code is based almost exactly on the earlier proposal on the
topic by Jon Bloomfield. Engine class and instance refactoring
made recently by Daniele Ceraolo Spurio enabled this to be
implemented in an elegant fashion.

To probe configuration userspace sets the engine class it wants
to query (struct drm_i915_gem_engine_info) and provides an array
of drm_i915_engine_info structs which will be filled in by the
driver. Userspace also has to tell i915 how many elements are in
the array, and the driver will report back the total number of
engine instances in any case.

v2:
 * Add a version field and consolidate to one engine count.
   (Chris Wilson)
 * Rename uAPI flags for VCS engines to DRM_I915_ENGINE_CLASS_VIDEO.
   (Gong Zhipeng)

v3:
 * Drop the DRM_ prefix from uAPI enums. (Chris Wilson)
 * Only reserve 8-bits for the engine info for time being.

Signed-off-by: Tvrtko Ursulin 
Cc: Ben Widawsky 
Cc: Chris Wilson 
Cc: Daniel Vetter 
Cc: Joonas Lahtinen 
Cc: Jon Bloomfield 
Cc: Daniel Charles 
Cc: "Rogozhkin, Dmitry V" 
Cc: Oscar Mateo 
Cc: "Gong, Zhipeng" 
Cc: intel-vaapi-me...@lists.01.org
Cc: mesa-dev@lists.freedesktop.org
---
 drivers/gpu/drm/i915/i915_drv.c|  1 +
 drivers/gpu/drm/i915/i915_drv.h|  3 ++
 drivers/gpu/drm/i915/intel_engine_cs.c | 64 ++
 include/uapi/drm/i915_drm.h| 41 ++
 4 files changed, 109 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 72fb47a439d2..dc268f41a609 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -2623,6 +2623,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, 
i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, 
i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl, 
DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF_DRV(I915_GEM_ENGINE_INFO, i915_gem_engine_info_ioctl, 
DRM_RENDER_ALLOW),
 };
 
 static struct drm_driver driver = {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 17883a84b8c1..1e08b82c4823 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3521,6 +3521,9 @@ i915_gem_context_lookup_timeline(struct i915_gem_context 
*ctx,
 int i915_perf_open_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file);
 
+int i915_gem_engine_info_ioctl(struct drm_device *dev, void *data,
+  struct drm_file *file);
+
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct i915_address_space *vm,
  u64 min_size, u64 alignment,
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c 
b/drivers/gpu/drm/i915/intel_engine_cs.c
index 413bfd8d4bf4..8d370c5d07bc 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -25,6 +25,7 @@
 #include "i915_drv.h"
 #include "intel_ringbuffer.h"
 #include "intel_lrc.h"
+#include 
 
 /* Haswell does have the CXT_SIZE register however it does not appear to be
  * valid. Now, docs explain in dwords what is in the context object. The full
@@ -1286,6 +1287,69 @@ void intel_engines_mark_idle(struct drm_i915_private 
*i915)
}
 }
 
+u8 user_class_map[I915_ENGINE_CLASS_MAX] = {
+   [I915_ENGINE_CLASS_OTHER] = OTHER_CLASS,
+   [I915_ENGINE_CLASS_RENDER] = RENDER_CLASS,
+   [I915_ENGINE_CLASS_COPY] = COPY_ENGINE_CLASS,
+   [I915_ENGINE_CLASS_VIDEO] = VIDEO_DECODE_CLASS,
+   [I915_ENGINE_CLASS_VIDEO_ENHANCE] = VIDEO_ENHANCEMENT_CLASS,
+};
+
+int i915_gem_engine_info_ioctl(struct drm_device *dev, void *data,
+  struct drm_file *file)
+{
+   struct drm_i915_private *i915 = to_i915(dev);
+   struct drm_i915_gem_engine_info *args = data;
+   struct drm_i915_engine_info __user *user_info =
+   u64_to_user_ptr(args->info_ptr);
+   unsigned int info_size = args->num_engines;
+   struct 

Re: [Mesa-dev] [PATCH v2 1/6] vulkan: Fix Wayland uninitialised registry

2017-05-18 Thread Lionel Landwerlin

This series is :

Reviewed-by: Lionel Landwerlin 

Thanks for doing this.
Although you seem to have a multithreaded scenario in mind, this will 
probably help single threaded stacks like Clutter/Cogl & Mesa all using 
the same event queue.


-
Lionel

On 16/05/17 11:05, Daniel Stone wrote:

Untangle the exit cleanup paths so we don't try to use the registry
variable before it's been initialised.

Signed-off-by: Daniel Stone 
Cc: mesa-sta...@lists.freedesktop.org
---
  src/vulkan/wsi/wsi_common_wayland.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/vulkan/wsi/wsi_common_wayland.c 
b/src/vulkan/wsi/wsi_common_wayland.c
index 5613283d9d..9c246b8d5c 100644
--- a/src/vulkan/wsi/wsi_common_wayland.c
+++ b/src/vulkan/wsi/wsi_common_wayland.c
@@ -272,7 +272,7 @@ wsi_wl_display_create(struct wsi_wayland *wsi, struct 
wl_display *wl_display)
  
 struct wl_registry *registry = wl_display_get_registry(wl_display);

 if (!registry)
-  return NULL;
+  goto fail;
  
 wl_registry_add_listener(registry, _listener, display);
  
@@ -280,24 +280,25 @@ wsi_wl_display_create(struct wsi_wayland *wsi, struct wl_display *wl_display)

 wl_display_roundtrip(wl_display);
  
 if (!display->drm)

-  goto fail;
+  goto fail_registry;
  
 /* Round-rip to get wl_drm formats and capabilities */

 wl_display_roundtrip(wl_display);
  
 /* We need prime support */

 if (!(display->capabilities & WL_DRM_CAPABILITY_PRIME))
-  goto fail;
+  goto fail_registry;
  
 /* We don't need this anymore */

 wl_registry_destroy(registry);
  
 return display;
  
-fail:

+fail_registry:
 if (registry)
wl_registry_destroy(registry);
  
+fail:

 wsi_wl_display_destroy(wsi, display);
 return NULL;
  }



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Add RGBX8888 and RGBA8888 to EGL configure and reorder the list

2017-05-18 Thread Xu, Randy
> -Original Message-
> From: Rob Herring [mailto:r...@kernel.org]
> Sent: Friday, May 19, 2017 6:01 AM
> To: Emil Velikov 
> Cc: Chih-Wei Huang ; ML mesa-dev  d...@lists.freedesktop.org>; Xu, Randy 
> Subject: Re: [Mesa-dev] [PATCH] i965: Add RGBX and RGBA to EGL
> configure and reorder the list
> 
> On Thu, May 18, 2017 at 5:25 AM, Emil Velikov 
> wrote:
> > On 18 May 2017 at 05:10, Chih-Wei Huang 
> wrote:
> >> 2017-05-18 12:01 GMT+08:00 Xu, Randy :
> >>>
>  -Original Message-
>  From: Palli, Tapani
>  >
>  > It's just applied. Isn't it?
>  > Yesterday without this patch
>  > the color format is mismatch apparently.
> 
>  Yeah we do need this. TBH I don't quite understand why but will try
>  to figure it out. I remember we used to have a patch in
>  Surfaceflinger at one point because visual was hardcoded there and this
> might be related.
> 
>  // Tapani
> >>>
> >>> Sorry, that's for different issue, I mix it with RGB565 blending one.
> >>> This patch is required because some Android GLES test apps, like
> gl2_basic, need to create RGBA surface.
> >>
> >> Indeed it is.
> >>
> >> As Emil pointed out, the patch was merged before but reverted later
> >> since it broke desktop.
> >>
> >> So what's the current upstreaming plan?
> >>
> > No idea about a plan, but how you can fix it once and for all:
> >
> > Extend the loader extension(s) to have a get_caps() callback,
> > analogous to __DRIimageExtension::getCapabilities.
> > Then the DRI module will query [the loader] and advertise the
> > RGBX/RGBA visuals when possible.
> 
> Could we do something compile time instead to enable just for Android?
> Seems like a lot of complexity when the platform needing these pixel formats
> doesn't need the backwards compatibility. Or maybe there are other things
> needing this interface?

I also wish that, can we use macro, like "#if ANDROID" ?
It's not necessary to enable these two visuals for all platform, but it's 
really important for Android.

> 
> Rob
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH V2 2/3] mesa: add an env var to force cache fallback

2017-05-18 Thread Timothy Arceri
For the gallium state tracker a tgsi binary may have been evicted
from the cache to make space. In this case we would take the
fallback path and recompile/link the shader.

On i965 there are a number of reasons we can get to the program
upload stage and have neither IR nor a valid cached binary.
For example the binary may have been evicted from the cache or
we need a variant that wasn't previously cached.

This environment variable enables us to force the fallback path that
would be taken in these cases and makes it easier to debug these
otherwise hard to reproduce scenarios.

V2: drop the infinite loop workaround, it was only required due to
a bug in the following patch.
---
 docs/shading.html| 2 ++
 src/mesa/main/mtypes.h   | 1 +
 src/mesa/main/shaderapi.c| 2 ++
 src/mesa/state_tracker/st_shader_cache.c | 6 +-
 4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/docs/shading.html b/docs/shading.html
index 7e3d2e4..c789102 100644
--- a/docs/shading.html
+++ b/docs/shading.html
@@ -43,20 +43,22 @@ Contents
 The MESA_GLSL environment variable can be set to a comma-separated
 list of keywords to control some aspects of the GLSL compiler and shader
 execution.  These are generally used for debugging.
 
 
 dump - print GLSL shader code to stdout at link time
 log - log all GLSL shaders to files.
 The filenames will be "shader_X.vert" or "shader_X.frag" where X
 the shader ID.
 cache_info - print debug information about shader cache
+cache_fb - force cached shaders to be ignored and do a full
+recompile via the fallback path
 uniform - print message to stdout when glUniform is called
 nopvert - force vertex shaders to be a simple shader that just 
transforms
 the vertex position with ftransform() and passes through the color and
 texcoord[0] attributes.
 nopfrag - force fragment shader to be a simple shader that passes
 through the color attribute.
 useprog - log glUseProgram calls to stderr
 
 
 Example:  export MESA_GLSL=dump,nopt
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 28d3d948..941311b 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2970,20 +2970,21 @@ struct gl_shader_program
 
 #define GLSL_DUMP  0x1  /**< Dump shaders to stdout */
 #define GLSL_LOG   0x2  /**< Write shaders to files */
 #define GLSL_UNIFORMS  0x4  /**< Print glUniform calls */
 #define GLSL_NOP_VERT  0x8  /**< Force no-op vertex shaders */
 #define GLSL_NOP_FRAG 0x10  /**< Force no-op fragment shaders */
 #define GLSL_USE_PROG 0x20  /**< Log glUseProgram calls */
 #define GLSL_REPORT_ERRORS 0x40  /**< Print compilation errors */
 #define GLSL_DUMP_ON_ERROR 0x80 /**< Dump shaders to stderr on compile error */
 #define GLSL_CACHE_INFO 0x100 /**< Print debug information about shader cache 
*/
+#define GLSL_CACHE_FALLBACK 0x200 /**< Force shader cache fallback paths */
 
 
 /**
  * Context state for GLSL vertex/fragment shaders.
  * Extended to support pipeline object
  */
 struct gl_pipeline_object
 {
/** Name of the pipeline object as received from glGenProgramPipelines.
 * It would be 0 for shaders without separate shader objects.
diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
index 68fb3fa..325542e 100644
--- a/src/mesa/main/shaderapi.c
+++ b/src/mesa/main/shaderapi.c
@@ -70,20 +70,22 @@ _mesa_get_shader_flags(void)
GLbitfield flags = 0x0;
const char *env = getenv("MESA_GLSL");
 
if (env) {
   if (strstr(env, "dump_on_error"))
  flags |= GLSL_DUMP_ON_ERROR;
   else if (strstr(env, "dump"))
  flags |= GLSL_DUMP;
   if (strstr(env, "log"))
  flags |= GLSL_LOG;
+  if (strstr(env, "cache_fb"))
+ flags |= GLSL_CACHE_FALLBACK;
   if (strstr(env, "cache_info"))
  flags |= GLSL_CACHE_INFO;
   if (strstr(env, "nopvert"))
  flags |= GLSL_NOP_VERT;
   if (strstr(env, "nopfrag"))
  flags |= GLSL_NOP_FRAG;
   if (strstr(env, "uniform"))
  flags |= GLSL_UNIFORMS;
   if (strstr(env, "useprog"))
  flags |= GLSL_USE_PROG;
diff --git a/src/mesa/state_tracker/st_shader_cache.c 
b/src/mesa/state_tracker/st_shader_cache.c
index 175d69d..45438e5 100644
--- a/src/mesa/state_tracker/st_shader_cache.c
+++ b/src/mesa/state_tracker/st_shader_cache.c
@@ -218,22 +218,26 @@ st_load_tgsi_from_disk_cache(struct gl_context *ctx,
}
 
/* Now that we have created the sha1 keys that will be used for writting to
 * the tgsi cache fallback to the regular glsl to tgsi path if we didn't
 * load the GLSL IR from cache. We do this as glsl to tgsi can alter things
 * such as gl_program_parameter_list which holds things like uniforms.
 */
if (prog->data->LinkStatus != linking_skipped)
   return false;
 
-   struct st_context *st = st_context(ctx);
uint8_t *buffer = NULL;
+   if (ctx->_Shader->Flags & GLSL_CACHE_FALLBACK) {
+  goto 

[Mesa-dev] [PATCH V2 3/3] st/mesa: don't mark the program as in cache_fallback when there is cache miss

2017-05-18 Thread Timothy Arceri
When we fallback currently the gl_program objects are re-allocated.

This is likely to change when the i965 cache lands, but for now
this fixes a crash when using MESA_GLSL=cache_fb. This env var
simulates the fallback path taken when a tgsi cache item doesn't
exist due to being evicted previously or some kind of error.

Unlike i965 we are always falling back at link time so it's safe to
just re-allocate everything. We will be unnecessarily freeing and
re-allocate a bunch of things here but it's probably not a huge deal,
and can be changed when the i965 code lands.

Fixes: 0e9991f957e2 ("glsl: don't reference shader prog data during cache 
fallback")

---
 src/compiler/glsl/shader_cache.cpp   | 2 +-
 src/mesa/main/mtypes.h   | 2 ++
 src/mesa/state_tracker/st_shader_cache.c | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/compiler/glsl/shader_cache.cpp 
b/src/compiler/glsl/shader_cache.cpp
index 800d3a2..dd56501 100644
--- a/src/compiler/glsl/shader_cache.cpp
+++ b/src/compiler/glsl/shader_cache.cpp
@@ -1285,21 +1285,21 @@ bool
 shader_cache_read_program_metadata(struct gl_context *ctx,
struct gl_shader_program *prog)
 {
/* Fixed function programs generated by Mesa are not cached. So don't
 * try to read metadata for them from the cache.
 */
if (prog->Name == 0)
   return false;
 
struct disk_cache *cache = ctx->Cache;
-   if (!cache || prog->data->cache_fallback)
+   if (!cache || prog->data->cache_fallback || prog->data->skip_cache)
   return false;
 
/* Include bindings when creating sha1. These bindings change the resulting
 * binary so they are just as important as the shader source.
 */
char *buf = ralloc_strdup(NULL, "vb: ");
prog->AttributeBindings->iterate(create_binding_str, );
ralloc_strcat(, "fb: ");
prog->FragDataBindings->iterate(create_binding_str, );
ralloc_strcat(, "fbi: ");
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 941311b..9ca4bb8 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2807,20 +2807,22 @@ struct gl_shader_program_data
 
struct gl_active_atomic_buffer *AtomicBuffers;
unsigned NumAtomicBuffers;
 
/* Shader cache variables used during restore */
unsigned NumUniformDataSlots;
union gl_constant_value *UniformDataSlots;
 
bool cache_fallback;
 
+   bool skip_cache;
+
/** List of all active resources after linking. */
struct gl_program_resource *ProgramResourceList;
unsigned NumProgramResourceList;
 
enum gl_link_status LinkStatus;   /**< GL_LINK_STATUS */
GLboolean Validated;
GLchar *InfoLog;
 
unsigned Version;   /**< GLSL version used for linking */
 
diff --git a/src/mesa/state_tracker/st_shader_cache.c 
b/src/mesa/state_tracker/st_shader_cache.c
index 45438e5..31c3430 100644
--- a/src/mesa/state_tracker/st_shader_cache.c
+++ b/src/mesa/state_tracker/st_shader_cache.c
@@ -384,15 +384,15 @@ st_load_tgsi_from_disk_cache(struct gl_context *ctx,
 fallback_recompile:
free(buffer);
 
if (ctx->_Shader->Flags & GLSL_CACHE_INFO)
   fprintf(stderr, "TGSI cache falling back to recompile.\n");
 
for (unsigned i = 0; i < prog->NumShaders; i++) {
   _mesa_glsl_compile_shader(ctx, prog->Shaders[i], false, false, true);
}
 
-   prog->data->cache_fallback = true;
+   prog->data->skip_cache = true;
_mesa_glsl_link_shader(ctx, prog);
 
return true;
 }
-- 
2.9.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 00/15] ac/surface, radv: use shared surface computation code

2017-05-18 Thread Dave Airlie
On 18 May 2017 at 19:53, Nicolai Hähnle  wrote:
> Hi all,
>
> This is v2 of the second half of a series I sent earlier, to move
> radv to a shared code base for surface computations. I integrated
> patches by Dave that should fix radv issues. The series is also at
> https://cgit.freedesktop.org/~nh/mesa/log/?h=ac_surface

Thanks for doing this, it looks good.

I'm working on a CI, but I think you've taken care of all the
regressions I found in this,
and I'm happy enough to find any others.

Reviewed-by: Dave Airlie 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] i965: Enable ASTC HDR for Broxton

2017-05-18 Thread Nanley Chery
This platform passes the following GLES3 tests:
ES3-CTS.functional.texture.compressed.astc.endpoint_value_hdr_cem_*

Signed-off-by: Nanley Chery 
---
 src/mesa/drivers/dri/i965/intel_extensions.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
b/src/mesa/drivers/dri/i965/intel_extensions.c
index 26c074638e..e76f61de1d 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -285,6 +285,9 @@ intelInitExtensions(struct gl_context *ctx)
   ctx->Extensions.ARB_post_depth_coverage = true;
}
 
+   if (brw->is_broxton)
+  ctx->Extensions.KHR_texture_compression_astc_hdr = true;
+
if (brw->gen >= 6)
   ctx->Extensions.INTEL_performance_query = true;
 
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] intel/isl: Add ASTC HDR to format lists and helpers

2017-05-18 Thread Nanley Chery
Signed-off-by: Nanley Chery 
---
 src/intel/isl/isl.h | 14 ++
 src/intel/isl/isl_format.c  | 32 ++--
 src/intel/isl/isl_format_layout.csv | 14 ++
 3 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/src/intel/isl/isl.h b/src/intel/isl/isl.h
index 7778551579..8131f45ae4 100644
--- a/src/intel/isl/isl.h
+++ b/src/intel/isl/isl.h
@@ -353,6 +353,20 @@ enum isl_format {
ISL_FORMAT_ASTC_LDR_2D_10X10_FLT16 =630,
ISL_FORMAT_ASTC_LDR_2D_12X10_FLT16 =638,
ISL_FORMAT_ASTC_LDR_2D_12X12_FLT16 =639,
+   ISL_FORMAT_ASTC_HDR_2D_4X4_FLT16 =  832,
+   ISL_FORMAT_ASTC_HDR_2D_5X4_FLT16 =  840,
+   ISL_FORMAT_ASTC_HDR_2D_5X5_FLT16 =  841,
+   ISL_FORMAT_ASTC_HDR_2D_6X5_FLT16 =  849,
+   ISL_FORMAT_ASTC_HDR_2D_6X6_FLT16 =  850,
+   ISL_FORMAT_ASTC_HDR_2D_8X5_FLT16 =  865,
+   ISL_FORMAT_ASTC_HDR_2D_8X6_FLT16 =  866,
+   ISL_FORMAT_ASTC_HDR_2D_8X8_FLT16 =  868,
+   ISL_FORMAT_ASTC_HDR_2D_10X5_FLT16 = 881,
+   ISL_FORMAT_ASTC_HDR_2D_10X6_FLT16 = 882,
+   ISL_FORMAT_ASTC_HDR_2D_10X8_FLT16 = 884,
+   ISL_FORMAT_ASTC_HDR_2D_10X10_FLT16 =886,
+   ISL_FORMAT_ASTC_HDR_2D_12X10_FLT16 =894,
+   ISL_FORMAT_ASTC_HDR_2D_12X12_FLT16 =895,
 
/* The formats that follow are internal to ISL and as such don't have an
 * explicit number.  We'll just let the C compiler assign it for us.  Any
diff --git a/src/intel/isl/isl_format.c b/src/intel/isl/isl_format.c
index 165c7e5b6b..e6d2a438d3 100644
--- a/src/intel/isl/isl_format.c
+++ b/src/intel/isl/isl_format.c
@@ -341,6 +341,20 @@ static const struct surface_format_info format_info[] = {
SF(90, 90,  x,  x,  x,  x,  x,  x,  x,x,  x,  x,   
ASTC_LDR_2D_10X10_U8SRGB)
SF(90, 90,  x,  x,  x,  x,  x,  x,  x,x,  x,  x,   
ASTC_LDR_2D_12X10_U8SRGB)
SF(90, 90,  x,  x,  x,  x,  x,  x,  x,x,  x,  x,   
ASTC_LDR_2D_12X12_U8SRGB)
+   SF(100, 100,  x,  x,  x,  x,  x,  x,  x,x,  x,  x,   
ASTC_HDR_2D_4X4_FLT16)
+   SF(100, 100,  x,  x,  x,  x,  x,  x,  x,x,  x,  x,   
ASTC_HDR_2D_5X4_FLT16)
+   SF(100, 100,  x,  x,  x,  x,  x,  x,  x,x,  x,  x,   
ASTC_HDR_2D_5X5_FLT16)
+   SF(100, 100,  x,  x,  x,  x,  x,  x,  x,x,  x,  x,   
ASTC_HDR_2D_6X5_FLT16)
+   SF(100, 100,  x,  x,  x,  x,  x,  x,  x,x,  x,  x,   
ASTC_HDR_2D_6X6_FLT16)
+   SF(100, 100,  x,  x,  x,  x,  x,  x,  x,x,  x,  x,   
ASTC_HDR_2D_8X5_FLT16)
+   SF(100, 100,  x,  x,  x,  x,  x,  x,  x,x,  x,  x,   
ASTC_HDR_2D_8X6_FLT16)
+   SF(100, 100,  x,  x,  x,  x,  x,  x,  x,x,  x,  x,   
ASTC_HDR_2D_8X8_FLT16)
+   SF(100, 100,  x,  x,  x,  x,  x,  x,  x,x,  x,  x,   
ASTC_HDR_2D_10X5_FLT16)
+   SF(100, 100,  x,  x,  x,  x,  x,  x,  x,x,  x,  x,   
ASTC_HDR_2D_10X6_FLT16)
+   SF(100, 100,  x,  x,  x,  x,  x,  x,  x,x,  x,  x,   
ASTC_HDR_2D_10X8_FLT16)
+   SF(100, 100,  x,  x,  x,  x,  x,  x,  x,x,  x,  x,   
ASTC_HDR_2D_10X10_FLT16)
+   SF(100, 100,  x,  x,  x,  x,  x,  x,  x,x,  x,  x,   
ASTC_HDR_2D_12X10_FLT16)
+   SF(100, 100,  x,  x,  x,  x,  x,  x,  x,x,  x,  x,   
ASTC_HDR_2D_12X12_FLT16)
 };
 #undef x
 #undef Y
@@ -387,10 +401,17 @@ isl_format_supports_sampling(const struct gen_device_info 
*devinfo,
  return true;
} else if (devinfo->is_cherryview) {
   const struct isl_format_layout *fmtl = isl_format_get_layout(format);
-  /* Support for ASTC exists on Cherry View even though big-core
+  /* Support for ASTC LDR exists on Cherry View even though big-core
* GPUs didn't get it until Skylake.
*/
   if (fmtl->txc == ISL_TXC_ASTC)
+ return format < ISL_FORMAT_ASTC_HDR_2D_4X4_FLT16;
+   } else if (devinfo->is_broxton) {
+  const struct isl_format_layout *fmtl = isl_format_get_layout(format);
+  /* Support for ASTC HDR exists on Broxton even though big-core
+   * GPUs didn't get it until Cannonlake.
+   */
+  if (fmtl->txc == ISL_TXC_ASTC)
  return true;
}
 
@@ -413,10 +434,17 @@ isl_format_supports_filtering(const struct 
gen_device_info *devinfo,
  return true;
} else if (devinfo->is_cherryview) {
   const struct isl_format_layout *fmtl = isl_format_get_layout(format);
-  /* Support for ASTC exists on Cherry View even though big-core
+  /* Support for ASTC LDR exists on Cherry View even though big-core
* GPUs didn't get it until Skylake.
*/
   if (fmtl->txc == ISL_TXC_ASTC)
+ return format < ISL_FORMAT_ASTC_HDR_2D_4X4_FLT16;
+   } else if (devinfo->is_broxton) {
+  const struct isl_format_layout *fmtl = 

[Mesa-dev] [PATCH 4/4] vc4: Use the RA callback to improve post-RA register selection's choices.

2017-05-18 Thread Eric Anholt
We simply pick r4 if available (anything else would force a MOV), then
round-robin through accumulators (avoids physical regfile RAW delay
slots), then round-robin through the physical regfile.

The effect on instruction count is pretty impressive:

total instructions in shared programs: 76563 -> 74526 (-2.66%)
instructions in affected programs: 66463 -> 64426 (-3.06%)

and we could probably do better with a little heuristic of "if we're going
to choose a physical reg, and all other operands of instructions using
this as a src have the same physical regfile, then use the other regfile".
---
 src/gallium/drivers/vc4/vc4_register_allocate.c | 53 -
 1 file changed, 52 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/vc4/vc4_register_allocate.c 
b/src/gallium/drivers/vc4/vc4_register_allocate.c
index 506fdb593a33..1d860ea058ee 100644
--- a/src/gallium/drivers/vc4/vc4_register_allocate.c
+++ b/src/gallium/drivers/vc4/vc4_register_allocate.c
@@ -101,7 +101,9 @@ static const struct qpu_reg vc4_regs[] = {
 QPU_R(B, 31),
 };
 #define ACC_INDEX 0
-#define AB_INDEX  (ACC_INDEX + 5)
+#define ACC_COUNT 5
+#define AB_INDEX  (ACC_INDEX + ACC_COUNT)
+#define AB_COUNT  64
 
 static void
 vc4_alloc_reg_set(struct vc4_context *vc4)
@@ -200,6 +202,49 @@ node_to_temp_priority(const void *in_a, const void *in_b)
 #define CLASS_BIT_R4   (1 << 2)
 #define CLASS_BIT_R0_R3(1 << 4)
 
+struct vc4_ra_select_callback_data {
+uint32_t next_acc;
+uint32_t next_ab;
+};
+
+static unsigned int
+vc4_ra_select_callback(struct ra_graph *g, BITSET_WORD *regs, void *data)
+{
+struct vc4_ra_select_callback_data *vc4_ra = data;
+
+/* If r4 is available, always choose it -- few other things can go
+ * there, and choosing anything else means inserting a mov.
+ */
+if (BITSET_TEST(regs, ACC_INDEX + 4))
+return ACC_INDEX + 4;
+
+/* Choose an accumulator if possible (no delay between write and
+ * read), but round-robin through them to give post-RA instruction
+ * selection more options.
+ */
+for (int i = 0; i < ACC_COUNT; i++) {
+int acc_off = (vc4_ra->next_acc + i) % ACC_COUNT;
+int acc = ACC_INDEX + acc_off;
+
+if (BITSET_TEST(regs, acc)) {
+vc4_ra->next_acc = acc_off + 1;
+return acc;
+}
+}
+
+for (int i = 0; i < AB_COUNT; i++) {
+int ab_off = (vc4_ra->next_ab + i) % AB_COUNT;
+int ab = AB_INDEX + ab_off;
+
+if (BITSET_TEST(regs, ab)) {
+vc4_ra->next_ab = ab_off + 1;
+return ab;
+}
+}
+
+unreachable("RA must pass us at least one possible reg.");
+}
+
 /**
  * Returns a mapping from QFILE_TEMP indices to struct qpu_regs.
  *
@@ -213,6 +258,10 @@ vc4_register_allocate(struct vc4_context *vc4, struct 
vc4_compile *c)
 uint8_t class_bits[c->num_temps];
 struct qpu_reg *temp_registers = calloc(c->num_temps,
 sizeof(*temp_registers));
+struct vc4_ra_select_callback_data callback_data = {
+.next_acc = 0,
+.next_ab = 0,
+};
 
 /* If things aren't ever written (undefined values), just read from
  * r0.
@@ -228,6 +277,8 @@ vc4_register_allocate(struct vc4_context *vc4, struct 
vc4_compile *c)
 /* Compute the live ranges so we can figure out interference. */
 qir_calculate_live_intervals(c);
 
+ra_set_select_reg_callback(g, vc4_ra_select_callback, _data);
+
 for (uint32_t i = 0; i < c->num_temps; i++) {
 map[i].temp = i;
 map[i].priority = c->temp_end[i] - c->temp_start[i];
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/4] ra: Don't put a node in its own adjacency set.

2017-05-18 Thread Eric Anholt
All the paths looping over adjacency had guards against considering
themselves (the non-obvious one was ra_any_neighbors_conflict(), which has
in_stack set).
---
 src/util/register_allocate.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/src/util/register_allocate.c b/src/util/register_allocate.c
index 35ef9a714cc8..0e2dd4a376cd 100644
--- a/src/util/register_allocate.c
+++ b/src/util/register_allocate.c
@@ -392,11 +392,11 @@ ra_add_node_adjacency(struct ra_graph *g, unsigned int 
n1, unsigned int n2)
 {
BITSET_SET(g->nodes[n1].adjacency, n2);
 
-   if (n1 != n2) {
-  int n1_class = g->nodes[n1].class;
-  int n2_class = g->nodes[n2].class;
-  g->nodes[n1].q_total += g->regs->classes[n1_class]->q[n2_class];
-   }
+   assert(n1 != n2);
+
+   int n1_class = g->nodes[n1].class;
+   int n2_class = g->nodes[n2].class;
+   g->nodes[n1].q_total += g->regs->classes[n1_class]->q[n2_class];
 
if (g->nodes[n1].adjacency_count >=
g->nodes[n1].adjacency_list_size) {
@@ -433,7 +433,6 @@ ra_alloc_interference_graph(struct ra_regs *regs, unsigned 
int count)
   g->nodes[i].adjacency_count = 0;
   g->nodes[i].q_total = 0;
 
-  ra_add_node_adjacency(g, i, i);
   g->nodes[i].reg = NO_REG;
}
 
@@ -451,7 +450,7 @@ void
 ra_add_node_interference(struct ra_graph *g,
  unsigned int n1, unsigned int n2)
 {
-   if (!BITSET_TEST(g->nodes[n1].adjacency, n2)) {
+   if (n1 != n2 && !BITSET_TEST(g->nodes[n1].adjacency, n2)) {
   ra_add_node_adjacency(g, n1, n2);
   ra_add_node_adjacency(g, n2, n1);
}
@@ -475,7 +474,7 @@ decrement_q(struct ra_graph *g, unsigned int n)
   unsigned int n2 = g->nodes[n].adjacency_list[i];
   unsigned int n2_class = g->nodes[n2].class;
 
-  if (n != n2 && !g->nodes[n2].in_stack) {
+  if (!g->nodes[n2].in_stack) {
  assert(g->nodes[n2].q_total >= 
g->regs->classes[n2_class]->q[n_class]);
  g->nodes[n2].q_total -= g->regs->classes[n2_class]->q[n_class];
   }
@@ -661,11 +660,9 @@ ra_get_spill_benefit(struct ra_graph *g, unsigned int n)
 */
for (j = 0; j < g->nodes[n].adjacency_count; j++) {
   unsigned int n2 = g->nodes[n].adjacency_list[j];
-  if (n != n2) {
-unsigned int n2_class = g->nodes[n2].class;
-benefit += ((float)g->regs->classes[n_class]->q[n2_class] /
-g->regs->classes[n_class]->p);
-  }
+  unsigned int n2_class = g->nodes[n2].class;
+  benefit += ((float)g->regs->classes[n_class]->q[n2_class] /
+  g->regs->classes[n_class]->p);
}
 
return benefit;
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/4] ra: Add a callback for selecting a register from what's available.

2017-05-18 Thread Eric Anholt
VC4 has had a tension, similar to pre-Sandybridge Intel, where we want to
use low-numbered registers (more parallelism on Intel, fewer delay slots
on vc4), but in order to give instruction scheduling the most freedom to
avoid delays we want to round-robin between registers of the same cost.
Our two heuristics so far have chosen one end or the other of that
tradeoff.

The callback, instead, hands the driver the set of registers that are
available, and the driver gets to make its own choice.  This will be used
in vc4 to round-robin between registers of the same cost, and might be
used in the future for improving bank selection.
---
 src/util/register_allocate.c | 90 +---
 src/util/register_allocate.h |  6 +++
 2 files changed, 82 insertions(+), 14 deletions(-)

diff --git a/src/util/register_allocate.c b/src/util/register_allocate.c
index 0e2dd4a376cd..b06a61f24a37 100644
--- a/src/util/register_allocate.c
+++ b/src/util/register_allocate.c
@@ -174,6 +174,10 @@ struct ra_graph {
 * stack.
 */
unsigned int stack_optimistic_start;
+
+   unsigned int (*select_reg_callback)(struct ra_graph *g, BITSET_WORD *regs,
+   void *data);
+   void *select_reg_callback_data;
 };
 
 /**
@@ -439,6 +443,16 @@ ra_alloc_interference_graph(struct ra_regs *regs, unsigned 
int count)
return g;
 }
 
+void ra_set_select_reg_callback(struct ra_graph *g,
+unsigned int (*callback)(struct ra_graph *g,
+ BITSET_WORD *regs,
+ void *data),
+void *data)
+{
+   g->select_reg_callback = callback;
+   g->select_reg_callback_data = data;
+}
+
 void
 ra_set_node_class(struct ra_graph *g,
   unsigned int n, unsigned int class)
@@ -555,6 +569,41 @@ ra_any_neighbors_conflict(struct ra_graph *g, unsigned int 
n, unsigned int r)
return false;
 }
 
+/* Computes a bitfield of what regs are available for a given register
+ * selection.
+ *
+ * This lets drivers implement a more complicated policy than our simple first
+ * or round robin policies (which don't require knowing the whole bitset)
+ */
+static bool
+ra_compute_available_regs(struct ra_graph *g, unsigned int n, BITSET_WORD 
*regs)
+{
+   struct ra_class *c = g->regs->classes[g->nodes[n].class];
+
+   /* Populate with the set of regs that are in the node's class. */
+   memcpy(regs, c->regs, BITSET_WORDS(g->regs->count) * sizeof(BITSET_WORD));
+
+   /* Remove any regs that conflict with nodes that we're adjacent to and have
+* already colored.
+*/
+   for (int i = 0; i < g->nodes[n].adjacency_count; i++) {
+  unsigned int n2 = g->nodes[n].adjacency_list[i];
+  unsigned int r = g->nodes[n2].reg;
+
+  if (!g->nodes[n2].in_stack) {
+ for (int j = 0; j < BITSET_WORDS(g->regs->count); j++)
+regs[j] &= ~g->regs->regs[r].conflicts[j];
+  }
+   }
+
+   for (int i = 0; i < BITSET_WORDS(g->regs->count); i++) {
+  if (regs[i])
+ return true;
+   }
+
+   return false;
+}
+
 /**
  * Pops nodes from the stack back into the graph, coloring them with
  * registers as they go.
@@ -566,6 +615,10 @@ static bool
 ra_select(struct ra_graph *g)
 {
int start_search_reg = 0;
+   BITSET_WORD *select_regs = NULL;
+
+   if (g->select_reg_callback)
+  select_regs = malloc(BITSET_WORDS(g->regs->count) * sizeof(BITSET_WORD));
 
while (g->stack_count != 0) {
   unsigned int ri;
@@ -573,25 +626,34 @@ ra_select(struct ra_graph *g)
   int n = g->stack[g->stack_count - 1];
   struct ra_class *c = g->regs->classes[g->nodes[n].class];
 
-  /* Find the lowest-numbered reg which is not used by a member
-   * of the graph adjacent to us.
-   */
-  for (ri = 0; ri < g->regs->count; ri++) {
- r = (start_search_reg + ri) % g->regs->count;
- if (!reg_belongs_to_class(r, c))
-   continue;
-
- if (!ra_any_neighbors_conflict(g, n, r))
-break;
-  }
-
   /* set this to false even if we return here so that
* ra_get_best_spill_node() considers this node later.
*/
   g->nodes[n].in_stack = false;
 
-  if (ri == g->regs->count)
-return false;
+  if (g->select_reg_callback) {
+ if (!ra_compute_available_regs(g, n, select_regs)) {
+free(select_regs);
+return false;
+ }
+
+ r = g->select_reg_callback(g, select_regs, 
g->select_reg_callback_data);
+  } else {
+ /* Find the lowest-numbered reg which is not used by a member
+  * of the graph adjacent to us.
+  */
+ for (ri = 0; ri < g->regs->count; ri++) {
+r = (start_search_reg + ri) % g->regs->count;
+if (!reg_belongs_to_class(r, c))
+   continue;
+
+if (!ra_any_neighbors_conflict(g, n, r))
+   

[Mesa-dev] [PATCH 1/4] ra: Pull the body of a loop out to a helper function.

2017-05-18 Thread Eric Anholt
I was going to indent this code another level, and decided it would be
easier to read as a helper.
---
 src/util/register_allocate.c | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/src/util/register_allocate.c b/src/util/register_allocate.c
index 8af93c0406c5..35ef9a714cc8 100644
--- a/src/util/register_allocate.c
+++ b/src/util/register_allocate.c
@@ -539,6 +539,23 @@ ra_simplify(struct ra_graph *g)
g->stack_optimistic_start = stack_optimistic_start;
 }
 
+static bool
+ra_any_neighbors_conflict(struct ra_graph *g, unsigned int n, unsigned int r)
+{
+   unsigned int i;
+
+   for (i = 0; i < g->nodes[n].adjacency_count; i++) {
+  unsigned int n2 = g->nodes[n].adjacency_list[i];
+
+  if (!g->nodes[n2].in_stack &&
+  BITSET_TEST(g->regs->regs[r].conflicts, g->nodes[n2].reg)) {
+ return true;
+  }
+   }
+
+   return false;
+}
+
 /**
  * Pops nodes from the stack back into the graph, coloring them with
  * registers as they go.
@@ -552,7 +569,6 @@ ra_select(struct ra_graph *g)
int start_search_reg = 0;
 
while (g->stack_count != 0) {
-  unsigned int i;
   unsigned int ri;
   unsigned int r = -1;
   int n = g->stack[g->stack_count - 1];
@@ -566,17 +582,8 @@ ra_select(struct ra_graph *g)
  if (!reg_belongs_to_class(r, c))
continue;
 
-/* Check if any of our neighbors conflict with this register choice. */
-for (i = 0; i < g->nodes[n].adjacency_count; i++) {
-   unsigned int n2 = g->nodes[n].adjacency_list[i];
-
-   if (!g->nodes[n2].in_stack &&
-   BITSET_TEST(g->regs->regs[r].conflicts, g->nodes[n2].reg)) {
-  break;
-   }
-}
-if (i == g->nodes[n].adjacency_count)
-   break;
+ if (!ra_any_neighbors_conflict(g, n, r))
+break;
   }
 
   /* set this to false even if we return here so that
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] util: Change the pointer hashing function

2017-05-18 Thread Eric Anholt
Thomas Helland  writes:

> Use our knowledge that pointers are at least 4 byte aligned to remove
> the useless digits. Then shift by 6, 10, and 14 bits and add this to
> the original pointer, effectively folding in the entropy of the higher
> bits of the pointer into a 4-bit section. Stopping at 14 means we can
> add the entropy from 18 bits, or at least a 600Kbyte section of memory.
> Assuming that ralloc allocates from a linearly allocated heap less than
> this we can make a very efficient pointer hashing function for our usecase.

The fnv1 on the pointer was basically a stand-in until someone did the
kind of analysis you've done here.  Thanks!

Reviewed-by: Eric Anholt 


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 1/2] anv/formats: Update the three-channel BC1 mappings

2017-05-18 Thread Kenneth Graunke
Both patches are:
Reviewed-by: Kenneth Graunke 


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Add RGBX8888 and RGBA8888 to EGL configure and reorder the list

2017-05-18 Thread Rob Herring
On Thu, May 18, 2017 at 5:25 AM, Emil Velikov  wrote:
> On 18 May 2017 at 05:10, Chih-Wei Huang  wrote:
>> 2017-05-18 12:01 GMT+08:00 Xu, Randy :
>>>
 -Original Message-
 From: Palli, Tapani
 >
 > It's just applied. Isn't it?
 > Yesterday without this patch
 > the color format is mismatch apparently.

 Yeah we do need this. TBH I don't quite understand why but will try to 
 figure
 it out. I remember we used to have a patch in Surfaceflinger at one point
 because visual was hardcoded there and this might be related.

 // Tapani
>>>
>>> Sorry, that's for different issue, I mix it with RGB565 blending one.
>>> This patch is required because some Android GLES test apps, like gl2_basic, 
>>> need to create RGBA surface.
>>
>> Indeed it is.
>>
>> As Emil pointed out, the patch was merged before
>> but reverted later since it broke desktop.
>>
>> So what's the current upstreaming plan?
>>
> No idea about a plan, but how you can fix it once and for all:
>
> Extend the loader extension(s) to have a get_caps() callback,
> analogous to __DRIimageExtension::getCapabilities.
> Then the DRI module will query [the loader] and advertise the
> RGBX/RGBA visuals when possible.

Could we do something compile time instead to enable just for Android?
Seems like a lot of complexity when the platform needing these pixel
formats doesn't need the backwards compatibility. Or maybe there are
other things needing this interface?

Rob
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] main: Use _mesa_HashLock/UnlockMutex consistently

2017-05-18 Thread Timothy Arceri

On 19/05/17 05:01, Thomas Helland wrote:

This is shorter and easier on the eyes. At the same time this
also ensures that we are always asserting that the table pointer
is not NULL. Currently that was not done for all situations.


Can we also have another patch that moves _mesa_HashUnlockMutex and 
_mesa_HashLockMutex to the header and inlines them?


With that I'd give this a:

Reviewed-by: Timothy Arceri 


---
  src/mesa/main/hash.c | 23 ++-
  1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/src/mesa/main/hash.c b/src/mesa/main/hash.c
index 5b9132a311..e352a46aa4 100644
--- a/src/mesa/main/hash.c
+++ b/src/mesa/main/hash.c
@@ -205,10 +205,9 @@ void *
  _mesa_HashLookup(struct _mesa_HashTable *table, GLuint key)
  {
 void *res;
-   assert(table);
-   mtx_lock(>Mutex);
+   _mesa_HashLockMutex(table);
 res = _mesa_HashLookup_unlocked(table, key);
-   mtx_unlock(>Mutex);
+   _mesa_HashUnlockMutex(table);
 return res;
  }
  
@@ -315,10 +314,9 @@ _mesa_HashInsertLocked(struct _mesa_HashTable *table, GLuint key, void *data)

  void
  _mesa_HashInsert(struct _mesa_HashTable *table, GLuint key, void *data)
  {
-   assert(table);
-   mtx_lock(>Mutex);
+   _mesa_HashLockMutex(table);
 _mesa_HashInsert_unlocked(table, key, data);
-   mtx_unlock(>Mutex);
+   _mesa_HashUnlockMutex(table);
  }
  
  
@@ -364,9 +362,9 @@ _mesa_HashRemoveLocked(struct _mesa_HashTable *table, GLuint key)

  void
  _mesa_HashRemove(struct _mesa_HashTable *table, GLuint key)
  {
-   mtx_lock(>Mutex);
+   _mesa_HashLockMutex(table);
 _mesa_HashRemove_unlocked(table, key);
-   mtx_unlock(>Mutex);
+   _mesa_HashUnlockMutex(table);
  }
  
  /**

@@ -385,9 +383,8 @@ _mesa_HashDeleteAll(struct _mesa_HashTable *table,
  {
 struct hash_entry *entry;
  
-   assert(table);

 assert(callback);
-   mtx_lock(>Mutex);
+   _mesa_HashLockMutex(table);
 table->InDeleteAll = GL_TRUE;
 hash_table_foreach(table->ht, entry) {
callback((uintptr_t)entry->key, entry->data, userData);
@@ -398,7 +395,7 @@ _mesa_HashDeleteAll(struct _mesa_HashTable *table,
table->deleted_key_data = NULL;
 }
 table->InDeleteAll = GL_FALSE;
-   mtx_unlock(>Mutex);
+   _mesa_HashUnlockMutex(table);
  }
  
  
@@ -434,9 +431,9 @@ _mesa_HashWalk(const struct _mesa_HashTable *table,

 /* cast-away const */
 struct _mesa_HashTable *table2 = (struct _mesa_HashTable *) table;
  
-   mtx_lock(>Mutex);

+   _mesa_HashLockMutex(table2);
 hash_walk_unlocked(table, callback, userData);
-   mtx_unlock(>Mutex);
+   _mesa_HashUnlockMutex(table2);
  }
  
  void



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] anv/formats: Update the BC1_RGB mappings

2017-05-18 Thread Jason Ekstrand
rb

On Thu, May 11, 2017 at 4:46 PM, Nanley Chery  wrote:

> The DXT1_RGB* format does not provide the correct behavior for Vulkan in
> the case where color_0 <= color_1. BC1_RGB_UNORM with a alpha set to 1
> provides the behavior which matches the spec.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100925
> Cc: 
> Signed-off-by: Nanley Chery 
> ---
>  src/intel/vulkan/anv_formats.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_formats.c b/src/intel/vulkan/anv_
> formats.c
> index c39cec3800..25801e8b06 100644
> --- a/src/intel/vulkan/anv_formats.c
> +++ b/src/intel/vulkan/anv_formats.c
> @@ -179,8 +179,8 @@ static const struct anv_format anv_formats[] = {
> fmt(VK_FORMAT_D24_UNORM_S8_UINT,   ISL_FORMAT_R24_UNORM_X8_
> TYPELESS),
> fmt(VK_FORMAT_D32_SFLOAT_S8_UINT,  ISL_FORMAT_R32_FLOAT),
>
> -   fmt(VK_FORMAT_BC1_RGB_UNORM_BLOCK, ISL_FORMAT_DXT1_RGB),
> -   fmt(VK_FORMAT_BC1_RGB_SRGB_BLOCK,  ISL_FORMAT_DXT1_RGB_SRGB),
> +   swiz_fmt(VK_FORMAT_BC1_RGB_UNORM_BLOCK, ISL_FORMAT_BC1_UNORM,
> RGB1),
> +   swiz_fmt(VK_FORMAT_BC1_RGB_SRGB_BLOCK,
> ISL_FORMAT_BC1_UNORM_SRGB, RGB1),
> fmt(VK_FORMAT_BC1_RGBA_UNORM_BLOCK,ISL_FORMAT_BC1_UNORM),
> fmt(VK_FORMAT_BC1_RGBA_SRGB_BLOCK, ISL_FORMAT_BC1_UNORM_SRGB),
> fmt(VK_FORMAT_BC2_UNORM_BLOCK, ISL_FORMAT_BC2_UNORM),
> --
> 2.12.2
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeonsi/gfx9: compile shaders with +xnack

2017-05-18 Thread Marek Olšák
On Thu, May 18, 2017 at 11:23 PM, Matt Arsenault  wrote:
>
>> On May 18, 2017, at 22:46, Marek Olšák  wrote:
>>
>> From: Marek Olšák 
>>
>> so that LLVM doesn't allocate SGPRs where XNACK is.
>>
>> Cc: 17.1 
>
> You shouldn’t be explicitly enabling xnack. This sounds like a workaround for 
> a backend bug, and this has other consequences than changing the reserved 
> registers. Do you have an example of where this is a problem?

What are the other consequences?

We don't want XNACK on Carrizo/Stoney, because XNACK is always
disabled on the graphics ring. It's only enabled on compute rings,
which we don't use. GFX9 enables XNACK on all rings at the moment.

XNACK AKA instruction replay is a per-VMID setting in the kernel
driver and it might change in the future. It's not a fixed property of
the GPU architecture.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] travis: Disable scons tests, since the gallium unit tests fail.

2017-05-18 Thread Eric Anholt
Every push a developer does to a travis-enabled github comes back as fail,
because the unit tests are already failing:

Testing util_format_dxt5_rgba_pack_rgba_float ...

FAILED: 40 bd 48 90 20 01 12 20 53 76 ab 32 00 10 15 50 obtained
f8 11 c5 0c 9a 73 b4 9c f6 8f ab 32 2a 9a 95 5a expected

Testing util_format_dxt5_rgba_unpack_rgba_float ...

FAILED: {0.549020, 1.00, 0.709804, 0.972549},
{0.192157, 0.33, 0.352941, 0.972549},
{0.549020, 1.00, 0.709804, 0.972549},
{0.549020, 1.00, 0.709804, 0.07},

Disable them from CI until someone cares to fix them.
---
 .travis.yml | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index 5e060d0335cb..20a97c013913 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -208,8 +208,9 @@ matrix:
 - SCONSFLAGS="-j4"
 # Explicitly disable.
 - SCONS_TARGET="llvm=0"
-# Keep it symmetrical to the make build.
-- SCONS_CHECK_COMMAND="scons llvm=0 check"
+# Disable scons check until the gallium unit tests pass.
+#- SCONS_CHECK_COMMAND="scons llvm=0 check"
+- SCONS_CHECK_COMMAND="true"
   addons:
 apt:
   packages:
@@ -225,8 +226,9 @@ matrix:
 - BUILD=scons
 - SCONSFLAGS="-j4"
 - SCONS_TARGET="llvm=1"
-# Keep it symmetrical to the make build.
-- SCONS_CHECK_COMMAND="scons llvm=1 check"
+# Disable scons check until the gallium unit tests pass.
+# - SCONS_CHECK_COMMAND="scons llvm=1 check"
+- SCONS_CHECK_COMMAND="true"
 - LLVM_VERSION=3.3
 - LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
   addons:
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] util: Change the pointer hashing function

2017-05-18 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Thu, May 18, 2017 at 9:39 PM, Thomas Helland
 wrote:
> Use our knowledge that pointers are at least 4 byte aligned to remove
> the useless digits. Then shift by 6, 10, and 14 bits and add this to
> the original pointer, effectively folding in the entropy of the higher
> bits of the pointer into a 4-bit section. Stopping at 14 means we can
> add the entropy from 18 bits, or at least a 600Kbyte section of memory.
> Assuming that ralloc allocates from a linearly allocated heap less than
> this we can make a very efficient pointer hashing function for our usecase.
>
> The 4 bit increment on the shifts is chosen rather arbitrarily; if we
> had chosen a 3 bit increment we would need to add another xor to
> cover a decently sized memorypool. Increasing it to 5 bits would
> spread our entropy more, possibly hurting us with more collisions on
> hash tables of size less than 32. With a hash table of size 16 there
> are a max of 11 entries, and we can assume that with such a small table
> collisions are not that painfull.
>
> This allows us to hash the whole 32 or 64 bit pointer at once,
> instead of running FNV1a, looping through each byte and doing
> increments, decrements, muls, and xors on every byte. This cuts
> _mesa_hash_data from 1.5 % on profiles, to making _mesa_hash_pointer
> show up with a 0.09% share. Collisions on insertion actually seems to be
> ever so slightly lower with this hash function, as found by printing
> a loop counter and sorting the data.
>
> perf stat shows a 1.5% reduction in instruction count,
> and a 5% reduction in stalled cycles, on a shader-db run.
> Shader-db execution time goes from 225 to 220 seconds.
>
> No instruction-count changes in shader-db, but there are some minor
> changes in cycle-count that is likely caused by nir walking a set
> in some of its passes, and this causing a different ordering.
> That might eventually lead to a difference in register allocation.
> However, the effect is a net positive;
>
> total cycles in shared programs: 24739550 -> 24738482 (-0.00%)
> cycles in affected programs: 374468 -> 373400 (-0.29%)
> helped: 178
> HURT: 49
> ---
>  src/util/hash_table.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/util/hash_table.h b/src/util/hash_table.h
> index b35ee871bb..c7f577665d 100644
> --- a/src/util/hash_table.h
> +++ b/src/util/hash_table.h
> @@ -105,7 +105,8 @@ static inline uint32_t _mesa_key_hash_string(const void 
> *key)
>
>  static inline uint32_t _mesa_hash_pointer(const void *pointer)
>  {
> -   return _mesa_hash_data(, sizeof(pointer));
> +   uintptr_t num = (uintptr_t) pointer;
> +   return (uint32_t) ((num >> 2) ^ (num >> 6) ^ (num >> 10) ^ (num >> 14));
>  }
>
>  enum {
> --
> 2.12.2
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv: Add an option to abort on device loss

2017-05-18 Thread Lionel Landwerlin

Cool, thanks!

Reviewed-by: Lionel Landwerlin 

On 18/05/17 22:10, Jason Ekstrand wrote:

I just sent a patch to address that.  Thanks!

On Thu, May 18, 2017 at 2:04 PM, Lionel Landwerlin 
> 
wrote:


This looks good, but I wonder whether we're missing a vk_errorf()
in anv_QueueSubmit() when we get an error from
anv_cmd_buffer_execbuf().
In that case it looks like we won't abort.


On 18/05/17 21:51, Jason Ekstrand wrote:

This is mostly for running in our CI system to prevent dEQP from
continuing on to the next test if we get a GPU hang. As it
currently
stands, dEQP uses the same VkDevice for almost all tests and
if one of
the tests hangs, we set the anv_device::device_lost flag and
report
VK_ERROR_DEVICE_LOST for all queue operations from that point
forward
without sending anything to the GPU.  dEQP will happily
continue trying
to run tests and reporting failures until it eventually gets
crash that
forces the test runner to start over.  This circumvents the
problem by
just aborting the process if we ever get a GPU hang. Since
this is not
the recommended behavior most of the time, we hide it behind an
environment variable.

Cc: Mark Janes >
---
  src/intel/vulkan/anv_util.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/src/intel/vulkan/anv_util.c
b/src/intel/vulkan/anv_util.c
index ba91733..4b916e2 100644
--- a/src/intel/vulkan/anv_util.c
+++ b/src/intel/vulkan/anv_util.c
@@ -30,6 +30,7 @@
#include "anv_private.h"
  #include "vk_enum_to_str.h"
+#include "util/debug.h"
/** Log an error message.  */
  void anv_printflike(1, 2)
@@ -95,5 +96,9 @@ __vk_errorf(VkResult error, const char
*file, int line, const char *format, ...)
fprintf(stderr, "%s:%d: %s\n", file, line, error_str);
 }
  +   if (error == VK_ERROR_DEVICE_LOST &&
+   env_var_as_boolean("ANV_ABORT_ON_DEVICE_LOSS", false))
+  abort();
+
 return error;
  }






___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv: Wrap the device lost error in vk_error in QueueSubmit

2017-05-18 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

On 18/05/17 22:09, Jason Ekstrand wrote:

We weren't wrapping this before because anv_cmd_buffer_execbuf may throw
a more meaningful error message.  However, we do change the error code
into VK_ERROR_DEVICE_LOST, so we should print a new message.
---
  src/intel/vulkan/anv_queue.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_queue.c b/src/intel/vulkan/anv_queue.c
index 86eee28..be7fd31 100644
--- a/src/intel/vulkan/anv_queue.c
+++ b/src/intel/vulkan/anv_queue.c
@@ -213,7 +213,7 @@ out:
 * VK_ERROR_DEVICE_LOST to ensure that clients do not attempt to
 * submit the same job again to this device.
 */
-  result = VK_ERROR_DEVICE_LOST;
+  result = vk_errorf(VK_ERROR_DEVICE_LOST, "vkQueueSubmit() failed");
device->lost = true;
  
/* If we return VK_ERROR_DEVICE LOST here, we need to ensure that



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeonsi/gfx9: compile shaders with +xnack

2017-05-18 Thread Matt Arsenault

> On May 18, 2017, at 22:46, Marek Olšák  wrote:
> 
> From: Marek Olšák 
> 
> so that LLVM doesn't allocate SGPRs where XNACK is.
> 
> Cc: 17.1 

You shouldn’t be explicitly enabling xnack. This sounds like a workaround for a 
backend bug, and this has other consequences than changing the reserved 
registers. Do you have an example of where this is a problem?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv: Add an option to abort on device loss

2017-05-18 Thread Jason Ekstrand
I just sent a patch to address that.  Thanks!

On Thu, May 18, 2017 at 2:04 PM, Lionel Landwerlin <
lionel.g.landwer...@intel.com> wrote:

> This looks good, but I wonder whether we're missing a vk_errorf() in
> anv_QueueSubmit() when we get an error from anv_cmd_buffer_execbuf().
> In that case it looks like we won't abort.
>
>
> On 18/05/17 21:51, Jason Ekstrand wrote:
>
>> This is mostly for running in our CI system to prevent dEQP from
>> continuing on to the next test if we get a GPU hang.  As it currently
>> stands, dEQP uses the same VkDevice for almost all tests and if one of
>> the tests hangs, we set the anv_device::device_lost flag and report
>> VK_ERROR_DEVICE_LOST for all queue operations from that point forward
>> without sending anything to the GPU.  dEQP will happily continue trying
>> to run tests and reporting failures until it eventually gets crash that
>> forces the test runner to start over.  This circumvents the problem by
>> just aborting the process if we ever get a GPU hang.  Since this is not
>> the recommended behavior most of the time, we hide it behind an
>> environment variable.
>>
>> Cc: Mark Janes 
>> ---
>>   src/intel/vulkan/anv_util.c | 5 +
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/src/intel/vulkan/anv_util.c b/src/intel/vulkan/anv_util.c
>> index ba91733..4b916e2 100644
>> --- a/src/intel/vulkan/anv_util.c
>> +++ b/src/intel/vulkan/anv_util.c
>> @@ -30,6 +30,7 @@
>> #include "anv_private.h"
>>   #include "vk_enum_to_str.h"
>> +#include "util/debug.h"
>> /** Log an error message.  */
>>   void anv_printflike(1, 2)
>> @@ -95,5 +96,9 @@ __vk_errorf(VkResult error, const char *file, int line,
>> const char *format, ...)
>> fprintf(stderr, "%s:%d: %s\n", file, line, error_str);
>>  }
>>   +   if (error == VK_ERROR_DEVICE_LOST &&
>> +   env_var_as_boolean("ANV_ABORT_ON_DEVICE_LOSS", false))
>> +  abort();
>> +
>>  return error;
>>   }
>>
>
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] anv: Wrap the device lost error in vk_error in QueueSubmit

2017-05-18 Thread Jason Ekstrand
We weren't wrapping this before because anv_cmd_buffer_execbuf may throw
a more meaningful error message.  However, we do change the error code
into VK_ERROR_DEVICE_LOST, so we should print a new message.
---
 src/intel/vulkan/anv_queue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_queue.c b/src/intel/vulkan/anv_queue.c
index 86eee28..be7fd31 100644
--- a/src/intel/vulkan/anv_queue.c
+++ b/src/intel/vulkan/anv_queue.c
@@ -213,7 +213,7 @@ out:
* VK_ERROR_DEVICE_LOST to ensure that clients do not attempt to
* submit the same job again to this device.
*/
-  result = VK_ERROR_DEVICE_LOST;
+  result = vk_errorf(VK_ERROR_DEVICE_LOST, "vkQueueSubmit() failed");
   device->lost = true;
 
   /* If we return VK_ERROR_DEVICE LOST here, we need to ensure that
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv: Add an option to abort on device loss

2017-05-18 Thread Lionel Landwerlin
This looks good, but I wonder whether we're missing a vk_errorf() in 
anv_QueueSubmit() when we get an error from anv_cmd_buffer_execbuf().

In that case it looks like we won't abort.

On 18/05/17 21:51, Jason Ekstrand wrote:

This is mostly for running in our CI system to prevent dEQP from
continuing on to the next test if we get a GPU hang.  As it currently
stands, dEQP uses the same VkDevice for almost all tests and if one of
the tests hangs, we set the anv_device::device_lost flag and report
VK_ERROR_DEVICE_LOST for all queue operations from that point forward
without sending anything to the GPU.  dEQP will happily continue trying
to run tests and reporting failures until it eventually gets crash that
forces the test runner to start over.  This circumvents the problem by
just aborting the process if we ever get a GPU hang.  Since this is not
the recommended behavior most of the time, we hide it behind an
environment variable.

Cc: Mark Janes 
---
  src/intel/vulkan/anv_util.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/src/intel/vulkan/anv_util.c b/src/intel/vulkan/anv_util.c
index ba91733..4b916e2 100644
--- a/src/intel/vulkan/anv_util.c
+++ b/src/intel/vulkan/anv_util.c
@@ -30,6 +30,7 @@
  
  #include "anv_private.h"

  #include "vk_enum_to_str.h"
+#include "util/debug.h"
  
  /** Log an error message.  */

  void anv_printflike(1, 2)
@@ -95,5 +96,9 @@ __vk_errorf(VkResult error, const char *file, int line, const 
char *format, ...)
fprintf(stderr, "%s:%d: %s\n", file, line, error_str);
 }
  
+   if (error == VK_ERROR_DEVICE_LOST &&

+   env_var_as_boolean("ANV_ABORT_ON_DEVICE_LOSS", false))
+  abort();
+
 return error;
  }



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 15/16] anv: Require vertex buffers to come from a 32-bit heap

2017-05-18 Thread Jason Ekstrand
---
 src/intel/vulkan/anv_device.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 8eed7f3..532d4b9 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -153,6 +153,18 @@ anv_physical_device_init_heaps(struct anv_physical_device 
*device, int fd)
for (uint32_t heap = 0; heap < device->memory.heap_count; heap++) {
   uint32_t valid_buffer_usage = ~0;
 
+  /* There appears to be a hardware issue in the VF cache where it only
+   * considers the bottom 32 bits of memory addresses.  If you happen to
+   * have two vertex buffers which get placed exactly 4 GiB apart and use
+   * them in back-to-back draw calls, you can get collisions.  In order to
+   * solve this problem, we require vertex and index buffers be bound to
+   * memory allocated out of the 32-bit heap.
+   */
+  if (device->memory.heaps[heap].supports_48bit_addresses) {
+ valid_buffer_usage &= ~(VK_BUFFER_USAGE_INDEX_BUFFER_BIT |
+ VK_BUFFER_USAGE_VERTEX_BUFFER_BIT);
+  }
+
   if (device->info.has_llc) {
  /* Big core GPUs share LLC with the CPU and thus one memory type can 
be
   * both cached and coherent at the same time.
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 12/16] anv: Make supports_48bit_addresses a heap property

2017-05-18 Thread Jason Ekstrand
---
 src/intel/vulkan/anv_device.c  |  6 --
 src/intel/vulkan/anv_private.h | 11 ++-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 9444ff8..bb02378 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -149,9 +149,10 @@ anv_physical_device_init_heaps(struct anv_physical_device 
*device, int fd)
}
 
device->memory.heap_count = 1;
-   device->memory.heaps[0] = (VkMemoryHeap) {
+   device->memory.heaps[0] = (struct anv_memory_heap) {
   .size = heap_size,
   .flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT,
+  .supports_48bit_addresses = device->supports_48bit_addresses,
};
 
return VK_SUCCESS;
@@ -1530,7 +1531,8 @@ VkResult anv_AllocateMemory(
  goto fail;
}
 
-   if (pdevice->supports_48bit_addresses)
+   assert(mem->type->heapIndex < pdevice->memory.heap_count);
+   if (pdevice->memory.heaps[mem->type->heapIndex].supports_48bit_addresses)
   mem->bo->flags |= EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
 
if (pdevice->has_exec_async)
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index de52c5b..a095d4d 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -649,6 +649,15 @@ struct anv_memory_type {
VkBufferUsageFlags  valid_buffer_usage;
 };
 
+struct anv_memory_heap {
+   /* Standard bits passed on to the client */
+   VkDeviceSize  size;
+   VkMemoryHeapFlags flags;
+
+   /* Driver-internal book-keeping */
+   bool  supports_48bit_addresses;
+};
+
 struct anv_physical_device {
 VK_LOADER_DATA  _loader_data;
 
@@ -678,7 +687,7 @@ struct anv_physical_device {
   uint32_t  type_count;
   struct anv_memory_typetypes[VK_MAX_MEMORY_TYPES];
   uint32_t  heap_count;
-  VkMemoryHeap  heaps[VK_MAX_MEMORY_HEAPS];
+  struct anv_memory_heapheaps[VK_MAX_MEMORY_HEAPS];
 } memory;
 
 uint8_t 
pipeline_cache_uuid[VK_UUID_SIZE];
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC 16/16] intel/blorp: Add gen7-8 support to ccs_ambiguate

2017-05-18 Thread Jason Ekstrand
This patch is completely untested but the math and logic should be
right.  It's mostly intended as an example of how to extend the pass.
---
 src/intel/blorp/blorp_clear.c | 80 ++-
 1 file changed, 63 insertions(+), 17 deletions(-)

diff --git a/src/intel/blorp/blorp_clear.c b/src/intel/blorp/blorp_clear.c
index 0df2dde..c995fdb 100644
--- a/src/intel/blorp/blorp_clear.c
+++ b/src/intel/blorp/blorp_clear.c
@@ -302,11 +302,12 @@ get_fast_clear_rect(const struct isl_device *dev,
*y1 = ALIGN(*y1, y_align) / y_scaledown;
 }
 
-void
-blorp_fast_clear(struct blorp_batch *batch,
- const struct blorp_surf *surf, enum isl_format format,
- uint32_t level, uint32_t start_layer, uint32_t num_layers,
- uint32_t x0, uint32_t y0, uint32_t x1, uint32_t y1)
+static void
+fast_clear(struct blorp_batch *batch,
+   const struct blorp_surf *surf, enum isl_format format,
+   uint32_t level, uint32_t start_layer, uint32_t num_layers,
+   uint8_t clear_value,
+   uint32_t x0, uint32_t y0, uint32_t x1, uint32_t y1)
 {
struct blorp_params params;
blorp_params_init();
@@ -317,7 +318,7 @@ blorp_fast_clear(struct blorp_batch *batch,
params.x1 = x1;
params.y1 = y1;
 
-   memset(_inputs.clear_color, 0xff, 4*sizeof(float));
+   memset(_inputs.clear_color, clear_value, 4*sizeof(float));
params.fast_clear_op = BLORP_FAST_CLEAR_OP_CLEAR;
 
get_fast_clear_rect(batch->blorp->isl_dev, surf->aux_surf,
@@ -333,6 +334,16 @@ blorp_fast_clear(struct blorp_batch *batch,
batch->blorp->exec(batch, );
 }
 
+void
+blorp_fast_clear(struct blorp_batch *batch,
+ const struct blorp_surf *surf, enum isl_format format,
+ uint32_t level, uint32_t start_layer, uint32_t num_layers,
+ uint32_t x0, uint32_t y0, uint32_t x1, uint32_t y1)
+{
+   fast_clear(batch, surf, format, level, start_layer, num_layers,
+  0xff, x0, y0, x1, y1);
+}
+
 static union isl_color_value
 swizzle_color_value(union isl_color_value src, struct isl_swizzle swizzle)
 {
@@ -743,15 +754,31 @@ blorp_ccs_ambiguate(struct blorp_batch *batch,
 struct blorp_surf *surf, uint32_t level,
 uint32_t layer, uint32_t z)
 {
-   struct blorp_params params;
-   blorp_params_init();
-
-   assert(ISL_DEV_GEN(batch->blorp->isl_dev) >= 9);
+   assert(ISL_DEV_GEN(batch->blorp->isl_dev) >= 7);
 
const struct isl_format_layout *aux_fmtl =
   isl_format_get_layout(surf->aux_surf->format);
+
assert(aux_fmtl->txc == ISL_TXC_CCS);
 
+   const uint32_t width_px = minify(surf->surf->logical_level0_px.width, 
level);
+   const uint32_t height_px = minify(surf->surf->logical_level0_px.height, 
level);
+   if (ISL_DEV_GEN(batch->blorp->isl_dev) == 7) {
+  /* On gen7, we can do an ambiguate by simply doing a fast clear with a
+   * clear value of 0x0.
+   */
+  fast_clear(batch, surf, ISL_FORMAT_UNSUPPORTED, level, layer + z, 1,
+ 0x0, 0, 0, width_px, height_px);
+  return;
+   }
+
+   /* On Broadwell and above, the hardware ignores the value written out by
+* the shader so we have to do things manually.
+*/
+
+   struct blorp_params params;
+   blorp_params_init();
+
params.dst = (struct brw_blorp_surface_info) {
   .enabled = true,
   .addr = surf->aux_addr,
@@ -776,8 +803,6 @@ blorp_ccs_ambiguate(struct blorp_batch *batch,
   _B, _offset_el, _offset_el);
params.dst.addr.offset += offset_B;
 
-   const uint32_t width_px = minify(surf->surf->logical_level0_px.width, 
level);
-   const uint32_t height_px = minify(surf->surf->logical_level0_px.height, 
level);
const uint32_t width_el = DIV_ROUND_UP(width_px, aux_fmtl->bw);
const uint32_t height_el = DIV_ROUND_UP(height_px, aux_fmtl->bh);
 
@@ -796,12 +821,33 @@ blorp_ccs_ambiguate(struct blorp_batch *batch,
 * maps to a Y-tiled cache line.  Fortunately, CCS layouts are calculated
 * with a very large alignment so we can round up without worrying about
 * overdraw.
+*
+* On Broadwell, it's a 16x32 block for a Y-tiled main surface and a 8x64
+* block for an X-tiled main surface.
+*
+* We're going to be rendering as RGBA32 so each Y-tiled cache line is 1x4
+* elements in the resulting color surface.
 */
-   assert(x_offset_el % 16 == 0 && y_offset_el % 4 == 0);
-   const uint32_t x_offset_rgba_px = x_offset_el / 16;
-   const uint32_t y_offset_rgba_px = y_offset_el / 4;
-   const uint32_t width_rgba_px = DIV_ROUND_UP(width_el, 16);
-   const uint32_t height_rgba_px = DIV_ROUND_UP(height_el, 4);
+   uint32_t scale_x, scale_y;
+   if (ISL_DEV_GEN(batch->blorp->isl_dev) >= 9) {
+  scale_x = 16;
+  scale_y = 4;
+   } else {
+  assert(ISL_DEV_GEN(batch->blorp->isl_dev) == 8);
+  if (surf->surf->tiling == ISL_TILING_X) {
+ scale_x = 8;
+ scale_y = 

[Mesa-dev] [PATCH 13/16] anv: Refactor memory type setup

2017-05-18 Thread Jason Ekstrand
This makes us walk over the heaps one at a time and add the types for
LLC and !LLC to each heap.
---
 src/intel/vulkan/anv_device.c | 76 +++
 1 file changed, 40 insertions(+), 36 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index bb02378..6ea8dfe 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -112,42 +112,6 @@ anv_physical_device_init_heaps(struct anv_physical_device 
*device, int fd)
if (result != VK_SUCCESS)
   return result;
 
-   if (device->info.has_llc) {
-  /* Big core GPUs share LLC with the CPU and thus one memory type can be
-   * both cached and coherent at the same time.
-   */
-  device->memory.type_count = 1;
-  device->memory.types[0] = (struct anv_memory_type) {
- .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
-  VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
-  VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
-  VK_MEMORY_PROPERTY_HOST_CACHED_BIT,
- .heapIndex = 0,
- .valid_buffer_usage = ~0,
-  };
-   } else {
-  /* The spec requires that we expose a host-visible, coherent memory
-   * type, but Atom GPUs don't share LLC. Thus we offer two memory types
-   * to give the application a choice between cached, but not coherent and
-   * coherent but uncached (WC though).
-   */
-  device->memory.type_count = 2;
-  device->memory.types[0] = (struct anv_memory_type) {
- .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
-  VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
-  VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
- .heapIndex = 0,
- .valid_buffer_usage = ~0,
-  };
-  device->memory.types[1] = (struct anv_memory_type) {
- .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
-  VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
-  VK_MEMORY_PROPERTY_HOST_CACHED_BIT,
- .heapIndex = 0,
- .valid_buffer_usage = ~0,
-  };
-   }
-
device->memory.heap_count = 1;
device->memory.heaps[0] = (struct anv_memory_heap) {
   .size = heap_size,
@@ -155,6 +119,46 @@ anv_physical_device_init_heaps(struct anv_physical_device 
*device, int fd)
   .supports_48bit_addresses = device->supports_48bit_addresses,
};
 
+   uint32_t type_count = 0;
+   for (uint32_t heap = 0; heap < device->memory.heap_count; heap++) {
+  uint32_t valid_buffer_usage = ~0;
+
+  if (device->info.has_llc) {
+ /* Big core GPUs share LLC with the CPU and thus one memory type can 
be
+  * both cached and coherent at the same time.
+  */
+ device->memory.types[type_count++] = (struct anv_memory_type) {
+.propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
+ VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
+ VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
+ VK_MEMORY_PROPERTY_HOST_CACHED_BIT,
+.heapIndex = heap,
+.valid_buffer_usage = valid_buffer_usage,
+ };
+  } else {
+ /* The spec requires that we expose a host-visible, coherent memory
+  * type, but Atom GPUs don't share LLC. Thus we offer two memory types
+  * to give the application a choice between cached, but not coherent 
and
+  * coherent but uncached (WC though).
+  */
+ device->memory.types[type_count++] = (struct anv_memory_type) {
+.propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
+ VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
+ VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
+.heapIndex = heap,
+.valid_buffer_usage = valid_buffer_usage,
+ };
+ device->memory.types[type_count++] = (struct anv_memory_type) {
+.propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
+ VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
+ VK_MEMORY_PROPERTY_HOST_CACHED_BIT,
+.heapIndex = heap,
+.valid_buffer_usage = valid_buffer_usage,
+ };
+  }
+   }
+   device->memory.type_count = type_count;
+
return VK_SUCCESS;
 }
 
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 10/16] anv: Set image memory types based on the type count

2017-05-18 Thread Jason Ekstrand
---
 src/intel/vulkan/anv_device.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 8bf52cc..b0ccbbb 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -1758,6 +1758,7 @@ void anv_GetImageMemoryRequirements(
 {
ANV_FROM_HANDLE(anv_image, image, _image);
ANV_FROM_HANDLE(anv_device, device, _device);
+   struct anv_physical_device *pdevice = >instance->physicalDevice;
 
/* The Vulkan spec (git aaed022) says:
 *
@@ -1766,12 +1767,13 @@ void anv_GetImageMemoryRequirements(
 *only if the memory type `i` in the VkPhysicalDeviceMemoryProperties
 *structure for the physical device is supported.
 *
-* We support exactly one memory type on LLC, two on non-LLC.
+* All types are currently supported for images.
 */
-   pMemoryRequirements->memoryTypeBits = device->info.has_llc ? 1 : 3;
+   uint32_t memory_types = (1ull << pdevice->memory.type_count) - 1;
 
pMemoryRequirements->size = image->size;
pMemoryRequirements->alignment = image->alignment;
+   pMemoryRequirements->memoryTypeBits = memory_types;
 }
 
 void anv_GetImageSparseMemoryRequirements(
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 09/16] anv: Add valid_bufer_usage to the memory type metadata

2017-05-18 Thread Jason Ekstrand
Instead of returning valid types as just a number, we now walk the list
and check the buffer's usage against the usage flags we store in the new
anv_memory_type structure.  Currently, valid_buffer_usage == ~0.
---
 src/intel/vulkan/anv_device.c  | 21 +++--
 src/intel/vulkan/anv_private.h | 13 +++--
 2 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 4a0115e..8bf52cc 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -117,12 +117,13 @@ anv_physical_device_init_heaps(struct anv_physical_device 
*device, int fd)
* both cached and coherent at the same time.
*/
   device->memory.type_count = 1;
-  device->memory.types[0] = (VkMemoryType) {
+  device->memory.types[0] = (struct anv_memory_type) {
  .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
   VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
   VK_MEMORY_PROPERTY_HOST_CACHED_BIT,
  .heapIndex = 0,
+ .valid_buffer_usage = ~0,
   };
} else {
   /* The spec requires that we expose a host-visible, coherent memory
@@ -131,17 +132,19 @@ anv_physical_device_init_heaps(struct anv_physical_device 
*device, int fd)
* coherent but uncached (WC though).
*/
   device->memory.type_count = 2;
-  device->memory.types[0] = (VkMemoryType) {
+  device->memory.types[0] = (struct anv_memory_type) {
  .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
   VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
  .heapIndex = 0,
+ .valid_buffer_usage = ~0,
   };
-  device->memory.types[1] = (VkMemoryType) {
+  device->memory.types[1] = (struct anv_memory_type) {
  .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
   VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
   VK_MEMORY_PROPERTY_HOST_CACHED_BIT,
  .heapIndex = 0,
+ .valid_buffer_usage = ~0,
   };
}
 
@@ -1727,6 +1730,7 @@ void anv_GetBufferMemoryRequirements(
 {
ANV_FROM_HANDLE(anv_buffer, buffer, _buffer);
ANV_FROM_HANDLE(anv_device, device, _device);
+   struct anv_physical_device *pdevice = >instance->physicalDevice;
 
/* The Vulkan spec (git aaed022) says:
 *
@@ -1734,13 +1738,17 @@ void anv_GetBufferMemoryRequirements(
 *supported memory type for the resource. The bit `1<memoryTypeBits = device->info.has_llc ? 1 : 3;
+   uint32_t memory_types = 0;
+   for (uint32_t i = 0; i < pdevice->memory.type_count; i++) {
+  uint32_t valid_usage = pdevice->memory.types[i].valid_buffer_usage;
+  if ((valid_usage & buffer->usage) == buffer->usage)
+ memory_types |= (1u << i);
+   }
 
pMemoryRequirements->size = buffer->size;
pMemoryRequirements->alignment = 16;
+   pMemoryRequirements->memoryTypeBits = memory_types;
 }
 
 void anv_GetImageMemoryRequirements(
@@ -1793,6 +1801,7 @@ VkResult anv_BindBufferMemory(
ANV_FROM_HANDLE(anv_buffer, buffer, _buffer);
 
if (mem) {
+  assert((buffer->usage & mem->type->valid_buffer_usage) == buffer->usage);
   buffer->bo = mem->bo;
   buffer->offset = memoryOffset;
} else {
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 8ed8e0f..de52c5b 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -640,6 +640,15 @@ void anv_bo_cache_release(struct anv_device *device,
   struct anv_bo_cache *cache,
   struct anv_bo *bo);
 
+struct anv_memory_type {
+   /* Standard bits passed on to the client */
+   VkMemoryPropertyFlags   propertyFlags;
+   uint32_theapIndex;
+
+   /* Driver-internal book-keeping */
+   VkBufferUsageFlags  valid_buffer_usage;
+};
+
 struct anv_physical_device {
 VK_LOADER_DATA  _loader_data;
 
@@ -667,7 +676,7 @@ struct anv_physical_device {
 
 struct {
   uint32_t  type_count;
-  VkMemoryType  types[VK_MAX_MEMORY_TYPES];
+  struct anv_memory_typetypes[VK_MAX_MEMORY_TYPES];
   uint32_t  heap_count;
   VkMemoryHeap  heaps[VK_MAX_MEMORY_HEAPS];
 } memory;
@@ -1002,7 +1011,7 @@ _anv_combine_address(struct anv_batch *batch, void 
*location,
 
 struct anv_device_memory {
struct anv_bo *  bo;
-   

[Mesa-dev] [PATCH 11/16] anv: Stop setting BO flags in bo_init_new

2017-05-18 Thread Jason Ekstrand
The idea behind doing this was to make it easier to set various flags.
However, we have enough custom flag settings floating around the driver
that this is more of a nuisance than a help.
---
 src/intel/vulkan/anv_allocator.c | 17 ++---
 src/intel/vulkan/anv_device.c| 12 ++--
 src/intel/vulkan/anv_queue.c |  4 ++--
 3 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index b767542..d637867 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -1025,6 +1025,12 @@ anv_bo_pool_alloc(struct anv_bo_pool *pool, struct 
anv_bo *bo, uint32_t size)
if (result != VK_SUCCESS)
   return result;
 
+   if (pool->device->instance->physicalDevice.supports_48bit_addresses)
+  new_bo.flags |= EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
+
+   if (pool->device->instance->physicalDevice.has_exec_async)
+  new_bo.flags |= EXEC_OBJECT_ASYNC;
+
assert(new_bo.size == pow2_size);
 
new_bo.map = anv_gem_mmap(pool->device, new_bo.gem_handle, 0, pow2_size, 0);
@@ -1154,7 +1160,10 @@ anv_scratch_pool_alloc(struct anv_device *device, struct 
anv_scratch_pool *pool,
 *
 * so nothing will ever touch the top page.
 */
-   bo->bo.flags &= ~EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
+   assert(!(bo->bo.flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS));
+
+   if (device->instance->physicalDevice.has_exec_async)
+  bo->bo.flags |= EXEC_OBJECT_ASYNC;
 
/* Set the exists last because it may be read by other threads */
__sync_synchronize();
@@ -1309,12 +1318,6 @@ anv_bo_cache_import(struct anv_device *device,
 
   anv_bo_init(>bo, gem_handle, size);
 
-  if (device->instance->physicalDevice.supports_48bit_addresses)
- bo->bo.flags |= EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
-
-  if (device->instance->physicalDevice.has_exec_async)
- bo->bo.flags |= EXEC_OBJECT_ASYNC;
-
   _mesa_hash_table_insert(cache->bo_map, (void *)(uintptr_t)gem_handle, 
bo);
}
 
diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index b0ccbbb..9444ff8 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -1453,12 +1453,6 @@ anv_bo_init_new(struct anv_bo *bo, struct anv_device 
*device, uint64_t size)
 
anv_bo_init(bo, gem_handle, size);
 
-   if (device->instance->physicalDevice.supports_48bit_addresses)
-  bo->flags |= EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
-
-   if (device->instance->physicalDevice.has_exec_async)
-  bo->flags |= EXEC_OBJECT_ASYNC;
-
return VK_SUCCESS;
 }
 
@@ -1536,6 +1530,12 @@ VkResult anv_AllocateMemory(
  goto fail;
}
 
+   if (pdevice->supports_48bit_addresses)
+  mem->bo->flags |= EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
+
+   if (pdevice->has_exec_async)
+  mem->bo->flags |= EXEC_OBJECT_ASYNC;
+
*pMem = anv_device_memory_to_handle(mem);
 
return VK_SUCCESS;
diff --git a/src/intel/vulkan/anv_queue.c b/src/intel/vulkan/anv_queue.c
index fac979a..86eee28 100644
--- a/src/intel/vulkan/anv_queue.c
+++ b/src/intel/vulkan/anv_queue.c
@@ -553,7 +553,7 @@ VkResult anv_CreateSemaphore(
   /* If we're going to use this as a fence, we need to *not* have the
* EXEC_OBJECT_ASYNC bit set.
*/
-  semaphore->permanent.bo->flags &= ~EXEC_OBJECT_ASYNC;
+  assert(!(semaphore->permanent.bo->flags & EXEC_OBJECT_ASYNC));
} else {
   assert(!"Unknown handle type");
   vk_free2(>alloc, pAllocator, semaphore);
@@ -644,7 +644,7 @@ VkResult anv_ImportSemaphoreFdKHX(
   /* If we're going to use this as a fence, we need to *not* have the
* EXEC_OBJECT_ASYNC bit set.
*/
-  bo->flags &= ~EXEC_OBJECT_ASYNC;
+  assert(!(bo->flags & EXEC_OBJECT_ASYNC));
 
   anv_semaphore_impl_cleanup(device, >permanent);
 
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 14/16] anv: Advertise both 32-bit and 48-bit heaps when we have enough memory

2017-05-18 Thread Jason Ekstrand
---
 src/intel/vulkan/anv_device.c | 42 --
 1 file changed, 36 insertions(+), 6 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 6ea8dfe..8eed7f3 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -112,12 +112,42 @@ anv_physical_device_init_heaps(struct anv_physical_device 
*device, int fd)
if (result != VK_SUCCESS)
   return result;
 
-   device->memory.heap_count = 1;
-   device->memory.heaps[0] = (struct anv_memory_heap) {
-  .size = heap_size,
-  .flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT,
-  .supports_48bit_addresses = device->supports_48bit_addresses,
-   };
+   if (heap_size <= 3ull * (1ull << 30)) {
+  /* In this case, everything fits nicely into the 32-bit address space,
+   * so there's no need for supporting 48bit addresses on clinet-allocated
+   * memory objects.
+   */
+  device->memory.heap_count = 1;
+  device->memory.heaps[0] = (struct anv_memory_heap) {
+ .size = heap_size,
+ .flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT,
+ .supports_48bit_addresses = false,
+  };
+   } else {
+  /* Not everything will fit nicely into a 32-bit address space.  In this
+   * case we need a 64-bit heap.  Advertise a small 32-bit heap and a
+   * larger 48-bit heap.  If we're in this case, then we have a total heap
+   * size larger than 3GiB which most likely means they have 8 GiB of
+   * video memory and so carving off 2 GiB for the 32-bit heap should be
+   * reasonable.
+   */
+  const uint32_t heap_size_32bit = 2ull * (1ull << 30);
+  const uint32_t heap_size_48bit = heap_size - heap_size_32bit;
+
+  assert(device->supports_48bit_addresses);
+
+  device->memory.heap_count = 2;
+  device->memory.heaps[0] = (struct anv_memory_heap) {
+ .size = heap_size_48bit,
+ .flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT,
+ .supports_48bit_addresses = true,
+  };
+  device->memory.heaps[1] = (struct anv_memory_heap) {
+ .size = heap_size_32bit,
+ .flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT,
+ .supports_48bit_addresses = false,
+  };
+   }
 
uint32_t type_count = 0;
for (uint32_t heap = 0; heap < device->memory.heap_count; heap++) {
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 04/16] anv: Handle transitioning depth from UNDEFINED to other layouts

2017-05-18 Thread Jason Ekstrand
---
 src/intel/vulkan/anv_image.c   | 12 ++--
 src/intel/vulkan/genX_cmd_buffer.c |  9 +
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
index d21e055..d65ef47 100644
--- a/src/intel/vulkan/anv_image.c
+++ b/src/intel/vulkan/anv_image.c
@@ -488,12 +488,20 @@ anv_layout_to_aux_usage(const struct gen_device_info * 
const devinfo,
/* According to the Vulkan Spec, the following layouts are valid only as
 * initial layouts in a layout transition and don't support device access.
 */
-   case VK_IMAGE_LAYOUT_UNDEFINED:
-   case VK_IMAGE_LAYOUT_PREINITIALIZED:
case VK_IMAGE_LAYOUT_RANGE_SIZE:
case VK_IMAGE_LAYOUT_MAX_ENUM:
   unreachable("Invalid image layout for device access.");
 
+   /* Undefined layouts
+*
+* The pre-initialized layout is equivalent to the undefined layout for
+* optimally-tiled images.  We can only do color compression (CCS or HiZ)
+* on tiled images.
+*/
+   case VK_IMAGE_LAYOUT_UNDEFINED:
+   case VK_IMAGE_LAYOUT_PREINITIALIZED:
+  return ISL_AUX_USAGE_NONE;
+
 
/* Transfer Layouts
 *
diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 1f30a12..8d5f61b 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -355,15 +355,8 @@ transition_depth_buffer(struct anv_cmd_buffer *cmd_buffer,
 * The undefined layout indicates that the user doesn't care about the data
 * that's currently in the buffer. Therefore, a data-preserving resolve
 * operation is not needed.
-*
-* The pre-initialized layout is equivalent to the undefined layout for
-* optimally-tiled images. Anv only exposes support for optimally-tiled
-* depth buffers.
 */
-   if (image->aux_usage != ISL_AUX_USAGE_HIZ ||
-   initial_layout == final_layout ||
-   initial_layout == VK_IMAGE_LAYOUT_UNDEFINED ||
-   initial_layout == VK_IMAGE_LAYOUT_PREINITIALIZED)
+   if (image->aux_usage != ISL_AUX_USAGE_HIZ || initial_layout == final_layout)
   return;
 
const bool hiz_enabled = ISL_AUX_USAGE_HIZ ==
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 08/16] anv: Determine the type of mapping based on type metadata

2017-05-18 Thread Jason Ekstrand
Before, we were just comparing the type index to 0.  Now we actually
look the type up in the table and check its properties to determine what
kind of mapping we want to do.
---
 src/intel/vulkan/anv_device.c  | 12 ++--
 src/intel/vulkan/anv_private.h |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 523d40e..4a0115e 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -1466,6 +1466,7 @@ VkResult anv_AllocateMemory(
 VkDeviceMemory* pMem)
 {
ANV_FROM_HANDLE(anv_device, device, _device);
+   struct anv_physical_device *pdevice = >instance->physicalDevice;
struct anv_device_memory *mem;
VkResult result = VK_SUCCESS;
 
@@ -1474,10 +1475,6 @@ VkResult anv_AllocateMemory(
/* The Vulkan 1.0.33 spec says "allocationSize must be greater than 0". */
assert(pAllocateInfo->allocationSize > 0);
 
-   /* We support exactly one memory heap. */
-   assert(pAllocateInfo->memoryTypeIndex == 0 ||
-  (!device->info.has_llc && pAllocateInfo->memoryTypeIndex < 2));
-
/* The kernel relocation API has a limitation of a 32-bit delta value
 * applied to the address before it is written which, in spite of it being
 * unsigned, is treated as signed .  Because of the way that this maps to
@@ -1505,7 +1502,8 @@ VkResult anv_AllocateMemory(
if (mem == NULL)
   return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
 
-   mem->type_index = pAllocateInfo->memoryTypeIndex;
+   assert(pAllocateInfo->memoryTypeIndex < pdevice->memory.type_count);
+   mem->type = >memory.types[pAllocateInfo->memoryTypeIndex];
mem->map = NULL;
mem->map_size = 0;
 
@@ -1630,7 +1628,9 @@ VkResult anv_MapMemory(
 * userspace. */
 
uint32_t gem_flags = 0;
-   if (!device->info.has_llc && mem->type_index == 0)
+
+   if (!device->info.has_llc &&
+   (mem->type->propertyFlags & VK_MEMORY_PROPERTY_HOST_COHERENT_BIT))
   gem_flags |= I915_MMAP_WC;
 
/* GEM will fail to map if the offset isn't 4k-aligned.  Round down. */
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 5287daf..8ed8e0f 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -1002,7 +1002,7 @@ _anv_combine_address(struct anv_batch *batch, void 
*location,
 
 struct anv_device_memory {
struct anv_bo *  bo;
-   uint32_t type_index;
+   VkMemoryType *   type;
VkDeviceSize map_size;
void *   map;
 };
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 06/16] anv: Predicate 48bit support on gen >= 8

2017-05-18 Thread Jason Ekstrand
This doesn't matter right now since it only affects whether or not we
set the kernel bit but, if we ever do anything else based on it, we'll
want it to be correct per-gen.
---
 src/intel/vulkan/anv_device.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 50486b6..b80715f 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -225,7 +225,12 @@ anv_physical_device_init(struct anv_physical_device 
*device,
   goto fail;
}
 
-   device->supports_48bit_addresses = anv_gem_supports_48b_addresses(fd);
+   /* The kernel query only tells us whether or not the kernel supports the
+* EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag and not whether or not the
+* hardware has actual 48bit address support.
+*/
+   device->supports_48bit_addresses =
+  (device->info.gen >= 8) && anv_gem_supports_48b_addresses(fd);
 
result = anv_compute_heap_size(fd, >heap_size);
if (result != VK_SUCCESS)
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 07/16] anv: Set up memory types and heaps during physical device init

2017-05-18 Thread Jason Ekstrand
---
 src/intel/vulkan/anv_device.c  | 113 +
 src/intel/vulkan/anv_private.h |   8 ++-
 2 files changed, 77 insertions(+), 44 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index b80715f..523d40e 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -98,6 +98,63 @@ anv_compute_heap_size(int fd, uint64_t *heap_size)
 }
 
 static VkResult
+anv_physical_device_init_heaps(struct anv_physical_device *device, int fd)
+{
+   /* The kernel query only tells us whether or not the kernel supports the
+* EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag and not whether or not the
+* hardware has actual 48bit address support.
+*/
+   device->supports_48bit_addresses =
+  (device->info.gen >= 8) && anv_gem_supports_48b_addresses(fd);
+
+   uint64_t heap_size;
+   VkResult result = anv_compute_heap_size(fd, _size);
+   if (result != VK_SUCCESS)
+  return result;
+
+   if (device->info.has_llc) {
+  /* Big core GPUs share LLC with the CPU and thus one memory type can be
+   * both cached and coherent at the same time.
+   */
+  device->memory.type_count = 1;
+  device->memory.types[0] = (VkMemoryType) {
+ .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
+  VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
+  VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
+  VK_MEMORY_PROPERTY_HOST_CACHED_BIT,
+ .heapIndex = 0,
+  };
+   } else {
+  /* The spec requires that we expose a host-visible, coherent memory
+   * type, but Atom GPUs don't share LLC. Thus we offer two memory types
+   * to give the application a choice between cached, but not coherent and
+   * coherent but uncached (WC though).
+   */
+  device->memory.type_count = 2;
+  device->memory.types[0] = (VkMemoryType) {
+ .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
+  VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
+  VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
+ .heapIndex = 0,
+  };
+  device->memory.types[1] = (VkMemoryType) {
+ .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
+  VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
+  VK_MEMORY_PROPERTY_HOST_CACHED_BIT,
+ .heapIndex = 0,
+  };
+   }
+
+   device->memory.heap_count = 1;
+   device->memory.heaps[0] = (VkMemoryHeap) {
+  .size = heap_size,
+  .flags = VK_MEMORY_HEAP_DEVICE_LOCAL_BIT,
+   };
+
+   return VK_SUCCESS;
+}
+
+static VkResult
 anv_physical_device_init_uuids(struct anv_physical_device *device)
 {
const struct build_id_note *note = build_id_find_nhdr("libvulkan_intel.so");
@@ -225,14 +282,7 @@ anv_physical_device_init(struct anv_physical_device 
*device,
   goto fail;
}
 
-   /* The kernel query only tells us whether or not the kernel supports the
-* EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag and not whether or not the
-* hardware has actual 48bit address support.
-*/
-   device->supports_48bit_addresses =
-  (device->info.gen >= 8) && anv_gem_supports_48b_addresses(fd);
-
-   result = anv_compute_heap_size(fd, >heap_size);
+   result = anv_physical_device_init_heaps(device, fd);
if (result != VK_SUCCESS)
   goto fail;
 
@@ -885,44 +935,21 @@ void anv_GetPhysicalDeviceMemoryProperties(
 {
ANV_FROM_HANDLE(anv_physical_device, physical_device, physicalDevice);
 
-   if (physical_device->info.has_llc) {
-  /* Big core GPUs share LLC with the CPU and thus one memory type can be
-   * both cached and coherent at the same time.
-   */
-  pMemoryProperties->memoryTypeCount = 1;
-  pMemoryProperties->memoryTypes[0] = (VkMemoryType) {
- .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
-  VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
-  VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
-  VK_MEMORY_PROPERTY_HOST_CACHED_BIT,
- .heapIndex = 0,
-  };
-   } else {
-  /* The spec requires that we expose a host-visible, coherent memory
-   * type, but Atom GPUs don't share LLC. Thus we offer two memory types
-   * to give the application a choice between cached, but not coherent and
-   * coherent but uncached (WC though).
-   */
-  pMemoryProperties->memoryTypeCount = 2;
-  pMemoryProperties->memoryTypes[0] = (VkMemoryType) {
- .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
-  VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
-  VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
- .heapIndex = 0,
-  };
-  pMemoryProperties->memoryTypes[1] = (VkMemoryType) {
- .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
-  VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
-  

[Mesa-dev] [PATCH 05/16] anv/image: Get rid of the memset(aux, 0, sizeof(aux)) hack

2017-05-18 Thread Jason Ekstrand
Up until now, we've been memsetting the auxiliary surface to 0 at
BindImageMemory time to ensure that it is properly initialized.
However, this isn't correct because apps are allowed to freely alias
memory between different images and buffers so long as they properly
track whether or not a particular image is valid and, if it isn't,
transition from UNINITIALIZED to something else before using it.  We
now implement those transitions so we can drop the hack.
---
 src/intel/vulkan/anv_image.c | 28 
 1 file changed, 28 deletions(-)

diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
index d65ef47..3510186 100644
--- a/src/intel/vulkan/anv_image.c
+++ b/src/intel/vulkan/anv_image.c
@@ -332,7 +332,6 @@ VkResult anv_BindImageMemory(
 VkDeviceMemory  _memory,
 VkDeviceSizememoryOffset)
 {
-   ANV_FROM_HANDLE(anv_device, device, _device);
ANV_FROM_HANDLE(anv_device_memory, mem, _memory);
ANV_FROM_HANDLE(anv_image, image, _image);
 
@@ -345,33 +344,6 @@ VkResult anv_BindImageMemory(
image->bo = mem->bo;
image->offset = memoryOffset;
 
-   if (image->aux_surface.isl.size > 0) {
-
-  /* The offset and size must be a multiple of 4K or else the
-   * anv_gem_mmap call below will fail.
-   */
-  assert((image->offset + image->aux_surface.offset) % 4096 == 0);
-  assert(image->aux_surface.isl.size % 4096 == 0);
-
-  /* Auxiliary surfaces need to have their memory cleared to 0 before they
-   * can be used.  For CCS surfaces, this puts them in the "resolved"
-   * state so they can be used with CCS enabled before we ever touch it
-   * from the GPU.  For HiZ, we need something valid or else we may get
-   * GPU hangs on some hardware and 0 works fine.
-   */
-  void *map = anv_gem_mmap(device, image->bo->gem_handle,
-   image->offset + image->aux_surface.offset,
-   image->aux_surface.isl.size,
-   device->info.has_llc ? 0 : I915_MMAP_WC);
-
-  if (map == MAP_FAILED)
- return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
-
-  memset(map, 0, image->aux_surface.isl.size);
-
-  anv_gem_munmap(map, image->aux_surface.isl.size);
-   }
-
return VK_SUCCESS;
 }
 
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 01/16] isl: Make get_intratile_offset_el take the element size in bits

2017-05-18 Thread Jason Ekstrand
---
 src/intel/isl/isl.c| 7 +++
 src/intel/isl/isl.h| 6 ++
 src/mesa/drivers/dri/i965/intel_blit.c | 2 +-
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
index f89f351..86eaf61 100644
--- a/src/intel/isl/isl.c
+++ b/src/intel/isl/isl.c
@@ -1967,7 +1967,7 @@ isl_surf_get_image_offset_el(const struct isl_surf *surf,
 void
 isl_tiling_get_intratile_offset_el(const struct isl_device *dev,
enum isl_tiling tiling,
-   uint8_t bs,
+   uint32_t bpb,
uint32_t row_pitch,
uint32_t total_x_offset_el,
uint32_t total_y_offset_el,
@@ -1976,15 +1976,14 @@ isl_tiling_get_intratile_offset_el(const struct 
isl_device *dev,
uint32_t *y_offset_el)
 {
if (tiling == ISL_TILING_LINEAR) {
+  assert(bpb % 8 == 0);
   *base_address_offset = total_y_offset_el * row_pitch +
- total_x_offset_el * bs;
+ total_x_offset_el * (bpb / 8);
   *x_offset_el = 0;
   *y_offset_el = 0;
   return;
}
 
-   const uint32_t bpb = bs * 8;
-
struct isl_tile_info tile_info;
isl_tiling_get_info(dev, tiling, bpb, _info);
 
diff --git a/src/intel/isl/isl.h b/src/intel/isl/isl.h
index 7778551..dae599c 100644
--- a/src/intel/isl/isl.h
+++ b/src/intel/isl/isl.h
@@ -1529,7 +1529,7 @@ isl_surf_get_image_offset_el(const struct isl_surf *surf,
 void
 isl_tiling_get_intratile_offset_el(const struct isl_device *dev,
enum isl_tiling tiling,
-   uint8_t bs,
+   uint32_t bpb,
uint32_t row_pitch,
uint32_t total_x_offset_el,
uint32_t total_y_offset_el,
@@ -1550,8 +1550,6 @@ isl_tiling_get_intratile_offset_sa(const struct 
isl_device *dev,
 {
const struct isl_format_layout *fmtl = isl_format_get_layout(format);
 
-   assert(fmtl->bpb % 8 == 0);
-
/* For computing the intratile offsets, we actually want a strange unit
 * which is samples for multisampled surfaces but elements for compressed
 * surfaces.
@@ -1561,7 +1559,7 @@ isl_tiling_get_intratile_offset_sa(const struct 
isl_device *dev,
const uint32_t total_x_offset = total_x_offset_sa / fmtl->bw;
const uint32_t total_y_offset = total_y_offset_sa / fmtl->bh;
 
-   isl_tiling_get_intratile_offset_el(dev, tiling, fmtl->bpb / 8, row_pitch,
+   isl_tiling_get_intratile_offset_el(dev, tiling, fmtl->bpb, row_pitch,
   total_x_offset, total_y_offset,
   base_address_offset,
   x_offset_sa, y_offset_sa);
diff --git a/src/mesa/drivers/dri/i965/intel_blit.c 
b/src/mesa/drivers/dri/i965/intel_blit.c
index 568ed54..1089e8c 100644
--- a/src/mesa/drivers/dri/i965/intel_blit.c
+++ b/src/mesa/drivers/dri/i965/intel_blit.c
@@ -173,7 +173,7 @@ get_blit_intratile_offset_el(const struct brw_context *brw,
 {
enum isl_tiling tiling = intel_miptree_get_isl_tiling(mt);
isl_tiling_get_intratile_offset_el(>isl_dev,
-  tiling, mt->cpp, mt->pitch,
+  tiling, mt->cpp * 8, mt->pitch,
   total_x_offset_el, total_y_offset_el,
   base_address_offset,
   x_offset_el, y_offset_el);
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 02/16] intel/blorp: Add a CCS ambiguation pass

2017-05-18 Thread Jason Ekstrand
---
 src/intel/blorp/blorp.h   |   5 +++
 src/intel/blorp/blorp_clear.c | 101 ++
 2 files changed, 106 insertions(+)

diff --git a/src/intel/blorp/blorp.h b/src/intel/blorp/blorp.h
index eab75d7..9d3c9ad 100644
--- a/src/intel/blorp/blorp.h
+++ b/src/intel/blorp/blorp.h
@@ -191,6 +191,11 @@ blorp_ccs_resolve(struct blorp_batch *batch,
   enum isl_format format,
   enum blorp_fast_clear_op resolve_op);
 
+void
+blorp_ccs_ambiguate(struct blorp_batch *batch,
+struct blorp_surf *surf, uint32_t level,
+uint32_t layer, uint32_t z);
+
 /**
  * For an overview of the HiZ operations, see the following sections of the
  * Sandy Bridge PRM, Volume 1, Part2:
diff --git a/src/intel/blorp/blorp_clear.c b/src/intel/blorp/blorp_clear.c
index a9eb6b9..0df2dde 100644
--- a/src/intel/blorp/blorp_clear.c
+++ b/src/intel/blorp/blorp_clear.c
@@ -731,3 +731,104 @@ blorp_ccs_resolve(struct blorp_batch *batch,
 
batch->blorp->exec(batch, );
 }
+
+/** Clear a CCS to the "uncompressed" state
+ *
+ * This pass is the CCS equivalent of a "HiZ resolve".  It sets the CCS values
+ * for a given layer/level of a surface to 0x0 which is the "uncompressed"
+ * state which tells the sampler to go look at the main surface.
+ */
+void
+blorp_ccs_ambiguate(struct blorp_batch *batch,
+struct blorp_surf *surf, uint32_t level,
+uint32_t layer, uint32_t z)
+{
+   struct blorp_params params;
+   blorp_params_init();
+
+   assert(ISL_DEV_GEN(batch->blorp->isl_dev) >= 9);
+
+   const struct isl_format_layout *aux_fmtl =
+  isl_format_get_layout(surf->aux_surf->format);
+   assert(aux_fmtl->txc == ISL_TXC_CCS);
+
+   params.dst = (struct brw_blorp_surface_info) {
+  .enabled = true,
+  .addr = surf->aux_addr,
+  .view = {
+ .usage = ISL_SURF_USAGE_RENDER_TARGET_BIT,
+ .format = ISL_FORMAT_R32G32B32A32_UINT,
+ .base_level = 0,
+ .base_array_layer = 0,
+ .levels = 1,
+ .array_len = 1,
+ .swizzle = ISL_SWIZZLE_IDENTITY,
+  },
+   };
+
+   uint32_t offset_B, x_offset_el, y_offset_el;
+   isl_surf_get_image_offset_el(surf->aux_surf, level, layer, z,
+_offset_el, _offset_el);
+   isl_tiling_get_intratile_offset_el(batch->blorp->isl_dev,
+  surf->aux_surf->tiling, aux_fmtl->bpb,
+  surf->aux_surf->row_pitch,
+  x_offset_el, y_offset_el,
+  _B, _offset_el, _offset_el);
+   params.dst.addr.offset += offset_B;
+
+   const uint32_t width_px = minify(surf->surf->logical_level0_px.width, 
level);
+   const uint32_t height_px = minify(surf->surf->logical_level0_px.height, 
level);
+   const uint32_t width_el = DIV_ROUND_UP(width_px, aux_fmtl->bw);
+   const uint32_t height_el = DIV_ROUND_UP(height_px, aux_fmtl->bh);
+
+   /* We're going to map it as a regular RGBA32_UINT surface.  We need to
+* downscale a good deal.  From the Sky Lake PRM Vol. 12 in the section on
+* planes:
+*
+*"The Color Control Surface (CCS) contains the compression status
+*of the cache-line pairs. The compression state of the cache-line
+*pair is specified by 2 bits in the CCS.  Each CCS cache-line
+*represents an area on the main surface of 16x16 sets of 128 byte
+*Y-tiled cache-line-pairs. CCS is always Y tiled."
+*
+* Each 2-bit surface element in the CCS corresponds to a single cache-line
+* pair in the main surface.  This means that 16x16 el block in the CCS
+* maps to a Y-tiled cache line.  Fortunately, CCS layouts are calculated
+* with a very large alignment so we can round up without worrying about
+* overdraw.
+*/
+   assert(x_offset_el % 16 == 0 && y_offset_el % 4 == 0);
+   const uint32_t x_offset_rgba_px = x_offset_el / 16;
+   const uint32_t y_offset_rgba_px = y_offset_el / 4;
+   const uint32_t width_rgba_px = DIV_ROUND_UP(width_el, 16);
+   const uint32_t height_rgba_px = DIV_ROUND_UP(height_el, 4);
+
+   MAYBE_UNUSED bool ok =
+  isl_surf_init(batch->blorp->isl_dev, ,
+.dim = ISL_SURF_DIM_2D,
+.format = ISL_FORMAT_R32G32B32A32_UINT,
+.width = width_rgba_px + x_offset_rgba_px,
+.height = height_rgba_px + y_offset_rgba_px,
+.depth = 1,
+.levels = 1,
+.array_len = 1,
+.samples = 1,
+.row_pitch = surf->aux_surf->row_pitch,
+.usage = ISL_SURF_USAGE_RENDER_TARGET_BIT,
+.tiling_flags = ISL_TILING_Y0_BIT);
+   assert(ok);
+
+   params.x0 = x_offset_rgba_px;
+   params.y0 = y_offset_rgba_px;
+   params.x1 = x_offset_rgba_px + width_rgba_px;
+   params.y1 = y_offset_rgba_px + 

[Mesa-dev] [PATCH 03/16] anv: Handle color layout transitions from the UNINITIALIZED layout

2017-05-18 Thread Jason Ekstrand
This causes dEQP-VK.api.copy_and_blit.resolve_image.partial.* to start
failing due to test bugs.  See CL 1031 for a test fix.
---
 src/intel/vulkan/anv_blorp.c   | 40 ++
 src/intel/vulkan/anv_private.h |  5 +
 src/intel/vulkan/genX_cmd_buffer.c | 25 
 3 files changed, 70 insertions(+)

diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
index 7b6944a..51964c5 100644
--- a/src/intel/vulkan/anv_blorp.c
+++ b/src/intel/vulkan/anv_blorp.c
@@ -1373,6 +1373,46 @@ void anv_CmdResolveImage(
blorp_batch_finish();
 }
 
+void
+anv_image_ccs_ambiguate(struct anv_cmd_buffer *cmd_buffer,
+const struct anv_image *image,
+const VkImageSubresourceRange *subresourceRange)
+{
+   assert(image->type == VK_IMAGE_TYPE_3D || image->extent.depth == 1);
+
+   struct blorp_batch batch;
+   blorp_batch_init(_buffer->device->blorp, , cmd_buffer, 0);
+
+   struct blorp_surf surf;
+   get_blorp_surf_for_anv_image(image, VK_IMAGE_ASPECT_COLOR_BIT,
+image->aux_usage, );
+
+   /* We're about to bind a CCS surface and render to it.  Who knows what will
+* happen to caching at that point.  It's probably best if we don't let
+* this happen at the same time as other rendering.  Flush the render cache
+* and stall prior to this operation.
+*/
+   cmd_buffer->state.pending_pipe_bits |=
+  ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT | ANV_PIPE_CS_STALL_BIT;
+
+   const uint32_t level_count = anv_get_levelCount(image, subresourceRange);
+   const uint32_t layer_count = anv_get_layerCount(image, subresourceRange);
+   for (uint32_t l = 0; l < level_count; l++) {
+  const uint32_t level = subresourceRange->baseMipLevel + l;
+
+  for (uint32_t a = 0; a < layer_count; a++) {
+ const uint32_t layer = subresourceRange->baseArrayLayer + a;
+ const uint32_t depth = anv_minify(image->extent.depth, level);
+
+ for (uint32_t z = 0; z < depth; z++)
+blorp_ccs_ambiguate(, , level, layer, z);
+  }
+   }
+
+   cmd_buffer->state.pending_pipe_bits |=
+  ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT | ANV_PIPE_CS_STALL_BIT;
+}
+
 static void
 ccs_resolve_attachment(struct anv_cmd_buffer *cmd_buffer,
struct blorp_batch *batch,
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 9b0dd67..10ad247 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -2064,6 +2064,11 @@ anv_gen8_hiz_op_resolve(struct anv_cmd_buffer 
*cmd_buffer,
 const struct anv_image *image,
 enum blorp_hiz_op op);
 
+void
+anv_image_ccs_ambiguate(struct anv_cmd_buffer *cmd_buffer,
+const struct anv_image *image,
+const VkImageSubresourceRange *subresourceRange);
+
 enum isl_aux_usage
 anv_layout_to_aux_usage(const struct gen_device_info * const devinfo,
 const struct anv_image *image,
diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index ef9b7d0..1f30a12 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -388,6 +388,24 @@ transition_depth_buffer(struct anv_cmd_buffer *cmd_buffer,
   anv_gen8_hiz_op_resolve(cmd_buffer, image, hiz_op);
 }
 
+static void
+transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,
+const struct anv_image *image,
+VkImageLayout initial_layout,
+VkImageLayout final_layout,
+const VkImageSubresourceRange *subresourceRange)
+{
+   if (image->aux_usage != ISL_AUX_USAGE_CCS_E)
+  return;
+
+   if (initial_layout != VK_IMAGE_LAYOUT_UNDEFINED &&
+   initial_layout != VK_IMAGE_LAYOUT_PREINITIALIZED)
+  return;
+
+#if GEN_GEN >= 9
+   anv_image_ccs_ambiguate(cmd_buffer, image, subresourceRange);
+#endif
+}
 
 /**
  * Setup anv_cmd_state::attachments for vkCmdBeginRenderPass.
@@ -976,6 +994,13 @@ void genX(CmdPipelineBarrier)(
  pImageMemoryBarriers[i].oldLayout,
  pImageMemoryBarriers[i].newLayout);
   }
+  if (pImageMemoryBarriers[i].subresourceRange.aspectMask &
+  VK_IMAGE_ASPECT_COLOR_BIT) {
+ transition_color_buffer(cmd_buffer, image,
+ pImageMemoryBarriers[i].oldLayout,
+ pImageMemoryBarriers[i].newLayout,
+ [i].subresourceRange);
+  }
}
 
cmd_buffer->state.pending_pipe_bits |=
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 00/16] anv: Fix our 48-bit problems

2017-05-18 Thread Jason Ekstrand
This patch series aims to fix the remaining 48-bit problems in the Vulkan
driver.  As such, the entire thing will be CC'd to stable before landing.

The first 5 patches fix the driver to handle memory aliasing correctly.
Vulkan allows you to bind multiple buffers or images to overlapping memory
regions so long as you get your layout transitions correct.  Up until now,
we've been doing a memset at vkBindImageMemory time to initialize auxiliary
surfaces which isn't valid in light of aliasing.  Instead, these patches
provide actual support for layout transitions from UNDEFINED to other
layouts.  This isn't actually a 48-bit issue but the other patches cause a
change in the behavior of some CTS tests which makes them start failing due
to memory aliasing problems.

The next 10 patches refactor memory type setup and make us advertise 2
heaps on platforms with a lot of memory.  For justification, see the
comment in patch 15.

The last patch just extends the new pass added in patch 2 for gen7-8.  It's
fairly straightforward but completely untested.  Hopefully it will help
Nanley or someone else if they ever need it.

Cc: "17.1" 
Cc: Nanley Chery 

Jason Ekstrand (16):
  isl: Make get_intratile_offset_el take the element size in bits
  intel/blorp: Add a CCS ambiguation pass
  anv: Handle color layout transitions from the UNINITIALIZED layout
  anv: Handle transitioning depth from UNDEFINED to other layouts
  anv/image: Get rid of the memset(aux, 0, sizeof(aux)) hack
  anv: Predicate 48bit support on gen >= 8
  anv: Set up memory types and heaps during physical device init
  anv: Determine the type of mapping based on type metadata
  anv: Add valid_bufer_usage to the memory type metadata
  anv: Set image memory types based on the type count
  anv: Stop setting BO flags in bo_init_new
  anv: Make supports_48bit_addresses a heap property
  anv: Refactor memory type setup
  anv: Advertise both 32-bit and 48-bit heaps when we have enough memory
  anv: Require vertex buffers to come from a 32-bit heap
  intel/blorp: Add gen7-8 support to ccs_ambiguate

 src/intel/blorp/blorp.h|   5 +
 src/intel/blorp/blorp_clear.c  | 159 +-
 src/intel/isl/isl.c|   7 +-
 src/intel/isl/isl.h|   6 +-
 src/intel/vulkan/anv_allocator.c   |  17 +--
 src/intel/vulkan/anv_blorp.c   |  40 +++
 src/intel/vulkan/anv_device.c  | 201 -
 src/intel/vulkan/anv_image.c   |  40 ++-
 src/intel/vulkan/anv_private.h |  33 +-
 src/intel/vulkan/anv_queue.c   |   4 +-
 src/intel/vulkan/genX_cmd_buffer.c |  33 --
 src/mesa/drivers/dri/i965/intel_blit.c |   2 +-
 12 files changed, 428 insertions(+), 119 deletions(-)

-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] anv: Add an option to abort on device loss

2017-05-18 Thread Jason Ekstrand
This is mostly for running in our CI system to prevent dEQP from
continuing on to the next test if we get a GPU hang.  As it currently
stands, dEQP uses the same VkDevice for almost all tests and if one of
the tests hangs, we set the anv_device::device_lost flag and report
VK_ERROR_DEVICE_LOST for all queue operations from that point forward
without sending anything to the GPU.  dEQP will happily continue trying
to run tests and reporting failures until it eventually gets crash that
forces the test runner to start over.  This circumvents the problem by
just aborting the process if we ever get a GPU hang.  Since this is not
the recommended behavior most of the time, we hide it behind an
environment variable.

Cc: Mark Janes 
---
 src/intel/vulkan/anv_util.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/intel/vulkan/anv_util.c b/src/intel/vulkan/anv_util.c
index ba91733..4b916e2 100644
--- a/src/intel/vulkan/anv_util.c
+++ b/src/intel/vulkan/anv_util.c
@@ -30,6 +30,7 @@
 
 #include "anv_private.h"
 #include "vk_enum_to_str.h"
+#include "util/debug.h"
 
 /** Log an error message.  */
 void anv_printflike(1, 2)
@@ -95,5 +96,9 @@ __vk_errorf(VkResult error, const char *file, int line, const 
char *format, ...)
   fprintf(stderr, "%s:%d: %s\n", file, line, error_str);
}
 
+   if (error == VK_ERROR_DEVICE_LOST &&
+   env_var_as_boolean("ANV_ABORT_ON_DEVICE_LOSS", false))
+  abort();
+
return error;
 }
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] radeonsi/gfx9: compile shaders with +xnack

2017-05-18 Thread Marek Olšák
From: Marek Olšák 

so that LLVM doesn't allocate SGPRs where XNACK is.

Cc: 17.1 
---
 src/gallium/drivers/radeonsi/si_pipe.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index eaa3348..6f82e29 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -26,23 +26,20 @@
 #include "si_shader_internal.h"
 #include "sid.h"
 
 #include "radeon/radeon_uvd.h"
 #include "util/u_memory.h"
 #include "util/u_suballoc.h"
 #include "util/u_tests.h"
 #include "vl/vl_decoder.h"
 #include "../ddebug/dd_util.h"
 
-#define SI_LLVM_DEFAULT_FEATURES \
-   "+DumpCode,+vgpr-spilling,-fp32-denormals,-xnack"
-
 /*
  * pipe_context
  */
 static void si_destroy_context(struct pipe_context *context)
 {
struct si_context *sctx = (struct si_context *)context;
int i;
 
/* Unreference the framebuffer normally to disable related logic
 * properly.
@@ -120,26 +117,30 @@ static void si_emit_string_marker(struct pipe_context 
*ctx,
 {
struct si_context *sctx = (struct si_context *)ctx;
 
dd_parse_apitrace_marker(string, len, >apitrace_call_number);
 }
 
 static LLVMTargetMachineRef
 si_create_llvm_target_machine(struct si_screen *sscreen)
 {
const char *triple = "amdgcn--";
+   char features[256];
+
+   snprintf(features, sizeof(features),
+"+DumpCode,+vgpr-spilling,-fp32-denormals,+fp64-denormals%s%s",
+sscreen->b.chip_class >= GFX9 ? ",+xnack" : ",-xnack",
+sscreen->b.debug_flags & DBG_SI_SCHED ? ",+si-scheduler" : "");
 
return LLVMCreateTargetMachine(si_llvm_get_amdgpu_target(triple), 
triple,
   
r600_get_llvm_processor_name(sscreen->b.family),
-  sscreen->b.debug_flags & DBG_SI_SCHED ?
-  SI_LLVM_DEFAULT_FEATURES 
",+si-scheduler" :
-  SI_LLVM_DEFAULT_FEATURES,
+  features,
   LLVMCodeGenLevelDefault,
   LLVMRelocDefault,
   LLVMCodeModelDefault);
 }
 
 static struct pipe_context *si_create_context(struct pipe_screen *screen,
   unsigned flags)
 {
struct si_context *sctx = CALLOC_STRUCT(si_context);
struct si_screen* sscreen = (struct si_screen *)screen;
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 1/2] anv/formats: Update the three-channel BC1 mappings

2017-05-18 Thread Nanley Chery
+Ken

On Tue, May 16, 2017 at 10:17:11AM -0700, Nanley Chery wrote:
> The procedure for decompressing an opaque BC1 Vulkan format is dependant on 
> the
> comparison of two colors stored in the first 32 bits of the compressed block.
> Here's the specified OpenGL (and Vulkan) behavior for reference:
> 
>The RGB color for a texel at location (x,y) in the block is given by:
> 
>   RGB0,  if color0 > color1 and code(x,y) == 0
>   RGB1,  if color0 > color1 and code(x,y) == 1
>   (2*RGB0+RGB1)/3,   if color0 > color1 and code(x,y) == 2
>   (RGB0+2*RGB1)/3,   if color0 > color1 and code(x,y) == 3
> 
>   RGB0,  if color0 <= color1 and code(x,y) == 0
>   RGB1,  if color0 <= color1 and code(x,y) == 1
>   (RGB0+RGB1)/2, if color0 <= color1 and code(x,y) == 2
>   BLACK, if color0 <= color1 and code(x,y) == 3
> 
> The sampling operation performed on an opaque DXT1 Intel format essentially
> hard-codes the comparison result of the two colors as color0 > color1. This
> means that the behavior is incompatible with OpenGL and Vulkan. This is stated
> in the SKL PRM, Vol 5: Memory Views:
> 
>Opaque Textures (DXT1_RGB)
>   Texture format DXT1_RGB is identical to DXT1, with the exception that 
> the
>   One-bit Alpha encoding is removed. Color 0 and Color 1 are not 
> compared, and
>   the resulting texel color is derived strictly from the Opaque Color 
> Encoding.
>   The alpha channel defaults to 1.0.
> 
>   Programming Note
>   Context: Opaque Textures (DXT1_RGB)
>   The behavior of this format is not compliant with the OGL spec.
> 
> The opaque and non-opaque BC1 Vulkan formats are specified to be decoded in
> exactly the same way except the BLACK value must have a transparent alpha
> channel in the latter. Use the four-channel BC1 Intel formats with the alpha
> set to 1 to provide the behavior required by the spec.
> 
> v2 (Kenneth Graunke):
> - Provide a more detailed commit message.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100925
> Cc: 
> Reviewed-by: Kenneth Graunke  (v1)
> Signed-off-by: Nanley Chery 
> ---
>  src/intel/vulkan/anv_formats.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/intel/vulkan/anv_formats.c b/src/intel/vulkan/anv_formats.c
> index c39cec3800..25801e8b06 100644
> --- a/src/intel/vulkan/anv_formats.c
> +++ b/src/intel/vulkan/anv_formats.c
> @@ -179,8 +179,8 @@ static const struct anv_format anv_formats[] = {
> fmt(VK_FORMAT_D24_UNORM_S8_UINT,   ISL_FORMAT_R24_UNORM_X8_TYPELESS),
> fmt(VK_FORMAT_D32_SFLOAT_S8_UINT,  ISL_FORMAT_R32_FLOAT),
>  
> -   fmt(VK_FORMAT_BC1_RGB_UNORM_BLOCK, ISL_FORMAT_DXT1_RGB),
> -   fmt(VK_FORMAT_BC1_RGB_SRGB_BLOCK,  ISL_FORMAT_DXT1_RGB_SRGB),
> +   swiz_fmt(VK_FORMAT_BC1_RGB_UNORM_BLOCK, ISL_FORMAT_BC1_UNORM, RGB1),
> +   swiz_fmt(VK_FORMAT_BC1_RGB_SRGB_BLOCK,  ISL_FORMAT_BC1_UNORM_SRGB, 
> RGB1),
> fmt(VK_FORMAT_BC1_RGBA_UNORM_BLOCK,ISL_FORMAT_BC1_UNORM),
> fmt(VK_FORMAT_BC1_RGBA_SRGB_BLOCK, ISL_FORMAT_BC1_UNORM_SRGB),
> fmt(VK_FORMAT_BC2_UNORM_BLOCK, ISL_FORMAT_BC2_UNORM),
> -- 
> 2.12.2
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] main: Use _mesa_HashLock/UnlockMutex consistently

2017-05-18 Thread Samuel Pitoiset

Looks good to me.

Reviewed-by: Samuel Pitoiset 

On 05/18/2017 09:01 PM, Thomas Helland wrote:

This is shorter and easier on the eyes. At the same time this
also ensures that we are always asserting that the table pointer
is not NULL. Currently that was not done for all situations.
---
  src/mesa/main/hash.c | 23 ++-
  1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/src/mesa/main/hash.c b/src/mesa/main/hash.c
index 5b9132a311..e352a46aa4 100644
--- a/src/mesa/main/hash.c
+++ b/src/mesa/main/hash.c
@@ -205,10 +205,9 @@ void *
  _mesa_HashLookup(struct _mesa_HashTable *table, GLuint key)
  {
 void *res;
-   assert(table);
-   mtx_lock(>Mutex);
+   _mesa_HashLockMutex(table);
 res = _mesa_HashLookup_unlocked(table, key);
-   mtx_unlock(>Mutex);
+   _mesa_HashUnlockMutex(table);
 return res;
  }
  
@@ -315,10 +314,9 @@ _mesa_HashInsertLocked(struct _mesa_HashTable *table, GLuint key, void *data)

  void
  _mesa_HashInsert(struct _mesa_HashTable *table, GLuint key, void *data)
  {
-   assert(table);
-   mtx_lock(>Mutex);
+   _mesa_HashLockMutex(table);
 _mesa_HashInsert_unlocked(table, key, data);
-   mtx_unlock(>Mutex);
+   _mesa_HashUnlockMutex(table);
  }
  
  
@@ -364,9 +362,9 @@ _mesa_HashRemoveLocked(struct _mesa_HashTable *table, GLuint key)

  void
  _mesa_HashRemove(struct _mesa_HashTable *table, GLuint key)
  {
-   mtx_lock(>Mutex);
+   _mesa_HashLockMutex(table);
 _mesa_HashRemove_unlocked(table, key);
-   mtx_unlock(>Mutex);
+   _mesa_HashUnlockMutex(table);
  }
  
  /**

@@ -385,9 +383,8 @@ _mesa_HashDeleteAll(struct _mesa_HashTable *table,
  {
 struct hash_entry *entry;
  
-   assert(table);

 assert(callback);
-   mtx_lock(>Mutex);
+   _mesa_HashLockMutex(table);
 table->InDeleteAll = GL_TRUE;
 hash_table_foreach(table->ht, entry) {
callback((uintptr_t)entry->key, entry->data, userData);
@@ -398,7 +395,7 @@ _mesa_HashDeleteAll(struct _mesa_HashTable *table,
table->deleted_key_data = NULL;
 }
 table->InDeleteAll = GL_FALSE;
-   mtx_unlock(>Mutex);
+   _mesa_HashUnlockMutex(table);
  }
  
  
@@ -434,9 +431,9 @@ _mesa_HashWalk(const struct _mesa_HashTable *table,

 /* cast-away const */
 struct _mesa_HashTable *table2 = (struct _mesa_HashTable *) table;
  
-   mtx_lock(>Mutex);

+   _mesa_HashLockMutex(table2);
 hash_walk_unlocked(table, callback, userData);
-   mtx_unlock(>Mutex);
+   _mesa_HashUnlockMutex(table2);
  }
  
  void



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] util: Change the pointer hashing function

2017-05-18 Thread Thomas Helland
Use our knowledge that pointers are at least 4 byte aligned to remove
the useless digits. Then shift by 6, 10, and 14 bits and add this to
the original pointer, effectively folding in the entropy of the higher
bits of the pointer into a 4-bit section. Stopping at 14 means we can
add the entropy from 18 bits, or at least a 600Kbyte section of memory.
Assuming that ralloc allocates from a linearly allocated heap less than
this we can make a very efficient pointer hashing function for our usecase.

The 4 bit increment on the shifts is chosen rather arbitrarily; if we
had chosen a 3 bit increment we would need to add another xor to
cover a decently sized memorypool. Increasing it to 5 bits would
spread our entropy more, possibly hurting us with more collisions on
hash tables of size less than 32. With a hash table of size 16 there
are a max of 11 entries, and we can assume that with such a small table
collisions are not that painfull.

This allows us to hash the whole 32 or 64 bit pointer at once,
instead of running FNV1a, looping through each byte and doing
increments, decrements, muls, and xors on every byte. This cuts
_mesa_hash_data from 1.5 % on profiles, to making _mesa_hash_pointer
show up with a 0.09% share. Collisions on insertion actually seems to be
ever so slightly lower with this hash function, as found by printing
a loop counter and sorting the data.

perf stat shows a 1.5% reduction in instruction count,
and a 5% reduction in stalled cycles, on a shader-db run.
Shader-db execution time goes from 225 to 220 seconds.

No instruction-count changes in shader-db, but there are some minor
changes in cycle-count that is likely caused by nir walking a set
in some of its passes, and this causing a different ordering.
That might eventually lead to a difference in register allocation.
However, the effect is a net positive;

total cycles in shared programs: 24739550 -> 24738482 (-0.00%)
cycles in affected programs: 374468 -> 373400 (-0.29%)
helped: 178
HURT: 49
---
 src/util/hash_table.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/util/hash_table.h b/src/util/hash_table.h
index b35ee871bb..c7f577665d 100644
--- a/src/util/hash_table.h
+++ b/src/util/hash_table.h
@@ -105,7 +105,8 @@ static inline uint32_t _mesa_key_hash_string(const void 
*key)
 
 static inline uint32_t _mesa_hash_pointer(const void *pointer)
 {
-   return _mesa_hash_data(, sizeof(pointer));
+   uintptr_t num = (uintptr_t) pointer;
+   return (uint32_t) ((num >> 2) ^ (num >> 6) ^ (num >> 10) ^ (num >> 14));
 }
 
 enum {
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] main: Use _mesa_HashLock/UnlockMutex consistently

2017-05-18 Thread Thomas Helland
This is shorter and easier on the eyes. At the same time this
also ensures that we are always asserting that the table pointer
is not NULL. Currently that was not done for all situations.
---
 src/mesa/main/hash.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/src/mesa/main/hash.c b/src/mesa/main/hash.c
index 5b9132a311..e352a46aa4 100644
--- a/src/mesa/main/hash.c
+++ b/src/mesa/main/hash.c
@@ -205,10 +205,9 @@ void *
 _mesa_HashLookup(struct _mesa_HashTable *table, GLuint key)
 {
void *res;
-   assert(table);
-   mtx_lock(>Mutex);
+   _mesa_HashLockMutex(table);
res = _mesa_HashLookup_unlocked(table, key);
-   mtx_unlock(>Mutex);
+   _mesa_HashUnlockMutex(table);
return res;
 }
 
@@ -315,10 +314,9 @@ _mesa_HashInsertLocked(struct _mesa_HashTable *table, 
GLuint key, void *data)
 void
 _mesa_HashInsert(struct _mesa_HashTable *table, GLuint key, void *data)
 {
-   assert(table);
-   mtx_lock(>Mutex);
+   _mesa_HashLockMutex(table);
_mesa_HashInsert_unlocked(table, key, data);
-   mtx_unlock(>Mutex);
+   _mesa_HashUnlockMutex(table);
 }
 
 
@@ -364,9 +362,9 @@ _mesa_HashRemoveLocked(struct _mesa_HashTable *table, 
GLuint key)
 void
 _mesa_HashRemove(struct _mesa_HashTable *table, GLuint key)
 {
-   mtx_lock(>Mutex);
+   _mesa_HashLockMutex(table);
_mesa_HashRemove_unlocked(table, key);
-   mtx_unlock(>Mutex);
+   _mesa_HashUnlockMutex(table);
 }
 
 /**
@@ -385,9 +383,8 @@ _mesa_HashDeleteAll(struct _mesa_HashTable *table,
 {
struct hash_entry *entry;
 
-   assert(table);
assert(callback);
-   mtx_lock(>Mutex);
+   _mesa_HashLockMutex(table);
table->InDeleteAll = GL_TRUE;
hash_table_foreach(table->ht, entry) {
   callback((uintptr_t)entry->key, entry->data, userData);
@@ -398,7 +395,7 @@ _mesa_HashDeleteAll(struct _mesa_HashTable *table,
   table->deleted_key_data = NULL;
}
table->InDeleteAll = GL_FALSE;
-   mtx_unlock(>Mutex);
+   _mesa_HashUnlockMutex(table);
 }
 
 
@@ -434,9 +431,9 @@ _mesa_HashWalk(const struct _mesa_HashTable *table,
/* cast-away const */
struct _mesa_HashTable *table2 = (struct _mesa_HashTable *) table;
 
-   mtx_lock(>Mutex);
+   _mesa_HashLockMutex(table2);
hash_walk_unlocked(table, callback, userData);
-   mtx_unlock(>Mutex);
+   _mesa_HashUnlockMutex(table2);
 }
 
 void
-- 
2.12.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/5] GL_OES_required_internalformat

2017-05-18 Thread Eric Anholt
Eric Anholt  writes:

> This series came out of fixing dEQP failures on vc4's GLES2 context.
> Mesa was allowing RGB565 textures, which is only valid with
> GL_OES_required_internalformat.  Rather than disable RGB565, I decided
> the extension was easy enough to support.
>
> I've sent one piglit test for renderbuffer sizing, and dEQP has tests
> for whether enums get accepted for TexImage.
>
> There's a functional question in patch #2, see the comment there, and
> there's a question of whether the extension should be dummy_true in
> patch #5.
>
> branch: https://github.com/anholt/mesa/commits/required-internalformat

I would still love review on this series.


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Bug in 17.1.0-rc4 source packaging for swr?

2017-05-18 Thread Kai Wasserbäch
Hey,
Emil Velikov wrote on 18.05.2017 18:17:
> On 18 May 2017 at 16:34, Rowley, Timothy O  wrote:
>> We could use a gen_builder.h generated from llvm-3.9 for all current
>> versions of llvm at this time (as the changes are now just llvm IR
>> additions), but I’m not sure how to enforce that the person creating the
>> tarball has a particular version of llvm installed.
>>
> Must admit that I don't know much about LLVM or SWR, so I hope you've
> thoroughly tested LLVM 3.9 generated files built with LLVM 4.0 or
> later.
> Using LLVM 3.9 should be doable, but I'm curious if we can avoid such
> 'acrobatics' in the long run.
> 
> Any ideas, if that's at all possible?

how about the SWR maintainers provide a "golden" fallback file, that is
pre-generated and known good for all supported LLVM (right now I would guess
that to be 3.9 and 4.0) versions? That file is always shipped in the tarball.

During build there's a rule, that checks if $generator is there. If not, the
fallback file is copied to the correct name, otherwise the regular generation
takes place.

Cheers,
Kai



signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 1/2] anv/formats: Update the BC1_RGB mappings

2017-05-18 Thread Nanley Chery
On Thu, May 18, 2017 at 06:38:04PM +0100, Emil Velikov wrote:
> On 18 May 2017 at 18:32, Emil Velikov  wrote:
> > Hi Nanley,
> >
> > Has the patch need superseded by any chance or it's lacking review?
> >
> I'm blind - of course it is (alongside v3 2/2).
> 
> At the same time neither of the two has landed in master even though
> they've been reviewed by Ken.
> Just a humble ping that they've not fallen through the cracks.
> 
> -Emil

Hi Emil,

v1 of the series has been reviewed. I'll ask him if his RB's still apply
to v3.

-Nanley

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 2/4] radeonsi: load patch_id for TES-as-ES when exporting for PS

2017-05-18 Thread Emil Velikov
Hi Nicolai,

Afaict this patch depends the following two. And while the latter
seems fairly trivial, I'm less confident about the former.
Can you shed some light here - do we need this patch for 17.1, does it
have any requirements?

9fd9a7d0ba39ed2328b1d48cd8ae83f070202f51 radeonsi: remove VS epilog
code, compile VS with PrimID export on demand
678d568c7b2c1e7c92ce6b2800ff35be71e3ec0d radeonsi: don't load PrimID
in TES if it's not used

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 1/2] anv/formats: Update the BC1_RGB mappings

2017-05-18 Thread Emil Velikov
On 18 May 2017 at 18:32, Emil Velikov  wrote:
> Hi Nanley,
>
> Has the patch need superseded by any chance or it's lacking review?
>
I'm blind - of course it is (alongside v3 2/2).

At the same time neither of the two has landed in master even though
they've been reviewed by Ken.
Just a humble ping that they've not fallen through the cracks.

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 1/2] anv/formats: Update the BC1_RGB mappings

2017-05-18 Thread Emil Velikov
Hi Nanley,

Has the patch need superseded by any chance or it's lacking review?

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH] gallivm: Make sure module has the correct data layout when pass manager runs

2017-05-18 Thread Emil Velikov
Gents,

seems like this patch hasn't landed in master yet.
Has it fallen through the cracks or?

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 101071] compiling glsl fails with undefined reference to `pthread_create'

2017-05-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=101071

--- Comment #5 from war...@o2.pl ---
Emil,
One thing:

in configure I see:

checking for the pthreads library -lpthreads... no
checking whether pthreads work without any flags... yes
checking for joinable pthread attribute... PTHREAD_CREATE_JOINABLE
checking if more special flags are required for pthreads... no
checking for PTHREAD_PRIO_INHERIT... no

why then build process asking for libpthread functions?

there is definitely something different in 17.1 as 17.0.6 builds OK in exactly
this environment.

If there is unmet, new requirement for 17.1 which is not needed for 17.0.6 -
then issue is that 17.1 configure process is not catching this missing
dependency.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH] egl: Partially revert 23c86c74, fix eglMakeCurrent

2017-05-18 Thread Chad Versace
On Thu 18 May 2017, Tapani Pälli wrote:
> On 05/18/2017 12:16 AM, Tomasz Figa wrote:
> > Hi,
> > 
> > On Tue, May 16, 2017 at 11:35 PM, Tapani Pälli  
> > wrote:
> > > 
> > > On 05/16/2017 08:10 PM, Chad Versace wrote:
> > > > On Tue 16 May 2017, Tapani Pälli wrote:
> > > > > 
> > > > > 
> > > > > On 05/16/2017 02:04 AM, Chad Versace wrote:
> > > > > > Fixes regressions in Android CtsVerifier.apk on Intel Chrome OS 
> > > > > > devices
> > > > > > due to incorrect error handling in eglMakeCurrent. See below on how 
> > > > > > to
> > > > > > confirm the regression is fixed.
> > > > > > 
> > > > > > This partially reverts
> > > > > > 
> > > > > >commit 23c86c74cc450a23848b85cfe914376caede1cdf
> > > > > >Author:  Chad Versace 
> > > > > >Subject: egl: Emit error when EGLSurface is lost
> > > > > > 
> > > > > > The bad commit added the error handling below. #2 and #3 were right,
> > > > > > but #1 was wrong.
> > > > > > 
> > > > > >1. eglMakeCurrent emits EGL_BAD_CURRENT_SURFACE if the 
> > > > > > calling
> > > > > >   thread has unflushed commands and either previous surface 
> > > > > > is no
> > > > > >   longer valid.
> > > > > > 
> > > > > >2. eglMakeCurrent emits EGL_BAD_NATIVE_WINDOW if either new
> > > > > > surface
> > > > > >   is no longer valid.
> > > > > > 
> > > > > >3. eglSwapBuffers emits EGL_BAD_NATIVE_WINDOW if the swapped
> > > > > > surface
> > > > > >   is no longer valid.
> > > > > > 
> > > > > > Whe I wrote the bad commit, I misunderstood the EGL spec language
> > > > > > for #1. The correct behavior is, if I understand correctly now:
> > > > > > 
> > > > > >- Assume a bound EGLSurface is no longer valid.
> > > > > >- Assume the bound EGLContext has unflushed commands.
> > > > > >- The app calls eglMakeCurrent. The spec requires 
> > > > > > eglMakeCurrent
> > > > > > to
> > > > > >  implicitly flush. After flushing, eglMakeCurrent emits
> > > > > >  EGL_BAD_CURRENT_SURFACE and does *not* alter the thread's
> > > > > >  current bindings.
> > > > > >- If the app calls eglMakeCurrent again, and the app inserts 
> > > > > > no
> > > > > >  commands into the GL command stream between the two
> > > > > > eglMakeCurrent
> > > > > >  calls, then this second eglMakeCurrent succeeds without 
> > > > > > emitting
> > > > > > an
> > > > > >  error.
> > > > > > 
> > > > > > How to confirm this fixes the regression:
> > > > > > 
> > > > > >Download android-cts-verifier-7.1_r5-linux_x86-x86.zip from
> > > > > >source.android.com, unpack, and `adb install 
> > > > > > CtsVerifier.apk`.
> > > > > >Run test "Projection Cube". Click the Pass button (a
> > > > > >green checkmark). Then run test "Projection Widget". Confirm 
> > > > > > that
> > > > > >widgets are visible and that logcat does not complain about
> > > > > >eglMakeCurrent failure.
> > > > > > 
> > > > > >Then confirm there are no regressions in the cts-traded 
> > > > > > module
> > > > > > that
> > > > > >commit 263243b1 fixed:
> > > > > > 
> > > > > >cts-tf > run cts --skip-preconditions --skip-device-info 
> > > > > > \
> > > > > > -m CtsCameraTestCases \
> > > > > > -t android.hardware.camera2.cts.RobustnessTest
> > > > > > 
> > > > > >Tested with Chrome OS board "reef".
> > > > > 
> > > > > both tests passed on Android-IA with this patch ... but if I minimize
> > > > > "Projection Widget" test it starts to bang EGL_BAD_NATIVE_WINDOW 
> > > > > heavily.
> > > > > Is
> > > > > this expected behavior?
> > > > 
> > > > I'm unsure. I haven't yet tried that experiment. I'll give it a try when
> > > > I'm back at my desk.
> > > > 
> > > > Which EGL function is emitting EGL_BAD_NATIVE_WINDOW in logcat?
> > > > eglMakeCurrent or eglSwapBuffers, or something else?
> > > 
> > > doh sorry, I actually meant 'Projection Cube' test, not the widget test. 
> > > But
> > > yep, when tapping switcher button eglMakeCurrent starts to emit the error
> > > (likely the app just continues to submit frames even when it's
> > > 'minimized'?). When switching back to app it cannot draw anymore as it 
> > > keeps
> > > getting the error.
> > I don't remember exactly and I'm not an Android app expert either, but
> > I remember hearing something that the app needs to stop drawing when
> > the activity is paused (stopped?). Isn't that what's happening when an
> > app gets minimized?
> 
> Yep, this is very likely the case and the app here is just not obeying the
> rules. Considering that Android wraps EGL layer it seems weird that it just
> let's application to do this. Any case, since everything else works fine
> this is definitely not a blocker for me for this patch;
> 
> Acked-by: Tapani Pälli 

Tapani, thanks for 

Re: [Mesa-dev] [PATCH 16/24] i965/cnl: Start using gen10 specific functions

2017-05-18 Thread Anuj Phogat
On Fri, May 12, 2017 at 4:38 PM, Anuj Phogat  wrote:
> gen10 specific functions:
> isl_gen10*()
> gen10_blorp_exec()
> gen10_init_atoms()
>
> Signed-off-by: Anuj Phogat 
> ---
>  src/intel/isl/isl.c  | 12 +---
>  src/mesa/drivers/dri/i965/brw_blorp.c|  2 +-
>  src/mesa/drivers/dri/i965/brw_state_upload.c |  4 +++-
>  3 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> index 5dc41fa..0ae72a4 100644
> --- a/src/intel/isl/isl.c
> +++ b/src/intel/isl/isl.c
> @@ -1672,9 +1672,11 @@ isl_surf_fill_state_s(const struct isl_device *dev, 
> void *state,
>isl_gen8_surf_fill_state_s(dev, state, info);
>break;
> case 9:
> -   case 10:
>isl_gen9_surf_fill_state_s(dev, state, info);
>break;
> +   case 10:
> +  isl_gen10_surf_fill_state_s(dev, state, info);
> +  break;
> default:
>assert(!"Cannot fill surface state for this gen");
> }
> @@ -1704,9 +1706,11 @@ isl_buffer_fill_state_s(const struct isl_device *dev, 
> void *state,
>isl_gen8_buffer_fill_state_s(state, info);
>break;
> case 9:
> -   case 10:
>isl_gen9_buffer_fill_state_s(state, info);
>break;
> +   case 10:
> +  isl_gen10_buffer_fill_state_s(state, info);
> +  break;
> default:
>assert(!"Cannot fill surface state for this gen");
> }
> @@ -1772,9 +1776,11 @@ isl_emit_depth_stencil_hiz_s(const struct isl_device 
> *dev, void *batch,
>isl_gen8_emit_depth_stencil_hiz_s(dev, batch, info);
>break;
> case 9:
> -   case 10:
>isl_gen9_emit_depth_stencil_hiz_s(dev, batch, info);
>break;
> +   case 10:
> +  isl_gen10_emit_depth_stencil_hiz_s(dev, batch, info);
> +  break;
> default:
>assert(!"Cannot fill surface state for this gen");
> }
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c 
> b/src/mesa/drivers/dri/i965/brw_blorp.c
> index eae925f..bcc72df 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp.c
> +++ b/src/mesa/drivers/dri/i965/brw_blorp.c
> @@ -103,7 +103,7 @@ brw_blorp_init(struct brw_context *brw)
>brw->blorp.mocs.tex = CNL_MOCS_WB;
>brw->blorp.mocs.rb = CNL_MOCS_PTE;
>brw->blorp.mocs.vb = CNL_MOCS_WB;
> -  brw->blorp.exec = gen9_blorp_exec;
> +  brw->blorp.exec = gen10_blorp_exec;
>break;
> default:
>unreachable("Invalid gen");
> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
> b/src/mesa/drivers/dri/i965/brw_state_upload.c
> index bcb7ff1..35962df 100644
> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> @@ -134,7 +134,9 @@ void brw_init_state( struct brw_context *brw )
>
> brw_init_caches(brw);
>
> -   if (brw->gen >= 9)
> +   if (brw->gen >= 10)
> +  gen10_init_atoms(brw);
> +   else if (brw->gen >= 9)
>gen9_init_atoms(brw);
> else if (brw->gen >= 8)
>gen8_init_atoms(brw);
> --
> 2.9.3
>
I've squashed this patch in below patch:
[PATCH V2 14/24] i965/cnl: Handle gen10 in switch cases across the driver
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] egl/wayland: Ensure we get a back buffer

2017-05-18 Thread Daniel Stone
Hi Emil,

On 18 May 2017 at 17:22, Emil Velikov  wrote:
> On 16 May 2017 at 11:02, Daniel Stone  wrote:
>> Ideally get_back_bo() failing should store a failure flag inside the
>> surface and cause the next SwapBuffers to fail, but for the meantime,
>> restore the correct behaviour such that get_back_bo() no longer fails.
> JFYI, we already have the egl/main plumbing and egl/android support. See 
> commits
>
> 23c86c74cc450a23848b85cfe914376caede1cdf
> 0212db350407e1331ff23f04136684cf2b7396cf
> e5eace586848511f4ceaffaa2d45131c31c45ae0

Fair enough. My thought was just to fail SwapBuffers, because failure
to acquire one image doesn't necessarily mean we'll never be able to
acquire any more images in future. It's also EGL_BAD_ALLOC rather than
EGL_BAD_NATIVE_WINDOW ...

Cheers,
Daniel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Intel-gfx] [RFC v3] drm/i915: Select engines via class and instance in execbuffer2

2017-05-18 Thread Chris Wilson
On Thu, May 18, 2017 at 05:20:38PM +0100, Tvrtko Ursulin wrote:
> 
> On 18/05/2017 14:37, Chris Wilson wrote:
> >On Thu, May 18, 2017 at 02:06:35PM +0100, Tvrtko Ursulin wrote:
> >>
> >>But this problem in general can also be solved separately from
> >>class-instance addressing via engine feature masking.
> >
> >But imo all members of a class should have the same features. That would
> >be my definition of a class!
> 
> That sounds very totalitarian! :)) To me a class is a group of some
> entities which share some common characteristics - not necessarily
> completely uniform.

The problem otherwise is that we then have to define yet another
interface based on features. To me that sounds like too much
duplication, that we could avoid from the beginning. Curse the hw for
being asymmetical!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] st/mesa: improve shader cache debug info

2017-05-18 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Thu, May 18, 2017 at 7:22 AM, Timothy Arceri  wrote:
> This will explicitly state that we are following the fallback
> path when we find invalid/corrupt cache items. It will also
> output the fallback message when the fallback path is forced
> via an environment variable, the following patches will allow
> this.
> ---
>  src/mesa/state_tracker/st_shader_cache.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_shader_cache.c 
> b/src/mesa/state_tracker/st_shader_cache.c
> index 23416c0..175d69d 100644
> --- a/src/mesa/state_tracker/st_shader_cache.c
> +++ b/src/mesa/state_tracker/st_shader_cache.c
> @@ -361,32 +361,34 @@ st_load_tgsi_from_disk_cache(struct gl_context *ctx,
>
>   st_set_prog_affected_state_flags(glprog);
>   _mesa_associate_uniform_storage(ctx, prog, glprog->Parameters,
>   false);
>
>   free(buffer);
>} else {
>   /* Failed to find a matching cached shader so fallback to recompile.
>*/
>   if (ctx->_Shader->Flags & GLSL_CACHE_INFO) {
> -fprintf(stderr, "TGSI cache item not found falling back to "
> -"compile.\n");
> +fprintf(stderr, "TGSI cache item not found.\n");
>   }
>
>   goto fallback_recompile;
>}
> }
>
> return true;
>
>  fallback_recompile:
> free(buffer);
>
> +   if (ctx->_Shader->Flags & GLSL_CACHE_INFO)
> +  fprintf(stderr, "TGSI cache falling back to recompile.\n");
> +
> for (unsigned i = 0; i < prog->NumShaders; i++) {
>_mesa_glsl_compile_shader(ctx, prog->Shaders[i], false, false, true);
> }
>
> prog->data->cache_fallback = true;
> _mesa_glsl_link_shader(ctx, prog);
>
> return true;
>  }
> --
> 2.9.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 01/15] ac/surface/gfx6: explicitly support S8 surfaces

2017-05-18 Thread Marek Olšák
This patch:

Reviewed-by: Marek Olšák 

Marek

On Thu, May 18, 2017 at 11:53 AM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> This is needed by radv for dEQP-VK.renderpass.simple.stencil
> ---
>  src/amd/common/ac_surface.c | 75 
> ++---
>  1 file changed, 50 insertions(+), 25 deletions(-)
>
> diff --git a/src/amd/common/ac_surface.c b/src/amd/common/ac_surface.c
> index d77b490..51e15d0 100644
> --- a/src/amd/common/ac_surface.c
> +++ b/src/amd/common/ac_surface.c
> @@ -380,20 +380,45 @@ static unsigned cik_get_macro_tile_index(struct 
> radeon_surf *surf)
> tileb = MIN2(surf->u.legacy.tile_split, tileb);
>
> for (index = 0; tileb > 64; index++)
> tileb >>= 1;
>
> assert(index < 16);
> return index;
>  }
>
>  /**
> + * Copy surface-global settings like pipe/bank config from level 0 surface
> + * computation.
> + */
> +static void gfx6_surface_settings(const struct radeon_info* info,
> + ADDR_COMPUTE_SURFACE_INFO_OUTPUT* csio,
> + struct radeon_surf *surf)
> +{
> +   surf->surf_alignment = csio->baseAlign;
> +   surf->u.legacy.pipe_config = csio->pTileInfo->pipeConfig - 1;
> +   gfx6_set_micro_tile_mode(surf, info);
> +
> +   /* For 2D modes only. */
> +   if (csio->tileMode >= ADDR_TM_2D_TILED_THIN1) {
> +   surf->u.legacy.bankw = csio->pTileInfo->bankWidth;
> +   surf->u.legacy.bankh = csio->pTileInfo->bankHeight;
> +   surf->u.legacy.mtilea = csio->pTileInfo->macroAspectRatio;
> +   surf->u.legacy.tile_split = csio->pTileInfo->tileSplitBytes;
> +   surf->u.legacy.num_banks = csio->pTileInfo->banks;
> +   surf->u.legacy.macro_tile_index = csio->macroModeIndex;
> +   } else {
> +   surf->u.legacy.macro_tile_index = 0;
> +   }
> +}
> +
> +/**
>   * Fill in the tiling information in \p surf based on the given surface 
> config.
>   *
>   * The following fields of \p surf must be initialized by the caller:
>   * blk_w, blk_h, bpe, flags.
>   */
>  static int gfx6_compute_surface(ADDR_HANDLE addrlib,
> const struct radeon_info *info,
> const struct ac_surf_config *config,
> enum radeon_surf_mode mode,
> struct radeon_surf *surf)
> @@ -577,44 +602,36 @@ static int gfx6_compute_surface(ADDR_HANDLE addrlib,
> }
>
> surf->num_dcc_levels = 0;
> surf->surf_size = 0;
> surf->dcc_size = 0;
> surf->dcc_alignment = 1;
> surf->htile_size = 0;
> surf->htile_slice_size = 0;
> surf->htile_alignment = 1;
>
> +   const bool only_stencil = (surf->flags & RADEON_SURF_SBUFFER) &&
> + !(surf->flags & RADEON_SURF_ZBUFFER);
> +
> /* Calculate texture layout information. */
> -   for (level = 0; level < config->info.levels; level++) {
> -   r = gfx6_compute_level(addrlib, config, surf, false, level, 
> compressed,
> -  , ,
> -  , , , 
> );
> -   if (r)
> -   return r;
> +   if (!only_stencil) {
> +   for (level = 0; level < config->info.levels; level++) {
> +   r = gfx6_compute_level(addrlib, config, surf, false, 
> level, compressed,
> +  , 
> ,
> +  , , 
> , );
> +   if (r)
> +   return r;
>
> -   if (level == 0) {
> -   surf->surf_alignment = AddrSurfInfoOut.baseAlign;
> -   surf->u.legacy.pipe_config = 
> AddrSurfInfoOut.pTileInfo->pipeConfig - 1;
> -   gfx6_set_micro_tile_mode(surf, info);
> -
> -   /* For 2D modes only. */
> -   if (AddrSurfInfoOut.tileMode >= 
> ADDR_TM_2D_TILED_THIN1) {
> -   surf->u.legacy.bankw = 
> AddrSurfInfoOut.pTileInfo->bankWidth;
> -   surf->u.legacy.bankh = 
> AddrSurfInfoOut.pTileInfo->bankHeight;
> -   surf->u.legacy.mtilea = 
> AddrSurfInfoOut.pTileInfo->macroAspectRatio;
> -   surf->u.legacy.tile_split = 
> AddrSurfInfoOut.pTileInfo->tileSplitBytes;
> -   surf->u.legacy.num_banks = 
> AddrSurfInfoOut.pTileInfo->banks;
> -   surf->u.legacy.macro_tile_index = 
> AddrSurfInfoOut.macroModeIndex;
> -   } else {
> -   surf->u.legacy.macro_tile_index = 0;
> -   }
> +   if (level > 0)
> + 

Re: [Mesa-dev] [PATCH] egl/wayland: Ensure we get a back buffer

2017-05-18 Thread Emil Velikov
Hi Dan,

On 16 May 2017 at 11:02, Daniel Stone  wrote:

> Ideally get_back_bo() failing should store a failure flag inside the
> surface and cause the next SwapBuffers to fail, but for the meantime,
> restore the correct behaviour such that get_back_bo() no longer fails.
JFYI, we already have the egl/main plumbing and egl/android support. See commits

23c86c74cc450a23848b85cfe914376caede1cdf
0212db350407e1331ff23f04136684cf2b7396cf
e5eace586848511f4ceaffaa2d45131c31c45ae0

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Intel-gfx] [RFC v3] drm/i915: Select engines via class and instance in execbuffer2

2017-05-18 Thread Tvrtko Ursulin


On 18/05/2017 14:37, Chris Wilson wrote:

On Thu, May 18, 2017 at 02:06:35PM +0100, Tvrtko Ursulin wrote:


On 18/05/2017 13:24, Chris Wilson wrote:

Yes, I would argue to defer it until later. One problem I have at the
moment is that not all members of a class are equal, HEVC-capable
engines and the reset do not belong to the same class (i.e. my hope is
that we could just say  | [  | INSTANCE_MASK ] | LOAD_BALANCE)
So I can see the sense in having instance as a mask, or at least making
the current instance field large enough to store a mask in future. I
just feel uneasy as that field could grow quite large, and maybe it will
be better to set the constraint via a context param (all dependency on
frequency and tuning of the LOAD_BALANCE). Hmm, liking having the
instance-mask on the context atm.


I don't think per context mask would work unless you won't to
mandate multi-contexts where they wouldn't otherwise be needed.


Contexts are not thread-friendly. About the only way you can make them
safe (wrt execbuf) is through the use of userspace GTT allocation (i.e.
assigning an address on creation and making it permanent with softpin).

So in general you end up serialising around execbuf and copying the
state in/out of the ioctl. That gives a window of opportunity to use
context_setparam as an extension for irregular parameter updates.

It is not as nice as providing space in the execbuf ioctl, because of
the extra state being carried around in the context.


Yes not nice, I can't say that I like this very much.


But this problem in general can also be solved separately from
class-instance addressing via engine feature masking.


But imo all members of a class should have the same features. That would
be my definition of a class!


That sounds very totalitarian! :)) To me a class is a group of some 
entities which share some common characteristics - not necessarily 
completely uniform.


Regards,

Tvrtko
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Bug in 17.1.0-rc4 source packaging for swr?

2017-05-18 Thread Emil Velikov
On 18 May 2017 at 16:34, Rowley, Timothy O  wrote:

>
> We could use a gen_builder.h generated from llvm-3.9 for all current
> versions of llvm at this time (as the changes are now just llvm IR
> additions), but I’m not sure how to enforce that the person creating the
> tarball has a particular version of llvm installed.
>
Must admit that I don't know much about LLVM or SWR, so I hope you've
thoroughly tested LLVM 3.9 generated files built with LLVM 4.0 or
later.
Using LLVM 3.9 should be doable, but I'm curious if we can avoid such
'acrobatics' in the long run.

Any ideas, if that's at all possible?

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Bug in 17.1.0-rc4 source packaging for swr?

2017-05-18 Thread Rowley, Timothy O

On May 17, 2017, at 12:08 PM, Emil Velikov 
> wrote:

On 10 May 2017 at 03:51, Chuck Atkins 
> wrote:
I just tried to build 17.0.4-rc4 from the tarball with swr enabled and got
errors about my python not having mako:

make[5]: Entering directory
'/tmp/atkins3/mesa/build/mesa-17.1.0-rc4_gcc-6.3.0_haswell/src/gallium/drivers/swr'
 GEN  rasterizer/jitter/gen_builder.hpp
Traceback (most recent call last):
 File
"../../../../../../mesa-17.1.0-rc4/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py",
line 24, in 
   from gen_common import MakoTemplateWriter, ArgumentParser
 File
"/tmp/atkins3/mesa/mesa-17.1.0-rc4/src/gallium/drivers/swr/rasterizer/codegen/gen_common.py",
line 27, in 
   from mako.template import Template
ImportError: No module named mako.template
Makefile:2424: recipe for target 'rasterizer/jitter/gen_builder.hpp' failed

As I understood it, mako should only be required when building out of git.
Unless this has changed, it seems there are some generated files missing
from the source tarball.

You're spot on here Chuck. Tarball should build without any tools such
as python/lex/etc.

At the moment the rasterizer/jitter/gen_builder.hpp file isn't shipped
in the tarball, hence the problem.
The file is omitted intentionally, as mentioned in the Makefile [1].
Would be great if we can fix that, so any suggestions would be
appreciated.

Tim, I believe we briefly had a chat about this a while back. Do you
have any ideas how to generate the file that works across all
supported LLVM versions?

We could use a gen_builder.h generated from llvm-3.9 for all current versions 
of llvm at this time (as the changes are now just llvm IR additions), but I’m 
not sure how to enforce that the person creating the tarball has a particular 
version of llvm installed.

-Tim


Thanks
Emil

[1] 
https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/swr/Makefile.am#n195

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC v4 2/2] drm/i915: Select engines via class and instance in execbuffer2

2017-05-18 Thread Chris Wilson
On Thu, May 18, 2017 at 03:58:26PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Building on top of the previous patch which exported the concept
> of engine classes and instances, we can also use this instead of
> the current awkward engine selection uAPI.
> 
> This is primarily interesting for the VCS engine selection which
> is a) currently done via disjoint set of flags, and b) the
> current I915_EXEC_BSD flags has different semantics depending on
> the underlying hardware which is bad.
> 
> Proposed idea here is to reserve 8-bits of flags, to pass in the
> engine instance, re-use the existing engine selection bits for
> the class selection, and a new flag named
> I915_EXEC_CLASS_INSTANCE to tell the kernel this new engine
> selection API is in use.
> 
> The new uAPI also removes access to the weak VCS engine
> balancing as currently existing in the driver.
> 
> Example usage to send a command to VCS0:
> 
>   eb.flags = i915_execbuffer2_engine(I915_ENGINE_CLASS_VIDEO_DECODE, 0);
> 
> Or to send a command to VCS1:
> 
>   eb.flags = i915_execbuffer2_engine(I915_ENGINE_CLASS_VIDEO_DECODE, 1);
> 
> v2:
>  * Fix unknown flags mask.
>  * Use I915_EXEC_RING_MASK for class. (Chris Wilson)
> 
> v3:
>  * Add a map for fast class-instance engine lookup. (Chris Wilson)
> 
> v4:
>  * Update commit to reflect v3.
>  * Export intel_engine_lookup for other users. (Chris Wilson)
>  * Split out some warns. (Chris Wilson)
> 
> Signed-off-by: Tvrtko Ursulin 
> Cc: Ben Widawsky 
> Cc: Chris Wilson 
> Cc: Daniel Vetter 
> Cc: Joonas Lahtinen 
> Cc: Jon Bloomfield 
> Cc: Daniel Charles 
> Cc: "Rogozhkin, Dmitry V" 
> Cc: Oscar Mateo 
> Cc: "Gong, Zhipeng" 
> Cc: intel-vaapi-me...@lists.01.org
> Cc: mesa-dev@lists.freedesktop.org
> ---
>  drivers/gpu/drm/i915/i915_drv.h|  1 +
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 20 
>  drivers/gpu/drm/i915/i915_reg.h|  3 +++
>  drivers/gpu/drm/i915/intel_engine_cs.c | 20 
>  drivers/gpu/drm/i915/intel_ringbuffer.h|  3 +++
>  include/uapi/drm/i915_drm.h| 12 +++-
>  6 files changed, 58 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 1e08b82c4823..53b41963f672 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2115,6 +2115,7 @@ struct drm_i915_private {
>   struct pci_dev *bridge_dev;
>   struct i915_gem_context *kernel_context;
>   struct intel_engine_cs *engine[I915_NUM_ENGINES];
> + struct intel_engine_cs *engine_class[MAX_ENGINE_CLASS + 
> 1][MAX_ENGINE_INSTANCE + 1];
>   struct i915_vma *semaphore;
>  
>   struct drm_dma_handle *status_page_dmah;
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index af1965774e7b..006c8046af5f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1492,6 +1492,23 @@ gen8_dispatch_bsd_engine(struct drm_i915_private 
> *dev_priv,
>   return file_priv->bsd_engine;
>  }
>  
> +static struct intel_engine_cs *
> +eb_select_engine_class_instance(struct drm_i915_private *i915,
> + struct drm_i915_gem_execbuffer2 *args)
> +{
> + u8 class = args->flags & I915_EXEC_RING_MASK;
> + extern u8 user_class_map[I915_ENGINE_CLASS_MAX];
> + u8 instance;
> +
> + if (class >= ARRAY_SIZE(user_class_map))
> + return NULL;
> +
> + instance = (args->flags >> I915_EXEC_INSTANCE_SHIFT) &&
> +I915_EXEC_INSTANCE_MASK;
> +
> + return intel_engine_lookup(i915, user_class_map[class], instance);

The class mapping is going to be the same for all users of
intel_engine_lookup() presumably? (At least I hope we plan on using the
same id everywhere!)

intel_engine_lookup_from_user(), just to reinforce where those ids are
coming from?

So other than the conversation of what class means, looks good to me.

To give some context for other users, one example is a context param for
setting per-engine settings (e.g. sseu, watchdog thresholds):

struct drm_i915_gem_context_param_sseu {
__u32 engine;
__u32 instance;

__u32 slice_mask;
__u32 subslice_mask;
__u32 min_eu_per_subslice;
__u32 max_eu_per_subslice;
};

static int
i915_gem_context_setparam__sseu(struct i915_gem_context *ctx,
const struct drm_i915_gem_context_param *args)
{
const struct drm_i915_gem_context_param_sseu *user =
u64_to_user_ptr(args->value);
const unsigned int size = args->size;

Re: [Mesa-dev] [Intel-gfx] [RFC v3] drm/i915: Select engines via class and instance in execbuffer2

2017-05-18 Thread Tvrtko Ursulin


On 18/05/2017 15:10, Emil Velikov wrote:

Hi Tvrtko,

On 18 May 2017 at 14:06, Tvrtko Ursulin  wrote:


struct drm_i915_engine_info {
/** Engine instance number. */
__u32   instance;
__u32   rsvd;

/** Engine specific features and info. */
#define DRM_I915_ENGINE_HAS_HEVCBIT(0)
__u8 features;
__u8 rsvd;

__32 info;
};


To spare yourself from writing a compat ioctl, you want to have the
members at the same offset on both 32 and 64bit builds.
At the same times the whole struct size should be multiple of 64bit.

This and a few others are covered in Daniel Vetter's "Botching up ioctls" [1]


Yes, thanks, I failed to add up to 64. :)

Regards,

Tvrtko
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 98833] [REGRESSION, bisected] Wayland revert commit breaks non-Vsync fullscreen frame updates

2017-05-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98833

--- Comment #19 from Daniel Stone  ---
(In reply to Daniel Stone from comment #18)
> Right, here we are:
> https://lists.freedesktop.org/archives/mesa-dev/2017-May/155791.html

Eero, are you able to test this?

-- 
You are receiving this mail because:
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Intel-gfx] [RFC v3] drm/i915: Select engines via class and instance in execbuffer2

2017-05-18 Thread Emil Velikov
Hi Tvrtko,

On 18 May 2017 at 14:06, Tvrtko Ursulin  wrote:

> struct drm_i915_engine_info {
> /** Engine instance number. */
> __u32   instance;
> __u32   rsvd;
>
> /** Engine specific features and info. */
> #define DRM_I915_ENGINE_HAS_HEVCBIT(0)
> __u8 features;
> __u8 rsvd;
>
> __32 info;
> };
>
To spare yourself from writing a compat ioctl, you want to have the
members at the same offset on both 32 and 64bit builds.
At the same times the whole struct size should be multiple of 64bit.

This and a few others are covered in Daniel Vetter's "Botching up ioctls" [1]

-Emil

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ioctl/botching-up-ioctls.txt
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Intel-gfx] [RFC v3] drm/i915: Select engines via class and instance in execbuffer2

2017-05-18 Thread Chris Wilson
On Thu, May 18, 2017 at 02:30:56PM +0100, Tvrtko Ursulin wrote:
> 
> On 17/05/2017 17:44, Chris Wilson wrote:
> >On Wed, May 17, 2017 at 04:40:57PM +0100, Tvrtko Ursulin wrote:
> >>+static struct intel_engine_cs *get_engine_class(struct drm_i915_private 
> >>*i915,
> >>+   u8 class, u8 instance)
> >>+{
> >>+   if (class > MAX_ENGINE_CLASS || instance > MAX_ENGINE_INSTANCE)
> >>+   return NULL;
> >>+
> >>+   return i915->engine_class[class][instance];
> >>+}
> >
> >Be bold make this this an intel_engine_lookup(), I forsee some other
> >users appearing very shortly.
> 
> Still static or you want to export it straight away? Preference for
> where to put it? Here or intel_engine_cs.c?

Let's go with intel_engine_cs.c. We know we're going to move it
otherwise.

> >>diff --git a/drivers/gpu/drm/i915/i915_reg.h 
> >>b/drivers/gpu/drm/i915/i915_reg.h
> >>index ee144ec57935..a3b59043b991 100644
> >>--- a/drivers/gpu/drm/i915/i915_reg.h
> >>+++ b/drivers/gpu/drm/i915/i915_reg.h
> >>@@ -92,6 +92,9 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
> >> #define VIDEO_ENHANCEMENT_CLASS2
> >> #define COPY_ENGINE_CLASS  3
> >> #define OTHER_CLASS4
> >>+#define MAX_ENGINE_CLASS   4
> >>+
> >>+#define MAX_ENGINE_INSTANCE1
> >
> >We also need the names in the uapi.h as well. Do we want to mention
> >OTHER_CLASS before it is defined? I trust your crystal ball.
> 
> I have mentioned it just by the virtue of it existing in i915_reg.h
> Crystal ball is otherwise quiet.
> 
> >Amusingly I notice that you do claim that you have these names in the
> >changelog. :) We don't need DRM_I915 all the time. I915_ENGINE_CLASS is
> >going to be unique, at least for those users of i915_drm.h
> 
> They are all there courtesy of the previous patch.

Hey, definitely only one v3 patch in my inbox and no 2/2 to clue me in
:)

> I will drop the DRM_ prefix throughout.
> 
> >
> >> /* PCI config space */
> >>
> >>diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c 
> >>b/drivers/gpu/drm/i915/intel_engine_cs.c
> >>index 7566cf48012f..c5ad51c43d23 100644
> >>--- a/drivers/gpu/drm/i915/intel_engine_cs.c
> >>+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> >>@@ -225,7 +225,14 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
> >>
> >>ATOMIC_INIT_NOTIFIER_HEAD(>context_status_notifier);
> >>
> >>+   if (WARN_ON(info->class > MAX_ENGINE_CLASS ||
> >>+   info->instance > MAX_ENGINE_INSTANCE ||
> >>+   dev_priv->engine_class[info->class][info->instance]))
> >
> >Spread these out or add a message telling what was wrong.
> 
> Can do, but I considered these to be such a basic programming errors
> which should be caught during development. Only thing which
> prevented me from putting in under the GEM_BUG_ON is the fact
> someone could not have it enabled and that this function can already
> handle failures.

Spread them out and use GEM_WARN_ON?

> 
> >>+   return -EINVAL;
> >>+
> >>+   dev_priv->engine_class[info->class][info->instance] = engine;
> >>dev_priv->engine[id] = engine;
> >>+
> >>return 0;
> >> }
> >>
> >>diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> >>index 2ac6667e57ea..6a26bdf5e684 100644
> >>--- a/include/uapi/drm/i915_drm.h
> >>+++ b/include/uapi/drm/i915_drm.h
> >>@@ -906,7 +906,12 @@ struct drm_i915_gem_execbuffer2 {
> >>  */
> >> #define I915_EXEC_FENCE_OUT(1<<17)
> >>
> >>-#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_OUT<<1))
> >>+#define I915_EXEC_CLASS_INSTANCE   (1<<18)
> >>+
> >>+#define I915_EXEC_INSTANCE_SHIFT   (19)
> >>+#define I915_EXEC_INSTANCE_MASK(0xff << 
> >>I915_EXEC_INSTANCE_SHIFT)
> >>+
> >>+#define __I915_EXEC_UNKNOWN_FLAGS (-((1 << 27) << 1))
> >>
> >> #define I915_EXEC_CONTEXT_ID_MASK  (0x)
> >> #define i915_execbuffer2_set_context_id(eb2, context) \
> >>@@ -914,6 +919,10 @@ struct drm_i915_gem_execbuffer2 {
> >> #define i915_execbuffer2_get_context_id(eb2) \
> >>((eb2).rsvd1 & I915_EXEC_CONTEXT_ID_MASK)
> >>
> >>+#define i915_execbuffer2_engine(class, instance) \
> >>+   (I915_EXEC_CLASS_INSTANCE | (class) | \
> >>+   ((instance) << I915_EXEC_INSTANCE_SHIFT))
> >
> >(class |
> > (instance) << I915_EXEC_INSTANCE_SHIFT |
> > I915_EXEC_CLASS_INSTANCE)
> >
> >Just suggesting to spread it out over another line for better
> >readability.
> 
> Okay but I'd like for the flag to come first, followed by class and
> then instance. Okay with that?

Ok. I don't think we can stop it forming a triangle however we reorder
them.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Intel-gfx] [RFC v3] drm/i915: Select engines via class and instance in execbuffer2

2017-05-18 Thread Chris Wilson
On Thu, May 18, 2017 at 02:06:35PM +0100, Tvrtko Ursulin wrote:
> 
> On 18/05/2017 13:24, Chris Wilson wrote:
> >Yes, I would argue to defer it until later. One problem I have at the
> >moment is that not all members of a class are equal, HEVC-capable
> >engines and the reset do not belong to the same class (i.e. my hope is
> >that we could just say  | [  | INSTANCE_MASK ] | LOAD_BALANCE)
> >So I can see the sense in having instance as a mask, or at least making
> >the current instance field large enough to store a mask in future. I
> >just feel uneasy as that field could grow quite large, and maybe it will
> >be better to set the constraint via a context param (all dependency on
> >frequency and tuning of the LOAD_BALANCE). Hmm, liking having the
> >instance-mask on the context atm.
> 
> I don't think per context mask would work unless you won't to
> mandate multi-contexts where they wouldn't otherwise be needed.

Contexts are not thread-friendly. About the only way you can make them
safe (wrt execbuf) is through the use of userspace GTT allocation (i.e.
assigning an address on creation and making it permanent with softpin).

So in general you end up serialising around execbuf and copying the
state in/out of the ioctl. That gives a window of opportunity to use
context_setparam as an extension for irregular parameter updates.

It is not as nice as providing space in the execbuf ioctl, because of
the extra state being carried around in the context.
 
> But this problem in general can also be solved separately from
> class-instance addressing via engine feature masking.

But imo all members of a class should have the same features. That would
be my definition of a class!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Intel-gfx] [RFC v3] drm/i915: Select engines via class and instance in execbuffer2

2017-05-18 Thread Tvrtko Ursulin


On 17/05/2017 17:44, Chris Wilson wrote:

On Wed, May 17, 2017 at 04:40:57PM +0100, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Building on top of the previous patch which exported the concept
of engine classes and instances, we can also use this instead of
the current awkward engine selection uAPI.

This is primarily interesting for the VCS engine selection which
is a) currently done via disjoint set of flags, and b) the
current I915_EXEC_BSD flags has different semantics depending on
the underlying hardware which is bad.

Proposed idea here is to reserve 16-bits of flags, to pass in
the engine class and instance (8 bits each), and a new flag
named I915_EXEC_CLASS_INSTACE to tell the kernel this new engine
selection API is in use.

The new uAPI also removes access to the weak VCS engine
balancing as currently existing in the driver.

Example usage to send a command to VCS0:

  eb.flags = i915_execbuffer2_engine(DRM_I915_ENGINE_CLASS_VIDEO_DECODE, 0);

Or to send a command to VCS1:

  eb.flags = i915_execbuffer2_engine(DRM_I915_ENGINE_CLASS_VIDEO_DECODE, 1);

v2:
 * Fix unknown flags mask.
 * Use I915_EXEC_RING_MASK for class. (Chris Wilson)

v3:
 * Add a map for fast class-instance engine lookup. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin 
Cc: Ben Widawsky 
Cc: Chris Wilson 
Cc: Daniel Vetter 
Cc: Joonas Lahtinen 
Cc: Jon Bloomfield 
Cc: Daniel Charles 
Cc: "Rogozhkin, Dmitry V" 
Cc: Oscar Mateo 
Cc: "Gong, Zhipeng" 
Cc: intel-vaapi-me...@lists.01.org
Cc: mesa-dev@lists.freedesktop.org
---
 drivers/gpu/drm/i915/i915_drv.h|  1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 30 ++
 drivers/gpu/drm/i915/i915_reg.h|  3 +++
 drivers/gpu/drm/i915/intel_engine_cs.c |  7 +++
 include/uapi/drm/i915_drm.h| 11 ++-
 5 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5dfa4a12e647..7bf4fd42480c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2066,6 +2066,7 @@ struct drm_i915_private {
struct pci_dev *bridge_dev;
struct i915_gem_context *kernel_context;
struct intel_engine_cs *engine[I915_NUM_ENGINES];
+   struct intel_engine_cs *engine_class[MAX_ENGINE_CLASS + 
1][MAX_ENGINE_INSTANCE + 1];
struct i915_vma *semaphore;

struct drm_dma_handle *status_page_dmah;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index af1965774e7b..c1ad49ab64cd 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1492,6 +1492,33 @@ gen8_dispatch_bsd_engine(struct drm_i915_private 
*dev_priv,
return file_priv->bsd_engine;
 }

+extern u8 user_class_map[DRM_I915_ENGINE_CLASS_MAX];
+
+static struct intel_engine_cs *get_engine_class(struct drm_i915_private *i915,
+   u8 class, u8 instance)
+{
+   if (class > MAX_ENGINE_CLASS || instance > MAX_ENGINE_INSTANCE)
+   return NULL;
+
+   return i915->engine_class[class][instance];
+}


Be bold make this this an intel_engine_lookup(), I forsee some other
users appearing very shortly.


Still static or you want to export it straight away? Preference for 
where to put it? Here or intel_engine_cs.c?



+static struct intel_engine_cs *
+eb_select_engine_class_instance(struct drm_i915_private *i915,
+   struct drm_i915_gem_execbuffer2 *args)
+{
+   u8 class, instance;
+
+   class = args->flags & I915_EXEC_RING_MASK;
+   if (class >= DRM_I915_ENGINE_CLASS_MAX)
+   return NULL;
+
+   instance = (args->flags >> I915_EXEC_INSTANCE_SHIFT) &&
+  I915_EXEC_INSTANCE_MASK;
+
+   return get_engine_class(i915, user_class_map[class], instance);
+}
+
 #define I915_USER_RINGS (4)

 static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
@@ -1510,6 +1537,9 @@ eb_select_engine(struct drm_i915_private *dev_priv,
unsigned int user_ring_id = args->flags & I915_EXEC_RING_MASK;
struct intel_engine_cs *engine;

+   if (args->flags & I915_EXEC_CLASS_INSTANCE)
+   return eb_select_engine_class_instance(dev_priv, args);
+
if (user_ring_id > I915_USER_RINGS) {
DRM_DEBUG("execbuf with unknown ring: %u\n", user_ring_id);
return NULL;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index ee144ec57935..a3b59043b991 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -92,6 +92,9 @@ static inline bool 

[Mesa-dev] [Bug 101071] compiling glsl fails with undefined reference to `pthread_create'

2017-05-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=101071

--- Comment #4 from Emil Velikov  ---
> [piotro@minimyth-dev ~]$ ldd -r
> /home/piotro/minimyth-dev/images/build/usr/lib/gcc/x86_64-minimyth-linux-gnu/
> 5.3.0/libstdc++.so
> linux-vdso.so.1 (0x7ffee99fd000)
> libm.so.6 => /usr/lib/libm.so.6 (0x7faef68e6000)
> libc.so.6 => /usr/lib/libc.so.6 (0x7faef6542000)
> /usr/lib64/ld-linux-x86-64.so.2 (0x55db774d7000)
> libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7faef632b000)

Right, so the libstdc++.so is fine and the link error is misleading.

> why then I hasn't any issues like this with last 10.0 ... 17.0 mesa versions?

Your guess is as good as mine. As you can see in the command line we explicitly
request linkage against libpthread.so (the -lpthread bit), yet GCC fails to do
so.

Couple of things to check:
 - does the minimyth environment have a libpthread.so library?
 - can you trace that correct one is attempted/used in the process?

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Intel-gfx] [RFC v3] drm/i915: Select engines via class and instance in execbuffer2

2017-05-18 Thread Tvrtko Ursulin


On 18/05/2017 13:24, Chris Wilson wrote:

On Thu, May 18, 2017 at 01:13:20PM +0100, Tvrtko Ursulin wrote:


On 18/05/2017 12:10, Chris Wilson wrote:

On Thu, May 18, 2017 at 01:55:59PM +0300, Joonas Lahtinen wrote:

On ke, 2017-05-17 at 16:40 +0100, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Building on top of the previous patch which exported the concept
of engine classes and instances, we can also use this instead of
the current awkward engine selection uAPI.

This is primarily interesting for the VCS engine selection which
is a) currently done via disjoint set of flags, and b) the
current I915_EXEC_BSD flags has different semantics depending on
the underlying hardware which is bad.

Proposed idea here is to reserve 16-bits of flags, to pass in
the engine class and instance (8 bits each), and a new flag
named I915_EXEC_CLASS_INSTACE to tell the kernel this new engine
selection API is in use.


Note this text doesn't describe the interface in v3.


Would it make sense to use bitmask for future proofing? Then we could
allow passing multiple options in the future.


No. The first use case has to be explicit control of engine. That's
orthogonal to asking to select any of a particular class.


Was the suggestion to allow instance=any and instance=N? Or even all
the way to instance=N-or-M?

If not the latter, then I can think of two other possible approaches:

1. Reserve 0xff as instance=any, aka 128 instances should be enough
for everbody. :) Could simply enforce in the uAPI that instance ==
I915_EXEC_INSTANCE_MASK = -EINVAL for now or something.

2. Do nothing now and add say I915_EXEC_CLASS at a later point. This
patch adds I915_EXEC_CLASS_INSTANCE so I915_EXEC_CLASS would not
sound out of place.


Yes, I would argue to defer it until later. One problem I have at the
moment is that not all members of a class are equal, HEVC-capable
engines and the reset do not belong to the same class (i.e. my hope is
that we could just say  | [  | INSTANCE_MASK ] | LOAD_BALANCE)
So I can see the sense in having instance as a mask, or at least making
the current instance field large enough to store a mask in future. I
just feel uneasy as that field could grow quite large, and maybe it will
be better to set the constraint via a context param (all dependency on
frequency and tuning of the LOAD_BALANCE). Hmm, liking having the
instance-mask on the context atm.


I don't think per context mask would work unless you won't to mandate 
multi-contexts where they wouldn't otherwise be needed.


But this problem in general can also be solved separately from 
class-instance addressing via engine feature masking.


As the GEM_ENGINE_INFO ioctl proposal defines a set of engine flags, at 
a later point execbuf could be extended with a new flag. For example:


eb.flags = I915_EXEC_CLASS | I915_ENGINE_CLASS_VIDEO | 
I915_EXEC_ENGINE_FEATURES | I915_ENGINE_HAS_HEVC;


Only problem I see that engine flags in the current proposal are u64 so 
it would be a bit challenging to fit that in eb.flags.


Not sure if it would make sense to split the engine info flags into a 
smaller and larger set. u8 which would be flags "compatible" with 
I915_EXEC_ENGINE_FEATURES, and the remainder which would be something 
else, or unused/reserved? Like:


struct drm_i915_engine_info {
/** Engine instance number. */
__u32   instance;
__u32   rsvd;

/** Engine specific features and info. */
#define DRM_I915_ENGINE_HAS_HEVCBIT(0)
__u8 features;
__u8 rsvd;

__32 info;
};

Or at that point you bring on eb3.

Regards,

Tvrtko
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 101088] `gallium: remove pipe_index_buffer and set_index_buffer` causes glitches and crash in gallium nine

2017-05-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=101088

--- Comment #3 from raffa...@zoho.com ---
Sorry previously attached backtrace is invalid, please ignore it.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 99467] [radv] DOOM 2016 + wine. Green screen everywhere (but can be started)

2017-05-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99467

Sven Arvidsson  changed:

   What|Removed |Added

 CC||s...@whiz.se

-- 
You are receiving this mail because:
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 101088] `gallium: remove pipe_index_buffer and set_index_buffer` causes glitches and crash in gallium nine

2017-05-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=101088

--- Comment #2 from raffa...@zoho.com ---
Created attachment 131405
  --> https://bugs.freedesktop.org/attachment.cgi?id=131405=edit
winedbg backtrace

winedbg crashes at startup with memory corruption

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 101071] compiling glsl fails with undefined reference to `pthread_create'

2017-05-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=101071

--- Comment #3 from war...@o2.pl ---
Emil,

BTW, You wrote:

Can you check with the libstdc++.so authors that they use "-no-undefined
-Wl,--no-undefined" to the LDFLAGS. It will flag issues as the binary is
linked.


why then I hasn't any issues like this with last 10.0 ... 17.0 mesa versions?

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Intel-gfx] [RFC v3] drm/i915: Select engines via class and instance in execbuffer2

2017-05-18 Thread Chris Wilson
On Thu, May 18, 2017 at 01:13:20PM +0100, Tvrtko Ursulin wrote:
> 
> On 18/05/2017 12:10, Chris Wilson wrote:
> >On Thu, May 18, 2017 at 01:55:59PM +0300, Joonas Lahtinen wrote:
> >>On ke, 2017-05-17 at 16:40 +0100, Tvrtko Ursulin wrote:
> >>>From: Tvrtko Ursulin 
> >>>
> >>>Building on top of the previous patch which exported the concept
> >>>of engine classes and instances, we can also use this instead of
> >>>the current awkward engine selection uAPI.
> >>>
> >>>This is primarily interesting for the VCS engine selection which
> >>>is a) currently done via disjoint set of flags, and b) the
> >>>current I915_EXEC_BSD flags has different semantics depending on
> >>>the underlying hardware which is bad.
> >>>
> >>>Proposed idea here is to reserve 16-bits of flags, to pass in
> >>>the engine class and instance (8 bits each), and a new flag
> >>>named I915_EXEC_CLASS_INSTACE to tell the kernel this new engine
> >>>selection API is in use.
> >
> >Note this text doesn't describe the interface in v3.
> >
> >>Would it make sense to use bitmask for future proofing? Then we could
> >>allow passing multiple options in the future.
> >
> >No. The first use case has to be explicit control of engine. That's
> >orthogonal to asking to select any of a particular class.
> 
> Was the suggestion to allow instance=any and instance=N? Or even all
> the way to instance=N-or-M?
> 
> If not the latter, then I can think of two other possible approaches:
> 
> 1. Reserve 0xff as instance=any, aka 128 instances should be enough
> for everbody. :) Could simply enforce in the uAPI that instance ==
> I915_EXEC_INSTANCE_MASK = -EINVAL for now or something.
> 
> 2. Do nothing now and add say I915_EXEC_CLASS at a later point. This
> patch adds I915_EXEC_CLASS_INSTANCE so I915_EXEC_CLASS would not
> sound out of place.

Yes, I would argue to defer it until later. One problem I have at the
moment is that not all members of a class are equal, HEVC-capable
engines and the reset do not belong to the same class (i.e. my hope is
that we could just say  | [  | INSTANCE_MASK ] | LOAD_BALANCE)
So I can see the sense in having instance as a mask, or at least making
the current instance field large enough to store a mask in future. I
just feel uneasy as that field could grow quite large, and maybe it will
be better to set the constraint via a context param (all dependency on
frequency and tuning of the LOAD_BALANCE). Hmm, liking having the
instance-mask on the context atm.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Intel-gfx] [RFC v3] drm/i915: Select engines via class and instance in execbuffer2

2017-05-18 Thread Tvrtko Ursulin


On 18/05/2017 12:10, Chris Wilson wrote:

On Thu, May 18, 2017 at 01:55:59PM +0300, Joonas Lahtinen wrote:

On ke, 2017-05-17 at 16:40 +0100, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Building on top of the previous patch which exported the concept
of engine classes and instances, we can also use this instead of
the current awkward engine selection uAPI.

This is primarily interesting for the VCS engine selection which
is a) currently done via disjoint set of flags, and b) the
current I915_EXEC_BSD flags has different semantics depending on
the underlying hardware which is bad.

Proposed idea here is to reserve 16-bits of flags, to pass in
the engine class and instance (8 bits each), and a new flag
named I915_EXEC_CLASS_INSTACE to tell the kernel this new engine
selection API is in use.


Note this text doesn't describe the interface in v3.


Would it make sense to use bitmask for future proofing? Then we could
allow passing multiple options in the future.


No. The first use case has to be explicit control of engine. That's
orthogonal to asking to select any of a particular class.


Was the suggestion to allow instance=any and instance=N? Or even all the 
way to instance=N-or-M?


If not the latter, then I can think of two other possible approaches:

1. Reserve 0xff as instance=any, aka 128 instances should be enough for 
everbody. :) Could simply enforce in the uAPI that instance == 
I915_EXEC_INSTANCE_MASK = -EINVAL for now or something.


2. Do nothing now and add say I915_EXEC_CLASS at a later point. This 
patch adds I915_EXEC_CLASS_INSTANCE so I915_EXEC_CLASS would not sound 
out of place.


Regards,

Tvrtko
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 101088] `gallium: remove pipe_index_buffer and set_index_buffer` causes glitches and crash in gallium nine

2017-05-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=101088

--- Comment #1 from raffa...@zoho.com ---
`-O2` was looking fine, but turns out it's just a little more stable. Less
glitches and fewer crashes, but still broken.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 101088] `gallium: remove pipe_index_buffer and set_index_buffer` causes glitches and crash in gallium nine

2017-05-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=101088

Bug ID: 101088
   Summary: `gallium: remove pipe_index_buffer and
set_index_buffer` causes glitches and crash in gallium
nine
   Product: Mesa
   Version: git
  Hardware: Other
OS: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: raffa...@zoho.com
QA Contact: mesa-dev@lists.freedesktop.org

commit 330d0607ed60fd3edca192e54b4246310f06652f `gallium: remove
pipe_index_buffer and set_index_buffer` causes glitches and crash in 32bit
build with `-O3` optimization level. Can't test 64bit.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 08/10] radeonsi: only upload (dump to L2) those descriptors that are used by shaders

2017-05-18 Thread Nicolai Hähnle

On 18.05.2017 12:43, Marek Olšák wrote:

On Thu, May 18, 2017 at 12:41 PM, Marek Olšák  wrote:

On Thu, May 18, 2017 at 11:31 AM, Nicolai Hähnle  wrote:

On 17.05.2017 21:38, Marek Olšák wrote:


From: Marek Olšák 

This decreases the size of CE RAM dumps to L2, or the size of descriptor
uploads without CE.
---
 src/gallium/drivers/radeonsi/si_compute.c   | 28 ++--
 src/gallium/drivers/radeonsi/si_descriptors.c   | 85
-
 src/gallium/drivers/radeonsi/si_state.h | 18 +-
 src/gallium/drivers/radeonsi/si_state_shaders.c |  6 ++
 4 files changed, 113 insertions(+), 24 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c
b/src/gallium/drivers/radeonsi/si_compute.c
index 22ef111..4c98066 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -201,21 +201,38 @@ static void *si_create_compute_state(
return NULL;
}
}

return program;
 }

 static void si_bind_compute_state(struct pipe_context *ctx, void *state)
 {
struct si_context *sctx = (struct si_context*)ctx;
-   sctx->cs_shader_state.program = (struct si_compute*)state;
+   struct si_compute *program = (struct si_compute*)state;
+
+   sctx->cs_shader_state.program = program;
+   if (!program)
+   return;
+
+   /* Wait because we need active slot usage masks. */
+   if (program->ir_type == PIPE_SHADER_IR_TGSI)
+   util_queue_fence_wait(>ready);
+
+   si_set_active_descriptors(sctx,
+ SI_DESCS_FIRST_COMPUTE +
+
SI_SHADER_DESCS_CONST_AND_SHADER_BUFFERS,
+
program->active_const_and_shader_buffers);
+   si_set_active_descriptors(sctx,
+ SI_DESCS_FIRST_COMPUTE +
+ SI_SHADER_DESCS_SAMPLERS_AND_IMAGES,
+ program->active_samplers_and_images);
 }

 static void si_set_global_binding(
struct pipe_context *ctx, unsigned first, unsigned n,
struct pipe_resource **resources,
uint32_t **handles)
 {
unsigned i;
struct si_context *sctx = (struct si_context*)ctx;
struct si_compute *program = sctx->cs_shader_state.program;
@@ -749,26 +766,23 @@ static void si_launch_grid(
bool cs_regalloc_hang =
(sctx->b.chip_class == SI ||
 sctx->b.family == CHIP_BONAIRE ||
 sctx->b.family == CHIP_KABINI) &&
info->block[0] * info->block[1] * info->block[2] > 256;

if (cs_regalloc_hang)
sctx->b.flags |= SI_CONTEXT_PS_PARTIAL_FLUSH |
 SI_CONTEXT_CS_PARTIAL_FLUSH;

-   if (program->ir_type == PIPE_SHADER_IR_TGSI) {
-   util_queue_fence_wait(>ready);
-
-   if (program->shader.compilation_failed)
-   return;
-   }
+   if (program->ir_type == PIPE_SHADER_IR_TGSI &&
+   program->shader.compilation_failed)
+   return;

si_decompress_compute_textures(sctx);

/* Add buffer sizes for memory checking in need_cs_space. */
r600_context_add_resource_size(ctx, >shader.bo->b.b);
/* TODO: add the scratch buffer */

if (info->indirect) {
r600_context_add_resource_size(ctx, info->indirect);

diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c
b/src/gallium/drivers/radeonsi/si_descriptors.c
index 38e4ae1..a2f40a8 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -118,26 +118,28 @@ static void si_init_descriptors(struct
si_descriptors *desc,
}
 }

 static void si_release_descriptors(struct si_descriptors *desc)
 {
r600_resource_reference(>buffer, NULL);
FREE(desc->list);
 }

 static bool si_ce_upload(struct si_context *sctx, unsigned ce_offset,
unsigned size,
-unsigned *out_offset, struct r600_resource
**out_buf) {
+unsigned *out_offset, struct r600_resource
**out_buf)
+{
uint64_t va;

u_suballocator_alloc(sctx->ce_suballocator, size,
-sctx->screen->b.info.tcc_cache_line_size,
-out_offset, (struct pipe_resource**)out_buf);
+si_optimal_tcc_alignment(sctx, size),
+(unsigned*)out_offset,



The extra cast of out_offset is unnecessary.



+(struct pipe_resource**)out_buf);
if (!out_buf)
return false;

va = (*out_buf)->gpu_address + *out_offset;

radeon_emit(sctx->ce_ib, PKT3(PKT3_DUMP_CONST_RAM, 3, 0));
radeon_emit(sctx->ce_ib, ce_offset);
radeon_emit(sctx->ce_ib, size / 4);
radeon_emit(sctx->ce_ib, va);
radeon_emit(sctx->ce_ib, va >> 

Re: [Mesa-dev] [RFC v3] drm/i915: Select engines via class and instance in execbuffer2

2017-05-18 Thread Chris Wilson
On Thu, May 18, 2017 at 01:55:59PM +0300, Joonas Lahtinen wrote:
> On ke, 2017-05-17 at 16:40 +0100, Tvrtko Ursulin wrote:
> > From: Tvrtko Ursulin 
> > 
> > Building on top of the previous patch which exported the concept
> > of engine classes and instances, we can also use this instead of
> > the current awkward engine selection uAPI.
> > 
> > This is primarily interesting for the VCS engine selection which
> > is a) currently done via disjoint set of flags, and b) the
> > current I915_EXEC_BSD flags has different semantics depending on
> > the underlying hardware which is bad.
> > 
> > Proposed idea here is to reserve 16-bits of flags, to pass in
> > the engine class and instance (8 bits each), and a new flag
> > named I915_EXEC_CLASS_INSTACE to tell the kernel this new engine
> > selection API is in use.

Note this text doesn't describe the interface in v3.

> Would it make sense to use bitmask for future proofing? Then we could
> allow passing multiple options in the future.

No. The first use case has to be explicit control of engine. That's
orthogonal to asking to select any of a particular class.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH] egl: Partially revert 23c86c74, fix eglMakeCurrent

2017-05-18 Thread Tapani Pälli

On 05/18/2017 12:16 AM, Tomasz Figa wrote:

Hi,

On Tue, May 16, 2017 at 11:35 PM, Tapani Pälli  wrote:


On 05/16/2017 08:10 PM, Chad Versace wrote:

On Tue 16 May 2017, Tapani Pälli wrote:



On 05/16/2017 02:04 AM, Chad Versace wrote:

Fixes regressions in Android CtsVerifier.apk on Intel Chrome OS devices
due to incorrect error handling in eglMakeCurrent. See below on how to
confirm the regression is fixed.

This partially reverts

   commit 23c86c74cc450a23848b85cfe914376caede1cdf
   Author:  Chad Versace 
   Subject: egl: Emit error when EGLSurface is lost

The bad commit added the error handling below. #2 and #3 were right,
but #1 was wrong.

   1. eglMakeCurrent emits EGL_BAD_CURRENT_SURFACE if the calling
  thread has unflushed commands and either previous surface is no
  longer valid.

   2. eglMakeCurrent emits EGL_BAD_NATIVE_WINDOW if either new
surface
  is no longer valid.

   3. eglSwapBuffers emits EGL_BAD_NATIVE_WINDOW if the swapped
surface
  is no longer valid.

Whe I wrote the bad commit, I misunderstood the EGL spec language
for #1. The correct behavior is, if I understand correctly now:

   - Assume a bound EGLSurface is no longer valid.
   - Assume the bound EGLContext has unflushed commands.
   - The app calls eglMakeCurrent. The spec requires eglMakeCurrent
to
 implicitly flush. After flushing, eglMakeCurrent emits
 EGL_BAD_CURRENT_SURFACE and does *not* alter the thread's
 current bindings.
   - If the app calls eglMakeCurrent again, and the app inserts no
 commands into the GL command stream between the two
eglMakeCurrent
 calls, then this second eglMakeCurrent succeeds without emitting
an
 error.

How to confirm this fixes the regression:

   Download android-cts-verifier-7.1_r5-linux_x86-x86.zip from
   source.android.com, unpack, and `adb install CtsVerifier.apk`.
   Run test "Projection Cube". Click the Pass button (a
   green checkmark). Then run test "Projection Widget". Confirm that
   widgets are visible and that logcat does not complain about
   eglMakeCurrent failure.

   Then confirm there are no regressions in the cts-traded module
that
   commit 263243b1 fixed:

   cts-tf > run cts --skip-preconditions --skip-device-info \
-m CtsCameraTestCases \
-t android.hardware.camera2.cts.RobustnessTest

   Tested with Chrome OS board "reef".


both tests passed on Android-IA with this patch ... but if I minimize
"Projection Widget" test it starts to bang EGL_BAD_NATIVE_WINDOW heavily.
Is
this expected behavior?


I'm unsure. I haven't yet tried that experiment. I'll give it a try when
I'm back at my desk.

Which EGL function is emitting EGL_BAD_NATIVE_WINDOW in logcat?
eglMakeCurrent or eglSwapBuffers, or something else?


doh sorry, I actually meant 'Projection Cube' test, not the widget test. But
yep, when tapping switcher button eglMakeCurrent starts to emit the error
(likely the app just continues to submit frames even when it's
'minimized'?). When switching back to app it cannot draw anymore as it keeps
getting the error.

I don't remember exactly and I'm not an Android app expert either, but
I remember hearing something that the app needs to stop drawing when
the activity is paused (stopped?). Isn't that what's happening when an
app gets minimized?


Yep, this is very likely the case and the app here is just not obeying 
the rules. Considering that Android wraps EGL layer it seems weird that 
it just let's application to do this. Any case, since everything else 
works fine this is definitely not a blocker for me for this patch;


Acked-by: Tapani Pälli 


Best regards,
Tomasz


What was the behavior before this patch? And before
commit 23c86c74 (egl: Emit error when EGLSurface is lost)?


Without this patch (and before commit 23c86c74) I can minimize app to
switcher and bring it back and it continues to draw.

Note that everything else seems to work just fine though (tested lots of GL
apps) so is this just something specific what cts-verifier is doing or not
handling properly?




Cc: "17.1" 
Cc: Tomasz Figa 
Cc: Nicolas Boichat 
Cc: Emil Velikov 
Fixes: 23c86c74 (egl: Emit error when EGLSurface is lost)
---
src/egl/main/eglapi.c | 19 ---
1 file changed, 19 deletions(-)

diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
index aa0eb94666..9cea2f41ff 100644
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -828,25 +828,6 @@ eglMakeCurrent(EGLDisplay dpy, EGLSurface draw,
EGLSurface read,
 RETURN_EGL_ERROR(disp, EGL_BAD_MATCH, EGL_FALSE);
   }
-   _EGLThreadInfo *t =_eglGetCurrentThread();
-   _EGLContext *old_ctx = t->CurrentContext;
-   

Re: [Mesa-dev] [RFC v3] drm/i915: Select engines via class and instance in execbuffer2

2017-05-18 Thread Joonas Lahtinen
On ke, 2017-05-17 at 16:40 +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Building on top of the previous patch which exported the concept
> of engine classes and instances, we can also use this instead of
> the current awkward engine selection uAPI.
> 
> This is primarily interesting for the VCS engine selection which
> is a) currently done via disjoint set of flags, and b) the
> current I915_EXEC_BSD flags has different semantics depending on
> the underlying hardware which is bad.
> 
> Proposed idea here is to reserve 16-bits of flags, to pass in
> the engine class and instance (8 bits each), and a new flag
> named I915_EXEC_CLASS_INSTACE to tell the kernel this new engine
> selection API is in use.

Would it make sense to use bitmask for future proofing? Then we could
allow passing multiple options in the future.

Other than that, looks good. Chris already commented some on the code
itself.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 08/10] radeonsi: only upload (dump to L2) those descriptors that are used by shaders

2017-05-18 Thread Marek Olšák
On Thu, May 18, 2017 at 12:41 PM, Marek Olšák  wrote:
> On Thu, May 18, 2017 at 11:31 AM, Nicolai Hähnle  wrote:
>> On 17.05.2017 21:38, Marek Olšák wrote:
>>>
>>> From: Marek Olšák 
>>>
>>> This decreases the size of CE RAM dumps to L2, or the size of descriptor
>>> uploads without CE.
>>> ---
>>>  src/gallium/drivers/radeonsi/si_compute.c   | 28 ++--
>>>  src/gallium/drivers/radeonsi/si_descriptors.c   | 85
>>> -
>>>  src/gallium/drivers/radeonsi/si_state.h | 18 +-
>>>  src/gallium/drivers/radeonsi/si_state_shaders.c |  6 ++
>>>  4 files changed, 113 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/radeonsi/si_compute.c
>>> b/src/gallium/drivers/radeonsi/si_compute.c
>>> index 22ef111..4c98066 100644
>>> --- a/src/gallium/drivers/radeonsi/si_compute.c
>>> +++ b/src/gallium/drivers/radeonsi/si_compute.c
>>> @@ -201,21 +201,38 @@ static void *si_create_compute_state(
>>> return NULL;
>>> }
>>> }
>>>
>>> return program;
>>>  }
>>>
>>>  static void si_bind_compute_state(struct pipe_context *ctx, void *state)
>>>  {
>>> struct si_context *sctx = (struct si_context*)ctx;
>>> -   sctx->cs_shader_state.program = (struct si_compute*)state;
>>> +   struct si_compute *program = (struct si_compute*)state;
>>> +
>>> +   sctx->cs_shader_state.program = program;
>>> +   if (!program)
>>> +   return;
>>> +
>>> +   /* Wait because we need active slot usage masks. */
>>> +   if (program->ir_type == PIPE_SHADER_IR_TGSI)
>>> +   util_queue_fence_wait(>ready);
>>> +
>>> +   si_set_active_descriptors(sctx,
>>> + SI_DESCS_FIRST_COMPUTE +
>>> +
>>> SI_SHADER_DESCS_CONST_AND_SHADER_BUFFERS,
>>> +
>>> program->active_const_and_shader_buffers);
>>> +   si_set_active_descriptors(sctx,
>>> + SI_DESCS_FIRST_COMPUTE +
>>> + SI_SHADER_DESCS_SAMPLERS_AND_IMAGES,
>>> + program->active_samplers_and_images);
>>>  }
>>>
>>>  static void si_set_global_binding(
>>> struct pipe_context *ctx, unsigned first, unsigned n,
>>> struct pipe_resource **resources,
>>> uint32_t **handles)
>>>  {
>>> unsigned i;
>>> struct si_context *sctx = (struct si_context*)ctx;
>>> struct si_compute *program = sctx->cs_shader_state.program;
>>> @@ -749,26 +766,23 @@ static void si_launch_grid(
>>> bool cs_regalloc_hang =
>>> (sctx->b.chip_class == SI ||
>>>  sctx->b.family == CHIP_BONAIRE ||
>>>  sctx->b.family == CHIP_KABINI) &&
>>> info->block[0] * info->block[1] * info->block[2] > 256;
>>>
>>> if (cs_regalloc_hang)
>>> sctx->b.flags |= SI_CONTEXT_PS_PARTIAL_FLUSH |
>>>  SI_CONTEXT_CS_PARTIAL_FLUSH;
>>>
>>> -   if (program->ir_type == PIPE_SHADER_IR_TGSI) {
>>> -   util_queue_fence_wait(>ready);
>>> -
>>> -   if (program->shader.compilation_failed)
>>> -   return;
>>> -   }
>>> +   if (program->ir_type == PIPE_SHADER_IR_TGSI &&
>>> +   program->shader.compilation_failed)
>>> +   return;
>>>
>>> si_decompress_compute_textures(sctx);
>>>
>>> /* Add buffer sizes for memory checking in need_cs_space. */
>>> r600_context_add_resource_size(ctx, >shader.bo->b.b);
>>> /* TODO: add the scratch buffer */
>>>
>>> if (info->indirect) {
>>> r600_context_add_resource_size(ctx, info->indirect);
>>>
>>> diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c
>>> b/src/gallium/drivers/radeonsi/si_descriptors.c
>>> index 38e4ae1..a2f40a8 100644
>>> --- a/src/gallium/drivers/radeonsi/si_descriptors.c
>>> +++ b/src/gallium/drivers/radeonsi/si_descriptors.c
>>> @@ -118,26 +118,28 @@ static void si_init_descriptors(struct
>>> si_descriptors *desc,
>>> }
>>>  }
>>>
>>>  static void si_release_descriptors(struct si_descriptors *desc)
>>>  {
>>> r600_resource_reference(>buffer, NULL);
>>> FREE(desc->list);
>>>  }
>>>
>>>  static bool si_ce_upload(struct si_context *sctx, unsigned ce_offset,
>>> unsigned size,
>>> -unsigned *out_offset, struct r600_resource
>>> **out_buf) {
>>> +unsigned *out_offset, struct r600_resource
>>> **out_buf)
>>> +{
>>> uint64_t va;
>>>
>>> u_suballocator_alloc(sctx->ce_suballocator, size,
>>> -sctx->screen->b.info.tcc_cache_line_size,
>>> -out_offset, (struct pipe_resource**)out_buf);
>>> +si_optimal_tcc_alignment(sctx, size),
>>> +(unsigned*)out_offset,
>>
>>
>> The extra cast of out_offset 

Re: [Mesa-dev] [PATCH 08/10] radeonsi: only upload (dump to L2) those descriptors that are used by shaders

2017-05-18 Thread Marek Olšák
On Thu, May 18, 2017 at 11:31 AM, Nicolai Hähnle  wrote:
> On 17.05.2017 21:38, Marek Olšák wrote:
>>
>> From: Marek Olšák 
>>
>> This decreases the size of CE RAM dumps to L2, or the size of descriptor
>> uploads without CE.
>> ---
>>  src/gallium/drivers/radeonsi/si_compute.c   | 28 ++--
>>  src/gallium/drivers/radeonsi/si_descriptors.c   | 85
>> -
>>  src/gallium/drivers/radeonsi/si_state.h | 18 +-
>>  src/gallium/drivers/radeonsi/si_state_shaders.c |  6 ++
>>  4 files changed, 113 insertions(+), 24 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_compute.c
>> b/src/gallium/drivers/radeonsi/si_compute.c
>> index 22ef111..4c98066 100644
>> --- a/src/gallium/drivers/radeonsi/si_compute.c
>> +++ b/src/gallium/drivers/radeonsi/si_compute.c
>> @@ -201,21 +201,38 @@ static void *si_create_compute_state(
>> return NULL;
>> }
>> }
>>
>> return program;
>>  }
>>
>>  static void si_bind_compute_state(struct pipe_context *ctx, void *state)
>>  {
>> struct si_context *sctx = (struct si_context*)ctx;
>> -   sctx->cs_shader_state.program = (struct si_compute*)state;
>> +   struct si_compute *program = (struct si_compute*)state;
>> +
>> +   sctx->cs_shader_state.program = program;
>> +   if (!program)
>> +   return;
>> +
>> +   /* Wait because we need active slot usage masks. */
>> +   if (program->ir_type == PIPE_SHADER_IR_TGSI)
>> +   util_queue_fence_wait(>ready);
>> +
>> +   si_set_active_descriptors(sctx,
>> + SI_DESCS_FIRST_COMPUTE +
>> +
>> SI_SHADER_DESCS_CONST_AND_SHADER_BUFFERS,
>> +
>> program->active_const_and_shader_buffers);
>> +   si_set_active_descriptors(sctx,
>> + SI_DESCS_FIRST_COMPUTE +
>> + SI_SHADER_DESCS_SAMPLERS_AND_IMAGES,
>> + program->active_samplers_and_images);
>>  }
>>
>>  static void si_set_global_binding(
>> struct pipe_context *ctx, unsigned first, unsigned n,
>> struct pipe_resource **resources,
>> uint32_t **handles)
>>  {
>> unsigned i;
>> struct si_context *sctx = (struct si_context*)ctx;
>> struct si_compute *program = sctx->cs_shader_state.program;
>> @@ -749,26 +766,23 @@ static void si_launch_grid(
>> bool cs_regalloc_hang =
>> (sctx->b.chip_class == SI ||
>>  sctx->b.family == CHIP_BONAIRE ||
>>  sctx->b.family == CHIP_KABINI) &&
>> info->block[0] * info->block[1] * info->block[2] > 256;
>>
>> if (cs_regalloc_hang)
>> sctx->b.flags |= SI_CONTEXT_PS_PARTIAL_FLUSH |
>>  SI_CONTEXT_CS_PARTIAL_FLUSH;
>>
>> -   if (program->ir_type == PIPE_SHADER_IR_TGSI) {
>> -   util_queue_fence_wait(>ready);
>> -
>> -   if (program->shader.compilation_failed)
>> -   return;
>> -   }
>> +   if (program->ir_type == PIPE_SHADER_IR_TGSI &&
>> +   program->shader.compilation_failed)
>> +   return;
>>
>> si_decompress_compute_textures(sctx);
>>
>> /* Add buffer sizes for memory checking in need_cs_space. */
>> r600_context_add_resource_size(ctx, >shader.bo->b.b);
>> /* TODO: add the scratch buffer */
>>
>> if (info->indirect) {
>> r600_context_add_resource_size(ctx, info->indirect);
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c
>> b/src/gallium/drivers/radeonsi/si_descriptors.c
>> index 38e4ae1..a2f40a8 100644
>> --- a/src/gallium/drivers/radeonsi/si_descriptors.c
>> +++ b/src/gallium/drivers/radeonsi/si_descriptors.c
>> @@ -118,26 +118,28 @@ static void si_init_descriptors(struct
>> si_descriptors *desc,
>> }
>>  }
>>
>>  static void si_release_descriptors(struct si_descriptors *desc)
>>  {
>> r600_resource_reference(>buffer, NULL);
>> FREE(desc->list);
>>  }
>>
>>  static bool si_ce_upload(struct si_context *sctx, unsigned ce_offset,
>> unsigned size,
>> -unsigned *out_offset, struct r600_resource
>> **out_buf) {
>> +unsigned *out_offset, struct r600_resource
>> **out_buf)
>> +{
>> uint64_t va;
>>
>> u_suballocator_alloc(sctx->ce_suballocator, size,
>> -sctx->screen->b.info.tcc_cache_line_size,
>> -out_offset, (struct pipe_resource**)out_buf);
>> +si_optimal_tcc_alignment(sctx, size),
>> +(unsigned*)out_offset,
>
>
> The extra cast of out_offset is unnecessary.
>
>
>> +(struct pipe_resource**)out_buf);
>> if (!out_buf)
>> return false;
>>
>> va = (*out_buf)->gpu_address + 

Re: [Mesa-dev] [PATCH] i965: Add RGBX8888 and RGBA8888 to EGL configure and reorder the list

2017-05-18 Thread Mike Lothian
Can you make sure whatever you do it's well tested

I shudder every time I see such requests as it nearly always seems to cause
breakage for me

On Thu, 18 May 2017 at 11:25 Emil Velikov  wrote:

> On 18 May 2017 at 05:10, Chih-Wei Huang  wrote:
> > 2017-05-18 12:01 GMT+08:00 Xu, Randy :
> >>
> >>> -Original Message-
> >>> From: Palli, Tapani
> >>> >
> >>> > It's just applied. Isn't it?
> >>> > Yesterday without this patch
> >>> > the color format is mismatch apparently.
> >>>
> >>> Yeah we do need this. TBH I don't quite understand why but will try to
> figure
> >>> it out. I remember we used to have a patch in Surfaceflinger at one
> point
> >>> because visual was hardcoded there and this might be related.
> >>>
> >>> // Tapani
> >>
> >> Sorry, that's for different issue, I mix it with RGB565 blending one.
> >> This patch is required because some Android GLES test apps, like
> gl2_basic, need to create RGBA surface.
> >
> > Indeed it is.
> >
> > As Emil pointed out, the patch was merged before
> > but reverted later since it broke desktop.
> >
> > So what's the current upstreaming plan?
> >
> No idea about a plan, but how you can fix it once and for all:
>
> Extend the loader extension(s) to have a get_caps() callback,
> analogous to __DRIimageExtension::getCapabilities.
> Then the DRI module will query [the loader] and advertise the
> RGBX/RGBA visuals when possible.
>
> -Emil
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 01/10] radeonsi: merge constant and shader buffers descriptor lists into one

2017-05-18 Thread Marek Olšák
On Thu, May 18, 2017 at 11:28 AM, Nicolai Hähnle  wrote:
> On 17.05.2017 21:38, Marek Olšák wrote:
>>
>> From: Marek Olšák 
>>
>> Constant buffers: slot[16], .. slot[31] (ascending)
>> Shader buffers: slot[15], .. slot[0] (descending)
>>
>> The idea is that if we have 4 constant buffers and 2 shader buffers, we
>> only
>> have to upload 6 slots. That optimization is left for a later commit.
>> ---
>>  src/gallium/drivers/radeonsi/si_debug.c   |  44 ---
>>  src/gallium/drivers/radeonsi/si_descriptors.c | 141
>> +++---
>>  src/gallium/drivers/radeonsi/si_pipe.h|   3 +-
>>  src/gallium/drivers/radeonsi/si_shader.c  |  32 ++---
>>  src/gallium/drivers/radeonsi/si_shader.h  |  20 ++-
>>  src/gallium/drivers/radeonsi/si_shader_internal.h |   3 +-
>>  src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c |  13 +-
>>  src/gallium/drivers/radeonsi/si_state.h   |  25 +++-
>>  8 files changed, 150 insertions(+), 131 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_debug.c
>> b/src/gallium/drivers/radeonsi/si_debug.c
>> index d1159ad..25c3882 100644
>> --- a/src/gallium/drivers/radeonsi/si_debug.c
>> +++ b/src/gallium/drivers/radeonsi/si_debug.c
>> @@ -373,37 +373,38 @@ static void si_dump_framebuffer(struct si_context
>> *sctx, FILE *f)
>> }
>>
>> if (state->zsbuf) {
>> rtex = (struct r600_texture*)state->zsbuf->texture;
>> fprintf(f, COLOR_YELLOW "Depth-stencil buffer:"
>> COLOR_RESET "\n");
>> r600_print_texture_info(sctx->b.screen, rtex, f);
>> fprintf(f, "\n");
>> }
>>  }
>>
>> +typedef unsigned (*slot_remap_func)(unsigned);
>> +
>>  static void si_dump_descriptor_list(struct si_descriptors *desc,
>> const char *shader_name,
>> const char *elem_name,
>> unsigned num_elements,
>> +   slot_remap_func slot_remap,
>> FILE *f)
>>  {
>> unsigned i, j;
>> -   uint32_t *cpu_list = desc->list;
>> -   uint32_t *gpu_list = desc->gpu_list;
>> -   const char *list_note = "GPU list";
>> -
>> -   if (!gpu_list) {
>> -   gpu_list = cpu_list;
>> -   list_note = "CPU list";
>> -   }
>>
>> for (i = 0; i < num_elements; i++) {
>> +   unsigned dw_offset = slot_remap(i) *
>> desc->element_dw_size;
>> +   uint32_t *gpu_ptr = desc->gpu_list ? desc->gpu_list :
>> desc->list;
>> +   const char *list_note = desc->gpu_list ? "GPU list" : "CPU
>> list";
>> +   uint32_t *cpu_list = desc->list + dw_offset;
>> +   uint32_t *gpu_list = gpu_ptr + dw_offset;
>> +
>> fprintf(f, COLOR_GREEN "%s%s slot %u (%s):" COLOR_RESET
>> "\n",
>> shader_name, elem_name, i, list_note);
>>
>> switch (desc->element_dw_size) {
>> case 4:
>> for (j = 0; j < 4; j++)
>> ac_dump_reg(f, R_008F00_SQ_BUF_RSRC_WORD0
>> + j*4,
>> gpu_list[j], 0x);
>> break;
>> case 8:
>> @@ -437,63 +438,75 @@ static void si_dump_descriptor_list(struct
>> si_descriptors *desc,
>> gpu_list[12+j], 0x);
>> break;
>> }
>>
>> if (memcmp(gpu_list, cpu_list, desc->element_dw_size * 4)
>> != 0) {
>> fprintf(f, COLOR_RED "! This slot was
>> corrupted in GPU memory !"
>> COLOR_RESET "\n");
>> }
>>
>> fprintf(f, "\n");
>> -   gpu_list += desc->element_dw_size;
>> -   cpu_list += desc->element_dw_size;
>> }
>>  }
>>
>> +static unsigned si_identity(unsigned slot)
>> +{
>> +   return slot;
>> +}
>> +
>>  static void si_dump_descriptors(struct si_context *sctx,
>> enum pipe_shader_type processor,
>> const struct tgsi_shader_info *info, FILE
>> *f)
>>  {
>> struct si_descriptors *descs =
>> >descriptors[SI_DESCS_FIRST_SHADER +
>>processor * SI_NUM_SHADER_DESCS];
>> static const char *shader_name[] = {"VS", "PS", "GS", "TCS",
>> "TES", "CS"};
>>
>> static const char *elem_name[] = {
>> " - Constant buffer",
>> " - Shader buffer",
>> " - Sampler",
>> " - Image",
>> };
>> +   static const slot_remap_func remap_func[] = {
>> +   si_get_constbuf_slot,
>> +   si_get_shaderbuf_slot,
>> +   si_identity,
>> +   

Re: [Mesa-dev] [PATCH] Android: correct libz dependency

2017-05-18 Thread Emil Velikov
On 18 May 2017 at 03:55, Chih-Wei Huang  wrote:
> 2017-05-17 21:11 GMT+08:00 Emil Velikov :
>> On 17 May 2017 at 13:45, Rob Herring  wrote:
>>> On Wed, May 17, 2017 at 12:10 AM, Chih-Wei Huang
>>>  wrote:
 Commit 6facb0c0 ("android: fix libz dynamic library dependencies")
 unconditionally adds libz as a dependency to all shared libraries.
 That is unnecessary.

 Commit 85a9b1b5 introduced libz as a dependency to libmesa_util.
 So only the shared libraries that use libmesa_util need libz.

 Fix Android Lollipop build by adding the include path of zlib to
 libmesa_util explicitly instead of getting the path implicitly
 from zlib since it doesn't export the include path in Lollipop.

 Fixes: 6facb0c0 "android: fix libz dynamic library dependencies"

 Signed-off-by: Chih-Wei Huang 
 ---
  Android.common.mk  | 4 
  src/gallium/targets/dri/Android.mk | 3 ++-
  src/intel/Android.vulkan.mk| 2 +-
  src/mesa/drivers/dri/Android.mk| 3 ++-
  src/util/Android.mk| 1 +
  5 files changed, 6 insertions(+), 7 deletions(-)
>>>
>>> Reviewed-by: Rob Herring 
>> Thanks Rob. Pushed to master.
>
> Thank you.
>
> BTW, since libmesa_util is used by several places,
> is it better to change it to a shared library?
>
Adding more shared libraries is not what you want.

it increases the chances of symbol collision (think - libc.so and
libfoo.so exposing the same symbol) and implies ABI and/or API
stability.

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Add RGBX8888 and RGBA8888 to EGL configure and reorder the list

2017-05-18 Thread Emil Velikov
On 18 May 2017 at 05:10, Chih-Wei Huang  wrote:
> 2017-05-18 12:01 GMT+08:00 Xu, Randy :
>>
>>> -Original Message-
>>> From: Palli, Tapani
>>> >
>>> > It's just applied. Isn't it?
>>> > Yesterday without this patch
>>> > the color format is mismatch apparently.
>>>
>>> Yeah we do need this. TBH I don't quite understand why but will try to 
>>> figure
>>> it out. I remember we used to have a patch in Surfaceflinger at one point
>>> because visual was hardcoded there and this might be related.
>>>
>>> // Tapani
>>
>> Sorry, that's for different issue, I mix it with RGB565 blending one.
>> This patch is required because some Android GLES test apps, like gl2_basic, 
>> need to create RGBA surface.
>
> Indeed it is.
>
> As Emil pointed out, the patch was merged before
> but reverted later since it broke desktop.
>
> So what's the current upstreaming plan?
>
No idea about a plan, but how you can fix it once and for all:

Extend the loader extension(s) to have a get_caps() callback,
analogous to __DRIimageExtension::getCapabilities.
Then the DRI module will query [the loader] and advertise the
RGBX/RGBA visuals when possible.

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] egl/android: fix segfault within swap_buffers

2017-05-18 Thread Emil Velikov
On 18 May 2017 at 08:21, Tapani Pälli  wrote:
> Function droid_swap_buffers may get called without dri2_surf->buffer set,
> in these cases we don't have a back buffer set either. Patch fixes segfault
> seen with 3DMark that uses android.opengl.GLSurfaceView for rendering it's UI.
>
> backtrace:
>#00 pc 00013f88  /system/lib/egl/libGLES_mesa.so (droid_swap_buffers+104)
>#01 pc 000117b2  /system/lib/egl/libGLES_mesa.so (dri2_swap_buffers+50)
>#02 pc 58b2  /system/lib/egl/libGLES_mesa.so (eglSwapBuffers+386)
>#03 pc 00011329  /system/lib/libEGL.so (eglSwapBuffersWithDamageKHR+553)
>#04 pc 000118e7  /system/lib/libEGL.so (eglSwapBuffers+55)
>#05 pc 000754dc  /system/lib/libandroid_runtime.so
>
> Note, this is v1 as v2 caused dEQP regressions.
>
> Fixes: 2acc69d ("EGL/Android: Add EGL_EXT_buffer_age extension")
> Signed-off-by: Tapani Pälli 
> ---
>  src/egl/drivers/dri2/platform_android.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/src/egl/drivers/dri2/platform_android.c 
> b/src/egl/drivers/dri2/platform_android.c
> index 49cbeb4..88f6af8 100644
> --- a/src/egl/drivers/dri2/platform_android.c
> +++ b/src/egl/drivers/dri2/platform_android.c
> @@ -628,7 +628,9 @@ droid_swap_buffers(_EGLDriver *drv, _EGLDisplay *disp, 
> _EGLSurface *draw)
>if (dri2_surf->color_buffers[i].age > 0)
>   dri2_surf->color_buffers[i].age++;
> }
> -   dri2_surf->back->age = 1;
> +
> +   if (dri2_surf->back)
> +  dri2_surf->back->age = 1;

Doesn't seem quite right to use get_back_bo() elsewhere and a NULL check here.

With a inline comment vaguely like "XXX: we don't use get_back_bo()
since it causes regressions in $app/test."
Acked-by: Emil Velikov 

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


  1   2   >