Re: [Mesa-dev] [PATCH v3 00/10] Initial gl_spirv support in Mesa and i965

2017-12-14 Thread Eduardo Lima Mitev
Any chance to wrap up this review?

Thanks!

Eduardo


On 12/13/2017 08:32 PM, Eduardo Lima Mitev wrote:
> Hi,
>
> This is the 3rd version of the series adding initial support for ARB_gl_spirv.
>
> Previous versions of this series included also support for 
> ARB_spirv_extensions, but we have decided to split the two to ease review. So 
> I will be sending a second series with only the patches for spirv_extensions.
>
> Notice also that some patches from version 2 were merged in master. These 
> were already reviewed favorably and were fairly independent from the rest of 
> the series.
>
> There are still some patches in this new series with a Reviewed-by tag that 
> we didn't merge yet because we consider they should go in with the rest of 
> the series. The patches missing review are 01, 02, 03, 04 and 07.
>
> As usual, a git tree containing this series can be found at 
>  and the larger, 
> work-in-progress series at 
> .
>
> Thanks for reviewing!
>
> cheers,
> Eduardo
>
> Alejandro Piñeiro (2):
>   i965: initialize SPIR-V capabilities
>   nir/spirv: add gl_spirv_validation method
>
> Eduardo Lima Mitev (6):
>   mesa: Add a reference to gl_shader_spirv_data to gl_linked_shader
>   mesa/glspirv: Add _mesa_spirv_link_shaders() function
>   mesa/program: Link SPIR-V shaders using the SPIR-V code-path
>   mesa/glspirv: Add a _mesa_spirv_to_nir() function
>   i965: Call spirv_to_nir() instead of glsl_to_nir() for SPIR-V shaders
>   i965: Don't call process_glsl_ir() for SPIR-V shaders
>
> Nicolai Hähnle (2):
>   mesa: add gl_constants::SpirVCapabilities
>   mesa: Implement glSpecializeShaderARB
>
>  src/compiler/spirv/nir_spirv.h  |   5 +
>  src/compiler/spirv/spirv_to_nir.c   | 191 +++---
>  src/mesa/drivers/dri/i965/brw_context.c |  20 +++
>  src/mesa/drivers/dri/i965/brw_link.cpp  |   3 +-
>  src/mesa/drivers/dri/i965/brw_program.c |  10 +-
>  src/mesa/main/glspirv.c | 236 +++-
>  src/mesa/main/glspirv.h |  11 ++
>  src/mesa/main/mtypes.h  |  11 ++
>  src/mesa/main/shaderobj.c   |   1 +
>  src/mesa/program/ir_to_mesa.cpp |   6 +-
>  10 files changed, 472 insertions(+), 22 deletions(-)
>
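ARB_gl_spirv lets an application hand a SPIR-V module to GL through glShaderBinary() with the GL_SHADER_BINARY_FORMAT_SPIR_V_ARB format. As a rough illustration of the cheapest sanity check a loader could run on such a blob before parsing it, the sketch below looks only at the SPIR-V module header. The size must be a whole number of 32-bit words, the 5-word header must be present, and the first word must be the SPIR-V magic number 0x07230203. The helper name `spirv_blob_looks_valid()` is hypothetical and not taken from this series; Mesa's actual checks may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* SPIR-V magic number: the first word of every module. */
#define SPIRV_MAGIC 0x07230203u
/* Header words: magic, version, generator, bound, schema. */
#define SPIRV_HEADER_WORDS 5u

/* Hypothetical helper, not Mesa code: cheap sanity check on a blob passed
 * with GL_SHADER_BINARY_FORMAT_SPIR_V_ARB. Only native-endian modules are
 * accepted here; a fuller consumer could also detect a byte-swapped magic. */
static bool
spirv_blob_looks_valid(const void *binary, size_t length)
{
   uint32_t magic;

   if (binary == NULL || length % sizeof(uint32_t) != 0)
      return false;
   if (length < SPIRV_HEADER_WORDS * sizeof(uint32_t))
      return false;

   /* memcpy avoids assuming the caller's buffer is 4-byte aligned. */
   memcpy(&magic, binary, sizeof(magic));
   return magic == SPIRV_MAGIC;
}
```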

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 00/25] Initial gl_spirv and spirv_extensions support in Mesa and i965

2017-12-14 Thread Eduardo Lima Mitev
Oops, sorry, wrong thread.

This is version 2 of the series and there is a version 3 which is the
one that needs review.

Eduardo


On 12/15/2017 08:13 AM, Eduardo Lima Mitev wrote:
> Any chance to wrap up this review?
>
> Thanks!
>
> Eduardo
>
> On 11/30/2017 06:28 PM, Eduardo Lima Mitev wrote:
>> Hello,
>>
>> This is the second version of the series providing initial support for 
>> ARB_gl_spirv and ARB_spirv_extensions in Mesa and i965.
>>
>> First version of the series can be found at 
>> .
>>
>> In this series we hope we have addressed all issues detected during the 
>> initial review. Thank you all who participated!
>>
>> Taking the nitpicks and minor fixes apart, most important changes compared 
>> to the first version are:
>>
>> * A dedicated 'spirv' flag was removed from the gl_shader struct. Now we use the 
>> nullness of the 'spirv_data' member for the same purpose.
>>
>> * The per-program 'spirv' flag was moved out of this series, but will likely 
>> be re-introduced in the next delivery, because it will become necessary.
>>
>> * We enforce one SPIR-V shader per stage, and fail linking if this condition 
>> is not met.
>>
>> * 'SpirVCapabilities' struct of GL context constants is no longer a pointer 
>> but a static struct.
>>
>> As usual, a tree of this series can be found at 
>> .
>>
>> A tree of the larger WIP branch from which this series is taken: 
>> .
>>
>> Thanks in advance for the reviews!
>>
>> cheers,
>> Eduardo
>>
>> Alejandro Piñeiro (9):
>>   spirv_extensions: rename nir_spirv_supported_extensions
>>   mesa: move nir_spirv_supported_capabilities definition
>>   i965: initialize SPIR-V capabilities
>>   spirv_extensions: add GL_ARB_spirv_extensions boilerplate
>>   spirv_extensions: add list of extensions and to_string method
>>   spirv_extensions: define spirv_extensions_supported
>>   spirv_extensions: add spirv_supported_extensions on gl_constants
>>   spirv_extensions: i965: initialize SPIR-V extensions
>>   nir/spirv: add gl_spirv_validation method
>>
>> Eduardo Lima Mitev (8):
>>   mesa/glspirv: Add struct gl_shader_spirv_data
>>   mesa/glspirv: Add a _mesa_spirv_link_shaders() placeholder
>>   mesa/program: Link SPIR-V shaders using the SPIR-V code-path
>>   mesa: Add a reference to gl_shader_spirv_data to gl_linked_shader
>>   mesa/glspirv: Create gl_linked_shader objects for a SPIR-V program
>>   mesa/glspirv: Add a _mesa_spirv_to_nir() function
>>   i965: Call spirv_to_nir() instead of glsl_to_nir() for SPIR-V shaders
>>   i965: Don't call process_glsl_ir() for SPIR-V shaders
>>
>> Neil Roberts (1):
>>   mesa: Add boilerplate for the GL 4.6 alias of glSpecializeShaderARB
>>
>> Nicolai Hähnle (7):
>>   mesa: add GL_ARB_gl_spirv boilerplate
>>   mesa/glspirv: Add struct gl_spirv_module
>>   mesa: implement SPIR-V loading in glShaderBinary
>>   mesa/shaderapi: add a getter for GL_SPIR_V_BINARY_ARB
>>   mesa: refuse to compile SPIR-V shaders or link mixed shaders
>>   mesa: add gl_constants::SpirVCapabilities
>>   mesa: Implement glSpecializeShaderARB
>>
>>  src/amd/vulkan/radv_shader.c|   4 +-
>>  src/compiler/Makefile.sources   |   2 +
>>  src/compiler/spirv/nir_spirv.h  |  21 +-
>>  src/compiler/spirv/spirv_extensions.c   |  77 +++
>>  src/compiler/spirv/spirv_extensions.h   |  63 ++
>>  src/compiler/spirv/spirv_to_nir.c   | 160 +-
>>  src/compiler/spirv/vtn_private.h|   2 +-
>>  src/intel/vulkan/anv_pipeline.c |   4 +-
>>  src/mapi/glapi/gen/ARB_gl_spirv.xml |  21 ++
>>  src/mapi/glapi/gen/ARB_spirv_extensions.xml |  13 ++
>>  src/mapi/glapi/gen/GL4x.xml |  11 +
>>  src/mapi/glapi/gen/Makefile.am  |   2 +
>>  src/mapi/glapi/gen/gl_API.xml   |   8 +
>>  src/mapi/glapi/gen/gl_genexec.py|   1 +
>>  src/mapi/glapi/gen/meson.build  |   2 +
>>  src/mesa/Makefile.sources   |   4 +
>>  src/mesa/drivers/dri/i965/brw_context.c |  26 +++
>>  src/mesa/drivers/dri/i965/brw_link.cpp  |   3 +-
>>  src/mesa/drivers/dri/i965/brw_program.c |  14 +-
>>  src/mesa/main/context.c |   2 +
>>  src/mesa/main/extensions_table.h|   2 +
>>  src/mesa/main/get.c |   7 +
>>  src/mesa/main/get_hash_params.py|   3 +
>>  src/mesa/main/getstring.c   |  12 +
>>  src/mesa/main/glspirv.c | 331 
>>  src/mesa/main/glspirv.h | 108 +
>>  src/mesa/main/mtypes.h  |  31 +++
>>  src/mesa/main/shaderapi.c   |  60 -
>>  src/mesa/main/shaderobj.c   |   3 +
>>  src/mesa/main/spirv_extensions.c|  60 +

Re: [Mesa-dev] [PATCH v2 00/25] Initial gl_spirv and spirv_extensions support in Mesa and i965

2017-12-14 Thread Eduardo Lima Mitev
Any chance to wrap up this review?

Thanks!

Eduardo

On 11/30/2017 06:28 PM, Eduardo Lima Mitev wrote:
> Hello,
>
> This is the second version of the series providing initial support for 
> ARB_gl_spirv and ARB_spirv_extensions in Mesa and i965.
>
> First version of the series can be found at 
> .
>
> In this series we hope we have addressed all issues detected during the 
> initial review. Thank you all who participated!
>
> Taking the nitpicks and minor fixes apart, most important changes compared to 
> the first version are:
>
> * A dedicated 'spirv' flag was removed from the gl_shader struct. Now we use the 
> nullness of the 'spirv_data' member for the same purpose.
>
> * The per-program 'spirv' flag was moved out of this series, but will likely 
> be re-introduced in the next delivery, because it will become necessary.
>
> * We enforce one SPIR-V shader per stage, and fail linking if this condition 
> is not met.
>
> * 'SpirVCapabilities' struct of GL context constants is no longer a pointer 
> but a static struct.
>
> As usual, a tree of this series can be found at 
> .
>
> A tree of the larger WIP branch from which this series is taken: 
> .
>
> Thanks in advance for the reviews!
>
> cheers,
> Eduardo
>
> Alejandro Piñeiro (9):
>   spirv_extensions: rename nir_spirv_supported_extensions
>   mesa: move nir_spirv_supported_capabilities definition
>   i965: initialize SPIR-V capabilities
>   spirv_extensions: add GL_ARB_spirv_extensions boilerplate
>   spirv_extensions: add list of extensions and to_string method
>   spirv_extensions: define spirv_extensions_supported
>   spirv_extensions: add spirv_supported_extensions on gl_constants
>   spirv_extensions: i965: initialize SPIR-V extensions
>   nir/spirv: add gl_spirv_validation method
>
> Eduardo Lima Mitev (8):
>   mesa/glspirv: Add struct gl_shader_spirv_data
>   mesa/glspirv: Add a _mesa_spirv_link_shaders() placeholder
>   mesa/program: Link SPIR-V shaders using the SPIR-V code-path
>   mesa: Add a reference to gl_shader_spirv_data to gl_linked_shader
>   mesa/glspirv: Create gl_linked_shader objects for a SPIR-V program
>   mesa/glspirv: Add a _mesa_spirv_to_nir() function
>   i965: Call spirv_to_nir() instead of glsl_to_nir() for SPIR-V shaders
>   i965: Don't call process_glsl_ir() for SPIR-V shaders
>
> Neil Roberts (1):
>   mesa: Add boilerplate for the GL 4.6 alias of glSpecializeShaderARB
>
> Nicolai Hähnle (7):
>   mesa: add GL_ARB_gl_spirv boilerplate
>   mesa/glspirv: Add struct gl_spirv_module
>   mesa: implement SPIR-V loading in glShaderBinary
>   mesa/shaderapi: add a getter for GL_SPIR_V_BINARY_ARB
>   mesa: refuse to compile SPIR-V shaders or link mixed shaders
>   mesa: add gl_constants::SpirVCapabilities
>   mesa: Implement glSpecializeShaderARB
>
>  src/amd/vulkan/radv_shader.c|   4 +-
>  src/compiler/Makefile.sources   |   2 +
>  src/compiler/spirv/nir_spirv.h  |  21 +-
>  src/compiler/spirv/spirv_extensions.c   |  77 +++
>  src/compiler/spirv/spirv_extensions.h   |  63 ++
>  src/compiler/spirv/spirv_to_nir.c   | 160 +-
>  src/compiler/spirv/vtn_private.h|   2 +-
>  src/intel/vulkan/anv_pipeline.c |   4 +-
>  src/mapi/glapi/gen/ARB_gl_spirv.xml |  21 ++
>  src/mapi/glapi/gen/ARB_spirv_extensions.xml |  13 ++
>  src/mapi/glapi/gen/GL4x.xml |  11 +
>  src/mapi/glapi/gen/Makefile.am  |   2 +
>  src/mapi/glapi/gen/gl_API.xml   |   8 +
>  src/mapi/glapi/gen/gl_genexec.py|   1 +
>  src/mapi/glapi/gen/meson.build  |   2 +
>  src/mesa/Makefile.sources   |   4 +
>  src/mesa/drivers/dri/i965/brw_context.c |  26 +++
>  src/mesa/drivers/dri/i965/brw_link.cpp  |   3 +-
>  src/mesa/drivers/dri/i965/brw_program.c |  14 +-
>  src/mesa/main/context.c |   2 +
>  src/mesa/main/extensions_table.h|   2 +
>  src/mesa/main/get.c |   7 +
>  src/mesa/main/get_hash_params.py|   3 +
>  src/mesa/main/getstring.c   |  12 +
>  src/mesa/main/glspirv.c | 331 
>  src/mesa/main/glspirv.h | 108 +
>  src/mesa/main/mtypes.h  |  31 +++
>  src/mesa/main/shaderapi.c   |  60 -
>  src/mesa/main/shaderobj.c   |   3 +
>  src/mesa/main/spirv_extensions.c|  60 +
>  src/mesa/main/spirv_extensions.h|  49 
>  src/mesa/main/tests/dispatch_sanity.cpp |   3 +
>  src/mesa/meson.build|   4 +
>  src/mesa/program/ir_to_mesa.cpp |  23 +-
>  34 files changed, 1098 insertions(+), 38 deletions(-)
>  create mode 
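The v2 notes say linking now fails unless each stage has at most one SPIR-V shader, and that a shader is recognised as SPIR-V by a non-NULL 'spirv_data'. A minimal sketch of such a per-stage count follows. `enum fake_stage`, `struct fake_shader` and `check_one_spirv_shader_per_stage()` are hypothetical stand-ins, not Mesa's gl_shader or linker code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for Mesa's shader stages and gl_shader. */
enum fake_stage { STAGE_VERTEX, STAGE_TESS_CTRL, STAGE_TESS_EVAL,
                  STAGE_GEOMETRY, STAGE_FRAGMENT, STAGE_COMPUTE, NUM_STAGES };

struct fake_shader {
   enum fake_stage stage;
   const void *spirv_data;   /* non-NULL marks a SPIR-V shader */
};

/* Returns false (i.e. link failure) if any stage has more than one
 * SPIR-V shader attached; non-SPIR-V shaders are ignored here. */
static bool
check_one_spirv_shader_per_stage(const struct fake_shader *shaders,
                                 size_t count)
{
   unsigned per_stage[NUM_STAGES] = { 0 };

   for (size_t i = 0; i < count; i++) {
      if (shaders[i].spirv_data == NULL)
         continue;
      if (++per_stage[shaders[i].stage] > 1)
         return false;
   }
   return true;
}
```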

Re: [Mesa-dev] [PATCH 0/4] GL_EXT_disjoint_timer_query series

2017-12-14 Thread Tapani Pälli



On 14.12.2017 21:20, Ian Romanick wrote:

Since you remembered to modify dispatch_sanity.cpp in patch 2, I'm going
to assume that 'make check' still passes.  If that's the case, the series is

Reviewed-by: Ian Romanick 


Yes, 'make check' passes;

Thanks for the review Ian!



On 12/14/2017 04:03 AM, Tapani Pälli wrote:

Hi;

Here's a revisited GL_EXT_disjoint_timer_query series. One patch got
dropped (as discussed with Lionel) and enabling is now via
EXT_disjoint_timer_query boolean as was intended (Ian).

Thanks;

Tapani Pälli (4):
   mesa: add DisjointOperation to gl_shared_state
   glapi: add GL_EXT_disjoint_timer_query
   mesa: GL_EXT_disjoint_timer_query extension API bits
   i965: enable EXT_disjoint_timer_query extension

  src/mapi/glapi/gen/es_EXT.xml| 16 
  src/mapi/glapi/gen/gl_API.xml|  4 ++--
  src/mesa/drivers/dri/i965/intel_extensions.c |  2 ++
  src/mesa/main/extensions_table.h |  1 +
  src/mesa/main/get.c  | 17 +
  src/mesa/main/get_hash_params.py |  5 +
  src/mesa/main/glheader.h |  4 
  src/mesa/main/mtypes.h   |  9 +
  src/mesa/main/queryobj.c |  3 ++-
  src/mesa/main/robustness.c   |  1 +
  src/mesa/main/tests/dispatch_sanity.cpp  |  5 +
  11 files changed, 64 insertions(+), 3 deletions(-)






Re: [Mesa-dev] [PATCH] isl: Don't require VALIGN_2 for R32G32B32_FLOAT on Haswell.

2017-12-14 Thread Jason Ekstrand
Fine with me

Reviewed-by: Jason Ekstrand 

On Thu, Dec 14, 2017 at 4:56 PM, Kenneth Graunke 
wrote:

> According to the RENDER_SURFACE_STATE internal documentation, the
> R32G32B32_FLOAT restriction is marked "IVB" only.  We choose to apply
> it to Ivybridge and Baytrail, but not Haswell.
>
> Fixes KHR-GL46.texture_size_promotion.functional on Haswell.
>
> Changes these tests from crashing to skipping on Haswell:
> - KHR-GL46.direct_state_access.textures_storage_multisample_2d_rgb32f
> - KHR-GL46.direct_state_access.textures_storage_multisample_3d_rgb32f
> ---
>  src/intel/isl/isl_gen7.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/isl/isl_gen7.c b/src/intel/isl/isl_gen7.c
> index c42428cba7a..4fa9851233f 100644
> --- a/src/intel/isl/isl_gen7.c
> +++ b/src/intel/isl/isl_gen7.c
> @@ -38,9 +38,11 @@ gen7_format_needs_valign2(const struct isl_device *dev,
>  *  (0x190)
>  *
>  *- VALIGN_4 is not supported for surface format R32G32B32_FLOAT.
> +*
> +* The R32G32B32_FLOAT restriction is dropped on Haswell.
>  */
> return isl_format_is_yuv(format) ||
> -  format == ISL_FORMAT_R32G32B32_FLOAT;
> +  (format == ISL_FORMAT_R32G32B32_FLOAT && !ISL_DEV_IS_HASWELL(dev));
>  }
>
>  bool
> --
> 2.15.1
>


Re: [Mesa-dev] [PATCH] spirv: Relax the validation conditions of OpSelect

2017-12-14 Thread Jason Ekstrand
It turns out there's already a glslang bug for this and it was closed in
March:

https://github.com/KhronosGroup/glslang/issues/809

Unfortunately, there are applications shipping with these shaders so
failure isn't really an option.

--Jason

On Thu, Dec 14, 2017 at 7:56 PM, Jason Ekstrand 
wrote:

> The Talos Principle contains shaders with an OpSelect between two
> vectors where the condition is a scalar boolean.  This is technically
> against the spec but nir_builder gracefully handles it by splatting
> out the condition to all the channels.  So long as the condition is a
> boolean, just emit a warning instead of failing.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104246
> ---
>  src/compiler/spirv/spirv_to_nir.c | 18 ++
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
> index 0493dd3..f0476b2 100644
> --- a/src/compiler/spirv/spirv_to_nir.c
> +++ b/src/compiler/spirv/spirv_to_nir.c
> @@ -3511,10 +3511,20 @@ vtn_handle_body_instruction(struct vtn_builder *b, SpvOp opcode,
>   vtn_fail("Result type of OpSelect must be a scalar, vector, or pointer");
>}
>
> -  vtn_fail_if(sel_val->type->type != sel_type,
> -  "Condition type of OpSelect must be a scalar or vector of "
> -  "Boolean type. It must have the same number of components "
> -  "as Result Type");
> +  if (unlikely(sel_val->type->type != sel_type)) {
> + if (sel_val->type->type == glsl_bool_type()) {
> +/* This case is illegal but some versions of GLSLang produce it.
> + * That's fine, nir_builder will just splat the condition out
> + * which is most likely what the client wanted anyway.
> + */
> +vtn_warn("Condition type of OpSelect must have the same number "
> + "of components as Result Type");
> + } else {
> +vtn_fail("Condition type of OpSelect must be a scalar or vector "
> + "of Boolean type. It must have the same number of "
> + "components as Result Type");
> + }
> +  }
>
>vtn_fail_if(obj1_val->type != res_val->type ||
>obj2_val->type != res_val->type,
> --
> 2.5.0.400.gff86faf
>
>


Re: [Mesa-dev] [PATCH v2 07/20] ac: move some helpers to ac_llvm_build.c

2017-12-14 Thread Dieter Nützel

This one does not apply any longer after Samuel's commit
amd/common: add ac_build_waitcnt()
#225b19880204024a805cc54b1001d09ef3b58054

For your motivation:
I've tested V1 and V2 of the whole series (before the latest master
commits) and could run _all_ my 'normal' stuff.


Even UH runs with GREAT tess speed without any hiccups.

GREAT stuff!

V1 + V2 have my Tested-by, given even before Nicolai formulated his comments.

BTW, there are many commits waiting... ;-)

Cheers,
Dieter

On 13.12.2017 08:52, Timothy Arceri wrote:

We will call these from the radeonsi NIR backend.

Reviewed-by: Nicolai Hähnle 
---
 src/amd/common/ac_llvm_build.c  | 24 +
 src/amd/common/ac_llvm_build.h  |  8 ++
 src/amd/common/ac_nir_to_llvm.c | 58 +
 3 files changed, 50 insertions(+), 40 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index b2bf1bf7b51..faa08b6301c 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -91,20 +91,44 @@ ac_llvm_context_init(struct ac_llvm_context *ctx, LLVMContextRef context,
args[0] = LLVMConstReal(ctx->f32, 2.5);
ctx->fpmath_md_2p5_ulp = LLVMMDNodeInContext(ctx->context, args, 1);

ctx->uniform_md_kind = LLVMGetMDKindIDInContext(ctx->context,
"amdgpu.uniform", 14);

ctx->empty_md = LLVMMDNodeInContext(ctx->context, NULL, 0);
 }

+int
+ac_get_llvm_num_components(LLVMValueRef value)
+{
+   LLVMTypeRef type = LLVMTypeOf(value);
+   unsigned num_components = LLVMGetTypeKind(type) == LLVMVectorTypeKind
+ ? LLVMGetVectorSize(type)
+ : 1;
+   return num_components;
+}
+
+LLVMValueRef
+ac_llvm_extract_elem(struct ac_llvm_context *ac,
+LLVMValueRef value,
+int index)
+{
+   int count = ac_get_llvm_num_components(value);
+
+   if (count == 1)
+   return value;
+
+   return LLVMBuildExtractElement(ac->builder, value,
+  LLVMConstInt(ac->i32, index, false), "");
+}
+
 unsigned
 ac_get_type_size(LLVMTypeRef type)
 {
LLVMTypeKind kind = LLVMGetTypeKind(type);

switch (kind) {
case LLVMIntegerTypeKind:
return LLVMGetIntTypeWidth(type) / 8;
case LLVMFloatTypeKind:
return 4;
diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h
index 655dc1dcc86..c14b0d9f019 100644
--- a/src/amd/common/ac_llvm_build.h
+++ b/src/amd/common/ac_llvm_build.h
@@ -75,20 +75,28 @@ struct ac_llvm_context {

enum chip_class chip_class;

LLVMValueRef lds;
 };

 void
 ac_llvm_context_init(struct ac_llvm_context *ctx, LLVMContextRef context,
 enum chip_class chip_class);

+int
+ac_get_llvm_num_components(LLVMValueRef value);
+
+LLVMValueRef
+ac_llvm_extract_elem(struct ac_llvm_context *ac,
+LLVMValueRef value,
+int index);
+
 unsigned ac_get_type_size(LLVMTypeRef type);

 LLVMTypeRef ac_to_integer_type(struct ac_llvm_context *ctx, LLVMTypeRef t);
 LLVMValueRef ac_to_integer(struct ac_llvm_context *ctx, LLVMValueRef v);
 LLVMTypeRef ac_to_float_type(struct ac_llvm_context *ctx, LLVMTypeRef t);
 LLVMValueRef ac_to_float(struct ac_llvm_context *ctx, LLVMValueRef v);

 LLVMValueRef
 ac_build_intrinsic(struct ac_llvm_context *ctx, const char *name,
   LLVMTypeRef return_type, LLVMValueRef *params,
diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 6f84604d54a..6060df75314 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -995,46 +995,24 @@ static void create_function(struct nir_to_llvm_context *ctx,
set_userdata_location_shader(ctx, AC_UD_PS_SAMPLE_POS_OFFSET,
&user_sgpr_idx, 1);
}
break;
default:
unreachable("Shader stage not implemented");
}

ctx->shader_info->num_user_sgprs = user_sgpr_idx;
 }

-static int get_llvm_num_components(LLVMValueRef value)
-{
-   LLVMTypeRef type = LLVMTypeOf(value);
-   unsigned num_components = LLVMGetTypeKind(type) == LLVMVectorTypeKind
- ? LLVMGetVectorSize(type)
- : 1;
-   return num_components;
-}
-
-static LLVMValueRef llvm_extract_elem(struct ac_llvm_context *ac,
- LLVMValueRef value,
- int index)
-{
-   int count = get_llvm_num_components(value);
-
-   if (count == 1)
-   return value;
-
-   return LLVMBuildExtractElement(ac->builder, value,
-  LLVMConstInt(ac->i32, index, false), "");
-}
-
 static LLVMValueRef trim_vector(struct 

Re: [Mesa-dev] [PATCH] st/st_glsl_to_nir: call nir_lower_64bit_pack

2017-12-14 Thread Dieter Nützel

Tested-by: Dieter Nützel 

Dieter

On 14.12.2017 06:02, Timothy Arceri wrote:

Fixes 56 crashes in radeonsi.
---
 src/mesa/state_tracker/st_glsl_to_nir.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp b/src/mesa/state_tracker/st_glsl_to_nir.cpp
index 7c9e76a2dce..5683dfe 100644
--- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
@@ -267,20 +267,21 @@ st_nir_assign_uniform_locations(struct gl_program *prog,
*size = max;
 }

 static void
 st_nir_opts(nir_shader *nir)
 {
bool progress;
do {
   progress = false;

+  NIR_PASS_V(nir, nir_lower_64bit_pack);
   NIR_PASS(progress, nir, nir_copy_prop);
   NIR_PASS(progress, nir, nir_opt_remove_phis);
   NIR_PASS(progress, nir, nir_opt_dce);
   if (nir_opt_trivial_continues(nir)) {
  progress = true;
  NIR_PASS(progress, nir, nir_copy_prop);
  NIR_PASS(progress, nir, nir_opt_dce);
   }
   NIR_PASS(progress, nir, nir_opt_if);
   NIR_PASS(progress, nir, nir_opt_dead_cf);



Re: [Mesa-dev] [PATCH] st/glsl_to_nir: call post opt functions after opts have finished

2017-12-14 Thread Dieter Nützel

Tested-by: Dieter Nützel 

Dieter

On 14.12.2017 04:48, Timothy Arceri wrote:

We need to move this to a separate loop because
nir_compact_varyings() can alter the IR of a previous stage.

Fixes: 6648bd68fd27 "st/glsl_to_nir: enable NIR link time opts"
---
 src/mesa/state_tracker/st_glsl_to_nir.cpp | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp b/src/mesa/state_tracker/st_glsl_to_nir.cpp
index be34031bfb5..7c9e76a2dce 100644
--- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
@@ -621,20 +621,26 @@ st_link_nir(struct gl_context *ctx,
   NIR_PASS_V(nir, nir_lower_system_values);

   nir_shader_gather_info(nir, nir_shader_get_entrypoint(nir));
   shader->Program->info = nir->info;

   if (prev != -1) {
  nir_compact_varyings(shader_program->_LinkedShaders[prev]->Program->nir,
   nir, ctx->API != API_OPENGL_COMPAT);
   }
   prev = i;
+   }
+
+   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
+  struct gl_linked_shader *shader = shader_program->_LinkedShaders[i];
+  if (shader == NULL)
+ continue;

   st_glsl_to_nir_post_opts(st, shader->Program, shader_program);

   assert(shader->Program);
   if (!ctx->Driver.ProgramStringNotify(ctx,
 _mesa_shader_stage_to_program(i),
 shader->Program)) {
  _mesa_reference_program(ctx, &shader->Program, NULL);
  return false;
   }
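The commit message above explains why the post-optimisation step moves into its own loop: nir_compact_varyings() modifies both the producer and the consumer, so an earlier stage can still change after it has been visited. The self-contained model below shows the resulting two-pass structure. `compact()` and `finalize()` are hypothetical stand-ins for nir_compact_varyings() and st_glsl_to_nir_post_opts(), and the flags exist only to make the ordering observable.

```c
#include <stdbool.h>

#define NUM_STAGES 6

/* Hypothetical per-stage state for this model only. */
struct stage_state {
   bool present;                  /* stage has a linked shader */
   bool modified_after_finalize;  /* IR changed after post-opts ran */
   bool finalized;                /* post-opts have run */
};

/* Stand-in for nir_compact_varyings(): it may rewrite both stages. */
static void
compact(struct stage_state *producer, struct stage_state *consumer)
{
   if (producer->finalized)
      producer->modified_after_finalize = true;
   if (consumer->finalized)
      consumer->modified_after_finalize = true;
}

/* Stand-in for st_glsl_to_nir_post_opts(). */
static void
finalize(struct stage_state *s)
{
   s->finalized = true;
}

static void
link_two_pass(struct stage_state stages[NUM_STAGES])
{
   int prev = -1;

   /* Pass 1: compact varyings between each pair of adjacent present stages. */
   for (int i = 0; i < NUM_STAGES; i++) {
      if (!stages[i].present)
         continue;
      if (prev != -1)
         compact(&stages[prev], &stages[i]);
      prev = i;
   }

   /* Pass 2: every stage is now stable, so post-opts can run safely. */
   for (int i = 0; i < NUM_STAGES; i++) {
      if (stages[i].present)
         finalize(&stages[i]);
   }
}
```

If the two loops were merged, stage 0 would be finalized first and then touched again by compact() when the next present stage is reached, which sets its modified_after_finalize flag in this model.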



Re: [Mesa-dev] [PATCH] st/glsl_to_nir: add patch support to st_nir_assign_var_locations()

2017-12-14 Thread Dieter Nützel

Tested-by: Dieter Nützel 

Dieter

On 14.12.2017 00:14, Timothy Arceri wrote:

---
 src/mesa/state_tracker/st_glsl_to_nir.cpp | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp b/src/mesa/state_tracker/st_glsl_to_nir.cpp
index 70c5daaa225..be34031bfb5 100644
--- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
@@ -119,46 +119,58 @@ st_nir_assign_vs_in_locations(struct gl_program *prog, nir_shader *nir)
  exec_list_push_tail(&nir->globals, &var->node);
   }
}
 }

 static void
 st_nir_assign_var_locations(struct exec_list *var_list, unsigned *size,
 gl_shader_stage stage)
 {
unsigned location = 0;
-   unsigned assigned_locations[VARYING_SLOT_MAX];
+   unsigned assigned_locations[VARYING_SLOT_TESS_MAX];
uint64_t processed_locs = 0;
+   uint32_t processed_patch_locs = 0;

nir_foreach_variable(var, var_list) {

   const struct glsl_type *type = var->type;
   if (nir_is_per_vertex_io(var, stage)) {
  assert(glsl_type_is_array(type));
  type = glsl_get_array_element(type);
   }

+  bool processed = false;
+  if (var->data.patch) {
+ unsigned patch_loc = var->data.location - VARYING_SLOT_VAR0;
+ if (processed_patch_locs & (1 << patch_loc))
+processed = true;
+
+ processed_patch_locs |= (1 << patch_loc);
+  } else {
+ if (processed_locs & ((uint64_t)1 << var->data.location))
+processed = true;
+
+ processed_locs |= ((uint64_t)1 << var->data.location);
+  }
+
   /* Because component packing allows varyings to share the same location
* we may have already have processed this location.
*/
-  if (var->data.location >= VARYING_SLOT_VAR0 &&
-  processed_locs & ((uint64_t)1 << var->data.location)) {
+  if (processed && var->data.location >= VARYING_SLOT_VAR0) {
  var->data.driver_location = assigned_locations[var->data.location];

  *size += type_size(type);
  continue;
   }

   assigned_locations[var->data.location] = location;
   var->data.driver_location = location;
   location += type_size(type);
-
-  processed_locs |= ((uint64_t)1 << var->data.location);
}

*size += location;
 }

 static int
 st_nir_lookup_parameter_index(const struct gl_program_parameter_list *params,
   const char *name)
 {
int loc = _mesa_lookup_parameter_index(params, name);



[Mesa-dev] [PATCH] spirv: Relax the validation conditions of OpSelect

2017-12-14 Thread Jason Ekstrand
The Talos Principle contains shaders with an OpSelect between two
vectors where the condition is a scalar boolean.  This is technically
against the spec but nir_builder gracefully handles it by splatting
out the condition to all the channels.  So long as the condition is a
boolean, just emit a warning instead of failing.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104246
---
 src/compiler/spirv/spirv_to_nir.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
index 0493dd3..f0476b2 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -3511,10 +3511,20 @@ vtn_handle_body_instruction(struct vtn_builder *b, SpvOp opcode,
  vtn_fail("Result type of OpSelect must be a scalar, vector, or pointer");
   }
 
-  vtn_fail_if(sel_val->type->type != sel_type,
-  "Condition type of OpSelect must be a scalar or vector of "
-  "Boolean type. It must have the same number of components "
-  "as Result Type");
+  if (unlikely(sel_val->type->type != sel_type)) {
+ if (sel_val->type->type == glsl_bool_type()) {
+/* This case is illegal but some versions of GLSLang produce it.
+ * That's fine, nir_builder will just splat the condition out
+ * which is most likely what the client wanted anyway.
+ */
+vtn_warn("Condition type of OpSelect must have the same number "
+ "of components as Result Type");
+ } else {
+vtn_fail("Condition type of OpSelect must be a scalar or vector "
+ "of Boolean type. It must have the same number of "
+ "components as Result Type");
+ }
+  }
 
   vtn_fail_if(obj1_val->type != res_val->type ||
   obj2_val->type != res_val->type,
-- 
2.5.0.400.gff86faf
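The commit message relies on nir_builder splatting a scalar condition across all channels. In plain C, the relaxed OpSelect then behaves like the spec-conformant vector form whose condition components all equal the scalar. This models the semantics only; `select_vec()` and `select_vec_scalar_cond()` are hypothetical helpers, not nir_builder API.

```c
#include <stdbool.h>

#define MAX_COMPONENTS 4

/* Spec-conformant vector OpSelect: one boolean per component. */
static void
select_vec(const bool cond[], const float a[], const float b[],
           float out[], unsigned n)
{
   for (unsigned i = 0; i < n; i++)
      out[i] = cond[i] ? a[i] : b[i];
}

/* Relaxed case accepted by the patch: the scalar condition is first
 * splatted to every component, then the vector select is reused. */
static void
select_vec_scalar_cond(bool cond, const float a[], const float b[],
                       float out[], unsigned n)
{
   bool splat[MAX_COMPONENTS];

   if (n > MAX_COMPONENTS)
      n = MAX_COMPONENTS;   /* this sketch only handles up to 4 components */
   for (unsigned i = 0; i < n; i++)
      splat[i] = cond;
   select_vec(splat, a, b, out, n);
}
```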



[Mesa-dev] [PATCH] isl: Don't require VALIGN_2 for R32G32B32_FLOAT on Haswell.

2017-12-14 Thread Kenneth Graunke
According to the RENDER_SURFACE_STATE internal documentation, the
R32G32B32_FLOAT restriction is marked "IVB" only.  We choose to apply
it to Ivybridge and Baytrail, but not Haswell.

Fixes KHR-GL46.texture_size_promotion.functional on Haswell.

Changes these tests from crashing to skipping on Haswell:
- KHR-GL46.direct_state_access.textures_storage_multisample_2d_rgb32f
- KHR-GL46.direct_state_access.textures_storage_multisample_3d_rgb32f
---
 src/intel/isl/isl_gen7.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/intel/isl/isl_gen7.c b/src/intel/isl/isl_gen7.c
index c42428cba7a..4fa9851233f 100644
--- a/src/intel/isl/isl_gen7.c
+++ b/src/intel/isl/isl_gen7.c
@@ -38,9 +38,11 @@ gen7_format_needs_valign2(const struct isl_device *dev,
 *  (0x190)
 *
 *- VALIGN_4 is not supported for surface format R32G32B32_FLOAT.
+*
+* The R32G32B32_FLOAT restriction is dropped on Haswell.
 */
return isl_format_is_yuv(format) ||
-  format == ISL_FORMAT_R32G32B32_FLOAT;
+  (format == ISL_FORMAT_R32G32B32_FLOAT && !ISL_DEV_IS_HASWELL(dev));
 }
 
 bool
-- 
2.15.1
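After this patch the predicate reads: VALIGN_2 is required for YUV formats, and for R32G32B32_FLOAT on every Gen7 part except Haswell (so still on Ivybridge and Baytrail). A small model of that truth table follows, with hypothetical enums standing in for isl_format and for the ISL_DEV_IS_HASWELL() check.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for isl_format values and Gen7 device variants. */
enum fake_format { FMT_YUV422, FMT_R32G32B32_FLOAT, FMT_R8G8B8A8_UNORM };
enum fake_gen7_dev { DEV_IVYBRIDGE, DEV_BAYTRAIL, DEV_HASWELL };

static bool
fake_format_is_yuv(enum fake_format f)
{
   return f == FMT_YUV422;
}

/* Mirrors the patched gen7_format_needs_valign2() condition. */
static bool
needs_valign2(enum fake_gen7_dev dev, enum fake_format format)
{
   return fake_format_is_yuv(format) ||
          (format == FMT_R32G32B32_FLOAT && dev != DEV_HASWELL);
}
```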



Re: [Mesa-dev] [PATCH] drirc: add option to disable ARB_draw_indirect

2017-12-14 Thread Rob Clark
On Wed, Dec 6, 2017 at 3:31 PM, Ian Romanick  wrote:
> On 12/05/2017 08:25 AM, Ilia Mirkin wrote:
>> On Tue, Dec 5, 2017 at 8:18 AM, Emil Velikov  
>> wrote:
>>> Hi Rob,
>>>
>>> On 5 December 2017 at 12:54, Rob Clark  wrote:
 This is a bit sad/annoying.  But with current GPU firmware (at least on
 a5xx) we can support both draw-indirect and base-instance.  But we can't
 support draw-indirect with a non-zero base-instance specified.  So add a
 driconf option to hide the extension from games that are known to use
 both.

 Signed-off-by: Rob Clark 
 ---
 Tbh, I'm also not really sure what to do when/if we got updated firmware
 which handled draw-indirect with base-instance, since we'd need to make
 this option conditional on fw version.  For STK that probably isn't a
 big deal since it doesn't use draw-indirect in a particularly useful way
 (the indirect buffer is generated on CPU).

>>> Couldn't freedreno just return 0 for PIPE_CAP_DRAW_INDIRECT (aka
>>> disable the extension) as it detects buggy FW?
>>> This is what radeons have been doing as they encounter iffy firmware or 
>>> LLVM.
>>>
>>> AFAICT freedreno doesn't do GL 4.0 or GLES 3.1 so one should be safe.
>>
>> Rob is this -><- close to ES 3.1, so that's not a great option.
>
> And I don't suppose there's a way to get updated firmware?  i965 has
> similar sorts of cases where higher versions are disabled due to missing
> kernel features.
>

so after reverse-engineering the instruction set for the CP microcontrollers and
writing a disassembler and assembler[1], and figuring out how the fw
handles CP_DRAW_INDIRECT and CP_DRAW_INDX_INDIRECT packets, I've come
to the conclusion that the issue isn't actually with draw-indirect vs
base-instance (at least not with the fw from my pixel2, which md5sum
claims is the same as what is in linux-firmware; it is possible that
I was using an earlier version of the fw when I reached the earlier
conclusion).  On the plus side, the PFP/ME microcontrollers that parse
the cmdstream are pretty neat and I learned some useful stuff along
the way.

But thinking a bit about how stk uses GL_MAP_PERSISTENT_BIT to map and
update the draw-indirect buffers, it seems to me there are plenty of
ways this can go wrong w/ tilers (and even more when you throw
re-ordering into the mix).  Possibly I should disable reordering when
the indirect buffer is mapped w/ the PERSISTENT bit, although for games
like stk this is probably counter-productive compared to just hiding the
draw-indirect extension.  For games that actually use the GPU to write
the draw-indirect buffer it shouldn't be a problem.  So I think a
driconf patch like this probably still ends up being useful.

BR,
-R

[1] https://github.com/freedreno/envytools/tree/afuc/afuc
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] amd/common: scan which components of gl_WorkGroupID are used

2017-12-14 Thread Samuel Pitoiset



On 12/14/2017 08:32 PM, Bas Nieuwenhuizen wrote:

On Thu, Dec 14, 2017 at 4:48 PM, Samuel Pitoiset wrote:

Signed-off-by: Samuel Pitoiset 
---
  src/amd/common/ac_shader_info.c | 8 
  src/amd/common/ac_shader_info.h | 1 +
  2 files changed, 9 insertions(+)

diff --git a/src/amd/common/ac_shader_info.c b/src/amd/common/ac_shader_info.c
index 09dd4bbd55..01949770d6 100644
--- a/src/amd/common/ac_shader_info.c
+++ b/src/amd/common/ac_shader_info.c
@@ -45,6 +45,14 @@ gather_intrinsic_info(nir_intrinsic_instr *instr, struct ac_shader_info *info)
 case nir_intrinsic_load_num_work_groups:
 info->cs.uses_grid_size = true;
 break;
+   case nir_intrinsic_load_work_group_id: {
+   unsigned mask = nir_ssa_def_components_read(>dest.ssa);


Nice find that there is a utility function for this.


Not me, Timothy gave me the hint yesterday. :)



Reviewed-by: Bas Nieuwenhuizen 

+   while (mask) {
+   unsigned i = u_bit_scan();
+   info->cs.uses_block_id[i] = true;
+   }
+   break;
+   }
 case nir_intrinsic_load_sample_id:
 info->ps.force_persample = true;
 break;
diff --git a/src/amd/common/ac_shader_info.h b/src/amd/common/ac_shader_info.h
index 3c809cce13..7beefd02ac 100644
--- a/src/amd/common/ac_shader_info.h
+++ b/src/amd/common/ac_shader_info.h
@@ -43,6 +43,7 @@ struct ac_shader_info {
 } ps;
 struct {
 bool uses_grid_size;
+   bool uses_block_id[3];
 } cs;
  };

--
2.15.1
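
The new case walks the set bits of the components-read mask and flags each used component of gl_WorkGroupID. A minimal sketch of that loop, assuming bit 0/1/2 stand for x/y/z as reported by nir_ssa_def_components_read(); bit_scan below is a portable stand-in written for illustration, not Mesa's u_bit_scan(), and mark_used_block_ids is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for a "scan and clear lowest set bit" helper:
// returns the index of the lowest set bit of *mask and clears it.
// Precondition: *mask != 0.
static unsigned bit_scan(std::uint32_t *mask)
{
    unsigned i = 0;
    while (!(*mask & (1u << i)))
        ++i;
    *mask &= ~(1u << i);
    return i;
}

// Mirrors the patch: mark every component of the 3-component
// gl_WorkGroupID that is actually read. 'mask' must only have bits 0..2
// set, otherwise uses_block_id[] would be indexed out of range.
static void mark_used_block_ids(std::uint32_t mask, bool uses_block_id[3])
{
    while (mask) {
        unsigned i = bit_scan(&mask);
        uses_block_id[i] = true;
    }
}
```

For example, a shader that reads only gl_WorkGroupID.x and .z yields mask 0x5, so only uses_block_id[0] and uses_block_id[2] are set.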



[Mesa-dev] [PATCH 19/20] swr/rast: EXTRACT2 changed from vextract/vinsert to vshuffle

2017-12-14 Thread Tim Rowley
---
 .../drivers/swr/rasterizer/jitter/builder_misc.cpp | 60 ++
 .../drivers/swr/rasterizer/jitter/builder_misc.h   |  3 +-
 .../drivers/swr/rasterizer/jitter/fetch_jit.cpp| 30 +--
 3 files changed, 32 insertions(+), 61 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
index bdcafd28a3..0774889af1 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
@@ -653,16 +653,14 @@ namespace SwrJit
 }
 else
 {
-Value *src0 = EXTRACT2_F(vSrc, 0);
-Value *src1 = EXTRACT2_F(vSrc, 1);
+Value *src0 = EXTRACT2(vSrc, 0);
+Value *src1 = EXTRACT2(vSrc, 1);
 
-Value *indices0 = EXTRACT2_I(vIndices, 0);
-Value *indices1 = EXTRACT2_I(vIndices, 1);
+Value *indices0 = EXTRACT2(vIndices, 0);
+Value *indices1 = EXTRACT2(vIndices, 1);
 
-Value *vmask16 = VMASK2(vMask);
-
-Value *mask0 = MASK(EXTRACT2_I(vmask16, 0));  // TODO: do this better..
-Value *mask1 = MASK(EXTRACT2_I(vmask16, 1));
+Value *mask0 = EXTRACT2(vMask, 0);
+Value *mask1 = EXTRACT2(vMask, 1);
 
 Value *gather0 = GATHERPS(src0, pBase, indices0, mask0, scale);
 Value *gather1 = GATHERPS(src1, pBase, indices1, mask1, scale);
@@ -738,16 +736,14 @@ namespace SwrJit
 }
 else
 {
-Value *src0 = EXTRACT2_F(vSrc, 0);
-Value *src1 = EXTRACT2_F(vSrc, 1);
-
-Value *indices0 = EXTRACT2_I(vIndices, 0);
-Value *indices1 = EXTRACT2_I(vIndices, 1);
+Value *src0 = EXTRACT2(vSrc, 0);
+Value *src1 = EXTRACT2(vSrc, 1);
 
-Value *vmask16 = VMASK2(vMask);
+Value *indices0 = EXTRACT2(vIndices, 0);
+Value *indices1 = EXTRACT2(vIndices, 1);
 
-Value *mask0 = MASK(EXTRACT2_I(vmask16, 0));  // TODO: do this better..
-Value *mask1 = MASK(EXTRACT2_I(vmask16, 1));
+Value *mask0 = EXTRACT2(vMask, 0);
+Value *mask1 = EXTRACT2(vMask, 1);
 
 Value *gather0 = GATHERDD(src0, pBase, indices0, mask0, scale);
 Value *gather1 = GATHERDD(src1, pBase, indices1, mask1, scale);
@@ -809,34 +805,12 @@ namespace SwrJit
 }
 
 #if USE_SIMD16_BUILDER
-//
-/// @brief
-Value *Builder::EXTRACT2_F(Value *a2, uint32_t imm)
-{
-const uint32_t i0 = (imm > 0) ? mVWidth : 0;
-
-Value *result = VUNDEF_F();
-
-for (uint32_t i = 0; i < mVWidth; i += 1)
-{
-#if 1
-if (!a2->getType()->getScalarType()->isFloatTy())
-{
-a2 = BITCAST(a2, mSimd2FP32Ty);
-}
-
-#endif
-Value *temp = VEXTRACT(a2, C(i0 + i));
-
-result = VINSERT(result, temp, C(i));
-}
-
-return result;
-}
-
-Value *Builder::EXTRACT2_I(Value *a2, uint32_t imm)
+Value *Builder::EXTRACT2(Value *x, uint32_t imm)
 {
-return BITCAST(EXTRACT2_F(a2, imm), mSimdInt32Ty);
+if (imm == 0)
+return VSHUFFLE(x, UndefValue::get(x->getType()), {0, 1, 2, 3, 4, 5, 6, 7});
+else
+return VSHUFFLE(x, UndefValue::get(x->getType()), {8, 9, 10, 11, 12, 13, 14, 15});
 }
 
 Value *Builder::JOIN2(Value *a, Value *b)
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h
index 98bc563351..646ed0efb2 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h
+++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h
@@ -117,8 +117,7 @@ Value *VMASK2(Value *mask);
 //
 
 #if USE_SIMD16_BUILDER
-Value *EXTRACT2_F(Value *a2, uint32_t imm);
-Value *EXTRACT2_I(Value *a2, uint32_t imm);
+Value *EXTRACT2(Value *x, uint32_t imm);
 Value *JOIN2(Value *a, Value *b);
 #endif
 
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
index 8d97ddfdc9..aa911b58f3 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
@@ -1078,14 +1078,12 @@ void FetchJit::JitGatherVertices(const FETCH_COMPILE_STATE ,
 vOffsets16 = ADD(vOffsets16, vInstanceStride16);
 
 // TODO: remove the following simd8 interop stuff once all code paths are fully widened to SIMD16..
-Value *vmask16 = VMASK2(vGatherMask16);
 
-Value *vGatherMask  = MASK(EXTRACT2_I(vmask16, 0));
-Value *vGatherMask2 = MASK(EXTRACT2_I(vmask16, 1));
-
-Value *vOffsets  = EXTRACT2_I(vOffsets16, 0);
-Value 

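The new EXTRACT2 is a single shufflevector of the 16-lane input against undef: indices {0..7} select the low half and {8..15} the high half. Because a shuffle does not care about the element type, one helper can replace the separate _F/_I variants and their BITCASTs. A minimal scalar model of those lane semantics, assuming std::array stands in for an LLVM vector; extract2 here is an illustrative function, not the SwrJit builder:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of Builder::EXTRACT2: shuffle a 16-lane vector with
// indices {0..7} (imm == 0) or {8..15} (imm != 0) to get one 8-lane half.
// Works for any lane type, just like the type-agnostic shufflevector.
template <typename T>
std::array<T, 8> extract2(const std::array<T, 16> &x, std::uint32_t imm)
{
    const std::size_t first = (imm == 0) ? 0 : 8;
    std::array<T, 8> half{};
    for (std::size_t i = 0; i < 8; ++i)
        half[i] = x[first + i];
    return half;
}
```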
[Mesa-dev] [PATCH 17/20] swr/rast: Replace VPSRL with LSHR

2017-12-14 Thread Tim Rowley
Replace use of an x86 intrinsic with a generic LLVM IR instruction.

Generates the same final assembly.
---
 .../swr/rasterizer/codegen/gen_llvm_ir_macros.py   |  2 --
 .../drivers/swr/rasterizer/jitter/builder_misc.cpp | 30 --
 .../drivers/swr/rasterizer/jitter/builder_misc.h   |  5 
 .../drivers/swr/rasterizer/jitter/fetch_jit.cpp|  8 +++---
 4 files changed, 4 insertions(+), 41 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
index 8bbf36d9b8..9544353eb9 100644
--- a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
+++ b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
@@ -47,8 +47,6 @@ intrinsics = [
 ['VGATHERPS_16', 'x86_avx512_gather_dps_512', ['src', 'pBase', 'indices', 'mask', 'scale']],
 ['VGATHERDD', 'x86_avx2_gather_d_d_256', ['src', 'pBase', 'indices', 'mask', 'scale']],
 ['VGATHERDD_16', 'x86_avx512_gather_dpi_512', ['src', 'pBase', 'indices', 'mask', 'scale']],
-['VPSRLI', 'x86_avx2_psrli_d', ['src', 'imm']],
-['VPSRLI_16', 'x86_avx512_psrli_d_512', ['src', 'imm']],
 ['VSQRTPS', 'x86_avx_sqrt_ps_256', ['a']],
 ['VRSQRTPS', 'x86_avx_rsqrt_ps_256', ['a']],
 ['VRCPPS', 'x86_avx_rcp_ps_256', ['a']],
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
index 684c9fac54..bdcafd28a3 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
@@ -808,36 +808,6 @@ namespace SwrJit
 return vGather;
 }
 
-#if USE_SIMD16_BUILDER
-Value *Builder::PSRLI(Value *a, Value *imm)
-{
-return VPSRLI(a, imm);
-}
-
-Value *Builder::PSRLI_16(Value *a, Value *imm)
-{
-Value *result = VUNDEF2_I();
-
-// use avx512 shift right instruction if available
-if (JM()->mArch.AVX512F())
-{
-result = VPSRLI_16(a, imm);
-}
-else
-{
-Value *a0 = EXTRACT2_I(a, 0);
-Value *a1 = EXTRACT2_I(a, 1);
-
-Value *result0 = PSRLI(a0, imm);
-Value *result1 = PSRLI(a1, imm);
-
-result = JOIN2(result0, result1);
-}
-
-return result;
-}
-
-#endif
 #if USE_SIMD16_BUILDER
 //
 /// @brief
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h
index 6c883d8f52..98bc563351 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h
+++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h
@@ -143,11 +143,6 @@ void GATHER4DD(const SWR_FORMAT_INFO , Value* pSrcBase, Value* byteOffsets,
 
 Value *GATHERPD(Value* src, Value* pBase, Value* indices, Value* mask, uint8_t scale = 1);
 
-#if USE_SIMD16_BUILDER
-Value *PSRLI(Value *a, Value *imm);
-Value *PSRLI_16(Value *a, Value *imm);
-
-#endif
 void SCATTERPS(Value* pDst, Value* vSrc, Value* vOffsets, Value* vMask);
 
 void Shuffle8bpcGather4(const SWR_FORMAT_INFO , Value* vGatherInput, Value* vGatherOutput[], bool bPackedOutput);
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
index 1312ac0009..8d97ddfdc9 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
@@ -1422,12 +1422,12 @@ void FetchJit::JitGatherVertices(const FETCH_COMPILE_STATE ,
 // But, we know that elements must be aligned for FETCH. :)
 // Right shift the offset by a bit and then scale by 2 to remove the sign extension.
 #if USE_SIMD16_BUILDER
-Value *shiftedOffsets = VPSRLI_16(vOffsets16, C(1));
+Value *shiftedOffsets = LSHR(vOffsets16, 1);
 pVtxSrc2[currentVertexElement] = GATHERPS_16(gatherSrc16, pStreamBase, shiftedOffsets, vGatherMask16, 2);
 
 #else
-Value *vShiftedOffsets = VPSRLI(vOffsets, C(1));
-Value *vShiftedOffsets2 = VPSRLI(vOffsets2, C(1));
+Value *vShiftedOffsets = LSHR(vOffsets, 1);
+Value *vShiftedOffsets2 = LSHR(vOffsets2, 1);
 
 vVertexElements[currentVertexElement]  = GATHERPS(gatherSrc, pStreamBase, vShiftedOffsets, vGatherMask, 2);
 vVertexElements2[currentVertexElement] = GATHERPS(gatherSrc2, pStreamBase, vShiftedOffsets2, vGatherMask2, 2);
@@ -1492,7 +1492,7 @@ void FetchJit::JitGatherVertices(const FETCH_COMPILE_STATE ,
 // However, 

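The comment in the fetch_jit.cpp hunk relies on the fact that x86 gathers treat 32-bit lane indices as signed and sign-extend them before scaling, so a byte offset of 2^31 or more would point below the base. Shifting logically right by one (what LSHR, like the removed VPSRLI, computes per lane) and gathering with scale 2 recovers the offset exactly, provided it is even, which the comment justifies by element alignment. A minimal per-lane model; gather_address and lshr are illustrative names, not SwrJit functions:

```cpp
#include <cassert>
#include <cstdint>

// Byte address a dword-indexed gather forms for one lane: the 32-bit index
// is sign-extended to 64 bits before being multiplied by the scale.
// (The uint32 -> int32 conversion wraps on two's-complement targets.)
std::uint64_t gather_address(std::uint64_t base, std::uint32_t index, unsigned scale)
{
    const std::int64_t sext = static_cast<std::int32_t>(index);
    return base + static_cast<std::uint64_t>(sext) * scale;
}

// Per-lane logical shift right, as LSHR emits in LLVM IR.
std::uint32_t lshr(std::uint32_t x, unsigned amount)
{
    return x >> amount;
}
```

With an even offset such as 0x80000004, using it directly as a scale-1 index lands 4 GiB too low, while lshr(offset, 1) with scale 2 yields base + offset.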
[Mesa-dev] [PATCH 20/20] swr/rast: Move more RTAI handling out of binner

2017-12-14 Thread Tim Rowley
---
 src/gallium/drivers/swr/rasterizer/core/binner.cpp | 13 +
 src/gallium/drivers/swr/rasterizer/core/clip.h |  1 +
 2 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
index 7ef87c4443..9aa9f9e79b 100644
--- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
@@ -1023,18 +1023,7 @@ void BinPostSetupPointsImpl(
 SIMD_T::store_si(reinterpret_cast(aMTBottom), bbox.ymax);
 
 // store render target array index
-OSALIGNSIMD16(uint32_t) aRTAI[SIMD_WIDTH];
-if (state.backendState.readRenderTargetArrayIndex)
-{
-typename SIMD_T::Vec4 vRtai[2];
-pa.Assemble(VERTEX_SGV_SLOT, vRtai);
-typename SIMD_T::Integer vRtaii = SIMD_T::castps_si(vRtai[0][VERTEX_SGV_RTAI_COMP]);
-SIMD_T::store_si(reinterpret_cast(aRTAI), vRtaii);
-}
-else
-{
-SIMD_T::store_si(reinterpret_cast(aRTAI), SIMD_T::setzero_si());
-}
+const uint32_t *aRTAI = reinterpret_cast();
 
 OSALIGNSIMD16(float) aPointSize[SIMD_WIDTH];
 SIMD_T::store_ps(reinterpret_cast(aPointSize), vPointSize);
diff --git a/src/gallium/drivers/swr/rasterizer/core/clip.h b/src/gallium/drivers/swr/rasterizer/core/clip.h
index e5e00d49b0..592c9bfa73 100644
--- a/src/gallium/drivers/swr/rasterizer/core/clip.h
+++ b/src/gallium/drivers/swr/rasterizer/core/clip.h
@@ -646,6 +646,7 @@ public:
 
 PA_STATE_OPT clipPA(pDC, numEmittedPrims, reinterpret_cast([0]), numEmittedVerts, SWR_VTX_NUM_SLOTS, true, NumVertsPerPrim, clipTopology);
 clipPA.viewportArrayActive = pa.viewportArrayActive;
+clipPA.rtArrayActive = pa.rtArrayActive;
 
 static const uint32_t primMaskMap[] = { 0x0, 0x1, 0x3, 0x7, 0xf, 0x1f, 0x3f, 0x7f };
 
-- 
2.14.1



[Mesa-dev] [PATCH 13/20] swr/rast: SIMD16 Fetch - Fully widen 32-bit integer vertex components

2017-12-14 Thread Tim Rowley
Also widen the 16-bit and 8-bit integer vertex component gathers to SIMD16.
---
 .../swr/rasterizer/codegen/gen_llvm_ir_macros.py   |  1 +
 .../drivers/swr/rasterizer/jitter/builder_misc.cpp | 36 +
 .../drivers/swr/rasterizer/jitter/builder_misc.h   |  3 +
 .../drivers/swr/rasterizer/jitter/fetch_jit.cpp| 86 +-
 4 files changed, 109 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
index ac8b3badf6..8bbf36d9b8 100644
--- a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
+++ b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
@@ -46,6 +46,7 @@ intrinsics = [
 ['VGATHERPS', 'x86_avx2_gather_d_ps_256', ['src', 'pBase', 'indices', 'mask', 'scale']],
 ['VGATHERPS_16', 'x86_avx512_gather_dps_512', ['src', 'pBase', 'indices', 'mask', 'scale']],
 ['VGATHERDD', 'x86_avx2_gather_d_d_256', ['src', 'pBase', 'indices', 'mask', 'scale']],
+['VGATHERDD_16', 'x86_avx512_gather_dpi_512', ['src', 'pBase', 'indices', 'mask', 'scale']],
 ['VPSRLI', 'x86_avx2_psrli_d', ['src', 'imm']],
 ['VPSRLI_16', 'x86_avx512_psrli_d_512', ['src', 'imm']],
 ['VSQRTPS', 'x86_avx_sqrt_ps_256', ['a']],
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
index 3a486e4c1e..684c9fac54 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
@@ -723,6 +723,42 @@ namespace SwrJit
 return vGather;
 }
 
+#if USE_SIMD16_BUILDER
+Value *Builder::GATHERDD_16(Value *vSrc, Value *pBase, Value *vIndices, Value *vMask, uint8_t scale)
+{
+Value *vGather = VUNDEF2_F();
+
+// use avx512 gather instruction if available
+if (JM()->mArch.AVX512F())
+{
+// force mask to , required by vgather2
+Value *mask = BITCAST(vMask, mInt16Ty);
+
+vGather = VGATHERDD_16(vSrc, pBase, vIndices, mask, C((uint32_t)scale));
+}
+else
+{
+Value *src0 = EXTRACT2_F(vSrc, 0);
+Value *src1 = EXTRACT2_F(vSrc, 1);
+
+Value *indices0 = EXTRACT2_I(vIndices, 0);
+Value *indices1 = EXTRACT2_I(vIndices, 1);
+
+Value *vmask16 = VMASK2(vMask);
+
+Value *mask0 = MASK(EXTRACT2_I(vmask16, 0));  // TODO: do this better..
+Value *mask1 = MASK(EXTRACT2_I(vmask16, 1));
+
+Value *gather0 = GATHERDD(src0, pBase, indices0, mask0, scale);
+Value *gather1 = GATHERDD(src1, pBase, indices1, mask1, scale);
+
+vGather = JOIN2(gather0, gather1);
+}
+
+return vGather;
+}
+
+#endif
 //
 /// @brief Generate a masked gather operation in LLVM IR.  If not
 /// supported on the underlying platform, emulate it with loads
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h
index 231bd6ad85..6c883d8f52 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h
+++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h
@@ -135,6 +135,9 @@ void GATHER4PS(const SWR_FORMAT_INFO , Value* pSrcBase, Value* byteOffsets,
Value* mask, Value* vGatherComponents[], bool bPackedOutput);
 
 Value *GATHERDD(Value* src, Value* pBase, Value* indices, Value* mask, uint8_t scale = 1);
+#if USE_SIMD16_BUILDER
+Value *GATHERDD_16(Value *src, Value *pBase, Value *indices, Value *mask, uint8_t scale = 1);
+#endif
 void GATHER4DD(const SWR_FORMAT_INFO , Value* pSrcBase, Value* byteOffsets,
Value* mask, Value* vGatherComponents[], bool bPackedOutput);
 
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
index e0a0770560..ec3b5eafcc 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
@@ -1349,14 +1349,6 @@ void FetchJit::JitGatherVertices(const FETCH_COMPILE_STATE ,
 if (compMask)
 {
 #if USE_SIMD16_BUILDER
-#if USE_SIMD16_BUILDER
-#else
-Value *gatherResult[2];
-
-gatherResult[0] = JOIN2(vGatherResult[0], vGatherResult2[0]);
-gatherResult[1] = JOIN2(vGatherResult[1], vGatherResult2[1]);
-
-#endif
 Value *pVtxOut2 = BITCAST(pVtxOut, PointerType::get(VectorType::get(mFP32Ty, mVWidth2), 0));
 
 Shuffle16bpcArgs args = std::forward_as_tuple(gatherResult, pVtxOut2, Instruction::CastOps::FPExt, CONVERT_NONE,
@@ -1701,6 +1693,9 @@ void FetchJit::JitGatherVertices(const 

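GATHERDD_16 either issues one 16-lane AVX-512 gather (with the mask bitcast to a 16-bit integer) or falls back to two 8-lane gathers on the low and high halves whose results are joined. A minimal scalar model of that split-gather-join structure, assuming a plain uint16_t bit mask per lane; gather_dd and gather_dd_16_split are hypothetical names, not the JIT builder code:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scalar model of a masked 32-bit gather: each active lane loads an int32
// from base + index * scale; inactive lanes keep the value from src.
template <std::size_t N>
std::array<std::int32_t, N> gather_dd(const std::array<std::int32_t, N> &src,
                                      const unsigned char *base,
                                      const std::array<std::int32_t, N> &indices,
                                      std::uint32_t mask, unsigned scale)
{
    std::array<std::int32_t, N> out = src;
    for (std::size_t i = 0; i < N; ++i) {
        if (mask & (1u << i)) {
            std::memcpy(&out[i],
                        base + static_cast<std::ptrdiff_t>(indices[i]) * scale,
                        sizeof(std::int32_t));
        }
    }
    return out;
}

// Model of the non-AVX-512 path: split lanes and mask into two 8-lane
// halves, gather each half, then concatenate (JOIN2) the results.
std::array<std::int32_t, 16> gather_dd_16_split(const std::array<std::int32_t, 16> &src,
                                                const unsigned char *base,
                                                const std::array<std::int32_t, 16> &indices,
                                                std::uint16_t mask, unsigned scale)
{
    std::array<std::int32_t, 8> src0{}, src1{}, idx0{}, idx1{};
    for (std::size_t i = 0; i < 8; ++i) {
        src0[i] = src[i];
        src1[i] = src[i + 8];
        idx0[i] = indices[i];
        idx1[i] = indices[i + 8];
    }
    const auto g0 = gather_dd(src0, base, idx0, mask & 0xFFu, scale);
    const auto g1 = gather_dd(src1, base, idx1, (mask >> 8) & 0xFFu, scale);

    std::array<std::int32_t, 16> out{};
    for (std::size_t i = 0; i < 8; ++i) {
        out[i] = g0[i];
        out[i + 8] = g1[i];
    }
    return out;
}
```

The split path should produce the same lanes as a single 16-wide gather, which is the property the fallback relies on.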
[Mesa-dev] [PATCH 18/20] swr/rast: Fix cache of API thread event manager

2017-12-14 Thread Tim Rowley
---
 src/gallium/drivers/swr/rasterizer/core/api.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/api.cpp b/src/gallium/drivers/swr/rasterizer/core/api.cpp
index 25a3f34841..09b482dcc0 100644
--- a/src/gallium/drivers/swr/rasterizer/core/api.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/api.cpp
@@ -166,7 +166,7 @@ HANDLE SwrCreateContext(
 
 #if defined(KNOB_ENABLE_AR)
 // cache the API thread event manager, for use with sim layer
-pCreateInfo->hArEventManager = pContext->pArContext[16];
+pCreateInfo->hArEventManager = pContext->pArContext[pContext->NumWorkerThreads + 1];
 #endif
 
 // State setup AFTER context is fully initialized
-- 
2.14.1



[Mesa-dev] [PATCH 12/20] swr/rast: Replace INSERT2 vextract/vinsert with JOIN2 vshuffle

2017-12-14 Thread Tim Rowley
---
 .../drivers/swr/rasterizer/jitter/builder_misc.cpp | 38 ++---
 .../drivers/swr/rasterizer/jitter/builder_misc.h   |  5 +-
 .../drivers/swr/rasterizer/jitter/fetch_jit.cpp| 92 ++
 3 files changed, 30 insertions(+), 105 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
index b2210db717..3a486e4c1e 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
@@ -667,8 +667,7 @@ namespace SwrJit
 Value *gather0 = GATHERPS(src0, pBase, indices0, mask0, scale);
 Value *gather1 = GATHERPS(src1, pBase, indices1, mask1, scale);
 
-vGather = INSERT2_F(vGather, gather0, 0);
-vGather = INSERT2_F(vGather, gather1, 1);
+vGather = JOIN2(gather0, gather1);
 }
 
 return vGather;
@@ -796,8 +795,7 @@ namespace SwrJit
 Value *result0 = PSRLI(a0, imm);
 Value *result1 = PSRLI(a1, imm);
 
-result = INSERT2_I(result, result0, 0);
-result = INSERT2_I(result, result1, 1);
+result = JOIN2(result0, result1);
 }
 
 return result;
@@ -835,37 +833,13 @@ namespace SwrJit
 return BITCAST(EXTRACT2_F(a2, imm), mSimdInt32Ty);
 }
 
-//
-/// @brief
-Value *Builder::INSERT2_F(Value *a2, Value *b, uint32_t imm)
+Value *Builder::JOIN2(Value *a, Value *b)
 {
-const uint32_t i0 = (imm > 0) ? mVWidth : 0;
-
-Value *result = BITCAST(a2, mSimd2FP32Ty);
-
-for (uint32_t i = 0; i < mVWidth; i += 1)
-{
-#if 1
-if (!b->getType()->getScalarType()->isFloatTy())
-{
-b = BITCAST(b, mSimdFP32Ty);
-}
-
-#endif
-Value *temp = VEXTRACT(b, C(i));
-
-result = VINSERT(result, temp, C(i0 + i));
-}
-
-return result;
+return VSHUFFLE(a, b,
+{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15});
 }
-
-Value *Builder::INSERT2_I(Value *a2, Value *b, uint32_t imm)
-{
-return BITCAST(INSERT2_F(a2, b, imm), mSimd2Int32Ty);
-}
-
 #endif
+
 //
 /// @brief convert x86  mask to llvm  mask
 Value *Builder::MASK(Value *vmask)
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h
index 62360a3ad7..231bd6ad85 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h
+++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h
@@ -119,10 +119,9 @@ Value *VMASK2(Value *mask);
 #if USE_SIMD16_BUILDER
 Value *EXTRACT2_F(Value *a2, uint32_t imm);
 Value *EXTRACT2_I(Value *a2, uint32_t imm);
-Value *INSERT2_F(Value *a2, Value *b, uint32_t imm);
-Value *INSERT2_I(Value *a2, Value *b, uint32_t imm);
-
+Value *JOIN2(Value *a, Value *b);
 #endif
+
 Value *MASKLOADD(Value* src, Value* mask);
 
 void Gather4(const SWR_FORMAT format, Value* pSrcBase, Value* byteOffsets,
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
index c960dc77fb..e0a0770560 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
@@ -960,10 +960,7 @@ void FetchJit::JitGatherVertices(const FETCH_COMPILE_STATE ,
 // offset indices by baseVertex
 #if USE_SIMD16_GATHERS
 #if USE_SIMD16_BUILDER
-Value *vIndices16 = VUNDEF2_I();
-
-vIndices16 = INSERT2_I(vIndices16, vIndices,  0);
-vIndices16 = INSERT2_I(vIndices16, vIndices2, 1);
+Value *vIndices16 = JOIN2(vIndices, vIndices2);
 
 vCurIndices16 = ADD(vIndices16, vBaseVertex16);
 #else
@@ -982,10 +979,7 @@ void FetchJit::JitGatherVertices(const FETCH_COMPILE_STATE ,
 // offset indices by baseVertex
 #if USE_SIMD16_GATHERS
 #if USE_SIMD16_BUILDER
-Value *vIndices16 = VUNDEF2_I();
-
-vIndices16 = INSERT2_I(vIndices16, vIndices,  0);
-vIndices16 = INSERT2_I(vIndices16, vIndices2, 1);
+Value *vIndices16 = JOIN2(vIndices, vIndices2);
 
 vCurIndices16 = ADD(vIndices16, vBaseVertex16);
 #else
@@ -1206,9 +1200,7 @@ void FetchJit::JitGatherVertices(const FETCH_COMPILE_STATE ,
 {
 #if USE_SIMD16_BUILDER
 // pack adjacent pairs of SIMD8s into SIMD16s
-pVtxSrc2[currentVertexElement] = VUNDEF2_F();
-pVtxSrc2[currentVertexElement] = INSERT2_F(pVtxSrc2[currentVertexElement], pResults[c],  0);
-pVtxSrc2[currentVertexElement] = INSERT2_F(pVtxSrc2[currentVertexElement], pResults2[c], 1);
+

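JOIN2 replaces a VUNDEF followed by two INSERT2 calls (each a per-lane VEXTRACT/VINSERT loop) with one shufflevector over both 8-lane inputs. In a two-operand shuffle, indices 0..7 pick lanes of the first operand and 8..15 lanes of the second, so {0..15} is plain concatenation. A minimal scalar model, assuming std::array in place of LLVM vectors; join2 is an illustrative function, not the SwrJit builder:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model of Builder::JOIN2: shufflevector(a, b, {0, 1, ..., 15})
// on two 8-lane vectors, i.e. the lanes of a followed by the lanes of b.
template <typename T>
std::array<T, 16> join2(const std::array<T, 8> &a, const std::array<T, 8> &b)
{
    std::array<T, 16> out{};
    for (std::size_t i = 0; i < 16; ++i)
        out[i] = (i < 8) ? a[i] : b[i - 8];  // index >= 8 selects from b
    return out;
}
```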
[Mesa-dev] [PATCH 15/20] swr/rast: Pull of RTAI gather & offset out of clip/bin code

2017-12-14 Thread Tim Rowley
---
 src/gallium/drivers/swr/rasterizer/core/binner.cpp | 118 +++-
 src/gallium/drivers/swr/rasterizer/core/clip.cpp   |  30 ++--
 src/gallium/drivers/swr/rasterizer/core/clip.h |  35 +++--
 src/gallium/drivers/swr/rasterizer/core/context.h  |   4 +-
 .../drivers/swr/rasterizer/core/frontend.cpp   | 153 +++--
 src/gallium/drivers/swr/rasterizer/core/frontend.h |   8 +-
 src/gallium/drivers/swr/rasterizer/core/pa.h   |   1 +
 7 files changed, 203 insertions(+), 146 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
index a664ed812f..7ef87c4443 100644
--- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
@@ -45,7 +45,8 @@ void BinPostSetupLinesImpl(
 typename SIMD_T::Float recipW[],
 uint32_t primMask,
 typename SIMD_T::Integer const ,
-typename SIMD_T::Integer const );
+typename SIMD_T::Integer const ,
+typename SIMD_T::Integer const );
 
 template 
 void BinPostSetupPointsImpl(
@@ -55,7 +56,8 @@ void BinPostSetupPointsImpl(
 typename SIMD_T::Vec4 prim[],
 uint32_t primMask,
 typename SIMD_T::Integer const ,
-typename SIMD_T::Integer const );
+typename SIMD_T::Integer const ,
+typename SIMD_T::Integer const );
 
 //
 /// @brief Processes attributes for the backend based on linkage mask and
@@ -308,9 +310,11 @@ void SIMDCALL BinTrianglesImpl(
 typename SIMD_T::Vec4 tri[3],
 uint32_t triMask,
 typename SIMD_T::Integer const ,
-typename SIMD_T::Integer const )
+typename SIMD_T::Integer const ,
+typename SIMD_T::Integer const )
 {
 SWR_CONTEXT *pContext = pDC->pContext;
+const uint32_t *aRTAI = reinterpret_cast();
 
 AR_BEGIN(FEBinTriangles, pDC->drawId);
 
@@ -604,21 +608,21 @@ endBinTriangles:
 recipW[0] = vRecipW0;
 recipW[1] = vRecipW1;
 
-BinPostSetupLinesImpl(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx);
+BinPostSetupLinesImpl(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx, rtIdx);
 
 line[0] = tri[1];
 line[1] = tri[2];
 recipW[0] = vRecipW1;
 recipW[1] = vRecipW2;
 
-BinPostSetupLinesImpl(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx);
+BinPostSetupLinesImpl(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx, rtIdx);
 
 line[0] = tri[2];
 line[1] = tri[0];
 recipW[0] = vRecipW2;
 recipW[1] = vRecipW0;
 
-BinPostSetupLinesImpl(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx);
+BinPostSetupLinesImpl(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx, rtIdx);
 
 AR_END(FEBinTriangles, 1);
 return;
@@ -626,9 +630,9 @@ endBinTriangles:
 else if (rastState.fillMode == SWR_FILLMODE_POINT)
 {
 // Bin 3 points
-BinPostSetupPointsImpl(pDC, pa, workerId, [0], triMask, primID, viewportIdx);
-BinPostSetupPointsImpl(pDC, pa, workerId, [1], triMask, primID, viewportIdx);
-BinPostSetupPointsImpl(pDC, pa, workerId, [2], triMask, primID, viewportIdx);
+BinPostSetupPointsImpl(pDC, pa, workerId, [0], triMask, primID, viewportIdx, rtIdx);
+BinPostSetupPointsImpl(pDC, pa, workerId, [1], triMask, primID, viewportIdx, rtIdx);
+BinPostSetupPointsImpl(pDC, pa, workerId, [2], triMask, primID, viewportIdx, rtIdx);
 
 AR_END(FEBinTriangles, 1);
 return;
@@ -659,22 +663,6 @@ endBinTriangles:
 TransposeVertices(vHorizZ, tri[0].z, tri[1].z, tri[2].z);
 TransposeVertices(vHorizW, vRecipW0, vRecipW1, vRecipW2);
 
-// store render target array index
-OSALIGNSIMD16(uint32_t) aRTAI[SIMD_WIDTH];
-if (state.backendState.readRenderTargetArrayIndex)
-{
-typename SIMD_T::Vec4 vRtai[3];
-pa.Assemble(VERTEX_SGV_SLOT, vRtai);
-typename SIMD_T::Integer vRtaii;
-vRtaii = SIMD_T::castps_si(vRtai[0][VERTEX_SGV_RTAI_COMP]);
-SIMD_T::store_si(reinterpret_cast(aRTAI), vRtaii);
-}
-else
-{
-SIMD_T::store_si(reinterpret_cast(aRTAI), SIMD_T::setzero_si());
-}
-
-
 // scan remaining valid triangles and bin each separately
 while (_BitScanForward(, triMask))
 {
@@ -763,9 +751,10 @@ void BinTriangles(
 simdvector tri[3],
 uint32_t triMask,
 simdscalari const ,
-simdscalari const )
+simdscalari const ,
+simdscalari const )
 {
-BinTrianglesImpl(pDC, pa, workerId, tri, triMask, primID, viewportIdx);
+

[Mesa-dev] [PATCH 16/20] swr/rast: Rework thread binding parameters for machine partitioning

2017-12-14 Thread Tim Rowley
Add BASE_NUMA_NODE, BASE_CORE, BASE_THREAD parameters to
SwrCreateContext.

Add optional SWR_API_THREADING_INFO parameter to SwrCreateContext to
control reservation of API threads.

Add SwrBindApiThread() function to allow binding of API threads to
reserved HW threads.
---
 .../drivers/swr/rasterizer/codegen/knob_defs.py|  29 +-
 src/gallium/drivers/swr/rasterizer/core/api.cpp|  40 ++-
 src/gallium/drivers/swr/rasterizer/core/api.h  |  33 +++
 src/gallium/drivers/swr/rasterizer/core/context.h  |   1 +
 .../drivers/swr/rasterizer/core/threads.cpp| 299 +++--
 src/gallium/drivers/swr/rasterizer/core/threads.h  |   4 +
 .../drivers/swr/rasterizer/core/tilemgr.cpp|   4 +-
 7 files changed, 322 insertions(+), 88 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/codegen/knob_defs.py b/src/gallium/drivers/swr/rasterizer/codegen/knob_defs.py
index 09e3124602..30803927e3 100644
--- a/src/gallium/drivers/swr/rasterizer/codegen/knob_defs.py
+++ b/src/gallium/drivers/swr/rasterizer/codegen/knob_defs.py
@@ -62,15 +62,33 @@ KNOBS = [
 'category'  : 'perf',
 }],
 
-['MAX_NUMA_NODES', {
+['BASE_NUMA_NODE', {
 'type'  : 'uint32_t',
 'default'   : '0',
+'desc'  : ['Starting NUMA node index to use when allocating compute resources.',
+   'Setting this to a non-zero value will reduce the maximum # of NUMA nodes used.'],
+'category'  : 'perf',
+'advanced'  : True,
+}],
+
+['MAX_NUMA_NODES', {
+'type'  : 'uint32_t',
+'default'   : '1' if sys.platform == 'win32' else '0',
 'desc'  : ['Maximum # of NUMA-nodes per system used for worker threads',
'  0 == ALL NUMA-nodes in the system',
'  N == Use at most N NUMA-nodes for rendering'],
 'category'  : 'perf',
 }],
 
+['BASE_CORE', {
+'type'  : 'uint32_t',
+'default'   : '0',
+'desc'  : ['Starting core index to use when allocating compute resources.',
+   'Setting this to a non-zero value will reduce the maximum # of cores used.'],
+'category'  : 'perf',
+'advanced'  : True,
+}],
+
 ['MAX_CORES_PER_NUMA_NODE', {
 'type'  : 'uint32_t',
 'default'   : '0',
@@ -80,6 +98,15 @@ KNOBS = [
 'category'  : 'perf',
 }],
 
+['BASE_THREAD', {
+'type'  : 'uint32_t',
+'default'   : '0',
+'desc'  : ['Starting thread index to use when allocating compute resources.',
+   'Setting this to a non-zero value will reduce the maximum # of threads used.'],
+'category'  : 'perf',
+'advanced'  : True,
+}],
+
 ['MAX_THREADS_PER_CORE', {
 'type'  : 'uint32_t',
 'default'   : '1',
diff --git a/src/gallium/drivers/swr/rasterizer/core/api.cpp b/src/gallium/drivers/swr/rasterizer/core/api.cpp
index 9265440904..25a3f34841 100644
--- a/src/gallium/drivers/swr/rasterizer/core/api.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/api.cpp
@@ -95,16 +95,32 @@ HANDLE SwrCreateContext(
 pContext->dsRing[dc].pArena = new CachingArena(pContext->cachingArenaAllocator);
 }
 
-pContext->threadInfo.MAX_WORKER_THREADS= KNOB_MAX_WORKER_THREADS;
-pContext->threadInfo.MAX_NUMA_NODES= KNOB_MAX_NUMA_NODES;
-pContext->threadInfo.MAX_CORES_PER_NUMA_NODE   = KNOB_MAX_CORES_PER_NUMA_NODE;
-pContext->threadInfo.MAX_THREADS_PER_CORE  = KNOB_MAX_THREADS_PER_CORE;
-pContext->threadInfo.SINGLE_THREADED   = KNOB_SINGLE_THREADED;
-
 if (pCreateInfo->pThreadInfo)
 {
 pContext->threadInfo = *pCreateInfo->pThreadInfo;
 }
+else
+{
+pContext->threadInfo.MAX_WORKER_THREADS = KNOB_MAX_WORKER_THREADS;
+pContext->threadInfo.BASE_NUMA_NODE = KNOB_BASE_NUMA_NODE;
+pContext->threadInfo.BASE_CORE  = KNOB_BASE_CORE;
+pContext->threadInfo.BASE_THREAD= KNOB_BASE_THREAD;
+pContext->threadInfo.MAX_NUMA_NODES = KNOB_MAX_NUMA_NODES;
+pContext->threadInfo.MAX_CORES_PER_NUMA_NODE= KNOB_MAX_CORES_PER_NUMA_NODE;
+pContext->threadInfo.MAX_THREADS_PER_CORE   = KNOB_MAX_THREADS_PER_CORE;
+pContext->threadInfo.SINGLE_THREADED= KNOB_SINGLE_THREADED;
+}
+
+if (pCreateInfo->pApiThreadInfo)
+{
+pContext->apiThreadInfo = *pCreateInfo->pApiThreadInfo;
+}
+else
+{
+pContext->apiThreadInfo.bindAPIThread0  = true;
+pContext->apiThreadInfo.numAPIReservedThreads   = 1;
+pContext->apiThreadInfo.numAPIThreadsPerCore= 1;
+}
 
 memset(>WaitLock, 0, sizeof(pContext->WaitLock));
 memset(>FifosNotEmpty, 0, sizeof(pContext->FifosNotEmpty));
@@ -113,6 +129,11 @@ HANDLE SwrCreateContext(
 
 

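The new BASE_* knob descriptions say a non-zero starting index reduces the maximum number of NUMA nodes, cores, or threads used, while the MAX_* knobs cap the count (0 meaning all). The threads.cpp changes are not shown here, so the following is only a sketch of one plausible reading of those descriptions; usable_count is a hypothetical helper, and clamping to zero when base exceeds the total is an assumption:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical reading of the knob descriptions (not the threads.cpp code):
// resources with index below 'base' are skipped, and 'max_count' caps how
// many of the remaining ones are used, with 0 meaning "no cap".
std::uint32_t usable_count(std::uint32_t total, std::uint32_t base, std::uint32_t max_count)
{
    if (base >= total)
        return 0;  // assumption: nothing left to allocate
    const std::uint32_t available = total - base;
    return (max_count == 0) ? available : std::min(available, max_count);
}
```

For example, on a 4-node machine with BASE_NUMA_NODE = 1 and MAX_NUMA_NODES = 0, this reading leaves 3 nodes for worker threads.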
[Mesa-dev] [PATCH 14/20] swr/rast: Remove no-op VBROADCAST of vID

2017-12-14 Thread Tim Rowley
---
 src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
index ec3b5eafcc..1312ac0009 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
@@ -3101,7 +3101,7 @@ Value* FetchJit::GenerateCompCtrlVector(const ComponentControl ctrl)
 #else
 Value* pId = BITCAST(LOAD(GEP(mpFetchInfo, { 0, SWR_FETCH_CONTEXT_VertexID })), mSimdFP32Ty);
 #endif
-return VBROADCAST(pId);
+return pId;
 }
 case StoreInstanceId:
 {
@@ -3129,7 +3129,7 @@ Value* FetchJit::GenerateCompCtrlVector2(const ComponentControl ctrl)
 
 Value *pId = JOIN2(pId_lo, pId_hi);
 
-return VBROADCAST2(pId);
+return pId;
 }
 case StoreInstanceId:
 {
-- 
2.14.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 01/20] swr/rast: Remove unneeded copy of gather mask

2017-12-14 Thread Tim Rowley
---
 .../drivers/swr/rasterizer/jitter/builder_misc.cpp | 22 +-
 .../drivers/swr/rasterizer/jitter/fetch_jit.cpp| 80 ++
 2 files changed, 23 insertions(+), 79 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
index 8ffe05b41c..0221106664 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
@@ -1107,23 +1107,19 @@ namespace SwrJit
 }
 
 void Builder::GATHER4PS(const SWR_FORMAT_INFO &info, Value* pSrcBase, Value* byteOffsets,
-Value* mask, Value* vGatherComponents[], bool bPackedOutput)
+Value* vMask, Value* vGatherComponents[], bool bPackedOutput)
 {
 switch(info.bpp / info.numComps)
 {
 case 16: 
 {
 Value* vGatherResult[2];
-Value *vMask;
 
 // TODO: vGatherMaskedVal
 Value* vGatherMaskedVal = VIMMED1((float)0);
 
 // always have at least one component out of x or y to fetch
 
-// save mask as it is zero'd out after each gather
-vMask = mask;
-
 vGatherResult[0] = GATHERPS(vGatherMaskedVal, pSrcBase, byteOffsets, vMask);
 // e.g. result of first 8x32bit integer gather for 16bit components
 // 256i - 01234567
@@ -1135,7 +1131,6 @@ namespace SwrJit
 {
 // offset base to the next components(zw) in the vertex to gather
 pSrcBase = GEP(pSrcBase, C((char)4));
-vMask = mask;
 
 vGatherResult[1] =  GATHERPS(vGatherMaskedVal, pSrcBase, byteOffsets, vMask);
 // e.g. result of second 8x32bit integer gather for 16bit components
@@ -1164,9 +1159,6 @@ namespace SwrJit
 {
 uint32_t swizzleIndex = info.swizzle[i];
 
-// save mask as it is zero'd out after each gather
-Value *vMask = mask;
-
 // Gather a SIMD of components
 vGatherComponents[swizzleIndex] = GATHERPS(vGatherComponents[swizzleIndex], pSrcBase, byteOffsets, vMask);
 
@@ -1182,14 +1174,14 @@ namespace SwrJit
 }
 
 void Builder::GATHER4DD(const SWR_FORMAT_INFO &info, Value* pSrcBase, Value* byteOffsets,
-Value* mask, Value* vGatherComponents[], bool bPackedOutput)
+Value* vMask, Value* vGatherComponents[], bool bPackedOutput)
 {
 switch (info.bpp / info.numComps)
 {
 case 8:
 {
 Value* vGatherMaskedVal = VIMMED1((int32_t)0);
-Value* vGatherResult = GATHERDD(vGatherMaskedVal, pSrcBase, byteOffsets, mask);
+Value* vGatherResult = GATHERDD(vGatherMaskedVal, pSrcBase, byteOffsets, vMask);
 // e.g. result of an 8x32bit integer gather for 8bit components
 // 256i - 01234567
 //xyzw xyzw xyzw xyzw xyzw xyzw xyzw xyzw 
@@ -1200,16 +1192,12 @@ namespace SwrJit
 case 16:
 {
 Value* vGatherResult[2];
-Value *vMask;
 
 // TODO: vGatherMaskedVal
 Value* vGatherMaskedVal = VIMMED1((int32_t)0);
 
 // always have at least one component out of x or y to fetch
 
-// save mask as it is zero'd out after each gather
-vMask = mask;
-
 vGatherResult[0] = GATHERDD(vGatherMaskedVal, pSrcBase, byteOffsets, vMask);
 // e.g. result of first 8x32bit integer gather for 16bit components
 // 256i - 01234567
@@ -1221,7 +1209,6 @@ namespace SwrJit
 {
 // offset base to the next components(zw) in the vertex to gather
 pSrcBase = GEP(pSrcBase, C((char)4));
-vMask = mask;
 
 vGatherResult[1] = GATHERDD(vGatherMaskedVal, pSrcBase, byteOffsets, vMask);
 // e.g. result of second 8x32bit integer gather for 16bit components
@@ -1251,9 +1238,6 @@ namespace SwrJit
 {
 uint32_t swizzleIndex = info.swizzle[i];
 
-// save mask as it is zero'd out after each gather
-Value *vMask = mask;
-
 // Gather a SIMD of components
 vGatherComponents[swizzleIndex] = GATHERDD(vGatherComponents[swizzleIndex], pSrcBase, byteOffsets, vMask);
 
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp 

[Mesa-dev] [PATCH 11/20] swr/rast: SIMD16 Fetch - Fully widen 16-bit float vertex components

2017-12-14 Thread Tim Rowley
---
 .../drivers/swr/rasterizer/jitter/fetch_jit.cpp| 55 +++---
 1 file changed, 48 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
index 2065db3475..c960dc77fb 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
@@ -1277,6 +1277,43 @@ void FetchJit::JitGatherVertices(const FETCH_COMPILE_STATE ,
 case 16:
 {
 #if USE_SIMD16_GATHERS
+#if USE_SIMD16_BUILDER
+Value *gatherResult[2];
+
+// if we have at least one component out of x or y to fetch
+if (isComponentEnabled(compMask, 0) || isComponentEnabled(compMask, 1))
+{
+gatherResult[0] = GATHERPS_16(gatherSrc16, pStreamBase, vOffsets16, vGatherMask16);
+
+// e.g. result of first 8x32bit integer gather for 16bit components
+// 256i - 01234567
+//xyxy xyxy xyxy xyxy xyxy xyxy xyxy xyxy
+//
+}
+else
+{
+gatherResult[0] = VUNDEF2_I();
+}
+
+// if we have at least one component out of z or w to fetch
+if (isComponentEnabled(compMask, 2) || isComponentEnabled(compMask, 3))
+{
+// offset base to the next components(zw) in the vertex to gather
+pStreamBase = GEP(pStreamBase, C((char)4));
+
+gatherResult[1] = GATHERPS_16(gatherSrc16, pStreamBase, vOffsets16, vGatherMask16);
+
+// e.g. result of second 8x32bit integer gather for 16bit components
+// 256i - 01234567
+//zwzw zwzw zwzw zwzw zwzw zwzw zwzw zwzw 
+//
+}
+else
+{
+gatherResult[1] = VUNDEF2_I();
+}
+
+#else
 Value *vGatherResult[2];
 Value *vGatherResult2[2];
 
@@ -1315,10 +1352,13 @@ void FetchJit::JitGatherVertices(const FETCH_COMPILE_STATE ,
 vGatherResult2[1] = VUNDEF_I();
 }
 
+#endif
 // if we have at least one component to shuffle into place
 if (compMask)
 {
 #if USE_SIMD16_BUILDER
+#if USE_SIMD16_BUILDER
+#else
 Value *gatherResult[2];
 
 gatherResult[0] = VUNDEF2_I();
@@ -1330,6 +1370,7 @@ void FetchJit::JitGatherVertices(const FETCH_COMPILE_STATE ,
 gatherResult[1] = INSERT2_I(gatherResult[1], vGatherResult[1],  0);
 gatherResult[1] = INSERT2_I(gatherResult[1], vGatherResult2[1], 1);
 
+#endif
 Value *pVtxOut2 = BITCAST(pVtxOut, PointerType::get(VectorType::get(mFP32Ty, mVWidth2), 0));
 
 Shuffle16bpcArgs args = std::forward_as_tuple(gatherResult, pVtxOut2, Instruction::CastOps::FPExt, CONVERT_NONE,
@@ -1511,21 +1552,21 @@ void FetchJit::JitGatherVertices(const FETCH_COMPILE_STATE ,
 // if we need to gather the component
 if (compCtrl[i] == StoreSrc)
 {
-Value *vMaskLo  = VSHUFFLE(vGatherMask, VUNDEF(mInt1Ty, 8), C({ 0, 1, 2, 3 }));
+Value *vMaskLo  = VSHUFFLE(vGatherMask,  VUNDEF(mInt1Ty, 8), C({ 0, 1, 2, 3 }));
 Value *vMaskLo2 = VSHUFFLE(vGatherMask2, VUNDEF(mInt1Ty, 8), C({ 0, 1, 2, 3 }));
-Value *vMaskHi  = VSHUFFLE(vGatherMask, VUNDEF(mInt1Ty, 8), C({ 4, 5, 6, 7 }));
+Value *vMaskHi  = VSHUFFLE(vGatherMask,  VUNDEF(mInt1Ty, 8), C({ 4, 5, 6, 7 }));
 Value *vMaskHi2 = VSHUFFLE(vGatherMask2, VUNDEF(mInt1Ty, 8), C({ 4, 5, 6, 7 }));
 
-Value *vOffsetsLo  = VEXTRACTI128(vOffsets, C(0));
+Value *vOffsetsLo  = VEXTRACTI128(vOffsets,  C(0));
 Value *vOffsetsLo2 = VEXTRACTI128(vOffsets2, C(0));
-Value *vOffsetsHi  = VEXTRACTI128(vOffsets, C(1));
+Value *vOffsetsHi  = VEXTRACTI128(vOffsets,  C(1));
 Value *vOffsetsHi2 = VEXTRACTI128(vOffsets2, C(1));
 
 Value *vZeroDouble = VECTOR_SPLAT(4, ConstantFP::get(IRB()->getDoubleTy(), 0.0f));
 
-

[Mesa-dev] [PATCH 03/20] swr/rast: Corrections to multi-scissor handling

2017-12-14 Thread Tim Rowley
binner's GatherScissors() will be turned into a real gather in the not
too distant future.
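The reordering in this patch matters because `_simd_set_epi32` appears to follow the x86 `_mm256_set_epi32` convention, where the first argument lands in the highest lane and the last argument in lane 0; the old argument order therefore handed lane 0 the scissor selected by `pViewportIndex[7]`. A portable scalar sketch of the per-lane lookup the corrected code performs, assuming that convention (the `Rect` type and the `setEpi32`/`gatherScissorXmin` helpers are hypothetical stand-ins, not SWR code):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for SWR_RECT (fixed-point scissor bounds).
struct Rect { int32_t xmin, ymin, xmax, ymax; };

constexpr int kLanes = 8;

// Emulates the _mm256_set_epi32(e7, ..., e0) argument order:
// the FIRST argument goes to the HIGHEST lane, the last one to lane 0.
std::array<int32_t, kLanes> setEpi32(int32_t e7, int32_t e6, int32_t e5, int32_t e4,
                                     int32_t e3, int32_t e2, int32_t e1, int32_t e0)
{
    return {e0, e1, e2, e3, e4, e5, e6, e7};
}

// Lane i must receive the scissor of the viewport selected for primitive i,
// so the indices are passed in reverse lane order (7 down to 0).
std::array<int32_t, kLanes> gatherScissorXmin(const Rect* scissors, const uint32_t* vpIndex)
{
    return setEpi32(scissors[vpIndex[7]].xmin, scissors[vpIndex[6]].xmin,
                    scissors[vpIndex[5]].xmin, scissors[vpIndex[4]].xmin,
                    scissors[vpIndex[3]].xmin, scissors[vpIndex[2]].xmin,
                    scissors[vpIndex[1]].xmin, scissors[vpIndex[0]].xmin);
}
```

Only `xmin` is shown; `ymin`, `xmax` and `ymax` follow the same pattern. A real gather, as the commit message anticipates, would presumably replace these eight scalar loads per field with indexed vector loads.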
---
 src/gallium/drivers/swr/rasterizer/core/binner.cpp | 176 ++---
 1 file changed, 88 insertions(+), 88 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
index 52375f8956..8a5356b168 100644
--- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
@@ -226,117 +226,117 @@ static void GatherScissors(const SWR_RECT *pScissorsInFixedPoint, const uint32_t
 simdscalari &scisXmin, simdscalari &scisYmin, simdscalari &scisXmax, simdscalari &scisYmax)
 {
 scisXmin = _simd_set_epi32(
-pScissorsInFixedPoint[pViewportIndex[0]].xmin,
-pScissorsInFixedPoint[pViewportIndex[1]].xmin,
-pScissorsInFixedPoint[pViewportIndex[2]].xmin,
-pScissorsInFixedPoint[pViewportIndex[3]].xmin,
-pScissorsInFixedPoint[pViewportIndex[4]].xmin,
-pScissorsInFixedPoint[pViewportIndex[5]].xmin,
+pScissorsInFixedPoint[pViewportIndex[7]].xmin,
 pScissorsInFixedPoint[pViewportIndex[6]].xmin,
-pScissorsInFixedPoint[pViewportIndex[7]].xmin);
+pScissorsInFixedPoint[pViewportIndex[5]].xmin,
+pScissorsInFixedPoint[pViewportIndex[4]].xmin,
+pScissorsInFixedPoint[pViewportIndex[3]].xmin,
+pScissorsInFixedPoint[pViewportIndex[2]].xmin,
+pScissorsInFixedPoint[pViewportIndex[1]].xmin,
+pScissorsInFixedPoint[pViewportIndex[0]].xmin);
 scisYmin = _simd_set_epi32(
-pScissorsInFixedPoint[pViewportIndex[0]].ymin,
-pScissorsInFixedPoint[pViewportIndex[1]].ymin,
-pScissorsInFixedPoint[pViewportIndex[2]].ymin,
-pScissorsInFixedPoint[pViewportIndex[3]].ymin,
-pScissorsInFixedPoint[pViewportIndex[4]].ymin,
-pScissorsInFixedPoint[pViewportIndex[5]].ymin,
+pScissorsInFixedPoint[pViewportIndex[7]].ymin,
 pScissorsInFixedPoint[pViewportIndex[6]].ymin,
-pScissorsInFixedPoint[pViewportIndex[7]].ymin);
+pScissorsInFixedPoint[pViewportIndex[5]].ymin,
+pScissorsInFixedPoint[pViewportIndex[4]].ymin,
+pScissorsInFixedPoint[pViewportIndex[3]].ymin,
+pScissorsInFixedPoint[pViewportIndex[2]].ymin,
+pScissorsInFixedPoint[pViewportIndex[1]].ymin,
+pScissorsInFixedPoint[pViewportIndex[0]].ymin);
 scisXmax = _simd_set_epi32(
-pScissorsInFixedPoint[pViewportIndex[0]].xmax,
-pScissorsInFixedPoint[pViewportIndex[1]].xmax,
-pScissorsInFixedPoint[pViewportIndex[2]].xmax,
-pScissorsInFixedPoint[pViewportIndex[3]].xmax,
-pScissorsInFixedPoint[pViewportIndex[4]].xmax,
-pScissorsInFixedPoint[pViewportIndex[5]].xmax,
+pScissorsInFixedPoint[pViewportIndex[7]].xmax,
 pScissorsInFixedPoint[pViewportIndex[6]].xmax,
-pScissorsInFixedPoint[pViewportIndex[7]].xmax);
+pScissorsInFixedPoint[pViewportIndex[5]].xmax,
+pScissorsInFixedPoint[pViewportIndex[4]].xmax,
+pScissorsInFixedPoint[pViewportIndex[3]].xmax,
+pScissorsInFixedPoint[pViewportIndex[2]].xmax,
+pScissorsInFixedPoint[pViewportIndex[1]].xmax,
+pScissorsInFixedPoint[pViewportIndex[0]].xmax);
 scisYmax = _simd_set_epi32(
-pScissorsInFixedPoint[pViewportIndex[0]].ymax,
-pScissorsInFixedPoint[pViewportIndex[1]].ymax,
-pScissorsInFixedPoint[pViewportIndex[2]].ymax,
-pScissorsInFixedPoint[pViewportIndex[3]].ymax,
-pScissorsInFixedPoint[pViewportIndex[4]].ymax,
-pScissorsInFixedPoint[pViewportIndex[5]].ymax,
+pScissorsInFixedPoint[pViewportIndex[7]].ymax,
 pScissorsInFixedPoint[pViewportIndex[6]].ymax,
-pScissorsInFixedPoint[pViewportIndex[7]].ymax);
+pScissorsInFixedPoint[pViewportIndex[5]].ymax,
+pScissorsInFixedPoint[pViewportIndex[4]].ymax,
+pScissorsInFixedPoint[pViewportIndex[3]].ymax,
+pScissorsInFixedPoint[pViewportIndex[2]].ymax,
+pScissorsInFixedPoint[pViewportIndex[01]].ymax,
+pScissorsInFixedPoint[pViewportIndex[00]].ymax);
 }
 
 static void GatherScissors(const SWR_RECT *pScissorsInFixedPoint, const uint32_t *pViewportIndex,
 simd16scalari &scisXmin, simd16scalari &scisYmin, simd16scalari &scisXmax, simd16scalari &scisYmax)
 {
 scisXmin = _simd16_set_epi32(
-pScissorsInFixedPoint[pViewportIndex[0]].xmin,
-pScissorsInFixedPoint[pViewportIndex[1]].xmin,
-pScissorsInFixedPoint[pViewportIndex[2]].xmin,
-pScissorsInFixedPoint[pViewportIndex[3]].xmin,
-pScissorsInFixedPoint[pViewportIndex[4]].xmin,
-pScissorsInFixedPoint[pViewportIndex[5]].xmin,
-pScissorsInFixedPoint[pViewportIndex[6]].xmin,
-pScissorsInFixedPoint[pViewportIndex[7]].xmin,
-pScissorsInFixedPoint[pViewportIndex[8]].xmin,
-pScissorsInFixedPoint[pViewportIndex[9]].xmin,
-pScissorsInFixedPoint[pViewportIndex[10]].xmin,

[Mesa-dev] [PATCH 08/20] swr/rast: Pull most of the VPAI manipulation out of the binner/clipper

2017-12-14 Thread Tim Rowley
Move out of binner/clipper; hand them down from the frontend code instead.
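The viewport-index sanitizing this patch moves into the frontend (visible in the lines removed from `BinTrianglesImpl` below) forces out-of-range indices to zero: negative values are clamped with `max_epi32`, and values at or above `KNOB_NUM_VIEWPORTS_SCISSORS` are zeroed by ANDing with a `cmplt_epi32` mask. A scalar per-lane sketch of that logic; `kNumViewports = 16` is an assumption based on the "0-15" comment in the removed code, and `sanitizeViewportIndex` is a hypothetical helper:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Assumed value; the real code uses KNOB_NUM_VIEWPORTS_SCISSORS.
constexpr int32_t kNumViewports = 16;
constexpr int kLanes = 8;

// Scalar model of the SIMD sequence:
//   vpai        = max_epi32(vpai, 0)
//   vClearMask  = cmplt_epi32(vpai, vNumViewports)  // all ones where in range
//   viewportIdx = and_si(vClearMask, vpai)
std::array<int32_t, kLanes> sanitizeViewportIndex(const std::array<int32_t, kLanes>& vpai,
                                                  bool readViewportArrayIndex)
{
    std::array<int32_t, kLanes> viewportIdx{};  // zeroed, like setzero_si()
    if (!readViewportArrayIndex)
        return viewportIdx;                      // index never read: every lane uses viewport 0
    for (int i = 0; i < kLanes; ++i)
    {
        int32_t v = std::max<int32_t>(vpai[i], 0);
        int32_t clearMask = (v < kNumViewports) ? -1 : 0;  // 0xFFFFFFFF or 0
        viewportIdx[i] = clearMask & v;
    }
    return viewportIdx;
}
```

When `readViewportArrayIndex` is off, the index vector is never loaded and stays zero, which is what the `else { viewportIdx = vpai; }` branch added in PATCH 02 relies on.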
---
 src/gallium/drivers/swr/rasterizer/core/binner.cpp | 124 ++---
 src/gallium/drivers/swr/rasterizer/core/clip.cpp   |  25 ++---
 src/gallium/drivers/swr/rasterizer/core/clip.h |  58 +++---
 src/gallium/drivers/swr/rasterizer/core/context.h  |   4 +-
 .../drivers/swr/rasterizer/core/frontend.cpp   | 112 ++-
 src/gallium/drivers/swr/rasterizer/core/frontend.h |   8 +-
 src/gallium/drivers/swr/rasterizer/core/pa.h   |   4 +-
 7 files changed, 177 insertions(+), 158 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
index 22996c5a5d..a664ed812f 100644
--- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
@@ -307,7 +307,8 @@ void SIMDCALL BinTrianglesImpl(
 uint32_t workerId,
 typename SIMD_T::Vec4 tri[3],
 uint32_t triMask,
-typename SIMD_T::Integer const &primID)
+typename SIMD_T::Integer const &primID,
+typename SIMD_T::Integer const &viewportIdx)
 {
 SWR_CONTEXT *pContext = pDC->pContext;
 
@@ -323,31 +324,6 @@ void SIMDCALL BinTrianglesImpl(
 typename SIMD_T::Float vRecipW1 = SIMD_T::set1_ps(1.0f);
 typename SIMD_T::Float vRecipW2 = SIMD_T::set1_ps(1.0f);
 
-typename SIMD_T::Integer viewportIdx = SIMD_T::setzero_si();
-typename SIMD_T::Vec4 vpiAttrib[3];
-typename SIMD_T::Integer vpai = SIMD_T::setzero_si();
-
-if (state.backendState.readViewportArrayIndex)
-{
-pa.Assemble(VERTEX_SGV_SLOT, vpiAttrib);
-
-vpai = SIMD_T::castps_si(vpiAttrib[0][VERTEX_SGV_VAI_COMP]);
-}
-
-
-if (state.backendState.readViewportArrayIndex) // VPAIOffsets are guaranteed 0-15 -- no OOB issues if they are offsets from 0
-{
-// OOB indices => forced to zero.
-vpai = SIMD_T::max_epi32(vpai, SIMD_T::setzero_si());
-typename SIMD_T::Integer vNumViewports = SIMD_T::set1_epi32(KNOB_NUM_VIEWPORTS_SCISSORS);
-typename SIMD_T::Integer vClearMask = SIMD_T::cmplt_epi32(vpai, vNumViewports);
-viewportIdx = SIMD_T::and_si(vClearMask, vpai);
-}
-else
-{
-viewportIdx = vpai;
-}
-
 if (feState.vpTransformDisable)
 {
 // RHW is passed in directly when VP transform is disabled
@@ -375,7 +351,7 @@ void SIMDCALL BinTrianglesImpl(
 tri[2].v[2] = SIMD_T::mul_ps(tri[2].v[2], vRecipW2);
 
 // Viewport transform to screen space coords
-if (state.backendState.readViewportArrayIndex)
+if (pa.viewportArrayActive)
 {
 viewportTransform<3>(tri, state.vpMatrices, viewportIdx);
 }
@@ -568,8 +544,8 @@ void SIMDCALL BinTrianglesImpl(
 /// @todo:  Look at speeding this up -- weigh against corresponding costs in rasterizer.
 {
 typename SIMD_T::Integer scisXmin, scisYmin, scisXmax, scisYmax;
+if (pa.viewportArrayActive)
 
-if (state.backendState.readViewportArrayIndex)
 {
 GatherScissors([0], pViewportIndex, scisXmin, scisYmin, scisXmax, scisYmax);
 }
@@ -786,9 +762,10 @@ void BinTriangles(
 uint32_t workerId,
 simdvector tri[3],
 uint32_t triMask,
-simdscalari const &primID)
+simdscalari const &primID,
+simdscalari const &viewportIdx)
 {
-BinTrianglesImpl(pDC, pa, workerId, tri, triMask, primID);
+BinTrianglesImpl(pDC, pa, workerId, tri, triMask, primID, viewportIdx);
 }
 
 #if USE_SIMD16_FRONTEND
@@ -799,9 +776,10 @@ void SIMDCALL BinTriangles_simd16(
 uint32_t workerId,
 simd16vector tri[3],
 uint32_t triMask,
-simd16scalari const &primID)
+simd16scalari const &primID,
+simd16scalari const &viewportIdx)
 {
-BinTrianglesImpl(pDC, pa, workerId, tri, triMask, primID);
+BinTrianglesImpl(pDC, pa, workerId, tri, triMask, primID, viewportIdx);
 }
 
 #endif
@@ -1026,7 +1004,7 @@ void BinPostSetupPointsImpl(
 {
 typename SIMD_T::Integer scisXmin, scisYmin, scisXmax, scisYmax;
 
-if (state.backendState.readViewportArrayIndex)
+if (pa.viewportArrayActive)
 {
 GatherScissors([0], pViewportIndex, scisXmin, scisYmin, scisXmax, scisYmax);
 }
@@ -1176,38 +1154,13 @@ void BinPointsImpl(
 uint32_t workerId,
 typename SIMD_T::Vec4 prim[3],
 uint32_t primMask,
-typename SIMD_T::Integer const )
+typename SIMD_T::Integer const ,
+typename SIMD_T::Integer const )
 {
 const API_STATE& state = GetApiState(pDC);
 const SWR_FRONTEND_STATE& feState = state.frontendState;
 const SWR_RASTSTATE& rastState = state.rastState;
 
-// Read back viewport index if required
-typename SIMD_T::Integer viewportIdx = SIMD_T::setzero_si();
-typename SIMD_T::Vec4 vpiAttrib[1];
-typename SIMD_T::Integer vpai = 

[Mesa-dev] [PATCH 02/20] swr/rast: Binner fixes for viewport index offset handling

2017-12-14 Thread Tim Rowley
---
 src/gallium/drivers/swr/rasterizer/core/binner.cpp | 9 -
 src/gallium/drivers/swr/rasterizer/core/clip.h | 5 -
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
index 9d1f0d8799..52375f8956 100644
--- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
@@ -470,6 +470,10 @@ void SIMDCALL BinTrianglesImpl(
 typename SIMD_T::Integer vClearMask = SIMD_T::cmplt_epi32(vpai, vNumViewports);
 viewportIdx = SIMD_T::and_si(vClearMask, vpai);
 }
+else
+{
+viewportIdx = vpai;
+}
 
 if (feState.vpTransformDisable)
 {
@@ -1326,6 +1330,10 @@ void BinPointsImpl(
 typename SIMD_T::Integer vClearMask = SIMD_T::cmplt_epi32(vpai, vNumViewports);
 viewportIdx = SIMD_T::and_si(vClearMask, vpai);
 }
+else
+{
+viewportIdx = vpai;
+}
 
 if (!feState.vpTransformDisable)
 {
@@ -1647,7 +1655,6 @@ void SIMDCALL BinLinesImpl(
 if (state.backendState.readViewportArrayIndex)
 {
 pa.Assemble(VERTEX_SGV_SLOT, vpiAttrib);
-
 vpai = SIMD_T::castps_si(vpiAttrib[0][VERTEX_SGV_VAI_COMP]);
 }
 
diff --git a/src/gallium/drivers/swr/rasterizer/core/clip.h b/src/gallium/drivers/swr/rasterizer/core/clip.h
index 0d3d78057f..9d8bbc19e6 100644
--- a/src/gallium/drivers/swr/rasterizer/core/clip.h
+++ b/src/gallium/drivers/swr/rasterizer/core/clip.h
@@ -694,7 +694,6 @@ public:
 if (state.backendState.readViewportArrayIndex)
 {
 pa.Assemble(VERTEX_SGV_SLOT, vpiAttrib);
-
 vpai = SIMD_T::castps_si(vpiAttrib[0][VERTEX_SGV_VAI_COMP]);
 }
 
@@ -707,6 +706,10 @@ public:
 typename SIMD_T::Integer vClearMask = SIMD_T::cmplt_epi32(vpai, vNumViewports);
 viewportIdx = SIMD_T::and_si(vClearMask, vpai);
 }
+else
+{
+viewportIdx = vpai;
+}
 
 ComputeClipCodes(prim, viewportIdx);
 
-- 
2.14.1



[Mesa-dev] [PATCH 00/20] swr: update rasterizer

2017-12-14 Thread Tim Rowley
Highlights include simd16 work, thread pool initialization rework,
and code cleanup.

Tim Rowley (20):
  swr/rast: Remove unneeded copy of gather mask
  swr/rast: Binner fixes for viewport index offset handling
  swr/rast: Corrections to multi-scissor handling
  swr/rast: WIP - Widen fetch shader to SIMD16
  swr/rast: Convert gather masks to Nx1bit
  swr/rast: Rewrite Shuffle8bpcGatherd using shuffle
  swr/rast: Move GatherScissors to header
  swr/rast: Pull most of the VPAI manipulation out of the binner/clipper
  swr/rast: Pass prim to ClipSimd
  swr/rast: SIMD16 Fetch - Fully widen 32-bit float vertex components
  swr/rast: SIMD16 Fetch - Fully widen 16-bit float vertex components
  swr/rast: Replace INSERT2 vextract/vinsert with JOIN2 vshuffle
  swr/rast: SIMD16 Fetch - Fully widen 32-bit integer vertex components
  swr/rast: Remove no-op VBROADCAST of vID
  swr/rast: Pull of RTAI gather & offset out of clip/bin code
  swr/rast: Rework thread binding parameters for machine partitioning
  swr/rast: Replace VPSRL with LSHR
  swr/rast: Fix cache of API thread event manager
  swr/rast: EXTRACT2 changed from vextract/vinsert to vshuffle
  swr/rast: Move more RTAI handling out of binner

 .../swr/rasterizer/codegen/gen_llvm_ir_macros.py   |4 +-
 .../drivers/swr/rasterizer/codegen/knob_defs.py|   29 +-
 src/gallium/drivers/swr/rasterizer/core/api.cpp|   42 +-
 src/gallium/drivers/swr/rasterizer/core/api.h  |   33 +
 src/gallium/drivers/swr/rasterizer/core/binner.cpp |  345 ++-
 src/gallium/drivers/swr/rasterizer/core/binner.h   |  127 +++
 src/gallium/drivers/swr/rasterizer/core/clip.cpp   |   31 +-
 src/gallium/drivers/swr/rasterizer/core/clip.h |   67 +-
 src/gallium/drivers/swr/rasterizer/core/context.h  |5 +-
 .../drivers/swr/rasterizer/core/frontend.cpp   |  179 +++-
 src/gallium/drivers/swr/rasterizer/core/frontend.h |8 +-
 src/gallium/drivers/swr/rasterizer/core/pa.h   |5 +-
 .../drivers/swr/rasterizer/core/threads.cpp|  299 --
 src/gallium/drivers/swr/rasterizer/core/threads.h  |4 +
 .../drivers/swr/rasterizer/core/tilemgr.cpp|4 +-
 .../drivers/swr/rasterizer/jitter/builder_misc.cpp |  157 ++-
 .../drivers/swr/rasterizer/jitter/builder_misc.h   |   13 +-
 .../drivers/swr/rasterizer/jitter/fetch_jit.cpp| 1038 
 18 files changed, 1657 insertions(+), 733 deletions(-)

-- 
2.14.1



[Mesa-dev] [PATCH 04/20] swr/rast: WIP - Widen fetch shader to SIMD16

2017-12-14 Thread Tim Rowley
Widen vertex gather/storage to SIMD16 for all component types.
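Under `USE_SIMD16_BUILDER`, this patch packs each pair of SIMD8 fetch results (`pResults[c]` for lanes 0-7, `pResults2[c]` for lanes 8-15) into one SIMD16 value with `INSERT2_F(..., 0)` and `INSERT2_F(..., 1)`, then stores four of them with a single `StoreVertexElements2()` call. A scalar sketch of that packing, with hypothetical `Simd8`/`Simd16` array types standing in for the LLVM vector values and hypothetical `insert2`/`packPair` helpers modeling the builder calls:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Simd8  = std::array<float, 8>;   // hypothetical stand-in for an 8-wide vector
using Simd16 = std::array<float, 16>;  // hypothetical stand-in for a 16-wide vector

// Models INSERT2_F(v, half, index): writes an 8-wide half into the low
// (index 0) or high (index 1) half of a 16-wide vector.
Simd16 insert2(Simd16 v, const Simd8& half, std::size_t index)
{
    assert(index < 2);
    for (std::size_t i = 0; i < half.size(); ++i)
        v[index * half.size() + i] = half[i];
    return v;
}

// Models packing pResults[c] (lanes 0-7) and pResults2[c] (lanes 8-15).
Simd16 packPair(const Simd8& lo, const Simd8& hi)
{
    Simd16 v{};  // VUNDEF2_F() is undefined; zero-initialized here
    v = insert2(v, lo, 0);
    v = insert2(v, hi, 1);
    return v;
}
```

`VUNDEF2_F()` leaves the initial contents undefined; the sketch zero-initializes instead so the result is deterministic. Both halves are always overwritten, so the choice does not affect the packed value.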
---
 .../drivers/swr/rasterizer/jitter/fetch_jit.cpp| 716 -
 1 file changed, 689 insertions(+), 27 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
index 337bb7f660..6c0e658e68 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
@@ -70,6 +70,9 @@ struct FetchJit : public Builder
 #else
 void Shuffle8bpcGatherd(Shuffle8bpcArgs &args);
 #endif
+#if USE_SIMD16_BUILDER
+void Shuffle8bpcGatherd2(Shuffle8bpcArgs &args);
+#endif
 
 typedef std::tuple Shuffle16bpcArgs;
@@ -78,6 +81,9 @@ struct FetchJit : public Builder
 #else
 void Shuffle16bpcGather(Shuffle16bpcArgs &args);
 #endif
+#if USE_SIMD16_BUILDER
+void Shuffle16bpcGather2(Shuffle16bpcArgs &args);
+#endif
 
 void StoreVertexElements(Value* pVtxOut, const uint32_t outputElt, const uint32_t numEltsToStore, Value* ()[4]);
 #if USE_SIMD16_BUILDER
@@ -726,7 +732,7 @@ void FetchJit::CreateGatherOddFormats(SWR_FORMAT format, Value* pMask, Value* pB
 // only works if pixel size is <= 32bits
 SWR_ASSERT(info.bpp <= 32);
 
-   Value* pGather = GATHERDD(VIMMED1(0), pBase, pOffsets, pMask);
+Value *pGather = GATHERDD(VIMMED1(0), pBase, pOffsets, pMask);
 
 for (uint32_t comp = 0; comp < 4; ++comp)
 {
@@ -825,6 +831,9 @@ void FetchJit::JitGatherVertices(const FETCH_COMPILE_STATE ,
 Value* vVertexElements[4];
 #if USE_SIMD16_GATHERS
 Value* vVertexElements2[4];
+#if USE_SIMD16_BUILDER
+Value *pVtxSrc2[4];
+#endif
 #endif
 
 Value* startVertex = LOAD(mpFetchInfo, {0, SWR_FETCH_CONTEXT_StartVertex});
@@ -961,6 +970,7 @@ void FetchJit::JitGatherVertices(const FETCH_COMPILE_STATE ,
 #if USE_SIMD16_GATHERS
 // override cur indices with 0 if pitch is 0
 Value* pZeroPitchMask = ICMP_EQ(vStride, VIMMED1(0));
+vCurIndices = SELECT(pZeroPitchMask, VIMMED1(0), vCurIndices);
 vCurIndices2 = SELECT(pZeroPitchMask, VIMMED1(0), vCurIndices2);
 
 // are vertices partially OOB?
@@ -983,7 +993,7 @@ void FetchJit::JitGatherVertices(const FETCH_COMPILE_STATE ,
 
 // only fetch lanes that pass both tests
 vGatherMask = AND(vMaxGatherMask, vMinGatherMask);
-vGatherMask2 = AND(vMaxGatherMask, vMinGatherMask2);
+vGatherMask2 = AND(vMaxGatherMask2, vMinGatherMask2);
 }
 else
 {
@@ -1074,15 +1084,32 @@ void FetchJit::JitGatherVertices(const FETCH_COMPILE_STATE ,
 {
 if (isComponentEnabled(compMask, c))
 {
-vVertexElements[currentVertexElement] = pResults[c];
+#if USE_SIMD16_BUILDER
+// pack adjacent pairs of SIMD8s into SIMD16s
+pVtxSrc2[currentVertexElement] = VUNDEF2_F();
+pVtxSrc2[currentVertexElement] = INSERT2_F(pVtxSrc2[currentVertexElement], pResults[c],  0);
+pVtxSrc2[currentVertexElement] = INSERT2_F(pVtxSrc2[currentVertexElement], pResults2[c], 1);
+
+#else
+vVertexElements[currentVertexElement]  = pResults[c];
 vVertexElements2[currentVertexElement] = pResults2[c];
-currentVertexElement++;
+
+#endif
+currentVertexElement += 1;
 
 if (currentVertexElement > 3)
 {
+#if USE_SIMD16_BUILDER
+// store SIMD16s
+Value *pVtxOut2 = BITCAST(pVtxOut, PointerType::get(VectorType::get(mFP32Ty, mVWidth2), 0));
+
+StoreVertexElements2(pVtxOut2, outputElt, 4, pVtxSrc2);
+
+#else
 StoreVertexElements(pVtxOut, outputElt, 4, vVertexElements);
 StoreVertexElements(GEP(pVtxOut, C(1)), outputElt, 4, vVertexElements2);
 
+#endif
 outputElt += 1;
 
 // reset to the next vVertexElement to output
@@ -1113,9 +1140,12 @@ void FetchJit::JitGatherVertices(const FETCH_COMPILE_STATE ,
 else if(info.type[0] == SWR_TYPE_FLOAT)
 {
 ///@todo: support 64 bit vb accesses
-Value* gatherSrc = VIMMED1(0.0f);
+Value *gatherSrc = VIMMED1(0.0f);
 #if USE_SIMD16_GATHERS
-Value* gatherSrc2 = VIMMED1(0.0f);
+Value *gatherSrc2 = VIMMED1(0.0f);
+#if USE_SIMD16_BUILDER
+Value *gatherSrc16 = VIMMED2_1(0.0f);
+#endif
 #endif
 
 SWR_ASSERT(IsUniformFormat((SWR_FORMAT)ied.Format), 
@@ -1127,8 +1157,8 @@ void FetchJit::JitGatherVertices(const FETCH_COMPILE_STATE ,
 case 16:
 {
 #if 

[Mesa-dev] [PATCH 05/20] swr/rast: Convert gather masks to Nx1bit

2017-12-14 Thread Tim Rowley
Simplifies calling code, gets gather function interface closer to llvm's
masked_gather.
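With N x i1 masks, conversion happens only where an instruction needs a different form: the AVX2 paths widen the mask with `VMASK()`, presumably to all-ones/all-zeros 32-bit lanes since AVX2 gathers test the sign bit of each mask element (the `GATHERPD` path does the equivalent explicitly with `S_EXT`), while the AVX-512 path bitcasts the 16 x i1 mask straight to `i16`. A portable sketch of the two mask layouts, plus the masked-gather semantics they implement, where disabled lanes keep the source value as with the pass-through operand of LLVM's `llvm.masked.gather`; `toLaneMask`, `toKMask` and `maskedGather` are hypothetical helpers, not SWR functions:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// AVX2-style mask: each active lane becomes all ones (sign bit set),
// which is what sign-extending an i1 to i32 produces.
std::array<uint32_t, 8> toLaneMask(const std::array<bool, 8>& m)
{
    std::array<uint32_t, 8> out{};
    for (int i = 0; i < 8; ++i)
        out[i] = m[i] ? 0xFFFFFFFFu : 0u;
    return out;
}

// AVX-512-style k-mask: lane i maps to bit i of a 16-bit integer,
// matching a bitcast of <16 x i1> to i16 on a little-endian target.
uint16_t toKMask(const std::array<bool, 16>& m)
{
    uint16_t k = 0;
    for (int i = 0; i < 16; ++i)
        if (m[i])
            k |= static_cast<uint16_t>(1u << i);
    return k;
}

// Scalar model of a masked gather: active lanes load base[idx[i]],
// inactive lanes keep the pass-through source value and never touch memory.
std::array<float, 8> maskedGather(const float* base, const std::array<int32_t, 8>& idx,
                                  const std::array<bool, 8>& m, float src)
{
    std::array<float, 8> out{};
    for (int i = 0; i < 8; ++i)
        out[i] = m[i] ? base[idx[i]] : src;
    return out;
}
```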
---
 .../drivers/swr/rasterizer/jitter/builder_misc.cpp | 20 +
 .../drivers/swr/rasterizer/jitter/fetch_jit.cpp| 34 +-
 2 files changed, 14 insertions(+), 40 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
index 0221106664..04092541e5 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
@@ -602,7 +602,7 @@ namespace SwrJit
 if(JM()->mArch.AVX2())
 {
 // force mask to , required by vgather
-Value *mask = BITCAST(vMask, mSimdFP32Ty);
+Value *mask = BITCAST(VMASK(vMask), mSimdFP32Ty);
 
 vGather = VGATHERPS(vSrc, pBase, vIndices, mask, C(scale));
 }
@@ -617,7 +617,6 @@ namespace SwrJit
 vGather = VUNDEF_F();
 Value *vScaleVec = VIMMED1((uint32_t)scale);
 Value *vOffsets = MUL(vIndices,vScaleVec);
-Value *mask = MASK(vMask);
 for(uint32_t i = 0; i < mVWidth; ++i)
 {
 // single component byte index
@@ -627,7 +626,7 @@ namespace SwrJit
 loadAddress = BITCAST(loadAddress,PointerType::get(mFP32Ty,0));
 // pointer to the value to load if we're masking off a component
 Value *maskLoadAddress = GEP(vSrcPtr,{C(0), C(i)});
-Value *selMask = VEXTRACT(mask,C(i));
+Value *selMask = VEXTRACT(vMask,C(i));
 // switch in a safe address to load if we're trying to access a vertex
 Value *validAddress = SELECT(selMask, loadAddress, maskLoadAddress);
 Value *val = LOAD(validAddress);
@@ -648,7 +647,7 @@ namespace SwrJit
 if (JM()->mArch.AVX512F())
 {
 // force mask to , required by vgather2
-Value *mask = BITCAST(MASK2(vMask), mInt16Ty);
+Value *mask = BITCAST(vMask, mInt16Ty);
 
 vGather = VGATHERPS2(vSrc, pBase, vIndices, mask, C((uint32_t)scale));
 }
@@ -689,7 +688,7 @@ namespace SwrJit
 // use avx2 gather instruction if available
 if(JM()->mArch.AVX2())
 {
-vGather = VGATHERDD(vSrc, pBase, vIndices, vMask, C(scale));
+vGather = VGATHERDD(vSrc, pBase, vIndices, VMASK(vMask), C(scale));
 }
 else
 {
@@ -702,7 +701,6 @@ namespace SwrJit
 vGather = VUNDEF_I();
 Value *vScaleVec = VIMMED1((uint32_t)scale);
 Value *vOffsets = MUL(vIndices, vScaleVec);
-Value *mask = MASK(vMask);
 for(uint32_t i = 0; i < mVWidth; ++i)
 {
 // single component byte index
@@ -712,7 +710,7 @@ namespace SwrJit
 loadAddress = BITCAST(loadAddress, PointerType::get(mInt32Ty, 0));
 // pointer to the value to load if we're masking off a component
 Value *maskLoadAddress = GEP(vSrcPtr, {C(0), C(i)});
-Value *selMask = VEXTRACT(mask, C(i));
+Value *selMask = VEXTRACT(vMask, C(i));
 // switch in a safe address to load if we're trying to access a vertex
 Value *validAddress = SELECT(selMask, loadAddress, maskLoadAddress);
 Value *val = LOAD(validAddress, C(0));
@@ -739,6 +737,7 @@ namespace SwrJit
 // use avx2 gather instruction if available
 if(JM()->mArch.AVX2())
 {
+vMask = BITCAST(S_EXT(vMask, VectorType::get(mInt64Ty, mVWidth/2)), VectorType::get(mDoubleTy, mVWidth/2));
 vGather = VGATHERPD(vSrc, pBase, vIndices, vMask, C(scale));
 }
 else
@@ -752,7 +751,6 @@ namespace SwrJit
 vGather = UndefValue::get(VectorType::get(mDoubleTy, 4));
 Value *vScaleVec = VECTOR_SPLAT(4, C((uint32_t)scale));
 Value *vOffsets = MUL(vIndices,vScaleVec);
-Value *mask = MASK(vMask);
 for(uint32_t i = 0; i < mVWidth/2; ++i)
 {
 // single component byte index
@@ -762,7 +760,7 @@ namespace SwrJit
 loadAddress = BITCAST(loadAddress,PointerType::get(mDoubleTy,0));
 // pointer to the value to load if we're masking off a component
 Value *maskLoadAddress = GEP(vSrcPtr,{C(0), C(i)});
-Value *selMask = VEXTRACT(mask,C(i));
+Value *selMask = VEXTRACT(vMask,C(i));
 // switch in a safe address to load if we're trying to access a vertex
 Value *validAddress = SELECT(selMask, loadAddress, maskLoadAddress);
 Value *val = LOAD(validAddress);
@@ -1094,14 +1092,10 @@ namespace SwrJit
 const SWR_FORMAT_INFO &info = GetFormatInfo(format);
 if(info.type[0] == 

[Mesa-dev] [PATCH 06/20] swr/rast: Rewrite Shuffle8bpcGatherd using shuffle

2017-12-14 Thread Tim Rowley
Ease future code maintenance, prepare for folding simd8 and simd16 versions.
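`Shuffle8bpcGatherd` receives one gathered 32-bit value per lane, holding a vertex's four 8-bit components, and regroups the bytes into one SIMD vector per output component, applying the swizzle, the extension, and any normalization. A scalar sketch of the signed-normalized case, assuming little-endian byte order (component byte 0 in the low bits, as the removed `PSHUFB` mask `x, x+4, x+8, x+12` implies) and the `1.0 / 127.0` factor used for `CONVERT_NORMALIZED`; `shuffle8bpcSnorm` is a hypothetical name, not the SWR function:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kLanes = 8;

// gathered[lane] holds one vertex's four 8-bit components packed into a
// 32-bit value (byte 0 in the low bits). swizzle[c] selects which byte
// feeds output component c. Result: one 8-wide vector per component,
// sign-extended and scaled by 1/127 (the CONVERT_NORMALIZED path).
std::array<std::array<float, kLanes>, 4>
shuffle8bpcSnorm(const std::array<uint32_t, kLanes>& gathered, const uint32_t (&swizzle)[4])
{
    const float conversionFactor = static_cast<float>(1.0 / 127.0);
    std::array<std::array<float, kLanes>, 4> out{};
    for (int c = 0; c < 4; ++c)
    {
        for (int lane = 0; lane < kLanes; ++lane)
        {
            uint8_t byte = static_cast<uint8_t>(gathered[lane] >> (8 * swizzle[c]));
            int32_t sext = static_cast<int8_t>(byte);  // sign extend 8 -> 32 bits
            out[c][lane] = static_cast<float>(sext) * conversionFactor;
        }
    }
    return out;
}
```

As in the removed lines shown here, the sketch scales without clamping, so a byte of 0x80 maps to slightly below -1.0.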
---
 .../drivers/swr/rasterizer/jitter/fetch_jit.cpp| 244 ++---
 1 file changed, 62 insertions(+), 182 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
index 67a4a04072..a847cb74da 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
@@ -2014,206 +2014,86 @@ void FetchJit::Shuffle8bpcGatherd(Shuffle8bpcArgs &args)
 const uint32_t (&swizzle)[4] = std::get<9>(args);
 
 // cast types
-Type* vGatherTy = mSimdInt32Ty;
 Type* v32x8Ty =  VectorType::get(mInt8Ty, mVWidth * 4 ); // vwidth is units of 32 bits
 
-// have to do extra work for sign extending
-if ((extendType == Instruction::CastOps::SExt) || (extendType == Instruction::CastOps::SIToFP)){
-Type* v16x8Ty = VectorType::get(mInt8Ty, mVWidth * 2); // 8x16bit ints in a 128bit lane
-Type* v128Ty = VectorType::get(IntegerType::getIntNTy(JM()->mContext, 128), mVWidth / 4); // vwidth is units of 32 bits
-
-// shuffle mask, including any swizzling
-const char x = (char)swizzle[0]; const char y = (char)swizzle[1];
-const char z = (char)swizzle[2]; const char w = (char)swizzle[3];
-Value* vConstMask = C({char(x), char(x+4), char(x+8), char(x+12),
-char(y), char(y+4), char(y+8), char(y+12),
-char(z), char(z+4), char(z+8), char(z+12),
-char(w), char(w+4), char(w+8), char(w+12),
-char(x), char(x+4), char(x+8), char(x+12),
-char(y), char(y+4), char(y+8), char(y+12),
-char(z), char(z+4), char(z+8), char(z+12),
-char(w), char(w+4), char(w+8), char(w+12)});
-
-Value* vShufResult = BITCAST(PSHUFB(BITCAST(vGatherResult, v32x8Ty), vConstMask), vGatherTy);
-// after pshufb: group components together in each 128bit lane
-// 256i - 01234567
-//       
-
-Value* vi128XY = nullptr;
-if(isComponentEnabled(compMask, 0) || isComponentEnabled(compMask, 1)){
-vi128XY = BITCAST(PERMD(vShufResult, C({0, 4, 0, 0, 1, 5, 0, 0})), v128Ty);
-// after PERMD: move and pack xy and zw components in low 64 bits of each 128bit lane
-// 256i - 01234567
-//  dcdc dcdc   dcdc dcdc (dc - don't care)
-}
-
-// do the same for zw components
-Value* vi128ZW = nullptr;
-if(isComponentEnabled(compMask, 2) || isComponentEnabled(compMask, 3)){
-vi128ZW = BITCAST(PERMD(vShufResult, C({2, 6, 0, 0, 3, 7, 0, 0})), v128Ty);
-}
-
-// init denormalize variables if needed
-Instruction::CastOps fpCast;
-Value* conversionFactor;
-
-switch (conversionType)
-{
-case CONVERT_NORMALIZED:
-fpCast = Instruction::CastOps::SIToFP;
-conversionFactor = VIMMED1((float)(1.0 / 127.0));
-break;
-case CONVERT_SSCALED:
-fpCast = Instruction::CastOps::SIToFP;
-conversionFactor = VIMMED1((float)(1.0));
-break;
-case CONVERT_USCALED:
-SWR_INVALID("Type should not be sign extended!");
-conversionFactor = nullptr;
-break;
-default:
-SWR_ASSERT(conversionType == CONVERT_NONE);
-conversionFactor = nullptr;
-break;
-}
+for (uint32_t i = 0; i < 4; i++)
+{
+if (!isComponentEnabled(compMask, i))
+continue;
 
-// sign extend all enabled components. If we have a fill vVertexElements, output to current simdvertex
-for (uint32_t i = 0; i < 4; i++)
+if (compCtrl[i] == ComponentControl::StoreSrc)
 {
-if (isComponentEnabled(compMask, i))
-{
-if (compCtrl[i] == ComponentControl::StoreSrc)
-{
-// if x or z, extract 128bits from lane 0, else for y or w, extract from lane 1
-uint32_t lane = ((i == 0) || (i == 2)) ? 0 : 1;
-// if x or y, use vi128XY permute result, else use vi128ZW
-Value* selectedPermute = (i < 2) ? vi128XY : vi128ZW;
-
-// sign extend
-vVertexElements[currentVertexElement] = PMOVSXBD(BITCAST(VEXTRACT(selectedPermute, C(lane)), v16x8Ty));
-
-// denormalize if needed
-if (conversionType != CONVERT_NONE)
-{
-vVertexElements[currentVertexElement] = FMUL(CAST(fpCast, vVertexElements[currentVertexElement], mSimdFP32Ty), conversionFactor);
-}
-  
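
The sign-extension path removed above groups the x, y, z and w bytes of four packed 8-bit vertices inside each 128-bit lane with a PSHUFB mask of the form {x, x+4, x+8, x+12, y, y+4, ...}, sign-extends each group with PMOVSXBD and, for CONVERT_NORMALIZED, multiplies by 1/127. A minimal scalar C sketch of that arithmetic, assuming a single 16-byte lane; pshufb_lane, build_group_mask and snorm8_to_float are hypothetical helpers, not SWR jitter functions:

```c
#include <stdint.h>

/* Scalar model of SSSE3 PSHUFB on one 16-byte lane: a mask byte with the
 * high bit set yields 0, otherwise it selects src[mask & 0x0F]. */
static void pshufb_lane(uint8_t dst[16], const uint8_t src[16], const int8_t mask[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (mask[i] < 0) ? 0 : src[mask[i] & 0x0F];
}

/* Hypothetical helper: build the grouping mask {x, x+4, x+8, x+12, y, y+4, ...}
 * so that the same component of the four vertices in a lane ends up adjacent. */
static void build_group_mask(int8_t mask[16], const uint32_t swizzle[4])
{
    for (int c = 0; c < 4; c++)
        for (int v = 0; v < 4; v++)
            mask[c * 4 + v] = (int8_t)(swizzle[c] + 4 * v);
}

/* Sign-extend one byte (what PMOVSXBD does per element) and apply the
 * CONVERT_NORMALIZED factor 1/127 used by the removed code. */
static float snorm8_to_float(uint8_t b)
{
    return (float)(int8_t)b * (float)(1.0 / 127.0);
}
```

With this factor an input byte of 0x80 (-128) maps to about -1.008; the sketch mirrors the removed code and does not clamp.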

[Mesa-dev] [PATCH 10/20] swr/rast: SIMD16 Fetch - Fully widen 32-bit float vertex components

2017-12-14 Thread Tim Rowley
---
 .../swr/rasterizer/codegen/gen_llvm_ir_macros.py   |   3 +-
 .../drivers/swr/rasterizer/jitter/builder_misc.cpp |  41 -
 .../drivers/swr/rasterizer/jitter/builder_misc.h   |   7 +-
 .../drivers/swr/rasterizer/jitter/fetch_jit.cpp| 175 ++---
 4 files changed, 194 insertions(+), 32 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
index 44fc857371..ac8b3badf6 100644
--- a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
+++ b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
@@ -44,9 +44,10 @@ inst_aliases = {
 intrinsics = [
 ['VGATHERPD', 'x86_avx2_gather_d_pd_256', ['src', 'pBase', 'indices', 'mask', 'scale']],
 ['VGATHERPS', 'x86_avx2_gather_d_ps_256', ['src', 'pBase', 'indices', 'mask', 'scale']],
-['VGATHERPS2', 'x86_avx512_gather_dps_512', ['src', 'pBase', 'indices', 'mask', 'scale']],
+['VGATHERPS_16', 'x86_avx512_gather_dps_512', ['src', 'pBase', 'indices', 'mask', 'scale']],
 ['VGATHERDD', 'x86_avx2_gather_d_d_256', ['src', 'pBase', 'indices', 'mask', 'scale']],
 ['VPSRLI', 'x86_avx2_psrli_d', ['src', 'imm']],
+['VPSRLI_16', 'x86_avx512_psrli_d_512', ['src', 'imm']],
 ['VSQRTPS', 'x86_avx_sqrt_ps_256', ['a']],
 ['VRSQRTPS', 'x86_avx_rsqrt_ps_256', ['a']],
 ['VRCPPS', 'x86_avx_rcp_ps_256', ['a']],
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
index 04092541e5..b2210db717 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
@@ -639,7 +639,7 @@ namespace SwrJit
 }
 
 #if USE_SIMD16_BUILDER
-Value *Builder::GATHERPS2(Value *vSrc, Value *pBase, Value *vIndices, Value *vMask, uint8_t scale)
+Value *Builder::GATHERPS_16(Value *vSrc, Value *pBase, Value *vIndices, Value *vMask, uint8_t scale)
 {
 Value *vGather = VUNDEF2_F();
 
@@ -649,7 +649,7 @@ namespace SwrJit
 // force mask to , required by vgather2
 Value *mask = BITCAST(vMask, mInt16Ty);
 
-vGather = VGATHERPS2(vSrc, pBase, vIndices, mask, C((uint32_t)scale));
+vGather = VGATHERPS_16(vSrc, pBase, vIndices, mask, C((uint32_t)scale));
 }
 else
 {
@@ -659,8 +659,10 @@ namespace SwrJit
 Value *indices0 = EXTRACT2_I(vIndices, 0);
 Value *indices1 = EXTRACT2_I(vIndices, 1);
 
-Value *mask0 = EXTRACT2_I(vMask, 0);
-Value *mask1 = EXTRACT2_I(vMask, 1);
+Value *vmask16 = VMASK2(vMask);
+
+Value *mask0 = MASK(EXTRACT2_I(vmask16, 0));  // TODO: do this better..
+Value *mask1 = MASK(EXTRACT2_I(vmask16, 1));
 
 Value *gather0 = GATHERPS(src0, pBase, indices0, mask0, scale);
 Value *gather1 = GATHERPS(src1, pBase, indices1, mask1, scale);
@@ -771,6 +773,37 @@ namespace SwrJit
 return vGather;
 }
 
+#if USE_SIMD16_BUILDER
+Value *Builder::PSRLI(Value *a, Value *imm)
+{
+return VPSRLI(a, imm);
+}
+
+Value *Builder::PSRLI_16(Value *a, Value *imm)
+{
+Value *result = VUNDEF2_I();
+
+// use avx512 shift right instruction if available
+if (JM()->mArch.AVX512F())
+{
+result = VPSRLI_16(a, imm);
+}
+else
+{
+Value *a0 = EXTRACT2_I(a, 0);
+Value *a1 = EXTRACT2_I(a, 1);
+
+Value *result0 = PSRLI(a0, imm);
+Value *result1 = PSRLI(a1, imm);
+
+result = INSERT2_I(result, result0, 0);
+result = INSERT2_I(result, result1, 1);
+}
+
+return result;
+}
+
+#endif
 #if USE_SIMD16_BUILDER
 //
 /// @brief
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h
index d858a827db..62360a3ad7 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h
+++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.h
@@ -130,7 +130,7 @@ void Gather4(const SWR_FORMAT format, Value* pSrcBase, Value* byteOffsets,
 
 Value *GATHERPS(Value *src, Value *pBase, Value *indices, Value *mask, uint8_t scale = 1);
 #if USE_SIMD16_BUILDER
-Value *GATHERPS2(Value *src, Value *pBase, Value *indices, Value *mask, uint8_t scale = 1);
+Value *GATHERPS_16(Value *src, Value *pBase, Value *indices, Value *mask, uint8_t scale = 1);
 #endif
 void GATHER4PS(const SWR_FORMAT_INFO , Value* pSrcBase, Value* byteOffsets,
Value* mask, Value* vGatherComponents[], bool bPackedOutput);
@@ -141,6 +141,11 @@ void GATHER4DD(const SWR_FORMAT_INFO , Value* pSrcBase, Value* byteOffsets,
 
 Value *GATHERPD(Value* src, Value* 
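
The new PSRLI_16 (like GATHERPS_16) uses a native 16-wide AVX-512 instruction when JM()->mArch.AVX512F() reports support, and otherwise splits the operand into two 8-wide halves with EXTRACT2_I, runs the 8-wide AVX2 path on each, and reassembles the result with INSERT2_I. The scalar C sketch below mirrors only that control flow on plain arrays; psrli_8, psrli_16 and the has_avx512 flag are hypothetical stand-ins for the JIT builder calls:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* 8-lane logical shift right, standing in for the AVX2 VPSRLI path.
 * Like PSRLD with an immediate, counts above 31 produce 0. */
static void psrli_8(uint32_t dst[8], const uint32_t src[8], unsigned imm)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (imm > 31) ? 0 : src[i] >> imm;
}

/* 16-lane version: single "native" pass when AVX-512 is assumed available,
 * otherwise split into two halves as PSRLI_16 does with EXTRACT2_I/INSERT2_I. */
static void psrli_16(uint32_t dst[16], const uint32_t src[16], unsigned imm, bool has_avx512)
{
    if (has_avx512) {
        for (int i = 0; i < 16; i++)
            dst[i] = (imm > 31) ? 0 : src[i] >> imm;
    } else {
        uint32_t lo[8], hi[8];
        psrli_8(lo, src, imm);      /* lanes 0..7,  like EXTRACT2_I(a, 0) */
        psrli_8(hi, src + 8, imm);  /* lanes 8..15, like EXTRACT2_I(a, 1) */
        memcpy(dst, lo, sizeof lo); /* like INSERT2_I(result, result0, 0) */
        memcpy(dst + 8, hi, sizeof hi);
    }
}
```

Both paths must produce identical results; the split path just trades one wide instruction for two narrow ones plus the extract/insert shuffles.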

[Mesa-dev] [PATCH 07/20] swr/rast: Move GatherScissors to header

2017-12-14 Thread Tim Rowley
---
 src/gallium/drivers/swr/rasterizer/core/binner.cpp | 127 -
 src/gallium/drivers/swr/rasterizer/core/binner.h   | 127 +
 2 files changed, 127 insertions(+), 127 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
index 8a5356b168..22996c5a5d 100644
--- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
@@ -212,133 +212,6 @@ INLINE void ProcessAttributes(
 }
 }
 
-//
-/// @brief  Gather scissor rect data based on per-prim viewport indices.
-/// @param pScissorsInFixedPoint - array of scissor rects in 16.8 fixed point.
-/// @param pViewportIndex - array of per-primitive vewport indexes.
-/// @param scisXmin - output vector of per-prmitive scissor rect Xmin data.
-/// @param scisYmin - output vector of per-prmitive scissor rect Ymin data.
-/// @param scisXmax - output vector of per-prmitive scissor rect Xmax data.
-/// @param scisYmax - output vector of per-prmitive scissor rect Ymax data.
-//
-/// @todo:  Look at speeding this up -- weigh against corresponding costs in rasterizer.
-static void GatherScissors(const SWR_RECT *pScissorsInFixedPoint, const uint32_t *pViewportIndex,
-simdscalari &scisXmin, simdscalari &scisYmin, simdscalari &scisXmax, simdscalari &scisYmax)
-{
-scisXmin = _simd_set_epi32(
-pScissorsInFixedPoint[pViewportIndex[7]].xmin,
-pScissorsInFixedPoint[pViewportIndex[6]].xmin,
-pScissorsInFixedPoint[pViewportIndex[5]].xmin,
-pScissorsInFixedPoint[pViewportIndex[4]].xmin,
-pScissorsInFixedPoint[pViewportIndex[3]].xmin,
-pScissorsInFixedPoint[pViewportIndex[2]].xmin,
-pScissorsInFixedPoint[pViewportIndex[1]].xmin,
-pScissorsInFixedPoint[pViewportIndex[0]].xmin);
-scisYmin = _simd_set_epi32(
-pScissorsInFixedPoint[pViewportIndex[7]].ymin,
-pScissorsInFixedPoint[pViewportIndex[6]].ymin,
-pScissorsInFixedPoint[pViewportIndex[5]].ymin,
-pScissorsInFixedPoint[pViewportIndex[4]].ymin,
-pScissorsInFixedPoint[pViewportIndex[3]].ymin,
-pScissorsInFixedPoint[pViewportIndex[2]].ymin,
-pScissorsInFixedPoint[pViewportIndex[1]].ymin,
-pScissorsInFixedPoint[pViewportIndex[0]].ymin);
-scisXmax = _simd_set_epi32(
-pScissorsInFixedPoint[pViewportIndex[7]].xmax,
-pScissorsInFixedPoint[pViewportIndex[6]].xmax,
-pScissorsInFixedPoint[pViewportIndex[5]].xmax,
-pScissorsInFixedPoint[pViewportIndex[4]].xmax,
-pScissorsInFixedPoint[pViewportIndex[3]].xmax,
-pScissorsInFixedPoint[pViewportIndex[2]].xmax,
-pScissorsInFixedPoint[pViewportIndex[1]].xmax,
-pScissorsInFixedPoint[pViewportIndex[0]].xmax);
-scisYmax = _simd_set_epi32(
-pScissorsInFixedPoint[pViewportIndex[7]].ymax,
-pScissorsInFixedPoint[pViewportIndex[6]].ymax,
-pScissorsInFixedPoint[pViewportIndex[5]].ymax,
-pScissorsInFixedPoint[pViewportIndex[4]].ymax,
-pScissorsInFixedPoint[pViewportIndex[3]].ymax,
-pScissorsInFixedPoint[pViewportIndex[2]].ymax,
-pScissorsInFixedPoint[pViewportIndex[01]].ymax,
-pScissorsInFixedPoint[pViewportIndex[00]].ymax);
-}
-
-static void GatherScissors(const SWR_RECT *pScissorsInFixedPoint, const uint32_t *pViewportIndex,
-simd16scalari &scisXmin, simd16scalari &scisYmin, simd16scalari &scisXmax, simd16scalari &scisYmax)
-{
-scisXmin = _simd16_set_epi32(
-pScissorsInFixedPoint[pViewportIndex[15]].xmin,
-pScissorsInFixedPoint[pViewportIndex[14]].xmin,
-pScissorsInFixedPoint[pViewportIndex[13]].xmin,
-pScissorsInFixedPoint[pViewportIndex[12]].xmin,
-pScissorsInFixedPoint[pViewportIndex[11]].xmin,
-pScissorsInFixedPoint[pViewportIndex[10]].xmin,
-pScissorsInFixedPoint[pViewportIndex[ 9]].xmin,
-pScissorsInFixedPoint[pViewportIndex[ 8]].xmin,
-pScissorsInFixedPoint[pViewportIndex[ 7]].xmin,
-pScissorsInFixedPoint[pViewportIndex[ 6]].xmin,
-pScissorsInFixedPoint[pViewportIndex[ 5]].xmin,
-pScissorsInFixedPoint[pViewportIndex[ 4]].xmin,
-pScissorsInFixedPoint[pViewportIndex[ 3]].xmin,
-pScissorsInFixedPoint[pViewportIndex[ 2]].xmin,
-pScissorsInFixedPoint[pViewportIndex[ 1]].xmin,
-pScissorsInFixedPoint[pViewportIndex[ 0]].xmin);
-
-scisYmin = _simd16_set_epi32(
-pScissorsInFixedPoint[pViewportIndex[15]].ymin,
-pScissorsInFixedPoint[pViewportIndex[14]].ymin,
-pScissorsInFixedPoint[pViewportIndex[13]].ymin,
-pScissorsInFixedPoint[pViewportIndex[12]].ymin,
-pScissorsInFixedPoint[pViewportIndex[11]].ymin,
-pScissorsInFixedPoint[pViewportIndex[10]].ymin,
-pScissorsInFixedPoint[pViewportIndex[ 9]].ymin,
-pScissorsInFixedPoint[pViewportIndex[ 8]].ymin,
-
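
GatherScissors performs a per-lane table lookup: lane i of each output vector receives the scissor rect selected by pViewportIndex[i]. The arguments are listed from index 7 (or 15) down to 0 because set_epi32-style intrinsics, such as x86 _mm256_set_epi32, take the highest lane first. A scalar C sketch of the same gather, assuming an 8-wide build and using a hypothetical rect struct in place of SWR_RECT:

```c
#include <stdint.h>

#define SIMD_WIDTH 8 /* assumption: 8-wide (AVX) build */

/* Hypothetical stand-in for SWR_RECT (bounds in 16.8 fixed point). */
struct rect { int32_t xmin, ymin, xmax, ymax; };

/* Scalar equivalent of GatherScissors: lane i reads rects[vp_index[i]]. */
static void gather_scissors(const struct rect *rects, const uint32_t *vp_index,
                            int32_t xmin[SIMD_WIDTH], int32_t ymin[SIMD_WIDTH],
                            int32_t xmax[SIMD_WIDTH], int32_t ymax[SIMD_WIDTH])
{
    for (int lane = 0; lane < SIMD_WIDTH; lane++) {
        const struct rect *r = &rects[vp_index[lane]];
        xmin[lane] = r->xmin;
        ymin[lane] = r->ymin;
        xmax[lane] = r->xmax;
        ymax[lane] = r->ymax;
    }
}
```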

[Mesa-dev] [PATCH 09/20] swr/rast: Pass prim to ClipSimd

2017-12-14 Thread Tim Rowley
---
 src/gallium/drivers/swr/rasterizer/core/clip.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/clip.h b/src/gallium/drivers/swr/rasterizer/core/clip.h
index 148f661ab4..8b947668d3 100644
--- a/src/gallium/drivers/swr/rasterizer/core/clip.h
+++ b/src/gallium/drivers/swr/rasterizer/core/clip.h
@@ -437,7 +437,7 @@ public:
 return SIMD_T::movemask_ps(vClipCullMask);
 }
 
-void ClipSimd(const typename SIMD_T::Float , const typename SIMD_T::Float , PA_STATE , const typename SIMD_T::Integer , const typename SIMD_T::Integer )
+void ClipSimd(const typename SIMD_T::Vec4 prim[], const typename SIMD_T::Float , const typename SIMD_T::Float , PA_STATE , const typename SIMD_T::Integer , const typename SIMD_T::Integer )
 {
 // input/output vertex store for clipper
 SIMDVERTEX_T vertices[7]; // maximum 7 verts generated per triangle
@@ -452,10 +452,9 @@ public:
 
 // assemble pos
 typename SIMD_T::Vec4 tmpVector[NumVertsPerPrim];
-pa.Assemble(VERTEX_POSITION_SLOT, tmpVector);
 for (uint32_t i = 0; i < NumVertsPerPrim; ++i)
 {
-vertices[i].attrib[VERTEX_POSITION_SLOT] = tmpVector[i];
+vertices[i].attrib[VERTEX_POSITION_SLOT] = prim[i];
 }
 
 // assemble attribs
@@ -568,7 +567,8 @@ public:
 SIMDVERTEX_T transposedPrims[2];
 
 #endif
-for (uint32_t inputPrim = 0; inputPrim < pa.NumPrims(); ++inputPrim)
+uint32_t numInputPrims = pa.NumPrims();
+for (uint32_t inputPrim = 0; inputPrim < numInputPrims; ++inputPrim)
 {
 uint32_t numEmittedVerts = pVertexCount[inputPrim];
 if (numEmittedVerts < NumVertsPerPrim)
@@ -716,7 +716,7 @@ public:
 AR_BEGIN(FEGuardbandClip, pa.pDC->drawId);
 // we have to clip tris, execute the clipper, which will also
 // call the binner
-ClipSimd(SIMD_T::vmask_ps(primMask), SIMD_T::vmask_ps(clipMask), pa, primId, viewportIdx);
+ClipSimd(prim, SIMD_T::vmask_ps(primMask), SIMD_T::vmask_ps(clipMask), pa, primId, viewportIdx);
 AR_END(FEGuardbandClip, 1);
 }
 else if (validMask)
-- 
2.14.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] radv: set FORCE_SIMD_DIST(1) for compute when profitable

2017-12-14 Thread Samuel Pitoiset



On 12/14/2017 08:35 PM, Bas Nieuwenhuizen wrote:

Reviewed-by: Bas Nieuwenhuizen 

Would it make sense to move the compute_resource_limits calculation to
pipeline creation time?


Yeah, possibly.



On Thu, Dec 14, 2017 at 3:51 PM, Samuel Pitoiset wrote:

Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset 
---
  src/amd/vulkan/radv_cmd_buffer.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index d6aaff707b..4a048485c8 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -2561,6 +2561,7 @@ radv_emit_compute_pipeline(struct radv_cmd_buffer *cmd_buffer)
  {
 struct radv_shader_variant *compute_shader;
 struct radv_pipeline *pipeline = cmd_buffer->state.compute_pipeline;
+   struct radv_device *device = cmd_buffer->device;
 unsigned compute_resource_limits;
 unsigned waves_per_threadgroup;
 uint64_t va;
@@ -2602,6 +2603,19 @@ radv_emit_compute_pipeline(struct radv_cmd_buffer *cmd_buffer)
 compute_resource_limits =
 S_00B854_SIMD_DEST_CNTL(waves_per_threadgroup % 4 == 0);

+   if (device->physical_device->rad_info.chip_class >= CIK) {
+   unsigned num_cu_per_se =
+   device->physical_device->rad_info.num_good_compute_units /
+   device->physical_device->rad_info.max_se;
+
+   /* Force even distribution on all SIMDs in CU if the workgroup
+* size is 64. This has shown some good improvements if # of
+* CUs per SE is not a multiple of 4.
+*/
+   if (num_cu_per_se % 4 && waves_per_threadgroup == 1)
+   compute_resource_limits |= S_00B854_FORCE_SIMD_DIST(1);
+   }
+
 radeon_set_sh_reg(cmd_buffer->cs, R_00B854_COMPUTE_RESOURCE_LIMITS,
   compute_resource_limits);

--
2.15.1



Re: [Mesa-dev] [PATCH 1/2] radv: always emit all compute block components

2017-12-14 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

On Thu, Dec 14, 2017 at 12:51 PM, Samuel Pitoiset wrote:
> The number of grid components is always 3 when gl_NumWorkGroups
> is declared, because it relies on the number of components of
> nir_intrinsic_load_num_work_groups.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/common/ac_nir_to_llvm.c  |  9 ++---
>  src/amd/vulkan/radv_cmd_buffer.c | 15 +--
>  2 files changed, 11 insertions(+), 13 deletions(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index f3602a267d..ce25e57eba 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -745,8 +745,10 @@ static void create_function(struct nir_to_llvm_context *ctx,
> switch (stage) {
> case MESA_SHADER_COMPUTE:
> radv_define_common_user_sgprs_phase1(ctx, stage, has_previous_stage, previous_stage, _sgpr_info, , _sets);
> -   if (ctx->shader_info->info.cs.grid_components_used)
> -   add_user_sgpr_argument(, LLVMVectorType(ctx->ac.i32, ctx->shader_info->info.cs.grid_components_used), >num_work_groups); /* grid size */
> +   if (ctx->shader_info->info.cs.grid_components_used) {
> +   add_user_sgpr_argument(, ctx->ac.v3i32,
> +  >num_work_groups);
> +   }
> add_sgpr_argument(, ctx->ac.v3i32, >workgroup_ids);
> add_sgpr_argument(, ctx->ac.i32, >tg_size);
> add_vgpr_argument(, ctx->ac.v3i32, >local_invocation_ids);
> @@ -950,7 +952,8 @@ static void create_function(struct nir_to_llvm_context *ctx,
> switch (stage) {
> case MESA_SHADER_COMPUTE:
> if (ctx->shader_info->info.cs.grid_components_used) {
> -   set_userdata_location_shader(ctx, AC_UD_CS_GRID_SIZE, _sgpr_idx, ctx->shader_info->info.cs.grid_components_used);
> +   set_userdata_location_shader(ctx, AC_UD_CS_GRID_SIZE,
> +_sgpr_idx, 3);
> }
> break;
> case MESA_SHADER_VERTEX:
> diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
> index 68371dbbe7..e68c5a4038 100644
> --- a/src/amd/vulkan/radv_cmd_buffer.c
> +++ b/src/amd/vulkan/radv_cmd_buffer.c
> @@ -3487,9 +3487,6 @@ radv_emit_dispatch_packets(struct radv_cmd_buffer *cmd_buffer,
> struct radeon_winsys_cs *cs = cmd_buffer->cs;
> struct ac_userdata_info *loc;
> unsigned dispatch_initiator;
> -   uint8_t grid_used;
> -
> -   grid_used = compute_shader->info.info.cs.grid_components_used;
>
> loc = radv_lookup_user_sgpr(pipeline, MESA_SHADER_COMPUTE,
> AC_UD_CS_GRID_SIZE);
> @@ -3514,7 +3511,7 @@ radv_emit_dispatch_packets(struct radv_cmd_buffer *cmd_buffer,
> radv_cs_add_buffer(ws, cs, info->indirect->bo, 8);
>
> if (loc->sgpr_idx != -1) {
> -   for (unsigned i = 0; i < grid_used; ++i) {
> +   for (unsigned i = 0; i < 3; ++i) {
> radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0));
> radeon_emit(cs, COPY_DATA_SRC_SEL(COPY_DATA_MEM) |
> COPY_DATA_DST_SEL(COPY_DATA_REG));
> @@ -3581,15 +3578,13 @@ radv_emit_dispatch_packets(struct radv_cmd_buffer *cmd_buffer,
>
> if (loc->sgpr_idx != -1) {
> assert(!loc->indirect);
> -   assert(loc->num_sgprs == grid_used);
> +   assert(loc->num_sgprs == 3);
>
> radeon_set_sh_reg_seq(cs, R_00B900_COMPUTE_USER_DATA_0 +
> - loc->sgpr_idx * 4, grid_used);
> + loc->sgpr_idx * 4, 3);
> radeon_emit(cs, blocks[0]);
> -   if (grid_used > 1)
> -   radeon_emit(cs, blocks[1]);
> -   if (grid_used > 2)
> -   radeon_emit(cs, blocks[2]);
> +   radeon_emit(cs, blocks[1]);
> +   radeon_emit(cs, blocks[2]);
> }
>
> radeon_emit(cs, PKT3(PKT3_DISPATCH_DIRECT, 3, 0) |
> --
> 2.15.1
>
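
After this patch the shader always declares the grid-size user SGPRs as a v3i32, so the driver has to write exactly three consecutive user-data registers regardless of how many components the shader actually reads. A hypothetical C model of that contract; the struct, the register-array size and the function name are assumptions for illustration, not radv code:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_USER_SGPRS 16 /* assumption for this sketch */

/* Hypothetical mirror of the sgpr_idx/num_sgprs pair in ac_userdata_info. */
struct user_sgpr_loc {
    int sgpr_idx;  /* -1 when gl_NumWorkGroups is not read */
    int num_sgprs;
};

/* Writes the three block counts and returns the number of registers written. */
static int write_grid_size(uint32_t user_data[MAX_USER_SGPRS],
                           struct user_sgpr_loc loc, const uint32_t blocks[3])
{
    if (loc.sgpr_idx == -1)
        return 0;
    /* Always 3, matching the v3i32 argument declared by the compiler. */
    assert(loc.num_sgprs == 3);
    assert(loc.sgpr_idx >= 0 && loc.sgpr_idx + 3 <= MAX_USER_SGPRS);
    for (int i = 0; i < 3; i++)
        user_data[loc.sgpr_idx + i] = blocks[i];
    return 3;
}
```

Writing y and z even for a 1D dispatch costs two extra register writes but keeps the SGPR layout identical on the compiler and driver sides.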


Re: [Mesa-dev] [PATCH 1/3] amd/common: add ac_get_spi_shader_z_format()

2017-12-14 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

for the series.

On Thu, Dec 14, 2017 at 1:51 PM, Samuel Pitoiset wrote:
> ac_shader_util.c will contain shader helpers for RadeonSI
> and RADV.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/Makefile.sources|  5 -
>  src/amd/common/ac_shader_util.c | 45 +
>  src/amd/common/ac_shader_util.h | 33 ++
>  src/amd/common/meson.build  |  2 ++
>  4 files changed, 84 insertions(+), 1 deletion(-)
>  create mode 100644 src/amd/common/ac_shader_util.c
>  create mode 100644 src/amd/common/ac_shader_util.h
>
> diff --git a/src/amd/Makefile.sources b/src/amd/Makefile.sources
> index 1bc5a7fe7e..10c4827e19 100644
> --- a/src/amd/Makefile.sources
> +++ b/src/amd/Makefile.sources
> @@ -46,7 +46,10 @@ AMD_COMPILER_FILES = \
> common/ac_llvm_util.h \
> common/ac_shader_abi.h \
> common/ac_shader_info.c \
> -   common/ac_shader_info.h
> +   common/ac_shader_info.h \
> +   common/ac_shader_util.c \
> +   common/ac_shader_util.h
> +
>
>  AMD_NIR_FILES = \
> common/ac_nir_to_llvm.c \
> diff --git a/src/amd/common/ac_shader_util.c b/src/amd/common/ac_shader_util.c
> new file mode 100644
> index 00..9d33a46559
> --- /dev/null
> +++ b/src/amd/common/ac_shader_util.c
> @@ -0,0 +1,45 @@
> +/*
> + * Copyright 2012 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include "ac_shader_util.h"
> +#include "sid.h"
> +
> +unsigned
> +ac_get_spi_shader_z_format(bool writes_z, bool writes_stencil,
> +  bool writes_samplemask)
> +{
> +   if (writes_z) {
> +   /* Z needs 32 bits. */
> +   if (writes_samplemask)
> +   return V_028710_SPI_SHADER_32_ABGR;
> +   else if (writes_stencil)
> +   return V_028710_SPI_SHADER_32_GR;
> +   else
> +   return V_028710_SPI_SHADER_32_R;
> +   } else if (writes_stencil || writes_samplemask) {
> +   /* Both stencil and sample mask need only 16 bits. */
> +   return V_028710_SPI_SHADER_UINT16_ABGR;
> +   } else {
> +   return V_028710_SPI_SHADER_ZERO;
> +   }
> +}
> diff --git a/src/amd/common/ac_shader_util.h b/src/amd/common/ac_shader_util.h
> new file mode 100644
> index 00..1f971e76f1
> --- /dev/null
> +++ b/src/amd/common/ac_shader_util.h
> @@ -0,0 +1,33 @@
> +/*
> + * Copyright 2012 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#ifndef AC_SHADER_UTIL_H
> +#define 
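
ac_get_spi_shader_z_format picks the narrowest export format that holds everything the fragment shader writes: depth forces a 32-bit format, while stencil and sample mask fit in 16 bits. A self-contained C sketch of the same decision table, with a hypothetical local enum in place of the V_028710_* constants from sid.h (their real values are not reproduced here):

```c
#include <stdbool.h>

/* Hypothetical stand-ins for V_028710_SPI_SHADER_* (values are arbitrary). */
enum z_export_fmt {
    FMT_ZERO,        /* nothing exported */
    FMT_32_R,        /* Z only */
    FMT_32_GR,       /* Z + stencil */
    FMT_32_ABGR,     /* Z + sample mask, with or without stencil */
    FMT_UINT16_ABGR, /* stencil and/or sample mask, no Z */
};

static enum z_export_fmt z_format(bool writes_z, bool writes_stencil,
                                  bool writes_samplemask)
{
    if (writes_z) {
        if (writes_samplemask)
            return FMT_32_ABGR;
        return writes_stencil ? FMT_32_GR : FMT_32_R;
    }
    if (writes_stencil || writes_samplemask)
        return FMT_UINT16_ABGR;
    return FMT_ZERO;
}
```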

Re: [Mesa-dev] [PATCH v2 3/4] meson: build clover

2017-12-14 Thread Jan Vesely
On Wed, 2017-12-13 at 14:56 -0800, Dylan Baker wrote:
> Quoting Jan Vesely (2017-12-13 14:23:21)
> > On Wed, 2017-12-13 at 13:54 -0800, Dylan Baker wrote:
> > > Quoting Jan Vesely (2017-12-13 12:53:25)
> > > > On Wed, 2017-12-13 at 09:47 -0800, Dylan Baker wrote:
> > > > > +if (with_gallium_va or with_gallium_vdpau or with_gallium_omx or
> > > > > +with_gallium_xvmc or with_dri)
> > > > > +  pipe_loader_link_with += libgalliumvl
> > > > > +else
> > > > > +  pipe_loader_link_with += libgalliumvl_stubs
> > > > > +endif
> > > > > +if with_gallium_va or with_gallium_vdpau or with_gallium_omx or 
> > > > > with_gallium_xvmc  
> > > > 
> > > > git am complains about whitespace errors at the end of the above line.
> > > 
> > > I can fix that.
> > > 
> > > > 
> > > > I tested with:
> > > > meson -Ddri-drivers= -Dgallium-drivers=r600 -Dopengl=true -Dplatforms=x11 -Dopencl=true
> > > > 
> > > > meson asked for libdrm_amdgpu dependency even though I'm only building 
> > > > clover+r600g driver.
> > > 
> > > That's probably because you didn't add `-Dvulkan-drivers=`, since radv does depend
> > > on libdrm_amdgpu. If you add that and still get a request for libdrm_amdgpu let
> > > me know and I'll look into it further.
> > 
> > right, that fixes it. sorry for the noise.
> > 
> > > 
> > > > after a bit of fiddling with PATH and PKG_CONFIG_PATH to pick up the
> > > > latest llvm/libclc, linking failed with:
> > > > src/gallium/auxiliary/libgallium.a(gallivm_lp_bld_misc.cpp.o):(.data.rel.ro._ZTI26DelegatingJITMemoryManager[_ZTI26DelegatingJITMemoryManager]+0x10): undefined reference to `typeinfo for llvm::RTDyldMemoryManager'
> > > > collect2: error: ld returned 1 exit status
> > > > 
> > > > this looks like it did not pick up the rtti setting from llvm-config:
> > > > $ ~/.local/bin/llvm-config --has-rtti
> > > > NO
> > > > $ ~/.local/bin/llvm-config --cxxflags | grep -o fno-rtti
> > > > fno-rtti
> > > > 
> > > > rtti setting is quite messy since clover uses dynamic_cast. I think it
> > > > should be OK to only support rtti build of llvm if it's detected at
> > > > configure time
> > > > 
> > > > it'd also be nice for meson to remember llvm-config location provided
> > > > at configure time. otherwise I need to set PATH every time I run ninja
> > > > in case it tries to reconfigure. I guess that's what "TODO llvm-prefix" 
> > > > will achieve, right?
> > > 
> > > I'm not sure what the right way to solve this is, maybe to cache any relevant
> > > environment variables between runs in meson itself, since pkg-config has the
> > > same problem with PKG_CONFIG_PATH. ATM there is no way to implement the
> > > llvm-prefix in meson the way it is in autotools.
> > 
> > would it be easier to explicitly set location of llvm-config and
> > libclc.pc?
> > currently it works OK with system packages,
> > you can add Tested-by: Jan Vesely 
> > 
> > I'd need a way to permanently redirect the configuration to use local
> > builds of both llvm and libclc to use meson as my daily driver.
> > 
> > >
> > > 
> > > > 
> > > > in the end I got meson built clover to run (clinfo + simple demo) on my
> > > > turks with these changes:
> > > > * build and install libdrm_amdgpu -- should not be necessary for r600g
> > > > only build
> > > > * switch to distro (fedora) provided libclc and llvm -- avoids rtti
> > > > build problem (note libclc is just tagging along llvm since my local
> > > > builds install headers to the same location)
> > > > * fiddle with pipe-loader dir, for some reason LIBGL_DRIVERS_PATH did
> > > > not work when pointed to meson built pipe_r600.so. I'm not sure if this
> > > > is meson specific, it might be just my ignorance.
> > > 
> > > I'm not certain, though Curro probably knows, but the dynamic pipeloader is
> > > hard-coded to search $install/$libdir/gallium-pipe for pipe drivers, so you may
> > > need to run ninja install to make it work. Alternatively LD_LIBRARY_PATH might
> > > work as well.
> > 
> > I assumed that it was some loader configuration outside meson. the
> > surprising part was that it tried to open cwd local paths:
> > openat(AT_FDCWD, "lib64/gallium-pipe/pipe_r600.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> > 
> > Is this expected without explicitly setting install prefix?
> > 
> > Jan
> 
> If you don't set a prefix you'll get the system default, for fedora (because of
> the merged /usr) that's /. Does it work with autotools without install?

I use a symlink from install target to build dir to make it work on
autotools build.
My point was that meson defines PIPE_SEARCH_DIR to be relative path
'-DPIPE_SEARCH_DIR="lib64/gallium-pipe"'
even if I configure meson using --prefix=$HOME/.local/

Jan

> I can't imagine it would. But dynamic library loading in C is hardly my domain of
> expertise so I could be totally wrong :)
> 
> Dylan
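
The strace line above shows why a relative PIPE_SEARCH_DIR such as "lib64/gallium-pipe" fails: openat(AT_FDCWD, ...) resolves a relative path against the current working directory, not against the install prefix. A hedged C sketch of anchoring the directory to the configured prefix, assuming libdir is given relative to prefix (as meson's libdir option usually is); make_pipe_search_dir is a hypothetical helper, not the actual meson or pipe-loader code:

```c
#include <stdio.h>

/* Hypothetical helper: join prefix and libdir into an absolute search dir.
 * An already-absolute libdir is used as-is. Returns 0 on success, -1 if the
 * result would not fit in out. */
static int make_pipe_search_dir(char *out, size_t out_size, const char *prefix,
                                const char *libdir)
{
    int n;
    if (libdir[0] == '/')
        n = snprintf(out, out_size, "%s/gallium-pipe", libdir);
    else
        n = snprintf(out, out_size, "%s/%s/gallium-pipe", prefix, libdir);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With --prefix=$HOME/.local this would yield an absolute path under $HOME/.local/lib64 instead of a cwd-relative one.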



Re: [Mesa-dev] [PATCH 3/3] radv: set FORCE_SIMD_DIST(1) for compute when profitable

2017-12-14 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

Would it make sense to move the compute_resource_limits calculation to
pipeline creation time?

On Thu, Dec 14, 2017 at 3:51 PM, Samuel Pitoiset wrote:
> Ported from RadeonSI.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_cmd_buffer.c | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
> index d6aaff707b..4a048485c8 100644
> --- a/src/amd/vulkan/radv_cmd_buffer.c
> +++ b/src/amd/vulkan/radv_cmd_buffer.c
> @@ -2561,6 +2561,7 @@ radv_emit_compute_pipeline(struct radv_cmd_buffer *cmd_buffer)
>  {
> struct radv_shader_variant *compute_shader;
> struct radv_pipeline *pipeline = cmd_buffer->state.compute_pipeline;
> +   struct radv_device *device = cmd_buffer->device;
> unsigned compute_resource_limits;
> unsigned waves_per_threadgroup;
> uint64_t va;
> @@ -2602,6 +2603,19 @@ radv_emit_compute_pipeline(struct radv_cmd_buffer *cmd_buffer)
> compute_resource_limits =
> S_00B854_SIMD_DEST_CNTL(waves_per_threadgroup % 4 == 0);
>
> +   if (device->physical_device->rad_info.chip_class >= CIK) {
> +   unsigned num_cu_per_se =
> +   device->physical_device->rad_info.num_good_compute_units /
> +   device->physical_device->rad_info.max_se;
> +
> +   /* Force even distribution on all SIMDs in CU if the workgroup
> +* size is 64. This has shown some good improvements if # of
> +* CUs per SE is not a multiple of 4.
> +*/
> +   if (num_cu_per_se % 4 && waves_per_threadgroup == 1)
> +   compute_resource_limits |= S_00B854_FORCE_SIMD_DIST(1);
> +   }
> +
> radeon_set_sh_reg(cmd_buffer->cs, R_00B854_COMPUTE_RESOURCE_LIMITS,
>   compute_resource_limits);
>
> --
> 2.15.1
>
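
The heuristic in the quoted patch depends only on device properties and the workgroup size, which is why computing it at pipeline creation time, as asked above, looks feasible. A sketch of it as a pure C function, assuming waves_per_threadgroup is the workgroup's thread count divided by 64 and rounded up (that computation is not shown in the hunk); the function and struct names are hypothetical, and it returns flags rather than register bits:

```c
#include <stdbool.h>

/* Flags for the two COMPUTE_RESOURCE_LIMITS fields touched by the patch. */
struct simd_dist_flags {
    bool simd_dest_cntl;   /* S_00B854_SIMD_DEST_CNTL */
    bool force_simd_dist;  /* S_00B854_FORCE_SIMD_DIST */
};

/* is_cik_or_newer models "chip_class >= CIK". */
static struct simd_dist_flags
compute_simd_dist(bool is_cik_or_newer, unsigned num_good_cus, unsigned max_se,
                  unsigned threads_per_group)
{
    /* Assumption: waves are 64 threads wide. */
    unsigned waves_per_threadgroup = (threads_per_group + 63) / 64;
    struct simd_dist_flags f = {
        .simd_dest_cntl = waves_per_threadgroup % 4 == 0,
        .force_simd_dist = false,
    };

    /* max_se != 0 is a guard added for this sketch only. */
    if (is_cik_or_newer && max_se != 0) {
        unsigned num_cu_per_se = num_good_cus / max_se;
        f.force_simd_dist = (num_cu_per_se % 4) != 0 && waves_per_threadgroup == 1;
    }
    return f;
}
```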


Re: [Mesa-dev] [PATCH 3/3] radv: do not load unused gl_LocalInvocationID/gl_WorkGroupID components

2017-12-14 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

for the series.

On Thu, Dec 14, 2017 at 4:48 PM, Samuel Pitoiset wrote:
> We should also not load the input SGPRs and VGPRs, but
> let's start with this for now.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_shader.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
> index 4a3fdfa80e..907c1986f8 100644
> --- a/src/amd/vulkan/radv_shader.c
> +++ b/src/amd/vulkan/radv_shader.c
> @@ -392,13 +392,18 @@ radv_fill_shader_variant(struct radv_device *device,
> break;
> case MESA_SHADER_FRAGMENT:
> break;
> -   case MESA_SHADER_COMPUTE:
> +   case MESA_SHADER_COMPUTE: {
> +   struct ac_shader_info *info = &variant->info.info;
> variant->rsrc2 |=
> -   S_00B84C_TGID_X_EN(1) | S_00B84C_TGID_Y_EN(1) |
> -   S_00B84C_TGID_Z_EN(1) | S_00B84C_TIDIG_COMP_CNT(2) |
> +   S_00B84C_TGID_X_EN(info->cs.uses_block_id[0]) |
> +   S_00B84C_TGID_Y_EN(info->cs.uses_block_id[1]) |
> +   S_00B84C_TGID_Z_EN(info->cs.uses_block_id[2]) |
> +   S_00B84C_TIDIG_COMP_CNT(info->cs.uses_thread_id[2] ? 2 :
> +   info->cs.uses_thread_id[1] ? 1 : 0) |
> S_00B84C_TG_SIZE_EN(1) |
> S_00B84C_LDS_SIZE(variant->config.lds_size);
> break;
> +   }
> default:
> unreachable("unsupported shader type");
> break;
> --
> 2.15.1
>
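
With this patch the TGID_*_EN bits follow uses_block_id[] and TIDIG_COMP_CNT is derived from the highest gl_LocalInvocationID component the shader reads (0 loads x only, 1 loads x and y, 2 loads all three). A C sketch of that derivation, using a hypothetical struct that mirrors the relevant ac_shader_info::cs fields:

```c
#include <stdbool.h>

/* Hypothetical mirror of the ac_shader_info::cs fields used by the patch. */
struct cs_usage {
    bool uses_block_id[3];   /* gl_WorkGroupID.x/y/z */
    bool uses_thread_id[3];  /* gl_LocalInvocationID.x/y/z */
};

/* Value for TIDIG_COMP_CNT: index of the highest thread-ID component read. */
static unsigned tidig_comp_cnt(const struct cs_usage *u)
{
    return u->uses_thread_id[2] ? 2 : u->uses_thread_id[1] ? 1 : 0;
}

/* Bitmask of enabled TGID components: bit i set when gl_WorkGroupID[i] is read. */
static unsigned tgid_enable_mask(const struct cs_usage *u)
{
    unsigned mask = 0;
    for (unsigned i = 0; i < 3; i++)
        if (u->uses_block_id[i])
            mask |= 1u << i;
    return mask;
}
```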


Re: [Mesa-dev] [PATCH 1/3] amd/common: scan which components of gl_WorkGroupID are used

2017-12-14 Thread Bas Nieuwenhuizen
On Thu, Dec 14, 2017 at 4:48 PM, Samuel Pitoiset
 wrote:
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/common/ac_shader_info.c | 8 
>  src/amd/common/ac_shader_info.h | 1 +
>  2 files changed, 9 insertions(+)
>
> diff --git a/src/amd/common/ac_shader_info.c b/src/amd/common/ac_shader_info.c
> index 09dd4bbd55..01949770d6 100644
> --- a/src/amd/common/ac_shader_info.c
> +++ b/src/amd/common/ac_shader_info.c
> @@ -45,6 +45,14 @@ gather_intrinsic_info(nir_intrinsic_instr *instr, struct ac_shader_info *info)
> case nir_intrinsic_load_num_work_groups:
> info->cs.uses_grid_size = true;
> break;
> +   case nir_intrinsic_load_work_group_id: {
> +   unsigned mask = nir_ssa_def_components_read(>dest.ssa);

Nice find that there is a utility function for this.

Reviewed-by: Bas Nieuwenhuizen 
> +   while (mask) {
> +   unsigned i = u_bit_scan();
> +   info->cs.uses_block_id[i] = true;
> +   }
> +   break;
> +   }
> case nir_intrinsic_load_sample_id:
> info->ps.force_persample = true;
> break;
> diff --git a/src/amd/common/ac_shader_info.h b/src/amd/common/ac_shader_info.h
> index 3c809cce13..7beefd02ac 100644
> --- a/src/amd/common/ac_shader_info.h
> +++ b/src/amd/common/ac_shader_info.h
> @@ -43,6 +43,7 @@ struct ac_shader_info {
> } ps;
> struct {
> bool uses_grid_size;
> +   bool uses_block_id[3];
> } cs;
>  };
>
> --
> 2.15.1
>


Re: [Mesa-dev] [PATCH] radv: do not load the local invocation index when it's unused

2017-12-14 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

On Thu, Dec 14, 2017 at 5:32 PM, Samuel Pitoiset
 wrote:
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/common/ac_nir_to_llvm.c | 3 ++-
>  src/amd/common/ac_shader_info.c | 3 +++
>  src/amd/common/ac_shader_info.h | 1 +
>  src/amd/vulkan/radv_shader.c| 2 +-
>  4 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index 0e1d7e0082..2fe346b012 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -751,7 +751,8 @@ static void create_function(struct nir_to_llvm_context *ctx,
>>num_work_groups);
> }
> add_sgpr_argument(, ctx->ac.v3i32, >workgroup_ids);
> -   add_sgpr_argument(, ctx->ac.i32, >tg_size);
> +   if (ctx->shader_info->info.cs.uses_local_invocation_idx)
> +   add_sgpr_argument(, ctx->ac.i32, >tg_size);
> add_vgpr_argument(, ctx->ac.v3i32, >local_invocation_ids);
> break;
> case MESA_SHADER_VERTEX:
> diff --git a/src/amd/common/ac_shader_info.c b/src/amd/common/ac_shader_info.c
> index 87744ed23e..3299b47e6b 100644
> --- a/src/amd/common/ac_shader_info.c
> +++ b/src/amd/common/ac_shader_info.c
> @@ -58,6 +58,9 @@ gather_intrinsic_info(nir_intrinsic_instr *instr, struct ac_shader_info *info)
> }
> break;
> }
> +   case nir_intrinsic_load_local_invocation_index:
> +   info->cs.uses_local_invocation_idx = true;
> +   break;
> case nir_intrinsic_load_sample_id:
> info->ps.force_persample = true;
> break;
> diff --git a/src/amd/common/ac_shader_info.h b/src/amd/common/ac_shader_info.h
> index 0136d5af40..79e5615254 100644
> --- a/src/amd/common/ac_shader_info.h
> +++ b/src/amd/common/ac_shader_info.h
> @@ -45,6 +45,7 @@ struct ac_shader_info {
> bool uses_grid_size;
> bool uses_block_id[3];
> bool uses_thread_id[3];
> +   bool uses_local_invocation_idx;
> } cs;
>  };
>
> diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
> index 907c1986f8..ab8ba42511 100644
> --- a/src/amd/vulkan/radv_shader.c
> +++ b/src/amd/vulkan/radv_shader.c
> @@ -400,7 +400,7 @@ radv_fill_shader_variant(struct radv_device *device,
> S_00B84C_TGID_Z_EN(info->cs.uses_block_id[2]) |
> S_00B84C_TIDIG_COMP_CNT(info->cs.uses_thread_id[2] ? 2 :
> info->cs.uses_thread_id[1] ? 1 : 0) |
> -   S_00B84C_TG_SIZE_EN(1) |
> +   S_00B84C_TG_SIZE_EN(info->cs.uses_local_invocation_idx) |
> S_00B84C_LDS_SIZE(variant->config.lds_size);
> break;
> }
> --
> 2.15.1
>


Re: [Mesa-dev] [PATCH 0/4] GL_EXT_disjoint_timer_query series

2017-12-14 Thread Ian Romanick
Since you remembered to modify dispatch_sanity.cpp in patch 2, I'm going
to assume that 'make check' still passes.  If that's the case, the series is

Reviewed-by: Ian Romanick 

On 12/14/2017 04:03 AM, Tapani Pälli wrote:
> Hi;
> 
> Here's a revisited GL_EXT_disjoint_timer_query series. One patch was
> dropped (as discussed with Lionel), and enabling is now done via the
> EXT_disjoint_timer_query boolean, as was intended (Ian).
> 
> Thanks;
> 
> Tapani Pälli (4):
>   mesa: add DisjointOperation to gl_shared_state
>   glapi: add GL_EXT_disjoint_timer_query
>   mesa: GL_EXT_disjoint_timer_query extension API bits
>   i965: enable EXT_disjoint_timer_query extension
> 
>  src/mapi/glapi/gen/es_EXT.xml| 16 
>  src/mapi/glapi/gen/gl_API.xml|  4 ++--
>  src/mesa/drivers/dri/i965/intel_extensions.c |  2 ++
>  src/mesa/main/extensions_table.h |  1 +
>  src/mesa/main/get.c  | 17 +
>  src/mesa/main/get_hash_params.py |  5 +
>  src/mesa/main/glheader.h |  4 
>  src/mesa/main/mtypes.h   |  9 +
>  src/mesa/main/queryobj.c |  3 ++-
>  src/mesa/main/robustness.c   |  1 +
>  src/mesa/main/tests/dispatch_sanity.cpp  |  5 +
>  11 files changed, 64 insertions(+), 3 deletions(-)
> 



[Mesa-dev] [PATCH] util: scons: wire up the sha1 test

2017-12-14 Thread Emil Velikov
From: Emil Velikov 

Cc: 
Fixes: 513d7ffa23d ("util: Add a SHA1 unit test program")
Signed-off-by: Emil Velikov 
---
We want this and the original commit for stable, to catch any 
breakage that may happen.

 src/util/SConscript | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/util/SConscript b/src/util/SConscript
index 0c3c98a5f4c..66a0d1c04ff 100644
--- a/src/util/SConscript
+++ b/src/util/SConscript
@@ -63,3 +63,10 @@ roundeven_test = env.Program(
 source = ['roundeven_test.c'],
 )
 env.UnitTest("roundeven_test", roundeven_test)
+
+env.Prepend(LIBS = [mesautil])
+mesa_sha1_test = env.Program(
+target = 'mesa-sha1_test',
+source = ['mesa-sha1_test.c'],
+)
+env.UnitTest("mesa-sha1_test", mesa_sha1_test)
-- 
2.15.0



[Mesa-dev] [PATCH v2 2/5] i965: set ASTC5x5 workaround texture type tracking on texture validate

2017-12-14 Thread kevin . rogovin
From: Kevin Rogovin 

Signed-off-by: Kevin Rogovin 
---
 src/mesa/drivers/dri/i965/intel_tex_validate.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/intel_tex_validate.c b/src/mesa/drivers/dri/i965/intel_tex_validate.c
index 2b7798c..812c0c7 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_validate.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_validate.c
@@ -188,11 +188,24 @@ brw_validate_textures(struct brw_context *brw)
struct gl_context *ctx = >ctx;
const int max_enabled_unit = ctx->Texture._MaxEnabledTexImageUnit;
 
+   brw->astc5x5_wa.texture_astc5x5_present = false;
+   brw->astc5x5_wa.texture_with_auxilary_present = false;
for (int unit = 0; unit <= max_enabled_unit; unit++) {
   struct gl_texture_unit *tex_unit = >Texture.Unit[unit];
 
   if (tex_unit->_Current) {
+ struct intel_texture_object *tex =
+intel_texture_object(tex_unit->_Current);
+ struct intel_mipmap_tree *mt = tex->mt;
+
  intel_finalize_mipmap_tree(brw, unit);
+ if (mt && mt->aux_usage != ISL_AUX_USAGE_NONE) {
+brw->astc5x5_wa.texture_with_auxilary_present = true;
+ }
+ if (tex->_Format == MESA_FORMAT_RGBA_ASTC_5x5 ||
+ tex->_Format == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5) {
+brw->astc5x5_wa.texture_astc5x5_present = true;
+ }
   }
}
 }
-- 
2.7.4
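Patch 2/5 recomputes two flags on every texture validation: whether any bound texture is ASTC5x5, and whether any bound texture has an auxiliary surface. A minimal standalone sketch of that scan, assuming a texture can be reduced to two booleans and a unit without a current texture to NULL; struct sketch_texture and sketch_scan_textures() are hypothetical simplifications, not i965 types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the texture bound to one unit. */
struct sketch_texture {
   bool is_astc5x5; /* RGBA or SRGB8_ALPHA8 ASTC 5x5 format */
   bool has_aux;    /* miptree aux_usage != ISL_AUX_USAGE_NONE */
};

struct sketch_wa_flags {
   bool texture_astc5x5_present;
   bool texture_with_aux_present;
};

/* Start from cleared flags on every pass, as the patch does before its
 * loop, then set each flag if any bound texture matches. NULL entries
 * stand for units with no current texture. */
static struct sketch_wa_flags
sketch_scan_textures(const struct sketch_texture *const *units, size_t count)
{
   struct sketch_wa_flags f = {false, false};

   for (size_t i = 0; i < count; i++) {
      if (units[i] == NULL)
         continue;
      if (units[i]->has_aux)
         f.texture_with_aux_present = true;
      if (units[i]->is_astc5x5)
         f.texture_astc5x5_present = true;
   }
   return f;
}
```

Clearing both flags before the loop matters: otherwise a flag set by a texture bound for an earlier draw would stay set after that texture is unbound.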



[Mesa-dev] [PATCH v2 1/5] i965: define astx5x5 workaround infrastructure

2017-12-14 Thread kevin . rogovin
From: Kevin Rogovin 

Signed-off-by: Kevin Rogovin 
---
 src/mesa/drivers/dri/i965/Makefile.sources|  1 +
 src/mesa/drivers/dri/i965/brw_context.c   |  6 +
 src/mesa/drivers/dri/i965/brw_context.h   | 24 ++
 src/mesa/drivers/dri/i965/gen9_astc5x5_wa.c   | 36 +++
 src/mesa/drivers/dri/i965/intel_batchbuffer.c |  1 +
 src/mesa/drivers/dri/i965/meson.build |  1 +
 6 files changed, 69 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/gen9_astc5x5_wa.c

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources
index d928f71..4698fcb 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -77,6 +77,7 @@ i965_FILES = \
gen7_urb.c \
gen8_depth_state.c \
gen8_multisample_state.c \
+gen9_astc5x5_wa.c \
hsw_queryobj.c \
hsw_sol.c \
intel_batchbuffer.c \
diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index 126c187..f3ccbda 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -1073,6 +1073,12 @@ brwCreateContext(gl_api api,
if (ctx->Extensions.INTEL_performance_query)
   brw_init_performance_queries(brw);
 
+   brw->astc5x5_wa.required = (devinfo->gen == 9);
+   brw->astc5x5_wa.mode = BRW_ASTC5x5_WA_MODE_NONE;
+   brw->astc5x5_wa.texture_astc5x5_present = false;
+   brw->astc5x5_wa.texture_with_auxilary_present = false;
+   brw->astc5x5_wa.blorp_sampling_from_astc5x5 = false;
+
vbo_use_buffer_objects(ctx);
vbo_always_unmap_buffers(ctx);
 
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 0f0aad8..60a1d3b 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -166,6 +166,12 @@ enum brw_cache_id {
BRW_MAX_CACHE
 };
 
+enum brw_astc5x5_wa_mode_t {
+   BRW_ASTC5x5_WA_MODE_NONE,
+   BRW_ASTC5x5_WA_MODE_HAS_ASTC5x5,
+   BRW_ASTC5x5_WA_MODE_HAS_AUX,
+};
+
 enum brw_state_id {
/* brw_cache_ids must come first - see brw_program_cache.c */
BRW_STATE_URB_FENCE = BRW_MAX_CACHE,
@@ -1263,6 +1269,19 @@ struct brw_context
 */
bool draw_aux_buffer_disabled[MAX_DRAW_BUFFERS];
 
+   /* Certain GENs have a hardware bug where the sampler hangs if it attempts
+* to access auxiliary buffers and an ASTC5x5 compressed buffer. The workaround
+* is to make sure that the texture cache is cleared between such accesses
+* and that such accesses have a command streamer stall between them.
+*/
+   struct {
+  bool required;
+  enum brw_astc5x5_wa_mode_t mode;
+  bool texture_astc5x5_present;
+  bool texture_with_auxilary_present;
+  bool blorp_sampling_from_astc5x5;
+   } astc5x5_wa;
+
__DRIcontext *driContext;
struct intel_screen *screen;
 };
@@ -1695,6 +1714,11 @@ void brw_query_internal_format(struct gl_context *ctx, GLenum target,
GLenum internalFormat, GLenum pname,
GLint *params);
 
+/* gen9_astc5x5_wa.c */
+void gen9_set_astc5x5_wa_mode(struct brw_context *brw,
+ enum brw_astc5x5_wa_mode_t mode);
+void gen9_astc5x5_perform_wa(struct brw_context *brw);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/src/mesa/drivers/dri/i965/gen9_astc5x5_wa.c b/src/mesa/drivers/dri/i965/gen9_astc5x5_wa.c
new file mode 100644
index 000..247fd00
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/gen9_astc5x5_wa.c
@@ -0,0 +1,36 @@
+#include "brw_context.h"
+#include "brw_defines.h"
+#include "intel_mipmap_tree.h"
+
+void
+gen9_set_astc5x5_wa_mode(struct brw_context *brw,
+enum brw_astc5x5_wa_mode_t mode)
+{
+   if (!brw->astc5x5_wa.required ||
+   mode == BRW_ASTC5x5_WA_MODE_NONE ||
+   brw->astc5x5_wa.mode == mode) {
+  return;
+   }
+
+   if (brw->astc5x5_wa.mode != BRW_ASTC5x5_WA_MODE_NONE) {
+  const uint32_t flags = PIPE_CONTROL_CS_STALL |
+ PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
+  brw_emit_pipe_control_flush(brw, flags);
+   }
+
+   brw->astc5x5_wa.mode = mode;
+}
+
+void
+gen9_astc5x5_perform_wa(struct brw_context *brw)
+{
+   if (!brw->astc5x5_wa.required) {
+  return;
+   }
+
+   if (brw->astc5x5_wa.texture_astc5x5_present) {
+  gen9_set_astc5x5_wa_mode(brw, BRW_ASTC5x5_WA_MODE_HAS_ASTC5x5);
+   } else if (brw->astc5x5_wa.texture_with_auxilary_present) {
+  gen9_set_astc5x5_wa_mode(brw, BRW_ASTC5x5_WA_MODE_HAS_AUX);
+   }
+}
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 91a6506..b7e2450 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -613,6 +613,7 @@ brw_new_batch(struct brw_context *brw)
 
/* Create a new batchbuffer and 

[Mesa-dev] [PATCH v2 4/5] i965: use ASTC5x5 workaround in brw_compute

2017-12-14 Thread kevin . rogovin
From: Kevin Rogovin 

Signed-off-by: Kevin Rogovin 
---
 src/mesa/drivers/dri/i965/brw_compute.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_compute.c b/src/mesa/drivers/dri/i965/brw_compute.c
index 9be7523..c8d90f5 100644
--- a/src/mesa/drivers/dri/i965/brw_compute.c
+++ b/src/mesa/drivers/dri/i965/brw_compute.c
@@ -179,6 +179,12 @@ brw_dispatch_compute_common(struct gl_context *ctx)
 
brw_predraw_resolve_inputs(brw, false);
 
+   /* if necessary, perform astc5x5 workarounds to make sure sampling
+* from astc5x5 and textures with an auxiliary surface have a command
+* streamer stall and texture invalidate between them.
+*/
+   gen9_astc5x5_perform_wa(brw);
+
/* Flush the batch if the batch/state buffers are nearly full.  We can
 * grow them if needed, but this is not free, so we'd like to avoid it.
 */
-- 
2.7.4



[Mesa-dev] [PATCH v2 3/5] i965: use ASTC5x5 workaround in brw_draw

2017-12-14 Thread kevin . rogovin
From: Kevin Rogovin 

Signed-off-by: Kevin Rogovin 
---
 src/mesa/drivers/dri/i965/brw_draw.c | 16 ++--
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c |  5 +
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index 7e29dcf..2d3fb75 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -376,6 +376,8 @@ intel_disable_rb_aux_buffer(struct brw_context *brw,
  *
  * Resolve the depth buffer's HiZ buffer, resolve the depth buffer of each
  * enabled depth texture, and flush the render cache for any dirty textures.
+ * In addition, if the ASTC5x5 workaround is needed and if ASTC5x5 textures
+ * are present, resolve textures so that auxiliary buffers are not needed.
  */
 void
 brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering)
@@ -413,9 +415,13 @@ brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering)
  num_layers = INTEL_REMAINING_LAYERS;
   }
 
-  const bool disable_aux = rendering &&
+  const bool astc_disables_aux = (brw->astc5x5_wa.required &&
+ brw->astc5x5_wa.texture_astc5x5_present &&
+ tex_obj->mt->aux_usage != ISL_AUX_USAGE_NONE);
+
+  const bool disable_aux = (rendering &&
  intel_disable_rb_aux_buffer(brw, tex_obj->mt, min_level, num_levels,
- "for sampling");
+ "for sampling")) || astc_disables_aux;
 
   intel_miptree_prepare_texture(brw, tex_obj->mt, view_format,
 min_level, num_levels,
@@ -684,6 +690,12 @@ brw_prepare_drawing(struct gl_context *ctx,
brw_predraw_resolve_inputs(brw, true);
brw_predraw_resolve_framebuffer(brw);
 
+   /* if necessary, perform astc5x5 workarounds to make sure sampling
+* from astc5x5 and textures with an auxiliary surface have a command
+* streamer stall and texture invalidate between them.
+*/
+   gen9_astc5x5_perform_wa(brw);
+
/* Bind all inputs, derive varying and size information:
 */
brw_merge_inputs(brw, arrays);
diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index adf60a8..ccdb537 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -447,6 +447,11 @@ brw_aux_surface_disabled(const struct brw_context *brw,
 {
const struct gl_framebuffer *fb = brw->ctx.DrawBuffer;
 
+   if (brw->astc5x5_wa.required &&
+   brw->astc5x5_wa.texture_astc5x5_present) {
+  return true;
+   }
+
for (unsigned i = 0; i < fb->_NumColorDrawBuffers; i++) {
   const struct intel_renderbuffer *irb =
  intel_renderbuffer(fb->_ColorDrawBuffers[i]);
-- 
2.7.4
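The new condition in brw_predraw_resolve_inputs() combines the existing render-buffer check with the ASTC5x5 case. A minimal sketch of that boolean logic, assuming the render-buffer check can be reduced to a precomputed flag; rb_aux_conflict is a hypothetical name standing in for the intel_disable_rb_aux_buffer() call, which in the real code has side effects and, thanks to short-circuiting, only runs when rendering is true:

```c
#include <assert.h>
#include <stdbool.h>

/* Aux data of a sampled texture is dropped either for the pre-existing
 * render-target reason, or when the ASTC5x5 workaround is required, an
 * ASTC5x5 texture is bound, and this texture has an aux surface.
 * All parameter names are hypothetical. */
static bool
sketch_disable_aux(bool rendering, bool rb_aux_conflict,
                   bool wa_required, bool astc5x5_present, bool tex_has_aux)
{
   const bool astc_disables_aux =
      wa_required && astc5x5_present && tex_has_aux;

   return (rendering && rb_aux_conflict) || astc_disables_aux;
}
```

The effect is that whenever an ASTC5x5 texture is bound on Gen9, every other sampled texture is resolved so that no auxiliary surface is read in the same draw.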



[Mesa-dev] [PATCH v2 5/5] i965: ASTC5x5 workaround logic for blorp

2017-12-14 Thread kevin . rogovin
From: Kevin Rogovin 

Signed-off-by: Kevin Rogovin 
---
 src/mesa/drivers/dri/i965/genX_blorp_exec.c |  5 +
 src/mesa/drivers/dri/i965/intel_tex_image.c | 16 
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/genX_blorp_exec.c b/src/mesa/drivers/dri/i965/genX_blorp_exec.c
index e8bc52e..97791b7 100644
--- a/src/mesa/drivers/dri/i965/genX_blorp_exec.c
+++ b/src/mesa/drivers/dri/i965/genX_blorp_exec.c
@@ -230,6 +230,11 @@ genX(blorp_exec)(struct blorp_batch *batch,
struct gl_context *ctx = >ctx;
bool check_aperture_failed_once = false;
 
+   if (brw->astc5x5_wa.blorp_sampling_from_astc5x5) {
+  gen9_set_astc5x5_wa_mode(brw, BRW_ASTC5x5_WA_MODE_HAS_ASTC5x5);
+   } else {
+  gen9_set_astc5x5_wa_mode(brw, BRW_ASTC5x5_WA_MODE_HAS_AUX);
+   }
/* Flush the sampler and render caches.  We definitely need to flush the
 * sampler cache so that we get updated contents from the render cache for
 * the glBlitFramebuffer() source.  Also, we are sometimes warned in the
diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c b/src/mesa/drivers/dri/i965/intel_tex_image.c
index 37c8e24..60028bb 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -759,10 +759,18 @@ intel_get_tex_sub_image(struct gl_context *ctx,
DBG("%s\n", __func__);
 
if (_mesa_is_bufferobj(ctx->Pack.BufferObj)) {
-  if (intel_gettexsubimage_blorp(brw, texImage,
- xoffset, yoffset, zoffset,
- width, height, depth, format, type,
- pixels, >Pack))
+  bool blorp_success;
+
+  brw->astc5x5_wa.blorp_sampling_from_astc5x5 =
+ (texImage->TexFormat == MESA_FORMAT_RGBA_ASTC_5x5 ||
+  texImage->TexFormat == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5);
+  blorp_success = intel_gettexsubimage_blorp(brw, texImage,
+ xoffset, yoffset, zoffset,
+ width, height, depth,
+ format, type, pixels,
+ >Pack);
+  brw->astc5x5_wa.blorp_sampling_from_astc5x5 = false;
+  if (blorp_success)
  return;
 
   perf_debug("%s: fallback to CPU mapping in PBO case\n", __func__);
-- 
2.7.4



[Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround

2017-12-14 Thread kevin . rogovin
From: Kevin Rogovin 

This patch series implements a required Gen9 workaround for ASTC5x5
sampler reads. The crux of the workaround is to make sure that the
sampler never reads an ASTC5x5 texture and a surface with an auxiliary
buffer without a texture cache invalidate and a command streamer
stall between such accesses.

With this patch series applied to the (current) master branch of mesa,
carchase works on my SKL GT4.

v2:
  Rename workaround functions from brw_ to gen9_
  (suggested/requested by Topi Pohjolainen).

  Place the texture resolve that avoids using the auxiliary surface
  when ASTC5x5 is detected in brw_predraw_resolve_inputs()
  instead of in a separate function; doing so avoids
  walking the textures again.
  (suggested/requested by Topi Pohjolainen).

  Emit command streamer stall in addition to texture
  invalidate.
  (original short-coming caught by Jason Ekstrand)

  Place workaround function in (new) dedicated file.

  Minor patch reordering to accommodate the changes.

Kevin Rogovin (5):
  i965: define astx5x5 workaround infrastructure
  i965: set ASTC5x5 workaround texture type tracking on texture validate
  i965: use ASTC5x5 workaround in brw_draw
  i965: use ASTC5x5 workaround in brw_compute
  i965: ASTC5x5 workaround logic for blorp

 src/mesa/drivers/dri/i965/Makefile.sources   |  1 +
 src/mesa/drivers/dri/i965/brw_compute.c  |  6 
 src/mesa/drivers/dri/i965/brw_context.c  |  6 
 src/mesa/drivers/dri/i965/brw_context.h  | 24 
 src/mesa/drivers/dri/i965/brw_draw.c | 16 +--
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c |  5 
 src/mesa/drivers/dri/i965/gen9_astc5x5_wa.c  | 36 
 src/mesa/drivers/dri/i965/genX_blorp_exec.c  |  5 
 src/mesa/drivers/dri/i965/intel_batchbuffer.c|  1 +
 src/mesa/drivers/dri/i965/intel_tex_image.c  | 16 ---
 src/mesa/drivers/dri/i965/intel_tex_validate.c   | 13 +
 src/mesa/drivers/dri/i965/meson.build|  1 +
 12 files changed, 124 insertions(+), 6 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/gen9_astc5x5_wa.c

-- 
2.7.4
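The crux described above is a small state machine, implemented by gen9_set_astc5x5_wa_mode() in patch 1/5. A minimal sketch of that logic, assuming a counter is an acceptable stand-in for brw_emit_pipe_control_flush() with PIPE_CONTROL_CS_STALL | PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE; all sketch_* names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_wa_mode {
   SKETCH_MODE_NONE,
   SKETCH_MODE_HAS_ASTC5x5,
   SKETCH_MODE_HAS_AUX,
};

struct sketch_wa_state {
   bool required;            /* true only on Gen9 in the patch */
   enum sketch_wa_mode mode; /* last kind of sampling seen */
   unsigned flushes;         /* counts CS stall + texture invalidate */
};

/* Same early-outs as the patch: nothing to do if the workaround is not
 * required, if NONE is requested, or if the mode is unchanged. A flush is
 * counted only when switching from one non-NONE mode to another. */
static void
sketch_set_mode(struct sketch_wa_state *s, enum sketch_wa_mode mode)
{
   if (!s->required || mode == SKETCH_MODE_NONE || s->mode == mode)
      return;

   if (s->mode != SKETCH_MODE_NONE)
      s->flushes++;

   s->mode = mode;
}
```

Because the mode starts at NONE and a request for NONE is ignored, the first ASTC5x5 or aux access is free; only alternating between the two pays for a stall.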



Re: [Mesa-dev] AMD WX7100 screen display problem on AArch64 architecture server.

2017-12-14 Thread Vedran Miletić
On 12/13/2017 07:46 AM, Lvzhihong (ReJohn) wrote:
> Hi,
> 
>    We ran into a problem on Ubuntu 17.10 on an Arm server with amdgpu
> (AMD Radeon Pro WX7100). We use the open-source drivers that ship with
> Ubuntu 17.10, and the architecture is aarch64-linux-gnu.
> 
>  We installed:
> 
>  apt-get install xserver-xorg xinit xfce4 mesa-utils glmark2
> 
>  and then started the X server:
> 
>   startx
> 
>  The monitor then shows the desktop, but the screen is
> blurred (something is wrong).
> 
>  I have also tried some OpenGL applications, and their output has the
> same problem (something is missing or in the wrong place).
> 
>  On an x86_64 server with the same OS, the screen output is
> normal. (I checked xorg, the DDX, Mesa, libdrm, etc.; all versions are
> the same as on the aarch64 server.)
> 
> What I have done:
> 
>  1. I upgraded the kernel to 4.15-rc2, DRM to 3.23, the DDX
> to 1.40, and Mesa to 17.2.6, but the problem still exists.
> 
>  2. I enabled the 'shadowprimary' option. The screen output became
> normal, but performance dropped sharply: glxgears went from 4800 fps
> to 600 fps, and the glmark2 score from 4300 to 730.
> 
>  I suspect something differs between aarch64 and x86_64, but
> I don't know what.
> 
>  Any advice or suggestions for things to try are welcome.
> 
>  Thanks.
> 
> Rejohn.
>

Hi Lvzhihong,

we usually track problems like these in Bugzilla so they do not get
forgotten. I made a report for you; please CC yourself there and add any
details you can think of:
https://bugs.freedesktop.org/show_bug.cgi?id=104266

Regards,
Vedran

-- 
Vedran Miletić
vedran.miletic.net


[Mesa-dev] [PATCH] radv: do not load the local invocation index when it's unused

2017-12-14 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/amd/common/ac_nir_to_llvm.c | 3 ++-
 src/amd/common/ac_shader_info.c | 3 +++
 src/amd/common/ac_shader_info.h | 1 +
 src/amd/vulkan/radv_shader.c| 2 +-
 4 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 0e1d7e0082..2fe346b012 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -751,7 +751,8 @@ static void create_function(struct nir_to_llvm_context *ctx,
   >num_work_groups);
}
add_sgpr_argument(, ctx->ac.v3i32, >workgroup_ids);
-   add_sgpr_argument(, ctx->ac.i32, >tg_size);
+   if (ctx->shader_info->info.cs.uses_local_invocation_idx)
+   add_sgpr_argument(, ctx->ac.i32, >tg_size);
add_vgpr_argument(, ctx->ac.v3i32, >local_invocation_ids);
break;
case MESA_SHADER_VERTEX:
diff --git a/src/amd/common/ac_shader_info.c b/src/amd/common/ac_shader_info.c
index 87744ed23e..3299b47e6b 100644
--- a/src/amd/common/ac_shader_info.c
+++ b/src/amd/common/ac_shader_info.c
@@ -58,6 +58,9 @@ gather_intrinsic_info(nir_intrinsic_instr *instr, struct ac_shader_info *info)
}
break;
}
+   case nir_intrinsic_load_local_invocation_index:
+   info->cs.uses_local_invocation_idx = true;
+   break;
case nir_intrinsic_load_sample_id:
info->ps.force_persample = true;
break;
diff --git a/src/amd/common/ac_shader_info.h b/src/amd/common/ac_shader_info.h
index 0136d5af40..79e5615254 100644
--- a/src/amd/common/ac_shader_info.h
+++ b/src/amd/common/ac_shader_info.h
@@ -45,6 +45,7 @@ struct ac_shader_info {
bool uses_grid_size;
bool uses_block_id[3];
bool uses_thread_id[3];
+   bool uses_local_invocation_idx;
} cs;
 };
 
diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index 907c1986f8..ab8ba42511 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -400,7 +400,7 @@ radv_fill_shader_variant(struct radv_device *device,
S_00B84C_TGID_Z_EN(info->cs.uses_block_id[2]) |
S_00B84C_TIDIG_COMP_CNT(info->cs.uses_thread_id[2] ? 2 :
info->cs.uses_thread_id[1] ? 1 : 0) |
-   S_00B84C_TG_SIZE_EN(1) |
+   S_00B84C_TG_SIZE_EN(info->cs.uses_local_invocation_idx) |
S_00B84C_LDS_SIZE(variant->config.lds_size);
break;
}
-- 
2.15.1
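This patch makes TG_SIZE_EN depend on whether the shader reads gl_LocalInvocationIndex, and declares the tg_size SGPR argument in create_function() under the same condition. A minimal sketch of deriving the enable bits from the gathered usage info; the struct and the bit positions below are placeholders chosen for illustration, not the real S_00B84C_* encodings from Mesa's register headers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions for this sketch only; the real field layout
 * is defined by the S_00B84C_* macros in Mesa's AMD register headers. */
#define SKETCH_TGID_X_EN  (1u << 0)
#define SKETCH_TGID_Y_EN  (1u << 1)
#define SKETCH_TGID_Z_EN  (1u << 2)
#define SKETCH_TG_SIZE_EN (1u << 3)

/* Hypothetical subset of the compute-shader usage info gathered from NIR. */
struct sketch_cs_info {
   bool uses_block_id[3];
   bool uses_local_invocation_idx;
};

/* Set each enable bit only if the shader reads the matching input. */
static uint32_t
sketch_cs_enable_bits(const struct sketch_cs_info *info)
{
   uint32_t bits = 0;

   if (info->uses_block_id[0])
      bits |= SKETCH_TGID_X_EN;
   if (info->uses_block_id[1])
      bits |= SKETCH_TGID_Y_EN;
   if (info->uses_block_id[2])
      bits |= SKETCH_TGID_Z_EN;
   if (info->uses_local_invocation_idx)
      bits |= SKETCH_TG_SIZE_EN;
   return bits;
}
```

Driving both the register bit and the shader's argument list from the same flag is what keeps the inputs the hardware loads in step with the inputs the compiled shader expects.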



[Mesa-dev] [PATCH 3/3] radv: do not load unused gl_LocalInvocationID/gl_WorkGroupID components

2017-12-14 Thread Samuel Pitoiset
We should also not load the input SGPRs and VGPRs, but
let's start with this for now.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_shader.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index 4a3fdfa80e..907c1986f8 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -392,13 +392,18 @@ radv_fill_shader_variant(struct radv_device *device,
break;
case MESA_SHADER_FRAGMENT:
break;
-   case MESA_SHADER_COMPUTE:
+   case MESA_SHADER_COMPUTE: {
+   struct ac_shader_info *info = >info.info;
variant->rsrc2 |=
-   S_00B84C_TGID_X_EN(1) | S_00B84C_TGID_Y_EN(1) |
-   S_00B84C_TGID_Z_EN(1) | S_00B84C_TIDIG_COMP_CNT(2) |
+   S_00B84C_TGID_X_EN(info->cs.uses_block_id[0]) |
+   S_00B84C_TGID_Y_EN(info->cs.uses_block_id[1]) |
+   S_00B84C_TGID_Z_EN(info->cs.uses_block_id[2]) |
+   S_00B84C_TIDIG_COMP_CNT(info->cs.uses_thread_id[2] ? 2 :
+   info->cs.uses_thread_id[1] ? 1 : 0) |
S_00B84C_TG_SIZE_EN(1) |
S_00B84C_LDS_SIZE(variant->config.lds_size);
break;
+   }
default:
unreachable("unsupported shader type");
break;
-- 
2.15.1
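The nested ternary in this patch maps the per-component usage flags to TIDIG_COMP_CNT. As I read the field, it selects how many local-invocation-ID VGPRs are initialized (0: X only, 1: X and Y, 2: X, Y and Z), so a shader that reads only Z still needs the maximum count. A minimal sketch of the same mapping, with a hypothetical function name:

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the patch's nested ternary: Z used -> 2, else Y used -> 1,
 * else 0. X is covered by every setting, so uses_thread_id[0] does not
 * affect the result. */
static unsigned
sketch_tidig_comp_cnt(const bool uses_thread_id[3])
{
   if (uses_thread_id[2])
      return 2;
   if (uses_thread_id[1])
      return 1;
   return 0;
}
```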



[Mesa-dev] [PATCH 2/3] amd/common: scan which components of gl_LocalInvocationID are used

2017-12-14 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/amd/common/ac_shader_info.c | 7 ++-
 src/amd/common/ac_shader_info.h | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/amd/common/ac_shader_info.c b/src/amd/common/ac_shader_info.c
index 01949770d6..87744ed23e 100644
--- a/src/amd/common/ac_shader_info.c
+++ b/src/amd/common/ac_shader_info.c
@@ -45,11 +45,16 @@ gather_intrinsic_info(nir_intrinsic_instr *instr, struct ac_shader_info *info)
case nir_intrinsic_load_num_work_groups:
info->cs.uses_grid_size = true;
break;
+   case nir_intrinsic_load_local_invocation_id:
case nir_intrinsic_load_work_group_id: {
unsigned mask = nir_ssa_def_components_read(>dest.ssa);
while (mask) {
unsigned i = u_bit_scan();
-   info->cs.uses_block_id[i] = true;
+
+   if (instr->intrinsic == nir_intrinsic_load_work_group_id)
+   info->cs.uses_block_id[i] = true;
+   else
+   info->cs.uses_thread_id[i] = true;
}
break;
}
diff --git a/src/amd/common/ac_shader_info.h b/src/amd/common/ac_shader_info.h
index 7beefd02ac..0136d5af40 100644
--- a/src/amd/common/ac_shader_info.h
+++ b/src/amd/common/ac_shader_info.h
@@ -44,6 +44,7 @@ struct ac_shader_info {
struct {
bool uses_grid_size;
bool uses_block_id[3];
+   bool uses_thread_id[3];
} cs;
 };
 
-- 
2.15.1



[Mesa-dev] [PATCH 1/3] amd/common: scan which components of gl_WorkGroupID are used

2017-12-14 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/amd/common/ac_shader_info.c | 8 
 src/amd/common/ac_shader_info.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/src/amd/common/ac_shader_info.c b/src/amd/common/ac_shader_info.c
index 09dd4bbd55..01949770d6 100644
--- a/src/amd/common/ac_shader_info.c
+++ b/src/amd/common/ac_shader_info.c
@@ -45,6 +45,14 @@ gather_intrinsic_info(nir_intrinsic_instr *instr, struct ac_shader_info *info)
case nir_intrinsic_load_num_work_groups:
info->cs.uses_grid_size = true;
break;
+   case nir_intrinsic_load_work_group_id: {
+   unsigned mask = nir_ssa_def_components_read(>dest.ssa);
+   while (mask) {
+   unsigned i = u_bit_scan();
+   info->cs.uses_block_id[i] = true;
+   }
+   break;
+   }
case nir_intrinsic_load_sample_id:
info->ps.force_persample = true;
break;
diff --git a/src/amd/common/ac_shader_info.h b/src/amd/common/ac_shader_info.h
index 3c809cce13..7beefd02ac 100644
--- a/src/amd/common/ac_shader_info.h
+++ b/src/amd/common/ac_shader_info.h
@@ -43,6 +43,7 @@ struct ac_shader_info {
} ps;
struct {
bool uses_grid_size;
+   bool uses_block_id[3];
} cs;
 };
 
-- 
2.15.1
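The new case walks the bitmask returned by nir_ssa_def_components_read() one set bit at a time. A minimal standalone sketch of that loop, assuming u_bit_scan() returns the index of the lowest set bit and clears it; sketch_bit_scan() and sketch_mark_used_components() are hypothetical portable stand-ins, not Mesa helpers:

```c
#include <assert.h>
#include <stdbool.h>

/* Return the index of the lowest set bit in *mask and clear that bit.
 * The caller must guarantee *mask != 0. */
static unsigned
sketch_bit_scan(unsigned *mask)
{
   assert(*mask != 0);
   unsigned i = 0;
   while (!(*mask & (1u << i)))
      i++;
   *mask &= ~(1u << i);
   return i;
}

/* Mark every component whose bit is set in components_read, like the
 * loop added to gather_intrinsic_info() for load_work_group_id. */
static void
sketch_mark_used_components(unsigned components_read, bool uses[3])
{
   unsigned mask = components_read;
   while (mask) {
      unsigned i = sketch_bit_scan(&mask);
      assert(i < 3); /* work group IDs have three components */
      uses[i] = true;
   }
}
```

For example, a shader that reads only gl_WorkGroupID.x and .z yields a mask of 0x5, which marks components 0 and 2 and leaves component 1 unused.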



Re: [Mesa-dev] [PATCH 2/4] radeon/vce: determine idr by pic type

2017-12-14 Thread Leo Liu



On 12/13/2017 01:59 PM, boyuan.zh...@amd.com wrote:

From: Boyuan Zhang 

Signed-off-by: Boyuan Zhang 


Reviewed-by: Leo Liu 


---
  src/gallium/drivers/radeon/radeon_vce_52.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/radeon_vce_52.c b/src/gallium/drivers/radeon/radeon_vce_52.c
index 10bf718..a941c47 100644
--- a/src/gallium/drivers/radeon/radeon_vce_52.c
+++ b/src/gallium/drivers/radeon/radeon_vce_52.c
@@ -162,7 +162,7 @@ void si_vce_52_get_param(struct rvce_encoder *enc, struct pipe_h264_enc_picture_
enc->enc_pic.addrmode_arraymode_disrdo_distwoinstants = 0x0201;
else
enc->enc_pic.addrmode_arraymode_disrdo_distwoinstants = 0x01000201;
-   enc->enc_pic.is_idr = pic->is_idr;
+   enc->enc_pic.is_idr = (pic->picture_type == PIPE_H264_ENC_PICTURE_TYPE_IDR);
  }
  
  static void create(struct rvce_encoder *enc)




[Mesa-dev] [PATCH 2/3] radv: calculate best compute resource limits

2017-12-14 Thread Samuel Pitoiset
Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_cmd_buffer.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index eae5d40e19..d6aaff707b 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -2561,6 +2561,8 @@ radv_emit_compute_pipeline(struct radv_cmd_buffer *cmd_buffer)
 {
struct radv_shader_variant *compute_shader;
struct radv_pipeline *pipeline = cmd_buffer->state.compute_pipeline;
+   unsigned compute_resource_limits;
+   unsigned waves_per_threadgroup;
uint64_t va;
 
if (!pipeline || pipeline == cmd_buffer->state.emitted_compute_pipeline)
@@ -2572,7 +2574,7 @@ radv_emit_compute_pipeline(struct radv_cmd_buffer *cmd_buffer)
va = radv_buffer_get_va(compute_shader->bo) + compute_shader->bo_offset;
 
MAYBE_UNUSED unsigned cdw_max = 
radeon_check_space(cmd_buffer->device->ws,
-  cmd_buffer->cs, 16);
+  cmd_buffer->cs, 19);
 
radeon_set_sh_reg_seq(cmd_buffer->cs, R_00B830_COMPUTE_PGM_LO, 2);
radeon_emit(cmd_buffer->cs, va >> 8);
@@ -2592,6 +2594,17 @@ radv_emit_compute_pipeline(struct radv_cmd_buffer *cmd_buffer)
  S_00B860_WAVES(pipeline->max_waves) |
  S_00B860_WAVESIZE(pipeline->scratch_bytes_per_wave >> 10));
 
+   /* Calculate best compute resource limits. */
+   waves_per_threadgroup =
+   DIV_ROUND_UP(compute_shader->info.cs.block_size[0] *
+compute_shader->info.cs.block_size[1] *
+compute_shader->info.cs.block_size[2], 64);
+   compute_resource_limits =
+   S_00B854_SIMD_DEST_CNTL(waves_per_threadgroup % 4 == 0);
+
+   radeon_set_sh_reg(cmd_buffer->cs, R_00B854_COMPUTE_RESOURCE_LIMITS,
+ compute_resource_limits);
+
radeon_set_sh_reg_seq(cmd_buffer->cs, R_00B81C_COMPUTE_NUM_THREAD_X, 3);
radeon_emit(cmd_buffer->cs,
S_00B81C_NUM_THREAD_FULL(compute_shader->info.cs.block_size[0]));
-- 
2.15.1
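
The limit calculation above reduces to two integer steps. First, count the 64-lane waves in one threadgroup. Second, set `SIMD_DEST_CNTL` when that count divides evenly by 4, the number of SIMDs in a GCN compute unit. The following is a standalone sketch of just that arithmetic. The helper names are hypothetical, and the register packing done by `S_00B854_SIMD_DEST_CNTL` is left out:

```c
#include <stdbool.h>

/* Round-up integer division, same semantics as Mesa's DIV_ROUND_UP. */
static unsigned sketch_div_round_up(unsigned n, unsigned d)
{
   return (n + d - 1) / d;
}

/* Number of 64-lane waves needed for one threadgroup of the given
 * block_size[3] (assumes wave64, as in the patch). */
static unsigned sketch_waves_per_threadgroup(const unsigned block_size[3])
{
   return sketch_div_round_up(block_size[0] * block_size[1] * block_size[2], 64);
}

/* SIMD_DEST_CNTL is enabled when the waves spread evenly over 4 SIMDs. */
static bool sketch_simd_dest_cntl(unsigned waves_per_threadgroup)
{
   return waves_per_threadgroup % 4 == 0;
}
```

For example, an 8x8x1 workgroup is one wave and leaves the bit clear, while a 16x16x1 workgroup is four waves and sets it.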



[Mesa-dev] [PATCH 3/3] radv: set FORCE_SIMD_DIST(1) for compute when profitable

2017-12-14 Thread Samuel Pitoiset
Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_cmd_buffer.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index d6aaff707b..4a048485c8 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -2561,6 +2561,7 @@ radv_emit_compute_pipeline(struct radv_cmd_buffer *cmd_buffer)
 {
struct radv_shader_variant *compute_shader;
struct radv_pipeline *pipeline = cmd_buffer->state.compute_pipeline;
+   struct radv_device *device = cmd_buffer->device;
unsigned compute_resource_limits;
unsigned waves_per_threadgroup;
uint64_t va;
@@ -2602,6 +2603,19 @@ radv_emit_compute_pipeline(struct radv_cmd_buffer *cmd_buffer)
compute_resource_limits =
S_00B854_SIMD_DEST_CNTL(waves_per_threadgroup % 4 == 0);
 
+   if (device->physical_device->rad_info.chip_class >= CIK) {
+   unsigned num_cu_per_se =
+   device->physical_device->rad_info.num_good_compute_units /
+   device->physical_device->rad_info.max_se;
+
+   /* Force even distribution on all SIMDs in CU if the workgroup
+* size is 64. This has shown some good improvements if # of
+* CUs per SE is not a multiple of 4.
+*/
+   if (num_cu_per_se % 4 && waves_per_threadgroup == 1)
+   compute_resource_limits |= S_00B854_FORCE_SIMD_DIST(1);
+   }
+
radeon_set_sh_reg(cmd_buffer->cs, R_00B854_COMPUTE_RESOURCE_LIMITS,
  compute_resource_limits);
 
-- 
2.15.1
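
The `FORCE_SIMD_DIST` heuristic above is a pure predicate over three inputs. The sketch below isolates it. `struct sketch_gpu_info` is a hypothetical stand-in for the `rad_info` fields the patch reads (`chip_class >= CIK`, `num_good_compute_units`, `max_se`):

```c
#include <stdbool.h>

/* Hypothetical inputs standing in for the rad_info fields used by the patch. */
struct sketch_gpu_info {
   bool is_cik_or_newer;            /* chip_class >= CIK */
   unsigned num_good_compute_units;
   unsigned max_se;                 /* number of shader engines */
};

/* Force even SIMD distribution only for single-wave (64-thread)
 * workgroups on parts whose CU count per SE is not a multiple of 4. */
static bool sketch_force_simd_dist(const struct sketch_gpu_info *info,
                                   unsigned waves_per_threadgroup)
{
   if (!info->is_cik_or_newer)
      return false;
   unsigned num_cu_per_se = info->num_good_compute_units / info->max_se;
   return (num_cu_per_se % 4) != 0 && waves_per_threadgroup == 1;
}
```

With 36 CUs over 4 SEs (9 per SE), a one-wave workgroup gets the bit. With 64 CUs over 4 SEs (16 per SE), it does not.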



[Mesa-dev] [PATCH 1/3] radv: store the dispatch initiator into the device

2017-12-14 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_cmd_buffer.c | 12 +---
 src/amd/vulkan/radv_device.c | 10 ++
 src/amd/vulkan/radv_private.h|  1 +
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index e68c5a4038..eae5d40e19 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -3483,26 +3483,16 @@ radv_emit_dispatch_packets(struct radv_cmd_buffer *cmd_buffer,
 {
struct radv_pipeline *pipeline = cmd_buffer->state.compute_pipeline;
struct radv_shader_variant *compute_shader = pipeline->shaders[MESA_SHADER_COMPUTE];
+   unsigned dispatch_initiator = cmd_buffer->device->dispatch_initiator;
struct radeon_winsys *ws = cmd_buffer->device->ws;
struct radeon_winsys_cs *cs = cmd_buffer->cs;
struct ac_userdata_info *loc;
-   unsigned dispatch_initiator;
 
loc = radv_lookup_user_sgpr(pipeline, MESA_SHADER_COMPUTE,
AC_UD_CS_GRID_SIZE);
 
MAYBE_UNUSED unsigned cdw_max = radeon_check_space(ws, cs, 25);
 
-   dispatch_initiator = S_00B800_COMPUTE_SHADER_EN(1) |
-S_00B800_FORCE_START_AT_000(1);
-
-   if (cmd_buffer->device->physical_device->rad_info.chip_class >= CIK) {
-   /* If the KMD allows it (there is a KMD hw register for it),
-* allow launching waves out-of-order.
-*/
-   dispatch_initiator |= S_00B800_ORDER_MODE(1);
-   }
-
if (info->indirect) {
uint64_t va = radv_buffer_get_va(info->indirect->bo);
 
diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 5a0dd64727..7c0971d190 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -1101,6 +1101,16 @@ VkResult radv_CreateDevice(
device->scratch_waves = MAX2(32 * physical_device->rad_info.num_good_compute_units,
 max_threads_per_block / 64);
 
+   device->dispatch_initiator = S_00B800_COMPUTE_SHADER_EN(1) |
+S_00B800_FORCE_START_AT_000(1);
+
+   if (device->physical_device->rad_info.chip_class >= CIK) {
+   /* If the KMD allows it (there is a KMD hw register for it),
+* allow launching waves out-of-order.
+*/
+   device->dispatch_initiator |= S_00B800_ORDER_MODE(1);
+   }
+
radv_device_init_gs_info(device);
 
device->tess_offchip_block_dw_size =
diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
index 16afd6d692..2e1362c446 100644
--- a/src/amd/vulkan/radv_private.h
+++ b/src/amd/vulkan/radv_private.h
@@ -542,6 +542,7 @@ struct radv_device {
bool dfsm_allowed;
uint32_t tess_offchip_block_dw_size;
uint32_t scratch_waves;
+   uint32_t dispatch_initiator;
 
uint32_t gs_table_depth;
 
-- 
2.15.1
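
The patch moves a value that depends only on the chip into the device, so it is computed once at creation instead of on every dispatch. Here is a minimal sketch of that caching pattern. The bit positions are placeholders chosen for illustration only; the real `S_00B800_*` macros are defined in Mesa's `sid.h`:

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions (assumption); see sid.h for the real encodings. */
#define SKETCH_COMPUTE_SHADER_EN   (1u << 0)
#define SKETCH_FORCE_START_AT_000  (1u << 2)
#define SKETCH_ORDER_MODE          (1u << 6)

struct sketch_device {
   uint32_t dispatch_initiator;   /* computed once at device creation */
};

static void sketch_device_init(struct sketch_device *dev, bool is_cik_or_newer)
{
   dev->dispatch_initiator = SKETCH_COMPUTE_SHADER_EN | SKETCH_FORCE_START_AT_000;
   if (is_cik_or_newer)
      dev->dispatch_initiator |= SKETCH_ORDER_MODE; /* allow out-of-order wave launch */
}

/* Each dispatch only reads the cached value. */
static uint32_t sketch_dispatch_initiator(const struct sketch_device *dev)
{
   return dev->dispatch_initiator;
}
```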



Re: [Mesa-dev] [PATCH 3/3] Revert "i965: Disable regular fast-clears (CCS_D) on gen9+"

2017-12-14 Thread Eero Tamminen

Hi,

As expected, this series fixes the GfxBench perf regression caused by
disabling fast clears.  On SKL GT2:

* 2-5% in Manhattan 3.1
* 1% in AztecRuins & CarChase (on top of Francisco's large improvement,
  which came between the perf regression and this fix)


On 14.12.2017 03:54, Jason Ekstrand wrote:

Better commit message:

     Re-enable regular fast-clears (CCS_D) on gen9+

     This reverts commit ee57b15ec764736e2d5360beaef9fb2045ed0f68, "i965:
     Disable regular fast-clears (CCS_D) on gen9+".  How taht we've fixed the
     issue with too many different aux usages in the render cache, it should
     be safe to re-enable CCS_D for sRGB.


* s/How taht/Now that/

* Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=104163

* Tested-by: Eero Tamminen 


- Eero


On Wed, Dec 13, 2017 at 5:52 PM, Jason Ekstrand wrote:


This reverts commit ee57b15ec764736e2d5360beaef9fb2045ed0f68.

Cc: "17.3" >
---
  src/mesa/drivers/dri/i965/brw_meta_util.c     | 10 -
  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 57 ---
  2 files changed, 25 insertions(+), 42 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_util.c b/src/mesa/drivers/dri/i965/brw_meta_util.c
index 54dc6a5..b311815 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_util.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_util.c
@@ -293,17 +293,7 @@ brw_is_color_fast_clear_compatible(struct brw_context *brw,
         brw->mesa_to_isl_render_format[mt->format])
        return false;

-   /* Gen9 doesn't support fast clear on single-sampled SRGB buffers. When
-    * GL_FRAMEBUFFER_SRGB is enabled any color renderbuffers will be
-    * resolved in intel_update_state. In that case it's pointless to do a
-    * fast clear because it's very likely to be immediately resolved.
-    */
     const bool srgb_rb = _mesa_get_srgb_format_linear(mt->format) != mt->format;
-   if (devinfo->gen >= 9 &&
-       mt->surf.samples == 1 &&
-       ctx->Color.sRGBEnabled && srgb_rb)
-      return false;
-
    /* Gen10 doesn't automatically decode the clear color of sRGB buffers. Since
     * we currently don't perform this decode in software, avoid a fast-clear
     * altogether. TODO: Do this in software.
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index c1a4ce1..b87d356 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -207,13 +207,7 @@ intel_miptree_supports_ccs(struct brw_context *brw,
     if (!brw->mesa_format_supports_render[mt->format])
        return false;

-   if (devinfo->gen >= 9) {
-      mesa_format linear_format = _mesa_get_srgb_format_linear(mt->format);
-      const enum isl_format isl_format =
-         brw_isl_format_for_mesa_format(linear_format);
-      return isl_format_supports_ccs_e(&brw->screen->devinfo, isl_format);
-   } else
-      return true;
+   return true;
  }

  static bool
@@ -256,7 +250,7 @@ intel_miptree_supports_hiz(const struct brw_context *brw,
   * our HW tends to support more linear formats than sRGB ones, we use this
   * format variant for check for CCS_E compatibility.
   */
-MAYBE_UNUSED static bool
+static bool
  format_ccs_e_compat_with_miptree(const struct gen_device_info *devinfo,
                                   const struct intel_mipmap_tree *mt,
                                   enum isl_format access_format)
@@ -290,12 +284,13 @@ intel_miptree_supports_ccs_e(struct brw_context *brw,
     if (!intel_miptree_supports_ccs(brw, mt))
        return false;

-   /* Fast clear can be also used to clear srgb surfaces by using equivalent
-    * linear format. This trick, however, can't be extended to be used with
-    * lossless compression and therefore a check is needed to see if the format
-    * really is linear.
+   /* Many window system buffers are sRGB even if they are never rendered as
+    * sRGB.  For those, we want CCS_E for when sRGBEncode is false.  When the
+    * surface is used as sRGB, we fall back to CCS_D.
      */
-   return _mesa_get_srgb_format_linear(mt->format) == mt->format;
+   mesa_format linear_format = _mesa_get_srgb_format_linear(mt->format);
+   enum isl_format isl_format = brw_isl_format_for_mesa_format(linear_format);
+   return isl_format_supports_ccs_e(&brw->screen->devinfo, isl_format);
  }

  /**
@@ -2686,29 +2681,27 @@ intel_miptree_render_aux_usage(struct brw_context *brw,
        return ISL_AUX_USAGE_MCS;

 

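
The new comment in `intel_miptree_supports_ccs_e()` describes a policy: use CCS_E when the linear equivalent of the format supports it, and fall back to CCS_D when the surface is actually rendered as sRGB. The sketch below is a simplified, hypothetical model based only on that comment. The rest of `intel_miptree_render_aux_usage()` is cut off in this archive, so the real decision may consider more inputs:

```c
#include <stdbool.h>

/* Hypothetical, simplified aux-usage choice modelled on the patch comment. */
enum sketch_aux_usage { SKETCH_AUX_NONE, SKETCH_AUX_CCS_D, SKETCH_AUX_CCS_E };

static enum sketch_aux_usage
sketch_render_aux_usage(bool linear_format_supports_ccs_e,
                        bool surface_is_srgb, bool srgb_encode_enabled)
{
   /* sRGB window buffers rendered with sRGB encode disabled keep CCS_E. */
   if (linear_format_supports_ccs_e && !(surface_is_srgb && srgb_encode_enabled))
      return SKETCH_AUX_CCS_E;
   /* Surfaces actually used as sRGB fall back to CCS_D (fast clears only). */
   return SKETCH_AUX_CCS_D;
}
```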
[Mesa-dev] [ANNOUNCE] mesa 17.2.7

2017-12-14 Thread Emil Velikov
Mesa 17.2.7 is now available.

In this release we have:

The current queue consists of a variety of fixes, with a sizeable hunk in the
shared GLSL codebase.

As for individual drivers: i965 has a crash fix for various Valve games, while
r600 and nouveau have tweaks in their compiler backends. Fast clears on
radeonsi and RADV are better now, and VAAPI encoding now plays nicely with
GStreamer.

The WGL state tracker and SWR driver have also seen minor improvements.

To top it off, Mesa should build fine with the latest glibc 2.17.


Alex Smith (1):
  radv: Add LLVM version to the device name string

Andres Gomez (2):
  docs: add sha256 checksums for 17.2.6
  docs: remove bug 103626 from fix list as per 17.2.6

Ben Crocker (2):
  docs/llvmpipe.html: Minor edits
  docs/llvmpipe: document ppc64le as alternative architecture to x86.

Dave Airlie (1):
  r600/sb: handle jump after target to end of program. (v2)

Denis Pauk (1):
  gallium/{r600, radeonsi}: Fix segfault with color format (v2)

Eduardo Lima Mitev (3):
  glsl_parser_extra: Add utility to copy symbols between symbol tables
  glsl: Use the utility function to copy symbols between symbol tables
  glsl/linker: Check that re-declared, inter-shader built-in blocks match

Emil Velikov (4):
  gl_table.py: add extern C guard for the generated glapitable.h
  cherry-ignore: radeonsi: allow DMABUF exports for local buffers
  Update version to 17.2.7
  docs: add release notes for 17.2.7

Eric Anholt (1):
  broadcom/vc4: Fix handling of GFXH-515 workaround with a start vertex count.

Eric Engestrom (1):
  compiler: use NDEBUG to guard asserts

Fabian Bieler (2):
  glsl: Match order of gl_LightSourceParameters elements.
  glsl: Fix gl_NormalScale.

Frank Richter (1):
  gallium/wgl: fix default pixel format issue

George Kyriazis (1):
  swr: Handle resource across context changes

Gert Wollny (2):
  r600: Emit EOP for more CF instruction types
  r600/sb: do not convert if-blocks that contain indirect array access

Ilia Mirkin (1):
  glsl: fix derived cs variables

James Legg (1):
  nir/opcodes: Fix constant-folding of bitfield_insert

Jason Ekstrand (1):
  i965: Disable regular fast-clears (CCS_D) on gen9+

Juan A. Suarez Romero (1):
  glsl: add varying resources for arrays of complex types

Julien Isorce (1):
  st/va: change frame_idx from array to hash table

Kai Wasserbäch (1):
  docs: Point to apt.llvm.org for development snapshot packages

Kenneth Graunke (3):
  meta: Initialize depth/clear values on declaration.
  meta: Fix ClearTexture with GL_DEPTH_COMPONENT.
  i965: Fix Smooth Point Enables.

Marek Olšák (3):
  radeonsi: fix layered DCC fast clear
  radeonsi/gfx9: fix importing shared textures with DCC
  radeonsi: flush the context after resource_copy_region for buffer exports

Matt Turner (4):
  i965/fs: Handle negating immediates on MADs when propagating saturates
  util: Fix SHA1 implementation on big endian
  util: Fix disk_cache index calculation on big endian
  i965/fs: Unpack count argument to 64-bit shift ops on Atom

Nicolai Hähnle (3):
  radeonsi: fix the R600_RESOURCE_FLAG_UNMAPPABLE check
  glsl: allow any l-value of an input variable as interpolant in interpolateAt*
  glsl: fix interpolateAtXxx(some_vec[idx], ...) with dynamic idx

Pierre Moreau (1):
  nvc0/ir: Properly lower 64-bit shifts when the shift value is >32

Tapani Pälli (1):
  mesa/gles: adjust internal format in glTexSubImage2D error checks

Timothy Arceri (1):
  glsl: get correct member type when processing xfb ifc arrays

Vadym Shovkoplias (2):
  intel/blorp: Fix possible NULL pointer dereferencing
  glx/dri3: Remove unused deviceName variable

Vinson Lee (1):
  anv: Check if memfd_create is already defined.

git tag: mesa-17.2.7

https://mesa.freedesktop.org/archive/mesa-17.2.7.tar.gz
MD5:  beb9d8ec1d8a7fa2f4fa589c8fbb3cbc  mesa-17.2.7.tar.gz
SHA1: 09aee8970e5715325ec5abf64e61341af9b0b98b  mesa-17.2.7.tar.gz
SHA256: e8d837a1cd55014e636e9caf6c75cfbe1b3e4be9ab3fa125f5ef38398aa12e97  mesa-17.2.7.tar.gz
SHA512: af7f6b38eb9e3d51371c62c774807b40e1e2ee1194799bac34c67c7a1e89d8e515f720079ae3481bd73c4cdf7d7a99a06814a570a506956a218defd168764bc0  mesa-17.2.7.tar.gz
PGP:  https://mesa.freedesktop.org/archive/mesa-17.2.7.tar.gz.sig

https://mesa.freedesktop.org/archive/mesa-17.2.7.tar.xz
MD5:  adf3750455e94db222c6f246e37556e5  mesa-17.2.7.tar.xz
SHA1: a44873523d8f1a5679a0f1850a1b200d3f064c4f  mesa-17.2.7.tar.xz
SHA256: 50cfdea8df55045797b4d0409591c04c784d9551c4da09b8178874dbe5a37a68  mesa-17.2.7.tar.xz
SHA512: f7cd06aa3ffb8ab80358304fa6a554f75c66105371072dae3a6f8f8e2a13891c8ac9eaf13c5defa74fa1236fed386ccd8c8b107e5fe80f9384237c9b1e726898  mesa-17.2.7.tar.xz
PGP:  https://mesa.freedesktop.org/archive/mesa-17.2.7.tar.xz.sig

Re: [Mesa-dev] [PATCH] radeonsi: don't use fast color clear for small images even on APUs

2017-12-14 Thread Samuel Pitoiset



On 12/13/2017 12:53 AM, Marek Olšák wrote:

From: Marek Olšák 

Increase the limit and handle non-square images better.

This makes glxgears 20% faster on APUs, and a little more on dGPUs.
We all use and love glxgears.


We love it. :)

Reviewed-by: Samuel Pitoiset 


---
  src/gallium/drivers/radeonsi/si_clear.c | 9 -
  1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_clear.c b/src/gallium/drivers/radeonsi/si_clear.c
index 0ac83f4..464b9d7 100644
--- a/src/gallium/drivers/radeonsi/si_clear.c
+++ b/src/gallium/drivers/radeonsi/si_clear.c
@@ -418,26 +418,25 @@ static void si_do_fast_color_clear(struct si_context *sctx,
sctx->b.family == CHIP_STONEY)
tex->num_slow_clears++;
}
  
  		bool need_decompress_pass = false;
  
  		/* Use a slow clear for small surfaces where the cost of
 * the eliminate pass can be higher than the benefit of fast
 * clear. The closed driver does this, but the numbers may differ.
 *
-* Always use fast clear on APUs.
+* This helps on both dGPUs and APUs, even small APUs like Mullins.
 */
-   bool too_small = sctx->screen->info.has_dedicated_vram &&
-tex->resource.b.b.nr_samples <= 1 &&
-tex->resource.b.b.width0 <= 256 &&
-tex->resource.b.b.height0 <= 256;
+   bool too_small = tex->resource.b.b.nr_samples <= 1 &&
+tex->resource.b.b.width0 *
+tex->resource.b.b.height0 <= 512 * 512;
  
  		/* Try to clear DCC first, otherwise try CMASK. */
if (vi_dcc_enabled(tex, 0)) {
uint32_t reset_value;
bool clear_words_needed;
  
  			if (sctx->screen->debug_flags & DBG(NO_DCC_CLEAR))
continue;
  
  			/* This can only occur with MSAA. */



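
The new `too_small` test replaces a per-dimension bound with an area bound, and it drops the dedicated-VRAM condition. The hypothetical helpers below put the old and new predicates side by side. They take plain integers in place of the `tex->resource.b.b` fields:

```c
#include <stdbool.h>

/* Old rule: only on dGPUs, and both dimensions must be <= 256. */
static bool sketch_too_small_old(bool has_dedicated_vram, unsigned samples,
                                 unsigned width, unsigned height)
{
   return has_dedicated_vram && samples <= 1 && width <= 256 && height <= 256;
}

/* New rule: on all GPUs, single-sampled surfaces of at most 512*512 pixels.
 * Being area-based, it also catches thin non-square surfaces. */
static bool sketch_too_small_new(unsigned samples, unsigned width, unsigned height)
{
   return samples <= 1 && width * height <= 512 * 512;
}
```

For instance, a 1024x64 surface (65536 pixels) was never "too small" under the old rule, because its width exceeds 256. Under the new rule it gets a slow clear.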


[Mesa-dev] [PATCH 3/3] radv: export SampleMask from pixel shaders at full rate

2017-12-14 Thread Samuel Pitoiset
Use 16_ABGR instead of 32_ABGR if Z isn't written.

Ported from RadeonSI.

No CTS regressions on Polaris.

v2: - make use of ac_get_spi_shader_z_format()

Signed-off-by: Samuel Pitoiset 
---
 src/amd/common/ac_nir_to_llvm.c | 46 +++--
 src/amd/vulkan/radv_pipeline.c  | 11 +-
 2 files changed, 41 insertions(+), 16 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index f3602a267d..6dc5bf5903 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -32,6 +32,7 @@
 #include 
 #include "ac_shader_abi.h"
 #include "ac_shader_info.h"
+#include "ac_shader_util.h"
 #include "ac_exp_param.h"
 
 enum radeon_llvm_calling_convention {
@@ -6206,19 +6207,42 @@ si_export_mrt_z(struct nir_to_llvm_context *ctx,
args.out[2] = LLVMGetUndef(ctx->ac.f32); /* B, sample mask */
args.out[3] = LLVMGetUndef(ctx->ac.f32); /* A, alpha to mask */
 
-   if (depth) {
-   args.out[0] = depth;
-   args.enabled_channels |= 0x1;
-   }
+   unsigned format = ac_get_spi_shader_z_format(depth != NULL,
+stencil != NULL,
+samplemask != NULL);
+
+   if (format == V_028710_SPI_SHADER_UINT16_ABGR) {
+   assert(!depth);
+   args.compr = 1; /* COMPR flag */
+
+   if (stencil) {
+   /* Stencil should be in X[23:16]. */
+   stencil = ac_to_integer(&ctx->ac, stencil);
+   stencil = LLVMBuildShl(ctx->builder, stencil,
+  LLVMConstInt(ctx->ac.i32, 16, 0), "");
+   args.out[0] = ac_to_float(&ctx->ac, stencil);
+   args.enabled_channels |= 0x3;
+   }
+   if (samplemask) {
+   /* SampleMask should be in Y[15:0]. */
+   args.out[1] = samplemask;
+   args.enabled_channels |= 0xc;
+   }
+   } else {
+   if (depth) {
+   args.out[0] = depth;
+   args.enabled_channels |= 0x1;
+   }
 
-   if (stencil) {
-   args.out[1] = stencil;
-   args.enabled_channels |= 0x2;
-   }
+   if (stencil) {
+   args.out[1] = stencil;
+   args.enabled_channels |= 0x2;
+   }
 
-   if (samplemask) {
-   args.out[2] = samplemask;
-   args.enabled_channels |= 0x4;
+   if (samplemask) {
+   args.out[2] = samplemask;
+   args.enabled_channels |= 0x4;
+   }
}
 
/* SI (except OLAND and HAINAN) has a bug that it only looks
diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
index 0146d6935e..1ada69d92f 100644
--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -46,6 +46,7 @@
 #include "vk_format.h"
 #include "util/debug.h"
 #include "ac_exp_param.h"
+#include "ac_shader_util.h"
 
 static void
 radv_pipeline_destroy(struct radv_device *device,
@@ -2108,11 +2109,11 @@ radv_pipeline_init(struct radv_pipeline *pipeline,
if (pipeline->device->physical_device->has_rbplus)
pipeline->graphics.db_shader_control |= S_02880C_DUAL_QUAD_DISABLE(1);
 
-   pipeline->graphics.shader_z_format =
-   ps->info.fs.writes_sample_mask ? V_028710_SPI_SHADER_32_ABGR :
-   ps->info.fs.writes_stencil ? V_028710_SPI_SHADER_32_GR :
-   ps->info.fs.writes_z ? V_028710_SPI_SHADER_32_R :
-   V_028710_SPI_SHADER_ZERO;
+   unsigned shader_z_format =
+   ac_get_spi_shader_z_format(ps->info.fs.writes_z,
+  ps->info.fs.writes_stencil,
+  ps->info.fs.writes_sample_mask);
+   pipeline->graphics.shader_z_format = shader_z_format;
 
calculate_vgt_gs_mode(pipeline);
calculate_vs_outinfo(pipeline);
-- 
2.15.1
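
In the compressed `UINT16_ABGR` path, stencil and the sample mask share two 32-bit export words. This plain-integer sketch models only the bit layout from the patch: stencil in X[23:16], sample mask in Y[15:0], and channel masks 0x3 and 0xc. It does not model the LLVM IR that builds the export. The struct is hypothetical, and an 8-bit stencil value is assumed:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two packed export words. */
struct sketch_z_export {
   uint32_t x, y;
   unsigned enabled_channels;
};

static struct sketch_z_export
sketch_pack_uint16_abgr(bool has_stencil, uint32_t stencil,
                        bool has_samplemask, uint32_t samplemask)
{
   struct sketch_z_export e = {0, 0, 0};
   if (has_stencil) {
      e.x = (stencil & 0xff) << 16;   /* 8-bit stencil lands in X[23:16] */
      e.enabled_channels |= 0x3;
   }
   if (has_samplemask) {
      e.y = samplemask & 0xffff;      /* sample mask lands in Y[15:0] */
      e.enabled_channels |= 0xc;
   }
   return e;
}
```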



[Mesa-dev] [PATCH 2/3] radeonsi: make use of ac_get_spi_shader_z_format()

2017-12-14 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/radeonsi/si_shader.c| 22 ++
 src/gallium/drivers/radeonsi/si_shader.h|  2 --
 src/gallium/drivers/radeonsi/si_state_shaders.c |  3 ++-
 3 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 0077054749..5e49d655fc 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -37,6 +37,7 @@
 #include "ac_binary.h"
 #include "ac_llvm_util.h"
 #include "ac_exp_param.h"
+#include "ac_shader_util.h"
 #include "si_shader_internal.h"
 #include "si_pipe.h"
 #include "sid.h"
@@ -3419,25 +3420,6 @@ struct si_ps_exports {
struct ac_export_args args[10];
 };
 
-unsigned si_get_spi_shader_z_format(bool writes_z, bool writes_stencil,
-   bool writes_samplemask)
-{
-   if (writes_z) {
-   /* Z needs 32 bits. */
-   if (writes_samplemask)
-   return V_028710_SPI_SHADER_32_ABGR;
-   else if (writes_stencil)
-   return V_028710_SPI_SHADER_32_GR;
-   else
-   return V_028710_SPI_SHADER_32_R;
-   } else if (writes_stencil || writes_samplemask) {
-   /* Both stencil and sample mask need only 16 bits. */
-   return V_028710_SPI_SHADER_UINT16_ABGR;
-   } else {
-   return V_028710_SPI_SHADER_ZERO;
-   }
-}
-
 static void si_export_mrt_z(struct lp_build_tgsi_context *bld_base,
LLVMValueRef depth, LLVMValueRef stencil,
LLVMValueRef samplemask, struct si_ps_exports *exp)
@@ -3446,7 +3428,7 @@ static void si_export_mrt_z(struct lp_build_tgsi_context *bld_base,
struct lp_build_context *base = &bld_base->base;
struct ac_export_args args;
unsigned mask = 0;
-   unsigned format = si_get_spi_shader_z_format(depth != NULL,
+   unsigned format = ac_get_spi_shader_z_format(depth != NULL,
 stencil != NULL,
 samplemask != NULL);
 
diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h
index bcb5c9da4c..c981d3562e 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -647,8 +647,6 @@ void si_shader_apply_scratch_relocs(struct si_shader *shader,
 void si_shader_binary_read_config(struct ac_shader_binary *binary,
  struct si_shader_config *conf,
  unsigned symbol_offset);
-unsigned si_get_spi_shader_z_format(bool writes_z, bool writes_stencil,
-   bool writes_samplemask);
 const char *si_get_shader_name(const struct si_shader *shader, unsigned processor);
 
 /* si_shader_nir.c */
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 4f683b8514..25854a1fde 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -37,6 +37,7 @@
 #include "util/disk_cache.h"
 #include "util/mesa-sha1.h"
 #include "ac_exp_param.h"
+#include "ac_shader_util.h"
 
 /* SHADER_CACHE */
 
@@ -1123,7 +1124,7 @@ static void si_shader_ps(struct si_shader *shader)
si_pm4_set_reg(pm4, R_0286D8_SPI_PS_IN_CONTROL, spi_ps_in_control);
 
si_pm4_set_reg(pm4, R_028710_SPI_SHADER_Z_FORMAT,
-  si_get_spi_shader_z_format(info->writes_z,
+  ac_get_spi_shader_z_format(info->writes_z,
  info->writes_stencil,
  info->writes_samplemask));
 
-- 
2.15.1



[Mesa-dev] [PATCH 1/3] amd/common: add ac_get_spi_shader_z_format()

2017-12-14 Thread Samuel Pitoiset
ac_shader_util.c will contain shader helpers for RadeonSI
and RADV.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/Makefile.sources|  5 -
 src/amd/common/ac_shader_util.c | 45 +
 src/amd/common/ac_shader_util.h | 33 ++
 src/amd/common/meson.build  |  2 ++
 4 files changed, 84 insertions(+), 1 deletion(-)
 create mode 100644 src/amd/common/ac_shader_util.c
 create mode 100644 src/amd/common/ac_shader_util.h

diff --git a/src/amd/Makefile.sources b/src/amd/Makefile.sources
index 1bc5a7fe7e..10c4827e19 100644
--- a/src/amd/Makefile.sources
+++ b/src/amd/Makefile.sources
@@ -46,7 +46,10 @@ AMD_COMPILER_FILES = \
common/ac_llvm_util.h \
common/ac_shader_abi.h \
common/ac_shader_info.c \
-   common/ac_shader_info.h
+   common/ac_shader_info.h \
+   common/ac_shader_util.c \
+   common/ac_shader_util.h
+
 
 AMD_NIR_FILES = \
common/ac_nir_to_llvm.c \
diff --git a/src/amd/common/ac_shader_util.c b/src/amd/common/ac_shader_util.c
new file mode 100644
index 00..9d33a46559
--- /dev/null
+++ b/src/amd/common/ac_shader_util.c
@@ -0,0 +1,45 @@
+/*
+ * Copyright 2012 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "ac_shader_util.h"
+#include "sid.h"
+
+unsigned
+ac_get_spi_shader_z_format(bool writes_z, bool writes_stencil,
+  bool writes_samplemask)
+{
+   if (writes_z) {
+   /* Z needs 32 bits. */
+   if (writes_samplemask)
+   return V_028710_SPI_SHADER_32_ABGR;
+   else if (writes_stencil)
+   return V_028710_SPI_SHADER_32_GR;
+   else
+   return V_028710_SPI_SHADER_32_R;
+   } else if (writes_stencil || writes_samplemask) {
+   /* Both stencil and sample mask need only 16 bits. */
+   return V_028710_SPI_SHADER_UINT16_ABGR;
+   } else {
+   return V_028710_SPI_SHADER_ZERO;
+   }
+}
diff --git a/src/amd/common/ac_shader_util.h b/src/amd/common/ac_shader_util.h
new file mode 100644
index 00..1f971e76f1
--- /dev/null
+++ b/src/amd/common/ac_shader_util.h
@@ -0,0 +1,33 @@
+/*
+ * Copyright 2012 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef AC_SHADER_UTIL_H
+#define AC_SHADER_UTIL_H
+
+#include 
+
+unsigned
+ac_get_spi_shader_z_format(bool writes_z, bool writes_stencil,
+  bool writes_samplemask);
+
+#endif
diff --git a/src/amd/common/meson.build b/src/amd/common/meson.build
index 8c526675c4..63c1517543 100644
--- a/src/amd/common/meson.build
+++ b/src/amd/common/meson.build
@@ -38,6 +38,8 @@ amd_common_files = files(
   

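
Because `ac_get_spi_shader_z_format()` is a pure function of three booleans, it is easy to check as a truth table. The copy below follows the same branch structure as the patch. It uses a local enum so it compiles without `sid.h`; the enum's numeric values are placeholders, not the hardware `V_028710_SPI_SHADER_*` encodings:

```c
#include <stdbool.h>

/* Local stand-ins for V_028710_SPI_SHADER_*; values are placeholders. */
enum sketch_z_format {
   SKETCH_SPI_SHADER_ZERO,
   SKETCH_SPI_SHADER_32_R,
   SKETCH_SPI_SHADER_32_GR,
   SKETCH_SPI_SHADER_UINT16_ABGR,
   SKETCH_SPI_SHADER_32_ABGR,
};

static enum sketch_z_format
sketch_get_spi_shader_z_format(bool writes_z, bool writes_stencil,
                               bool writes_samplemask)
{
   if (writes_z) {
      /* Z needs 32 bits. */
      if (writes_samplemask)
         return SKETCH_SPI_SHADER_32_ABGR;
      else if (writes_stencil)
         return SKETCH_SPI_SHADER_32_GR;
      else
         return SKETCH_SPI_SHADER_32_R;
   } else if (writes_stencil || writes_samplemask) {
      /* Stencil and sample mask both fit in 16 bits. */
      return SKETCH_SPI_SHADER_UINT16_ABGR;
   }
   return SKETCH_SPI_SHADER_ZERO;
}
```

Sharing one helper keeps RadeonSI and RADV from drifting apart, which is the point of moving it into `src/amd/common`.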
[Mesa-dev] [PATCH 1/4] mesa: add DisjointOperation to gl_shared_state

2017-12-14 Thread Tapani Pälli
This state will be used by EXT_disjoint_timer_query. As a first
usage, this patch sets DisjointOperation to true when a GPU reset happens.

Signed-off-by: Tapani Pälli 
Reviewed-by: Lionel Landwerlin 
---
 src/mesa/main/mtypes.h | 8 
 src/mesa/main/robustness.c | 1 +
 2 files changed, 9 insertions(+)

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index b372921e9f..0aac49402e 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3310,6 +3310,14 @@ struct gl_shared_state
/** EXT_external_objects */
struct _mesa_HashTable *MemoryObjects;
 
+   /**
+* Some context in this share group was affected by a disjoint
+* operation. This operation can be anything that has effects on
+* values of timer queries in such a manner that they become invalid for
+* performance metrics. Examples include a GPU reset, a counter overflow
+* or GPU frequency changes.
+*/
+   bool DisjointOperation;
 };
 
 
diff --git a/src/mesa/main/robustness.c b/src/mesa/main/robustness.c
index a61c07f125..e7d7007da4 100644
--- a/src/mesa/main/robustness.c
+++ b/src/mesa/main/robustness.c
@@ -145,6 +145,7 @@ _mesa_GetGraphicsResetStatusARB( void )
*/
   if (status != GL_NO_ERROR) {
  ctx->Shared->ShareGroupReset = true;
+ ctx->Shared->DisjointOperation = true;
   } else if (ctx->Shared->ShareGroupReset && !ctx->ShareGroupReset) {
  status = GL_INNOCENT_CONTEXT_RESET_ARB;
   }
-- 
2.14.3
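
The `DisjointOperation` flag behaves as a sticky, share-group-wide bit. Any event that invalidates timer results sets it, and reading it through `GL_GPU_DISJOINT_EXT` returns the value and clears it, which is how the get.c change later in this series handles the query. The sketch below models that behavior with pthreads. Mesa itself uses `simple_mtx_t` on `gl_shared_state`, and for simplicity this sketch guards both the setter and the reader with the same mutex:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down share-group state. */
struct sketch_shared_state {
   pthread_mutex_t mutex;
   bool disjoint_operation;
};

/* Called when something invalidates timer results, e.g. a GPU reset. */
static void sketch_mark_disjoint(struct sketch_shared_state *shared)
{
   pthread_mutex_lock(&shared->mutex);
   shared->disjoint_operation = true;
   pthread_mutex_unlock(&shared->mutex);
}

/* Return the flag and reset it, like querying GL_GPU_DISJOINT_EXT. */
static bool sketch_query_and_clear_disjoint(struct sketch_shared_state *shared)
{
   pthread_mutex_lock(&shared->mutex);
   bool was_disjoint = shared->disjoint_operation;
   shared->disjoint_operation = false;
   pthread_mutex_unlock(&shared->mutex);
   return was_disjoint;
}
```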



[Mesa-dev] [PATCH 4/4] i965: enable EXT_disjoint_timer_query extension

2017-12-14 Thread Tapani Pälli
The following dEQP cases pass:
   dEQP-EGL.functional.get_proc_address.extension.gl_ext_disjoint_timer_query
   dEQP-EGL.functional.client_extensions.disjoint

Piglit test 'ext_disjoint_timer_query-simple' passes with these changes.

No changes/regression observed in Intel CI.

Signed-off-by: Tapani Pälli 
---
 src/mesa/drivers/dri/i965/intel_extensions.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
index 4d17393948..cc961e051f 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -197,6 +197,8 @@ intelInitExtensions(struct gl_context *ctx)
   ctx->Extensions.OES_sample_variables = true;
 
   ctx->Extensions.ARB_timer_query = brw->screen->hw_has_timestamp;
+  ctx->Extensions.EXT_disjoint_timer_query =
+ ctx->Extensions.ARB_timer_query;
 
   /* Only enable this in core profile because other parts of Mesa behave
* slightly differently when the extension is enabled.
-- 
2.14.3



[Mesa-dev] [PATCH 3/4] mesa: GL_EXT_disjoint_timer_query extension API bits

2017-12-14 Thread Tapani Pälli
This patch adds GL_GPU_DISJOINT_EXT and enables the use of timer queries when
EXT_disjoint_timer_query is enabled.

v2: enable the extension only when EXT_disjoint_timer_query is set

Signed-off-by: Tapani Pälli 
Reviewed-by: Lionel Landwerlin  (v1)
---
 src/mesa/main/extensions_table.h |  1 +
 src/mesa/main/get.c  | 17 +
 src/mesa/main/get_hash_params.py |  5 +
 src/mesa/main/glheader.h |  4 
 src/mesa/main/mtypes.h   |  1 +
 src/mesa/main/queryobj.c |  3 ++-
 6 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/extensions_table.h b/src/mesa/main/extensions_table.h
index ab15ceb941..3dec6ea12f 100644
--- a/src/mesa/main/extensions_table.h
+++ b/src/mesa/main/extensions_table.h
@@ -210,6 +210,7 @@ EXT(EXT_copy_image  , OES_copy_image
 EXT(EXT_copy_texture, dummy_true   , GLL,  x ,  x ,  x , 1995)
 EXT(EXT_depth_bounds_test   , EXT_depth_bounds_test , GLL, GLC,  x ,  x , 2002)
 EXT(EXT_discard_framebuffer , dummy_true   ,  x ,  x , ES1, ES2, 2009)
+EXT(EXT_disjoint_timer_query, EXT_disjoint_timer_query ,  x ,  x ,  x , ES2, 2016)
 EXT(EXT_draw_buffers, dummy_true   ,  x ,  x ,  x , ES2, 2012)
 EXT(EXT_draw_buffers2   , EXT_draw_buffers2, GLL, GLC,  x ,  x , 2006)
 EXT(EXT_draw_buffers_indexed, ARB_draw_buffers_blend   ,  x ,  x ,  x ,  30, 2014)
diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index c1b1a89ee0..7f2d72aa4b 100644
--- a/src/mesa/main/get.c
+++ b/src/mesa/main/get.c
@@ -578,6 +578,13 @@ static const int extra_EXT_provoking_vertex_32[] = {
EXTRA_END
 };
 
+static const int extra_EXT_disjoint_timer_query[] = {
+   EXTRA_API_ES2,
+   EXTRA_API_ES3,
+   EXT(EXT_disjoint_timer_query),
+   EXTRA_END
+};
+
 
 /* This is the big table describing all the enums we accept in
  * glGet*v().  The table is partitioned into six parts: enums
@@ -1160,6 +1167,16 @@ find_custom_value(struct gl_context *ctx, const struct value_desc *d, union valu
  v->value_int_n.ints[0] = GL_PROGRAM_BINARY_FORMAT_MESA;
   }
   break;
+   /* GL_EXT_disjoint_timer_query */
+   case GL_GPU_DISJOINT_EXT:
+  {
+ simple_mtx_lock(&ctx->Shared->Mutex);
+ v->value_int = ctx->Shared->DisjointOperation;
+ /* Reset state as expected by the spec. */
+ ctx->Shared->DisjointOperation = false;
+ simple_mtx_unlock(&ctx->Shared->Mutex);
+  }
+  break;
}
 }
 
diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/get_hash_params.py
index eac250a1ec..bc71574cca 100644
--- a/src/mesa/main/get_hash_params.py
+++ b/src/mesa/main/get_hash_params.py
@@ -254,6 +254,11 @@ descriptor=[
  [ "POINT_SIZE_ARRAY_BUFFER_BINDING_OES", "LOC_CUSTOM, TYPE_INT, 0, NO_EXTRA" ],
 ]},
 
+# Enums in GLES2, GLES3
+{ "apis": ["GLES2", "GLES3"], "params": [
+  [ "GPU_DISJOINT_EXT", "LOC_CUSTOM, TYPE_INT, 0, extra_EXT_disjoint_timer_query" ],
+]},
+
 { "apis": ["GL", "GL_CORE", "GLES2"], "params": [
 # == GL_MAX_TEXTURE_COORDS_NV
  [ "MAX_TEXTURE_COORDS_ARB", "CONTEXT_INT(Const.MaxTextureCoordUnits), extra_ARB_fragment_program" ],
diff --git a/src/mesa/main/glheader.h b/src/mesa/main/glheader.h
index 3f2a923782..35a442a77b 100644
--- a/src/mesa/main/glheader.h
+++ b/src/mesa/main/glheader.h
@@ -144,6 +144,10 @@ typedef void *GLeglImageOES;
 #define GL_FRAGMENT_SHADER_DISCARDS_SAMPLES_EXT 0x8A52
 #endif
 
+#ifndef GL_EXT_disjoint_timer_query
+#define GL_GPU_DISJOINT_EXT 0x8FBB
+#endif
+
 /* Inexplicably, GL_HALF_FLOAT_OES has a different value than GL_HALF_FLOAT.
  */
 #ifndef GL_HALF_FLOAT_OES
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 0aac49402e..a29d78b101 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -4168,6 +4168,7 @@ struct gl_extensions
GLboolean EXT_blend_func_separate;
GLboolean EXT_blend_minmax;
GLboolean EXT_depth_bounds_test;
+   GLboolean EXT_disjoint_timer_query;
GLboolean EXT_draw_buffers2;
GLboolean EXT_framebuffer_multisample;
GLboolean EXT_framebuffer_multisample_blit_scaled;
diff --git a/src/mesa/main/queryobj.c b/src/mesa/main/queryobj.c
index d966814a76..79600d7db1 100644
--- a/src/mesa/main/queryobj.c
+++ b/src/mesa/main/queryobj.c
@@ -822,7 +822,8 @@ get_query_object(struct gl_context *ctx, const char *func,
if (buf && buf != ctx->Shared->NullBufferObj) {
   bool is_64bit = ptype == GL_INT64_ARB ||
  ptype == GL_UNSIGNED_INT64_ARB;
-  if (!ctx->Extensions.ARB_query_buffer_object) {
+  if (!ctx->Extensions.ARB_query_buffer_object &&
+  !ctx->Extensions.EXT_disjoint_timer_query) {
  

[Mesa-dev] [PATCH 2/4] glapi: add GL_EXT_disjoint_timer_query

2017-12-14 Thread Tapani Pälli
Most entrypoints are already available via other extensions such as
GL_EXT_occlusion_query_boolean and GL_EXT_timer_query.

Signed-off-by: Tapani Pälli 
Reviewed-by: Lionel Landwerlin 
---
 src/mapi/glapi/gen/es_EXT.xml   | 16 
 src/mapi/glapi/gen/gl_API.xml   |  4 ++--
 src/mesa/main/tests/dispatch_sanity.cpp |  5 +
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml
index f19007366f..e5104259b6 100644
--- a/src/mapi/glapi/gen/es_EXT.xml
+++ b/src/mapi/glapi/gen/es_EXT.xml
@@ -847,6 +847,22 @@
 
 
 
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
 
 
 
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index d3594cfe19..d13a3bfd83 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -12944,12 +12944,12 @@
 
 
 
-
+
 
 
 
 
-
+
 
 
 
diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp
index 00754deb46..d697343627 100644
--- a/src/mesa/main/tests/dispatch_sanity.cpp
+++ b/src/mesa/main/tests/dispatch_sanity.cpp
@@ -2441,6 +2441,11 @@ const struct function gles2_functions_possible[] = {
{ "glGetQueryObjectivEXT", 20, -1 },
{ "glGetQueryObjectuivEXT", 20, -1 },
 
+   /* GL_EXT_disjoint_timer_query */
+   { "glGetQueryObjecti64vEXT", 20, -1 },
+   { "glGetQueryObjectui64vEXT", 20, -1 },
+   { "glQueryCounterEXT", 20, -1 },
+
{ NULL, 0, -1 }
 };
 
-- 
2.14.3



[Mesa-dev] [PATCH 0/4] GL_EXT_disjoint_timer_query series

2017-12-14 Thread Tapani Pälli
Hi;

Here's a revisited GL_EXT_disjoint_timer_query series. One patch got
dropped (as discussed with Lionel), and enabling is now done via the
EXT_disjoint_timer_query boolean, as was intended (Ian).

Thanks;

Tapani Pälli (4):
  mesa: add DisjointOperation to gl_shared_state
  glapi: add GL_EXT_disjoint_timer_query
  mesa: GL_EXT_disjoint_timer_query extension API bits
  i965: enable EXT_disjoint_timer_query extension

 src/mapi/glapi/gen/es_EXT.xml| 16 
 src/mapi/glapi/gen/gl_API.xml|  4 ++--
 src/mesa/drivers/dri/i965/intel_extensions.c |  2 ++
 src/mesa/main/extensions_table.h |  1 +
 src/mesa/main/get.c  | 17 +
 src/mesa/main/get_hash_params.py |  5 +
 src/mesa/main/glheader.h |  4 
 src/mesa/main/mtypes.h   |  9 +
 src/mesa/main/queryobj.c |  3 ++-
 src/mesa/main/robustness.c   |  1 +
 src/mesa/main/tests/dispatch_sanity.cpp  |  5 +
 11 files changed, 64 insertions(+), 3 deletions(-)

-- 
2.14.3



Re: [Mesa-dev] [PATCH v2 02/20] ac: add load_tes_inputs() to the abi

2017-12-14 Thread Timothy Arceri

On 13/12/17 18:52, Timothy Arceri wrote:

V2: drop type param and just use ctx->i32


I forgot to add that this drops the ctx->nctx check. Both drivers now
just follow the same path; the strangeness I had been seeing is no
longer present, so it was probably just a bug during development.
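The patch replaces the direct `load_tes_input(ctx->nctx, instr)` call with a callback on `struct ac_shader_abi`, which each driver fills in and which recovers its own context from the embedded ABI struct (radv does this with `nir_to_llvm_context_from_abi()`). A minimal sketch of that pattern, assuming hypothetical, simplified types (`shader_abi`, `driver_ctx`, plain `int` results instead of `LLVMValueRef`) and a `container_of`-style macro written here for illustration rather than taken from Mesa:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified ABI: the shared NIR translator only knows this. */
struct shader_abi {
   int (*load_tes_inputs)(struct shader_abi *abi, unsigned location,
                          unsigned component, unsigned num_components);
};

/* Hypothetical driver context that embeds the ABI struct. */
struct driver_ctx {
   int base_offset;
   struct shader_abi abi;
};

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
   ((type *)((char *)(ptr) - offsetof(type, member)))

static int driver_load_tes_inputs(struct shader_abi *abi, unsigned location,
                                  unsigned component, unsigned num_components)
{
   struct driver_ctx *ctx = container_of(abi, struct driver_ctx, abi);
   (void)num_components;
   /* Stand-in address math: 16 bytes per location, 4 bytes per component,
    * mirroring the "component * 4" offset in the patch. */
   return ctx->base_offset + (int)(location * 16 + component * 4);
}

/* The shared translator calls through the ABI without knowing the driver. */
static int visit_load_tes_input(struct shader_abi *abi, unsigned location,
                                unsigned component, unsigned num_components)
{
   return abi->load_tes_inputs(abi, location, component, num_components);
}
```

Because the translator only sees the function pointer, radv and radeonsi can share the TES load path while each keeps its own buffer addressing.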




---
  src/amd/common/ac_nir_to_llvm.c  | 62 
  src/amd/common/ac_shader_abi.h   | 11 ++
  src/gallium/drivers/radeonsi/si_shader.c |  1 +
  3 files changed, 52 insertions(+), 22 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index f3602a267de..bad3badfc94 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -2852,53 +2852,50 @@ store_tcs_output(struct nir_to_llvm_context *ctx,
}
  
  	if (writemask == 0xF) {
ac_build_buffer_store_dword(&ctx->ac, ctx->hs_ring_tess_offchip, src, 4,
buf_addr, ctx->oc_lds,
(base * 4), 1, 0, true, false);
}
  }
  
  static LLVMValueRef
-load_tes_input(struct nir_to_llvm_context *ctx,
-  const nir_intrinsic_instr *instr)
+load_tes_input(struct ac_shader_abi *abi,
+  LLVMValueRef vertex_index,
+  LLVMValueRef param_index,
+  unsigned const_index,
+  unsigned location,
+  unsigned driver_location,
+  unsigned component,
+  unsigned num_components,
+  bool is_patch,
+  bool is_compact)
  {
+   struct nir_to_llvm_context *ctx = nir_to_llvm_context_from_abi(abi);
LLVMValueRef buf_addr;
LLVMValueRef result;
-   LLVMValueRef vertex_index = NULL;
-   LLVMValueRef indir_index = NULL;
-   unsigned const_index = 0;
-   unsigned param;
-   const bool per_vertex = nir_is_per_vertex_io(instr->variables[0]->var, ctx->stage);
-   const bool is_compact = instr->variables[0]->var->data.compact;
+   unsigned param = shader_io_get_unique_index(location);
  
-	get_deref_offset(ctx->nir, instr->variables[0],
-false, NULL, per_vertex ? &vertex_index : NULL,
-&const_index, &indir_index);
-   param = shader_io_get_unique_index(instr->variables[0]->var->data.location);
-   if (instr->variables[0]->var->data.location == VARYING_SLOT_CLIP_DIST0 &&
-   is_compact && const_index > 3) {
+   if (location == VARYING_SLOT_CLIP_DIST0 && is_compact && const_index > 3) {
const_index -= 3;
param++;
}
  
-	unsigned comp = instr->variables[0]->var->data.location_frac;
buf_addr = get_tcs_tes_buffer_address_params(ctx, param, const_index,
-is_compact, vertex_index, indir_index);
+is_compact, vertex_index, param_index);
  
-	LLVMValueRef comp_offset = LLVMConstInt(ctx->ac.i32, comp * 4, false);
+   LLVMValueRef comp_offset = LLVMConstInt(ctx->ac.i32, component * 4, false);
buf_addr = LLVMBuildAdd(ctx->builder, buf_addr, comp_offset, "");
  
-	result = ac_build_buffer_load(&ctx->ac, ctx->hs_ring_tess_offchip, instr->num_components, NULL,
+   result = ac_build_buffer_load(&ctx->ac, ctx->hs_ring_tess_offchip, num_components, NULL,
  buf_addr, ctx->oc_lds, is_compact ? (4 * const_index) : 0, 1, 0, true, false);
-   result = trim_vector(&ctx->ac, result, instr->num_components);
-   result = LLVMBuildBitCast(ctx->builder, result, get_def_type(ctx->nir, &instr->dest.ssa), "");
+   result = trim_vector(&ctx->ac, result, num_components);
return result;
  }
  
  static LLVMValueRef
  load_gs_input(struct ac_shader_abi *abi,
  unsigned location,
  unsigned driver_location,
  unsigned component,
  unsigned num_components,
  unsigned vertex_index,
@@ -3000,22 +2997,42 @@ static LLVMValueRef visit_load_var(struct ac_nir_context *ctx,
get_deref_offset(ctx, instr->variables[0], vs_in, NULL, NULL,
  &const_index, &indir_index);
  
  	if (instr->dest.ssa.bit_size == 64)
ve *= 2;
  
  	switch (instr->variables[0]->var->data.mode) {
case nir_var_shader_in:
if (ctx->stage == MESA_SHADER_TESS_CTRL)
return load_tcs_input(ctx->nctx, instr);
-   if (ctx->stage == MESA_SHADER_TESS_EVAL)
-   return load_tes_input(ctx->nctx, instr);
+   if (ctx->stage == MESA_SHADER_TESS_EVAL) {
+   LLVMValueRef result;
+   LLVMValueRef vertex_index = NULL;
+   LLVMValueRef indir_index = NULL;
+   unsigned const_index = 0;
+   unsigned location = instr->variables[0]->var->data.location;
+   

Re: [Mesa-dev] [PATCH] radv: export SampleMask from pixel shaders at full rate

2017-12-14 Thread Samuel Pitoiset



On 12/13/2017 09:21 PM, Bas Nieuwenhuizen wrote:

On Tue, Dec 12, 2017 at 6:08 PM, Samuel Pitoiset wrote:

Use 16_ABGR instead of 32_ABGR if Z isn't written.

Ported from RadeonSI.

No CTS regressions on Polaris.
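The export-format selection and the compressed 16-bit packing can be checked in isolation. A minimal sketch, assuming a hypothetical `z_format` enum in place of the `V_028710_SPI_SHADER_*` register values and plain integers in place of LLVM values; the bit positions follow the patch's own comments (stencil in X[23:16], sample mask in Y[15:0]).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the V_028710_SPI_SHADER_* values. */
enum z_format { Z_ZERO, Z_32_R, Z_32_GR, Z_32_ABGR, Z_UINT16_ABGR };

static enum z_format get_z_format(bool writes_z, bool writes_stencil,
                                  bool writes_samplemask)
{
   if (writes_z) {
      /* Z needs 32 bits, so the whole export stays 32-bit. */
      if (writes_samplemask)
         return Z_32_ABGR;
      else if (writes_stencil)
         return Z_32_GR;
      else
         return Z_32_R;
   } else if (writes_stencil || writes_samplemask) {
      /* Stencil and sample mask both fit in 16 bits: compressed export. */
      return Z_UINT16_ABGR;
   }
   return Z_ZERO;
}

/* Channels of the compressed export: stencil shifted into X[23:16] and the
 * sample mask in Y[15:0] (masked here only to make the layout explicit). */
struct packed_z { uint32_t x, y; };

static struct packed_z pack_uint16_abgr(uint32_t stencil, uint32_t samplemask)
{
   struct packed_z p;
   p.x = stencil << 16;
   p.y = samplemask & 0xffffu;
   return p;
}
```

Only the no-depth case switches to UINT16_ABGR, which is why the patch can `assert(!depth)` on that path.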

Signed-off-by: Samuel Pitoiset 
---
  src/amd/common/ac_nir_to_llvm.c | 65 ++---
  src/amd/vulkan/radv_pipeline.c  | 29 ++
  2 files changed, 78 insertions(+), 16 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 663b27d265..5916619e97 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -6166,6 +6166,26 @@ si_export_mrt_color(struct nir_to_llvm_context *ctx,
 return true;
  }

+static unsigned
+si_get_spi_shader_z_format(bool writes_z, bool writes_stencil,
+  bool writes_samplemask)
+{
+   if (writes_z) {
+   /* Z needs 32 bits. */
+   if (writes_samplemask)
+   return V_028710_SPI_SHADER_32_ABGR;
+   else if (writes_stencil)
+   return V_028710_SPI_SHADER_32_GR;
+   else
+   return V_028710_SPI_SHADER_32_R;
+   } else if (writes_stencil || writes_samplemask) {
+   /* Both stencil and sample mask need only 16 bits. */
+   return V_028710_SPI_SHADER_UINT16_ABGR;
+   } else {
+   return V_028710_SPI_SHADER_ZERO;
+   }
+}


I'm not a fan of having this function in two places. Can we export the
format from the compiler to radv, or the other way around?


Yeah, I'm not a big fan either. Exporting the format from the compiler
to radv seems like the best solution. I will update the patch.




Otherwise,

Reviewed-by: Bas Nieuwenhuizen 


+
  static void
  si_export_mrt_z(struct nir_to_llvm_context *ctx,
 LLVMValueRef depth, LLVMValueRef stencil,
@@ -6184,19 +6204,42 @@ si_export_mrt_z(struct nir_to_llvm_context *ctx,
 args.out[2] = LLVMGetUndef(ctx->ac.f32); /* B, sample mask */
 args.out[3] = LLVMGetUndef(ctx->ac.f32); /* A, alpha to mask */

-   if (depth) {
-   args.out[0] = depth;
-   args.enabled_channels |= 0x1;
-   }
+   unsigned format = si_get_spi_shader_z_format(depth != NULL,
+stencil != NULL,
+samplemask != NULL);
+
+   if (format == V_028710_SPI_SHADER_UINT16_ABGR) {
+   assert(!depth);
+   args.compr = 1; /* COMPR flag */
+
+   if (stencil) {
+   /* Stencil should be in X[23:16]. */
+   stencil = ac_to_integer(&ctx->ac, stencil);
+   stencil = LLVMBuildShl(ctx->builder, stencil,
+  LLVMConstInt(ctx->ac.i32, 16, 0), "");
+   args.out[0] = ac_to_float(&ctx->ac, stencil);
+   args.enabled_channels |= 0x3;
+   }
+   if (samplemask) {
+   /* SampleMask should be in Y[15:0]. */
+   args.out[1] = samplemask;
+   args.enabled_channels |= 0xc;
+   }
+   } else {
+   if (depth) {
+   args.out[0] = depth;
+   args.enabled_channels |= 0x1;
+   }

-   if (stencil) {
-   args.out[1] = stencil;
-   args.enabled_channels |= 0x2;
-   }
+   if (stencil) {
+   args.out[1] = stencil;
+   args.enabled_channels |= 0x2;
+   }

-   if (samplemask) {
-   args.out[2] = samplemask;
-   args.enabled_channels |= 0x4;
+   if (samplemask) {
+   args.out[2] = samplemask;
+   args.enabled_channels |= 0x4;
+   }
 }

 /* SI (except OLAND and HAINAN) has a bug that it only looks
diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
index 0146d6935e..baaf5c4c77 100644
--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -2013,6 +2013,25 @@ radv_pipeline_stage_to_user_data_0(struct radv_pipeline *pipeline,
 }
  }

+static unsigned
+si_get_spi_shader_z_format(bool writes_z, bool writes_stencil,
+  bool writes_samplemask)
+{
+   if (writes_z) {
+   /* Z needs 32 bits. */
+   if (writes_samplemask)
+   return V_028710_SPI_SHADER_32_ABGR;
+   else if (writes_stencil)
+   return V_028710_SPI_SHADER_32_GR;
+   else
+   return V_028710_SPI_SHADER_32_R;
+   } else if (writes_stencil || writes_samplemask) {
+   /* Both stencil 

[Mesa-dev] [PATCH 1/2] radv: always emit all compute block components

2017-12-14 Thread Samuel Pitoiset
The number of grid components is always 3 when gl_NumWorkGroups
is declared, because it relies on the number of components of
nir_intrinsic_load_num_work_groups.
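Since the grid size is always three components, the direct-dispatch path can write all three block counts unconditionally. A minimal sketch, assuming a hypothetical `cmd_stream` array with an `emit()` helper in place of `radeon_winsys_cs` and `radeon_emit()`; packet headers and register offsets are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical command stream: a fixed array plus a write cursor. */
struct cmd_stream {
   uint32_t buf[16];
   unsigned cdw;
};

static void emit(struct cmd_stream *cs, uint32_t dw)
{
   assert(cs->cdw < sizeof(cs->buf) / sizeof(cs->buf[0]));
   cs->buf[cs->cdw++] = dw;
}

/* Grid size always occupies three user SGPRs (x, y, z), so the old
 * grid_used > 1 / grid_used > 2 checks are no longer needed. */
static void emit_grid_size(struct cmd_stream *cs, const uint32_t blocks[3],
                           int sgpr_idx)
{
   if (sgpr_idx == -1)
      return; /* shader does not read gl_NumWorkGroups */
   for (unsigned i = 0; i < 3; ++i)
      emit(cs, blocks[i]);
}
```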

Signed-off-by: Samuel Pitoiset 
---
 src/amd/common/ac_nir_to_llvm.c  |  9 ++---
 src/amd/vulkan/radv_cmd_buffer.c | 15 +--
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index f3602a267d..ce25e57eba 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -745,8 +745,10 @@ static void create_function(struct nir_to_llvm_context *ctx,
switch (stage) {
case MESA_SHADER_COMPUTE:
radv_define_common_user_sgprs_phase1(ctx, stage, has_previous_stage, previous_stage, &user_sgpr_info, &args, &desc_sets);
-   if (ctx->shader_info->info.cs.grid_components_used)
-   add_user_sgpr_argument(&args, LLVMVectorType(ctx->ac.i32, ctx->shader_info->info.cs.grid_components_used), &ctx->num_work_groups); /* grid size */
+   if (ctx->shader_info->info.cs.grid_components_used) {
+   add_user_sgpr_argument(&args, ctx->ac.v3i32,
  &ctx->num_work_groups);
+   }
add_sgpr_argument(&args, ctx->ac.v3i32, &ctx->workgroup_ids);
add_sgpr_argument(&args, ctx->ac.i32, &ctx->tg_size);
add_vgpr_argument(&args, ctx->ac.v3i32, &ctx->local_invocation_ids);
@@ -950,7 +952,8 @@ static void create_function(struct nir_to_llvm_context *ctx,
switch (stage) {
case MESA_SHADER_COMPUTE:
if (ctx->shader_info->info.cs.grid_components_used) {
-   set_userdata_location_shader(ctx, AC_UD_CS_GRID_SIZE, &user_sgpr_idx, ctx->shader_info->info.cs.grid_components_used);
+   set_userdata_location_shader(ctx, AC_UD_CS_GRID_SIZE,
+&user_sgpr_idx, 3);
}
break;
case MESA_SHADER_VERTEX:
diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index 68371dbbe7..e68c5a4038 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -3487,9 +3487,6 @@ radv_emit_dispatch_packets(struct radv_cmd_buffer *cmd_buffer,
struct radeon_winsys_cs *cs = cmd_buffer->cs;
struct ac_userdata_info *loc;
unsigned dispatch_initiator;
-   uint8_t grid_used;
-
-   grid_used = compute_shader->info.info.cs.grid_components_used;
 
loc = radv_lookup_user_sgpr(pipeline, MESA_SHADER_COMPUTE,
AC_UD_CS_GRID_SIZE);
@@ -3514,7 +3511,7 @@ radv_emit_dispatch_packets(struct radv_cmd_buffer *cmd_buffer,
radv_cs_add_buffer(ws, cs, info->indirect->bo, 8);
 
if (loc->sgpr_idx != -1) {
-   for (unsigned i = 0; i < grid_used; ++i) {
+   for (unsigned i = 0; i < 3; ++i) {
radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0));
radeon_emit(cs, COPY_DATA_SRC_SEL(COPY_DATA_MEM) |
COPY_DATA_DST_SEL(COPY_DATA_REG));
@@ -3581,15 +3578,13 @@ radv_emit_dispatch_packets(struct radv_cmd_buffer *cmd_buffer,
 
if (loc->sgpr_idx != -1) {
assert(!loc->indirect);
-   assert(loc->num_sgprs == grid_used);
+   assert(loc->num_sgprs == 3);
 
radeon_set_sh_reg_seq(cs, R_00B900_COMPUTE_USER_DATA_0 +
- loc->sgpr_idx * 4, grid_used);
+ loc->sgpr_idx * 4, 3);
radeon_emit(cs, blocks[0]);
-   if (grid_used > 1)
-   radeon_emit(cs, blocks[1]);
-   if (grid_used > 2)
-   radeon_emit(cs, blocks[2]);
+   radeon_emit(cs, blocks[1]);
+   radeon_emit(cs, blocks[2]);
}
 
radeon_emit(cs, PKT3(PKT3_DISPATCH_DIRECT, 3, 0) |
-- 
2.15.1



[Mesa-dev] [PATCH 2/2] radv: replace grid_components_used by uses_grid_size

2017-12-14 Thread Samuel Pitoiset
Use a boolean instead because the number of needed SGPRs
is always 3.
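With the field reduced to a boolean, gathering becomes a presence check and allocation a fixed cost of three SGPRs. A minimal sketch, assuming hypothetical simplified types (`intrinsic_op`, `cs_info`) standing in for `nir_intrinsic_instr` and `ac_shader_info`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of intrinsics relevant to compute shaders. */
enum intrinsic_op { LOAD_NUM_WORK_GROUPS, LOAD_LOCAL_INVOCATION_ID, OTHER_OP };

struct cs_info {
   bool uses_grid_size; /* replaces the old uint8_t grid_components_used */
};

/* Gather pass: any load of gl_NumWorkGroups sets the flag. */
static void gather_cs_info(const enum intrinsic_op *ops, size_t n,
                           struct cs_info *info)
{
   for (size_t i = 0; i < n; ++i) {
      if (ops[i] == LOAD_NUM_WORK_GROUPS)
         info->uses_grid_size = true;
   }
}

/* Allocation: the grid size always occupies three user SGPRs. */
static unsigned cs_user_sgpr_count(const struct cs_info *info)
{
   return info->uses_grid_size ? 3 : 0;
}
```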

Signed-off-by: Samuel Pitoiset 
---
 src/amd/common/ac_nir_to_llvm.c | 7 ---
 src/amd/common/ac_shader_info.c | 2 +-
 src/amd/common/ac_shader_info.h | 2 +-
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index ce25e57eba..0e1d7e0082 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -576,7 +576,8 @@ static void allocate_user_sgprs(struct nir_to_llvm_context *ctx,
 
switch (ctx->stage) {
case MESA_SHADER_COMPUTE:
-   user_sgpr_info->sgpr_count += ctx->shader_info->info.cs.grid_components_used;
+   if (ctx->shader_info->info.cs.uses_grid_size)
+   user_sgpr_info->sgpr_count += 3;
break;
case MESA_SHADER_FRAGMENT:
user_sgpr_info->sgpr_count += ctx->shader_info->info.ps.needs_sample_positions;
@@ -745,7 +746,7 @@ static void create_function(struct nir_to_llvm_context *ctx,
switch (stage) {
case MESA_SHADER_COMPUTE:
radv_define_common_user_sgprs_phase1(ctx, stage, has_previous_stage, previous_stage, &user_sgpr_info, &args, &desc_sets);
-   if (ctx->shader_info->info.cs.grid_components_used) {
+   if (ctx->shader_info->info.cs.uses_grid_size) {
add_user_sgpr_argument(, ctx->ac.v3i32,
   &ctx->num_work_groups);
}
@@ -951,7 +952,7 @@ static void create_function(struct nir_to_llvm_context *ctx,
 
switch (stage) {
case MESA_SHADER_COMPUTE:
-   if (ctx->shader_info->info.cs.grid_components_used) {
+   if (ctx->shader_info->info.cs.uses_grid_size) {
set_userdata_location_shader(ctx, AC_UD_CS_GRID_SIZE,
 &user_sgpr_idx, 3);
}
diff --git a/src/amd/common/ac_shader_info.c b/src/amd/common/ac_shader_info.c
index 53e584065c..09dd4bbd55 100644
--- a/src/amd/common/ac_shader_info.c
+++ b/src/amd/common/ac_shader_info.c
@@ -43,7 +43,7 @@ gather_intrinsic_info(nir_intrinsic_instr *instr, struct ac_shader_info *info)
info->vs.needs_instance_id = true;
break;
case nir_intrinsic_load_num_work_groups:
-   info->cs.grid_components_used = instr->num_components;
+   info->cs.uses_grid_size = true;
break;
case nir_intrinsic_load_sample_id:
info->ps.force_persample = true;
diff --git a/src/amd/common/ac_shader_info.h b/src/amd/common/ac_shader_info.h
index c1d36a667e..3c809cce13 100644
--- a/src/amd/common/ac_shader_info.h
+++ b/src/amd/common/ac_shader_info.h
@@ -42,7 +42,7 @@ struct ac_shader_info {
bool uses_input_attachments;
} ps;
struct {
-   uint8_t grid_components_used;
+   bool uses_grid_size;
} cs;
 };
 
-- 
2.15.1



Re: [Mesa-dev] [PATCH 00/11] intel/tools: Unify batch decoding between aubinators

2017-12-14 Thread Lionel Landwerlin

This looks really good :)
I can't find anything to nitpick :

Reviewed-by: Lionel Landwerlin 


On 13/12/17 20:05, Jason Ekstrand wrote:

Both aubinator and aubinator_error_decode try and do the same task of
decoding batches.  They both have code to try and decode various things
such as shaders from the batch.  All of that code is completely different
between the two.

This little series reworks the two to use a common gen_print_batch
function.  In order to handle dynamic states, a callback is provided that
lets the decoder ask for a mapped buffer given a memory address.  The basic
structure of the new code is taken from aubinator but many of the state
decoding details are taken from aubinator_error_decode.  This new version
should have all the goodness of both.  In particular, aubinator_error_decode
should now properly handle 48-bit addresses (not well tested) and it should
also be able to handle batch chaining.  Meanwhile, aubinator gains the
field-based decoding scheme so there are a lot fewer gen-specific manual
decoding paths.  Everyone wins!
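The buffer-lookup callback described above can be illustrated as follows. This is only a sketch of the idea: `decode_ctx`, `mapped_bo`, `get_bo`, and `read_state_dword` are hypothetical names, not the actual `gen_print_batch` context or its signature. The decoder hands over a GPU address and gets back a CPU pointer (or NULL), so the same decoding code can work whether the buffers came from an AUB file or an error state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a buffer the tool has mapped into its own memory. */
struct mapped_bo {
   uint64_t gpu_addr;
   uint64_t size;
   const void *map;
};

/* Hypothetical decoder context: the batch decoder only sees the callback. */
struct decode_ctx {
   const void *(*get_bo)(void *user_data, uint64_t addr);
   void *user_data;
};

struct bo_list {
   const struct mapped_bo *bos;
   size_t count;
};

/* Tool-side callback: translate a (possibly 48-bit) GPU address into a CPU
 * pointer inside whichever mapped buffer contains it. */
static const void *lookup_bo(void *user_data, uint64_t addr)
{
   const struct bo_list *list = user_data;
   for (size_t i = 0; i < list->count; i++) {
      const struct mapped_bo *bo = &list->bos[i];
      if (addr >= bo->gpu_addr && addr - bo->gpu_addr < bo->size)
         return (const char *)bo->map + (addr - bo->gpu_addr);
   }
   return NULL; /* not captured: the decoder has to skip this state */
}

/* Decoder-side helper: read one dword of dynamic state at a GPU address. */
static int read_state_dword(const struct decode_ctx *ctx, uint64_t addr,
                            uint32_t *out)
{
   const uint32_t *p = ctx->get_bo(ctx->user_data, addr);
   if (!p)
      return -1;
   *out = *p;
   return 0;
}
```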

The reason I haven't converted INTEL_DEBUG=bat yet is threefold:

  1) We really should come up with some mechanism to say how many dynamic
 states you have for a given pointers packet.  We could do some sort of
 callback but, if we want it to work for aubinator or error_decode, we
 will need some form of batch annotations.

  2) I'm not convinced I didn't break gen4-5.  (I don't think it worked in
 either of those tools in the first place).

  3) I put it in src/intel/tools because I really don't want to rewrite the
 build system right now to put it somewhere more common.  We can't put
 it in src/intel/common along with the other decoder stuff because it
 depends on the compiler and we don't want i965 to depend on
 src/intel/tools.

Jason Ekstrand (11):
   intel/decoder: Expose the raw field value in the iterator
   intel/tools: Add the start of a generic batch decoder
   intel/batch-decoder: Decode MEDIA_INTERFACE_DESCRIPTOR_LOAD
   intel/batch-decoder: Decode vertex and index buffers
   intel/batch-decoder: Decode graphics shaders
   intel/tools: Switch aubinator_error_decode over to the gen_print_batch
   intel/batch-decoder: Decode constants, binding tables, and samplers
   intel/decoder: Add a gen_print_group_length helper
   intel/batch-decoder: Decode dynamic state
   intel/batch-decoder: Decode registers
   intel/tools: Convert aubinator over to the common framework

  src/intel/Makefile.tools.am  |   2 +
  src/intel/common/gen_decoder.c   |  20 +-
  src/intel/common/gen_decoder.h   |  56 +++
  src/intel/tools/aubinator.c  | 719 ++
  src/intel/tools/aubinator_error_decode.c | 238 ++---
  src/intel/tools/gen_batch_decoder.c  | 834 +++
  src/intel/tools/meson.build  |   6 +-
  7 files changed, 976 insertions(+), 899 deletions(-)
  create mode 100644 src/intel/tools/gen_batch_decoder.c





[Mesa-dev] [PATCH] gallium: Deallocate screens and buffers on exit.

2017-12-14 Thread Ricardo Barreira
This allows dlclose()'ing this code in a dynamically-linked library without
leaking memory.
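The cleanup pattern in this patch registers an `atexit()` handler lazily, the first time a resource is created, and frees the list by saving each `next` pointer before freeing the node. A minimal sketch of that pattern on its own, assuming a hypothetical `buffer` type in place of `osmesa_buffer` and plain `calloc`/`free` in place of Gallium's `CALLOC_STRUCT`/`FREE`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct osmesa_buffer. */
struct buffer {
   int width, height;
   struct buffer *next;
};

static struct buffer *buffer_list = NULL;
static bool destroy_buffers_registered = false;

/* Free every buffer; read next before free() so the walk stays valid. */
static void destroy_buffers(void)
{
   struct buffer *next;
   for (struct buffer *b = buffer_list; b; b = next) {
      next = b->next;
      free(b);
   }
   buffer_list = NULL; /* tolerate being called more than once */
}

/* Register the exit handler once, on first use, then allocate. */
static struct buffer *create_buffer(int width, int height)
{
   if (!destroy_buffers_registered) {
      if (atexit(destroy_buffers) != 0)
         return NULL;
      destroy_buffers_registered = true;
   }
   struct buffer *b = calloc(1, sizeof(*b));
   if (!b)
      return NULL;
   b->width = width;
   b->height = height;
   b->next = buffer_list; /* push onto the global list */
   buffer_list = b;
   return b;
}
```

Resetting `buffer_list` after the walk is a choice made in this sketch so that an explicit call followed by the exit-time call is harmless.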
---
 src/gallium/state_trackers/osmesa/osmesa.c | 77 +-
 1 file changed, 55 insertions(+), 22 deletions(-)

diff --git a/src/gallium/state_trackers/osmesa/osmesa.c b/src/gallium/state_trackers/osmesa/osmesa.c
index 8baec0a0e4..7336dbd0c5 100644
--- a/src/gallium/state_trackers/osmesa/osmesa.c
+++ b/src/gallium/state_trackers/osmesa/osmesa.c
@@ -113,6 +113,11 @@ struct osmesa_context
struct pp_queue_t *pp;
 };
 
+/**
+ * Singleton st_manager object (see get_st_manager / destroy_st_manager
+ * functions).
+ */
+static struct st_manager *stmgr = NULL;
 
 /**
  * Linked list of all osmesa_buffers.
@@ -123,7 +128,11 @@ struct osmesa_context
  * a frame.
  */
 static struct osmesa_buffer *BufferList = NULL;
-
+/*
+ * Indicates whether the cleanup function for BufferList has already been
+ * registered with atexit().
+ */
+static boolean destroy_buffers_registered = FALSE;
 
 /**
  * Called from the ST manager.
@@ -149,6 +158,19 @@ get_st_api(void)
return stapi;
 }
 
+/**
+ * Destroy singleton st_manager object.
+ */
+static void
+destroy_st_manager(void)
+{
+   if (stmgr) {
+  if (stmgr->screen) {
+ stmgr->screen->destroy(stmgr->screen);
+  }
+  FREE(stmgr);
+   }
+}
 
 /**
  * Create/return a singleton st_manager object.
@@ -156,8 +178,10 @@ get_st_api(void)
 static struct st_manager *
 get_st_manager(void)
 {
-   static struct st_manager *stmgr = NULL;
if (!stmgr) {
+  if (atexit(destroy_st_manager) != 0) {
+ return NULL;
+  }
   stmgr = CALLOC_STRUCT(st_manager);
   if (stmgr) {
  stmgr->screen = osmesa_create_screen();
@@ -457,6 +481,24 @@ osmesa_create_st_framebuffer(void)
return stfbi;
 }
 
+static void
+osmesa_destroy_buffer(struct osmesa_buffer *osbuffer)
+{
+   FREE(osbuffer->stfb);
+   FREE(osbuffer);
+}
+
+static void
+destroy_buffers(void)
+{
+   struct osmesa_buffer *buffer;
+   struct osmesa_buffer *next_buffer;
+
+   for (buffer = BufferList; buffer; buffer = next_buffer) {
+  next_buffer = buffer->next;
+  osmesa_destroy_buffer(buffer);
+   }
+}
 
 /**
  * Create new buffer and add to linked list.
@@ -466,7 +508,17 @@ osmesa_create_buffer(enum pipe_format color_format,
  enum pipe_format ds_format,
  enum pipe_format accum_format)
 {
-   struct osmesa_buffer *osbuffer = CALLOC_STRUCT(osmesa_buffer);
+   struct osmesa_buffer *osbuffer;
+
+   if (!destroy_buffers_registered) {
+  if (atexit(destroy_buffers) != 0) {
+ return NULL;
+  }
+
+  destroy_buffers_registered = TRUE;
+   }
+
+   osbuffer = CALLOC_STRUCT(osmesa_buffer);
if (osbuffer) {
   osbuffer->stfb = osmesa_create_st_framebuffer();
 
@@ -510,22 +562,6 @@ osmesa_find_buffer(enum pipe_format color_format,
 }
 
 
-static void
-osmesa_destroy_buffer(struct osmesa_buffer *osbuffer)
-{
-   struct st_api *stapi = get_st_api();
-
-   /*
-* Notify the state manager that the associated framebuffer interface
-* is no longer valid.
-*/
-   stapi->destroy_drawable(stapi, osbuffer->stfb);
-
-   FREE(osbuffer->stfb);
-   FREE(osbuffer);
-}
-
-
 
 /**/
 /*Public Functions*/
@@ -790,9 +826,6 @@ OSMesaMakeCurrent(OSMesaContext osmesa, void *buffer, GLenum type,
osbuffer->height = height;
osbuffer->map = buffer;
 
-   /* XXX unused for now */
-   (void) osmesa_destroy_buffer;
-
osmesa->current_buffer = osbuffer;
osmesa->type = type;
 
-- 
2.15.1.504.g5279b80103-goog
