[Mesa-dev] [PATCH v4] anv: fix alphaToCoverage when there is no color attachment

2019-05-06 Thread Iago Toral Quiroga
-by: Iago Toral Quiroga --- src/intel/vulkan/anv_pipeline.c | 56 - 1 file changed, 42 insertions(+), 14 deletions(-) diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c index 20eab548fb2..f15f0896266 100644 --- a/src/intel/vulkan

[Mesa-dev] [PATCH v3] anv: fix alphaToCoverage when there is no color attachment

2019-05-03 Thread Iago Toral Quiroga
-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.* Signed-off-by: Samuel Iglesias Gonsálvez Signed-off-by: Iago Toral Quiroga --- src/intel/vulkan/anv_pipeline.c | 48 + 1 file changed, 37 insertions(+), 11 deletions(-) diff --git a/src/intel/vulkan/anv_pipeline.c b

[Mesa-dev] [PATCH v2] anv: fix alphaToCoverage when there is no color attachment

2019-05-02 Thread Iago Toral Quiroga
a bit (Iago) Fixes the following CTS tests: dEQP-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.* Signed-off-by: Samuel Iglesias Gonsálvez Signed-off-by: Iago Toral Quiroga --- src/intel/vulkan/anv_pipeline.c | 25 ++--- 1 file changed, 18 insertions(+),

[Mesa-dev] [PATCH] anv/query: ensure visibility of reset before copying query results

2019-04-30 Thread Iago Toral Quiroga
Specifically, vkCmdCopyQueryPoolResults is required to see the effect of a previous vkCmdResetQueryPool. This may not work currently when query execution is still on going, as some of the queries may become available asynchronously after the reset. Fixes new CTS tests:

[Mesa-dev] [PATCH v2 3/3] intel/compiler: implement more algebraic optimizations

2019-03-04 Thread Iago Toral Quiroga
Now that we propagate constants to the first source of 2src instructions we see more opportunities of constant folding in the backend. v2: - The hardware only uses 5 bits (or 6 bits for Q/UQ) from the shift count parameter in SHL/SHR instructions, so do the same in constant propagation

[Mesa-dev] [PATCH v2 1/3] intel/compiler: allow constant propagation for int quotient and reminder

2019-03-04 Thread Iago Toral Quiroga
And let combine constants promote the constants if needed. --- src/intel/compiler/brw_fs_combine_constants.cpp | 2 ++ src/intel/compiler/brw_fs_copy_propagation.cpp | 4 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/src/intel/compiler/brw_fs_combine_constants.cpp

[Mesa-dev] [PATCH v2 2/3] intel/compiler: allow constant propagation to first source of 2src instructions

2019-03-04 Thread Iago Toral Quiroga
Even if it is not supported by the hardware, we will fix it up in the combine constants pass. v2: - This will enable new constant folding opportunities in the algebraic pass for MUL or ADD with types other than F, so do not assert on that type. For now we just skip anything that is not

[Mesa-dev] [PATCH v2 0/3] intel: propagate constants to first source of 2-src instructions

2019-03-04 Thread Iago Toral Quiroga
to patch 2 to Jenkins and verified that it came back green. Iago Toral Quiroga (3): intel/compiler: allow constant propagation for int quotient and reminder intel/compiler: allow constant propagation to first source of 2src instructions intel/compiler: implement more algebraic optimizations

[Mesa-dev] [PATCH 2/3] intel/compiler: allow constant propagation to first source of 2-src instructions

2019-02-27 Thread Iago Toral Quiroga
Even if it is not supported by the hardware, we will fix it up in the combine constants pass. --- .../compiler/brw_fs_combine_constants.cpp | 37 ++--- .../compiler/brw_fs_copy_propagation.cpp | 55 +-- 2 files changed, 56 insertions(+), 36 deletions(-) diff

[Mesa-dev] [PATCH 3/3] intel/compiler: implement more algebraic optimizations

2019-02-27 Thread Iago Toral Quiroga
Now that we propagate constants to the first source of 2src instructions we see more opportunities of constant folding in the backend. Shader-db results on KBL: total instructions in shared programs: 14965607 -> 14855983 (-0.73%) instructions in affected programs: 3988102 -> 3878478 (-2.75%)

[Mesa-dev] [PATCH 1/3] intel/compiler: allow constant propagation for int quotient and reminder

2019-02-27 Thread Iago Toral Quiroga
And let combine constants promote the constants if needed. --- src/intel/compiler/brw_fs_combine_constants.cpp | 2 ++ src/intel/compiler/brw_fs_copy_propagation.cpp | 4 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/src/intel/compiler/brw_fs_combine_constants.cpp

[Mesa-dev] [PATCH 0/3] intel: propagate constants to first source of 2-src instructions

2019-02-27 Thread Iago Toral Quiroga
results, so that part has been removed. Iago Iago Toral Quiroga (3): intel/compiler: allow constant propagation for int quotient and reminder intel/compiler: allow constant propagation to first source of 2-src instructions intel/compiler: implement more algebraic optimizations

[Mesa-dev] [PATCH v5 33/40] intel/compiler: also set F execution type for mixed float mode in BDW

2019-02-27 Thread Iago Toral Quiroga
The section 'Execution Data Types' of 3D Media GPGPU volume, which describes execution types, is exactly the same in BDW and SKL+. Also, this section states that there is a single execution type, so it makes sense that this is the wider of the two floating point types involved in mixed float

[Mesa-dev] [PATCH v5 02/40] intel/compiler: add a NIR pass to lower conversions

2019-02-15 Thread Iago Toral Quiroga
Some conversions are not directly supported in hardware and need to be split in two conversion instructions going through an intermediary type. Doing this at the NIR level simplifies a bit the complexity in the backend. v2: - Consider fp16 rounding conversion opcodes - Properly handle swizzles

[Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions

2019-02-12 Thread Iago Toral Quiroga
--- src/intel/compiler/brw_eu_validate.c| 64 - src/intel/compiler/test_eu_validate.cpp | 122 2 files changed, 185 insertions(+), 1 deletion(-) diff --git a/src/intel/compiler/brw_eu_validate.c b/src/intel/compiler/brw_eu_validate.c index

[Mesa-dev] [PATCH v4 03/40] intel/compiler: split float to 64-bit opcodes from int to 64-bit

2019-02-12 Thread Iago Toral Quiroga
Going forward having these split is a bit more convenient since these two groups have different restrictions. v2: - Rebased on top of new regioning lowering pass. Reviewed-by: Topi Pohjolainen (v1) Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_fs_nir.cpp | 7 +++ 1 file changed,

[Mesa-dev] [PATCH v4 24/40] intel/compiler: implement isign for int8

2019-02-12 Thread Iago Toral Quiroga
Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_fs_nir.cpp | 25 + 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp index 3a6e4a2eb60..40c0481ac53 100644 ---

[Mesa-dev] [PATCH v4 15/40] intel/compiler: don't compact 3-src instructions with Src1Type or Src2Type bits

2019-02-12 Thread Iago Toral Quiroga
We are now using these bits, so don't assert that they are not set. In gen8, if these bits are set compaction is not possible. On gen9 and CHV platforms set_3src_control_index() checks these bits (and others) against a table to validate if the particular bit combination is eligible for compaction

[Mesa-dev] [PATCH v4 17/40] intel/compiler: set correct precision fields for 3-source float instructions

2019-02-12 Thread Iago Toral Quiroga
Source0 and Destination extract the floating-point precision automatically from the SrcType and DstType instruction fields respectively when they are set to types :F or :HF. For Source1 and Source2 operands, we use the new 1-bit fields Src1Type and Src2Type, where 0 means normal precision and 1

[Mesa-dev] [PATCH v4 08/40] intel/compiler: implement 16-bit fsign

2019-02-12 Thread Iago Toral Quiroga
v2: - make 16-bit be its own separate case (Jason) v3: - Drop the result_int temporary (Jason) Reviewed-by: Topi Pohjolainen (v1) Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_fs_nir.cpp | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git

[Mesa-dev] [PATCH v4 13/40] intel/compiler: add instruction setters for Src1Type and Src2Type.

2019-02-12 Thread Iago Toral Quiroga
The original SrcType is a 3-bit field that takes a subset of the types supported for the hardware for 3-source instructions. Since gen8, when the half-float type was added, 3-source floating point operations can use use mixed precision mode, where not all the operands have the same floating-point

[Mesa-dev] [PATCH v4 01/40] compiler/nir: add an is_conversion field to nir_op_info

2019-02-12 Thread Iago Toral Quiroga
This is set to True only for numeric conversion opcodes. --- src/compiler/nir/nir.h| 3 ++ src/compiler/nir/nir_opcodes.py | 73 +-- src/compiler/nir/nir_opcodes_c.py | 1 + 3 files changed, 44 insertions(+), 33 deletions(-) diff --git

[Mesa-dev] [PATCH v4 37/40] intel/compiler: validate region restrictions for mixed float mode

2019-02-12 Thread Iago Toral Quiroga
--- src/intel/compiler/brw_eu_validate.c| 256 ++ src/intel/compiler/test_eu_validate.cpp | 618 2 files changed, 874 insertions(+) diff --git a/src/intel/compiler/brw_eu_validate.c b/src/intel/compiler/brw_eu_validate.c index ed9c8fe59dd..a61d4c46e81 100644

[Mesa-dev] [PATCH v4 21/40] intel/compiler: split is_partial_write() into two variants

2019-02-12 Thread Iago Toral Quiroga
This function is used in two different scenarios that for 32-bit instructions are the same, but for 16-bit instructions are not. One scenario is that in which we are working at a SIMD8 register level and we need to know if a register is fully defined or written. This is useful, for example, in

[Mesa-dev] [PATCH v4 00/40] intel: VK_KHR_shader_float16_int8 implementation

2019-02-12 Thread Iago Toral Quiroga
for testing in the itoral/VK_KHR_shader_float16_int8 branch of the Igalia Mesa repository at https://github.com/Igalia/mesa. Iago Toral Quiroga (40): compiler/nir: add an is_conversion field to nir_op_info intel/compiler: add a NIR pass to lower conversions intel/compiler: split float to 64-bit

[Mesa-dev] [PATCH v4 14/40] intel/compiler: add new half-float register type for 3-src instructions

2019-02-12 Thread Iago Toral Quiroga
This is available since gen8. v2: restore previously existing assertion. v3: don't use separate tables for gen7 and gen8, just assert that we don't use half-float before gen8 (Matt) Reviewed-by: Topi Pohjolainen (v1) --- src/intel/compiler/brw_reg_type.c | 4 1 file changed, 4

[Mesa-dev] [PATCH v4 10/40] compiler/nir: add lowering option for 16-bit fmod

2019-02-12 Thread Iago Toral Quiroga
And enable it on Intel. v2: - Squash the change to enable this lowering on Intel (Jason) Reviewed-by: Jason Ekstrand --- src/compiler/nir/nir.h| 1 + src/compiler/nir/nir_opt_algebraic.py | 1 + src/intel/compiler/brw_compiler.c | 1 + 3 files changed, 3 insertions(+)

[Mesa-dev] [PATCH v4 04/40] intel/compiler: handle b2i/b2f with other integer conversion opcodes

2019-02-12 Thread Iago Toral Quiroga
Since we handle booleans as integers this makes more sense. v2: - rebased to incorporate new boolean conversion opcodes v3: - rebased on top regioning lowering pass Reviewed-by: Jason Ekstrand (v1) Reviewed-by: Topi Pohjolainen (v2) --- src/intel/compiler/brw_fs_nir.cpp | 16

[Mesa-dev] [PATCH v4 33/40] intel/compiler: also set F execution type for mixed float mode in BDW

2019-02-12 Thread Iago Toral Quiroga
The section 'Execution Data Types' of 3D Media GPGPU volume, which describes execution types, is exactly the same in BDW and SKL+. Also, this section states that there is a single execution type, so it makes sense that this is the wider of the two floating point types involved in mixed float

[Mesa-dev] [PATCH v4 36/40] intel/compiler: skip validating restrictions on operand types for mixed float

2019-02-12 Thread Iago Toral Quiroga
Mixed float instructions are those that use both F and HF operands as their sources or destination, except for regular conversions. There are specific rules for mixed float operation mode with its own set of restrictions, which involve rules that are incompatible with general restrictions. For

[Mesa-dev] [PATCH v4 12/40] compiler/nir: add lowering for 16-bit ldexp

2019-02-12 Thread Iago Toral Quiroga
v2 (Topi): - Make bit-size handling order be 16-bit, 32-bit, 64-bit - Clamp lower exponent range at -28 instead of -30. Reviewed-by: Topi Pohjolainen Reviewed-by: Jason Ekstrand --- src/compiler/nir/nir_opt_algebraic.py | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff

[Mesa-dev] [PATCH v4 23/40] intel/compiler: rework conversion opcodes

2019-02-12 Thread Iago Toral Quiroga
Now that we have the regioning lowering pass we can just put all of these opcodes together in a single block and we can just assert on the few cases of conversion instructions that are not supported in hardware and that should be lowered in brw_nir_lower_conversions. The only cases what we still

[Mesa-dev] [PATCH v4 39/40] anv/pipeline: support Float16 and Int8 SPIR-V capabilities in gen8+

2019-02-12 Thread Iago Toral Quiroga
v2: - Merge Float16 and Int8 capabilities into a single patch (Jason) - Merged patch that enabled SPIR-V front-end checks for these caps (except for Int8, which was already merged) Reviewed-by: Jason Ekstrand (v1) --- src/compiler/shader_info.h| 1 +

[Mesa-dev] [PATCH v4 28/40] intel/compiler: implement is_zero, is_one, is_negative_one for 8-bit/16-bit

2019-02-12 Thread Iago Toral Quiroga
There are no 8-bit immediates, so assert in that case. 16-bit immediates are replicated in each word of a 32-bit immediate, so we only need to check the lower 16-bits. v2: - Fix is_zero with half-float to consider -0 as well (Jason). - Fix is_negative_one for word type. Reviewed-by: Jason

[Mesa-dev] [PATCH v4 29/40] intel/compiler: add a brw_reg_type_is_integer helper

2019-02-12 Thread Iago Toral Quiroga
v2: - Fixed typo: meant BRW_REGISTER_TYPE_UB instead BRW_REGISTER_TYPE_UV Reviewed-by: Jason Ekstrand (v1) --- src/intel/compiler/brw_reg_type.h | 18 ++ 1 file changed, 18 insertions(+) diff --git a/src/intel/compiler/brw_reg_type.h b/src/intel/compiler/brw_reg_type.h index

[Mesa-dev] [PATCH v4 19/40] intel/compiler: fix ddy for half-float in Broadwell

2019-02-12 Thread Iago Toral Quiroga
Broadwell has restrictions that apply to Align16 half-float that make the Align16 implementation of this invalid for this platform. Use the gen11 path for this instead, which uses Align1 mode. The restriction is not present in cherryview, gen9 or gen10, where the Align16 implementation seems to

[Mesa-dev] [PATCH v4 27/40] intel/compiler: generalize the combine constants pass

2019-02-12 Thread Iago Toral Quiroga
At the very least we need it to handle HF too, since we are doing constant propagation for MAD and LRP, which relies on this pass to promote the immediates to GRF in the end, but ideally we want it to support even more types so we can take advantage of it to improve register pressure in some

[Mesa-dev] [PATCH v4 26/40] intel/eu: force stride of 2 on NULL register for Byte instructions

2019-02-12 Thread Iago Toral Quiroga
The hardware only allows a stride of 1 on a Byte destination for raw byte MOV instructions. This is required even when the destination is the NULL register. Rather than making sure that we emit a proper NULL:B destination every time we need one, just fix it at emission time. Reviewed-by: Jason

[Mesa-dev] [PATCH v4 25/40] intel/compiler: ask for an integer type if requesting an 8-bit type

2019-02-12 Thread Iago Toral Quiroga
v2: - Assign BRW_REGISTER_TYPE_B directly for 8-bit (Jason) Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_fs_nir.cpp | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp index

[Mesa-dev] [PATCH v4 30/40] intel/compiler: fix cmod propagation for non 32-bit types

2019-02-12 Thread Iago Toral Quiroga
v2: - Do not propagate if the bit-size changes Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_fs_cmod_propagation.cpp | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/src/intel/compiler/brw_fs_cmod_propagation.cpp

[Mesa-dev] [PATCH v4 35/40] intel/compiler: validate conversions between 64-bit and 8-bit types

2019-02-12 Thread Iago Toral Quiroga
--- src/intel/compiler/brw_eu_validate.c| 10 +- src/intel/compiler/test_eu_validate.cpp | 46 + 2 files changed, 55 insertions(+), 1 deletion(-) diff --git a/src/intel/compiler/brw_eu_validate.c b/src/intel/compiler/brw_eu_validate.c index

[Mesa-dev] [PATCH v4 18/40] intel/compiler: fix ddx and ddy for 16-bit float

2019-02-12 Thread Iago Toral Quiroga
We were assuming 32-bit elements. Also, In SIMD8 we pack 2 vector components in a single SIMD register, so for example, component Y of a 16-bit vec2 starts is at byte offset 16B. This means that when we compute the offset of the elements to be differentiated we should not stomp whatever base

[Mesa-dev] [PATCH v4 38/40] compiler/spirv: move the check for Int8 capability

2019-02-12 Thread Iago Toral Quiroga
So it is right after the checks for the other various Int* capabilities. --- src/compiler/spirv/spirv_to_nir.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c index 1cbc926c818..7e07de2bfc0 100644

[Mesa-dev] [PATCH v4 11/40] compiler/nir: add lowering for 16-bit flrp

2019-02-12 Thread Iago Toral Quiroga
And enable it on Intel. v2: - Squash the change to enable it on Intel (Jason) Reviewed-by: Jason Ekstrand --- src/compiler/nir/nir.h| 1 + src/compiler/nir/nir_opt_algebraic.py | 1 + src/intel/compiler/brw_compiler.c | 1 + 3 files changed, 3 insertions(+) diff --git

[Mesa-dev] [PATCH v4 02/40] intel/compiler: add a NIR pass to lower conversions

2019-02-12 Thread Iago Toral Quiroga
Some conversions are not directly supported in hardware and need to be split in two conversion instructions going through an intermediary type. Doing this at the NIR level simplifies a bit the complexity in the backend. v2: - Consider fp16 rounding conversion opcodes - Properly handle swizzles

[Mesa-dev] [PATCH v4 40/40] anv/device: expose VK_KHR_shader_float16_int8 in gen8+

2019-02-12 Thread Iago Toral Quiroga
v2 (Jason): - Merge shaderFloat16 and shaderInt8 enablement into a single patch. - Merge extension enable. Reviewed-by: Jason Ekstrand (v1) --- src/intel/vulkan/anv_device.c | 9 + src/intel/vulkan/anv_extensions.py | 1 + 2 files changed, 10 insertions(+) diff --git

[Mesa-dev] [PATCH v4 09/40] intel/compiler: drop unnecessary temporary from 32-bit fsign implementation

2019-02-12 Thread Iago Toral Quiroga
Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_fs_nir.cpp | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp index 64e24f86b5a..f59e9ad4e2b 100644 --- a/src/intel/compiler/brw_fs_nir.cpp +++

[Mesa-dev] [PATCH v4 07/40] intel/compiler: handle extended math restrictions for half-float

2019-02-12 Thread Iago Toral Quiroga
Extended math with half-float operands is only supported since gen9, but it is limited to SIMD8. In gen8 we lower it to 32-bit. v2: quashed together the following patches (Jason): - intel/compiler: allow extended math functions with HF operands - intel/compiler: lower 16-bit extended math to

[Mesa-dev] [PATCH v4 22/40] intel/compiler: activate 16-bit bit-size lowerings also for 8-bit

2019-02-12 Thread Iago Toral Quiroga
Particularly, we need the same lowewrings we use for 16-bit integers. Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_nir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c index 9b26a6c3d6f..1d62f2adde8

[Mesa-dev] [PATCH v4 31/40] intel/compiler: remove inexact algebraic optimizations from the backend

2019-02-12 Thread Iago Toral Quiroga
NIR already has these and correctly considers exact/inexact qualification, whereas the backend doesn't and can apply the optimizations where it shouldn't. This happened to be the case in a handful of Tomb Raider shaders, where NIR would skip the optimizations because of a precise qualification but

[Mesa-dev] [PATCH v4 16/40] intel/compiler: allow half-float on 3-source instructions since gen8

2019-02-12 Thread Iago Toral Quiroga
Reviewed-by: Topi Pohjolainen Reviewed-by: Jason Ekstrand Reviewed-by: Matt Turner --- src/intel/compiler/brw_eu_emit.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c index 2f31d9591fc..30037e71b00

[Mesa-dev] [PATCH v4 32/40] intel/compiler: skip MAD algebraic optimization for half-float or mixed mode

2019-02-12 Thread Iago Toral Quiroga
It is very likely that this optimzation is never useful and we'll probably just end up removing it, so let's not bother adding more cases to it for now. --- src/intel/compiler/brw_fs.cpp | 4 1 file changed, 4 insertions(+) diff --git a/src/intel/compiler/brw_fs.cpp

[Mesa-dev] [PATCH v4 20/40] intel/compiler: workaround for SIMD8 half-float MAD in gen8

2019-02-12 Thread Iago Toral Quiroga
Empirical testing shows that gen8 has a bug where MAD instructions with a half-float source starting at a non-zero offset fail to execute properly. This scenario usually happened in SIMD8 executions, where we used to pack vector components Y and W in the second half of SIMD registers (therefore,

[Mesa-dev] [PATCH v4 06/40] intel/compiler: lower some 16-bit float operations to 32-bit

2019-02-12 Thread Iago Toral Quiroga
The hardware doesn't support half-float for these. Reviewed-by: Topi Pohjolainen Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_nir.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c index 7e3dbc9e447..75513e5113c

[Mesa-dev] [PATCH v4 05/40] intel/compiler: assert restrictions on conversions to half-float

2019-02-12 Thread Iago Toral Quiroga
There are some hardware restrictions that brw_nir_lower_conversions should have taken care of before we get here. v2: - rebased on top of regioning lowering pass Reviewed-by: Topi Pohjolainen (v1) Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_fs_nir.cpp | 5 +++-- 1 file changed, 3

[Mesa-dev] [PATCH] intel/compiler: update validator to account for half-float exec type promotion

2019-01-23 Thread Iago Toral Quiroga
Commit c84ec70b3a72 implemented execution type promotion to 32-bit for conversions involving half-float registers, which empirical testing suggested was required, but it did not incorporate this change into the assembly validator logic. This commits adds that, preventing validation errors like

[Mesa-dev] [PATCH v3 28/42] intel/compiler: handle 64-bit float to 8-bit integer conversions

2019-01-15 Thread Iago Toral Quiroga
These are not directly supported in hardware and brw_nir_lower_conversions should have taken care of that before we get here. Also, while we are at it, make sure 64-bit integer to 8-bit are also properly split by the same lowering pass, since they have the same hardware restrictions. ---

[Mesa-dev] [PATCH v3 16/42] intel/compiler: add instruction setters for Src1Type and Src2Type.

2019-01-15 Thread Iago Toral Quiroga
The original SrcType is a 3-bit field that takes a subset of the types supported for the hardware for 3-source instructions. Since gen8, when the half-float type was added, 3-source floating point operations can use use mixed precision mode, where not all the operands have the same floating-point

[Mesa-dev] [PATCH v3 35/42] anv/device: expose shaderFloat16 and shaderInt8 in gen8+

2019-01-15 Thread Iago Toral Quiroga
v2 (Jason): - Merge Float16 and Int8 into a single patch. - Merge extension enable. Reviewed-by: Jason Ekstrand (v1) --- src/intel/vulkan/anv_device.c | 9 + src/intel/vulkan/anv_extensions.py | 1 + 2 files changed, 10 insertions(+) diff --git a/src/intel/vulkan/anv_device.c

[Mesa-dev] [PATCH v3 21/42] intel/compiler: set correct precision fields for 3-source float instructions

2019-01-15 Thread Iago Toral Quiroga
Source0 and Destination extract the floating-point precision automatically from the SrcType and DstType instruction fields respectively when they are set to types :F or :HF. For Source1 and Source2 operands, we use the new 1-bit fields Src1Type and Src2Type, where 0 means normal precision and 1

[Mesa-dev] [PATCH v3 03/42] intel/compiler: split float to 64-bit opcodes from int to 64-bit

2019-01-15 Thread Iago Toral Quiroga
Going forward having these split is a bit more convenient since these two groups have different restrictions. v2: - Rebased on top of new regioning lowering pass. Reviewed-by: Topi Pohjolainen (v1) --- src/intel/compiler/brw_fs_nir.cpp | 7 +++ 1 file changed, 7 insertions(+) diff --git

[Mesa-dev] [PATCH v3 27/42] intel/compiler: activate 16-bit bit-size lowerings also for 8-bit

2019-01-15 Thread Iago Toral Quiroga
Particularly, we need the same lowewrings we use for 16-bit integers. Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_nir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c index 3b2909da33e..2dfbf8824dc

[Mesa-dev] [PATCH v3 32/42] intel/eu: force stride of 2 on NULL register for Byte instructions

2019-01-15 Thread Iago Toral Quiroga
The hardware only allows a stride of 1 on a Byte destination for raw byte MOV instructions. This is required even when the destination is the NULL register. Rather than making sure that we emit a proper NULL:B destination every time we need one, just fix it at emission time. Reviewed-by: Jason

[Mesa-dev] [PATCH v3 39/42] intel/compiler: remove MAD/LRP algebraic optimizations from the backend

2019-01-15 Thread Iago Toral Quiroga
NIR already has these so they are redundant. A run of shader-db confirms that the only cases where these backend optimizations are activated are some Tomb Raider shaders where the affected variables are qualified as "precise", which is why NIR won't apply them and why the backend shouldn't either

[Mesa-dev] [PATCH v3 17/42] intel/compiler: add new half-float register type for 3-src instructions

2019-01-15 Thread Iago Toral Quiroga
This is available since gen8. v2: restore previously existing assertion. Reviewed-by: Topi Pohjolainen (v1) --- src/intel/compiler/brw_reg_type.c | 36 +++ 1 file changed, 32 insertions(+), 4 deletions(-) diff --git a/src/intel/compiler/brw_reg_type.c

[Mesa-dev] [PATCH v3 19/42] intel/compiler: don't compact 3-src instructions with Src1Type or Src2Type bits

2019-01-15 Thread Iago Toral Quiroga
We are now using these bits, so don't assert that they are not set, just avoid compaction in that case. Reviewed-by: Topi Pohjolainen --- src/intel/compiler/brw_eu_compact.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/intel/compiler/brw_eu_compact.c

[Mesa-dev] [PATCH v3 41/42] intel/compiler: fix combine constants for Align16 with half-float prior to gen9

2019-01-15 Thread Iago Toral Quiroga
There is a hardware restriction where <0,1,0>:HF in Align16 doesn't replicate a single 16-bit channel, but instead it replicates a full 32-bit channel. --- .../compiler/brw_fs_combine_constants.cpp | 24 +-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git

[Mesa-dev] [PATCH v3 38/42] intel/compiler: fix cmod propagation for non 32-bit types

2019-01-15 Thread Iago Toral Quiroga
v2: - Do not propagate if the bit-size changes --- src/intel/compiler/brw_fs_cmod_propagation.cpp | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/src/intel/compiler/brw_fs_cmod_propagation.cpp b/src/intel/compiler/brw_fs_cmod_propagation.cpp index

[Mesa-dev] [PATCH v3 20/42] intel/compiler: allow half-float on 3-source instructions since gen8

2019-01-15 Thread Iago Toral Quiroga
Reviewed-by: Topi Pohjolainen --- src/intel/compiler/brw_eu_emit.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c index e21df4624b3..a785f96b650 100644 --- a/src/intel/compiler/brw_eu_emit.c +++

[Mesa-dev] [PATCH v3 26/42] intel/compiler: split is_partial_write() into two variants

2019-01-15 Thread Iago Toral Quiroga
This function is used in two different scenarios that for 32-bit instructions are the same, but for 16-bit instructions are not. One scenario is that in which we are working at a SIMD8 register level and we need to know if a register is fully defined or written. This is useful, for example, in

[Mesa-dev] [PATCH v3 12/42] compiler/nir: add lowering for 16-bit flrp

2019-01-15 Thread Iago Toral Quiroga
Reviewed-by: Jason Ekstrand --- src/compiler/nir/nir.h| 1 + src/compiler/nir/nir_opt_algebraic.py | 1 + 2 files changed, 2 insertions(+) diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h index 19056e79206..adcc8e36cc9 100644 --- a/src/compiler/nir/nir.h +++

[Mesa-dev] [PATCH v3 07/42] intel/compiler: lower 16-bit extended math to 32-bit prior to gen9

2019-01-15 Thread Iago Toral Quiroga
Extended math doesn't support half-float on these generations. Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_nir.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c index

[Mesa-dev] [PATCH v3 10/42] compiler/nir: add lowering option for 16-bit fmod

2019-01-15 Thread Iago Toral Quiroga
Reviewed-by: Jason Ekstrand --- src/compiler/nir/nir.h| 1 + src/compiler/nir/nir_opt_algebraic.py | 1 + 2 files changed, 2 insertions(+) diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h index 3cb2d166cb3..19056e79206 100644 --- a/src/compiler/nir/nir.h +++

[Mesa-dev] [PATCH v3 34/42] anv/pipeline: support Float16 and Int8 capabilities in gen8+

2019-01-15 Thread Iago Toral Quiroga
v2: - Merge Float16 and Int8 in a single patch (Jason) Reviewed-by: Jason Ekstrand (v1) --- src/intel/vulkan/anv_pipeline.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c index 899160746d4..663d1c77fa5 100644 ---

[Mesa-dev] [PATCH v3 40/42] intel/compiler: support half-float in the combine constants pass

2019-01-15 Thread Iago Toral Quiroga
Reviewed-by: Topi Pohjolainen --- .../compiler/brw_fs_combine_constants.cpp | 60 +++ 1 file changed, 49 insertions(+), 11 deletions(-) diff --git a/src/intel/compiler/brw_fs_combine_constants.cpp b/src/intel/compiler/brw_fs_combine_constants.cpp index

[Mesa-dev] [PATCH v3 13/42] intel/compiler: lower 16-bit flrp

2019-01-15 Thread Iago Toral Quiroga
Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_compiler.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/intel/compiler/brw_compiler.c b/src/intel/compiler/brw_compiler.c index f885e79c3e6..04a1a7cac4e 100644 --- a/src/intel/compiler/brw_compiler.c +++

[Mesa-dev] [PATCH v3 29/42] intel/compiler: handle conversions between int and half-float on atom

2019-01-15 Thread Iago Toral Quiroga
Reviewed-by: Topi Pohjolainen --- src/intel/compiler/brw_fs_nir.cpp | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp index e454578d99b..a739562c3ab 100644 ---

[Mesa-dev] [PATCH v3 30/42] intel/compiler: implement isign for int8

2019-01-15 Thread Iago Toral Quiroga
Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_fs_nir.cpp | 25 + 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp index a739562c3ab..a3d193b8a44 100644 ---

[Mesa-dev] [PATCH v3 36/42] intel/compiler: implement is_zero, is_one, is_negative_one for 8-bit/16-bit

2019-01-15 Thread Iago Toral Quiroga
There are no 8-bit immediates, so assert in that case. 16-bit immediates are replicated in each word of a 32-bit immediate, so we only need to check the lower 16-bits. v2: - Fix is_zero with half-float to consider -0 as well (Jason). - Fix is_negative_one for word type. ---

[Mesa-dev] [PATCH v3 31/42] intel/compiler: ask for an integer type if requesting an 8-bit type

2019-01-15 Thread Iago Toral Quiroga
--- src/intel/compiler/brw_fs_nir.cpp | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp index a3d193b8a44..ccf1891b925 100644 --- a/src/intel/compiler/brw_fs_nir.cpp +++

[Mesa-dev] [PATCH v3 06/42] intel/compiler: lower some 16-bit float operations to 32-bit

2019-01-15 Thread Iago Toral Quiroga
The hardware doesn't support half-float for these. Reviewed-by: Topi Pohjolainen Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_nir.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c index 572ab824a94..f0fe7f870c2

[Mesa-dev] [PATCH v3 23/42] intel/compiler: fix ddx and ddy for 16-bit float

2019-01-15 Thread Iago Toral Quiroga
We were assuming 32-bit elements. Also, In SIMD8 we pack 2 vector components in a single SIMD register, so for example, component Y of a 16-bit vec2 starts is at byte offset 16B. This means that when we compute the offset of the elements to be differentiated we should not stomp whatever base

[Mesa-dev] [PATCH v3 37/42] intel/compiler: add a brw_reg_type_is_integer helper

2019-01-15 Thread Iago Toral Quiroga
Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_reg_type.h | 18 ++ 1 file changed, 18 insertions(+) diff --git a/src/intel/compiler/brw_reg_type.h b/src/intel/compiler/brw_reg_type.h index ffbec90d3fe..a3365b7e34c 100644 --- a/src/intel/compiler/brw_reg_type.h +++

[Mesa-dev] [PATCH v3 25/42] intel/compiler: workaround for SIMD8 half-float MAD in gen8

2019-01-15 Thread Iago Toral Quiroga
Broadwell hardware has a bug that manifests in SIMD8 executions of 16-bit MAD instructions when any of the sources is a Y or W component. We pack these components in the same SIMD register as components X and Z respectively, but starting at offset 16B (so they live in the second half of the

[Mesa-dev] [PATCH v3 33/42] compiler/spirv: add support for Float16 and Int8 capabilities

2019-01-15 Thread Iago Toral Quiroga
v2: - Merge Float16 and Int8 capabilities into a single patch (Jason) Reviewed-by: Jason Ekstrand (v1) --- src/compiler/shader_info.h| 2 ++ src/compiler/spirv/spirv_to_nir.c | 8 ++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/src/compiler/shader_info.h

[Mesa-dev] [PATCH v3 14/42] compiler/nir: add lowering for 16-bit ldexp

2019-01-15 Thread Iago Toral Quiroga
v2 (Topi): - Make bit-size handling order be 16-bit, 32-bit, 64-bit - Clamp lower exponent range at -28 instead of -30. Reviewed-by: Topi Pohjolainen Reviewed-by: Jason Ekstrand --- src/compiler/nir/nir_opt_algebraic.py | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff

[Mesa-dev] [PATCH v3 42/42] intel/compiler: allow propagating HF immediates to MAD/LRP

2019-01-15 Thread Iago Toral Quiroga
Even if we don't do 3-src algebraic optimizations for MAD and LRP in the backend any more, the combine constants pass can still do a fine job putting grouping these constants into single registers for better register pressure. v2: - updated comment to reference register pressure benefits rather

[Mesa-dev] [PATCH v3 18/42] intel/compiler: add a helper function to query hardware type table

2019-01-15 Thread Iago Toral Quiroga
We open coded this in a couple of places, so a helper function is probably sensible. Plus it makes it more consistent with the 3src hardware type case. Suggested-by: Topi Pohjolainen --- src/intel/compiler/brw_reg_type.c | 34 --- 1 file changed, 18 insertions(+), 16

[Mesa-dev] [PATCH v3 05/42] intel/compiler: assert restrictions on conversions to half-float

2019-01-15 Thread Iago Toral Quiroga
There are some hardware restrictions that brw_nir_lower_conversions should have taken care of before we get here. v2: - rebased on top of regioning lowering pass Reviewed-by: Topi Pohjolainen (v1) --- src/intel/compiler/brw_fs_nir.cpp | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)

[Mesa-dev] [PATCH v3 11/42] intel/compiler: lower 16-bit fmod

2019-01-15 Thread Iago Toral Quiroga
Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_compiler.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/intel/compiler/brw_compiler.c b/src/intel/compiler/brw_compiler.c index fe632c5badc..f885e79c3e6 100644 --- a/src/intel/compiler/brw_compiler.c +++

[Mesa-dev] [PATCH v3 04/42] intel/compiler: handle b2i/b2f with other integer conversion opcodes

2019-01-15 Thread Iago Toral Quiroga
Since we handle booleans as integers this makes more sense. v2: - rebased to incorporate new boolean conversion opcodes v3: - rebased on top regioning lowering pass Reviewed-by: Jason Ekstrand (v1) Reviewed-by: Topi Pohjolainen (v2) --- src/intel/compiler/brw_fs_nir.cpp | 16

[Mesa-dev] [PATCH v3 15/42] intel/compiler: Extended Math is limited to SIMD8 on half-float

2019-01-15 Thread Iago Toral Quiroga
From the Skylake PRM, Extended Math Function: "The execution size must be no more than 8 when half-floats are used in source or destination operand." Earlier generations do not support Extended Math with half-float. v2: - Rewrite the code to make it more readable (Jason). v3: - Use

[Mesa-dev] [PATCH v3 08/42] intel/compiler: implement 16-bit fsign

2019-01-15 Thread Iago Toral Quiroga
v2: - make 16-bit be its own separate case (Jason) Reviewed-by: Topi Pohjolainen --- src/intel/compiler/brw_fs_nir.cpp | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp index

[Mesa-dev] [PATCH v3 22/42] intel/compiler: don't propagate HF immediates to 3-src instructions

2019-01-15 Thread Iago Toral Quiroga
3-src instructions don't support immediates, but since 36bc5f06dd22, we allow them on MAD and LRP relying on the combine constants pass to fix it up later. However, that pass is specialized for 32-bit float immediates and can't handle HF constants at present, so this patch ensures that

[Mesa-dev] [PATCH v3 24/42] intel/compiler: fix ddy for half-float in gen8

2019-01-15 Thread Iago Toral Quiroga
We use ALign16 mode for this, since it is more convenient, but the PRM for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register region restrictions', Section '1. Special Restrictions': "In Align16 mode, the channel selects and channel enables apply to a pair of half-floats, because

[Mesa-dev] [PATCH v3 00/42] intel: VK_KHR_shader_float16_int8 implementation

2019-01-15 Thread Iago Toral Quiroga
. This version of the series also dropped the SPIR-V compiler patches that have already been merged. As always, a branch for with these patches is available for testing in the itoral/VK_KHR_shader_float16_int8 branch of the Igalia Mesa repository at https://github.com/Igalia/mesa. Iago Toral Quiroga (42

[Mesa-dev] [PATCH v3 01/42] intel/compiler: handle conversions between int and half-float on atom

2019-01-15 Thread Iago Toral Quiroga
v2: adapted to work with the new regioning lowering pass Reviewed-by: Topi Pohjolainen (v1) --- src/intel/compiler/brw_ir_fs.h | 33 ++--- 1 file changed, 26 insertions(+), 7 deletions(-) diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h

[Mesa-dev] [PATCH v3 09/42] intel/compiler: allow extended math functions with HF operands

2019-01-15 Thread Iago Toral Quiroga
The PRM states that half-float operands are supported since gen9. Reviewed-by: Topi Pohjolainen --- src/intel/compiler/brw_eu_emit.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c index

[Mesa-dev] [PATCH v3 02/42] intel/compiler: add a NIR pass to lower conversions

2019-01-15 Thread Iago Toral Quiroga
Some conversions are not directly supported in hardware and need to be split in two conversion instructions going through an intermediary type. Doing this at the NIR level simplifies a bit the complexity in the backend. v2: - Consider fp16 rounding conversion opcodes - Properly handle swizzles

[Mesa-dev] [PATCH v4] anv/device: fix maximum number of images supported

2019-01-14 Thread Iago Toral Quiroga
We had defined MAX_IMAGES as 8, which we used to size the array for image push constant data. The comment there stated that this was for gen8, but anv_nir_apply_pipeline_layout runs for all gens and writes that array, asserting that we don't exceed that number of images, which imposes a limit of

  1   2   3   4   5   6   7   8   9   10   >