-by: Iago Toral Quiroga
---
src/intel/vulkan/anv_pipeline.c | 56 -
1 file changed, 42 insertions(+), 14 deletions(-)
diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 20eab548fb2..f15f0896266 100644
--- a/src/intel/vulkan
-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.*
Signed-off-by: Samuel Iglesias Gonsálvez
Signed-off-by: Iago Toral Quiroga
---
src/intel/vulkan/anv_pipeline.c | 48 +
1 file changed, 37 insertions(+), 11 deletions(-)
diff --git a/src/intel/vulkan/anv_pipeline.c b
a bit (Iago)
Fixes the following CTS tests:
dEQP-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.*
Signed-off-by: Samuel Iglesias Gonsálvez
Signed-off-by: Iago Toral Quiroga
---
src/intel/vulkan/anv_pipeline.c | 25 ++---
1 file changed, 18 insertions(+),
Specifically, vkCmdCopyQueryPoolResults is required to see the effect
of a previous vkCmdResetQueryPool. This may currently fail when
query execution is still ongoing, as some of the queries may become
available asynchronously after the reset.
Fixes new CTS tests:
Now that we propagate constants to the first source of 2src instructions, we
see more opportunities for constant folding in the backend.
v2:
- The hardware only uses 5 bits (or 6 bits for Q/UQ) from the shift
count parameter in SHL/SHR instructions, so do the same in constant
propagation
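Here is a minimal sketch of constant folding that honors the 5-bit shift count (6-bit for Q/UQ). The helper names are hypothetical, and this is not the code in brw_fs_copy_propagation.cpp; it only illustrates the masking. Masking in the folding code also matters in C itself, because shifting a 32-bit value by 32 or more is undefined behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical folding helpers: only the low bits of the shift count are
 * used, mirroring the hardware behaviour described above. */
static uint32_t fold_shl_32(uint32_t value, uint32_t count)
{
   return value << (count & 0x1f);   /* 5 bits: counts 0..31 */
}

static uint32_t fold_shr_32(uint32_t value, uint32_t count)
{
   return value >> (count & 0x1f);   /* 5 bits: counts 0..31 */
}

static uint64_t fold_shl_64(uint64_t value, uint32_t count)
{
   return value << (count & 0x3f);   /* 6 bits for Q/UQ: counts 0..63 */
}
```

For example, fold_shl_32(1, 33) folds to 2, which matches a hardware shift by 1. It does not fold to 0, as a naive "shift everything out" interpretation would suggest.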
And let combine constants promote the constants if needed.
---
src/intel/compiler/brw_fs_combine_constants.cpp | 2 ++
src/intel/compiler/brw_fs_copy_propagation.cpp | 4
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/src/intel/compiler/brw_fs_combine_constants.cpp
Even if it is not supported by the hardware, we will fix it up
in the combine constants pass.
v2:
- This will enable new constant folding opportunities in the algebraic pass
for MUL or ADD with types other than F, so do not assert on that type.
For now we just skip anything that is not
to patch 2 to Jenkins and verified that
it came back green.
Iago Toral Quiroga (3):
intel/compiler: allow constant propagation for int quotient and
remainder
intel/compiler: allow constant propagation to first source of 2src
instructions
intel/compiler: implement more algebraic optimizations
Even if it is not supported by the hardware, we will fix it up
in the combine constants pass.
---
.../compiler/brw_fs_combine_constants.cpp | 37 ++---
.../compiler/brw_fs_copy_propagation.cpp | 55 +--
2 files changed, 56 insertions(+), 36 deletions(-)
diff
Now that we propagate constants to the first source of 2src instructions, we
see more opportunities for constant folding in the backend.
Shader-db results on KBL:
total instructions in shared programs: 14965607 -> 14855983 (-0.73%)
instructions in affected programs: 3988102 -> 3878478 (-2.75%)
results, so that part has been
removed.
Iago
Iago Toral Quiroga (3):
intel/compiler: allow constant propagation for int quotient and
remainder
intel/compiler: allow constant propagation to first source of 2-src
instructions
intel/compiler: implement more algebraic optimizations
The section 'Execution Data Types' of 3D Media GPGPU volume, which
describes execution types, is exactly the same in BDW and SKL+.
Also, this section states that there is a single execution type, so it
makes sense that this is the wider of the two floating point types
involved in mixed float
Some conversions are not directly supported in hardware and need to be
split into two conversion instructions going through an intermediate type.
Doing this at the NIR level somewhat reduces the complexity of the backend.
v2:
- Consider fp16 rounding conversion opcodes
- Properly handle swizzles
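The splitting idea can be sketched as a predicate plus a choice of intermediate type. This is a hypothetical model, not the brw_nir_lower_conversions implementation. It makes two assumptions:

- The unsupported pairs are conversions between a 64-bit type and either half-float or an 8-bit type, matching the 64-bit and 8-bit cases mentioned elsewhere in this series.
- The intermediate type is either a 32-bit float or a 32-bit integer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical numeric type description: base kind plus bit size. */
enum base_kind { KIND_INT, KIND_UINT, KIND_FLOAT };

struct num_type {
   enum base_kind kind;
   unsigned bit_size;   /* 8, 16, 32 or 64 */
};

static bool is_half_float(struct num_type t)
{
   return t.kind == KIND_FLOAT && t.bit_size == 16;
}

/* Assumed rule: no direct conversion between a 64-bit type and an 8-bit
 * type or half-float. */
static bool needs_split(struct num_type src, struct num_type dst)
{
   bool src_small = src.bit_size == 8 || is_half_float(src);
   bool dst_small = dst.bit_size == 8 || is_half_float(dst);
   return (src.bit_size == 64 && dst_small) || (dst.bit_size == 64 && src_small);
}

/* Choose the 32-bit type to go through when a split is required. */
static struct num_type split_intermediate(struct num_type src, struct num_type dst)
{
   assert(needs_split(src, dst));
   struct num_type mid = { KIND_FLOAT, 32 };
   if (is_half_float(src) || is_half_float(dst))
      return mid;   /* half-float <-> 64-bit: go through 32-bit float */

   /* 8-bit <-> 64-bit: go through a 32-bit integer, taking the signedness
    * from whichever side is an integer (the destination if both are). */
   mid.kind = (dst.kind == KIND_FLOAT) ? src.kind : dst.kind;
   return mid;
}
```

For instance, under these assumptions a double to int8 conversion becomes double to int32, followed by int32 to int8.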
---
src/intel/compiler/brw_eu_validate.c| 64 -
src/intel/compiler/test_eu_validate.cpp | 122
2 files changed, 185 insertions(+), 1 deletion(-)
diff --git a/src/intel/compiler/brw_eu_validate.c
b/src/intel/compiler/brw_eu_validate.c
index
Going forward, having these split is a bit more convenient since these two
groups have different restrictions.
v2:
- Rebased on top of new regioning lowering pass.
Reviewed-by: Topi Pohjolainen (v1)
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_fs_nir.cpp | 7 +++
1 file changed,
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_fs_nir.cpp | 25 +
1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/src/intel/compiler/brw_fs_nir.cpp
b/src/intel/compiler/brw_fs_nir.cpp
index 3a6e4a2eb60..40c0481ac53 100644
---
We are now using these bits, so don't assert that they are not set. In gen8,
if these bits are set, compaction is not possible. On gen9 and CHV platforms,
set_3src_control_index() checks these bits (and others) against a table to
validate whether the particular bit combination is eligible for compaction
Source0 and Destination extract the floating-point precision automatically
from the SrcType and DstType instruction fields respectively when they are
set to types :F or :HF. For Source1 and Source2 operands, we use the new
1-bit fields Src1Type and Src2Type, where 0 means normal precision and 1
v2:
- make 16-bit be its own separate case (Jason)
v3:
- Drop the result_int temporary (Jason)
Reviewed-by: Topi Pohjolainen (v1)
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_fs_nir.cpp | 17 -
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git
The original SrcType is a 3-bit field that takes a subset of the types
supported by the hardware for 3-source instructions. Since gen8,
when the half-float type was added, 3-source floating point operations
can use mixed precision mode, where not all the operands have the
same floating-point
This is set to True only for numeric conversion opcodes.
---
src/compiler/nir/nir.h| 3 ++
src/compiler/nir/nir_opcodes.py | 73 +--
src/compiler/nir/nir_opcodes_c.py | 1 +
3 files changed, 44 insertions(+), 33 deletions(-)
diff --git
---
src/intel/compiler/brw_eu_validate.c| 256 ++
src/intel/compiler/test_eu_validate.cpp | 618
2 files changed, 874 insertions(+)
diff --git a/src/intel/compiler/brw_eu_validate.c
b/src/intel/compiler/brw_eu_validate.c
index ed9c8fe59dd..a61d4c46e81 100644
This function is used in two different scenarios that for 32-bit
instructions are the same, but for 16-bit instructions are not.
One scenario is that in which we are working at a SIMD8 register
level and we need to know if a register is fully defined or written.
This is useful, for example, in
for testing in the
itoral/VK_KHR_shader_float16_int8 branch of the Igalia Mesa repository at
https://github.com/Igalia/mesa.
Iago Toral Quiroga (40):
compiler/nir: add an is_conversion field to nir_op_info
intel/compiler: add a NIR pass to lower conversions
intel/compiler: split float to 64-bit
This is available since gen8.
v2: restore previously existing assertion.
v3: don't use separate tables for gen7 and gen8, just assert that we
don't use half-float before gen8 (Matt)
Reviewed-by: Topi Pohjolainen (v1)
---
src/intel/compiler/brw_reg_type.c | 4
1 file changed, 4
And enable it on Intel.
v2:
- Squash the change to enable this lowering on Intel (Jason)
Reviewed-by: Jason Ekstrand
---
src/compiler/nir/nir.h| 1 +
src/compiler/nir/nir_opt_algebraic.py | 1 +
src/intel/compiler/brw_compiler.c | 1 +
3 files changed, 3 insertions(+)
Since we handle booleans as integers this makes more sense.
v2:
- rebased to incorporate new boolean conversion opcodes
v3:
- rebased on top of the regioning lowering pass
Reviewed-by: Jason Ekstrand (v1)
Reviewed-by: Topi Pohjolainen (v2)
---
src/intel/compiler/brw_fs_nir.cpp | 16
Mixed float instructions are those that use both F and HF operands as their
sources or destination, except for regular conversions.
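The definition above can be expressed as a simple predicate. The enum and function are hypothetical. Conversions are excluded through a flag, because the text treats regular conversions as a separate case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of register types relevant to this check. */
enum reg_type { REG_TYPE_F, REG_TYPE_HF, REG_TYPE_D, REG_TYPE_UD };

/* Mixed float: both F and HF appear among the destination and sources,
 * and the instruction is not a regular conversion (e.g. MOV HF -> F). */
static bool is_mixed_float(bool is_conversion, enum reg_type dst,
                           const enum reg_type *srcs, size_t num_srcs)
{
   if (is_conversion)
      return false;

   bool has_f = dst == REG_TYPE_F;
   bool has_hf = dst == REG_TYPE_HF;
   for (size_t i = 0; i < num_srcs; i++) {
      has_f |= srcs[i] == REG_TYPE_F;
      has_hf |= srcs[i] == REG_TYPE_HF;
   }
   return has_f && has_hf;
}
```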
There are specific rules for mixed float operation mode with its own set
of restrictions, which involve rules that are incompatible with general
restrictions. For
v2 (Topi):
- Make bit-size handling order be 16-bit, 32-bit, 64-bit
- Clamp lower exponent range at -28 instead of -30.
Reviewed-by: Topi Pohjolainen
Reviewed-by: Jason Ekstrand
---
src/compiler/nir/nir_opt_algebraic.py | 9 +++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff
Now that we have the regioning lowering pass we can just put all of these
opcodes together in a single block and we can just assert on the few cases
of conversion instructions that are not supported in hardware and that should
be lowered in brw_nir_lower_conversions.
The only cases that we still
v2:
- Merge Float16 and Int8 capabilities into a single patch (Jason)
- Merged patch that enabled SPIR-V front-end checks for these caps
(except for Int8, which was already merged)
Reviewed-by: Jason Ekstrand (v1)
---
src/compiler/shader_info.h| 1 +
There are no 8-bit immediates, so assert in that case.
16-bit immediates are replicated in each word of a 32-bit immediate, so
we only need to check the lower 16-bits.
v2:
- Fix is_zero with half-float to consider -0 as well (Jason).
- Fix is_negative_one for word type.
Reviewed-by: Jason
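Because the 16-bit value is replicated in both words, a check only needs to examine the low word. Below is a minimal sketch with hypothetical helper names; it is not the backend's actual is_zero()/is_negative_one() code. It assumes IEEE 754 binary16 encodings (+0.0 is 0x0000, -0.0 is 0x8000, -1.0 is 0xbc00) and that -1 in a signed word is 0xffff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The 16-bit immediate is replicated in both words of the 32-bit field,
 * so the low word carries the whole value. */
static uint16_t imm_low16(uint32_t imm32)
{
   return (uint16_t)(imm32 & 0xffffu);
}

/* Half-float zero: ignore the sign bit so that -0.0 (0x8000) also counts. */
static bool hf_imm_is_zero(uint32_t imm32)
{
   return (imm_low16(imm32) & 0x7fffu) == 0;
}

/* -1.0 in binary16: sign 1, biased exponent 15, mantissa 0 -> 0xbc00. */
static bool hf_imm_is_negative_one(uint32_t imm32)
{
   return imm_low16(imm32) == 0xbc00u;
}

/* -1 for a signed 16-bit integer (word) type is all ones in the low word. */
static bool w_imm_is_negative_one(uint32_t imm32)
{
   return imm_low16(imm32) == 0xffffu;
}
```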
v2:
- Fixed typo: meant BRW_REGISTER_TYPE_UB instead of BRW_REGISTER_TYPE_UV
Reviewed-by: Jason Ekstrand (v1)
---
src/intel/compiler/brw_reg_type.h | 18 ++
1 file changed, 18 insertions(+)
diff --git a/src/intel/compiler/brw_reg_type.h
b/src/intel/compiler/brw_reg_type.h
index
Broadwell has restrictions that apply to Align16 half-float that
make the Align16 implementation of this invalid for this platform.
Use the gen11 path for this instead, which uses Align1 mode.
The restriction is not present in cherryview, gen9 or gen10, where
the Align16 implementation seems to
At the very least we need it to handle HF too, since we are doing
constant propagation for MAD and LRP, which relies on this pass
to promote the immediates to GRF in the end, but ideally
we want it to support even more types so we can take advantage
of it to improve register pressure in some
The hardware only allows a stride of 1 on a Byte destination for raw
byte MOV instructions. This is required even when the destination
is the NULL register.
Rather than making sure that we emit a proper NULL:B destination
every time we need one, just fix it at emission time.
Reviewed-by: Jason
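Fixing the destination at emission time amounts to a small rewrite of the region before encoding. Below is a sketch using a hypothetical, simplified register struct. It assumes that the legal alternative to a stride of 1 is a stride of 2, which the text above does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified destination description. */
struct dst_reg {
   bool is_null;        /* destination is the NULL architecture register */
   unsigned type_size;  /* element size in bytes */
   unsigned hstride;    /* horizontal stride in elements */
};

/* Widen the stride of a byte-typed NULL destination, unless the
 * instruction is a raw byte MOV (the only case where stride 1 is allowed). */
static void fixup_null_byte_dst(struct dst_reg *dst, bool is_raw_byte_mov)
{
   assert(dst != NULL);
   if (dst->is_null && dst->type_size == 1 && dst->hstride == 1 &&
       !is_raw_byte_mov)
      dst->hstride = 2;   /* assumed legal stride for non-MOV byte dst */
}
```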
v2:
- Assign BRW_REGISTER_TYPE_B directly for 8-bit (Jason)
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_fs_nir.cpp | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/intel/compiler/brw_fs_nir.cpp
b/src/intel/compiler/brw_fs_nir.cpp
index
v2:
- Do not propagate if the bit-size changes
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_fs_cmod_propagation.cpp | 14 +-
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/src/intel/compiler/brw_fs_cmod_propagation.cpp
---
src/intel/compiler/brw_eu_validate.c| 10 +-
src/intel/compiler/test_eu_validate.cpp | 46 +
2 files changed, 55 insertions(+), 1 deletion(-)
diff --git a/src/intel/compiler/brw_eu_validate.c
b/src/intel/compiler/brw_eu_validate.c
index
We were assuming 32-bit elements. Also, in SIMD8 we pack 2 vector components
in a single SIMD register, so for example, component Y of a 16-bit vec2
starts at byte offset 16B. This means that when we compute the offset of
the elements to be differentiated we should not stomp whatever base
So it is right after the checks for the other various Int* capabilities.
---
src/compiler/spirv/spirv_to_nir.c | 7 +++
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/src/compiler/spirv/spirv_to_nir.c
b/src/compiler/spirv/spirv_to_nir.c
index 1cbc926c818..7e07de2bfc0 100644
And enable it on Intel.
v2:
- Squash the change to enable it on Intel (Jason)
Reviewed-by: Jason Ekstrand
---
src/compiler/nir/nir.h| 1 +
src/compiler/nir/nir_opt_algebraic.py | 1 +
src/intel/compiler/brw_compiler.c | 1 +
3 files changed, 3 insertions(+)
diff --git
v2 (Jason):
- Merge shaderFloat16 and shaderInt8 enablement into a single patch.
- Merge extension enable.
Reviewed-by: Jason Ekstrand (v1)
---
src/intel/vulkan/anv_device.c | 9 +
src/intel/vulkan/anv_extensions.py | 1 +
2 files changed, 10 insertions(+)
diff --git
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_fs_nir.cpp | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/src/intel/compiler/brw_fs_nir.cpp
b/src/intel/compiler/brw_fs_nir.cpp
index 64e24f86b5a..f59e9ad4e2b 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++
Extended math with half-float operands is only supported since gen9,
but it is limited to SIMD8. In gen8 we lower it to 32-bit.
v2: squashed together the following patches (Jason):
- intel/compiler: allow extended math functions with HF operands
- intel/compiler: lower 16-bit extended math to
Particularly, we need the same lowerings we use for 16-bit
integers.
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_nir.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index 9b26a6c3d6f..1d62f2adde8
NIR already has these and correctly considers exact/inexact qualification,
whereas the backend doesn't and can apply the optimizations where it
shouldn't. This happened to be the case in a handful of Tomb Raider shaders,
where NIR would skip the optimizations because of a precise qualification
but
Reviewed-by: Topi Pohjolainen
Reviewed-by: Jason Ekstrand
Reviewed-by: Matt Turner
---
src/intel/compiler/brw_eu_emit.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index 2f31d9591fc..30037e71b00
It is very likely that this optimization is never useful and we'll probably
just end up removing it, so let's not bother adding more cases to it for
now.
---
src/intel/compiler/brw_fs.cpp | 4
1 file changed, 4 insertions(+)
diff --git a/src/intel/compiler/brw_fs.cpp
Empirical testing shows that gen8 has a bug where MAD instructions with
a half-float source starting at a non-zero offset fail to execute
properly.
This scenario usually happened in SIMD8 executions, where we used to
pack vector components Y and W in the second half of SIMD registers
(therefore,
The hardware doesn't support half-float for these.
Reviewed-by: Topi Pohjolainen
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_nir.c | 5 +
1 file changed, 5 insertions(+)
diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index 7e3dbc9e447..75513e5113c
There are some hardware restrictions that brw_nir_lower_conversions should
have taken care of before we get here.
v2:
- rebased on top of regioning lowering pass
Reviewed-by: Topi Pohjolainen (v1)
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_fs_nir.cpp | 5 +++--
1 file changed, 3
Commit c84ec70b3a72 implemented execution type promotion to 32-bit for
conversions involving half-float registers, which empirical testing suggested
was required, but it did not incorporate this change into the assembly validator
logic. This commit adds that, preventing validation errors like
These are not directly supported in hardware and brw_nir_lower_conversions
should have taken care of that before we get here. Also, while we are
at it, make sure 64-bit integer to 8-bit are also properly split by
the same lowering pass, since they have the same hardware restrictions.
---
v2 (Jason):
- Merge Float16 and Int8 into a single patch.
- Merge extension enable.
Reviewed-by: Jason Ekstrand (v1)
---
src/intel/vulkan/anv_device.c | 9 +
src/intel/vulkan/anv_extensions.py | 1 +
2 files changed, 10 insertions(+)
diff --git a/src/intel/vulkan/anv_device.c
Going forward, having these split is a bit more convenient since these two
groups have different restrictions.
v2:
- Rebased on top of new regioning lowering pass.
Reviewed-by: Topi Pohjolainen (v1)
---
src/intel/compiler/brw_fs_nir.cpp | 7 +++
1 file changed, 7 insertions(+)
diff --git
Particularly, we need the same lowerings we use for 16-bit
integers.
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_nir.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index 3b2909da33e..2dfbf8824dc
NIR already has these so they are redundant. A run of shader-db confirms
that the only cases where these backend optimizations are activated
are some Tomb Raider shaders where the affected variables are qualified
as "precise", which is why NIR won't apply them and why the backend
shouldn't either
This is available since gen8.
v2: restore previously existing assertion.
Reviewed-by: Topi Pohjolainen (v1)
---
src/intel/compiler/brw_reg_type.c | 36 +++
1 file changed, 32 insertions(+), 4 deletions(-)
diff --git a/src/intel/compiler/brw_reg_type.c
We are now using these bits, so don't assert that they are not set, just
avoid compaction in that case.
Reviewed-by: Topi Pohjolainen
---
src/intel/compiler/brw_eu_compact.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/intel/compiler/brw_eu_compact.c
There is a hardware restriction where <0,1,0>:HF in Align16 doesn't replicate
a single 16-bit channel, but instead it replicates a full 32-bit channel.
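Here is a simplified model of the restriction. It assumes the register's first dword holds two 16-bit lanes, with the low half first. The second helper shows one possible workaround: writing the constant into both halves of the dword, so that replicating the 32-bit channel yields the same half-float in every lane. That workaround is an assumption of this sketch, not necessarily what the patch does. Both helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of <0,1,0>:HF in Align16 as described above: instead of repeating
 * reg[0] in every 16-bit lane, the whole first dword {reg[0], reg[1]} is
 * repeated, so the lanes alternate between the two halves. */
static void align16_scalar_hf_read(const uint16_t *reg, uint16_t *out, unsigned n)
{
   for (unsigned i = 0; i < n; i++)
      out[i] = reg[i % 2];
}

/* Assumed workaround: store the half-float constant in both halves of the
 * dword so that the replicated 32-bit channel holds the same value twice. */
static void store_hf_constant(uint16_t *reg, uint16_t hf)
{
   assert(reg != 0);
   reg[0] = hf;
   reg[1] = hf;
}
```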
---
.../compiler/brw_fs_combine_constants.cpp | 24 +--
1 file changed, 22 insertions(+), 2 deletions(-)
diff --git
v2:
- Do not propagate if the bit-size changes
---
src/intel/compiler/brw_fs_cmod_propagation.cpp | 14 +-
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/src/intel/compiler/brw_fs_cmod_propagation.cpp
b/src/intel/compiler/brw_fs_cmod_propagation.cpp
index
Reviewed-by: Topi Pohjolainen
---
src/intel/compiler/brw_eu_emit.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index e21df4624b3..a785f96b650 100644
--- a/src/intel/compiler/brw_eu_emit.c
+++
Reviewed-by: Jason Ekstrand
---
src/compiler/nir/nir.h| 1 +
src/compiler/nir/nir_opt_algebraic.py | 1 +
2 files changed, 2 insertions(+)
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 19056e79206..adcc8e36cc9 100644
--- a/src/compiler/nir/nir.h
+++
Extended math doesn't support half-float on these generations.
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_nir.c | 13 -
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index
Reviewed-by: Jason Ekstrand
---
src/compiler/nir/nir.h| 1 +
src/compiler/nir/nir_opt_algebraic.py | 1 +
2 files changed, 2 insertions(+)
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 3cb2d166cb3..19056e79206 100644
--- a/src/compiler/nir/nir.h
+++
v2:
- Merge Float16 and Int8 in a single patch (Jason)
Reviewed-by: Jason Ekstrand (v1)
---
src/intel/vulkan/anv_pipeline.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 899160746d4..663d1c77fa5 100644
---
Reviewed-by: Topi Pohjolainen
---
.../compiler/brw_fs_combine_constants.cpp | 60 +++
1 file changed, 49 insertions(+), 11 deletions(-)
diff --git a/src/intel/compiler/brw_fs_combine_constants.cpp
b/src/intel/compiler/brw_fs_combine_constants.cpp
index
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_compiler.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/intel/compiler/brw_compiler.c
b/src/intel/compiler/brw_compiler.c
index f885e79c3e6..04a1a7cac4e 100644
--- a/src/intel/compiler/brw_compiler.c
+++
Reviewed-by: Topi Pohjolainen
---
src/intel/compiler/brw_fs_nir.cpp | 13 +
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/src/intel/compiler/brw_fs_nir.cpp
b/src/intel/compiler/brw_fs_nir.cpp
index e454578d99b..a739562c3ab 100644
---
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_fs_nir.cpp | 25 +
1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/src/intel/compiler/brw_fs_nir.cpp
b/src/intel/compiler/brw_fs_nir.cpp
index a739562c3ab..a3d193b8a44 100644
---
src/intel/compiler/brw_fs_nir.cpp | 9 +++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/src/intel/compiler/brw_fs_nir.cpp
b/src/intel/compiler/brw_fs_nir.cpp
index a3d193b8a44..ccf1891b925 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++
The hardware doesn't support half-float for these.
Reviewed-by: Topi Pohjolainen
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_nir.c | 5 +
1 file changed, 5 insertions(+)
diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index 572ab824a94..f0fe7f870c2
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_reg_type.h | 18 ++
1 file changed, 18 insertions(+)
diff --git a/src/intel/compiler/brw_reg_type.h
b/src/intel/compiler/brw_reg_type.h
index ffbec90d3fe..a3365b7e34c 100644
--- a/src/intel/compiler/brw_reg_type.h
+++
Broadwell hardware has a bug that manifests in SIMD8 executions of
16-bit MAD instructions when any of the sources is a Y or W component.
We pack these components in the same SIMD register as components X and
Z respectively, but starting at offset 16B (so they live in the second
half of the
v2:
- Merge Float16 and Int8 capabilities into a single patch (Jason)
Reviewed-by: Jason Ekstrand (v1)
---
src/compiler/shader_info.h| 2 ++
src/compiler/spirv/spirv_to_nir.c | 8 ++--
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/compiler/shader_info.h
Even if we don't do 3-src algebraic optimizations for MAD and LRP in
the backend any more, the combine constants pass can still do a fine
job grouping these constants into single registers for better
register pressure.
v2:
- updated comment to reference register pressure benefits rather
We open coded this in a couple of places, so a helper function is probably
sensible. Plus it makes it more consistent with the 3src hardware type case.
Suggested-by: Topi Pohjolainen
---
src/intel/compiler/brw_reg_type.c | 34 ---
1 file changed, 18 insertions(+), 16
There are some hardware restrictions that brw_nir_lower_conversions should
have taken care of before we get here.
v2:
- rebased on top of regioning lowering pass
Reviewed-by: Topi Pohjolainen (v1)
---
src/intel/compiler/brw_fs_nir.cpp | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_compiler.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/intel/compiler/brw_compiler.c
b/src/intel/compiler/brw_compiler.c
index fe632c5badc..f885e79c3e6 100644
--- a/src/intel/compiler/brw_compiler.c
+++
From the Skylake PRM, Extended Math Function:
"The execution size must be no more than 8 when half-floats
are used in source or destination operand."
Earlier generations do not support Extended Math with half-float.
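The rule quoted above can be summarized as a helper that returns the widest execution size allowed for an extended math instruction. This is a hypothetical sketch. It assumes that a return value of 0 means the operation must first be lowered to 32-bit, as the series does for gen8. It also assumes that a wider instruction is split into SIMD8 pieces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: widest execution size allowed for one extended math
 * instruction. Returns 0 when half-float math is unsupported, meaning the
 * operation has to be lowered to 32-bit first. */
static unsigned max_math_exec_size(unsigned gen, bool uses_hf, unsigned exec_size)
{
   assert(exec_size > 0);
   if (!uses_hf)
      return exec_size;
   if (gen < 9)
      return 0;                               /* no HF extended math before gen9 */
   return exec_size < 8 ? exec_size : 8;      /* SIMD8 limit from the Skylake PRM */
}
```

For example, on gen9 a SIMD16 half-float math operation gets a limit of 8, so it would be emitted as two SIMD8 instructions.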
v2:
- Rewrite the code to make it more readable (Jason).
v3:
- Use
v2:
- make 16-bit be its own separate case (Jason)
Reviewed-by: Topi Pohjolainen
---
src/intel/compiler/brw_fs_nir.cpp | 18 +-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/src/intel/compiler/brw_fs_nir.cpp
b/src/intel/compiler/brw_fs_nir.cpp
index
3-src instructions don't support immediates, but since 36bc5f06dd22,
we allow them on MAD and LRP relying on the combine constants pass to
fix it up later. However, that pass is specialized for 32-bit float
immediates and can't handle HF constants at present, so this patch
ensures that
We use Align16 mode for this, since it is more convenient, but the PRM
for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register region
restrictions', Section '1. Special Restrictions':
"In Align16 mode, the channel selects and channel enables apply to a
pair of half-floats, because
. This version of the
series also dropped the SPIR-V compiler patches that have already been merged.
As always, a branch with these patches is available for testing in the
itoral/VK_KHR_shader_float16_int8 branch of the Igalia Mesa repository at
https://github.com/Igalia/mesa.
Iago Toral Quiroga (42
v2: adapted to work with the new regioning lowering pass
Reviewed-by: Topi Pohjolainen (v1)
---
src/intel/compiler/brw_ir_fs.h | 33 ++---
1 file changed, 26 insertions(+), 7 deletions(-)
diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
The PRM states that half-float operands are supported since gen9.
Reviewed-by: Topi Pohjolainen
---
src/intel/compiler/brw_eu_emit.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index
We had defined MAX_IMAGES as 8, which we used to size the array for
image push constant data. The comment there stated that this was for
gen8, but anv_nir_apply_pipeline_layout runs for all gens and writes
that array, asserting that we don't exceed that number of images,
which imposes a limit of