We need to emit two 32-bit load messages to load a full dvec4. If only
1 or 2 double components are needed, dead-code elimination will remove
the second one.
We also need to shuffle the result of the 32-bit messages to form
valid 64-bit SIMD4x2 data.
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp |
We need to shuffle the data before it is written to the URB. Also,
dvec3/4 need two vec4 slots.
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 29 ++---
1 file changed, 26 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
The general idea is that with 32-bit swizzles we cannot address DF
components Z/W directly, so instead we select the region that starts
at the 16B offset into the register and use X/Y swizzles.
The above, however, has the caveat that we can't do that without
violating register region
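The addressing idea above can be sketched in plain C++. This is only an illustration, not the driver's code: `DfChannelAccess` and `access_df_channel` are hypothetical names, and the sketch assumes one dvec4 occupies 32 contiguous bytes (x, y, z, w at 8 bytes each) and that 32-bit swizzle channels are numbered X=0 .. W=3.

```cpp
#include <cassert>

// Hypothetical description of how to reach one 64-bit (DF) channel
// using only 32-bit swizzles.
struct DfChannelAccess {
    unsigned byte_offset; // 0 for the first half of the dvec4, 16 for the second
    unsigned lo_swizzle;  // 32-bit channel holding the low dword
    unsigned hi_swizzle;  // 32-bit channel holding the high dword
};

// df_channel: 0=X, 1=Y, 2=Z, 3=W of a dvec4.
inline DfChannelAccess access_df_channel(unsigned df_channel)
{
    assert(df_channel < 4);
    // Z/W live in the second 16 bytes, so we move the region start
    // there and address them as if they were X/Y.
    const unsigned half = df_channel / 2;
    const unsigned local = df_channel % 2;
    return DfChannelAccess{half * 16u, 2u * local, 2u * local + 1u};
}
```

With this model, DF channel Z is reached as offset 16B plus 32-bit swizzles X/Y, and W as offset 16B plus Z/W. The region restrictions mentioned in the message are not modeled here.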
Certain swizzles like XYZW can be supported by translating only the first two
64-bit swizzle channels to 32-bit channels. This happens with swizzles such
that the first two logical components, when translated to 32-bit channels and
replicated across the second dvec2 row, select the same channels
This came in handy when debugging the payload setup for Tess Eval,
since it prints correct subnr for attributes that can be loaded
in the second half of a register.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git
Use these helpers to implement d2f and f2d. We will reuse these helpers when
we implement things like d2i or i2d as well.
---
src/mesa/drivers/dri/i965/brw_vec4.h | 5 +++
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 54 +++---
2 files changed, 39 insertions(+), 20
SIMD4x2 64-bit data is stored in register space like this:
r0.0:DF x0 y0 z0 w0
r0.1:DF x1 y1 z1 w1
When we need to write data such as this to memory using 32-bit write
messages we need to shuffle it in this fashion:
r0.0:DF x0 y0 x1 y1
r0.1:DF z0 w0 z1 w1
and emit two 32-bit write messages,
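The shuffle shown above can be sketched as follows. This only models the data movement, assuming each register is represented as an array of four doubles; `Reg`, `Dvec4Pair` and `shuffle_for_32bit_write` are hypothetical names and not the helpers the patch adds.

```cpp
#include <array>

// Hypothetical model of SIMD4x2 64-bit data: two vertices, four double
// components each, in two consecutive registers.
//   in[0] = x0 y0 z0 w0
//   in[1] = x1 y1 z1 w1
using Reg = std::array<double, 4>;
using Dvec4Pair = std::array<Reg, 2>;

// Rearranges the data so that each register holds the same pair of
// components for both vertices:
//   out[0] = x0 y0 x1 y1
//   out[1] = z0 w0 z1 w1
inline Dvec4Pair shuffle_for_32bit_write(const Dvec4Pair &in)
{
    Dvec4Pair out{};
    for (int vertex = 0; vertex < 2; ++vertex) {
        out[0][2 * vertex + 0] = in[vertex][0]; // x
        out[0][2 * vertex + 1] = in[vertex][1]; // y
        out[1][2 * vertex + 0] = in[vertex][2]; // z
        out[1][2 * vertex + 1] = in[vertex][3]; // w
    }
    return out;
}
```

Each output register could then be the payload of one of the two 32-bit write messages.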
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 30 +++---
1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index b79fd5e..45d49e9 100644
---
By exploiting gen7's hardware decompression bug with vstride=0 we gain the
capacity to support additional swizzle combinations.
This also fixes ZW writes from X/Y channels like in:
mov r2.z:df r0.:df
Because DF regions use 2-wide rows with a vstride of 2, the region generated
for the source
The pass does not support doubles in its current form.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 06fa38f..675b7fc 100644
---
v2: do it in the same fashion as the FS backend for consistency (Curro)
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 490cbae..69fdb1e 100644
---
Stages that use interleaved attributes generate regions with a vstride=0
that can hit the gen7 hardware decompression bug.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 28 ++--
1 file changed, 26 insertions(+), 2 deletions(-)
diff --git
64-bit scratch reads/writes require shuffling data around, so we need
to have access to the full 64-bit data. We will do the right thing
for these when we emit the messages.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 9 +
1 file changed, 9 insertions(+)
diff --git
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 16
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index e732bf4..426faf0 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++
From: "Juan A. Suarez Romero"
Our current data flow analysis does not take into account that channels
on 64-bit operands are 64-bit. This is a problem when the same register
is accessed using both 64-bit and 32-bit channels. This is very common
in operations where we need to
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 21 ++---
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 426faf0..56a46ad 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++
From: Samuel Iglesias Gonsálvez
Signed-off-by: Samuel Iglesias Gonsálvez
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
From the HSW PRM, Command Reference, QtrCtrl:
"NibCtrl is only allowed for SIMD4 instructions with a DF (Double Float)
source or destination type."
v2: Assert that the type is DF (Samuel)
v3: Don't set the default group to 0 and then set it only for 4-wide
instructions. Instead,
---
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 19 +--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index b0b5f39..f12a114 100644
---
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 2bde628..3191eab 100644
---
the weirdness of the double register allocation...
Signed-off-by: Connor Abbott <connor.w.abb...@intel.com>
Signed-off-by: Iago Toral Quiroga <ito...@igalia.com>
---
src/mesa/drivers/dri/i965/brw_defines.h | 2 ++
src/mesa/drivers/dri/i965/brw_shader.cpp | 4 +++
src/
There is a hardware bug affecting compressed double-precision bcsel
instructions in Align16 mode by which they won't read the predication mask
properly. The bug does not affect other predicated instructions
and it does not affect bcsel in Align1 mode either. This was found
empirically and verified by
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_vec4.h | 5 +
1 file changed, 5 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 1505ba6..86e58f3 100644
---
v2: Setup for a 64-bit scratch read by checking the type size of the
correct register (Iago)
---
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 15 +--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
v2 (Curro):
- Generate the flag register with a predicated MOV instead of a CMP
instruction, which has the benefit that we can skip loading a DF
0.0 constant.
- Avoid the PICK_LOW_32BIT + MOV by using the flag result and a
SEL to set the boolean result.
---
The hardware only supports 32-bit swizzles, which means that we can
only directly access channels XY of a DF, making access to channels ZW
more difficult, especially considering the various regioning restrictions
imposed by the hardware. The combination of both things makes handling
random swizzles
v2: do this inside dst_reg_for_nir_reg() instead of its callers
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 815082e..860ec51 100644
---
---
src/mesa/drivers/dri/i965/brw_vec4_builder.h | 39 ++--
1 file changed, 37 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_builder.h
b/src/mesa/drivers/dri/i965/brw_vec4_builder.h
index dab6e03..8352542 100644
---
---
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 619e010..4e7515c 100644
---
The BDW PRM says that it is not supported, but it seems that gen7 is also
affected, since doing DepCtrl on double-float instructions leads to
GPU hangs in some cases, which is probably not surprising knowing that
this is not supported in new hardware iterations. The SKL PRMs do not
mention this
These align1 opcodes do partial writes of 64-bit data. The problem is that we
want to use them to write on the same register to implement packDouble2x32 and
from the point of view of DCE, since both opcodes write to the same register,
only the last one stands and decides to eliminate the first,
Spilling of 64-bit data requires data shuffling for the corresponding
scratch read/write messages. This produces unsupported swizzle regions
and writemasks that we need to scalarize.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 18 ++
1 file changed, 18 insertions(+)
diff --git
In this case we need to shuffle the 64-bit data before we write it
to memory, source from reg_offset + 1 to write components Z and W
and consider that each DF channel is twice as big.
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 40 --
1 file changed, 32
This will come in handy when we implement a simd lowering pass in a
follow-up patch.
---
src/mesa/drivers/dri/i965/brw_ir_vec4.h | 41 +
1 file changed, 41 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
In the vec4 backend the generator sets the execution size to 8 for all
instructions by default. However, to implement 64-bit floating-point we
will need to split certain instructions into smaller sizes, so we need the
IR to convey this information like we do in the scalar backend. This patch
uses
We can't propagate the conditional modifier from one instruction to
another of a different execution size / group, since that would change
the channels affected by the conditional.
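A minimal sketch of the kind of guard this describes, assuming an instruction records its execution size and channel group. The `InstInfo` struct and `can_propagate_cmod` are hypothetical and stand in for whatever check the pass actually performs.

```cpp
// Hypothetical summary of the two properties that determine which
// channels an instruction affects.
struct InstInfo {
    unsigned exec_size; // number of channels executed
    unsigned group;     // first channel the instruction applies to
};

// A conditional modifier may only move between instructions that cover
// exactly the same channels; otherwise the flag bits written would change.
inline bool can_propagate_cmod(const InstInfo &from, const InstInfo &to)
{
    return from.exec_size == to.exec_size && from.group == to.group;
}
```

For example, an 8-wide instruction and a 4-wide instruction on group 4 would be rejected, since the narrower one only covers the second half of the channels.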
---
src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
These opcodes will set the low/high 32-bit in each 64-bit data element
using Align1 mode. We will use this to implement packDouble2x32.
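What "set the low/high 32-bit" means for a single 64-bit element can be sketched on the CPU like this. The function names are hypothetical and this is not the generator code; the only outside fact relied on is the GLSL definition of packDouble2x32, where the first component supplies the 32 least significant bits and the second the 32 most significant bits.

```cpp
#include <cstdint>
#include <cstring>

// Replace only the low dword of a 64-bit element (a partial write).
inline uint64_t set_low_32bit(uint64_t element, uint32_t value)
{
    return (element & 0xffffffff00000000ull) | value;
}

// Replace only the high dword of a 64-bit element (a partial write).
inline uint64_t set_high_32bit(uint64_t element, uint32_t value)
{
    return (element & 0x00000000ffffffffull) | (uint64_t(value) << 32);
}

// packDouble2x32(uvec2(lo, hi)) built from the two partial writes.
inline double pack_double_2x32(uint32_t lo, uint32_t hi)
{
    uint64_t bits = 0;
    bits = set_low_32bit(bits, lo);
    bits = set_high_32bit(bits, hi);
    double result;
    std::memcpy(&result, &bits, sizeof(result)); // bit cast, no conversion
    return result;
}
```

Note that both partial writes target the same destination, and each one must preserve the half written by the other.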
We use Align1 mode because in order to implement this in Align16 mode
we would need to use 32-bit logical swizzles (XZ for low, YW for high),
but the IR works in
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index af76730..5048c4e 100644
---
src/mesa/drivers/dri/i965/brw_reg.h | 13 +
1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_reg.h
b/src/mesa/drivers/dri/i965/brw_reg.h
index 3b46d27..8907c9c 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++
Basically, this involves considering the bit-size information to set
the appropriate type on both operands and destination.
v2 (Curro)
- Don't use two temporaries (and write one of them twice) to obtain
the nir_alu_type.
Reviewed-by: Francisco Jerez
---
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 12
1 file changed, 12 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 04f70ef..2631bf3 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++
From: Connor Abbott
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 0d4c8f5..05e7f29 100644
---
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index c0cb141..088ed13 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index fdd3cba..4dffd76 100644
---
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 22 ++
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 05e7f29..ce95c8d 100644
---
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 24
1 file changed, 24 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index ce95c8d..b75337c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++
From: Connor Abbott <connor.w.abb...@intel.com>
Less duplication, one less case to handle for doubles and support
for sized NIR types.
v2: Fix call to get_instance by swapping rows and columns params (Iago)
Signed-off-by: Iago Toral Quiroga <ito...@igalia.com>
Reviewed-by: Fra
v2: Make dst_reg_for_nir_reg() handle this for nir_register since we
want to have the correct type set before we call offset().
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
From: Connor Abbott
v2 (Curro):
- Do not special-case for a bit-size of 64, divide the bit_size by 32
instead.
- Use DIV_ROUND_UP so we can handle sub-32-bit types.
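The arithmetic in the two review points can be illustrated as below. The snippet does not show what quantity is being scaled, so `dwords_per_component` is a hypothetical name chosen for the sketch; `div_round_up` mirrors the usual rounding-up division that a DIV_ROUND_UP macro performs.

```cpp
// Integer division that rounds up instead of truncating.
constexpr unsigned div_round_up(unsigned a, unsigned b)
{
    return (a + b - 1) / b;
}

// Number of 32-bit units needed for one component of the given bit size.
// Dividing by 32 covers the 64-bit case without a special branch, and
// rounding up keeps sub-32-bit types at one unit instead of zero.
constexpr unsigned dwords_per_component(unsigned bit_size)
{
    return div_round_up(bit_size, 32);
}

static_assert(dwords_per_component(64) == 2, "doubles take two dwords");
static_assert(dwords_per_component(16) == 1, "sub-32-bit types still take one");
```

A plain `bit_size / 32` would return 0 for a 16-bit type, which is the case the rounding-up form avoids.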
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 7 ---
1 file changed, 4 insertions(+), 3
From: Connor Abbott
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_reg.h | 6 ++
1 file changed, 6 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_reg.h
b/src/mesa/drivers/dri/i965/brw_reg.h
index 8907c9c..1fa2595
v2: use a predicated MOV instead of a CMP, like we do in d2b, to skip
loading a double immediate.
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 64 +++---
1 file changed, 49 insertions(+), 15 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
The opcodes are not specific for conversions to/from float since we need
the same for conversions to/from other 32-bit types. Rename the opcodes
accordingly and change the asserts to check the size of the types involved
instead.
---
src/mesa/drivers/dri/i965/brw_defines.h | 4
From: Connor Abbott
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 11 +++
1 file changed, 11 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 2631bf3..37c3d7c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 14 ++
1 file changed, 14 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 0170d21..cc10247 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++
For 32-bit instructions we want to use <4,4,1> regions for VGRF
sources so we should really set a width of 4 (we were setting 8).
For 64-bit instructions we want to use a width of 2 because the
hardware uses 32-bit swizzles, meaning that we can only address 2
consecutive 64-bit components in a
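The width selection described so far can be sketched as a function of the operand's type size. `vgrf_source_width` is a hypothetical name, and the sketch covers only the two cases the message mentions (4-byte and 8-byte types).

```cpp
#include <cassert>

// Region width for a VGRF source, chosen by the size of its type in bytes.
// 32-bit types use a width of 4 (a <4,4,1> region); 64-bit types use a
// width of 2 because 32-bit swizzles can only address two consecutive
// 64-bit components.
inline unsigned vgrf_source_width(unsigned type_size_bytes)
{
    assert(type_size_bytes == 4 || type_size_bytes == 8);
    return type_size_bytes == 8 ? 2u : 4u;
}
```

In both cases one row of the region spans 16 bytes (4 x 4B or 2 x 8B).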
These opcodes will pick the low/high 32-bit in each 64-bit data element
using Align1 mode. We will use this, for example, to do things like
unpackDouble2x32.
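For reference, this is what picking the low/high 32 bits amounts to for one 64-bit element, written as a CPU-side sketch. The names are hypothetical; the component order follows the GLSL definition of unpackDouble2x32, where the first component receives the 32 least significant bits.

```cpp
#include <cstdint>
#include <cstring>

// Low dword of a 64-bit element.
inline uint32_t pick_low_32bit(uint64_t element)
{
    return static_cast<uint32_t>(element & 0xffffffffull);
}

// High dword of a 64-bit element.
inline uint32_t pick_high_32bit(uint64_t element)
{
    return static_cast<uint32_t>(element >> 32);
}

struct Uvec2 {
    uint32_t x; // 32 least significant bits
    uint32_t y; // 32 most significant bits
};

// unpackDouble2x32(v) expressed with the two picks.
inline Uvec2 unpack_double_2x32(double v)
{
    uint64_t bits;
    std::memcpy(&bits, &v, sizeof(bits)); // bit cast, no conversion
    return Uvec2{pick_low_32bit(bits), pick_high_32bit(bits)};
}
```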
We use Align1 mode because in order to implement this in Align16 mode
we would need to use 32-bit logical swizzles (XZ for low, YW for
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_disasm.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c
b/src/mesa/drivers/dri/i965/brw_disasm.c
index 5e51be7..1d2a4d2 100644
---
immediates
Iago Toral Quiroga (92):
i965/vec4/nir: Add bit-size information to types
i965/vec4/nir: support doubles in ALU operations
i965/vec4/nir: fix emitting 64-bit immediates
i965/vec4: add double/float conversion pseudo-opcodes
i965/vec4: translate d2f/f2d
i965: fix subnr overflow
So that it has the same semantics as the scalar backend implementation. The
helper will now take a simd width (which is always 8 in vec4 mode) and step
as many scalar components as specified by that width, respecting the size of
the scalar channels.
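A rough sketch of those semantics, under the assumption that one step advances `simd_width` scalar channels of the register's type and that a register is 32 bytes. The function names are hypothetical and the real helper may treat some register files (uniforms, for instance) differently.

```cpp
// Bytes skipped when stepping `delta` times over `simd_width` scalar
// channels of `type_size_bytes` each.
inline unsigned offset_in_bytes(unsigned simd_width,
                                unsigned type_size_bytes,
                                unsigned delta)
{
    return delta * simd_width * type_size_bytes;
}

// The same distance expressed in whole 32-byte registers.
inline unsigned offset_in_registers(unsigned simd_width,
                                    unsigned type_size_bytes,
                                    unsigned delta)
{
    return offset_in_bytes(simd_width, type_size_bytes, delta) / 32u;
}
```

With a width of 8, one step over a 32-bit type moves one register, while one step over a 64-bit type moves two.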
---
src/mesa/drivers/dri/i965/brw_ir_vec4.h
---
src/mesa/drivers/dri/i965/brw_ir_vec4.h | 46 +
1 file changed, 46 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
index a8e5f4a..ef79e33 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
+++
In a later patch we want to change the semantics of offset() to be in terms
of SIMD width and scalar channels so it is consistent with the definition
of the same helper in the scalar backend. However, some uses of offset()
in the vec4 backend do not operate naturally in terms of these
semantics.
-August/125057.html
Iago Toral Quiroga (3):
i965/vec4: add a byte_offset helper
i965/vec4: use byte_offset() instead of offset()
i965/vec4: make offset() work in terms of a simd width and scalar
components
src/mesa/drivers/dri/i965/brw_ir_vec4.h| 58 --
src
Generally, instructions in Align16 mode only ever write to a single
register and don't need any form of SIMD splitting, that's why we
have never had a SIMD splitting pass in the vec4 backend. However,
double-precision instructions typically write 2 registers and in
some cases they run into certain
This hack was introduced in commit 03ac2c7223f7645e3:
i965/gs: Fix up gl_PointSize input swizzling for DUAL_INSTANCED gs
Specifically to fixup the code we emitted to deal with gl_PointSize inputs
in dual instance mode, where we were emitting a MOV to copy the point
size from .w (where the
So we can access it in the vec4 backend to handle byte offsets into
registers.
---
src/mesa/drivers/dri/i965/brw_ir_fs.h | 6 --
src/mesa/drivers/dri/i965/brw_shader.h | 6 ++
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h
This will make it more consistent with the FS implementation of the same
helper and will provide more flexibility that will come in handy, for
example, when we add a SIMD lowering pass in the vec4 backend.
v2:
- Move the switch statement to an add_byte_offset helper that takes a pointer
to a
This will make it more consistent with the FS implementation of the same
helper and will provide more flexibility that will come in handy, for
example, when we add a SIMD lowering pass in the vec4 backend.
v2:
- Move the switch statement to add_byte_offset (Iago)
- Remove the assert on the
semantics of the offset() helper. The align16/fp64 series should be rebased
on top of these changes too.
Iago Toral Quiroga (4):
i965: move subreg_offset to backend_reg
i965/vec4: make the offset() operate in terms of width and type
i965/vec4/cse: adapt to changes in offset() helper
i965/vec4
---
src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 19 +--
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
index 0c1f0c3..eaf95c8 100644
---
This commit should be squashed with the previous one.
---
src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp | 12 +++-
src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp | 8
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 2 +-
This will make it more consistent with the FS implementation of the same
helper and will provide more flexibility that will come in handy, for
example, when we add a SIMD lowering pass in the vec4 backend.
---
src/mesa/drivers/dri/i965/brw_ir_vec4.h | 47 ++---
1 file
---
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 265bb17..ae8704a 100644
---
There is a single bit for this, so it is a binary 0 or 1 meaning
offset 0B or 16B respectively.
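The decoding described above is small enough to show directly. `subnr_bit_to_bytes` is a hypothetical name for this sketch, not the disassembler's function.

```cpp
#include <cassert>

// The field is a single bit: 0 selects byte offset 0, 1 selects byte
// offset 16 (the second half of the register).
inline unsigned subnr_bit_to_bytes(unsigned bit)
{
    assert(bit <= 1);
    return bit * 16u;
}
```

Printing the raw bit as if it were a byte count would show 1 where 16 is meant.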
---
src/mesa/drivers/dri/i965/brw_disasm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c
b/src/mesa/drivers/dri/i965/brw_disasm.c
index
---
src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 30 --
1 file changed, 24 insertions(+), 6 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
index 0c1f0c3..d1bd9fa 100644
---
We only set this to true when fixing up 64-bit regions and for one
specific purpose only, so check that nothing else sets this to true.
This helped me find a bug where the field was incorrectly initialized
to true in some cases.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +++
1 file changed, 3
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 30 +++---
1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 95b408e..68efea6 100644
---
There is a hardware bug affecting compressed double-precision bcsel
instructions in Align16 mode by which they won't read the predication mask
properly, leading to incorrect behavior at least in non-uniform control
flow scenarios. The bug does not affect other predicated instructions
and it does not
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index c9b8edf..d7c6bf4 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++
The general idea is that with 32-bit swizzles we cannot address DF
components Z/W directly, so instead we select the region that starts
at the middle of the SIMD register and use X/Y swizzles.
The above, however, has the caveat that we can't do that without
violating register region restrictions
Specifically, at least for now, we don't want to deal with the fact that
channel sizes for fp64 instructions are twice the size, so prevent
coalescing from instructions with a different type size.
Also, we should check that if we are coalescing a register from another
MOV we should be reading the
Also, we use reg_offset=1 with DF uniforms when we try to access
components Z/W, so print reg_offset for them too.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 14 +++---
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
We make scalar sources in 3src instructions use subnr instead of
swizzles because they don't really use swizzles.
With doubles it is more complicated because we use vstride=0 in
more scenarios in which they don't produce scalar regions. Also
RepCtrl is not allowed with 64-bit operands, so we
---
src/mesa/drivers/dri/i965/brw_disasm.c | 8 +++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c
b/src/mesa/drivers/dri/i965/brw_disasm.c
index c8bdeab..d5e9916 100644
--- a/src/mesa/drivers/dri/i965/brw_disasm.c
+++
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 9400baa..a366548 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index c55d594..8316691 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++
We need to consider the fact that dvec3/4 require two vec4 slots.
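The slot count can be worked out from the size of the type, taking a vec4 slot to be four 32-bit components (128 bits). `vec4_slots` is a hypothetical helper written for this sketch, not the code in the patch.

```cpp
// Number of vec4 slots (128 bits each) occupied by a vector with the
// given number of components and per-component bit size.
constexpr unsigned vec4_slots(unsigned components, unsigned bit_size)
{
    return (components * bit_size + 127) / 128;
}

static_assert(vec4_slots(4, 32) == 1, "vec4 fits in one slot");
static_assert(vec4_slots(2, 64) == 1, "dvec2 fits in one slot");
static_assert(vec4_slots(3, 64) == 2, "dvec3 needs two slots");
static_assert(vec4_slots(4, 64) == 2, "dvec4 needs two slots");
```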
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 11 +--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 1b190ab..95b408e
the weirdness of the double register allocation...
Signed-off-by: Connor Abbott <connor.w.abb...@intel.com>
Signed-off-by: Iago Toral Quiroga <ito...@igalia.com>
---
src/mesa/drivers/dri/i965/brw_defines.h | 2 ++
src/mesa/drivers/dri/i965/brw_shader.cpp | 4 +++
src/
The pass does not support doubles in its current form. I'm not even sure that
it should, since it would basically change the type of the operation and that
could have implications for things like SSBO writes, etc.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 +
1 file changed, 1 insertion(+)
These opcodes will pick the low/high 32-bit in each 64-bit data element
using Align1 mode. We will use this, for example, to do things like
unpackDouble2x32.
We can't do this in Align16 because we would need data to cross the
vec4 boundary.
---
src/mesa/drivers/dri/i965/brw_defines.h | 2
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 18 ++
1 file changed, 18 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 1525a3d..4014020 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++
In the vec4 backend the generator sets the execution size for all
instructions to 8; however, we will have to split certain DF instructions
to have an execution size of 4, so we need to indicate this explicitly in the
IR for the generator to set the right execution size for them.
We will use this
From: Connor Abbott
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index df927e7..095a27d 100644
---
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 8 +---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index cf35f2e..fde7b60 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++
The hardware only supports 32-bit swizzles, which means that a swizzle
like XYZW only selects channels XY of a DF, making access to channels ZW
more difficult, especially considering the various regioning restrictions
imposed by the hardware. The combination of both things makes handling
random
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 11 +++
1 file changed, 11 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 82bf927..dd06a32 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++
Use these helpers to implement d2f and f2d. We will reuse these helpers when
we implement things like d2i or i2d as well.
---
src/mesa/drivers/dri/i965/brw_vec4.h | 5 +++
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 53 +++---
2 files changed, 38 insertions(+), 20
From the BDW PRM, Workarounds chapter:
"DF->f format conversion for Align16 has wrong emask calculation when
source is immediate."
So detect the case and move the immediate source to a VGRF before we attempt
the conversion.
Notice that Broadwell and later are strictly scalar at the
The opcodes are not specific for conversions to/from float since we need
the same for conversions to/from other 32-bit types. Rename the opcodes
accordingly and change the asserts to check the size of the types involved
instead.
---
src/mesa/drivers/dri/i965/brw_defines.h | 4
These opcodes do partial writes of 64-bit data. The problem is that we intend
to use them to write on the same register to implement packDouble2x32 and
from the point of view of DCE, since both opcodes write to the same register,
only the last one stands and decides to eliminate the first, which