https://bugs.freedesktop.org/show_bug.cgi?id=98172
--- Comment #2 from Michel Dänzer ---
Created attachment 127204
--> https://bugs.freedesktop.org/attachment.cgi?id=127204&action=edit
Work with a local reference of so->fence
Does this patch help?
Reviewed-by: Nicolai Hähnle
On 10.10.2016 13:25, Marek Olšák wrote:
From: Marek Olšák
The kernel patch has been sent to amd-gfx.
---
src/gallium/drivers/radeonsi/si_compute.c | 7 ++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/src/gallium/drivers/radeonsi/si_compute.c
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_disasm.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c
b/src/mesa/drivers/dri/i965/brw_disasm.c
index 5e51be7..1d2a4d2 100644
--- a/src/mesa/drivers/dri/i965/brw_disasm
It's been some time since we sent the first version of the patches, so here is
a v2, which adds:
1. Feedback from Curro to v1. I think the only thing missing is the suggestion
to change the semantics of the offset() helper in vec4 to match those in the
scalar backend. I sent this as a separate ser
Shift this down and maintain the exact same behaviour as the
current code.
Signed-off-by: Edward O'Callaghan
---
src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c
b/src/amd/vulkan/win
These opcodes will pick the low/high 32-bit in each 64-bit data element
using Align1 mode. We will use this, for example, to do things like
unpackDouble2x32.
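
As a rough CPU-side illustration of what "pick the low/high 32-bit in each 64-bit data element" means, the sketch below splits the bit pattern of a double into its two 32-bit halves. The function names are hypothetical and this is not the i965 opcode implementation; it only mirrors the GLSL unpackDouble2x32 semantics (low half first).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not Mesa code: low 32 bits of a double's bit pattern. */
static uint32_t pick_low_32bit(double d)
{
   uint64_t bits;
   assert(sizeof(d) == sizeof(bits));
   memcpy(&bits, &d, sizeof(bits));   /* type-pun without aliasing issues */
   return (uint32_t)(bits & 0xffffffffu);
}

/* Hypothetical helper, not Mesa code: high 32 bits of a double's bit pattern. */
static uint32_t pick_high_32bit(double d)
{
   uint64_t bits;
   assert(sizeof(d) == sizeof(bits));
   memcpy(&bits, &d, sizeof(bits));
   return (uint32_t)(bits >> 32);
}
```

For example, 1.0 has the bit pattern 0x3FF0000000000000, so the low half is 0 and the high half is 0x3FF00000.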
We use Align1 mode because in order to implement this in Align16 mode
we would need to use 32-bit logical swizzles (XZ for low, YW for high)
Drop/add a few newlines where appropriate and drop a couple of
unnecessary braces.
Signed-off-by: Edward O'Callaghan
---
src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c | 16 ++--
src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.h | 2 +-
src/amd/vulkan/winsys/amdgpu/radv_amdgpu_su
For 32-bit instructions we want to use <4,4,1> regions for VGRF
sources so we should really set a width of 4 (we were setting 8).
For 64-bit instructions we want to use a width of 2 because the
hardware uses 32-bit swizzles, meaning that we can only address 2
consecutive 64-bit components in a row
Nothing major here, patch 3 is the only interesting one.
Edward O'Callaghan (3):
[PATCH 1/3] radv/winsys: Trivial style and readability fixups
[PATCH 2/3] radv/winsys: Move a 'default:' to the end of case stmt
[PATCH 3/3] radv/winsys: Fix mem leak at failed do_winsys_init() call
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 11 +++
1 file changed, 11 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 2631bf3..37c3d7c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/dri
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 14 ++
1 file changed, 14 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 0170d21..cc10247 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/
The opcodes are not specific to conversions to/from float since we need
the same for conversions to/from other 32-bit types. Rename the opcodes
accordingly and change the asserts to check the size of the types involved
instead.
---
src/mesa/drivers/dri/i965/brw_defines.h | 4 ++--
From: Connor Abbott
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 6aa9102..c29cfb5 100644
--- a/src/mesa/drivers/dri/i965/brw_v
Probably unlikely; however, ensure we don't leak a heap allocation
on the fail path.
Signed-off-by: Edward O'Callaghan
---
src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c
b/src/amd/vulkan/winsy
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 22 ++
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 05e7f29..ce95c8d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 24
1 file changed, 24 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index ce95c8d..b75337c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b
From: Connor Abbott
Less duplication, one less case to handle for doubles, and support
for sized NIR types.
v2: Fix call to get_instance by swapping rows and columns params (Iago)
Signed-off-by: Iago Toral Quiroga
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
v2: Make dst_reg_for_nir_reg() handle this for nir_register since we
want to have the correct type set before we call offset().
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/
From: Connor Abbott
v2 (Curro):
- Do not special-case for a bit-size of 64, divide the bit_size by 32
instead.
- Use DIV_ROUND_UP so we can handle sub-32-bit types.
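
A minimal sketch of the arithmetic suggested above, assuming the quantity being computed is the number of 32-bit units per component. The macro is defined locally here so the example is self-contained, and `dwords_per_component` is a hypothetical name, not a function from the patch.

```c
#include <assert.h>

/* Local definition for this sketch: integer division rounding up. */
#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* Hypothetical helper: 32-bit units needed for one component of bit_size bits. */
static unsigned dwords_per_component(unsigned bit_size)
{
   assert(bit_size > 0);
   /* No special case for 64: 64 / 32 == 2, and rounding up keeps
    * sub-32-bit types at one unit instead of zero. */
   return DIV_ROUND_UP(bit_size, 32);
}
```

A plain `bit_size / 32` would return 0 for 16-bit or 8-bit types, which is why rounding up matters.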
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 7 ---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/s
From: Connor Abbott
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_reg.h | 6 ++
1 file changed, 6 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_reg.h
b/src/mesa/drivers/dri/i965/brw_reg.h
index 8907c9c..1fa2595 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+
v2: use a predicated MOV instead of a CMP, like we do in d2b, to skip
loading a double immediate.
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 64 +++---
1 file changed, 49 insertions(+), 15 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/sr
These opcodes will set the low/high 32-bit in each 64-bit data element
using Align1 mode. We will use this to implement packDouble2x32.
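
To illustrate the intent, here is a CPU-side sketch that builds a double from two partial 32-bit writes, mirroring the GLSL packDouble2x32 semantics (first value is the low half, second the high half). All function names are hypothetical and this is not the driver's opcode implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: overwrite only the low 32 bits of d. */
static double set_low_32bit(double d, uint32_t lo)
{
   uint64_t bits;
   assert(sizeof(d) == sizeof(bits));
   memcpy(&bits, &d, sizeof(bits));
   bits = (bits & 0xffffffff00000000ull) | lo;
   memcpy(&d, &bits, sizeof(d));
   return d;
}

/* Hypothetical helper: overwrite only the high 32 bits of d. */
static double set_high_32bit(double d, uint32_t hi)
{
   uint64_t bits;
   assert(sizeof(d) == sizeof(bits));
   memcpy(&bits, &d, sizeof(bits));
   bits = (bits & 0x00000000ffffffffull) | ((uint64_t)hi << 32);
   memcpy(&d, &bits, sizeof(d));
   return d;
}

/* Two partial writes to the same value; both are needed for the result. */
static double pack_double_2x32(uint32_t lo, uint32_t hi)
{
   double d = 0.0;
   d = set_low_32bit(d, lo);
   d = set_high_32bit(d, hi);
   return d;
}
```

Note that each helper is a partial write: neither one alone defines the full 64-bit value.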
We use Align1 mode because in order to implement this in Align16 mode
we would need to use 32-bit logical swizzles (XZ for low, YW for high),
but the IR works in
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index af76730..5048c4e 100644
--- a/src/mesa/drivers/dri
---
src/mesa/drivers/dri/i965/brw_reg.h | 13 +
1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_reg.h
b/src/mesa/drivers/dri/i965/brw_reg.h
index 3b46d27..8907c9c 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i9
Basically, this involves considering the bit-size information to set
the appropriate type on both operands and destination.
v2 (Curro)
- Don't use two temporaries (and write one of them twice) to obtain
the nir_alu_type.
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_vec4
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index fdd3cba..4dffd76 100644
--- a/src/mesa/drivers/dri/i965/
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 12
1 file changed, 12 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 04f70ef..2631bf3 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/dr
From: Connor Abbott
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 0d4c8f5..05e7f29 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index c0cb141..088ed13 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src
In this case we need to shuffle the 64-bit data before we write it
to memory, source from reg_offset + 1 to write components Z and W
and consider that each DF channel is twice as big.
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 40 --
1 file changed, 32 insertions(
This will come in handy when we implement a simd lowering pass in a
follow-up patch.
---
src/mesa/drivers/dri/i965/brw_ir_vec4.h | 41 +
1 file changed, 41 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
In the vec4 backend the generator sets the execution size to 8 for all
instructions by default. However, to implement 64-bit floating-point we
will need to split certain instructions into smaller sizes, so we need the
IR to convey this information like we do in the scalar backend. This patch
uses the
We can't propagate the conditional modifier from one instruction to
another of a different execution size / group, since that would change
the channels affected by the conditional.
---
src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
The BDW PRM says that it is not supported, but it seems that gen7 is also
affected, since doing DepCtrl on double-float instructions leads to
GPU hangs in some cases, which is probably not surprising knowing that
this is not supported in new hardware iterations. The SKL PRMs do not
mention this res
These align1 opcodes do partial writes of 64-bit data. The problem is that we
want to use them to write on the same register to implement packDouble2x32 and
from the point of view of DCE, since both opcodes write to the same register,
only the last one stands and decides to eliminate the first, whi
v2: Setup for a 64-bit scratch read by checking the type size of the
correct register (Iago)
---
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 15 +--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
b/src/mesa/driver
v2 (Curro):
- Generate the flag register with a predicated MOV instead of a CMP
instruction, which has the benefit that we can skip loading a DF
0.0 constant.
- Avoid the PICK_LOW_32BIT + MOV by using the flag result and a
SEL to set the boolean result.
---
src/mesa/drivers/dri/i965
The hardware only supports 32-bit swizzles, which means that we can
only directly access channels XY of a DF, making access to channels ZW
more difficult, especially considering the various regioning restrictions
imposed by the hardware. The combination of both things makes handling
random swizzles o
Spilling of 64-bit data requires data shuffling for the corresponding
scratch read/write messages. This produces unsupported swizzle regions
and writemasks that we need to scalarize.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 18 ++
1 file changed, 18 insertions(+)
diff --git a/
---
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 619e010..4e7515c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.c
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_vec4.h | 5 +
1 file changed, 5 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 1505ba6..86e58f3 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drive
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 21 ++---
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 426faf0..56a46ad 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/sr
From: Samuel Iglesias Gonsálvez
Signed-off-by: Samuel Iglesias Gonsálvez
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 4d5fa96..1da
From the HSW PRM, Command Reference, QtrCtrl:
"NibCtrl is only allowed for SIMD4 instructions with a DF (Double Float)
source or destination type."
v2: Assert that the type is DF (Samuel)
v3: Don't set the default group to 0 and then set it only for 4-wide
instructions. Instead, assert
v2: do this inside dst_reg_for_nir_reg() instead of its callers
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 815082e..860ec51 100644
--- a/src/mesa
These need to be emitted as align1 MOV's, since they need to have a
stride of 2 on the float register (whether src or dest) so that data
from another thread doesn't cross the middle of a SIMD8 register.
v2 (Iago):
- The float-to-double needs to align 32-bit data to 64-bit before doing the
conversi
---
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 19 +--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index b0b5f39..f12a114 100644
--- a/src/mesa/drivers/dri/i965
---
src/mesa/drivers/dri/i965/brw_vec4_builder.h | 39 ++--
1 file changed, 37 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_builder.h
b/src/mesa/drivers/dri/i965/brw_vec4_builder.h
index dab6e03..8352542 100644
--- a/src/mesa/drivers/dri/i
Stages that use interleaved attributes generate regions with a vstride=0
that can hit the gen7 hardware decompression bug.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 28 ++--
1 file changed, 26 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4
64-bit scratch reads/writes require shuffling data around, so we need
to have access to the full 64-bit data. We will do the right thing
for these when we emit the messages.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 9 +
1 file changed, 9 insertions(+)
diff --git a/src/mesa/drivers/dri/
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 16
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index e732bf4..426faf0 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa
There is a hardware bug affecting compressed double-precision bcsel
instructions in align16 mode, in which they don't read the predication mask
properly. The bug does not affect other predicated instructions
and it does not affect bcsel in Align1 mode either. This was found
empirically and verified by C
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 2bde628..3191eab 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
From: Connor Abbott
v2: Also check if the instruction source target is 64-bit. (Samuel)
Signed-off-by: Samuel Iglesias Gonsálvez
---
src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 7 +++
1 file changed, 7 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propag
From: "Juan A. Suarez Romero"
Our current data flow analysis does not take into account that channels
on 64-bit operands are 64-bit. This is a problem when the same register
is accessed using both 64-bit and 32-bit channels. This is very common
in operations where we need to access 64-bit data in
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 30 +++---
1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index b79fd5e..45d49e9 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cp
By exploiting gen7's hardware decompression bug with vstride=0 we gain the
capacity to support additional swizzle combinations.
This also fixes ZW writes from X/Y channels like in:
mov r2.z:df r0.:df
Because DF regions use 2-wide rows with a vstride of 2, the region generated
for the source
The pass does not support doubles in its current form.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 06fa38f..675b7fc 100644
--- a/src/mesa/drivers/dri/i965/brw_v
v2: do it in the same fashion as the FS backend for consistency (Curro)
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 490cbae..69fdb1e 100644
--- a/src/mesa/dr
Use these helpers to implement d2f and f2d. We will reuse these helpers when
we implement things like d2i or i2d as well.
---
src/mesa/drivers/dri/i965/brw_vec4.h | 5 +++
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 54 +++---
2 files changed, 39 insertions(+), 20 d
SIMD4x2 64bit data is stored in register space like this:
r0.0:DF x0 y0 z0 w0
r0.1:DF x1 y1 z1 w1
When we need to write data such as this to memory using 32-bit write
messages we need to shuffle it in this fashion:
r0.0:DF x0 y0 x1 y1
r0.1:DF z0 w0 z1 w1
and emit two 32-bit write messages,
The general idea is that with 32-bit swizzles we cannot address DF
components Z/W directly, so instead we select the region that starts
at the 16B offset into the register and use X/Y swizzles.
The above, however, has the caveat that we can't do that without
violating register region restricti
Certain swizzles like XYZW can be supported by translating only the first two
64-bit swizzle channels to 32-bit channels. This happens with swizzles such
that the first two logical components, when translated to 32-bit channels and
replicated across the second dvec2 row, select the same channels sp
Because the meaning of the swizzles and writemasks involved is different,
so replacing the source would lead to different semantics.
---
src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 7 +++
1 file changed, 7 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propag
This came in handy when debugging the payload setup for Tess Eval,
since it prints correct subnr for attributes that can be loaded
in the second half of a register.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i
When 64-bit registers are (un)spilled, we need to execute data shuffling
code before writing to or after reading from memory. If we have instructions
that operate on 64-bit data via 32-bit instructions, (un)spills for the
register produced by 32-bit instructions will not do data shuffling at all
(b
We need to emit 2 32-bit load messages to load a full dvec4. If only
1 or 2 double components are needed dead-code-elimination will remove
the second one.
We also need to shuffle the result of the 32-bit messages to form
valid 64-bit SIMD4x2 data.
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp |
---
src/mesa/drivers/dri/i965/brw_vec4_tes.cpp | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
index c8fa2ca..a1aa672 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
+++ b/src/m
---
src/mesa/drivers/dri/i965/brw_vec4_tes.cpp | 72 +++---
1 file changed, 55 insertions(+), 17 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
index 226dcb4..f2a4507 100644
--- a/src/mesa/drivers/dri/i965
On gen < 8, instructions that write more than one register need to read
more than one register too. Make sure we don't break that restriction
by copy propagating from a uniform.
---
src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 7 +++
1 file changed, 7 insertions(+)
diff --git a/sr
Otherwise we end up producing code that violates the register region
restriction that says that when execsize == width and hstride != 0
the vstride can't be 0.
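
The restriction quoted above can be written as a small predicate. This is only a sketch: the struct and function names are hypothetical, strides are given as plain element counts rather than hardware encodings, and it checks this single rule, not the full set of region restrictions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a <vstride,width,hstride> source region. */
struct region {
   unsigned vstride;
   unsigned width;
   unsigned hstride;
};

/* Returns false only when the one rule from the commit message is violated:
 * execsize == width and hstride != 0 require vstride != 0. */
static bool region_obeys_vstride_rule(unsigned exec_size, struct region r)
{
   assert(r.width > 0);
   if (exec_size == r.width && r.hstride != 0 && r.vstride == 0)
      return false;
   return true;
}
```

For instance, a <0,4,1> region passes this check with an execution size of 8 but fails it with an execution size of 4.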
---
src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 11 +++
1 file changed, 11 insertions(+)
diff --git a/src/mesa/driv
Same requirements as for UBO loads.
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 31 +-
1 file changed, 26 insertions(+), 5 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index f234e65..001a62f 100
From: Samuel Iglesias Gonsálvez
max_vector_size is used in the vec4 backend to pad out the uniform
components to match a size that is a multiple of a vec4. Double and dvec2
uniforms only require a single vec4 slot, not two.
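
A short sketch of the slot arithmetic behind this statement, assuming a vec4 slot holds four 32-bit units. `vec4_slots` and the locally defined macro are illustrative only and are not taken from the patch.

```c
#include <assert.h>

/* Local definition for this sketch: integer division rounding up. */
#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* Hypothetical helper: vec4 slots occupied by a vector with the given
 * number of components and per-component bit size. */
static unsigned vec4_slots(unsigned components, unsigned bit_size)
{
   assert(components >= 1 && components <= 4);
   unsigned dwords = components * DIV_ROUND_UP(bit_size, 32);
   return DIV_ROUND_UP(dwords, 4);   /* four 32-bit units per vec4 slot */
}
```

With 64-bit components, one or two components fit in a single slot, while three or four need two.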
Signed-off-by: Samuel Iglesias Gonsálvez
Signed-off-by: Iago Toral Quir
We need to shuffle the data before it is written to the URB. Also,
dvec3/4 need two vec4 slots.
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 29 ++---
1 file changed, 26 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/d
We make scalar sources in 3src instructions use subnr instead of
swizzles because they don't really use swizzles.
With doubles it is more complicated because we use vstride=0 in
more scenarios in which they don't produce scalar regions. Also
RepCtrl=1 is not allowed with 64-bit operands, so we sho
Use a width of 2 with 64-bit attributes.
Also, if we have a dvec3/4 attribute that gets split across two registers
such that components XY are stored in the second half of a register and
components ZW are stored in the first half of the next, we need to fix
regioning for any instruction that reads
This way callers don't need to know about 64-bit particularities and
we reuse some code.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 22 ++-
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 81 ++
2 files changed, 50 insertions(+), 53 deletions(-)
diff --git a
From: Samuel Iglesias Gonsálvez
v2 (Iago):
- Adapt 64-bit path to component packing changes.
Signed-off-by: Samuel Iglesias Gonsálvez
Signed-off-by: Iago Toral Quiroga
---
src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp | 51 ++-
1 file changed, 34 insertions(+), 17 d
Specifically, at least for now, we don't want to deal with the fact that
channel sizes for fp64 instructions are twice the size, so prevent
coalescing from instructions with a different type size.
Also, we should check that if we are coalescing a register from another
MOV we should be reading the
Basically, ALIGN1 mode will ignore swizzles on the input vectors so we don't
want the copy propagation pass to mess with them.
---
.../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 24 ++
1 file changed, 24 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_
---
src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 31 +++---
1 file changed, 24 insertions(+), 7 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
index bef897a..229d7b2 100644
--- a/src/mesa/drivers/dri/i965/
Add asserts so we remember to address this when we enable 64-bit
integer support, as suggested by Connor and Jason.
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 71 ++
1 file changed, 53 insertions(+), 18 deletions(-)
diff --git a/src
Mostly the same stuff as usual: we need to shuffle the data before we
write and we need to emit two 32-bit write messages (with appropriate
32-bit writemask channels set) for a full dvec4 scratch write.
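
The shuffle can be pictured on the CPU as follows. This sketch assumes the layout described elsewhere in this series (rows x0 y0 z0 w0 / x1 y1 z1 w1 become x0 y0 x1 y1 / z0 w0 z1 w1); the function name and the flat-array representation are hypothetical, and the real code emits GPU instructions rather than copying doubles.

```c
#include <assert.h>

/* Hypothetical helper, not Mesa code. Both arrays hold two rows of four
 * doubles: row 0 at indices 0..3, row 1 at indices 4..7.
 * Input rows are per-vertex (x y z w); output row 0 gathers the XY pairs
 * of both vertices and output row 1 gathers the ZW pairs. */
static void shuffle_64bit_for_32bit_write(const double *in, double *out)
{
   assert(in != out);   /* this sketch is not an in-place shuffle */
   for (int v = 0; v < 2; v++) {
      out[2 * v + 0]     = in[4 * v + 0];   /* x of vertex v */
      out[2 * v + 1]     = in[4 * v + 1];   /* y of vertex v */
      out[4 + 2 * v + 0] = in[4 * v + 2];   /* z of vertex v */
      out[4 + 2 * v + 1] = in[4 * v + 3];   /* w of vertex v */
   }
}
```

Each output row then corresponds to one of the two 32-bit write messages.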
---
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 64 ++
1 file chang
From: Samuel Iglesias Gonsálvez
This means we would copy propagate partial reads or writes and that can affect
the result.
Signed-off-by: Samuel Iglesias Gonsálvez
---
src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/driver
Generally, instructions in Align16 mode only ever write to a single
register and don't need any form of SIMD splitting, that's why we
have never had a SIMD splitting pass in the vec4 backend. However,
double-precision instructions typically write 2 registers and in
some cases they run into certain
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 860ec51..c825aeb 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/m
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 75a8473..2bde628 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 9 ++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 7af65ab..7f6acc3 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers
Gen7 hardware does not support double immediates so these need
to be moved in 32-bit chunks to a regular vgrf instead. Instead
of doing this every time we need to create a DF immediate,
create a helper function that does the right thing depending
on the hardware generation.
v2 (Curro):
- Use swi
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 ++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 37c3d7c..815082e 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_
v2 (Curro):
- Print it also for execsize < 4.
- QtrCtrl is still in effect, so print 2 * qtr_ctl + nib_ctl + 1
- Do not read the nib ctl from the instruction in gen < 7,
the field only exists in gen7+.
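
The expression from the second bullet can be sketched as below. The function name is hypothetical, and treating the nibble control as 0 on gen < 7 is an assumption made for this example; the notes above only say the field must not be read there.

```c
#include <assert.h>

/* Hypothetical helper: 1-based nibble group number to print,
 * computed as 2 * qtr_ctl + nib_ctl + 1. */
static unsigned nibble_group_label(unsigned gen, unsigned qtr_ctl,
                                   unsigned nib_ctl)
{
   assert(qtr_ctl < 4 && nib_ctl < 2);
   if (gen < 7)
      nib_ctl = 0;   /* assumption: field does not exist, treat as 0 */
   return 2 * qtr_ctl + nib_ctl + 1;
}
```

With four quarters and two nibbles per quarter, the label ranges from 1 to 8.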
---
src/mesa/drivers/dri/i965/brw_disasm.c | 6 +-
1 file changed, 5 insertions(+)
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 60 +-
1 file changed, 30 insertions(+), 30 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 75e47f9..0788ba2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec
The previous patch made sure that we do not generate MAD instructions
for any NIR's 64-bit ffma, but there is nothing preventing i965 from
producing MAD instructions as a result of lowerings or optimization
passes. This patch makes sure that any 64-bit MAD produced inside the
driver after translati
We need to consider the fact that dvec3/4 require two vec4 slots.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 11 +--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index e5391b9..b79fd5e 10064
---
src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp | 31 --
1 file changed, 29 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp
index f62dc9c..914396c 100644
--- a/src/mesa/drivers/dri/i965/
There is a single bit for this, so it is a binary 0 or 1 meaning
offset 0B or 16B respectively.
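
In other words, the byte offset is just the bit scaled by 16. A tiny sketch, with a hypothetical function name that does not come from brw_disasm.c:

```c
#include <assert.h>

/* Hypothetical helper: byte offset encoded by the single subregister bit. */
static unsigned da16_subreg_byte_offset(unsigned subreg_bit)
{
   assert(subreg_bit <= 1);   /* a single bit: only 0 or 1 is valid */
   return subreg_bit * 16;
}
```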
v2:
- Since brw_inst_dst_da16_subreg_nr() is known to be 1, remove it
from the expression (Curro)
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_disasm.c | 2 +-
1 file changed,
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 20 +---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 0788ba2..b0bc2d5 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src
Just like the exec_size, we are going to need this in the vec4 backend
when we implement a simd splitting pass.
---
src/mesa/drivers/dri/i965/brw_ir_fs.h | 9 -
src/mesa/drivers/dri/i965/brw_shader.h | 9 +
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 1 +
3 fi
From the BDW PRM, Workarounds chapter:
"DF->f format conversion for Align16 has wrong emask calculation when
source is immediate."
So detect the case and move the immediate source to a VGRF before we attempt
the conversion.
Notice that Broadwell and later are strictly scalar at the moment
The hardware can only operate with 32-bit swizzles, which is a rather
limiting restriction. However, the idea is not to expose this to the
optimization passes, which would be a mess to deal with. Instead, we let
the bulk of the vec4 backend ignore this fact and we fix the swizzles right
at codegen